Research shows that crossmodal semantic congruency plays a role in the orienting of spatial attention and in visual search. However, the extent to which crossmodal semantic relationships summon attention automatically or require top-down modulation is still not entirely clear. To date, researchers have used varied methodologies, and their outcomes have been inconsistent. Differences in the task relevance of the crossmodal stimuli (from explicitly required to entirely task-irrelevant), in the amount of perceptual load, and in response modality may account for the mixed results of previous experiments. In the present study, we examine the effects of audiovisual semantic congruence on spatial attention across variations in task relevance and perceptual load. Participants searched for visual target images among distractor images of common objects, each paired with a sound characteristic of that object (e.g., a guitar image and a chord sound). Under conditions of relatively low perceptual load, crossmodal semantic congruence sped up visual search regardless of whether the congruence was task-relevant. However, when perceptual load was higher, audiovisual semantic congruence shortened visual search latencies only when the audiovisual object was task-relevant. These results support the conclusion that semantic-based crossmodal congruence does not attract attention fully automatically but instead draws on top-down processes.