The answer may seem almost too obvious. Evidently it is – we watch a film. But on the other hand, it is what the filmmaker wants it to be. Within certain limits: neither a soundtrack nor a slide show is likely to qualify as ‘film’ or ‘motion picture’. Sound seems nonessential; a film may contain only intermittent sounds, or none at all.
Anyhow, the definition of film as a visual art seems unsatisfactory. Based on past experience, we expect it to be an audiovisual whole. Within this whole, sound, including music, is traditionally taken to play a subordinate role. A number of contemporary film theorists disagree – some speak of a ‘visual bias’ (Kalinak 1992: 20; Buhler 2018: 11).
Michel Chion (Audio-Vision, 2019: 4) thinks film’s status as visual art is an “illusion”. It is called forth by the impression that sound “comes from what is seen”, and (less obviously) that all that it may express and signify “is already contained in the image itself”. (By calling this “added value” he unintentionally seems to confirm the subordinate function of sound.)
This illusion, or mode of perception, is encouraged by speech habits. We often speak of sound as if it were a property or trace of an object. Did you see it? – No, but I heard it. ‘It’ can hardly be the same thing: we see a deer; what we hear – breaking twigs maybe, or bellowing – allows us to infer its presence, indirectly.
Even if the visual aspect is a sine qua non for film, it need not always be the main focus of interest. An essential or defining feature need not be dominant. Art forms are usually defined by their sensory medium (music as sonic art; sculpture as three-dimensional). But other factors and forces may be the motivation, the raison d’être of the image (Buhler 2018: 9). Typically, a film shows us a story, a coherent series of events, largely realized in dialogue, a text or verbal exchange that involves both audible and visible interaction.
In that sense, film need not be a visual art.
Film as a visual accompaniment to sound (particularly, music) may be untypical, but not impossible. Abstract movies from the 1920s (such as Hans Richter’s Rhythmus series) that have been synchronized with music are likely to be experienced that way, even though the image was created first. It is different when the film shows concrete objects or agents, though there are techniques for foregrounding the music. The de-humanization of the actors through fast montage and repetitive, quasi-mechanical movement in countless music videos is a case in point.
The “Archaic” Ear
If hearing is often presented as the weaker and poorer of the two senses, it is because they are compared as if they were two languages that may convey the same meaning. As if visual perception could be translated into sound, and vice versa (Can you see the deer? – No, but I can hear it). In this way sound inevitably emerges as a language with a handicap: it doesn’t show us the objects. Sounds are notoriously ambiguous indicators of their source (and therefore a favourite object for guessing games).
But vision and sound are not at all like languages, and they do not compete for the same terrain. They don’t compete at all.
Visual perception is a process in which light reflected by objects reaches our eyes. Our animal instinct makes us believe that what we perceive are these objects directly, as ‘real’ and mostly permanent things.
Sound is the product of processes or actions, collisions, friction, causing a chain of vibrations, through cycles of growth and decay. It involves ‘sonic events’ rather than ‘sonic objects’. The case of music is much more complex. In music, we recognize sound qualities but also entities of a higher order, such as melodies and phrases; objects constructed in the imagination that may recur in a complex web of relations.
The differences between those sensory aspects of the world are amplified by the technology we use to capture and reproduce them. According to Adorno and Eisler in their classic Music for the Films (1947) …
Ordinary listening, as compared to seeing, is ‘archaic’; it has not kept pace with technological progress. One might say that to react with the ear, which is fundamentally a passive organ in contrast to the swift, actively selective eye, is in a sense not in keeping with the present advanced industrial age and its cultural anthropology. (Adorno and Eisler 2005: 20)
Hören ist, verglichen mit dem Sehen, »archaisch«, mit der Technik nicht mitgekommen. Man könnte sagen, daß wesentlich mit dem selbstvergessenen Ohr, anstatt mit den flinken, abschätzenden Augen zu reagieren, in gewisser Weise dem spätindustriellen Zeitalter und seiner Anthropologie widerspricht. (Adorno and Eisler 2006: 20)
Seeing, these authors seem to argue, is closer to watching than hearing is to listening: hearing is inattentive, unselective, passive, dreamy, lazy. Their portrayal of eye and ear as polar opposites (active, fast, rational, ‘masculine’, against passive, slow, irrational, ‘feminine’) echoes a philosophical tradition going back to pre-Socratic times at least (Kalinak 1992: 22).
The consequence of the ears’ ‘archaic’ nature would seem to be that people are more easily manipulated through sound than through images. That seems highly contestable to me.
The philosophy of perception still suffers from a lack of scientific data. That hearing is relatively slow has some plausibility. Light travels faster than sound; sound needs time to take shape. (Chion 2019: 10, exceptionally, takes the ear to be faster.) Speed difference most clearly matters when we’re dealing not with ordinary sound but with music; not so much due to our slowness of hearing, as to the duration of the musical process itself. Of course, any accurate estimation of relative processing speeds would presuppose a method of quantifying visual and auditory information.
For this reason it is one of the most common functions of film music to give a sense of cohesion to a sequence of shots.
Flat Images and Spatial Sounds
Whereas sound may seem immaterial due to its ephemeral nature, it is concrete in its spatiality. Even though the light emanating from the film or TV screen reaches our eyes much as sound waves reach our ears, we’re in the sound, not in the projected visual world, which beyond the screen is merely a poor ghostly glow. By contrast, sound is almost plastic, rounded; it envelops us in pleasant or disturbing ways.
As Adorno and Eisler put it, “Motion-picture music corresponds to the whistling or singing child in the dark” (2005: 75). (Or: “Film music behaves like a child singing in the dark,” Kinomusik hat den Gestus des Kindes, das im Dunkeln vor sich hinsingt (2006: 68-69). The rhetorical qualities of Adorno’s German are largely lost in the original English edition.)
The sound pictures have changed this original function of music less than might be imagined. For the talking picture, too, is mute. The characters in it are not speaking people but speaking effigies, endowed with all the features of the pictorial, the photographic two-dimensionality, the lack of spatial depth. Their bodiless mouths utter words in a way that must seem disquieting to anyone uninformed. (Adorno and Eisler 2005: 76)
Sound may give depth to the phantoms; music helps transform a series of isolated sounds into a coherent sound world. For modern TV-raised toddlers life before the screen may be second nature, but this union of projected images and reproduced sound remains fragile, even with present-day technology. Technological advance has altered, but not remedied, the fundamental imbalance of flat images and spatial sound. Film theorists have often stressed that unlike the image, movie sounds are ‘real’, even if clearly not the original (Buhler 2018: 58). Sound reproduced by speakers is much less a ‘picture’ of sound than the actors on screen are pictures of people (Adorno and Eisler 2006: 69).
Music from Nowhere
Music may be a substantial part of the soundtrack, but it mostly stands apart from the movie’s actual sound world, from the sounds produced by the action on screen.
That is an odd phenomenon. Historically, it evolved from conventions common in opera and music theatre, though few people nowadays will rely on opera as a framework for understanding cinema. Too easily maybe we’ve grown used to music as something that’s just there, with little sense for the subtleties, paradoxes and absurdities of where and why.
In the early days of sound film, directors did worry about the justification for music, introducing a source into the action rather than leaving the music unexplained: a street band, a radio, a fiddler lost in the woods (Neumeyer 2013: 27). What we hear in such instances is, in fact, music representing music, rather like a stage prop: a chair on stage represents a chair, whether it is an actual chair or not.
For music that has no source in the action, filmmakers have borrowed from the theatre the term ‘pit music’. The cinematic pit is, of course, merely a virtual, and rather mysterious, pit.
In opera, music is the sound world exclusively, and may represent a wealth of phenomena: speech, gesture, emotional states, ambience, and so on. By convention, all sounds are contained in the score and ‘musicalized’; a musical hegemony that in film is only matched by some animated cartoons. Opera has no need of using stage music as an excuse. Even so, crossing the border between stage and pit music is a favourite device – exploiting the fact that music representing music is still music. Particularly those genres of musical theatre that include spoken dialogue (opéra comique, operetta, musical) exploit this ambiguity, often leaving it undecidable whether a given item is a song or monologue (for instance, the habanera in Bizet’s Carmen).
Film theorists have adopted the term ‘diegetic’ for what belongs to the world represented; stage or ‘source’ music is ‘diegetic’ (Buhler 2019: 160; Heldt 2016: 111). The term, deriving from narratology, points towards the peculiar position of film drama as drama with quasi-narrative elements. Music may take the character of a commentary, a kind of voice-over, introducing a narrative device within the dramatic framework. This too is a development from operatic practice, particularly in Wagner’s music dramas.
Exposing and breaking this convention may produce effects of alienation, ‘romantic irony’, and dada-ish absurdity. (As in Woman at War (Erlingsson 2018), where the musicians, a three-man band and a Ukrainian vocal trio, are present outdoors and indoors, visible to us, but – mostly – invisible to the characters.)
The fact that the border between stage and pit music is often crossed or obscured does not imply that it is false or insignificant. If there were no distinction to play with, film music would lack an important dimension. (Chion 2019: 80 therefore calls music “cinema’s passe-muraille par excellence”).
Suppose this is the opening line of a story.
Fiona felt miserable as she was driving home.
Transformed into film, the opening shot might show Fiona driving through, say, a sunny landscape that contrasts with her inner state. Her misery might show in her face. But since there is no interaction which plausibly brings it out, appropriately moody ‘pit music’ might keep the actress from overacting. The spectator will easily project the music’s miserable mood onto the character, given that there is little else to which it may be related.
Maybe it turns out that the music is produced by the car radio (‘diegetic’). Fiona turns off the radio with a gesture betraying irritation. Why does it irritate her? Maybe because it too closely resonates with her feelings, feelings that she cannot yet acknowledge. The radio music might return later in the movie as pit music, representing Fiona’s recollection of how she felt, or as a reminder to us, the spectators, of her earlier state. It then resembles a wordless voice-over, Remember …?, an authorial, quasi-narrative intrusion.