I’m delighted to join my colleagues at Authors Alliance with this cross-posted contribution to their ongoing series on Authorship and Accessibility, an outgrowth of a collaboration between AuAll, Silicon Flatirons (where I’m a faculty director), and the Berkeley Center for Law & Technology, which held a roundtable on the topic with technologists, authors, academics, lawyers, and disability advocates in Berkeley last year, summed up in this report co-authored by my students in the Colorado Law Samuelson-Glushko Technology Law & Policy Clinic.
By random chance, my first advocacy project as a lawyer was working for Telecommunications for the Deaf and Hard of Hearing, Inc. (TDI) and a coalition of deaf and hard of hearing consumer groups and accessibility researchers on closed captions for online video as a part of the Federal Communications Commission’s implementation of the Twenty-First Century Communications and Video Accessibility Act of 2010 (CVAA). Ever since, I’ve been fortunate enough to spend a lot of my life as a clinical fellow and law professor working on the law and policy side of the wonderful world of closed captioning.
Consumer groups and advocates have long been concerned about the quality of the captions that convey the aural components of video programming to viewers who are deaf or hard of hearing. While inaccurate and incomplete captions are often the butt of jokes, they aren’t so funny for people who are deaf or hard of hearing and rely on captions to understand the aural component of a video. For example, a single wrong letter on news captions might mean the difference between a story about a war in Iraq and a war in Iran.
That’s why consumer groups have fought hard for caption quality. Those efforts culminated in the FCC’s 2014 adoption of wide-ranging caption quality standards for television, which require captions to be accurate, synchronous, complete, and properly placed on the screen.
The FCC’s rules aim primarily at establishing a baseline of compliance to ensure that captions deliver a transcription of a program’s soundtracks that is as close to verbatim as possible given the unique attributes of sound and text. There are lots of good reasons that advocates have focused on verbatim captions over the years; in addition to incomplete and incorrect captions, there is a lengthy and complicated history of simplifying and censoring the content of captions, which most recently entered the public eye in the context of Netflix’s censorship of captions on the rebooted Queer Eye for the Straight Guy. Verbatim is a principle that corresponds neatly to the goal of equal access: the captions should give viewers who are deaf or hard of hearing as near an equal experience in watching video programming as their hearing counterparts listening to the soundtrack.
However, advocates have also long urged their counterparts in the video industry to take captions seriously not just as a matter of accessibility, but as a matter of creativity. If filmmakers obsess over every aspect of a movie’s cinematography and sound design, why not the captions? In a production that spends millions of dollars to get all the details right, captions that are front and center for a film’s deaf and hard of hearing audience shouldn’t be an afterthought—they should be a core part of the creative process.
Sean Zdenek’s 2015 book Reading Sounds is one of the first efforts to rigorously explore the creative dimensions of captioning. Zdenek, a technical communication and rhetoric professor, endeavors to explore captioning as “a potent source of meaning in rhetorical analysis” and not simply a legal, technical, or transcription issue.
Zdenek’s exploration is an essential encyclopedia of scenarios where captioning leaves creative choice, nuance, and subtlety to captioners and filmmakers. While captioning spoken dialogue seems at first blush to pose a relatively straightforward task, Zdenek identifies nine (!) categories of non-speech information that are part of soundtracks, including:
- Who is speaking;
- In what language they are speaking;
- How they are speaking, such as whispering or shouting;
- Sound effects made by non-speakers;
- Paralanguage—non-speech sounds made by speakers, such as grunts and laughs;
- Music, including metadata about songs being played, lyrics, and descriptions of music; and
- Medium of communications, such as voices being communicated over an on- or off-screen television or public address system.
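
Several of these categories map onto conventions that modern caption file formats already support. As a rough illustration (a hypothetical snippet of my own, not drawn from Zdenek’s book), a WebVTT caption file can mark speaker identity with a voice span, manner of speaking with a parenthetical, sound effects in brackets, and music with note symbols:

```
WEBVTT

00:00:01.000 --> 00:00:03.500
[distant thunder rumbling]

00:00:04.000 --> 00:00:07.000
<v Narrator>(whispering) We have to leave. Now.

00:00:07.500 --> 00:00:11.000
♪ ominous orchestral theme swells ♪
```

Even with these conventions available, the hard part remains the judgment calls Zdenek catalogs: which sounds to caption, how to describe them, and what to call a speaker the audience isn’t yet supposed to know.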
Tricky scenarios abound. What if one speaker is aurally distinct from another, but his or her identity is unknown? (Imagine Darth Vader being identified as “Luke’s Father” in the early going of The Empire Strikes Back.) How should a captioner describe the unique buzzing sound made by Futurama’s Hypnotoad? How should the captioner describe an uncommon dialect that may not be familiar to a hearing viewer, or which may have been invented by the filmmaker? What are the lyrics to “Louie, Louie,” exactly?
Zdenek expands into a variety of other problematic scenarios such as undercaptioning (the omission of non-speech sounds), overcaptioning (making prominent the exact content of ancillary background speech that a hearing viewer may be unable to parse precisely), and transcending the context of a scene to convey information that the viewer shouldn’t know. Delayed captions are all too familiar to deaf and hard of hearing viewers, but Zdenek explores the subtle relationships among caption timing, punctuation, the spoiling of time-sensitive elements afforded by the ability to read ahead of the dialogue (such as reading the aural punchline to a visual setup), and the inadvertent creation of irony by captions that linger on the screen for too long. Zdenek even highlights the need to caption silence in dynamic contexts, such as a phone ceasing to ring or a person mouthing inaudible dialogue—scenarios that call to mind the controversial “silent” scene in The Last Jedi, which many hearing theater-goers were sure was a glitch but was an intentional choice by director Rian Johnson.
Zdenek also explores the role of captions in situating video in broader cultural contexts. For example, should a captioner identify a narrator who is a well-known actor with whom the audience will likely be familiar but who is uncredited in the film? How should music, such as the iconic NBC chimes, be described in text? And how can captioners be trained to capture cultural significance—especially if the captioner is a computer program converting speech to text automatically?
Zdenek does not offer complete solutions to all these questions and scenarios. But he explores in unsparing detail (much of it presented in audiovisual context on the book’s companion website) how they arise and what considerations captioners and filmmakers might take into account in thinking not just about how to comply with captioning law, but about how to author captions.
In doing so, he has also created a compelling reference for lawmakers and policy advocates to develop a richer, more nuanced understanding of the role that captions can play in advancing the civil rights of Americans who are deaf or hard of hearing to access video programming on equal terms. Zdenek is identifying dimensions of captioning that the next generation of video accessibility policy needs to consider and address.