Creativity and Closed Captions

(reviewing [reading] [sounds]: Closed-Captioned Media and Popular Culture by Sean Zdenek)

I’m delighted to join my colleagues at Authors Alliance with this cross-posted contribution to their ongoing series on Authorship and Accessibility. The series is an outgrowth of a collaboration among AuAll, Silicon Flatirons (where I’m a faculty director), and the Berkeley Center for Law & Technology, which held a roundtable on the topic with technologists, authors, academics, lawyers, and disability advocates in Berkeley last year. The roundtable is summed up in this report co-authored by my students in the Colorado Law Samuelson-Glushko Technology Law & Policy Clinic.

By random chance, my first advocacy project as a lawyer was working for Telecommunications for the Deaf and Hard of Hearing, Inc. (TDI) and a coalition of deaf and hard of hearing consumer groups and accessibility researchers on closed captions for online video as a part of the Federal Communications Commission’s implementation of the Twenty-First Century Communications and Video Accessibility Act of 2010 (CVAA). Ever since, I’ve been fortunate enough to spend a lot of my life as a clinical fellow and law professor working on the law and policy side of the wonderful world of closed captioning.

Consumer groups and advocates have long been concerned about the quality of the captions that convey the aural components of video programming to viewers who are deaf or hard of hearing. While inaccurate and incomplete captions are often the butt of jokes, they aren’t so funny for people who are deaf or hard of hearing and rely on captions to understand the aural component of a video. For example, a single wrong letter on news captions might mean the difference between a story about a war in Iraq and a war in Iran.

That’s why consumer groups have fought hard for caption quality. Those efforts culminated in the FCC’s 2014 adoption of wide-ranging caption quality standards for television, which require captions to be accurate, synchronous, complete, and properly placed on the screen.

The FCC’s rules aim primarily at establishing a baseline of compliance to ensure that captions deliver a transcription of a program’s soundtracks that is as close to verbatim as possible given the unique attributes of sound and text. There are lots of good reasons that advocates have focused on verbatim captions over the years; in addition to the problem of incomplete and incorrect captions, there is a lengthy and complicated history of simplifying and censoring the content of captions, which most recently entered the public eye in the context of Netflix’s censorship of captions on the rebooted Queer Eye for the Straight Guy. Verbatim is a principle that corresponds neatly to the goal of equal access: the captions should give viewers who are deaf or hard of hearing as near an equal experience in watching video programming as their hearing counterparts listening to the soundtrack.

However, advocates have also long urged their counterparts in the video industry to take captions seriously not just as a matter of accessibility, but as a matter of creativity. If filmmakers obsess over every aspect of a movie’s cinematography and sound design, why not the captions? In a production that spends millions of dollars to get all the details right, captions that are front and center for a film’s deaf and hard of hearing audience shouldn’t be an afterthought—they should be a core part of the creative process.

Sean Zdenek’s 2015 book Reading Sounds is one of the first efforts to rigorously explore the creative dimensions of captioning. Zdenek, a technical communication and rhetoric professor, endeavors to explore captioning as “a potent source of meaning in rhetorical analysis” and not simply a legal, technical, or transcription issue.

Zdenek’s exploration is an essential encyclopedia of scenarios where captioning leaves creative choice, nuance, and subtlety to captioners and filmmakers. While captioning spoken dialogue seems at first blush to pose a relatively straightforward task, Zdenek identifies nine (!) categories of non-speech information that are part of soundtracks, including:

  • Who is speaking;
  • In what language they are speaking;
  • How they are speaking, such as whispering or shouting;
  • Sound effects made by non-speakers;
  • Paralanguage—non-speech sounds made by speakers, such as grunts and laughs;
  • Music, including metadata about songs being played, lyrics, and descriptions of music; and
  • Medium of communications, such as voices being communicated over an on- or off-screen television or public address system.

Tricky scenarios abound. What if one speaker is aurally distinct from another, but his or her identity is unknown? (Imagine Darth Vader being identified as “Luke’s Father” in the early going of The Empire Strikes Back.) How should a captioner describe the unique buzzing sound made by Futurama’s Hypnotoad? How should the captioner describe an uncommon dialect that may not be familiar to a hearing viewer, or which may have been invented by the filmmaker? What are the lyrics to “Louie, Louie,” exactly?

Zdenek expands into a variety of other problematic scenarios such as undercaptioning (the omission of non-speech sounds), overcaptioning (making prominent the exact content of ancillary speech happening in the background that a hearing viewer may be unable to parse precisely), and transcending the context of a scene to convey information that the viewer shouldn’t know. Delayed captions are all too familiar to deaf and hard of hearing viewers, but Zdenek explores the subtle relationships among caption timing, punctuation, the spoilage of time-sensitive elements afforded by the ability to read ahead of the dialogue (such as reading the aural punchline to a visual setup), and the inadvertent creation of irony by captions that linger on the screen for too long. Zdenek even highlights the need to caption silence in dynamic contexts, such as a phone ceasing to ring or a person mouthing inaudible dialogue, scenarios that call to mind the controversial “silent” scene in The Last Jedi, which many hearing theater-goers were sure was a glitch but was an intentional choice by director Rian Johnson.

Zdenek also explores the role of captions in situating video in broader cultural contexts. For example, should a captioner identify a narrator who is a well-known actor with whom the audience will likely be familiar but who is uncredited in the film? How should music, such as the iconic NBC chimes, be described in text? And how can captioners be trained to capture cultural significance, especially if a captioner is a computer program converting speech to text automatically?

Zdenek does not offer complete solutions to all these questions and scenarios. But he lays out in unsparing detail (much of it presented in audiovisual context on the book’s companion website) how they arise and what considerations captioners and filmmakers might take into account in thinking not just about how to comply with captioning law, but about how to author captions.

In doing so, he has also created a compelling reference for lawmakers and policy advocates to develop a richer, more nuanced understanding of the role that captions can play in advancing the civil rights of Americans who are deaf or hard of hearing to access video programming on equal terms. Zdenek is identifying dimensions of captioning that the next generation of video accessibility policy needs to consider and address.

Spectrum, Analogy, and the Line from Physics to Policy

In their 2017 essay Not a Scarce Natural Resource: Alternatives to Spectrum-Think, my colleague Pierre de Vries and former student Jeff Westling lay out an intriguing thesis: roughly speaking, that the oft-uttered-in-DC-circles analogy of the wireless radio spectrum as a scarce natural resource (or collection of resources) doesn’t withstand scrutiny.

According to Jeff and Pierre, ‘spectrum’ is an intellectual and legal construct amenable to multiple meanings, not a ‘resource’ that can be consumed:

The word spectrum is usually used rather loosely, with a variety of denotations. Since the word’s implications depend on which definition the writer had in mind, we propose five possible meanings (however, we suspect that writers often do not have any clear definition in mind, with spectrum simply referring to the broad topic of radios and regulation). We find none for which all three attributes scarce, natural and resource hold simultaneously.

So the thesis goes, the relevant focus of most spectrum inquiries is not some identifiable ‘slice’ of the spectrum that is consumed like a resource, but rather the right to transmit on a particular frequency at a particular time in a particular geographic area without harmful interference occurring at the relevant receivers. (Though it’s not the focus of Pierre’s and Jeff’s inquiry, a close cousin of the spectrum-as-resource analogy, the ‘spectrum’-as-real-property analogy, is similarly pathological.)

The thesis seems obviously right to me, but I’ve never quite been able to put my finger on why it matters. I have long suspected that most spectrum policy folks who invoke resource and property analogies do so as convenient shorthand for describing a mode of spectrum policy, but back up their real arguments with nuanced complications of the analogy that ultimately lead to results similar to those they’d reach if they started with the more accurate but unwieldy conception of spectrum rights as a right to transmit without interference.

However, it occurred to me last week in preparing to teach my telecom law students about spectrum that one particularly critical shortcoming of a reliance on the resource and property analogies is the risk that your world view about spectrum management (a political / legal question) will inform your view of how electromagnetic physics work (a scientific question). The informing should go the other direction—spectrum management is a malleable human construct that ought to be informed by the physics of transmission and reception.

The insight occurred to me in puzzling over how to convey to law students, in relatively simple but accurate terms, how radio transmission and reception works. (Luckily, my colleague Dale Hatfield offered me a brilliant analogy involving moving your hand in a pool and capturing the frequency and intensity of the generated waves with a bobber at the opposite end, which neatly illustrates or can be easily extended to illustrate all the fundamental concepts: transmission and reception, frequency and wavelength, amplitude, attenuation, interference, etc.) But it seemed critical to start with the physics because the potential for interference at the receiver ultimately dictates a significant proportion of telecommunications law and policy, including necessitating the involvement of regulators like the Federal Communications Commission to manage the practices around transmission and reception to some degree or another. It also seems especially critical because many new-to-spectrum-policy law students don’t understand even on an intuitive level (especially given that most of them don’t listen to AM or FM radio or watch broadcast TV) what a radio is or how it works.
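For readers who would rather see numbers than pool water, here is a minimal Python sketch of three of the concepts the pool analogy covers: the link between frequency and wavelength, attenuation with distance, and interference when two waves meet at a receiver. It is only an illustration under idealized assumptions (free-space propagation, isotropic antennas, two waves of the same frequency). The function names and the 100 MHz example station are my own hypothetical choices, not anything taken from Dale’s analogy or from a real system.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458  # propagation speed of radio waves in a vacuum


def wavelength_m(frequency_hz: float) -> float:
    """Wavelength is propagation speed divided by frequency."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz


def free_space_path_loss_db(distance_m: float, frequency_hz: float) -> float:
    """Idealized free-space attenuation between isotropic antennas, in dB."""
    return 20 * math.log10(4 * math.pi * distance_m / wavelength_m(frequency_hz))


def superpose(amplitude_a: float, amplitude_b: float,
              phase_offset_rad: float, samples: int = 8) -> list[float]:
    """Sample one cycle of two same-frequency waves added together at a receiver."""
    return [
        amplitude_a * math.sin(2 * math.pi * n / samples)
        + amplitude_b * math.sin(2 * math.pi * n / samples + phase_offset_rad)
        for n in range(samples)
    ]


if __name__ == "__main__":
    fm_station_hz = 100e6  # hypothetical FM station near 100 MHz (assumption)
    print(f"wavelength: {wavelength_m(fm_station_hz):.2f} m")

    # Attenuation: each doubling of distance costs roughly another 6 dB.
    for km in (1, 2, 4):
        loss = free_space_path_loss_db(km * 1000, fm_station_hz)
        print(f"path loss at {km} km: {loss:.1f} dB")

    # Interference: identical waves reinforce when aligned and cancel when opposed.
    print("peak, in phase:", round(max(superpose(1.0, 1.0, 0.0)), 3))
    print("peak, opposite phase:", round(max(superpose(1.0, 1.0, math.pi)), 3))
```

The point of the last two lines is the one that matters for policy: what the receiver experiences depends on everything arriving at its antenna, not on any one transmitter in isolation.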

Later in the week at a student-focused event I was tasked with moderating, I tried to press this point further. But the speakers (all lawyers) steadfastly declined my multiple invitations to explain transmission and reception, instead jumping immediately to resource and property analogies. It went something like this:

[Me:] How does a radio work? Please explain transmission and reception at a basic level.

[Panelist:] Well, spectrum is just like when you buy a house…

To be fair, these were smart lawyers sensibly using legal analogies to connect with a student audience at a law school. Fair enough.

But here’s the rub. When we start from the perspective of transmission and reception, the possible modes of spectrum management unfold in a pretty obvious way, as do the advantages and shortcomings of each. That is, when we start with the physics, the pathologies and tradeoffs of the different modes are fairly easy to understand. For example:

  • One mode of spectrum management, popularized by Ronald Coase in The Federal Communications Commission, is the use of property to order the right to transmit without interference, which facilitates (in theory, at least) all kinds of neat efficiencies through the ability to buy and sell licenses on the secondary market. But it’s pretty obvious to anyone with a basic understanding of the physics that it’s much more difficult to set property boundaries when you’re dealing with the complexities of interference, like out-of-band emissions, harmonics, noise floors, and so forth.
  • Another mode is the use of unlicensed spectrum bands, pioneered by Mike Marcus among others, which allow innovative new technologies to be developed without incurring the cost of spectrum licenses. There’s an obvious analogy to the commons, and a basic understanding of the physics quickly reveals the possibility of a tragedy-of-the-commons-like problem where over-proliferating transmitters cause so much interference to receivers that a band becomes useless for everyone.
  • Interference can proliferate under any mode of spectrum management. But a basic understanding of the physics of transmission and reception makes clear that detecting interference is a non-trivial task compared to, say, monitoring for trespassers.

With these understandings in mind, we can use property and resource analogies to a limited extent, understanding that they’ll only go so far because assigning the right to transmit without interference involves fundamentally different physics than assigning real property or natural resources.

But if we back into an understanding of how a radio works from a simplified resource or property analogy, we risk thinking that the underlying radio physics work according to the analogy, which they often won’t. As a result, the pathologies of the management technique may no longer be obvious because, hey, maybe radio transmissions do fit within neatly circumscribed property lines, just like houses and yards. We then have to wrench the analogies to fit the physics in: “well, spectrum is like a natural resource, except that it’s infinitely renewable,” or “well, spectrum is just like real property, but with shifting and hard-to-measure five-dimensional boundary lines.” Some of the metaphors’ shortcomings are easy to understand, but some are quite persistent, such as the presumption that setting property boundaries is relatively easy.

This is sort of like what happens in law and econ, where we set policy based on economic theories rooted in potentially erroneous assumptions of how people act. With law and econ, though, people may start behaving according to the incentives we set for them, therefore making our models look correct even though the effect is largely observation bias, supercharged by the force of law, leading to the (more or less) expected policy outcomes.

But with spectrum policy, the physics of electromagnetism don’t change just because we misunderstand them, so flawed understanding leads to flawed policy, which leads to flawed outcomes—which will never self-correct. (Of course, spectrum management deals with macro- and microeconomic issues in addition to physics, so this overstates the case a bit. But you get the point.)

Of course, I don’t mean to argue that resource and property analogies aren’t useful for spectrum policy. The case for analogical thinking has been a mainstay in technology law ever since Judge Frank Easterbrook lambasted cyberlaw as ‘the law of the horse’. But, as Larry Lessig counsels in his rebuttal to Easterbrook, we are well served to understand how technology (or in Jack Balkin’s framing, technology’s societal salience) changes the aptness of our analogies. And understanding radio physics is critical to understanding not only why resource and property analogies are useful, but where they come up short.

Update (9.14): Pierre kindly pointed out via e-mail that this analysis ought to be complicated with the role of engineering in addition to physics. The distinction (and feedback loops) between the two are really important and interesting and probably worth a whole other post.