Software development

On Tuesday, March 9, we got the next update on YouTube’s automated captioning efforts. I heard it on NPR’s “All Things Considered” afternoon program, in which Robert Siegel interviewed Ken Harrenstien of Google, with a (female) interpreter providing the voice for the Google engineer.

Audio and transcript are available at

Harrenstien acknowledges that automated captioning today stumbles on proper names, including trademarks and product names: “YouTube” comes out as “You, too!” Automated captioning also has difficulty with videos that have music or other sounds in the background. But he characterizes himself as a technology optimist, anticipating that in 10 years things will be much improved.

Benefits of captioning

Like “curb cuts,” which have become the symbol of how solutions designed for disabled people (here, those in wheelchairs) also meet the needs of others (people with strollers, roll-aboard luggage, or shopping carts), captions have benefits that extend beyond hearing impairment.

  • Deaf and hearing impaired people can enjoy the huge inventory of videos on YouTube. (The still frame that opens this post is from an announcement by President Obama in response to the Chilean earthquake. Making emergency and other time-sensitive news available to those who cannot hear meets the requirements of laws and regulations in the US. And more importantly, it meets the moral or ethical standards we expect from a civilized society where we include everyone in the polity.)
  • If you’re in a noisy environment or close to others who would be bothered by the audio, you can figure out what the video is saying even without headphones.
  • Small companies can afford to provide captions on their webcasts, which are often at the heart of learning about new products.
  • Non-native speakers of English have a much better chance of understanding speech at ordinary (rapid) rates with the assistance of captions.
  • Captions provide input to machine translation services, so there will soon be captions in languages other than English; as automated speech-to-text technology improves, we’re going to see other input languages as well.
  • Captions provide much better input to (current) search technology than speech does, so there’s hope of finding segments of videos whose content might not appear anywhere in written form.

Professional captioners need not despair

I read the YouTube blog post of March 4 and the comments following it, and recalled the announcement of the limited trial with selected partners last November.  In his comment on the recent YouTube announcement, James expresses concern that people like him, who earn their living as captioners for post-production houses, will lose their jobs as a result of automated captioning.  My response seconds HowCheap’s comment that professional captioners will continue to find work, both as editors of the automated speech-to-text and for organizations that prefer doing their own captioning. Organizations that produce professional-quality video typically start from a written script, adjust for the few changes that happen in the spoken version, and then set the timing of the text with the video.

Huge numbers of videos on YouTube are uploaded by individuals or by small organizations who may not be aware of the benefits of captioning and likely don’t know about the tools available.  According to YouTube’s fact sheet: “Every minute 20 hours of video is uploaded to YouTube.” That volume is beyond the capacity of professional captioners and the organizations that employ them.
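To get a feel for that volume, here is a back-of-the-envelope sketch in Python. Only the 20-hours-per-minute figure comes from the fact sheet; the captioning speed (one hour of work per hour of video, which is surely optimistic) and the 8-hour working day are my own hypothetical assumptions, as are the function names.

```python
import math

# Figure quoted from YouTube's fact sheet.
UPLOAD_HOURS_PER_MINUTE = 20

# Hypothetical assumptions, not from any source:
WORK_HOURS_PER_VIDEO_HOUR = 1.0  # captioning in real time (optimistic)
SHIFT_HOURS_PER_DAY = 8          # one captioner's working day


def video_hours_per_day(upload_hours_per_minute):
    """Hours of new video uploaded in one 24-hour day."""
    return upload_hours_per_minute * 60 * 24


def captioners_needed(upload_hours_per_minute,
                      work_hours_per_video_hour,
                      shift_hours_per_day):
    """Captioners required to keep pace with one day's uploads."""
    daily_work_hours = (video_hours_per_day(upload_hours_per_minute)
                        * work_hours_per_video_hour)
    return math.ceil(daily_work_hours / shift_hours_per_day)


if __name__ == "__main__":
    print(video_hours_per_day(UPLOAD_HOURS_PER_MINUTE))
    print(captioners_needed(UPLOAD_HOURS_PER_MINUTE,
                            WORK_HOURS_PER_VIDEO_HOUR,
                            SHIFT_HOURS_PER_DAY))
```

Under these assumptions, that works out to 28,800 hours of new video every day, and 3,600 captioners working full days just to keep pace. Slower, more realistic captioning speeds only push the number higher.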

A proposal for improving the quality of captions

How shall we improve the quality of automatically produced captions?

I’d like to see interpreter training programs (ITPs) make editing automated captions a course assignment, a program requirement, or a component of an internship. Engaging with spoken language that is not one’s own is a challenge.  People phrase things in ways you don’t; they use unfamiliar vocabulary and proper names (streets, towns, people, products) that you need to look up.  Both ITPs for training sign language interpreters and those for people learning to interpret between 2 spoken languages may admit students whose skills in listening, writing, or spelling are lacking.  How many caption-editing assignments are enough? Shall we also coordinate quality checks by others in the same or a different program?  Such assignments will guide students toward greater appreciation for the challenges of speech in online settings, with a task that provides an authentic service.


In the case of ITPs for sign language interpreters, the improved listening to online speech is great preparation for work settings such as VRS and VRI.  Video Relay Service (VRS) in the US is regulated by the FCC: deaf signers who cannot use the telephone (because their speech is not intelligible and they cannot hear well enough to understand speech over the phone) make use of intermediaries (interpreters) to communicate with hearing non-signers. (Think of simple tasks such as calling the school to notify them that your child will be absent; scheduling a haircut; ordering a pizza for delivery, not to mention more complex transactions involving prescriptions, real estate contract negotiation, billing disputes.)  Video Remote Interpreting (where the deaf and hearing parties are physically together, with the interpreter remote from them) is a service with similar requirements for the interpreter (listening to speech over a phone or data line, and rendering accurate translations in real time).

Broad multi-disciplinary open source content quality

Programs training instructors in English as a Second Language (ESL) could also participate.  Students in speech therapy and audiology would benefit from both the direct engagement with spoken language “in the wild” and with future colleagues in other disciplines. There are advantages to engaging a variety of people who are studying for professions that emphasize expertise in spoken and written English.

Looks like an open source content development effort to me. Yes, it will require a little bit of coordination, but not terrific overhead. How about it, ITP directors?

This aphorism is one I’ve heard since childhood from my mother and other female relatives, many of whom are excellent knitters, crocheters, needlepointers, weavers, or are skilled at other sorts of handwork. This sentiment also applies in realms other than knitting – such as software development.

First, a few knitting basics

Let me not assume that my readers are familiar with knitting. I’ll offer this brief definition (adapted from Wikipedia): Knitting is a method for turning yarn (or thread) into clothing, blankets, or similar tangible objects. Handknitting (in contrast to machine knitting) typically uses 2 needles (or a circular needle, with points at either end), and yarn to create linked loops. The loops, also called stitches, are held on the needle until the next loop is complete. Needles vary in thickness and length. The many available colors, weights, materials, and textures of yarn yield a rich variety of results.

What typically goes wrong

If you make an error in knitting, you may not notice it immediately. The meditative rhythm of the work, where the same familiar actions are repeated for a whole row, allows for some sorts of multi-tasking, such as conversation, listening to music, television or radio. The typical action calls for either looping behind or in front of the current stitch, and it’s possible to use tactile methods to push a stitch into position.  You might not detect that you’ve knit two stitches together where this action was not called for, until a few rows later (which may easily be as many as several hundred stitches later).  More complicated stitches offer more interesting ways to get in trouble.

It’s a good idea to rip out the rows back to the point where the error happened, and re-do the work.  Why?  I can think of at least 3 reasons to undo and re-do:

  • The mistake will likely jump out at you every time you look at the completed project or even the project in progress.
  • If it’s an error that changes the number of stitches, it will prevent you from completing the whole project without a work-around.
  • If the error is a dropped stitch, it may even require you to undo work further away, before the dropped stitch itself unravels all the way to the bottom (beginning) of the piece.

Why do we avoid ripping out rows when we know there’s an error back several stitches or even multiple rows?  Again, I’ll offer 3 quick reasons:

  • It’s hard for a novice to pick up stitches after they’re off the needle and
  • Undoing work stitch by stitch is painfully slow going, but mostly
  • It feels like you’re not making progress when you have to rip.

Notice that the adage is both descriptive and prescriptive. Descriptively, we can say that one characteristic of a competent knitter is being a skillful (and not fearful) ripper.  No one likes it, but your work will succeed, both aesthetically and functionally, if you rip out work when appropriate.  For those who are still learning, it’s a prescriptive reminder that learning to rip out work is part of what will make you a good knitter.

An analogy in software development

Let’s consider work in the software world:  Does detecting and correcting small bugs quickly keep the whole project more intact, or intact for a longer time? We know that it’s unlikely that a software project will be completed without any bugs.

Why do developers avoid fixing bugs when they discover them? I suspect some of the same reasons:  It’s hard to maintain the momentum of the project when you have to undo work that you thought was complete.

I won’t elaborate here on the separation of roles, where a programmer is rewarded for completing a number of lines of code or releasing a module on schedule, while the Quality department (or individual responsible for testing) is rewarded for finding (but not necessarily fixing) bugs. Rewarding coders for the number of bugs fixed may have two perverse effects. First, someone who writes sloppy code and then fixes it post-release may be rewarded for the large number of bugs fixed, but not recognized as the person who introduced them. Second, the person who writes error-free code or self-corrects quickly may not be recognized as a high-quality producer.

Of course the analogy is not perfect:  Software is different from knitting, in that modern software is often built by teams of people, rather than a single individual. Knitting rarely requires multiple participants to complete a project.

Still, I’d like to popularize the aphorism with software folks.