By Dietlinde DuPlessis
Any trained interpreter knows an exercise called “shadowing”, which is used to prepare students for simultaneous interpreting. It consists of listening to spoken text and repeating the words as exactly as possible. It is harder than it might sound because one must listen and speak at the same time. It helps to use headphones, so your own voice does not drown out the speaker.
This practice – with a few additional requirements – is used under the name of “respeaking” or “voice writing” in captioning applications.
Captioning phones for the hearing impaired
Captioning on TV in the form of same-language subtitles or captions is well known, but another use of captions that might be less familiar is on telephones. In the US, deaf and hard-of-hearing persons are eligible to receive a free landline phone with a display that shows what the person on the other end said. Different companies regulated by the Federal Communications Commission provide this service, which is also available on tablets via an app. I have not found reliable information about other countries where a comparable service might be available. In Canada, the Hard of Hearing Association is currently lobbying for its introduction.
While many users assume that the captions are automatically generated, speech recognition is not currently advanced enough to reliably decode and display spontaneously produced language by an untrained speaker.
Working as a Captioning Agent
Captioning companies run call centers with hundreds of people who turn the spoken word into text. People who perform this job are called Captioning Agents/Assistants or Communication Assistants. You will regularly find these jobs on Internet job boards. But don’t expect to make a fortune – in November 2019, job ads mentioned starting wages of $11-12.50 per hour. To my knowledge, the service is available for English and Spanish as well as for American Sign Language. Bilingual captioning agents can work in English as well as Spanish, but they always respeak in the language that is spoken; it is expressly not interpreting. What might make the position interesting for students or freelancers is the availability of part-time work with shifts at unusual hours and on weekends and holidays, since the centers operate around the clock.
The following description of working conditions is based on my experience in one center; conditions might vary at other providers. In the captioning center, your workplace is a cubicle with a computer to which you connect your headset.
When a phone call comes in, you respeak what the hearing person says. You must repeat it in such a way that the speech recognition picks it up with the fewest possible errors. Since an error-free rendering is rarely achieved, you must at the same time correct the text that appears on your screen and simultaneously on the caption phone screen. When you make a correction, you must pause and only continue respeaking when you are done typing. This can make correcting mistakes stressful since you have to remember everything that was said while you were typing. No audio is transmitted from the captioning agent to the caller, so there is no way of asking the person to repeat, slow down, etc. What the hard-of-hearing person says typically can be heard very faintly, but obviously is not repeated. During this time, you get a break from respeaking.
How to succeed as a Captioning Agent
First, you must be able to understand a wide range of dialects and accents since you cannot repeat what you don’t understand. Not every caller will speak slowly and clearly, especially since they might not know they are calling a hard-of-hearing person.
A perhaps less obvious requirement is a certain typing speed, which is typically tested before anything else during your interview. You also need good grammar and spelling skills. If the software shows “you’re” instead of “your”, you must notice and correct it.
Keeping information confidential is also crucial. It is prohibited to disclose anything you hear on the phone. Credit card purchases and conversations with banks or the Social Security Administration can reveal a lot of personal information.
The most important skill is getting the speech recognition software to print out exactly what you say. The lion’s share of the 1-2-week training period is therefore dedicated to building your profile, which means enabling the software to understand your way of talking. This is done with recorded phone conversations and reading lists of specific words. If you cannot bring yourself to say swearwords, racial slurs, etc., this is not for you. On my very first day, I had to train the speech recognition on all kinds of N-words, C-words, F-words and other nasty expressions. The mandate is verbatim repetition, no glossing over anything.
The software is amazingly good at interpreting context. If you say, “this is my last word period period period”, chances are it will type out the correct “This is my last word. Period.” In my case, despite all training attempts, the algorithm never adapted to my accent very well and showed a lot of wrong words that I had to correct on-screen.
Probably my funniest misinterpretation was when the dog breed “blue-tick heeler” appeared on the screen as “blue tequila” and the reader had a good laugh.
Lastly, you need to remember to switch back from respeaker to speaker. Otherwise, you might come home after a long shift and ask your partner, “How was your day comma darling question mark”. It has happened to me…
Differences between shadowing/interpreting and respeaking for phone captions:
– Repetition needs to be verbatim, no paraphrasing or replacing any words
– You try to stay as close to the speaker as possible; lagging is not a virtue
– False starts are also captioned
– Punctuation needs to be spoken (“comma”, “exclamation mark”)
– Non-verbal utterances like [coughs] or [umm] also must be captioned
– For best results, you need to talk very clearly and keep your delivery rather monotone
– You simultaneously need to read the outcome on the screen and correct mistakes
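
For readers curious about the spoken-punctuation rule in the list above, here is a minimal Python sketch. It assumes a naive word-for-word replacement and is not how any captioning product actually works; the names `SPOKEN_PUNCTUATION` and `naive_render` are made up for this illustration.

```python
import re

# Hypothetical mapping of spoken commands to punctuation symbols
# (an assumption for illustration, not taken from any real captioning software).
SPOKEN_PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "question mark": "?",
    "exclamation mark": "!",
}


def naive_render(spoken: str) -> str:
    """Replace every spoken punctuation command with its symbol, ignoring context."""
    text = spoken
    for phrase, symbol in SPOKEN_PUNCTUATION.items():
        # Also swallow the whitespace before the command so the symbol
        # attaches directly to the preceding word.
        text = re.sub(r"\s*\b" + re.escape(phrase) + r"\b", symbol, text)
    return text


print(naive_render("how was your day comma darling question mark"))
print(naive_render("this is my last word period period period"))
```

The first call turns the respoken sentence into "how was your day, darling?". The second produces "this is my last word...", because a context-free replacement cannot tell that the middle "period" was meant as a word. That gap is what the context handling described earlier has to close.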