SpinVox says humans are necessary for time being

UK-based SpinVox has hit back at naysayers who claim that the firm’s speech-to-text technology isn’t as robust as the company says, by explaining its “world-leading breakthroughs in automatic speech recognition (ASR) combined with artificial intelligence, semantics and natural linguistics”.

Well, nearly. The firm’s announcement actually sounds like a climbdown from earlier claims that SpinVox’s ‘brain’, which the company calls D2, is a fully automated system.

“Having experimented with purely automatic speech conversion, SpinVox decided early on in its development that because its voice to text service converts real-life, dynamic and fast-evolving language and messages that we use and exchange every day (known in the industry as ‘free form speech’), it was essential that the system had the capability to evolve at the same rate, converting the latest words, phrases, brand names and colloquialisms to ensure a high level of accuracy. This is why it describes the system as ‘live-learning’,” the company said.

Live-learning combines SpinVox’s “rapidly evolving state-of-the art technology with human quality control and training,” to convert its messages. This seems to be an admission that humans are used in the message conversion process. That is nothing new from SpinVox, and it still does not clarify the extent to which humans are used, although the company does admit that it works with five call centres for quality control purposes.

SpinVox claims that every message is dealt with initially by the automated system, but does not clarify what the automated system does. “Only in cases where speech is too indistinct to be dealt with by the system, or contains unfamiliar or new words or phrases, is the completely anonymised and encrypted message sent to a QC agent for help.”

But the patents filed by SpinVox co-founder Daniel Doulton in 2004 don’t help the company’s argument much. From the abstract of US patent application 20060223502: “One of the networked computers plays back the voice message to an operator and the operator intelligently transcribes the actual message from the original voice message…The transcribed text message is then sent to the wireless information device from the computer. Because human operators are used instead of machine transcription, voicemails are converted accurately, intelligently, appropriately and succinctly into text messages (SMS/MMS).”

Adding fuel to the fire is the growing number of commentators on this story who claim to be current or former SpinVox transcribers. The common claim is that the call centre agents use a software application called Tenzing, which features predictive text to help the operator transcribe the message quickly.

In its statement, the company claimed to be in the “last mile of solving the problem of reliable automatic speech conversion,” which will doubtless be interpreted by some as meaning “the company has yet to solve the problem of reliable automatic speech conversion.”

Meanwhile, First Tuesday founder and head of Ariadne Capital Julie Meyer, who is an investor in the firm, has issued an evangelical defence of SpinVox and its founder Christina Domecq.

The full SpinVox announcement can be read here.
