James Middleton

August 7, 2009

SpinVox: Behind the spin
SpinVox appears to have some clever technology but is that enough?

Sometimes it’s hard to know what to believe. Over the years I’ve witnessed more ‘live demos’ than I care to remember (more, probably, than I do remember) and if I had to derive one abiding rule from them all, it would be this: most demos don’t work – and the ones that do, more often than not, are rigged.

This isn’t cynicism, by the way, this is reality. I’m always happy to see these things function as they’re supposed to, and I don’t revel in the failure of others (much). The pre-commercial demo of predictive text input solution T9 that I received years ago from its designer was brilliant; flawless in execution and as simple a concept as you could wish for. Soon after the demo, T9 began its surge to ubiquity.

Just as simple a concept is that touted by SpinVox. It’s a service that takes your voicemails and turns them into text messages, so you can read them and re-read them at your convenience, without having to dial into and navigate a time-consuming voicemail application. And, like T9, SpinVox works. I’ve been using SpinVox for some time and it does a decent enough job.

The difference between SpinVox and T9 is that the guy who designed T9 was happy to explain at length how his solution worked. The people at SpinVox, on the other hand, are a little cagey about the details. So when I, along with a number of other journalists, was invited to the firm’s headquarters in Marlow, Buckinghamshire, to see what it promised would be a full demonstration of its solution in use, I signed up straight away.

SpinVox was doing this in response to a wave of criticism from commentators and bloggers – some of whom have claimed to be current and past employees of the firm – that has suggested the firm’s technology claims are bogus and that it is in dire financial straits. Central to these claims is the accusation that SpinVox is misleading the market about the extent to which it uses human beings to transcribe users’ voicemails.

The firm has never denied using people when its technology is flummoxed by factors like pace of speech or accent but the allegations suggest that the firm uses people more often than not and that its voice to text conversion software is weak. If it does rely heavily on people, the detractors say, the business cannot scale because of the rocketing wage bill. This, they say, explains the financial problems the firm faces. SpinVox doesn’t deny that times are tight – it has asked staff to take stock in lieu of wages and delayed payments to suppliers – but argues this is just down to the current economic climate.

So was I – along with the other hacks – about to find out the truth about SpinVox’s technology? Was I about to see the firm prove that its finances are sound?

Marlow is a comfortable, well-off, breezily confident sort of town full of comfortable, well-off, breezily confident people. At SpinVox HQ, though, the building positively bristled with anxiety. It was probably the most uncomfortable press briefing I’ve ever been to – tucked, as I was, into a room with two other journalists and 11 SpinVox personnel, around half of whom worked in some sort of PR capacity. There were more than a few worried expressions, not least the one worn by Rob Wheatley, SpinVox’s CIO, who delivered the presentation.

Filming, photography and recording were banned, which didn’t bode well.

We were promised a demo of the SpinVox system and the opportunity to ask questions, which is exactly what we got. Unfortunately I’m not sure we got the answers we were looking for.

There’s no doubt SpinVox has technology that can do impressive things with voice messages in a mobile environment, and the company has been open that its business combines a variety of human and software agents to deliver its end results. But the big question is whether this business model is scalable and cost effective at the same time.

Wheatley showed us a network diagram of the SpinVox brain, or D2 as it is known, and explained how each bit works in some detail. Apparently this kind of information is the company’s “secret sauce” and is the reason we weren’t allowed to film. I’ve no idea how much any of this info might help competitors so I won’t go into any great detail, but it seemed a sensible, even obvious, approach to the problem of converting voicemails to text messages. In what way it could be commercially sensitive I’m just not sure.

What Wheatley seemed most proud of and, indeed, what most of the company’s granted patents cover, is how the automated system fails over in real time to a human operator.

Undoubtedly, the combination of humans and machines will deliver better results than a fully automated system, and in Wheatley’s own words this is the SpinVox “balancing act” between more accurate but more expensive humans, and cheaper, but less accurate machines.

SpinVox has clearly spent a lot of time and money on development. The brains behind the underlying technology are Tony Robinson and Philip Woodland of the Cambridge University Machine Intelligence Laboratory, and Tenzing, the software used by the human agents, has some pretty fancy speech prediction technology involved – the training course to use it lasts two weeks. But again, speech recognition is a complicated and costly enterprise, and it seems that, to all intents and purposes, SpinVox is still heavily reliant on human interaction.

‘The brain’ is always learning, and in the case of SpinVox, it’s learning dynamically. Whereas the traditional approach to speech recognition is to update the brain with new vocabulary and semantics offline, and periodically release a new version of the platform, SpinVox does this on the fly.

Wheatley would not talk about the number of humans involved in the business, and this is the fundamental problem that the firm has. Its critics say its reliance on human beings is heavy enough to threaten the business model. If it cannot answer that accusation with hard facts about its number of employees, then it will never be able to dispel concerns over its feasibility. This information is “commercially sensitive” he said, perhaps meaning that it is fundamental to the firm’s commercial viability.

But he said that costs differ dramatically from product to product, carrier to carrier and country to country, and even these vary over time. For example, when SpinVox launches a new language version of its service it buys a dictionary and loads that into the brain. Then the company sets about expanding and upgrading the brain’s language abilities by hiring people to leave messages and read text, giving out free accounts (like mine), and placing ads in newspapers inviting people to call in and leave messages (presumably in exchange for something).

Then there are people whose job it is to maintain the dictionary, as well as the quality control operators, who have to be able to speak the language they are transcribing at a native level. This suggests that SpinVox has to bring a new call centre online every time it launches into a new country, which sounds like a significant cost, especially when Wheatley boasted that traffic at the company has increased tenfold every year for the last three years.

The company has admitted to having five call centres on its books, but also says, somewhat unbelievably, that it doesn’t know how many other call centres these operations subcontract out to. If the firm’s not lying, then it is admitting, at the very least, to some worryingly slack business practices.

At this point founder Christina Domecq “just dropped by”, an appearance that seemed staged at the time, and even more so when we found out that the same coincidental appearance had been granted to the previous group of journalists. What was most interesting to see was the effect her presence had on the rest of the SpinVox crew. She must run a tight ship, as they looked little short of terrified. It was tense.

Domecq answered some financial questions. The company will be cash flow positive in the fourth quarter, having just received another £15m cash injection from existing investors to see it through. Yes, the credit crunch has forced the company to extend its credit terms with suppliers and SpinVox is also in legal disputes with several ‘old’ suppliers over non-payments due to “quality of service issues”. The share offer scheme under which employees substitute some or all of their paychecks for shares is to “help SpinVox manage its cashflow” and will continue. And SpinVox is most definitely not looking for a buyer. At least not yet: “Getting a company to cashflow positive is a big tickbox for an entrepreneur,” she said.

So what did we actually learn? SpinVox has some very clever technology, and claims its workflow patent is “very powerful”, but says the patent is not a source of revenue. Apparently the firm will not license its technology, and most of its core innovations will never be patented because the firm doesn’t want them to be made public, which seems to defeat the object somewhat. The firm also claims to have been approached to build other custom products using its speech technology, which may be a source of revenue.

The technology seems to work. I don’t think the demo was something out of The Wizard of Oz; there was no man behind the curtain. Rather there was a female operator and she was in full view. And whether or not the company is on the rocks financially is probably something that won’t be revealed until the last minute. But what SpinVox is claiming is that it can maintain that balancing act between expensive humans and cheap machines and keep it scalable.

The company isn’t lying when it says 100 per cent of messages are automated to some degree – a message going into the system counts as automation, even if the system then hands that message over to a human agent. However, after the visit to Marlow, I’ve come away feeling that SpinVox is much more reliant on humans than it would have you believe, and if that is the case, I have doubts as to whether it has a scalable and cost effective business model.

About the Author(s)

James Middleton

James Middleton is managing editor of telecoms.com | Follow him @telecomsjames
