A voice in the wilderness – Part 2
So, given a bit of effort and a few old components, it is possible to start using technology in new ways. Clearly, however, just because something works doesn’t mean anyone will want it, or be prepared to pay for it. There are a number of marketing criteria that the big guns like to apply, which boil down to four basic questions: is it useful, is it usable, is it affordable and is it desirable?
Before considering usefulness, I thought I’d tackle usability. The solution previously described had a large number of dependencies (you’d need both a PC and a PDA, for example) so I wanted to shorten the list a little. Picking up the mantle of integrator could be fun, I thought to myself as I browsed for head-mounted displays on the Web. A couple of phone calls later and I was heading to London-based high-tech reseller Inition (www.inition.co.uk) for a visit. What had particularly caught my eye (sorry) was a tiny screen (from a company called MicroOptical – www.micropticalcorp.com) that could be mounted on a pair of glasses. Inition had some other products that would slip comfortably into the “cool stuff” category, such as laptops with 3D displays, VR gloves and so on, but I tried not to get too distracted.
The micro screen plugged into a standard VGA port on my laptop, and was self-powered from a camcorder battery, so within a short period of time I was ready to go. Frankly, I was worried that the experience might be an anticlimax (“two hours on the train to London – for this?”) until, like Joe 90, I put on the glasses and my world was transformed. There, in the corner of my vision, was a computer screen. Small but adequate, it floated in space, like real life with a picture-in-picture setting, which I suppose was exactly what it was. Five seconds later I had clipped on my microphone and I was dictating into the computer in my hand. A few seconds more and I could be browsing the Web, sending and receiving email, checking the traffic news or buying a pizza – I knew this as I had already played with the voice commands available in Dragon NaturallySpeaking, and I’d found them comprehensive enough.
The little neon sign flickered on in my head – you know the one, bearing the words: “I want one”. I was sold. The whole experience gave me the impression that nothing would ever be the same again – once I could afford such a gadget, that is. But, was it really useful? The good people at Inition told me some of the reasons people used their displays – orchestral conductors reading music, surgeons consulting manuals – but the display/recognition combination seemed to have a more profound value.
As I used the voice/display combination, it became immediately apparent that this was not some niche application, but a core productivity tool. Consider, for example, auditors and surveyors who create reports containing their observations. Surveyors already use voice recognition, but they usually dictate into some intermediary recorder, and the recording then has to be played back and edited. How much faster could things be done if the report could be created, edited and delivered within minutes of the observations being noted?
Indeed, there are plenty of workers who combine a dependency on the written word with the reality that they are not always in front of a keyboard-driven computer. Meeting rooms and the corridors of power, not to mention airport lounges, planes, trains and automobiles, all add up to so much dead time spent in transit. Couldn’t this time be better spent? To give you an example – following one meeting, I used a twenty-minute walk back to the station to collate my thoughts and send some immediate feedback. Had I not had such a facility, the feedback would have taken a couple of days, if indeed it had happened at all.
While it is clear that the computer keyboard will not be going away, other input mechanisms remain largely unexploited. There are potential issues: is it safe to dictate while driving, for example? What of the eye strain? And perhaps people need empty time to keep on top of stress. Even so, few would deny there are moments, when moving from one place to another, that we would love to be doing something more useful. I once told someone that I had been writing a report while sitting on the beach at Nice. They said to me, “Of all the things to do on the beach at Nice, you write a report.” I replied, “Of all the places to be writing a report, where would you rather be than on the beach at Nice?”
That’s usefulness, usability and even desirability covered to an extent, but then comes the question of cost. At 1200 pounds a pop, head-mounted displays are not going to hit the mainstream anytime soon. There are cheaper versions, but the display is just one component: it is the integrated package that needs to be delivered at a reasonable price. iPod sales would suggest the price needs to fall below the 500-pound mark before any such package would register on people’s radar.
If integration is the answer, then somebody needs to start integrating, and getting products out to the early adopters. This is of course the model applied by consumer networking companies such as LinkSys, as well as credit card companies such as MBNA. The aim is not to build the best product, but to get as many potentially useful products to market as cheaply as possible. In this way, the market can decide which are worth having and which are not. It should be possible to do it with old technology – indeed, given the bloated size of Windows XP, newer technology would push the hardware requirement back into the unaffordable, so we’d be better off with the old.
Meanwhile, Microsoft has tried to achieve something similar with its Tablet PC specification, but clearly something went wrong there. If this article illustrates anything, it is that we do not need some new and improved spec; instead, design shops should be concentrating on integrating what exists, and delivering it in a package that thinks more about function than form. Once someone delivers such a package at a price people can afford, then we might see a major advance in voice recognition use, and with it significant gains in productivity. All it needs is for the industry to get its act together.