Mobile gives Voice Recognition its killer app
1999-10-29
Slowly but surely, voice recognition technology is gaining maturity. It may still be in the domain of the comedy club, but will soon form an integral part of the services and devices that make up the IT infrastructure.
There are currently three sectors for voice recognition technology. The first, and best known, is PC-based voice recognition, aimed mainly at the consumer market. Whilst this is achieving some success in the Far East (largely due to the difficulties in providing keyboards for eastern character sets), it is still a niche market. As discussed in our previous article, dictation and navigation are not powerful enough reasons for the large-scale adoption of products such as Dragon Dictate and IBM ViaVoice. The second sector is the embedded application sector, in which specific products such as medical and manufacturing equipment are enhanced to include voice input. Thirdly, we have telephony-based products. It is this market which is evolving the most rapidly, and from which the popularisation of voice recognition will come.
Many communications companies have had their own voice labs for several years, aimed mainly at “enhancing the telephony experience”. As comms and IT have converged, these technologies have been applied to computer applications. For example, last year Lucent announced a pact with Unisys, in which Lucent’s technology would be used to develop an integrated speech recognition software package. Start-up companies such as SpeechWorks and Nuance have been set up with the objective of providing voice-enabled application development tools. So far, the results have been fair to middling: in general the best results come from the systems with limited scope (such as eTrade’s and BT’s voice-driven stock systems). Herein lies the danger: prospective customers remain skeptical, while there is a lack of installed systems with sufficient wow factor to sway them.
In line with the rest of the application development industry, component-based development is the adopted direction of the major players. SpeechWorks have released a comprehensive set of Java and ActiveX applets to recognise common structures such as names, addresses, dates and so on, as well as applets for standard applications such as SAP. Similarly, a few days ago Nuance released their first set of voice components, as JavaBeans.
Voice recognition is no longer about the quality of the recognition engine, as this technology is sufficiently advanced and improving all the time. Rather, it is about the ability to automate dialogues which may occur between end user and machine. Just as with the Web, the voice caller can very easily get turned off by requests for irrelevant or badly ordered information, so usability becomes a primary issue, a fact recognised by SpeechWorks’ own development methodology. We are likely to see a number of voice-controlled applications in the future, based on the component model, and the chances are they will become acceptable as their quality improves and their numbers increase. The death of the touch-tone interface is an imminent and welcome consequence of this (though it should be remembered that the bad press of DTMF is as much down to the interface design as the technology). Call centres are the primary market for this: call centre managers will find voice technology indispensable in achieving their “improved service, reduced costs” mantra.
Even so, this use of voice technology is still concentrated more on the enterprise than the individual. The real mass-market killer application for voice comes with the merger of the mobile phone, a device which we are used to talking to, and the PDA, a device we are used to looking at. The recent announcement by IBM and Nokia serves to illustrate the potential of the technology, which will one day come as standard in handheld devices, particularly as the announcement includes the definition of an XML-based standard for voice. Agreed, a truly portable voice solution which relies on the power of the device alone is a long way off; however, advances in wireless technology mean that the recognition can occur on the server side rather than on the device itself. In the keyboard-unfriendly mobile world, the use of voice for navigation and search becomes the better option. Voice entry will sit alongside text and pointer entry, with each being used in the most appropriate manner.
For voice recognition vendors, the future is bright for those that ignore the skeptics and embrace the broadest possible vision of the future. Recognition engines and application development tools are both OEM markets, suitable for the systems integrators. There does exist a third market, which bears a striking similarity to the Web site development market currently occupied by products such as NetObjects Fusion and Microsoft FrontPage. As voice-enabled mobile devices gain presence, site developers will require tools to enable the receipt of speech commands. There are very few players in this space at the moment – SpeechWorks is one of them. However, there is still plenty of time for the other voice recognition companies to catch up.
(First published 29 October 1999)