Hard of Hearing

M-iD, January 2005

Speech recognition can be infuriating, but it is improving fast.

Careful design of the questions used to elicit responses in speech recognition applications is key to improving accuracy, and it reduces the amount of sampling required. Phrasing questions in particular ways limits the range of words likely to be used, shrinks the speech database that needs to be searched, and improves both recognition accuracy and the maximum speech rate the system can handle.
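To make the idea concrete, here is a minimal Python sketch of why a narrowly phrased question helps. It is an illustration only, not how any vendor's recogniser works: character-level string similarity from the standard `difflib` module stands in for acoustic matching, and the vocabularies, the `best_match` helper and the cutoff value are all hypothetical.

```python
import difflib

# Hypothetical vocabularies. An open prompt ("How can I help?") must consider
# many words; a constrained prompt ("Which station are you travelling from?")
# only needs to consider station names.
OPEN_VOCABULARY = ["epsom", "upsell", "absent", "option", "liverpool",
                   "balance", "booking", "leeds", "london"]
STATION_VOCABULARY = ["epsom", "liverpool", "leeds", "london"]


def best_match(heard, vocabulary, cutoff=0.5):
    """Return the vocabulary entry most similar to `heard`, or None.

    String similarity is an assumption standing in for acoustic scoring.
    """
    matches = difflib.get_close_matches(heard.lower(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # A slightly garbled input still resolves against the small vocabulary.
    print(best_match("Liverpol", STATION_VOCABULARY))
    # Fewer candidates to search, and fewer near-misses to confuse.
    print(len(STATION_VOCABULARY), "candidates versus", len(OPEN_VOCABULARY))
```

The smaller list means fewer comparisons per utterance and fewer similar-sounding alternatives, which is the effect the question design is aiming for.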

But pattern-matching approaches to speech recognition require one thing in great abundance: real-time number-crunching capability. Combined with the huge amounts of speech data that vendors have been collecting to improve their models, the great leaps in low-cost processing power over the last few years have allowed more sophisticated algorithms and larger speech databases to be used in real time, and have made speech recognition viable.

As well as improvements to the underlying hardware and software, changes within the industry have also made speech recognition more viable. In particular, open standards have made it easier for integrators and end-users to put together applications with a voice interface, and at lower cost (see box, VoiceXML versus SALT). Vendors have also begun to offer packaged applications oriented to particular vertical industries, with appropriate vocabulary databases. The increasing number of deployments of speech technology has also made it easier for vendors and developers to build up databases of speech and applications that can be reused for other users.

The result is that speech recognition is being deployed in many more applications than it used to be. Simon Edwards, director of international marketing at Intervoice, highlights a few applications that his company has implemented. “Customer satisfaction surveys at the end of call centre calls get far higher response rates because of their context and it is far cheaper if speech technology is used. Manufacturers that want information on their end users are able to get more data if they provide a telephone number for product registration than if they provide an easily disposable registration card. The vast majority of calls to helpdesks are for password and PIN information, something easily automated with a speech system. And phone directories, whether customer-facing or employee-facing, are relatively easy to speech enable.”

Call routing, ticket booking and automated helpdesk enquiries are the main applications at the moment, but companies such as IBM and Aungate, a division of search engine company Autonomy, have also started to put speech recognition to use in other areas.

IBM is trialling the use of speech recognition technology with search engine technology to search call centre databases during calls with customers: the speech recognition software picks up keywords in the caller's speech to identify relevant information and displays it on the operator's screen before the caller has even finished explaining what they want.
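The keyword-spotting approach described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not IBM's implementation: the article IDs, the sample knowledge-base text and the `build_index` and `suggest` helpers are hypothetical, and a real system would work from recogniser output rather than typed text.

```python
from collections import defaultdict

# Hypothetical call centre knowledge base, keyed by article ID.
ARTICLES = {
    "kb-1": "reset your password or pin",
    "kb-2": "track a delivery",
    "kb-3": "change a ticket booking",
}


def build_index(articles):
    """Map each word to the set of articles that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in text.lower().split():
            index[word].add(article_id)
    return index


def suggest(partial_transcript, index):
    """Rank articles by how many distinct recognised words they share
    with the transcript so far (ties broken by article ID)."""
    scores = defaultdict(int)
    for word in set(partial_transcript.lower().split()):
        for article_id in index.get(word, ()):
            scores[article_id] += 1
    return sorted(scores, key=lambda a: (-scores[a], a))


if __name__ == "__main__":
    index = build_index(ARTICLES)
    # The caller has not finished speaking, but a keyword is already enough.
    print(suggest("I forgot my password", index))
```

Because `suggest` can be called again each time the recogniser emits more words, the operator's screen can be refreshed while the caller is still talking.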

Similarly, Aungate's technology tries to grasp the general meaning of the caller's conversation and then looks to see if other similar calls have been received recently. This means trends can be picked up quickly and information relevant to both management and caller can be passed on, making speech recognition a valuable business intelligence tool.
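As a rough illustration of spotting similar recent calls, the sketch below compares call transcripts by the words they share. It is not Aungate's method, which the article does not detail: Jaccard word overlap is a deliberately simple stand-in for meaning-based matching, and the function names, sample transcripts and threshold are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two transcripts, from 0.0 to 1.0."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def similar_recent_calls(new_call, recent_calls, threshold=0.3):
    """Return the recent calls that resemble the new one.

    The threshold is an arbitrary assumption for this sketch.
    """
    return [call for call in recent_calls
            if jaccard(new_call, call) >= threshold]


if __name__ == "__main__":
    recent = [
        "my broadband is down again",
        "broadband is down since morning",
        "I want to upgrade my phone",
    ]
    matches = similar_recent_calls("my broadband is down", recent)
    # Several matches in a short period would suggest an emerging trend.
    print(len(matches), "similar recent calls")
```

A sudden rise in the number of matches is the kind of signal that could be passed to management, or to the operator handling the current call.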

Speech enablement is now a viable option for many organisations and the list of applications is growing. Whether it will ever be as good as human speech recognition remains to be seen. But the days of Epsom being mixed up with Liverpool are long gone.


Rob Buckley – Freelance Journalist and Editor
