newsobserver.com : special report

Published: Monday, May 22, 2000

Stump the geeks

Q. To follow up on the thread started by your May 15 discussion of machine translation: I have recently been exposed to voice recognition software (Dragon NaturallySpeaking, specifically) and I was startled to see what it can accomplish! It made me wonder about what it might not be able to do, too. I know that Dragon has developed specialized packages for medical and legal work with vocabularies tuned to each respective field. The study of linguistics is at the root of both MT and voice recognition (VR), but will VR accept a specialized vocabulary of spoken linguistics terms? Can VR write out words in the International Phonetic Alphabet? Will VR allow one to dictate software code and create a software program that way?

John T. Hall

A. I've had the pleasure of watching voice recognition, and Dragon products in particular, make a very big difference in several good geeks' lives.

Back in 1995, a friend with MS made extensive use of Dragon software combined with preprogrammed function keys for his writing and programming.

In 1996 while I was in exile on the Mountain of Conceit (err, better change that one ;-<>>), I hired a programmer who was paraplegic. Although it took about two weeks to "train" the software, he was very productive, very creative and very happy as he programmed by talking to his computer.

More recently, I've had more friends with the "geek wrist disease" turn to voice recognition software with some very positive results. One added bonus is that they are given their own office -- rather than a cube -- or are encouraged to work at home so as not to disturb others planted in the same cube farm.

I've been around to hear "Computer, open new file" enough to know how annoying that can be if you are not the speaker of those timeless words.

Simon Spero, the Internet Hero, was one of the early Dragon users and until recently wrote much of his code using NaturallySpeaking.

Interestingly enough, Simon is not now using voice recognition. Simon treated his long-term wrist problem by weight training instead of training his computer. Do not make him angry or you will regret it.

Speaking of special vocabularies, Simon used the British version of the software so that the letter "h" would be added properly to his words. One major improvement is that the computer could transcribe his speech better than any American could.

My doctor, George Dodds, uses NaturallySpeaking in his office and is very satisfied. He did not buy the Advanced Medical Dictionary but opted for more training time with the software himself. He spent about an hour reading to the machine as if he were reading to his child -- if his child were to be a Dave Barry fan -- and then he was in business. George recommends that you invest heavily in RAM, in the highest processor speed you can get and in the best microphone available. He also cautions that you should be sure to use the sound card recommended by Dragon if you want to avoid problems. George's complaints have mostly to do with homonyms -- by or buy; to, too, or two; and or in -- and with the software's insistence on consistency. Don't get a cold or sinus condition or you'll be retraining the software to type "nose" for "doze."

Voice Recognition is not Machine Translation, though. It is more like Machine Transliteration. The software learns to turn one set of sounds into a set of letters and, within certain phrases, to make some very accurate guesses as to which spellings are most appropriate in certain contexts. You can train the software to write "photo" rather than "foto," for example. Packages that "understand" the need to spell "hassle" when Simon says "assle" still have no idea what "hassle" means, nor what would be the appropriate word for "hassle" in French.
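
For the curious, here is a toy Python sketch of that kind of context-based guessing. It is only an illustration and not how Dragon or any real product works: the sound spellings, the word-pair counts and the function names are all made up for the example. It picks whichever spelling has most often followed the previous word.

```python
# Toy illustration only. The sound spellings and counts below are
# hypothetical, invented for this example; this is not Dragon's method.
HOMOPHONES = {
    "tu": ["to", "too", "two"],
    "bai": ["by", "buy"],
}

# Hypothetical counts of how often a spelling followed a given previous word.
PAIR_COUNTS = {
    ("want", "to"): 50,
    ("want", "two"): 5,
    ("me", "too"): 30,
    ("to", "buy"): 40,
    ("stand", "by"): 25,
}


def choose_spelling(previous_word, sound):
    """Return the candidate spelling seen most often after previous_word."""
    # Sounds with no known homophones are passed through unchanged.
    candidates = HOMOPHONES.get(sound, [sound])
    return max(candidates, key=lambda word: PAIR_COUNTS.get((previous_word, word), 0))


def transcribe(sounds):
    """Turn a list of sounds into text, one word at a time, left to right."""
    words = []
    previous = "<start>"
    for sound in sounds:
        word = choose_spelling(previous, sound)
        words.append(word)
        previous = word
    return " ".join(words)


print(transcribe(["i", "want", "tu", "bai", "photos"]))
```

Notice that the sketch gets "to buy" right without knowing anything about wanting or buying. It is counting, not understanding.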

The problem with linguistically sophisticated voice recognition systems is that we need the feedback to be rather fast, and the more sophisticated the parsing, the slower the feedback. So using trainable software, while also trying your patience, can be faster and more efficient for a single user.

For multiple short-term users, say as in the case of a virtual travel agent, the software can be fairly domain-specific, allowing for various pronunciations of "Missouri," for example.
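
A rough sketch of the idea in Python, assuming a tiny made-up vocabulary: each place name is listed with several informal "sounds like" spellings, and any of them maps back to the same name. The spellings and the names PLACE_PRONUNCIATIONS and recognize_place are hypothetical, chosen only for illustration, and do not come from any real system.

```python
# Hypothetical, informal "sounds like" spellings for a small travel
# vocabulary. These are invented for illustration, not a real lexicon.
PLACE_PRONUNCIATIONS = {
    "Missouri": ["mizooree", "mizoorah"],
    "Raleigh": ["rahlee", "rawlee"],
}

# Invert the table so that every variant points back to its place name.
SOUND_TO_PLACE = {
    sound: place
    for place, sounds in PLACE_PRONUNCIATIONS.items()
    for sound in sounds
}


def recognize_place(sound):
    """Return the place name for a known variant, or None if unrecognized."""
    return SOUND_TO_PLACE.get(sound.lower())


print(recognize_place("mizooree"))
print(recognize_place("mizoorah"))
```

Because the vocabulary is so small, no per-speaker training is needed: both callers end up booked to the same state.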

Some sophisticated work in voice recognition is going on at the University of Colorado's Center for Spoken Language Research http://cslr.colorado.edu/ and at Carnegie Mellon University's Janus project http://www.is.cs.cmu.edu/ISL.speech.janus.html . Both are working on words and answers focused particularly on the travel domain. Broader, more general work is being done by IBM's Human Language Technology group http://www.research.ibm.com/hlt/html/desktop_recognition.html . The folks at IBM and at those schools are a bit more optimistic about both voice recognition and machine translation than I am.

For more information on Dragon's products, see their Web site at http://www.dragonsys.com/ and the unofficial NaturallySpeaking site http://www.synapseadaptive.com/joel/default.htm . You may also want to check out the IBM ViaVoice solutions at http://www.ibm.com/software/speech/ .

Paul Jones

director, Metalab

University of North Carolina, Chapel Hill

If you have a question, send e-mail to stumpthegeeks@nando.com. Include your name, e-mail address and a daytime phone number.

© Copyright 2000, The News & Observer. All material found on newsobserver.com is copyrighted The News & Observer and associated news services. No material may be reproduced or reused without explicit permission from The News & Observer.