Is Speech Recognition Finally 'Good Enough'?
jcatcw writes "Speech recognition software is fast, but it still may not be accurate enough. Clerical jobs usually ask for 40 wpm, but speech recognition software can keep up with someone speaking at 160 wpm. In Lamont Wood's demo it did very well at too/two/to and which/witch, but will it still render 'I really admire your analysis' as 'I really admire urinalysis'? At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing. Those who type at hunt-and-peck speeds will experience results that are even more dramatic. There's really only one product on the US market: Dragon NaturallySpeaking from Nuance Communications. The free versions from Microsoft aren't up to the task, and IBM sold ViaVoice to Nuance, where it's treated as an entry-level product."
Problems (Score:5, Insightful)
Also, command execution by others in the room is a problem.
How about listening to music or TV and having the computer interpret it?
Depends on what you use it for (Score:4, Insightful)
For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.
Good enough for what? (Score:5, Insightful)
TFA mentions that many people stop using speech recognition software because of poor accuracy. I don't think that's the major reason. I think they start using it because it's a neat idea that seems to have a lot of promise, but quickly realize there are only a few situations where it's actually helpful. The end of the article mentions rough drafts; I'd also say it might be a decent choice
For the majority of office tasks, it just isn't a good fit.
So if the "good enough" is being useful in any way whatsoever, it sounds like we're almost there.
Re:Good enough for what? (Score:4, Insightful)
Seeing words laid out as text helps me think. I can compose things better, more coherently.
I'll write an email in an instant, but make me leave a voice mail, and I'll usually hang up first.
Re:I'd say so.... (Score:3, Insightful)
-Rick
Re:Good enough for what? (Score:3, Insightful)
-Rick
Maybe the question should be... (Score:5, Insightful)
Can you imagine being in a cubicle farm full of people talking to their computers? Or trying to talk to your computer on the bus? You have to imagine that as computers become more ubiquitous, input methods will have to adjust alongside, and I simply can't see (or hear) speech recognition doing that very well.
Re:Hmmm.... (Score:2, Insightful)
Re:Depends on what you use it for (Score:2, Insightful)
Mod parent up! (Score:4, Insightful)
I also can't imagine it'd be that useful for actually writing things. I don't think I'm the only one who revises as they write. I think I actually write better when I write things out by hand, because it's slower so I tend to think my phrasing and sentence structure through more before I commit anything to paper. If I could suddenly type two or three times faster, I think it'd probably make my text even more incomprehensible than it usually is...
Re:Hmmm.... (Score:3, Insightful)
Dragon is no more... and hasn't been for a long time.
NaturallySpeaking has been sold a few times to various companies.
(I keep track because I worked on V1.0)
Re:Pretty good (Score:3, Insightful)
Small wording changes can make a big difference -- generally much bigger than typos, which I can assure you happen far less often than 5%. Additionally, typos are generally recognizable as the intended word, and often aren't even noticed by the reader.
Re:Pretty good (Score:3, Insightful)
Re:Maybe the question should be... (Score:3, Insightful)
I think the real issue is that speech recognition apps have focused almost exclusively on dictation, which is much harder computationally than picking commands out of a finite, known set. For the latter, speech recognition technology has long been "good enough," and the only challenge is to make effective use of spoken commands in addition to current input methods.
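To illustrate why a finite command set is the easier problem, here is a minimal sketch in Python. It assumes the recognizer has already produced a (possibly wrong) text guess and only has to be snapped to the nearest known command. The command list, the `match_command` name, and the 0.6 cutoff are all made up for illustration and do not come from any real product.

```python
import difflib

# Hypothetical command vocabulary, invented for this example.
COMMANDS = ["open file", "save file", "close window", "scroll down", "scroll up"]


def match_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the known command closest to the recognized text, or None.

    `heard` stands in for whatever text a recognizer produced; `cutoff`
    is an assumed similarity threshold between 0 and 1.
    """
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # A slightly misheard command still maps to the intended one.
    print(match_command("safe file"))
    # Free-form dictation matches nothing in the set and is rejected.
    print(match_command("I really admire your analysis"))
```

With a closed vocabulary, a misrecognition only has to land nearer the right command than any other one. Dictation has no such safety net, since any word could follow any other.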
Re:Good enough for what? (Score:2, Insightful)
Re:Maybe the question should be... (Score:3, Insightful)
Re:Good enough for what? (Score:4, Insightful)
informal free-thought when not surrounded by other people
I think you're implying something here that is one of the major reasons people don't use speech recognition software: if anyone is around, you feel like a total moron.
You might not realize this, but you probably speak differently than you write. Most of us do, because there are some things that look good in text that sound bad spoken, and vice versa. Also, a lot of composition goes on when writing, and so if you're playing with different word choices so you can see them written out, you just end up sputtering dumb little phrases. It's easier to edit on-the-fly when using a keyboard. And let's not forget that you might not want the people around you to know what you're writing.
Re:Hmmm.... (Score:4, Insightful)
95% sounds good if you're not comparing it to a person. But a 5% error rate is horrendous for business use. A secretary who missed one word out of every 20 would be fired after a few hours. A couple of decades ago, when I temped for office work, I could transcribe about 80 wpm with close to 100% accuracy, and I was nowhere near the fastest.
If you got a letter from a business containing a typo on almost every line, would you do business with them?
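A rough back-of-the-envelope check of what a 5% word error rate means per line, sketched in Python. It assumes errors are independent from word to word and that a line holds about 12 words. Both are simplifying assumptions for illustration, not figures from the article.

```python
def errors_per_line(word_error_rate, words_per_line):
    """Return (expected errors per line, probability of at least one error).

    Assumes each word is wrong independently with the same probability.
    """
    expected = word_error_rate * words_per_line
    p_at_least_one = 1 - (1 - word_error_rate) ** words_per_line
    return expected, p_at_least_one


if __name__ == "__main__":
    # 95% accuracy means a 5% word error rate; 12 words per line is assumed.
    expected, p_any = errors_per_line(0.05, 12)
    print(f"expected errors per line: {expected:.2f}")
    print(f"chance a line has at least one error: {p_any:.1%}")
```

Under those assumptions, a bit under half of all lines would contain at least one wrong word, and longer lines push that figure higher.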
Re:Maybe the question should be... (Score:3, Insightful)
Oh, how about eavesdropping on a few thousand voice circuits and raising a flag when certain key words or phrases are mentioned?
I use it on my PDA (Score:1, Insightful)
I have used older versions of DNS in the past and the current version is a massive improvement. Basically, if you decide that it is worth spending dozens of hours training the software in order to get reasonably accurate transcriptions, then I recommend the product. Make sure to always use the same microphone/headset when recording on your PDA and you'll get great results.
If your need for accuracy is high, or you have alternatives to recording dictation, then DNS is still probably not for you. Also note that it is still a very frustrating experience to train DNS for non-American accents. There are at least a few reasonably common words that seem to be simply untrainable using the Australian language model, for example.
-P