
Is Speech Recognition Finally 'Good Enough'?

jcatcw writes "Speech recognition software is fast, but it still may not be accurate enough. Clerical jobs usually ask for 40 wpm, but speech recognition software can keep up with someone speaking at 160 wpm. In Lamont Wood's demo it did very well at too/two/to and which/witch, but will it still render 'I really admire your analysis' as 'I really admire urinalysis'? At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing. Those who type at hunt-and-peck speeds will experience results that are even more dramatic. There's really only one product on the US market: Dragon NaturallySpeaking from Nuance Communications. The free versions from Microsoft aren't up to the task and IBM sold ViaVoice to Nuance, where it's treated as an entry-level product."
This discussion has been archived. No new comments can be posted.

  • Problems (Score:5, Insightful)

    by Tribbin ( 565963 ) on Friday May 18, 2007 @05:12PM (#19184773) Homepage
    As a foreigner it is really hard to get the pronunciation right enough.

    Also command execution by others in the room is a problem.

    How about listening to music, or TV, and having the computer interpret it?
  • by orclevegam ( 940336 ) on Friday May 18, 2007 @05:13PM (#19184791) Journal

    Is Speech Recognition Finally 'Good Enough'?

    For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read some perfectly fine code out loud; I can't imagine trying to enter it all with voice recognition, no matter how good it gets.

  • by traindirector ( 1001483 ) * on Friday May 18, 2007 @05:15PM (#19184841)

    TFA mentions that many people stop using speech recognition software because of poor accuracy. I don't think that's the major reason. I think they start using it because it's a neat idea that seems to have a lot of promise, but quickly realize there are only a few situations where it's actually helpful. The end of the article mentions rough drafts; I'd also say it might be a decent choice

    • when you need to enter hand-written documents into a computer
    • for transcripts of a single speaker
    • informal free-thought when not surrounded by other people
    • when you have horrible typing skills

    For the majority of office tasks, it just isn't a good fit.

    So if the "good enough" is being useful in any way whatsoever, it sounds like we're almost there.

  • by L. VeGas ( 580015 ) on Friday May 18, 2007 @05:20PM (#19184909) Homepage Journal
    These are some good points. I don't know what I would use speech recognition for, and I'm someone that writes a lot.

    Seeing words laid out as text helps me think. I can compose things better, more coherently.

    I'll write an email in an instant, but make me leave a voice mail, and I'll usually hang up first.
  • Re:I'd say so.... (Score:3, Insightful)

    by RingDev ( 879105 ) on Friday May 18, 2007 @05:25PM (#19185007) Homepage Journal
    To be fair, that's a problem with the IVR coder, not the voice recognition engine.

    -Rick
  • by RingDev ( 879105 ) on Friday May 18, 2007 @05:27PM (#19185039) Homepage Journal
    I would love it for a graphics editor. Being able to swap tools, zoom, bring up palettes, etc... without having to go digging through menus or trying to remember hot keys. I think VR in desktop software has a place, but it is in augmentation, a fringe benefit, not the core functionality.

    -Rick
  • by Mahjub Sa'aden ( 1100387 ) <msaaden@gmail.com> on Friday May 18, 2007 @05:31PM (#19185079)
    Instead of asking if speech recognition is "good enough", maybe we should be asking whether or not it's actually useful for anything in the first place. I mean, is it good enough... to do what?

    Can you imagine being in a cubicle farm full of people talking to their computers? Or trying to talk to your computer on the bus? You have to imagine that as computers become more ubiquitous, input methods will have to adjust alongside, and I simply can't see (or hear) speech recognition doing that very well.
  • Re:Hmmm.... (Score:2, Insightful)

    by Mahjub Sa'aden ( 1100387 ) <msaaden@gmail.com> on Friday May 18, 2007 @05:34PM (#19185127)
    I'll be honest with you, Vista is way better at coming up with hilarious new Madlibs than you are.
  • by GustoGaiden ( 1080739 ) on Friday May 18, 2007 @05:36PM (#19185153)
    Programming with voice recognition just seems stupid to me. The idea behind voice recognition is to make it easier to write natural speech, such as email, or an essay, or anything else that follows normal speech patterns. Programming is writing so a computer can understand what you want it to do. It involves TONS of punctuation, oddly named keywords and variables (var, int, _InitBlockPosX). Hell, I can barely read my code aloud to someone else without confusing MYSELF, much less confusing the other human. Case in point, if you're trying to use your voice recognition software to write code, you're using the wrong tool for the job.
  • Mod parent up! (Score:4, Insightful)

    by Doctor Memory ( 6336 ) on Friday May 18, 2007 @05:43PM (#19185243)
    Seriously, the only things speech recognition is good for are bulk text entry and simple navigation. I imagine trying to use voice commands to operate modern software would be similar to letting my four-year-old help make pancakes — yes, it gets done, but it's so much easier and faster to just do it yourself. Imagine trying to edit a document using just voice commands. Is your WP going to be smart enough you can tell it "find all occurrences of 'scum-sucking bottom feeders' and replace it with 'esteemed colleagues'". Or are you going to have to say "Find. Scum hyphen sucking bottom feeders. Tab. Esteemed colleagues. Replace all." Face it, GUIs have rendered speech recognition for command and navigation moot. Most operations you perform don't have a verbal description, or at least not one that is quicker to say than to do.

    I also can't imagine it'd be that useful for actually writing things. I don't think I'm the only one who revises as they write. I think I actually write better when I write things out by hand, because it's slower so I tend to think my phrasing and sentence structure through more before I commit anything to paper. If I could suddenly type two or three times faster, I think it'd probably make my text even more incomprehensible than it usually is...
  • Re:Hmmm.... (Score:3, Insightful)

    by bearinboots ( 743355 ) on Friday May 18, 2007 @05:48PM (#19185305)

    Dragon is no more... and hasn't been for a long time.

    NaturallySpeaking has been sold a few times to various companies.

    (I keep track because I worked on V1.0)

  • Re:Pretty good (Score:3, Insightful)

    by Rei ( 128717 ) on Friday May 18, 2007 @05:52PM (#19185353) Homepage
    5% could be the difference between "The report confirmed that Iraq has WMDs" and "The report confirmed that Iraq had WMDs." It could be the difference between "Tell Mrs. Smith to take 20mg of neurontin" and "Tell Mrs. Smidt to take 20mg of neurontin." It could be the difference between "The magnet should not be exposed to a field greater than fifteen teslas" and "The magnet should not be exposed to a field greater than fifty teslas." And on, and on.

    Small wording changes can make a big difference -- generally much bigger than typos, which I can assure you happen far less often than 5%. Additionally, typos are generally recognizable as the intended word, and often aren't even noticed by the reader.
  • Re:Pretty good (Score:3, Insightful)

    by TheRaven64 ( 641858 ) on Friday May 18, 2007 @06:15PM (#19185553) Journal
    I wonder exactly what 95% means. Does it mean one character out of every 20 is wrong? One word out of every 20 has an error? One sentence? I average about one to two errors per page, and so all of these sound horrendous to me. Even typing with my eyes closed (which I do sometimes when my eyes are feeling tired, but generally don't because I always think I've managed to move my fingers one character across and started typing complete nonsense) I get higher accuracy than that.
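
The question of what "95%" is counting can be made concrete with a rough sketch. The code below is illustrative only and is not how any particular product reports accuracy: it assumes accuracy is defined as one minus the edit distance divided by the reference length, and it scores the summary's "urinalysis" example once per word and once per character. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop an item from the reference
                            curr[j - 1] + 1,      # insert an item from the hypothesis
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: 1 - (edit distance / length of the reference)."""
    return 1 - edit_distance(ref, hyp) / len(ref)


reference = "I really admire your analysis"
hypothesis = "I really admire urinalysis"

word_acc = accuracy(reference.split(), hypothesis.split())  # unit = word
char_acc = accuracy(reference, hypothesis)                  # unit = character

print(f"word-level accuracy:      {word_acc:.2f}")
print(f"character-level accuracy: {char_acc:.2f}")
```

The same mistake scores much worse per word than per character, so a quoted accuracy figure says little until the unit is stated.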
  • by 644bd346996 ( 1012333 ) on Friday May 18, 2007 @06:24PM (#19185667)
    Speech recognition is obviously not universally usable, but it is useful. I've found that for many mundane tasks, the OS X speech recognition is easier than a keyboard shortcut, and much easier than using the mouse. There are a lot of applications that could be much easier if they included speech recognition for commands. Consider an app that relies heavily on both keyboard and mouse input, such as Blender. A lot of the keyboard shortcuts would be faster and easier to remember as spoken commands, and they could be implemented so as to be quite reliable. Also, most 3d modelers can probably get the privacy to use a verbal interface.

    I think the real issue is that speech recognition apps have focused almost exclusively on dictation, which is much harder computationally than picking commands out of a finite, known set. For the latter, speech recognition technology has long been "good enough," and the only challenge is to make effective use of spoken commands in addition to current input methods.
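
To illustrate why a finite command set is the easier problem, here is a minimal sketch that works on text only. It assumes some recognizer has already produced a possibly wrong transcript, and it snaps that transcript to the nearest entry in a small list using Python's standard `difflib`. The command list, the `match_command` name, and the 0.6 cutoff are all made up for the example and are not taken from any real application.

```python
import difflib

# Hypothetical command vocabulary for a graphics or 3D modeling app.
COMMANDS = ["zoom in", "zoom out", "select brush", "open palette", "undo"]


def match_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the closest known command, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("zoom inn"))   # slightly misheard command
print(match_command("undue"))      # homophone-style error
print(match_command("I really admire your analysis"))  # not a command at all
```

With only a handful of valid outputs, a fairly sloppy transcript can still land on the right command, and anything that matches nothing can simply be rejected. Free dictation has no such safety net.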
  • by amchugh ( 116330 ) on Friday May 18, 2007 @06:38PM (#19185845)
    You missed one: dumping a recording of a lecture or dictation into your computer.
  • by babblefrog ( 1013127 ) on Friday May 18, 2007 @06:48PM (#19185953)
    Where I see it coming into its own is as an input method for really portable "wearable computing", where it would be extremely inconvenient to use a keyboard.
  • by nine-times ( 778537 ) <nine.times@gmail.com> on Friday May 18, 2007 @06:50PM (#19185961) Homepage

    informal free-thought when not surrounded by other people

    I think you're implying something here that is one of the major reasons people don't use speech recognition software: if anyone is around, you feel like a total moron.

    You might not realize this, but you probably speak differently than you write. Most of us do, because there are some things that look good in text that sound bad spoken, and vice versa. Also, a lot of composition goes on when writing, and so if you're playing with different word choices so you can see them written out, you just end up sputtering dumb little phrases. It's easier to edit on-the-fly when using a keyboard. And let's not forget that you might not want the people around you to know what you're writing.

  • Re:Hmmm.... (Score:4, Insightful)

    by pluther ( 647209 ) <pluther@@@usa...net> on Friday May 18, 2007 @10:58PM (#19187711) Homepage

    but what if I really said "urinalysis"?

    Then your secretary would probably get it wrong too

    No, your secretary would almost certainly get it right. Your secretary would know, from experience with you and the kind of work you do and the overall context of the letter whether the person you are dictating the letter to has recently analyzed something for you, or if you are applying for a job in a medical lab.

    95% sounds good if you're not comparing it to a person. But a 5% error rate is horrendous for business use. A secretary who missed one word out of every 20 would be fired after a few hours. A couple decades ago, when I temped for office work, I could transcribe about 80 wpm with close to 100% accuracy, and I was nowhere near the fastest.

    If you got a letter from a business containing a typo on almost every line, would you do business with them?
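
As a back-of-the-envelope check on the "typo on almost every line" figure, the sketch below works through the arithmetic. It rests on two assumptions that do not come from the article: word errors are independent of each other, and a line of a business letter holds about 12 words. The function name is hypothetical.

```python
def line_error_stats(word_accuracy, words_per_line):
    """Expected errors per line and the chance a line has at least one error.

    Assumes each word is wrong independently with probability 1 - word_accuracy.
    """
    error_rate = 1.0 - word_accuracy
    expected_errors = error_rate * words_per_line
    p_at_least_one = 1.0 - word_accuracy ** words_per_line
    return expected_errors, p_at_least_one


# 95% word accuracy, assumed 12 words per line.
expected, p_any = line_error_stats(0.95, 12)
print(f"expected errors per line: {expected:.2f}")
print(f"chance a line has an error: {p_any:.0%}")
```

Under those assumptions, somewhat under half of all lines would contain at least one wrong word. That is short of "almost every line" but still far too many for a business letter.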

  • by AJWM ( 19027 ) on Saturday May 19, 2007 @02:51AM (#19188751) Homepage
    I mean, is it good enough... to do what?

    Oh, how about eavesdrop on a few thousand voice circuits and raise a flag when certain key words or phrases are mentioned?
  • I use it on my PDA (Score:1, Insightful)

    by Podcaster ( 1098781 ) * on Saturday May 19, 2007 @06:57PM (#19194077) Homepage Journal
    I'm using the Nuance voice recorder on my PDA to record dictation, and I've been training Dragon NaturallySpeaking 9 to recognise my voice and convert it into text. When I get home I upload the voice files into the desktop computer and it crunches away for an hour running the DNS language recognition engine to turn my speech into text.

    I have used older versions of DNS in the past and the current version is a massive improvement. Basically, if you decide that it is worth spending dozens of hours training the software in order to get reasonably accurate transcriptions then I recommend the product. Make sure to always use the same microphone/headset when recording on your PDA and you'll get great results.

    If your need for accuracy is high, or you have alternatives to recording dictation, then DNS is still probably not for you. Also note that it is still a very frustrating experience to train DNS for non-American accents. There are at least a few reasonably common words that seem to be simply untrainable using the Australian language model, for example.

    -P
