Technology

The Future of Speech Technologies

prostoalex writes "PC Magazine is running an interview with two of the research leaders in IBM's speech recognition group, Dr. David Nahamoo, manager of Human Language Technologies, and Dr. Roberto Sicconi, manager of Multimodal Conversational Solutions. They mainly discuss the status quo of speech technologies, which prototypes exist in IBM Labs today, and where the industry is headed." From the article: "There has to be a good reason to use speech, maybe your hands are full [like in the case of driving a car]. ... Speech has to be important enough to justify the adoption. I'd like to go back to one of your original questions. You were saying, 'What's wrong with speech recognition today?' One of the things I see missing is feedback. In most cases, conversations are one-way. When you talk to a device, it's like talking to a 1- or 2-year-old child. He can't tell you what's wrong, and you just wait for the time when he can tell you what he wants or what he needs."
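Nahamoo's "missing feedback" point maps naturally onto a confidence-thresholded dialog loop. Below is a minimal, hypothetical sketch (the recognize() stub, the names, and the threshold are all assumptions, not IBM's design): instead of failing silently, the interface reports what it heard and asks a clarifying question when recognition confidence is low.

    # Hypothetical sketch of a feedback-capable voice interface.
    # recognize() stands in for any ASR engine that returns a transcript
    # plus a confidence score in [0, 1]; both names are assumptions here.

    def recognize(audio):
        raise NotImplementedError  # placeholder for a real recognizer

    def handle_utterance(audio, threshold=0.6):
        text, confidence = recognize(audio)
        if confidence >= threshold:
            return f"OK, executing: {text}"
        # The feedback path: say what was unclear instead of staying
        # silent, turning a one-way interaction into a dialog.
        return (f"I heard '{text}' but I'm only {confidence:.0%} sure. "
                "Did you mean that, or can you rephrase?")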

Comments:
  • by backslashdot ( 95548 ) on Saturday January 28, 2006 @03:53PM (#14589155)
    mast and the stand can't aches.

    (the future of speech technology must understand context)
  • by knipknap ( 769880 ) on Saturday January 28, 2006 @04:03PM (#14589233) Homepage
    The present of speech technology already does, and has for years. One problem is that there isn't a large enough word corpus to train on: the technology's knowledge of context is always limited to the domain it was trained against.
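    A minimal sketch of what "context from a training corpus" means in practice: an add-one-smoothed bigram language model built from a tiny domain corpus (everything below is illustrative, not any vendor's implementation) prefers in-domain word sequences over acoustically similar out-of-domain ones.

        from collections import Counter

        # Toy domain corpus; "context" here is just word-pair statistics.
        corpus = "speech technology must understand context to be useful".split()
        unigrams = Counter(corpus)
        bigrams = Counter(zip(corpus, corpus[1:]))
        vocab = len(unigrams)

        def score(sentence):
            """Product of add-one-smoothed bigram probabilities."""
            words = sentence.split()
            p = 1.0
            for prev, cur in zip(words, words[1:]):
                p *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
            return p

        # The in-domain hypothesis scores higher, which is how a language
        # model steers a recognizer away from "mast and the stand can't aches".
        print(score("must understand context"))         # ~0.049
        print(score("mast and the stand can't aches"))  # ~3e-5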
  • Actually... (Score:2, Informative)

    by ijablokov ( 225328 ) on Saturday January 28, 2006 @04:55PM (#14589529) Homepage
    ...the point of our multimodal work is that you can have a two way dialog with the device, as well as have visual feedback to the interaction. See http://ibm.com/pvc/multimodal [ibm.com] for some examples.
  • by Anonymous Coward on Saturday January 28, 2006 @05:04PM (#14589569)
    Being a court reporter, I'd say no. A computer doesn't say "What?" when it doesn't understand the words, and it doesn't tell people not to talk at the same time so that the record's clear. Some courts try video, some try just audio recorders, but so far the results haven't been so good. You need people to operate the machine, people to catalog the recordings, people to transcribe the recordings if necessary. It's just better to have a court reporter there to do all that (and often cheaper).

    The problem with the field is that there are too few capable court reporters to meet increasing demand, and the shortage is forcing more electronic recording -- good results or not.

    Now, for medical transcription, it's a great product. After about six months of use, the doctor (or anyone that dictates a lot) has gotten the computer trained to his voice and can go at a pretty good clip (150 words per minute or more). But this is one voice and a limited, task-specific vocabulary.
  • Re:MOD PARENT UP (Score:3, Informative)

    by penguin-collective ( 932038 ) on Saturday January 28, 2006 @05:08PM (#14589586)
    Yes, and Apple's speech recognition technology is many years behind the state of the art. IBM and others had better speech recognition and speech synthesis a decade ago than Apple has today.

    And where exactly is new speech technology supposed to come from inside Apple anyway? They fired all the people who knew anything about speech in the '90s and shut down the labs.
  • by Aggrajag ( 716041 ) on Saturday January 28, 2006 @05:18PM (#14589644)
    Doctors in Finland are starting to use speech recognition to update patient records. I think it is in testing at the moment; check the following link for details.

    http://www.tietoenator.com/default.asp?path=1;93;16080;163;9862 [tietoenator.com]
  • by Anonymous Coward on Saturday January 28, 2006 @05:32PM (#14589690)
    I want technology that'll run on a cheap single end-user or SOHO box.

    As I said, Nuance (ScanSoft) bought them all up; not just SpeechWorks and Nuance, but Dragon, Lernout & Hauspie, etc. They still sell a bunch of (Windoze) retail SOHO packages for a hundred bucks or two.

    Microsoft has some crappy .NET-based stuff, but I'd give it a pass, if I were you. It's neither SOHO nor enterprise. Not sure what it is...

    It's not really soup yet, but there is also a free solution. See http://www.speech.cs.cmu.edu/ [cmu.edu]. At least one commercial vendor has taken the source, hacked it up, and is using it in a commercial product. It runs on Linux and, I think, the *BSDs (see the sketch below).

    - The AC OP
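    A bare-bones usage sketch for the CMU toolkit linked above, via the pocketsphinx Python bindings (which post-date this thread, and whose API varies by version, so treat it as illustrative and check the current docs):

        # Illustrative only: uses the pocketsphinx Python bindings.
        from pocketsphinx import LiveSpeech

        # Decode from the default microphone with the bundled US-English
        # acoustic and language models, printing one utterance at a time.
        for phrase in LiveSpeech():
            print(phrase)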

  • by Yellow5 ( 519383 ) on Saturday January 28, 2006 @05:55PM (#14589844) Homepage
    I work with speech recognition, and to me your comments sound a little misleading. When "people spend hours and hours and hours transcribing 20 minutes of tape," they usually aren't simply transcribing to text. The time is consumed by annotating all the additional features of the text (i.e., time alignment of words and phonemes, prosody, and syntactic information such as parse structure or part-of-speech tags). There are, of course, automatic processes for each of these annotations, but some work much better than others. My opinion is that over the next 10 to 15 years, the pieces of the speech recognition puzzle will come together to create ASR systems comparable to human transcribers (you only have to be 95% correct to transcribe in a courtroom).
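    That "95% correct" figure is usually stated as word error rate (WER): word-level edit distance divided by reference length. A small self-contained sketch, not tied to any particular ASR system:

        def word_error_rate(reference, hypothesis):
            """Word-level Levenshtein distance (substitutions + insertions
            + deletions) divided by reference length."""
            ref, hyp = reference.split(), hypothesis.split()
            # d[i][j] = edit distance between ref[:i] and hyp[:j]
            d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                d[i][0] = i
            for j in range(len(hyp) + 1):
                d[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                    d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
            return d[len(ref)][len(hyp)] / len(ref)

        # One substitution over four reference words -> 25% WER; the
        # courtroom threshold above corresponds to roughly 5% WER.
        print(word_error_rate("the witness was present",
                              "the witness as present"))  # 0.25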
  • by mikeylebeau ( 68519 ) on Saturday January 28, 2006 @05:58PM (#14589863) Homepage
    You're mistaken about Tellme laying people off; they are doing quite well and are growing. You're right that the voice portal idea is no longer emphasized, but Tellme's making great money selling voice services to enterprise customers.
  • by GnomeChompsky ( 950296 ) on Sunday January 29, 2006 @02:01AM (#14592017)
    Yes, I am aware; it's just that there isn't as much data available as there needs to be to say with any confidence that this is what speech to children looks like, and this is what speech produced by children looks like. Like it or not, you have to have your grad students transcribing for hours to get anything out of it. Want to research bilingual acquisition? Fine, but you'll probably have to do years of legwork to get data for even three children learning the same two languages at the same time. Speech recognition would cut down significantly on the time it takes to take down utterances on either end, which would be an enormous plus.
  • by mandreiana ( 591979 ) on Sunday January 29, 2006 @05:23AM (#14592440) Homepage
    speech recognition
    http://www.speech.cs.cmu.edu/sphinx/ [cmu.edu]

    image+speech recognition
    http://sourceforge.net/projects/opencvlibrary/ [sourceforge.net]

    Desktop voice commands
    http://perlbox.sourceforge.net/ [sourceforge.net]

    Others
    http://www.tldp.org/HOWTO/Speech-Recognition-HOWTO/software.html [tldp.org]
    http://www.cavs.msstate.edu/hse/ies/projects/speech/software/ [msstate.edu]

    Do you know about other usable open source speech solutions?
