The Future of Speech Technologies 101
prostoalex writes "PC Magazine is running an interview with two of the research leaders in IBM's speech recognition group, Dr. David Nahamoo, manager of Human Language Technologies, and Dr. Roberto Sicconi, manager of Multimodal Conversational Solutions. They mainly discuss the status quo of speech technologies, which prototypes exist in IBM Labs today, and where the industry is headed." From the article: "There has to be a good reason to use speech, maybe you're hands are full [like in the case of driving a car]. ... Speech has to be important enough to justify the adoption. I'd like to go back to one of your original questions. You were saying, 'What's wrong with speech recognition today?' One of the things I see missing is feedback. In most cases, conversations are one-way. When you talk to a device, it's like talking to a 1 or 2 year old child. He can't tell you what's wrong, and you just wait for the time when he can tell you what he wants or what he needs."
Solution to "one-way" problem (Score:4, Funny)
More popups.
Audio popups!
Heads-up display popups!
Holy blackberries! Get me my patent attorney!
Oh no! (Score:5, Funny)
"I'm sorry, Dave. I'm afraid I can't do that"
Re:Oh no! (Score:5, Funny)
Re:Oh no! (Score:1)
Re:Oh no! (Score:1)
the footer offs peach take no allergy (Score:3, Informative)
(the future of speech technology must understand context)
Re:the footer offs peach take no allergy (Score:4, Informative)
its been a while (Score:3, Insightful)
I'm looking forward to when I can say "computer, open openoffice for me mate" and it'll go "sure"... That'll be sweet.
Re:its been a while (Score:1)
Re:its been a while (Score:4, Interesting)
Using it when you have a cold, sore throat or when you have been indulging in your favorite alcoholic beverage can corrupt your voice profile and set you back considerably.
Never let someone else use it under your voice profile.
Will voice rec systems ever be 100% accurate and speaker independent? Maybe, but I don't expect to see it for a long time.
Re:its been a while (Score:1)
Re:its been a while (Score:2, Interesting)
We have all the problems mentioned (except drinking). There are also some others that you might not consider. For example:
As the day wears on, the radiologist will get tired, and the recognition will become worse.
Also:
A radiologist who started at 6:30AM will see the sound characteristics of the room change dramatically as more people begin working and activity in the reading roo
Re:You mean now Clippy speaks?! (Score:1)
What's wrong with speech? (Score:5, Funny)
I took a brief poll, and nobody seems to have a problem:
Bruce: I sure like being inside this fancy computer.
Vicki: Isn't it nice to have a computer that will talk to you?
Agnes: Isn't it nice to have a computer that will talk to you?
Kathy: Isn't it nice to have a computer that will talk to you?
Except the trinoids, who complained:
We can not communicate with these carbon units.
I wasn't sure which Carbon they were talking about.
MOD PARENT UP (Score:1)
Re:MOD PARENT UP (Score:3, Informative)
And where exactly is new speech technology supposed to come from inside Apple anyway? They fired all the people who knew anything about speech in the 90's and shut down the labs.
Re:MOD PARENT UP (Score:1, Interesting)
I don't think too many people realized how much more useful Apple's speech technology became when S.I. was teamed with AppleScript. I'm sure IBM's technology is/
Speech is the future! (Score:1)
Re:Speech is the future! (Score:2)
Re:Speech is the future! (Score:2, Insightful)
Re:Speech is the future! (Score:2)
Re:your (Score:2, Funny)
Re:"maybe you're hands are full" (Score:2)
Re:your (Score:1)
Language Acquisition... (Score:5, Interesting)
You see, the problem right now is that there's really not much data that's in the public domain for linguists/psychologists/what-have-you to study, because it's incredibly, incredibly laborious to do longitudinal studies of children's utterances, or of input to the child. People spend hours and hours and hours transcribing 20 minutes of tape. They're understandably reticent to just share their data out of the goodness of their hearts. Even when they do, it's never a large sampling of children-and-their-interlocutors from-birth-to-age-X, it's usually just one child and maybe his or her parents from age 8 months to 3 years.
So we have arguments about whether or not kids hear certain forms of input (Have you used passive voice with your child recently? Where's your child going to learn subjacency?) that go back and forth between psychologists and linguists, and people perform corpus studies on 3 children and feel that that's representative -- never mind the fact that these three kids were all harvested from the MIT daycare centre, and were the children of grad students or faculty members, and thus may not be representative of the population at large.
Speech recognition would make it much, much easier to amass large corpora of data for larger samples of the population. It'd make it much more likely for people to share their data. And, what's more, it'd likely be possible to have a phonetic and syntactic-word-stub (for lack of a better word) transcription made from the same recording. We'd have a better idea of how the input determines how language is acquired by children, and what sorts of stages children go through.
Re:Language Acquisition... (Score:2, Interesting)
Very interesting. Since you're a linguist, I wonder if you might address a concern I've had about speech recognition technology in general.
I've dabbled a bit with Dragon Naturally Speaking in the past (v.7) and frankly found it still too immature to be of much use to me. I find it still far easier to deal with an accurate yet artificial interface (keyboard and mouse) than an inaccurate but more "organic" interface (speech recognition).
But one of the things that stood out from the experience was the wa
Re:Language Acquisition... (Score:2)
I have a problem seeing how we should be worried...
Re:Language Acquisition... (Score:1)
Re:Language Acquisition... (Score:2)
Err... I'm confused, isn't this research going on exactly to provide speech recognition/transition systems with the data? So a perfect speech recognition system would make further acquisition unnecessary. What else would you want to collect the data for?
Re:Language Acquisition... (Score:1)
Human language acquisition, not machine language acquisition.
Also, as another linguist, let me add the part the parent forgot: In the literature, the term "language acquisition" is usually distinct from "language learning." Language acquisition is usually the term used to describe the process by which children acquire their "native" language(s). It appears to be very different from language learning, which is what you do if you start studying a foreign language after you have left the so-called "critica
Re:Language Acquisition... (Score:1)
The linguistic parallel to the collective unconscious.
If developing artificial speech and hearing with computers takes us closer to this, then I think the results should be extraordinary.
But it's just my two cents obviously!
Re:Language Acquisition... (Score:1, Informative)
Re:Language Acquisition... (Score:1)
Try http://www.cavs.msstate.edu/hse/ies/projects/spee
Re:Language Acquisition... (Score:2, Informative)
Re:Language Acquisition... (Score:1)
Re:Language Acquisition... (Score:1)
Set the application to do something useful for the user that downloads and installs the application as an incentive and well-define the task such tha
IBM Speech - Needs Superhuman sales to survive? (Score:5, Interesting)
Scansoft, who earlier all but cornered the market for Optical Character Recognition (OCR) technology, did the same with speech recognition by acquiring the largest players in this space, SpeechWorks and Nuance. Scansoft changed their name to Nuance as a part of that last acquisition.
IBM, meanwhile, has been struggling to find a market for their "Superhuman" (sneer) speech reco technology. A few years ago, they sold distribution of their retail desktop product, ViaVoice, to (wait for it) Scansoft. Their commercial product was RS/6000-AIX-only until a couple of years ago, when they ported it to more platforms, including Windows and Linux, and integrated it more tightly with their Rational and WebSphere marketing platforms.
The current enterprise product sounds really sexy, at least for Rational-WebSphere shops. You can develop your WebSphere VXML application in Eclipse and leverage all those groovy WebSphere services you've built. No (or not much) special skill required!
The problem is that their target market is Telecom Managers, who face a choice between IBM, with a few hundred ports installed, and Nuance (-ScanSoft-SpeechWorks), with tens- or hundreds-of-thousands of installed speech reco ports. Telecom Managers live in a world where their clients expect six-sigma/five-nines reliability. This is a hard sell to make.
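For anyone wondering what "five-nines" means in practice, here is a rough back-of-the-envelope sketch. It assumes a 365-day year and counts every outage minute equally; the function name is made up for illustration and is not from any vendor's tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines):
    """Yearly downtime budget, in minutes, for an availability of N nines.

    Five nines means 99.999% availability, so the unavailable
    fraction is 10 ** -5 of the year.
    """
    return MINUTES_PER_YEAR * 10 ** (-nines)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: about {allowed_downtime_minutes(n):.2f} minutes of downtime per year")
```

Under those assumptions, five nines leaves a budget of a little over five minutes of downtime per year, which gives a sense of why a vendor with a small installed base is a hard sell.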
The question is, how long can IBM keep pouring money into speech R&D and product development in the face of dismal sales? Some in the industry expect the answer is, "Not too much longer." And that, of course, makes nervous enterprise buyers even more nervous and less likely to buy.
Re:IBM Speech - Needs Superhuman sales to survive? (Score:1)
Re:IBM Speech - Needs Superhuman sales to survive? (Score:1, Informative)
As I said, Nuance (Scansoft) bought them all up; not just SpeechWorks and Nuance, but Dragon, Lernout & Hauspie, etc. They still sell a bunch of (Windoze) retail SOHO packages for a hundred bucks or two.
Microsoft has some crappy .NET-based stuff, but I'd give it a pass, if I were you. It's neither SOHO nor enterprise. Not sure what it is...
It's not really soup yet, but there is also a free solution. See http://www.speech.cs.cmu [cmu.edu]
Re:IBM Speech - Needs Superhuman sales to survive? (Score:1)
Re:IBM Speech - Needs Superhuman sales to survive? (Score:2)
At some point within the next 10-50 years, *someone* is going to develop SR technology that CAN act as a totally natural HCI. The potential profits from this exceed those of MS, possibly on patents alone! At the very least IBM is going to want the patent leverage to be able to take advantage of that technology if they are not the ones who develop it.
I am often surprised that MS/Apple aren't making a significant inve
Re:IBM Speech - Needs Superhuman sales to survive? (Score:2)
so even if someone does build the complete natural-linguistic speech recognition, it'd be worthless since scansoft (they've changed names a # of times now) owns a couple of the stages in the stack. you can try to sell it to them or try to buy the rights, but you're just some schmoe with cool technolo
Re:IBM Speech - Needs Superhuman sales to survive? (Score:2, Insightful)
It just seems like IBM, seemingly a company obsessed with creating and preserving intellectual capital, wouldn't so hastily sell off patents that they might ever be able to use / need, unless there was a catch, like they got access to Scansoft's portfolio as part of the bargain?
Just speculation, b
integration (Score:4, Interesting)
Re:integration (Score:2, Funny)
Of course, it seems you'll have the advantage of not having to tell it to switch to uppercase no i meant put the letters in uppercase not the word quote uppercase quote shift shift er fuck hey Joe what is it for uppercase huh was that caps lock YOU SAID OK THANKS NO DELETE DELETE THAT.
Re:integration (Score:4, Insightful)
Dragon Naturally Speaking is a baby step in that direction, but it is pretty much limited to single nouns or verbs.
Re:integration (Score:2)
I'm not convinced that spoken language is more precise than any other form of interface. In fact, I'd suggest just the opposite.
When one wishes to communicate anything with precision, writing it down is likely to lead to far better results. For the really demanding material, diagrams, equa
Re:integration (Score:2)
The key is in the difference between the words "facile" and "precise." You are absolutely right that written language is more precise, and written language with diagrams even more so, than spoken language. The problem is facility. The time it took me to write this and think about my choice of words is about 10x the time it would have taken for me to explain it verbally.
In an interface situation in which the computer provides me with reasonable feedback so t
Re:integration (Score:2)
Wow. Where did you get that idea? Most of the non-engineers I have encountered require an interpreter ('consultant') to translate their spoken words into something which is sufficiently precise to enter into a computer. Anybody who's been involved in the analysis/specification stage of a development project will know what I mean.
They aren't any better at doing it with a keyboard, but they
Re:integration (Score:2)
Re:integration (Score:2)
Re:integration (Score:2)
Re:In Soviet Russia (Score:1)
can it replace court reporters? (Score:4, Interesting)
In any case, I warned her about the potential for voice recognition technology to render court reporters obsolete. It probably won't happen, but the mere prospect tipped her in the direction of foregoing the opportunity. Was that a mistake?
The same concern applies also to medical transcription.
Re:can it replace court reporters? (Score:2, Informative)
The problem w
So why is voice input in decline? (Score:4, Interesting)
Try TellMe. Call 1-800-555-TELL. It's a voice portal. Buy movie tickets. Get driving directions. News, weather, stock quotes, and sports. All without looking at the phone. So what's the problem?
Re:So why is voice input in decline? (Score:4, Informative)
Re: (Score:2)
Actually... (Score:2, Informative)
Don't question my intelligence, it's fake. (Score:1, Interesting)
Re:Don't question my intelligence, it's fake. (Score:1)
Unrecognizable grunts (Score:1, Funny)
"There has to be a good reason to use speech, maybe your hands are full"
Now, what if the mouth is full too? Ventriloquism?
Re:Unrecognizable grunts (Score:2)
Screw speech recognition (Score:2, Interesting)
Babblin' all over the place is dumb.
Instead of speech recognition let's work on better speech synthesis. Here we are in 2006 and the average synthesized voice sounds hardly better than my freakin' Phasor card I had for my Apple
Re:Screw speech recognition (Score:2)
On the other hand, it is a joke killer. Star Trek producers would probably sue IBM if this went mainstream. Nobody will laugh anymore at the scene where Scotty talks to the PC mouse. IBM would ruin their best scene ever.
Now, if they could make my computer make coffee and a beautiful babe ready to do anything out of nothing... that would be something. It is something I would be proud to call progress.
Re:Screw speech recognition (Score:1)
BTW: ya I know the synthesiser is part of the client, usually uses MS Sam I think...
Doctors are going to use speech recognition (Score:4, Informative)
http://www.tietoenator.com/default.asp?path=1;93;
GUI gets in the way (Score:1)
I see the future (Score:2)
It's not the tech, it's the applications once more (Score:2, Insightful)
Which is so terribly inefficient and cumbersome. You really don't want to spend the time to socially interact with your coffee machine at 7am.
Unless it's able to go to the shop, put in exactly the right amount of coffee and is able to turn itself to on once it h
Audio search & instant report (Score:2)
Good speech recognition would be great for searching audio. We could index webcasts, not only text. It would also be great for reporting meetings and conferences.
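To make the indexing idea concrete, here is a minimal sketch of an inverted index over recognizer output. It assumes the recognizer hands back time-stamped text segments; the segment format, recording names, and sample text are all hypothetical, and a real system would also have to cope with recognition errors.

```python
from collections import defaultdict


def build_index(segments):
    """Map each lowercased word to the (recording, start_seconds) pairs where it was heard."""
    index = defaultdict(list)
    for recording, start_seconds, text in segments:
        # set() so a word repeated within one segment is recorded only once
        for word in set(text.lower().split()):
            index[word].append((recording, start_seconds))
    return index


def search(index, query):
    """Return the sorted locations whose segment contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical recognizer output: (recording name, start time in seconds, recognized text)
segments = [
    ("webcast-1", 0, "welcome to the speech technology webcast"),
    ("webcast-1", 30, "speech recognition needs feedback"),
    ("meeting-7", 12, "the recognition engine failed on noise"),
]
index = build_index(segments)
```

With this sample data, `search(index, "speech recognition")` returns `[("webcast-1", 30)]`, so you could jump straight to the 30-second mark of that recording instead of listening to the whole thing.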
Re:Audio search & instant report (Score:2)
patents (Score:2)
and that's why speech recognition 2006 is the exact same as speech recognition 1997.
FUCK YOU CAPITALISM. FUCK YOU
Re:patents (Score:1)
Artificial Intelligence (Score:1)
Anyone know of a project to simulate human life starting at a fertilized egg? That would be sweet once we understood all of the chemical processe
Speech recognition is for people who are alone (Score:2, Insightful)
Open source speech recognition engines (Score:3, Informative)
http://www.speech.cs.cmu.edu/sphinx/ [cmu.edu]
image+speech recognition
http://sourceforge.net/projects/opencvlibrary/ [sourceforge.net]
Desktop voice commands
http://perlbox.sourceforge.net/ [sourceforge.net]
Others
http://www.tldp.org/HOWTO/Speech-Recognition-HOWT
http://www.cavs.msstate.edu/hse/ies/projects/spee
Do you know about other usable open source speech solutions?
Ambient Noise... (Score:1)
Hands are full? (Score:1)
"Computer, play video!"
"Hmm, too much talk..."
"Computer, fast forward!"
"Wow, nice!"
"Computer, resume normal play!"
"Mmmm"
"Computer, play that scene again..."
(Girlfriend comes home)
"Computer, stop playback! Stop! Shut down!"
Talking to 1 or 2 year olds is .... (Score:1)
ughh (Score:2)
I en tee space main open-parenthesis i en tee space a ar gee cee comma cee aitch a ar asterisk space a ar gee vee open-bracket close-bracket close-parenthesis open-curly-bracket...
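For what it's worth, the spelled-out dictation above is easy enough to turn back into code. A toy sketch, assuming a hand-written table of spoken names (the table and the `decode` function are invented for illustration and cover only the tokens used above); anything not in the table, like "main", is passed through as a whole word:

```python
# Hypothetical mapping from spoken token to the character it stands for.
SPOKEN = {
    "i": "i", "en": "n", "tee": "t", "a": "a", "ar": "r", "gee": "g",
    "cee": "c", "vee": "v", "aitch": "h",
    "space": " ", "comma": ",", "asterisk": "*",
    "open-parenthesis": "(", "close-parenthesis": ")",
    "open-bracket": "[", "close-bracket": "]",
    "open-curly-bracket": "{",
}


def decode(utterance):
    """Replace each known spoken token with its character; keep unknown tokens verbatim."""
    return "".join(SPOKEN.get(token, token) for token in utterance.lower().split())


spoken = (
    "I en tee space main open-parenthesis i en tee space a ar gee cee comma "
    "cee aitch a ar asterisk space a ar gee vee open-bracket close-bracket "
    "close-parenthesis open-curly-bracket"
)
print(decode(spoken))
```

That prints `int main(int argc,char* argv[]){`, which is 33 characters for 30 spoken tokens. Typing still wins.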
"whispered" speech interface and tablet computing (Score:2)