IBM Strives For 'Superhuman' Speech Tech
robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."
Coherency? (Score:5, Insightful)
Still, even at 80 percent, how good is this translation? If that 20% is the important parts of speech, you could still be left clueless. Even the best machine translations of text I have seen always leave the text a bit garbled and confusticated.
I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real-time translation, for the following reason. Sentences in different languages have different structures. While in English the verb is usually the second part, in other languages (German, for one) the verb often comes last. For the translator to get the second word of the English sentence, it would have to wait till the end of what could be a long sentence. This necessarily adds delay.
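To put a rough number on that delay, here is a minimal sketch in Python. It is a toy model, not how IBM's system or any real translator works: the function name `tokens_needed`, the example sentence, and the hand-written word alignment are all my own assumptions for illustration. Given which source word each output word depends on, it counts how many source words must have been heard before each output word can be emitted in order.

```python
def tokens_needed(alignment):
    """For each target word, return how many source words must have been
    heard before it can be emitted.

    alignment[i] is the index of the source word that target word i
    depends on (a hypothetical, hand-written alignment). Output is emitted
    strictly in target order, so a late dependency also holds up every
    word that follows it.
    """
    needed = []
    heard = 0
    for src_idx in alignment:
        # We must have heard at least up to the source word we depend on.
        heard = max(heard, src_idx + 1)
        needed.append(heard)
    return needed


# Illustrative example: German puts the participle last.
german = ["Ich", "habe", "den", "Bericht", "gelesen"]
english = ["I", "have", "read", "the", "report"]
alignment = [0, 1, 4, 2, 3]  # "read" depends on "gelesen" (index 4)

wait = tokens_needed(alignment)
for word, n in zip(english, wait):
    print(f"{word!r} can be emitted after hearing {n} of {len(german)} words")
```

In this toy case the third English word cannot be produced until all five German words have been heard, so the wait grows with the length of the sentence.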
Foreign languages are complex... (Score:5, Insightful)
It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.
This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.
I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.
Ghee... (Score:4, Insightful)
Re:Just what we need... (Score:5, Insightful)
I think you've got it the wrong way round, haven't you? Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?
Re:Foreign languages are complex... (Score:3, Insightful)
And how weird sometimes. English for example loves to use the word "up" in all
sorts of unsuitable places:
give up
shut up
fed up
wash up
fuck up
laid up
muck up
turn up
free up
look up
make up
put up
screw up
hang up
wrap up
hold up
grow up
Wtf?
And how come we say "didn't he.." but in longhand it's "did he not..."? Shouldn't
it be "did not he"? Why does the "not" shift to the other side of the pronoun?
But then all languages have similar weird, illogical syntax.
Awful default TTS (Score:4, Insightful)
What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.
This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.
I'm glad people are doing good speech research (I know I've seen a demo of good IBM TTS somewhere), but I hope it finds its way into Windows someday.
Re:Opensource? (Score:3, Insightful)
Re:Foreign languages are complex... (Score:2, Insightful)
I'm not quite sure what you mean here. Not bother because of this technology?
I can't see anyone not wanting to bother learning a language because of this technology. Not unless it was a babelfish/universal translator type technology - i.e. basically invisible. In which case, what's the issue?
What are you going to do:
a) Walk around with a little device which translates with 60-80% accuracy when you're in a country where people speak a language you do not understand.
b) Try to learn the language so you don't have to rely on a gadget?
I think I know which one I'd choose - not that I can speak anything other than English, but I do try.
Once devices get to 100% accuracy, my argument disappears. I'd love for that to happen too.
Sid
Japanese and English are quite different (Score:3, Insightful)
Re:Foreign languages are complex... (Score:3, Insightful)
Can you see no good in a rough translation for some purposes?
Calculators have largely eliminated the need (and in some cases the ability) for people to do basic math. Therefore we should eliminate calculators before these people start believing that they completely understand cube roots when they just know how to push buttons.
Oh yeah, that reminds me...Cartoons aren't real.
Good luck IBM and I hope this stuff becomes viable soon.
Buyer beware (Score:5, Insightful)
Speech recognition is riddled with problems. From a computing side it's enormously processor intensive and memory hungry. From a computer side it's very complex code and the 'learning' process is fraught with problems - surnames, company names and locations are all very poorly recognised.
So don't rush to buy. Let the labs check it out first.
Re:Just what we need... (Score:4, Insightful)
Re:On-The-Fly (Score:3, Insightful)
Re:Which ... (Score:3, Insightful)
BTW what Teletext is like in the U.S. is that we don't have it.
Re:Just what we need... (Score:3, Insightful)
Re:Which ... (Score:2, Insightful)
For example, go watch Memoirs of a Geisha and note that Chiyo keeps calling Mameha "oneesan" (Oh-Nay-San), which literally and figuratively translates to big sister. They are not related, and it is not an affectionate reference that someone might make in English to an older woman who provides protection and guidance. The term actually holds a special meaning in the Japanese world of Hostessing (both Geisha and less formal such as snack bars) that I would find difficult to even explain in English. Good luck IBM.
Unlikely (Score:2, Insightful)
Recent studies of the efficacy of machine translation found that modern engines have made only marginal progress over those of the *70s* (in fact, one of them, SysTrans, is the most used translation engine online), and there was *no* discernible difference between engines of the eighties and current engines. I hope that they're not trying to claim that they suddenly overcame the vast problems of translation wholly independent of the linguistic community. That's just ludicrous.
I'd love to see this engine handle a parasitic sentence like this between two largely different languages and catch the nuance in the parens: "Which report did she file (that report) without (her) reading (that same report)?" Sure, some engines will hit by chance, but only because of similar structure; the engine is lucky, not actually parsing the "meaning."