
IBM Strives For 'Superhuman' Speech Tech 289

robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."
This discussion has been archived. No new comments can be posted.


  • Coherency? (Score:5, Insightful)

    by PrinceAshitaka ( 562972 ) * on Wednesday January 25, 2006 @05:38AM (#14555810) Homepage
    From the article: "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added."

    Still, even at 80 percent, how good is this translation? If that 20% contains the important parts of speech, you could still be left clueless. Even the best machine translations of text I have seen always leave the text a bit garbled and confusticated.
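    To see why 80% word-level accuracy can still leave a reader clueless, here is a rough back-of-the-envelope sketch (it simplistically assumes word errors are independent, which real systems don't satisfy, but the compounding effect is the point):

```python
# Rough sketch: if each word is transcribed correctly with probability p,
# and errors are (simplistically) assumed independent, the chance that an
# n-word sentence comes through with no errors at all is p**n.
def sentence_accuracy(word_accuracy: float, sentence_length: int) -> float:
    return word_accuracy ** sentence_length

for p in (0.6, 0.8):
    print(f"word accuracy {p:.0%}: a 10-word sentence is fully correct "
          f"{sentence_accuracy(p, 10):.1%} of the time")
```

    Even at 80% per word, only about one 10-word sentence in ten survives untouched; at 60%, virtually none do.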

    I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real-time translation, for the following reason: sentences in different languages have different structures. While in English the verb usually comes second, in other languages (German, for example) it often comes last. For the translator to produce the second word of the English sentence, it would have to wait until the end of what could be a long source sentence. This necessarily adds delay.
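    The word-order point can be illustrated with a toy streaming translator (a hypothetical sketch, not IBM's system; the lexicon and token names are made up, and German is verb-final only in subordinate clauses): the English verb slot cannot be filled until the source verb arrives, at the very end.

```python
# Toy illustration: translating a verb-final (SOV) clause into English
# (SVO) as tokens stream in. The English verb cannot be emitted until
# the final source token arrives, so output necessarily lags input.
def stream_translate_sov(tokens, lexicon):
    buffered = []
    for tok in tokens:
        buffered.append(lexicon[tok])
        # Nothing useful can be emitted yet: the verb is still pending.
    subject, *objects, verb = buffered
    return " ".join([subject, verb] + objects)

lexicon = {"ich": "I", "den Apfel": "the apple", "esse": "eat"}
print(stream_translate_sov(["ich", "den Apfel", "esse"], lexicon))
# -> "I eat the apple", but only after the whole clause was heard
```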
  • by pubjames ( 468013 ) on Wednesday January 25, 2006 @05:52AM (#14555857)
    I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.

    It's not until you learn another language that you realise how complex and subtle languages are. Learning another language can literally change the way you think about things.

    This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.

    I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.
  • Gee... (Score:4, Insightful)

    by Anonymous Coward on Wednesday January 25, 2006 @05:54AM (#14555864)
    Hmm, instantaneous translation from Arabic; I wonder who "cough cough Echelon cough!" they are marketing this to...?
  • by pubjames ( 468013 ) on Wednesday January 25, 2006 @05:55AM (#14555869)
    More opportunities for Arabic speaking people to misinterpret western media.

    I think you've got it the wrong way round, haven't you? Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media"?
  • by Viol8 ( 599362 ) on Wednesday January 25, 2006 @06:00AM (#14555890) Homepage
    "It's not until you learn another foreign language that you realise how complex languages are, and how subtle."

    And how weird sometimes. English, for example, loves to use the word "up" in all
    sorts of unsuitable places:

    give up
    shut up
    fed up
    wash up
    fuck up
    laid up
    muck up
    turn up
    free up
    look up
    make up
    put up
    screw up
    hang up
    wrap up
    hold up
    grow up

    Wtf?

    And how come we say "didn't he..." but in longhand it's "did he not..."? Shouldn't
    it be "did not he"? Why does the "not" shift to the other side of the pronoun?
    But then all languages have similarly weird, illogical syntax.
  • Awful default TTS (Score:4, Insightful)

    by Council ( 514577 ) <rmunroe@gmaPARISil.com minus city> on Wednesday January 25, 2006 @06:19AM (#14555957) Homepage
    Speech-to-text is cool, but for 30 years it's been predicted as the next big thing in interfaces, and it has remained a niche technology even as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.

    What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.

    This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.

    I'm glad people are doing good speech research, (I know I've seen a demo of good IBM TTS somewhere) but I hope it finds its way into Windows someday.
  • Re:Opensource? (Score:3, Insightful)

    by omeg ( 907329 ) on Wednesday January 25, 2006 @06:22AM (#14555968)
    Of course it won't be open source. They achieved what they dub a "breakthrough in speech recognition". They plan on making a lot of money with this.
  • by virtualsid ( 250885 ) on Wednesday January 25, 2006 @06:45AM (#14556032)
    I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.

    I'm not quite sure what you mean here. Not bother because of this technology?

    I can't see anyone not wanting to bother learning a language because of this technology. Not unless it was a babelfish/universal translator type technology - i.e. basically invisible. In which case, what's the issue? ;-)

    What are you going to do:
    a) Walk around with a little device that translates with 60-80% accuracy when you're in a country where people speak a language you don't understand, or
    b) Try to learn the language so you don't have to rely on a gadget?

    I think I know which one I'd choose - not that I can speak anything other than English, but I do try.

    Once devices get to 100% accuracy, my argument disappears. I'd love for that to happen too :-)

    Sid
  • by Ogemaniac ( 841129 ) on Wednesday January 25, 2006 @06:48AM (#14556044)
    ...and it is usually extremely difficult to translate jokes. Senses of humor differ quite a bit as well. I think this is part of the charm of anime, actually: we laugh at things the Japanese audience isn't necessarily meant to find funny, while missing half of the jokes that are supposed to be there.
  • by anum ( 799950 ) on Wednesday January 25, 2006 @06:59AM (#14556077)
    Learning a foreign language is a net good and the only way to really understand another culture is to experience it. That said, there are a large number of languages and an even larger number of cultures. Do you intend to learn/experience them all?

    Can you see no good in a rough translation for some purposes?

    Calculators have largely eliminated the need (and in some cases the ability) for people to do basic math. Therefore we should eliminate calculators before these people start believing that they completely understand cube roots when they just know how to push buttons.

    Oh yeah, that reminds me...Cartoons aren't real.

    Good luck IBM and I hope this stuff becomes viable soon.
  • Buyer beware (Score:5, Insightful)

    by 99luftballon ( 838486 ) on Wednesday January 25, 2006 @07:04AM (#14556085)
    Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?

    Speech recognition is riddled with problems. On the computing side it's enormously processor-intensive and memory-hungry. On the coding side it's very complex code, and the 'learning' process is fraught with problems: surnames, company names and locations are all very poorly recognised.

    So don't rush to buy. Let the labs check it out first.
  • by user9918277462 ( 834092 ) on Wednesday January 25, 2006 @07:13AM (#14556116) Journal
    There's a very good reason they're testing this tech primarily on Arabic speech. Although they won't say it, I'd be very surprised if the DOD isn't sponsoring this. The NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (i.e., phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.
  • Re:On-The-Fly (Score:3, Insightful)

    by Red Alastor ( 742410 ) on Wednesday January 25, 2006 @07:48AM (#14556211)
    However, add in "domain knowledge" and you're in some interesting territory. I think this is essentially what Google did: they fed in oodles of text in the various languages so that the system could statistically match phrases. At a simple level, you could have a lookup table of common colloquialisms (e.g. 'he's kicked the bucket' (English/UK) == 'he broke his pipe' (French/FR)).
    The problem is that while French/FR speakers will understand the expression, others, like French/CA speakers, won't. And even with special lookup tables, you'll still miss subtlety. For instance, if I wanted to use the expression you gave as an example as a warning to someone in French/CA, I could say "You'll break your neck," which would carry the same meaning. But if I say that someone broke his neck, it should be understood literally.
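    A locale-aware lookup table like the one being discussed might be sketched as follows (a hypothetical illustration; the table entries and function names are made up, and real statistical systems learn such mappings from parallel text rather than hard-coding them):

```python
# Hypothetical locale-aware idiom table: the same source idiom maps to
# different targets depending on the target locale. When the target
# locale has no equivalent idiom, a plain-wording fallback is stored.
IDIOMS = {
    ("he's kicked the bucket", "fr-FR"): "il a cassé sa pipe",
    ("he's kicked the bucket", "fr-CA"): "il est mort",  # no shared idiom
}

def translate_idiom(phrase: str, locale: str) -> str:
    # Fall back to the source phrase when no mapping is known.
    return IDIOMS.get((phrase.lower(), locale), phrase)

print(translate_idiom("He's kicked the bucket", "fr-FR"))  # il a cassé sa pipe
print(translate_idiom("He's kicked the bucket", "fr-CA"))  # il est mort
```

    The locale key is exactly what the comment is pointing at: a table keyed only on the source phrase would serve French/FR and French/CA the same string, and one of them would misread it.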
  • Re:Which ... (Score:3, Insightful)

    by mwood ( 25379 ) on Wednesday January 25, 2006 @09:59AM (#14556918)
    Just remember that *you* have a truly enormous and well-filled content-addressable memory, a huge and richly-connected semantic network, and untold numbers of self-adapting heuristics that have been trained all day every day for decades, with more coming into production constantly. It's hard for a machine to match that. Feeding 100,000 distinct pattern matchers in parallel is something most computers just aren't architected to do well. That a machine can do even a passable job of speaker-independent continuous speech recognition is an amazing achievement.

    BTW, what Teletext is like in the U.S. is that we don't have it. :-( We do have captioning on some shows, but comparing that to Teletext is like comparing a single couplet to the poetry section of a library.
  • by mwood ( 25379 ) on Wednesday January 25, 2006 @10:03AM (#14556955)
    Patriotic. What part of "*International* Business Machines" did you not understand? More likely it's to show that they really understand the problem and not just the English-only subset.
  • Re:Which ... (Score:2, Insightful)

    by kryonD ( 163018 ) on Wednesday January 25, 2006 @01:34PM (#14559468) Homepage Journal
    Don't hold your breath on that. After spending seven years studying Japanese just to speak it conversationally, I can tell you flat out that there will never be on-the-fly translation between Japanese and English. Why, you ask? Because the languages, and the cultures behind them, are so drastically different that you often have to listen to several sentences before you can work out the correct context for words in the other language. Not to mention occasionally having to add material to the translated output to explain why a certain sequence of words means something.

    For example, go watch Memoirs of a Geisha and note that Chiyo keeps calling Mameha "oneesan" (Oh-Nay-San), which literally and figuratively translates to "big sister." They are not related, and it is not an affectionate reference that someone might make in English to an older woman who provides protection and guidance. The term actually holds a special meaning in the Japanese world of hostessing (both geisha and less formal settings such as snack bars) that I would find difficult to even explain in English. Good luck, IBM.
  • Unlikely (Score:2, Insightful)

    by rcbarnes ( 875915 ) on Wednesday January 25, 2006 @02:31PM (#14560247) Homepage
    Transcription? Not too hard. Translation? I highly doubt it.

    Recent studies of the efficacy of machine translation found that modern engines have made only marginal progress over those of the *70s* (in fact, one of those 70s-era engines, Systran, is still the most-used translation engine online), and there was *no* discernible difference between engines of the eighties and current engines. I hope they're not trying to claim that they suddenly overcame the vast problems of translation wholly independently of the linguistic community. That's just ludicrous.

    I'd love to see this engine handle a parasitic-gap sentence like this between two largely different languages and catch the nuance in the parentheses: "Which report did she file (that report) without (her) reading (that same report)?" Sure, some engines will get it right by chance because of similar structure, but then the engine is lucky, not actually parsing the "meaning."
