
Speech Recognition in Silicon

Posted by CmdrTaco
from the spell-my-naughty-words dept.
Ben Sullivan writes "NSF-funded researchers are working to develop a silicon-based approach to speech recognition. 'The goal is to create a radically new and efficient silicon chip architecture that does only speech recognition, but does this 100 to 1,000 times more efficiently than a conventional computer.' Good use of $1 million?"
  • Funny... (Score:5, Interesting)

    by leonmergen (807379) * <lmergen@@@gmail...com> on Tuesday September 14, 2004 @10:56AM (#10245968) Homepage
    Funny, I work on a speech recognition research project, and, well, I have to say, think about all the possibilities... automated speech-to-text recording of meetings, on-the-fly subtitling of live TV shows. But it can get better: think about searching multimedia files in a Google kind of way based on audio, something that automatically directs you to the part of the file where you want to be...

    If what they're saying is really true, then knowing how much money is invested in speech recognition research on a yearly basis, yeah, I would definitely say that this is one million dollars of great investment...

    ... but then again, maybe they're just throwing numbers around to make sure they get their money. :)

    • Re:Funny... (Score:2, Funny)

      by strictfoo (805322)
      I work on product X, and think of all the possibilities (list of slightly feasible but most likely never-going-to-happen features).

      If what they're saying is really true, then people should put tons more money into product X!

      But then again maybe I'm just talking up product X to make sure I get my money :)
      • Re:Funny... (Score:5, Insightful)

        by Christopher Thomas (11717) on Tuesday September 14, 2004 @11:48AM (#10246610)
        I work on product X, and think of all the possibilities (list of slightly feasible but most likely never-going-to-happen features).

        If what they're saying is really true, then people should put tons more money into product X!


        Actually, use of speech recognition technology to index video clips for search engines _is_ both a very desirable technology and something that can be done fairly easily (most professionally produced video, at least, takes great pains to have one speaker at a time and to keep noise to a minimum). There's a fair bit of video content accessible via the web right now, and this will only increase (most new digital cameras can take video clips now - remember how quickly still pictures flooded the web when digicams first became available?).

        Speech recognition technology has trouble when it's trying to sort out a noisy environment or a degraded communications channel, and has trouble holding useful open-ended conversations (as opposed to task-driven ones), but it's very capable in most other contexts. After all, the field has been under study for decades.

        In summary, your mocking of the parent post is premature.
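        The indexing idea above can be sketched in a few lines. This is a sketch only: the clip names and word timestamps are invented, standing in for real recognizer output.

```python
# Tiny inverted index over (hypothetical) per-clip speech transcripts,
# with word-level timestamps so a search can jump to the right spot.
from collections import defaultdict

index = defaultdict(list)  # word -> [(clip, seconds)]

def add_transcript(clip, words_with_times):
    """Index each recognized word of a clip under its timestamp."""
    for word, t in words_with_times:
        index[word.lower()].append((clip, t))

add_transcript("lecture1.avi", [("speech", 12.0), ("recognition", 12.6)])
add_transcript("news.avi", [("speech", 80.4)])

print(index["speech"])  # every clip (and offset) where "speech" was said
```

        A search engine would do the same thing at scale: run recognition once per clip, then serve lookups from the index.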
    • Re:Funny... (Score:4, Insightful)

      by loginx (586174) <xavier@NoSpam.wuug.org> on Tuesday September 14, 2004 @11:12AM (#10246155) Homepage
      I want to sing the general tone of a song I heard on the radio into a microphone and have Google direct me to that album on Froogle.

      THAT would be awesome!
      • Re:Funny... (Score:3, Interesting)

        by richy freeway (623503)
        We have something like that in the UK called Shazam [shazam.com].

        Just dial a number on your mobile phone, hold it up to the speaker while the tune you want ID'd is playing, and it'll SMS you back shortly with the track name and artist. You can then log onto the Shazam website, enter your mobile number, and get a list of all the tracks you've searched for, along with links to an Amazon search so you can purchase the track.

        Pretty good for ID'ing tracks when you're in a club and can't get to the DJ to hassle him.
    • Re:Funny... (Score:2, Interesting)

      by tubbtubb (781286) *
      My understanding of speech recognition is minimal, but from what I understand, the meat of this chip would probably just be a floating-point SIMD engine to do FFTs, plus some comparison and control logic.

      I'm wondering if you could just do this with your average ATI or Nvidia 3D chip and an FPGA wrapper?
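      For what it's worth, the FFT-heavy front end the parent is guessing at can be sketched in a few lines of NumPy. This is a generic log-spectrogram, not the researchers' actual design, and the frame sizes are typical values rather than anything from the article.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160):
    """Frame a 1-D audio signal and take an FFT per frame -- the bulk
    DSP work a dedicated speech front-end chip would offload."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        frames.append(np.log(spectrum + 1e-10))
    return np.array(frames)

# One second of fake 16 kHz audio: a pure 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (frames, frequency bins)
```

      The per-frame windowed FFTs are exactly the kind of regular, data-parallel work a SIMD engine (or, as the parent wonders, a GPU) is good at; the comparison/control logic would sit downstream, matching these spectra against acoustic models.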
    • Re:Funny... (Score:4, Interesting)

      by syukton (256348) on Tuesday September 14, 2004 @11:20AM (#10246220)
      From what you describe, it isn't so much a speech recognition thing as it is a sound recognition thing; essentially, a way for a computer to logically distinguish between many millions of different sounds.

      How far away are we from having a machine that could identify all of the instruments in a piece of music by "listening" to the music? I say "listening" because there needn't physically be a playback-and-listen step; the playback could be modeled mathematically by the computer.
  • by AKAImBatman (238306) * <akaimbatmanNO@SPAMgmail.com> on Tuesday September 14, 2004 @10:56AM (#10245971) Homepage Journal
    Good use of $1 million?

    Let me think for a moment... Hell yeah! If we had low power speech processors, the possibilities would be endless. For one, we'd finally have a Star Trek(TM) interface for our homes!

    "Computer, lights!"
    "Computer, make coffee!"
    "Computer, Earl Grey, hot!"

    As silly as it may sound, such an interface would be far more efficient than mashing buttons.

    In addition, blind people could be significantly helped by this. Many of them already use speech recognition and synthesis to assist in computer usage. Imagine if their computers could suddenly understand them a thousand times better. They could talk to their computers a bit more naturally, saving their vocal cords from undue stress.

    Other applications (off the top of my head) are:

    - Voice notes on embedded devices (store only text!)
    - Helpful Kiosks that can give you directions
    - A new use for natural language database queries (e.g., ask the computer what last quarter's net sales were)
    - Voice controlled robots ("You missed a corner, vacuum cleaner")
    - Data search by voice ("Find me a channel that plays Star Trek")

    Any other cool ideas out there?
    • Any other cool ideas out there?

      Yes.

      Peter Gibbons : What would you do if you had a million dollars?
      Lawrence : I'll tell you what I'd do, man, two chicks at the same time, man.
      Peter Gibbons : That's it? If you had a million dollars, you'd do two chicks at the same time?
      Lawrence : Damn straight. I always wanted to do that, man. And I think if I had a million dollars I could hook that up, cause chicks dig a dude with money.
      Peter Gibbons : Well, not all chicks.
      Lawrence : Well the kind of chicks t
    • by theparanoidcynic (705438) on Tuesday September 14, 2004 @11:05AM (#10246089)
      Any other cool ideas out there?

      Universal language translators. Imagine headphones that let you understand any known language.
      • That is good thought! The thing software which is the simple problem where existing translation that it is developed applies algorithm to speech of real time very is healthy! Gorgeousness!

        P.S. I used Babelfish for translating this post.
    • by randombit (87792) on Tuesday September 14, 2004 @11:12AM (#10246151) Homepage

      - Voice controlled robots ("You missed a corner, vacuum cleaner")
      - Data search by voice ("Find me a channel that plays Star Trek")


      Kinda jumping ahead of yourself, aren't you? There are two steps to an operation like this: speech-to-text, and understanding the text you get out. Speech recognition gives you the first part, but you still have to be able to pull apart the sentence and figure out what it means.

      Also, the article didn't say more accurate than software; it said more efficient. You know, uses less power and stuff like that. If the applications you mention (like search via voice) were possible/usable, you could run them today on an upper-end PC, no problem.
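      To make the two-step point concrete, here is a toy second stage on its own. The transcript is assumed to come from a recognizer; the command patterns and action names are invented for illustration.

```python
# Toy "understanding" stage only: a recognizer is assumed to have
# already produced the text. Patterns and action names are made up.
import re

COMMANDS = [
    (re.compile(r"find .*channel.*plays (?P<show>.+)", re.I), "search_tv"),
    (re.compile(r"you missed a corner", re.I),                "rebuke_vacuum"),
]

def understand(transcript):
    """Map a recognized sentence to an action -- the part that speech
    recognition alone does not give you."""
    for pattern, action in COMMANDS:
        match = pattern.search(transcript)
        if match:
            return action, match.groupdict()
    return "unknown", {}

print(understand("Find me a channel that plays Star Trek"))
```

      Even this crude pattern table is a separate piece of work from transcription, which is the parent's point: a perfect speech-to-text chip still hands you a string, not an intent.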
      • by frank_adrian314159 (469671) on Tuesday September 14, 2004 @11:28AM (#10246316) Homepage
        There are two steps to an operation like these, speech to text, and understanding the text you get out. Speech recognition gives you the first part, but you still have to be able to pull apart the sentence and figure out what it means.

        In fact, converting the speech to text and then trying to analyze the text without sound-level annotations might give bad results, as tonal or emotional content would be lost. You need both simultaneously to really understand what's being said.
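        A crude sketch of what sound-level annotations riding along with the text might look like; all the field names here are invented.

```python
# Keep recognizer metadata alongside each word instead of emitting bare
# text, so the understanding stage can use it. Fields are hypothetical.
from dataclasses import dataclass

@dataclass
class AnnotatedWord:
    text: str
    confidence: float   # recognizer's score for this word
    pitch_rise: bool    # crude prosody flag: rising intonation?

def is_question(words):
    """Bare text can't tell "you're done" from "you're done?";
    a final pitch rise can."""
    return bool(words) and words[-1].pitch_rise

utterance = [AnnotatedWord("you're", 0.93, False),
             AnnotatedWord("done", 0.88, True)]
print(is_question(utterance))  # True
```

        Discarding the confidence and prosody fields before the analysis step is exactly the information loss the parent is warning about.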

      Actually, they are one and the same; it is possible to determine what something means using today's voice recog. (I've got a setup that controls my entertainment center and the lights in my apartment through voice recog.) However, it is wildly inefficient and difficult to set up. The reason is that the English language is just about the most illogical system on the planet, and computers only understand logic. Due to the limited scope of my setup, I only had to record about 20-40 words/phrases and reference them diffe
    • - Helpful Kiosks that can give you directions

      - A new use for natural language database queries (i.e. Ask the computer what last quarter's net sales were.)
      - Voice controlled robots ("You missed a corner, vacuum cleaner")
      - Data search by voice ("Find me a channel that plays Star Trek")

      While I agree that this is a great investment, voice recognition does not equal artificial intelligence. Even if the computer is able to tell that you spoke the words what+were+last+quarter's+net+sales, it would not know w

      • I don't think you understand. Natural Language Interfaces [schemamania.org] already exist for SQL databases. Their biggest limitation is that they need quite a bit of metadata about your data structure in order to properly parse the queries. But once the metadata has been added, the computer should be capable of answering most questions about your data.

        It's not really useful for development work, but it can come in handy for allowing data requests from executives.
    • The article mentions speech recognition, but not comprehension. You cannot take pure recognition and immediately make a superhelpful information kiosk or natural language query system out of it.

      Such an informational kiosk could be made just as easily with current speech recognition technology considering how limited the interface would have to be. (A handful of phrases, such as "I'm lost", then replying to a voice prompt with the location you're looking for, at which point the computer can do a quick looku
    • I think you hit it on the head.

      Any other cool ideas out there?

      Some specific ideas off the top of my head:
      - Navigation systems in cars
      - Decent automated phone system
      - Microwave ovens (tell it to cook two baked potatoes)
      - PDA calendar entries.
    • It could relieve me from having to type my password ... oh, wait ...
    • Actually, voice is terrible for controlling anything that doesn't talk back, and pretty bad for anything without a large amount of common sense (i.e., unsolved AI problem). There just isn't enough information in speech to react at all appropriately to it without a very good understanding of context, and you generally can't express unscripted ideas without dialogue.

      On the other hand, there's a lot of information currently available as speech which could be managed more usefully if transcribed automatically.
    • Any other cool ideas out there?

      Walk into someone's office: "Computer! Format C:"

    • For one, we'd finally have a Star Trek(TM) interface for our homes!


      What you are looking for is called MisterHouse [misterhouse.net].

      It interfaces with the IBM ViaVoice apps, as well as other items, to give you what you want.

      It's not perfect, as voice recognition is only slightly better than it was in the late '80s, but it's what you are asking for.

      And AMX/Panja has a turn-key system that you can buy and have installed for around $50,000.00 that can also do what you want; I saw a demo of that system last weekend in a home
    • Ah yes this is Professor Frink, floygan.

      As any self-respecting Star Trek nerd will know, the Captain orders his tea like this: "Tea, Earl Grey, hot!". Alas, you have forgotten the initial "Tea!", floygan smoygan.
  • Text of article (Score:4, Informative)

    by Anonymous Coward on Tuesday September 14, 2004 @10:57AM (#10245983)
    From Carnegie Mellon University:

    Carnegie Mellon engineering researchers to create speech recognition in silicon

    Team to develop new silicon chip

    Carnegie Mellon University's Rob A. Rutenbar is leading a national research team to develop a new, efficient silicon chip that may revolutionize the way humans communicate and have a significant impact on America's homeland security.

    Rutenbar, a professor of electrical and computer engineering at Carnegie Mellon, working jointly with researchers at the University of California at Berkeley received a $1 million grant from the National Science Foundation to move automatic speech recognition from software into hardware.

    ''I can ask my cell phone to 'Call Mom,''' says Rutenbar, ''but I can't dictate a detailed email complaint to my travel agent or navigate a complicated Internet database by voice alone.''

    The problem is power--or rather, the lack of it. It takes a very powerful desktop computer to recognize arbitrary speech. ''But we can't put a Pentium(TM) in my cell phone, or in a soldier's helmet, or under a rock in a desert,'' explains Rutenbar. ''The batteries wouldn't last 10 minutes.''

    Thus, the goal is to create a radically new and efficient silicon chip architecture that does only speech recognition, but does this 100 to 1,000 times more efficiently than a conventional computer.

    The research team is uniquely poised to deliver on this ambitious project. Carnegie Mellon researchers pioneered much of today's successful speech recognition technology. This includes the influential 'Sphinx' project, the basis for many of today's commercial speech recognizers.

    ''We're still not even close to having a voice interface that will let you throw away your keyboard and mouse, but this current research could help us see speech as the primary modality on cell phones and PDAs,'' said Richard Stern, a professor in electrical and computer engineering and the team's senior speech recognition expert. ''To really throw away the keyboard, we have to go to silicon.'' But enhanced conversations between people and consumer products are not the main goal. ''Homeland security applications are the big reason we were chosen for this award,'' says Rutenbar. ''Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless.''

    Researchers plan to unveil speech-recognition chip architecture in two to three years.
  • First Post (Score:5, Funny)

    by JohnHegarty (453016) on Tuesday September 14, 2004 @10:57AM (#10245989) Homepage
    I can just see the anonymous cowards shouting "first post" at their PCs now.
  • by CrazyJim1 (809850) on Tuesday September 14, 2004 @10:58AM (#10246002) Journal
    My friend and I were talking about this. In countries that are more totalitarian, it could be used to root out "dangerous people". www.geocities.com/James_Sager_PA
  • accuracy (Score:5, Insightful)

    by tubbtubb (781286) * on Tuesday September 14, 2004 @10:59AM (#10246009)

    100 to 1000 times more efficient worth $1M? meh. maybe.
    100 to 1000 times more accurate worth $1M? definitely.
    • Re:accuracy (Score:3, Insightful)

      > 100 to 1000 times more efficient worth $1M? meh. maybe.
      > 100 to 1000 times more accurate worth $1M? definitely.

      Accuracy does not have to be a problem with modern speech-to-text systems, but the need to 'train' them to get that accuracy, and the need to talk to them in a somewhat distinctive way, make them far less efficient.

      I'd rather say that the time it takes to get used to a speech recognition system (and to get it used to you, where applicable), together with the somewhat heavy CPU requirements, a
  • by Anonymous Coward on Tuesday September 14, 2004 @10:59AM (#10246010)
    Damned straight it is! In government terms, that's a pittance. In government-funded science terms, it's downright INFINITESIMAL. It isn't even couch change, it's more like the stale pretzel under the couch cushion.

    But, of course, cue the armchair blogging fanatics without a formal science education, waxing poetic about the infinite power and glory of x86 hardware running clever open source software. Maybe we could do it in perl!
  • Sarcasm? (Score:2, Insightful)

    by Anonymous Coward
    Good use of $1 million?
    For something that would be worth hundreds of times that in the form of a finished product, I would hope so. The only dispute might be that the researchers' efforts would be better spent on other things.
  • On the one hand, it is obvious how much more efficient this would make our day-to-day tasks. Being able to "jot" notes with speech instead of writing, schedule tasks in seconds, the list goes on and on...

    This is certainly beneficial... but think about the impact on the economy! Imagine all the "Administrative Professionals" who could, almost instantly, be out of work. I for one would rather pay even $5,000 for a good piece of software to take all my notes than pay a secretary $28,000/year or so.

    Then

      Agreed. Secretaries are needed for paper handling, taking calls, and filing too. A business that prides itself on professionalism and service would, IMO, not rely on shortcuts like the voice mail maze. So they aren't just a personal refreshment gopher. Any business should still need that sort of thing.

      So what if dictation is taken away from secretaries; they still need to check the grammar and arrangement, as dictation is almost always free-form, without the same structure as a good written letter.
    • You're not looking at the big picture.

      This will be the next thing to cause a wave of "they're stealing my Intellectual Property!" screaming.

      Imagine this enabling technology letting a student sit in a class and get a perfect transcript of the lecture; then he posts it to the P2P app of that day, and that professor has to start eating ramen in his cardboard box because he is poor now.

      OK, sarcasm aside, you will hear of people screaming about this; we will have a new type of blog filled with nothing but te
    • Actually I think you're sailing close to the Broken Window fallacy.

      I'd rather look at it this way: all the people you identify - and more - will have a lot of the drudge taken out of their work and more time to do the things they are indispensable for: editing, collating material/resources/presentations, and so on - the 'added-value' bit. Oh, and even if the voice recognition works very well, there will always be a role for touching up grammar and structure, just as one has to with OCR.

      OTOH, my secret f

  • by tcopeland (32225) * <tom&thomasleecopeland,com> on Tuesday September 14, 2004 @11:00AM (#10246031) Homepage
    ...and view the printable version [scienceblog.com].
  • by MankyD (567984) on Tuesday September 14, 2004 @11:00AM (#10246032) Homepage
    I'm curious to see if their research will improve Natural Language Queries, as opposed to just improving speech recognition. There is an important difference between having to say: SELECT name FROM users WHERE id=12345 and saying: Pull up the name of employee number 12345.
    • by Masker (25119) on Tuesday September 14, 2004 @12:23PM (#10247043)
      Natural language processing and speech recognition are two entirely separate problem spaces.

      Natural language processing tasks involve parsing strings of tokens and mapping them to commands to be executed. So, from your example, "Pull up the name of employee number 12345", the natural language system must map "Pull up" to "SELECT", "the name" to "name", and "of employee number 12345" to "FROM users WHERE id = 12345". Really, it's largely a problem of context, and your example shows an excellent problem: the "of employee number 12345" to "FROM..." map requires the contextual information of where to pull this information from. Surely multiple tables of a database could have an "employee number" field in them. Do you want all of the tuples that match, or just those from a certain table? Now, in the context of looking up a bunch of other employees, maybe I know what table you've been hitting a lot, and can determine what you're asking, but without that context, I have no idea.

      In fact, everyday speech has a lot more ambiguity in it than could be handled without keeping large amounts of state, be it contextual or experiential/situational. For example, if I overhear two people in a conversation, and the first thing I hear is: "Yeah, but he's been lying throughout his campaign, and I for one don't support him," I have no idea which political candidate they might be speaking of. However, if I saw that person wearing a shirt for a political campaign last week, then I have enough context to make a reasonable guess that he's talking about that campaign's opponent.

      Speech recognition is a "lower level" than that: it's about matching acoustic information into speech sounds and then using the speech sounds to determine the word that was said. This is a hugely complex task that has a number of unsolved problems (of which these are the 3 that I can think of off the top of my head):

      1) "speech sounds" are fuzzy categories, and are not canonical targets.
      2) the salient "features" of phonemes are disputed and contradictory, and large amounts of redundant/conflicting info are built into the speech signal
      3) idiosyncratic speaker-to-speaker differences make the phoneme categories even fuzzier and can complicate the task even for the one speech recognition system that we know works: the human brain.

      At any rate, the problems that need to be solved for speech recognition are not the same problems as in natural language processing. While there may be some crossover in pattern-matching, the specifics of the problem spaces make it unlikely that you will get much benefit for NLS (natural language systems) from just making the algorithms faster.

      Which, in fact, is my main criticism of this article: the algorithms that we have now are piss-poor, and making them faster doesn't intrinsically make them better. Unless there's been some huge advance in the field that I'm unaware of, you'd still have to train a SRS (speech recognition system) on your idiolect, by reading some pre-selected passages to it. This model has lots of problems, most specially that it's tailored to an individual. Imagine if you had to have each person that you spoke with read some canned paragraphs to you the first time you met so that you could interact....

      [sorry I don't have sources for all of this; I'm AFB, and I don't have time to dredge up info right now. But, apparently, I have time to write one long-ass entry...]
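      A toy sketch of the kind of metadata an NLS needs to map the "employee number" question upthread to SQL. The table and column names here are invented; the point is that the mapping cannot work without exactly this sort of hand-supplied context.

```python
# Toy natural-language-to-SQL mapper. The metadata table ("users",
# "name", "id") is invented; a real NLS needs this kind of schema
# information to resolve "employee number" to a concrete column.
METADATA = {
    "employee": {"table": "users", "key": "id", "fields": {"name": "name"}},
}

def nl_to_sql(question):
    """Resolve a canned question shape against the schema metadata."""
    words = question.lower().replace("?", "").split()
    if "employee" in words and "name" in words and "number" in words:
        number = next(w for w in words if w.isdigit())
        meta = METADATA["employee"]
        return "SELECT {} FROM {} WHERE {} = {}".format(
            meta["fields"]["name"], meta["table"], meta["key"], number)
    raise ValueError("no metadata matches this question")

print(nl_to_sql("Pull up the name of employee number 12345"))
# SELECT name FROM users WHERE id = 12345
```

      Note that nothing here touches audio at all, which is the grandparent's point: making the acoustic matching faster in silicon leaves this whole layer untouched.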
    • There is an important difference between having to say: SELECT name FROM users WHERE id=12345 and saying: Pull up the name of employee number 12345.

      Yeah, there is a difference, I find the first query much more natural. I think I need to get out more.

  • Speech recognition on a chip, yes.

    But only "silicon" in the sense that every other silicon chip is silicon.

    No magical "silicon" breakthroughs to see here, keep moving.

    -kgj
    • Lighten up, dude. It doesn't matter that "silicon" is a buzzword. The people putting up the money need these annoying buzzwords to understand what they are financing. Considering how much voice dictation sucks (I use it for 99% of my input), it's in dire need of improvement and any buzzword that leads to some scientist getting the money he/she needs to improve it is ok with me.
  • Computer. Computer? Hello, Computer. Just use the keyboard. Keyboard. How quaint.
  • Only 1million? (Score:3, Insightful)

    by Gyorg_Lavode (520114) on Tuesday September 14, 2004 @11:03AM (#10246052)
    That's impressive for just 1 million; working in defense and knowing our contractors, 1 million dollars is barely enough to get them to tell you how much it would cost for them to do the initial research to tell you whether they can actually build what you want.

    (I did not read the article, as it is slashdotted, so I am relying on the summary's statement of 1 million dollars.)

  • ...is always underestimate your costs and run over budget later. That $1 million will turn into $1 billion before anything comes of this. Hell, it'll take over a million to get the development organization up and running.

  • by Aggrazel (13616) <aggrazel@gmail.com> on Tuesday September 14, 2004 @11:03AM (#10246063) Journal
    Imagine how much money could be saved if you could *perfect* speech recognition.

    Heck, the hospital I used to work at by itself spent over a million dollars a year on medical transcriptionists ...
  • It is an interesting concept, but do we really need this?

    We already have voice recognition, this tech will just bring it to everything. You can talk to your keys, your toaster, your watch. But will they have anything interesting to say back?

    What would you do if you had 1 million dollars?

    You mean besides 2 chicks at the same time...

  • by L0neW0lf (594121) on Tuesday September 14, 2004 @11:04AM (#10246072)
    I once did a lot of work with speech recognition software, having a former significant other who was disabled. I tested a number of programs, and found the biggest problem to be the wide variance in users' dialects. The programs all have to be trained initially to recognize a single user's voice. This means that a program trained for a Bostonian may not work for someone from Arkansas, Texas, or Louisiana. Also, the programs' effectiveness decreased over time if you did not use them regularly.

    I don't know how possible it will be to make a program that can recognize all English users. Will someone who speaks Oxford English be recognized as well as a surfer from California? I doubt it.
      I dunno, bra, the California surfer accent can get kind of gnarly; there's, like, a lot of drawn-out sounds and unnecessary pronunciation in there.

      I say this not as a surfer, but as someone born and raised in Santa Cruz.

      Now, it is true that Californians are not known for having accents (surfers are definitely known for having an accent - seen Fast Times? Spicoli's a pretty accurate representation thereof, actually), and the Californians who don't have a particularly strong accent are the people in the USA whose

  • by GMail Troll (811342) on Tuesday September 14, 2004 @11:04AM (#10246077)
    "People who are serious about software should make their own hardware" - Alan Kay

    This seems like a situation where a hardware-accelerated approach is pretty sensible. I'm guessing there is a large amount of signal processing involved in speech recognition. With a custom chip like this, it probably helps greatly to offload some of that onto dedicated silicon, in the same way GPUs are used on graphics cards. The only problem I can see is that there might not be much market for it. GPUs have an obvious market (games), but there is less demand for speech processing. Star Trek-style interfaces are nice to dream of, but for most common tasks a keyboard and mouse will probably give you a faster and more accurate interface.


    I see some results. So far there have been quite a few attempts at speech recognition. Generally they all fall short: they don't like accents, and they often misinterpret. I know because a while back we looked at something for my grandfather; he can't keep his hand steady enough to write anymore... *shrug*
  • by Threni (635302) on Tuesday September 14, 2004 @11:07AM (#10246106)
    Depends. It's not as good as using it to prevent the deaths of thousands - possibly tens of thousands - of people by ensuring they have clean drinking water and shelter from the elements. But hey - you can't put a price on being able to speak to a computer rather than type when you're ordering a pizza.

    • Think of it like this:

      Researchers put the entire speech recognition process in hardware, on a chip. Once you've got a process on a chip, you can refine it, make it cheaper to produce, less power-consumptive, and smaller.

      Eventually, you can have a speech recognition chip that fits in a solar-powered credit-card-sized form factor, like all those free calculators. If you can re-target it for different languages (different chip-sets, maybe), and design it so the LCD shows whatever was just said...

      Sounds to
    • It would be a good thing for this particular million dollars to be redirected toward humanitarian aid. The economic reality is that people don't all share the same priorities, and different people have their hands on different millions. You can't stop people doing stuff you disagree with, but you can make it easier or more enticing for them to do the things you want.

      The post I found really interesting in this thread was the one saying, go to the Oxfam site [oxfam.org] and see what ten bucks can do. I scoped around the

    • by Armchair Dissident (557503) * on Tuesday September 14, 2004 @12:04PM (#10246806) Homepage
      Every time a dollar value is placed on a piece of research, some idiot comes along and says "Hey! This could be spent providing clean drinking water, and food and shelter," as if only research that directly provides clean drinking water or food or shelter is worth funding. Quite frequently the idiot making this statement is in a perfect position to provide money to ensure that more people have access to these facilities, and just as frequently that idiot isn't doing so.

      I'm sure that when America and Russia were engaged in the space race, there were people saying "Hey! This money could be better spent on disaster relief!" And where are we now? Only a few short decades later, we have satellites that tell us where hurricanes are going, so that we can evacuate areas and people who would otherwise die survive. We have reliable global telecommunications satellites, so that disaster relief agencies in third-world countries can inform people of what supplies are required, and people who would otherwise die survive.

      Without the massive investment in jet airline technology that could otherwise have been spent "saving the starving", we would not be able to travel to disaster areas within hours of an incident. And so the list goes on.

      If you personally want to see more money invested in agencies that provide disaster relief, or reliable shelter or clean water then you only have to donate to the right charities, and encourage others to do the same. It doesn't take many people to donate out of their pockets to provide $1 million. You can start here [savethechildren.org].
  • History.. (Score:5, Interesting)

    by SillyNickName4me (760022) <dotslash@bartsplace.net> on Tuesday September 14, 2004 @11:07AM (#10246109) Homepage
    From 1994 to 1998 I did marketing and technical support for IBM's VoiceType Dictation products.

    Initially, doing anything beyond understanding a few words took special hardware, but after a bit of 'training', highly accurate and fast speech-to-text was quite possible with a specially developed DSP.

    Then, the pentium class cpus came about, and a p90 could just do the whole thing without the dsp.

    So, now someone is developing a new dedicated piece of silicon for this.. let's see how long it takes for general-purpose computers to catch up.

    The issue is not that this isn't useful, but that it either has to keep developing, or offer a longer-lasting price/performance ratio or much better features, for a long time to come.
    • Re:History.. (Score:2, Insightful)

      by giblfiz (125533)
      An excellent point. However, if one were to make something along the lines of a PDA or phone with voice recognition, the dedicated hardware would stay useful for much longer, because you not only need to wait for the CPUs to catch up, but they need to pull so far ahead that they can compete in power consumption as well. (Which may be entirely impossible.)

      Task-specific silicon becomes very useful when you don't have as much space/power/heat-dissipation as you want.
    • Re:History.. (Score:3, Insightful)

      by Jeff DeMaagd (2015)
      These chips wouldn't go into a computer, there are numerous non-computer devices that could use good, low power speech recognition.

      Will a general purpose CPU fit or operate in a phone that can be on for a week? I almost never shut off the phone and it still lasts a week, and I don't want to sacrifice that run time for speech recognition.

      Granted, ARM chips are getting more powerful but the power consumption is still a limiting factor for their designs.
  • Better approach (Score:3, Interesting)

    by Lord Kano (13027) on Tuesday September 14, 2004 @11:08AM (#10246117) Homepage Journal
    Using specialized DSPs makes more sense to me than burning up generic CPU cycles. There have been many examples over the years of how a specialized DSP is more efficient and effective for a narrow task than a regular CPU. Look at portable MP3 players. They use tiny specialized DSPs to decode the files in a manner that is much more efficient than using a regular CPU.

    We'll still need to do traditional development to interpret the data from the DSPs. We'll need to parse the output so that we can use natural commands to control devices.

    "Coffee maker, brew 10 cups, strong."
    "Bathroom lights, on."

    Without some manner of AI to interpret them, these phrases will be useless.

    LK
    • You don't even need an AI, just a dialogue tree, a "this word leads to these words" kind of thing, and at the end a command is issued that does something. This is well-suited to your coffeepot example.

      Another approach, perhaps complementary, would be to accept a list of words and do something when you have enough to match one of the stored patterns. Light control is a good example. The microphone your voice is picked up on would provide one keyword, the location. You could override it by speaking the name
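      A minimal sketch of the dialogue-tree idea in Python (the words, tree layout, and command codes here are all made up for illustration; a real device would hook the result up to actual hardware):

```python
# Each recognized word narrows the set of valid next words until a
# complete command is reached. Unknown or incomplete utterances fail.
TREE = {
    "coffee": {"brew": {"strong": "BREW_STRONG", "weak": "BREW_WEAK"}},
    "lights": {"on": "LIGHTS_ON", "off": "LIGHTS_OFF"},
}

def parse(words):
    """Walk the tree word by word; return a command code or None."""
    node = TREE
    for w in words:
        if not isinstance(node, dict) or w not in node:
            return None  # utterance doesn't match any known command
        node = node[w]
    # only a leaf (a string) is a complete command
    return node if isinstance(node, str) else None

print(parse(["lights", "on"]))              # LIGHTS_ON
print(parse(["coffee", "brew", "strong"]))  # BREW_STRONG
print(parse(["coffee", "brew"]))            # None (incomplete)
```

      No AI required: the recognizer only ever has to pick one word out of the handful that are valid at the current node, which is also a much easier recognition problem.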

  • by MooseByte (751829) on Tuesday September 14, 2004 @11:08AM (#10246122)

    From the blog: ''Homeland security applications are the big reason we were chosen for this award,'' says Rutenbar. ''Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless.''

    Like some slight tweaking in order to deploy massive voiceprint-recognition silicon arrays for amazingly efficient automatic realtime conversation transcription and identity determination, attached to Echelon [agitprop.org.au].

    So cool... so potentially evil... head begins to hurt... tinfoil hat burning....

  • by Anonymous Coward on Tuesday September 14, 2004 @11:12AM (#10246153)
    Although $1 million can significantly speed things up, this is a pretty ambitious undertaking.

    My Master's research was on implementing machine learning in hardware, specifically support vector machines.

    Now, they have much more money than I did, and this will probably be a collaboration involving many graduate students, but converting complex algorithms from software to hardware is no easy task.

    It is just easier to do things in software, that's why it has evolved. The modular layers of abstraction allow a Computer Scientist working in machine learning or speech recognition to not have to worry about how the underlying hardware works.

    Working in hardware, a lot of these issues come face to face. Particularly since you want an architecture on a chip, whereas a conventional desktop/server system has resources such as lots of RAM, hard drive space, etc. available, and their interconnections have been built and refined over decades.

    Throw in concerns about small form factor and low power consumption, and quite fast a lot of unexpected hurdles pop up.

    My master's research goal was to produce a data mining/machine learning machine, or at the very least a data mining/machine learning co-processor. In retrospect, that was a very ambitious goal that would require many years of work, probably in collaboration with other graduate students.

    What I ended up doing was just Support Vector Machines in digital hardware. Now granted, there is another aspect to my research that I'm not mentioning here, namely that I didn't use normal floating-point mathematical architectures, but a different, innovative logarithm-based mathematical architecture. That in itself was a significant undertaking.
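    For the curious, the core trick of a logarithmic number system can be sketched in a few lines of Python (an illustration only, not how my hardware worked; real silicon would use fixed-point logs and a small lookup table for the addition correction):

```python
import math

# Store each value as its base-2 log, so a hardware multiplier
# becomes a simple adder. Addition needs a correction term.
def to_log(x):
    return math.log2(x)

def log_mul(la, lb):
    # multiplication is just addition of exponents
    return la + lb

def log_add(la, lb):
    # log2(2^la + 2^lb) = hi + log2(1 + 2^(lo - hi))
    hi, lo = max(la, lb), min(la, lb)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

la, lb = to_log(6.0), to_log(7.0)
print(2 ** log_mul(la, lb))  # ~42.0
print(2 ** log_add(la, lb))  # ~13.0
```

    The win is that multiplies (which dominate SVM kernel evaluation) get cheap, at the price of a more awkward adder.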

    In any case, this sounds like a great project, I just wonder how much they can do in their (in an academic sense) very small time frame of 2-3 years. Even though a lot of preliminary work has probably already been done just to apply for the grant.

    In any case, it is great to see something like this, something to keep in mind in case I ever go back for a Ph.D.
  • There isn't much overlap, but there is some: signal processing, the breaking down of the nuances of speech.

    I figure a hardware speech processor and hardware speech synthesis (very, very accurate and believable) would be of great use for mankind.

    Imagine how much cheaper sex chat lines would be, for instance!

    They would only need a limited vocabulary, so perhaps the OS IBM stuff would work for now?

    Of course, I bet a patent will come out of this... voice technology that is very reliable and very easy will re
    • Imagine how much cheaper sex chat lines owuld be for instance!

      I think for that, speech recognition/generation per se would not be enough. The speech must also come with the right tone. I don't think a sex chat line with a monotonous computer voice would be very successful. You'd at least need some simulation of an emotional state in the voice.
      Ah, and don't forget the non-verbal noises ...
  • by Tairnyn (740378) on Tuesday September 14, 2004 @11:19AM (#10246215)
    Once this technology has matured and some more headway can be made in Natural Language Processing (uncertainty for the win), we'll be on the cusp of some really excellent improvements in human-computer interfaces. It's becoming more common to see 'intelligent' systems being built to mirror the architecture of the human nervous system. This will be a necessary step to forming a generally proficient AI system. The day a computer can readily recognize you're being sarcastic, it's time to be paranoid.
  • by deathcloset (626704) on Tuesday September 14, 2004 @11:21AM (#10246232) Journal
    This sounds like a great idea. Sometimes a hammer works better than a screwdriver at a certain task. Not all jobs can be performed as well by a single tool or method.

    After all, the human brain has different areas for processing different types of stimuli.

    In fact, some parts of our brain are so radically different they are almost considered brains of their own.

    Like the cerebellum; it's often referred to as "the small brain". It controls motor coordination, and in humans allows us to do amazing things like flips, kung-fu, and cup-stacking.

    And forgive me for forgetting the exact names, but the brain has layers as well, the outermost layer being the cortex (where most of the higher-level mammalian processing takes place; correct me if I'm wrong, but the frontal lobe is pretty much purely cortical tissue). As you delve deeper you get into the hippocampus and the medulla whatever (sorry, IANAN: I Am Not a Neurologist), which is where emotion rules, and which, if I remember correctly, is sometimes referred to as the "reptilian" brain.

    Even the eyes themselves can almost be considered little 'brains' of their own, considering the amount of pre-processing they do (maybe 'co-processors' would be more accurate).

    make
  • pr0n? We all know that if there's a pr0n application, then the technology will be developed & shipped 100-1000x faster. Speech recognition + pr0n...
    of course, the obvious control of the system by speech (first steps towards a holodeck), but also you could identify who's in that video by their ... voice!
  • With the advent of hardware speech recognition, hardware speech translation is just the next evolution. Imagine being able to go to any country in the world with just an iPod-sized device and a Bluetooth hearing aid as a translator.
  • you are working a job in customer support. I suspect that this will be used to help replace customer support, or possibly to change somebody's accent so that they appear to be from Boston rather than from India

  • Layering a speech-to-commands layer over the current systems is very problematic.
    The Star Trek nonsense of 'computer! get me all the data on ship X'
    [and why does Data talk to the computer, surely he's Wi-Fi enabled ? ] is plainly wrong.

    I found using ViaVoice and friends physically tiring; talking all day instead of typing is quite draining.

    Now sit yourself in an office with 20 or so colleagues all trying to work - talking out loud all day.

    It's pretty much like touch screens - they sound great until you
  • by Rufus88 (748752) on Tuesday September 14, 2004 @11:41AM (#10246531)
    Now, disgruntled ex-employees won't return to the office to "go postal", so to speak. They'll just run up and down the hallway yelling "File! Exit! No!".
  • ... maybe it'll do the same thing for speech recognition as separate processors have done for graphics, notably 3D graphics. When I was mucking around with computers as a youngster, I could only dream of the likes of Quake 3 and Doom 3. Most computers had a crappy CGA or _whew_ maybe even an EGA adapter on board. GPUs have made things not so much possible as feasible that weren't so before ... maybe a separate chip for speech processing will have the same effect. I mean, we've been talking about speech reco
  • Carver Mead (at Caltech, last I heard) was pioneering work to take neural processes such as vision and hearing, and model them in silicon via custom-fab VLSI circuits. This is a MUCH better approach to modelling these processes, since your neurons process the information in massively parallel, simple-circuit networks.
    The traditional approach was to take a (completely) serial CPU and have it iterate over sampled data using a complex model of the naturally-occurring network.
    It seems like a no-brainer to me,
      So far, analog neuromorphic VLSI has hit a dead end in terms of real applications. Also, digital signal processing has been speeding up to the point where it can go almost as fast as a lot of the parallel analog models.

      The one exception is that the work on analog retina models led to the development of the Foveon X3 [foveon.com] technology, which is just packing R, G, and B CMOS sensors into a single vertical column on a chip. But again, the neuromorphic part of the retina model is not the X3 technology, the X3 techno
  • by peter303 (12292) on Tuesday September 14, 2004 @12:07PM (#10246847)
    National Security Agency: "We did, and they are hooked to the national phone system."
  • Live Chat & Search (Score:3, Interesting)

    by LionKimbro (200000) on Tuesday September 14, 2004 @12:08PM (#10246853) Homepage
    With voice software, you can already speak in real-time, conference style. I think Skype supports 5 people.

    With speech-to-text, you could log all conversation to IRC.

    Then you could have search engines that search *all conversation within the last 5 minutes, world-wide.*

    Well, at least all conversation that was okay with being public.

    So you could say, "Show me all conversations that are going on right now about Python," [python.org] and immediately find the people talking about Python, wherever they were.

    One step towards the HiveMind. [communitywiki.org]
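    A toy sketch of that kind of recency-plus-keyword search, in Python (the in-memory transcript store and all the names here are made up; a real system would need an actual index and opt-in logging):

```python
import time

# Speech-to-text output arrives as (timestamp, channel, text) tuples.
transcripts = []

def log_line(channel, text, ts=None):
    """Append one transcribed line of conversation."""
    transcripts.append((ts if ts is not None else time.time(), channel, text))

def search(keyword, window_seconds=300, now=None):
    """Return (channel, text) pairs from the last window that mention keyword."""
    now = now if now is not None else time.time()
    return [(ch, txt) for ts, ch, txt in transcripts
            if now - ts <= window_seconds and keyword.lower() in txt.lower()]

log_line("#chat1", "anyone tried the new Python release?", ts=1000.0)
log_line("#chat2", "my coffee maker is broken", ts=1100.0)
print(search("python", now=1200.0))  # only the #chat1 line matches
```

    The hard part, of course, is the speech-to-text feeding it, which is exactly what cheap dedicated silicon would make ubiquitous.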
  • by AndyChrist (161262) <andy_christNO@SPAMyahoo.com> on Tuesday September 14, 2004 @12:57PM (#10247402) Homepage
    As it is, it's a tossup whether I prefer speaking with a machine or a customer service rep in India. It won't take much for a machine to surpass most of them in English speech recognition. (Alright, to be fair, there are some Indians I've gotten on the phone who have been at LEAST as good as the typical US-based rep. But that's a minority.) Anything to advance the technology.
  • national security? (Score:3, Interesting)

    by bob_jenkins (144606) on Tuesday September 14, 2004 @02:10PM (#10248203) Homepage Journal
    Why are they talking about querying online databases for 911 calls as the national security app? It's obvious the national security app is to translate every single phone call to text and store them (indexed) in a classified database. I've attempted to believe the US wouldn't do this because it's illegal, but I can't manage to suspend disbelief. The only way to avoid this is if phone calls are encrypted and the US doesn't have the keys.

COMPASS [for the CDC-6000 series] is the sort of assembler one expects from a corporation whose president codes in octal. -- J.N. Gray
