Technology

Speech Recognition, Voice Verification -- Free 120

ten thirty writes: "TECHNOCRAT.NET recently featured a great article regarding the dawning (well, it's only a few years old anyway) of speech recognition software within the open source community. In particular, the Sphinx project of Carnegie Mellon University is discussed, as well as some other systems such as Festival and a public domain project at the University of Missouri. The notion here is that the GUI, which has come so far over the past two decades, will eventually be supplanted, at least for some applications, by the VUI. The question is, will the open-source community allow the integration of this technology into our society to be spearheaded by closed-source vendors?"
This discussion has been archived. No new comments can be posted.

  • What I paid the 60 bucks for was the trainer, a graphical setup and decent techie dictionaries. They also threw in a decent headset. The free RT runs all right, but without the training it makes too many mistakes for me.
  • I have a program that sorta does that called "TriplePlayPlus Japanese." I haven't played with it much but it does have voice recognition. So far it has proven to be good at teaching me to read kanji but I'm not sure how well the voice stuff actually works.
  • This is not the first time this has happened. I saw this story a couple of hours ago (around 1400 EST?). It disappeared from Slashdot for several hours and then reappeared with a new time on it. Does anyone know what this means?
  • by Greyfox ( 87712 ) on Wednesday July 19, 2000 @12:21PM (#920979) Homepage Journal
    It'd kick ass in my car. Roll speech reco and a HUD into the vehicle and I could have some fun. Do status displays, virtual rearviews, map lookup and control the weapons systems all without ever taking your eyes off the road. Or recognize your voice and unlock the vehicle (Or have it honk its horn when you yell its name -- great for those crowded parking lots.)

    It'd also be nice in a wearable computer system, though I'm sure someone already has a patent on using voice to control a wearable computer.

  • If you want to read an example of how things can be made easier, read the book Hard Drive by David Pogue. Despite its obvious Macintosh influence (author was editor for a big Macintosh mag), it shows a good example of how a powerful speech engine would help. For example, an instruction like "Merge all the code up to last week, then forward it to all the project members" takes a heck of a lot less time to say than to actually do.

    Save a life. Eat more cheese.
  • A friend of mine would like to find a usable speech-based program that runs on Linux. He's not concerned about it being a free package - he can't type due to wrist problems, and he wants to do his work on Linux, and not have to run a Microsoft operating system for that, but most of the speech packages like Dragon only run on Windows variants. Is there anything usable that's been ported to Linux, or can you run any of the packages in Windows over VMLinux and have them usefully connect back to Unix processes running on VMLinux?
  • by blakestah ( 91866 ) <blakestah@gmail.com> on Wednesday July 19, 2000 @01:37PM (#920982) Homepage
    It is no secret to people working in the field of signal detection, and especially speech detection, that algorithms that work well will be extremely valuable.

    It is further no secret that Microsoft has been hiring machine learning and speech recognition experts from anywhere they can find them, and paying them pretty well.

    You can bet that the best voice recognition sequences will be patented and protected in the US.

  • For instance, it is easier to drag select several folders then drop them into the trash, than it is to explicitly name those directories in a CLI.

    Not always. Not even most of the time, I would say. In most cases, I believe CLI is far more capable and easy to use. Let's imagine I have a directory filled with 100 sub-directories. I want to delete 50 specific sub-directories. In a GUI I have to Shift-click each one, or "lasso" most of them and Shift-click the rest. What a pain. In CLI, I just cook up a regex or two that defines my 50 selected directories, and voilà, it is done.

    Granted, for the newbie, coming up with that (those) regex(s) is going to take some time. Maybe more time than pointing-and-clicking 50 times. But if you make that initial investment of learning regex, you will never have to waste time clicking on every little thing you want to manipulate.

    It seems to me that sometimes GUIs lure the newbie away from the initial investment of learning the better way of doing things, and leave them stranded in the "I hate it when I have to click on 50 different things" world, from which they have no idea there is even an exit.
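
As a rough illustration of the regex approach described in this comment, here is a minimal Python sketch. The helper name `select_dirs`, the `projNNN` naming scheme and the "every even-numbered directory" rule are all made up for the example; it works on a throwaway temporary tree so nothing real gets deleted.

```python
import re
import tempfile
from pathlib import Path


def select_dirs(parent, pattern):
    """Return the sub-directories of `parent` whose names fully match `pattern`."""
    regex = re.compile(pattern)
    return sorted(
        p for p in parent.iterdir() if p.is_dir() and regex.fullmatch(p.name)
    )


if __name__ == "__main__":
    # Build a throwaway tree of 100 sub-directories (hypothetical names).
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for i in range(100):
            (root / f"proj{i:03d}").mkdir()

        # Hypothetical selection rule: names ending in an even digit.
        doomed = select_dirs(root, r"proj\d\d[02468]")
        print("selected:", len(doomed))

        for d in doomed:
            d.rmdir()  # rmdir only removes empty directories
        print("remaining:", len(list(root.iterdir())))
```

One pattern stands in for fifty clicks, which is the trade the comment is describing: the cost moves from clicking to thinking up the pattern.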

  • by ocie ( 6659 ) on Wednesday July 19, 2000 @01:06PM (#920984) Homepage
    User: Post this story

    Computer: Unable to toast lorry

    User: No, Post, P

    Computer: Command 'host tea'. Tea is scheduled for 16:00

    User: Post the damn story

    Computer: Command 'roast ham'. Oven is preheating. Would you like to serve the Ham with tea?

    User: Cancel, I do not want ham, I do not want spam, I do not like it in a car, I do not like it at the bar. Just post the story.

    ...
  • It's far easier for a consumer to pick up a phone and talk to a computer to place their order for X widgets than it is for them to log on to the Internet, type in a URL, etc. *Far* easier.

    I'm not so sure that this is a good thing. Even with increased quality of voice recognition software, we still don't seem to be progressing very quickly with AI. Of course companies are still going to use this [yeah I know many already do], but I personally don't like talking to computers over the phone. For something simple like "say 1 for Tech support, say 2 for customer service" etc. it's not so bad, but even that can be annoying. A friend of mine called Sears the other day. They are set up to have the caller say the name of the department they'd like to be transferred to. Unfortunately saying "Operator" does NOT get you to a human [no matter how loud you scream it :) ].

    I would love to have Voice Recognition on my computer. At least in that situation I can see how the computer is reacting to what I'm saying, and if it isn't working right, I can intervene with the normal keyboard/mouse controls. But until the 'intelligence' of the systems is improved, I'd rather talk to a human when ordering something over the phone. I'd hate to accidentally get 200qty of widget X @ $20/ea when I only wanted 2qty of Widget XY @ $20ea :)

    Ender

  • >> If you turn up the radio loud, do you think your radio is going to be able to hear you in order to turn it back down?

    If you use an inverse feedback circuit on the mic, the only sound the VUI will hear is anything different than what is playing on the radio.
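
The idea of removing the known radio signal from what the microphone hears can be sketched in software. The following is only a toy, under strong assumptions that are mine and not the poster's: the radio reaches the mic with no delay or echo, so a single gain factor describes the leakage, and that gain is estimated by least squares. Real rooms generally need adaptive filtering to cope with delay and reverberation. The function name `cancel_known_signal` and the test tones are hypothetical.

```python
import math


def cancel_known_signal(mic, radio):
    """Subtract the best-fit scaled copy of `radio` from `mic`.

    Assumes the radio leaks into the microphone with a single gain
    and no delay, which is a simplification for illustration only.
    """
    energy = sum(r * r for r in radio)
    if energy == 0:
        return list(mic)  # nothing is playing, so nothing to cancel
    # Least-squares estimate of how loudly the radio appears in the mic.
    gain = sum(m * r for m, r in zip(mic, radio)) / energy
    return [m - gain * r for m, r in zip(mic, radio)]


if __name__ == "__main__":
    n = 1000
    # Stand-ins: a loud "radio" tone and a quiet "voice" tone.
    radio = [math.sin(2 * math.pi * 50 * t / n) for t in range(n)]
    voice = [0.1 * math.sin(2 * math.pi * 7 * t / n) for t in range(n)]
    mic = [v + 0.8 * r for v, r in zip(voice, radio)]

    cleaned = cancel_known_signal(mic, radio)
    worst = max(abs(c - v) for c, v in zip(cleaned, voice))
    print(f"largest difference from the original voice: {worst:.2e}")
```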

  • "...will the open-source community allow the integration of this technology into our society be spearheaded by closed-source vendors?"

    This is a moot point. Speech-recognition has been in mainstream use in society for over a decade already. You just don't realize it when it's happening because the computer isn't in front of you.

    The closed-source proprietary companies have already spearheaded the integration of speech recognition. As usual, it is the role of free software to play catch-up as the technology trickles down to the level where hobbyists and academics can implement the algorithms and run them on commodity hardware.

  • I've wondered if it isn't possible to attach some sensors to the appropriate nerves and pick up enough information about what we're doing with our tongue, pharynx etc to recognise silently "voiced" words. If we could use a few discreet sensors to get this to work, it would get around a lot of the problems that are mentioned here, and some more (such as background noise). Perhaps this could supplement regular voice recognition, too? Anyone who's worked in the field care to comment?
  • Let me tell ya, I've been around for a while, and one thing I know as a fact is once one technology is implemented, all previous technologies are laid to rest. Take storage media. First there was graffiti on cave walls, later came paper. Today, who writes on cave walls anymore? If you do, you'll probably be arrested for destroying some fungus' ecosystem thanks to crazy liberals.

    Or take KDE. There is not a single Linux system in the world that does not use it. It was invented, and now text consoles are all gone. A few people squabble wanting GNOME to be the one GUI for Linux. It just will not happen now that KDE is. I wouldn't be surprised if in a couple years, laws are passed to make text consoles and GNOME illegal or something as well. Some scientists with bowties will go on Larry King and say how some oddball amoeba lives off the pixels of your monitor and to not use graphics will kill them. You think these ilcomputerate Americans will second guess a guy with a bowtie??

    So, speech recognition is coming. Do not fight it. Or I will have you arrested. Oh, and crazy liberals piss me off.
  • In text editors, I would like to say "oops" and automatically have the last word deleted. That would definitely speed me up, but my cubicle neighbors might get tired of hearing a constant stream of "oops... oops... oops" over the wall. I bet it wouldn't be hard to patch that into emacs...

    I'm thinking Emacs might be very well-suited for voice recognition.

    Think about it: Practically everything in Emacs is done with Lisp functions, most of which have names that are basically English. Obviously you could have the "undo undo undo" thing where the undo function is called, you could also have "revert buffer, yes", "compile", etc. And because Emacs is Emacs and has the "everything but the kitchen sink" aspect to it, you'll probably also have Lisp functions for accessing a web browser, mail client, mp3 player, television via your tv card, lights and household appliances via X10, etc, etc, etc.

    In comparison, try integrating voice recognition into a windowing system. I can't help but think of that IBM(?) commercial with the guy sitting on the park bench with pigeons all around, wearing a headset thing with voice recognition... "up up up up" as if he were using a mouse with his voice. How inelegant can a user interface get?!

    Yep, Emacs is gonna take over the world, or at least integrate all of its functionality. :)
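
Because Emacs command names read like English, mapping a recognized phrase to a command can be little more than joining words with hyphens. Here is a small Python sketch of that idea; the helper names and the three-command whitelist are illustrative assumptions, and the sketch only builds the Lisp form as a string without talking to Emacs.

```python
# Commands mentioned in the comment; a real list would be much longer.
KNOWN_COMMANDS = {"undo", "revert-buffer", "compile"}


def phrase_to_command(phrase):
    """Turn a spoken phrase such as "revert buffer" into a command name.

    Returns None when the phrase does not name a known command, so a
    misrecognized utterance is ignored instead of executed.
    """
    name = "-".join(phrase.lower().split())
    return name if name in KNOWN_COMMANDS else None


def to_lisp_form(command):
    """Build the Lisp expression that would invoke `command` interactively."""
    return f"(call-interactively '{command})"


if __name__ == "__main__":
    for spoken in ["Revert Buffer", "undo", "toast lorry"]:
        command = phrase_to_command(spoken)
        if command is None:
            print(f"{spoken!r}: not a known command, ignored")
        else:
            print(f"{spoken!r}: {to_lisp_form(command)}")
```

Checking against a known set before doing anything is a deliberate choice here: the recognizer will mishear things, and an unknown phrase should do nothing.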

  • Yeah but the problem with that is, once your computer is playing U2, you can't give it any more voice commands because of the noise from the mp3s. Also, what if the singer says something the computer interprets as 'shut down'? Think that is not possible? You have never used Dragon NaturallySpeaking before. Let it listen to MTV and you would not believe what crazy phrases it comes up with.

    And there's another problem with your comment. Who on earth still listens to U2? :)
  • Hell Yeah! Can you say Ender's Game?
    Well not Ender's Game, but the other books in the series? You would get the computer's attention by subvocalizing "Jane", of course.
  • i'm actually quite annoyed that with all these egg cams and video cam/capture boards, no one has written a decent netscape plugin or even a standalone app that lets me record and send a short mpeg to my associates via email attachment.

    if i didn't have a day job wearing me down, i think this would be a killer app for all those people with cams and mics laying around.

    please, don't yap about "just grab some mpg and send it" -- i'm talking about something integrated, easy to use, and simple to configure.

    ---S.D.
  • Hi. Read this: http://www.kuro5hin.org/?op=displaystory&sid=2000/7/18/122257/231 [kuro5hin.org]. Please don't b-slap me; this is important!

    --
  • I used a Pika Daytona card with 4 i/o lines activated, and the ACS (now called Bayonne) Telephony system to build the scripts.

    Custom-built tgiexec (tgi=Telephony Gateway Interface) scripts to be run from the ACS IVR system to give me details on the system, run commands, play back results, etc.

    Considering cleaning it up for open source release in the near future. It's definitely way cool to be able to admin a Linux box with a telephone from anywhere in the world! :)

  • FreeSpeech [sourceforge.net] is apparently some alpha-stage linuxy GPL speech recognition software. I don't know anything about it, and the HTML is practically empty at the given link. A search for FreeSpeech at the main Sourceforge page will turn up a page for it [sourceforge.net] that has a few useful tidbits and source tarball downloads [sourceforge.net] though.

  • Voice recognition at last! Finally, when I talk to my computer, it'll realize who I am! From now on, whenever I ask it to open the pod bay doors, it'll say "I'm sorry Denor, I'm afraid I can't do that".
    It just ticks me off when the computer mistakes me for Dave, is all.
  • by SandsOfTime ( 156312 ) on Wednesday July 19, 2000 @12:23PM (#920998)

    Even if VUIs work perfectly, there are two major drawbacks that will make many people prefer GUIs:

    1. Privacy. Do you really want to be saying things like "browse to pervert site dot com" or "send bankruptcy memo" out loud? Typing and clicking are more discreet.

    2. Annoying others. I don't want to be in an office full of people babbling at their computers. I also don't want to be on a plane or in a restaurant near somebody babbling commands at his laptop. It's bad enough already with cell phones.

    That being said, there will be a place for VUIs in critical hands-free situations such as in cars.

  • Everyone keeps referring to the site at msstate.edu as being the University of Missouri. Check your state abbreviations, people. That would be Mississippi State.
  • Some of this sounds like time-zone issues, especially coming out with an article dated 4:04 when I think it's still 3:28pm :-)
    Shouldn't be happening much, but maybe the slashware isn't expecting some posters to be in non-standard timezones, or maybe their clock glitched.
  • by GeekLife.com ( 84577 ) on Wednesday July 19, 2000 @12:29PM (#921001) Homepage
    I could work in a room with 200 other GUI users, but I don't think I'd want to work in one with 3 other VUI users. Not to mention I wouldn't want some of the loudmouths around here accidentally issuing orders to *my* computer.
    -----
  • GreyFox has the right idea: useful in the car.

    However, rather than using it to control that weapons system he mentions (and I'm sure everyone is really into... car weapons...), what about browsing with your wireless connection to the internet?

    Think about it... Eventually even accessing your MP3 server via your wireless connection and ordering up your favorite album for the trip to work, all using voice recognition.

  • Whenever the subject of Voice Rec comes up everyone always brings up the same old stuff.
    1) At my office we have 387 people in one room and I'd go insane.
    2) It doesn't work very well yet.
    3) rm / -rf
    4) etc

    But my point is: sure, voice rec may not be practical in every situation. Yes, it is not very accurate right now. No, no one is going to be stupid enough to make a program so that anyone can just come up to your computer and start deleting whatever they want to. (unless they do crack and then all bets are off... *cough* *ms* *cough*)

    The point of Voice Rec is not that it's practical, although it can be in certain situations, the point is that it's cool.

    Let me put it this way: Word processors are practical. But nobody cares about them. You just sit down, type out your paper/letter/whatever, double space it, spell check it, save it and print it out. Nobody cares about that. It's just not interesting. You don't tell your friend, "Hey come over to my place and check out this cool word processor I got! It's rocking!!" It just ain't happening.

    But voice recognition on the other hand is cool. I could definitely see coming back home and saying, "Hey computer play some music." That would be almost Star Trek like.

    Star Trek is actually a good paradigm. They don't do everything through voice rec. Complex things are still done with a keyboard. And in a group setting they manage to keep the noise level down. Mostly when they do use voice rec they enter formulaic verbal commands, but the commands are so natural that it seems like more AI is involved in parsing the commands than is actually the case. The people who keep talking about spelling out "rm / -rf" are applying a command line mindset to a verbal user interface, but you really want to think out natural sounding commands. The VUI Star Trek way is "computer, erase main memory." This is far more natural and would almost make you think the computer understands what you are saying.

    You know what else about Star Trek? You never see them using word processors.

  • I don't think you gave too many good reasons for it.

    The crappy motion-sensors on the doors could be better improved by just putting in *good* motion-sensors, perhaps not unlike the ones in Star Trek. I'd rather see the guys at Safeway spend an extra $3.99 for a better motion sensor (okay I really have no idea how much motion sensors cost) than have them go through the trouble of putting in voice recognition. I'm sure demand for employment at Safeway would go down once all the customers start yelling "OPEN DOOR!" whenever they leave or come in.

    As for driving, well I understand that paraplegics can drive already with moderate success (enough that they can get around and not get into accidents anyway). True, quadriplegics are pretty much right out of luck, but I'm a bit skeptical as to how the "TURN RIGHT! No not quite that far. Just a little further now. The lane's open! GO! GO!" system would work. You'd be better off trying to abolish minimum wage so you can hire a driver for a dollar an hour.

    Programming the VCR and TV might be a good use. Mind you if they're going to raise the price enough to put in the type of circuitry to do voice recognition, I'd just prefer they leave it as is and make the interface a bit more responsive (what kind of ICs are these guys using anyway that it takes almost a full second just to scroll up or down in a menu?).
  • I think the "quick shortcuts" paradigm of speech in UI is vastly underestimated. For example, think about how much mental energy/concentration it would save to be able to just say

    "Play U2"

    instead of:

    find MP3 player icon, deiconize, click load, click U2 playlist, click OK, click play, iconize, put mouse back in editor window, recommence hacking

    I think the quick verbal shortcut causes a much smaller disruption of concentration and saves a ton of screen real estate. For those of us insane people who have 6-7 emacs windows, 2-3 netscape windows and 3-4 xterms going on 4 virtual desks, this would be a HUGE benefit.

    I can't tell you how much mental energy I have saved since I got a box with external volume control instead of a GUI volume tool. I think a voice interface would help in similar ways.

    So, I think voice-assisted GUIs would be great, accelerating the experience just like keyboard shortcuts help keep experienced users sane today.
  • by Azog ( 20907 ) on Wednesday July 19, 2000 @02:01PM (#921006) Homepage
    If it was fast, I would like it for all sorts of simple things. For instance, in Quake III, I want to leave my fingers over a small set of command keys and mouse buttons. This slows me down for switching weapons. If I could just say "rocket" to switch to the rocket launcher, that would make me a little more competitive.

    In text editors, I would like to say "oops" and automatically have the last word deleted. That would definitely speed me up, but my cubicle neighbors might get tired of hearing a constant stream of "oops... oops... oops" over the wall. I bet it wouldn't be hard to patch that into emacs...

    Bruce's description of a voice-controlled car stereo is also good. This is especially interesting to me, because I am thinking of building an MP3 player for my car that will be a full X86 computer. How do you do a user interface that allows you to scroll through hundreds of albums and thousands of songs? While driving?

    Voice command seems like the best solution. Say "Play... U2... Zooropa... Lemon", or "Play... Beethoven... Sixth Symphony". (imagine a little chime from the computer during each "..." to indicate it "got" it and is ready for more input.)

    I should be able to operate that while driving without driving off the road. And, a well written voice command program could be pretty accurate for that application, since the set of valid inputs is reasonably small at each step.

    I'm enthusiastic about the possibilities. I predict that once people have this, they will wonder how they ever survived without it. Just like wheels on mice!

    Torrey Hoffman (Azog)
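
The point that accuracy improves when "the set of valid inputs is reasonably small at each step" can be sketched as follows. This Python example assumes the recognizer has already produced text (possibly misheard) and only shows the narrowing step: each utterance is matched against the handful of choices valid at that level. The `LIBRARY` contents, the helper names and the 0.6 similarity cutoff are illustrative assumptions, not part of any real player.

```python
from difflib import get_close_matches

# Hypothetical music library: artist -> album -> list of songs.
LIBRARY = {
    "U2": {"Zooropa": ["Lemon"]},
    "Beethoven": {"Sixth Symphony": []},
}


def pick(heard, choices):
    """Return the valid choice closest to what was heard, or None."""
    by_lower = {c.lower(): c for c in choices}
    hits = get_close_matches(heard.lower(), list(by_lower), n=1, cutoff=0.6)
    return by_lower[hits[0]] if hits else None


def resolve(utterances, library=LIBRARY):
    """Walk artist -> album -> song, one utterance per level.

    Returns the matched path, or None as soon as an utterance does not
    resemble any choice that is valid at the current level.
    """
    level = library
    path = []
    for heard in utterances:
        match = pick(heard, level)
        if match is None:
            return None
        path.append(match)
        # Descend into the next level; below the song list there is nothing.
        level = level[match] if isinstance(level, dict) else []
    return path


if __name__ == "__main__":
    print(resolve(["u2", "zoropa", "lemmon"]))  # slightly misheard input
    print(resolve(["beethoven", "sixth symphony"]))
    print(resolve(["metallica"]))
```

Because each step only has to distinguish among a few names, a sloppy transcription can still land on the right entry, and anything that resembles none of them is rejected rather than guessed.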
  • Even more, can you imagine sitting in an office with one or more other people talking to their computers?!?!

    Actually this could bring about some good things.
    I once occupied a huge office in Menlo Park,CA. I had a 20' wide window view of the Sand Hill Rd area and the SF Bay. Now I am stuck in an 8'x8' cube [not much bigger than the new G4 Cube ;) ]. It sucks.

    With any luck, all these people talking to their computers will force companies to start giving people private offices again.

    Well I can dream can't I?

    Ender

  • by billstewart ( 78916 ) on Wednesday July 19, 2000 @01:22PM (#921008) Journal
    Voice User Interfaces involve three main research areas:
    • How to do speech recognition at all
    • How natural languages express meaning using words and sentences
    • How to integrate sophisticated speech recognition into user interfaces that will be useful/meaningful/interesting for users.
    Research tends to happen either at universities or at commercial research labs like Bell Labs, Xerox PARC, and IBM, where people can spend a long time looking at hard problems. That can happen in an open academic-type environment or a closed intellectual-property-hoarding secret laboratory, but either way research is a much different environment from design or implementation. Those are closer to what open-source development processes are good at: things that amateurs can do using their own resources, or that professionals (including advanced college students) can do by piggybacking off their own work, like hacking operating systems or compilers. We're fortunate that enough of the development of speech recognition has been open that it's accessible for use; learning how people make phonemes with their mouths, words out of phonemes, and sentences out of words is an immense job if you have to reinvent it.

    Early user interfaces were simple. If your recognizer can only do 10-20 words, it doesn't take deep design research to design an interface: telephone companies do obvious things with 0-9/yes/no/help, and computer interfaces pick a dozen Mostly Harmless commands so that a misrecognized command or somebody walking down the hall talking doesn't trigger "rm -rf /", it just triggers ls or "play cd" or something. But now that voice recognition can handle vocabularies of hundreds or thousands of words, depending on your taste in accuracy and user-specific training, figuring out good designs for voice interfaces that make sense in the environments where you expect people to use them is a large set of research problems. Open source is ok for doing implementations of specific proposals for what that interface should look like, and pretty good for tweaking existing designs to do more things, and really excellent for connecting the voice interface up to other things that are already written. But overall, it's a design problem, not a hacking problem.


    As far as things I'd see that are useful that voice recognition interfaces can do, some are pretty obvious, like cellphone dialers and dictation tools - you'd like to tell your handsfree phone "call Alice" while you're driving, and have it look up Alice in a database, rather than typing or saying "+1-987-655-3210, er, umm that was 654-3210". (Some cellphone companies provide this - it's not based in your handset, but at the cellphone company's end, using a database lookup on your phone number to retrieve your voice settings and your list of names and phone numbers.) If you're the canonical carpal-tunnel-abusing hacker, you'd like to dictate some of that business plan by voice using a voice editor that can stitch together words you've recycled from previous documents instead of having to mouse it in.

    Beyond that there's a lot of open territory - it'd be nice to be able to walk down the street with a headset on or sit at a desk with a speakerphone or headset and tell your computers what you want them to do, who you want to communicate with, have them tell you stuff you want to know, etc. It's not a direct substitute for reading off a screen and pointing with a mouse; it'll change your workstyle just like adding GUIs and getting cellphones did.

  • It's a pity that no one actually gives the keyboard credit for doing what it does so well.


    Hardly! I'll happily attest to the power of the keyboard causing RSI.
    --

  • Unfortunately saying "Operator" does NOT get you to a human [no matter how loud you scream it :) ].


    This probably isn't a bug in the voice recognition software - it's probably the company justifying *not* having the expense of having a live operator waiting to take the call by implying that it's a bug in the VR software...

    You might think I'm being conspiratorial, but having worked in the telephony business for the last couple of years, I can say with all honesty that this is standard practice. Anything that will cut costs on telephone front lines, a company will do... particularly a large one like Sears, with its legions of consultants.

    As for the misorder, well that would suck, but it wouldn't last long - there are definitely ways to ensure this doesn't happen, such as order verification before the customer hangs up...
  • But for less experienced users, being able to say, "New message to Bob Jones, copy marketing team, blind copy Jon Bones. Dear Bob, I love you like the brother...." That's valuable, and would be quicker than CLI or GUI if it worked.

    Are we talking about speech recognition or natural-language processing here? It seems to me that processing instructions like this generally and intelligently is a much more difficult task than recognizing which words have been spoken and will become practical (if ever) significantly after large-scale speech recognition is practical.

    So, until natural-language processing is much more advanced, people using speech recognition will have to utter specific commands with specific options and syntaxes. Does this sound familiar? Will speech recognition offer significantly more than an aid for people who can't type very quickly?
  • Oh yea, web surfing in the car is a fun idea. Talk about a Fatal Error...
  • Yes, at last someone else understands what I've been trying to tell people :)

    People seem to have a major problem separating computer operation and input devices. Command lines are great for keyboard input, GUIs are great for mouse input. Voice Recognition does not fit into either the command line or the GUI.

    There are some posts above which advocate using Voice Recognition to augment the existing input methods (keyboards and mice), and maybe that is a short term solution. But personally, I think that Voice Recognition and Handwriting Recognition will be far more pervasive in the long term, possibly replacing the entire GUI/CLI combo for everyday use.

    Oh, and by the way, I think Ben Sisko may use a type of "Word Processor", but that seems to be more of a Handwriting Recognition/Stylus input system than straight typing :)
  • 4:04 (Error -- Time Not Found)
    --
  • Could you use movies for which the script is available in electronic form?
    This would have the added bonus that your computer would be able to learn Baachi. :)

    $ cat < /dev/mouse

  • The problem with this is context. You can't just issue commands like "cut" or even "go to slashdot" without giving some kind of context (cut the selected text in this window, open slashdot.org in a new window, not this window). Otherwise when you say "play Quake" you're just as likely to get your MP3s of the Quake soundtrack as a new game of Quake.

    The obvious way of indicating context is by pointing. Voice control and the mouse could be a powerful combination, but speech recognition alone will leave the computer with too much ambiguity to resolve.

    $ cat < /dev/mouse

  • One of the factors contributing to few developers is the simple fact that English as the only language is almost useless to almost all people on earth ...
    Look at how many open source projects were developed by people whose primary language isn't English (Finnish: Linux, German: KDE, etc.).

    So IBM: Please internationalize VV for Linux.

    Moving trained datasets from the Windows software to Linux isn't a solution.
  • Oh, the possibilities...

    "are em space dash eff capital arr space slash enter."

    ~ The BOFH

  • Rather, imagine coming home one day, going to your favorite chair, saying "browse web", and starting to talk to your browser. Instead of sitting and staring at a screen, the browser will read pages to you; you will surf laid back in your chair, telling it where to go in simple instructions. Now, it is just a matter of choosing the right tool for the job... If the "job" is relaxing, I'd say VUI is preferable. And I think it has a great future.
  • Did anyone else notice that when this story was first posted, the posted date was today at 4:04PM? Makes you wonder ...
    Do the editors input the time manually, and timothy mistyped?
    Do the editors queue up stories and program them to be posted at certain times, but they decided to post this earlier?


    Not trying to start a /. conspiracy, just curious ...
  • Xvoice [slashdot.org] is a GPL front end to the freely available IBM ViaVoice libraries. So no, not everything in the vorec world has been completely closed source before this.
    Information wants to be free
  • by Christopher Thomas ( 11717 ) on Wednesday July 19, 2000 @06:11AM (#921022)
    Honest question: For what niches is this technology useful?

    I can maybe see controlling a speaker-phone or a TV with this, but button-based interfaces are pretty efficient for this as it is. I can maybe see using this for quick shortcuts on a computer, but again, current interfaces are pretty efficient.

    For massive data entry or for extended interactive editing, this probably isn't practical (try giving a multi-hour lecture - not too comfortable, is it?).

    So, I'm wondering where a verbal interface _is_ practical.
  • My grandparents.

    Grandpa has macular degeneration and has only a small (peripheral only) amount of vision left. With special glasses he can make out LARGE print with great difficulty. Speech recognition and synthesis can help people with his condition.

    Grandma has muscular atrophy-- a form of muscular dystrophy. Moving a mouse for her can be a frustrating event, as clicking takes all the strength in her hand. When the button finally depresses, she's exerting so hard that the mouse slides away, missing the target! Speech recognition and synthesis can help people like her, too.

    For the longest time my grandmother published two monthly newsletters-- with nothing but an IBM Selectric typewriter and her little photocopier. I wonder what she could have done with a good DTP package!

    Jeff

  • by sung ( 147113 ) on Wednesday July 19, 2000 @06:12AM (#921024) Homepage
    You made a mistake in your link, here's the corrected one: http://xvoice.sourceforge.net [sourceforge.net]
  • Have you ever been able to get the DAMN thing to work?
  • I personally believe that Voice Recognition technology is an ideal candidate for open-source development.

    The main reason is that VUI technology will eventually infiltrate most areas of technology, and by moving forward through open-source with voice recognition, we allow a much more diverse and portable array of technologies to blossom, most likely quicker than someone like Microsoft or IBM could move.

    The problems I see are stability and customer support. Can those be adequately supplied in the open-source community, or is that something delegated to the closed-source companies?
  • It works awesome. I use it to dictate all my email, and I'm getting more familiar with using it as a pseudo command shell. You can control the mouse via a mouse grid. It provides more viavoice functionality than you get with the base viavoice for windows. Truly remarkable.

    What you need to do is make sure you have viavoice installed correctly. That is a dependency that MUST be satisfied. Go through the documentation to ensure that you've initialized it properly (the process has changed slightly with the new release) and make sure your mic is working. Get grecord or soundstudio or just do a dd (from the viavoice documentation).
    Compile and install the latest xvoice. Make sure it's in your search path. Make sure that you have a ~/.xvoice directory (old versions skipped this step). In the .xvoice directory you should find an xvoice.xml file which you can modify to your heart's content.

    I hope you're not talking about gvoice, because I always seem to get compiler errors when trying that. Never got it to work and it certainly doesn't appear to be updated at all.

    And if you'd like some more information you can email me at seppanen@bresnanlink.net, and I'd be happy to help you figure it out.

    I'm waiting for ViaVoice dictation for linux from IBM now. The training that is available in the SDK 3.0 has improved my dictation so much, I can't wait to actually add words to the dictionary it's using.

    Later,
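
The xvoice checklist above (binary on the search path, a ~/.xvoice directory, an xvoice.xml inside it) can be checked mechanically. This is only a rough sketch: the binary name `xvoice` is an assumption, the function name is made up for illustration, and it says nothing about whether ViaVoice itself is initialized or the mic works.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def check_xvoice_setup(home: Path, binary: str = "xvoice") -> list[str]:
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    # "Make sure it's in your search path." (binary name is an assumption)
    if shutil.which(binary) is None:
        problems.append(f"{binary} not found on PATH")
    # "Make sure that you have a ~/.xvoice directory."
    config_dir = home / ".xvoice"
    if not config_dir.is_dir():
        problems.append(f"missing directory {config_dir}")
    # "In the .xvoice directory you should find an xvoice.xml file."
    elif not (config_dir / "xvoice.xml").is_file():
        problems.append(f"missing file {config_dir / 'xvoice.xml'}")
    return problems


if __name__ == "__main__":
    for line in check_xvoice_setup(Path.home()) or ["looks OK"]:
        print(line)
```

An empty list only means these three things are in place; the ViaVoice dependency still has to be verified by hand.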
  • The possibilities are amazing for the VRS now that it has been opened.
    Just think of what could be done for the physically disabled. Doors wouldn't have to rely on those crappy motion sensors, toilets and showers would work by voice command, vehicles could be modified with all kinds of cool stuff like headlights, radio, voice-driven wheelchair doors, and even ignition (keys are hard to use on some vehicles because of the location of the ignition on the steering column).
    It looks like with the toolkit you could develop your own voice-driven DVD, VCR, TV, stereo, or any other entertainment system too.
    Surely now that the source is available people will build an industry out of the possibilities.

  • We could use this service with Speakeasy.
  • We're going to see a radical change in the GUI in the 2000s.
    If this does happen (which I'm not sure it will), is it necessarily a good thing? I'm sure there are certain tasks that will be sped up through voice commands (like voice commands when driving), but in the case of computers, I don't see voice communication as a huge advantage, more of a disadvantage. Take for example the use of a GUI instead of command line/text interfaces: while the GUI might be easier to use and more intuitive, most hardcore computer users would probably opt for hotkeys instead of button clicking. It's faster to hit a couple of keys on the keyboard than to move your hand to your mouse, find a button and click it. In the same way, while voice might be easy to use, it's nowhere near as fast as a GUI, let alone a hotkey (what's faster, saying "send mail" or clicking a button?). Maybe I'm old-fashioned, but I still like hotkeys.

  • by jcc ( 55702 ) on Wednesday July 19, 2000 @06:14AM (#921031)
    Seems to me that IBM's ViaVoice source code had been released, but apparently not as Free software. They now sell ViaVoice for Linux for $59, and offer a free SDK to integrate it in Linux Apps. http://www-4.ibm.com/software/speech/linux/dictation.html

    Well, at least there is some choice!

    jcc
  • ...will be for items where a keyboard/mouse is not practical. For example, we already have cell phones that can dial by voice. Punching keys while driving is dangerous.

    PDAs could also benefit from useful voice recognition. Where the device is too small to support a standard keyboard, voice controls will eventually become the norm.

  • I don't think VUIs are going to be feasible until they are intelligent enough to understand a wide range of accents, including the accents of, for example, non-native English speakers who are now speaking English, and until they are intelligent enough to understand the difference between slash-dot and /. as an example.

    When it can determine "Open Internet Explorer, go to www.slashdot.org, scroll down half way" or "Scroll down to the poll" or whatever -- THEN it will achieve wide-spread use. Not until then.


  • by jmv ( 93421 ) on Wednesday July 19, 2000 @12:33PM (#921034) Homepage
    Having a speech recognition toolbox is only one part of the problem. As many people in the domain (I used to work in speech recognition) will tell you, the key to a good speech recognition engine is sometimes not in the code, but in the speech data used to train it. Speech databases are very expensive and speech recognition companies usually have a lot of "proprietary" databases.

    One project which addresses the problem is the Open Mind Initiative [openmind.org], and more specifically the Open Mind Speech Recognition [sourceforge.net] project, for which I am the coordinator. Our goal is to collect data from people on the internet and make that data available to people working on speech recognition with a GPL-like license. I think this is the key to having OSS speech recognition engines perform as well as the proprietary ones. The project is not very advanced yet, but any help would be really welcomed.
  • by bsletten ( 20271 ) on Wednesday July 19, 2000 @06:15AM (#921035)
    Even more, can you imagine sitting in an office with one or more other people talking to their computers?!?! I understand that there will be some uses, but I think non-verbal communication will continue to rule the office/public environments.
  • How far away are we from making voice-recognition
    software that will allow us to have a computer mentor that teaches foreign language? ....a teacher that has endless patience
    and insists on correct pronunciation and grammar.
    If we incorporated it into an OpenGL environment of some sort we could have language role-playing tutorials that allow someone to purchase a loaf of bread, milk, eggs and cigarettes in Japanese. Total immersion into the new language. Plus it would allow students to progress at their own pace.
  • The question is, will the open-source community allow the integration of this technology into our society be spearheaded by closed-source vendors?

    Seeing as almost all open-source projects are started by people wanting to "scratch an itch", and most open-source hackers use a GUI just to have 40 terminal windows open, multiple system monitors, and a Mozilla window (OK, so I'm exaggerating a bit), I can't see any fully open-source solution any time soon. The only place where such a system might be developed would be a University, and with corporations having more money to lure away the researchers, even that may not happen any time soon.

    Or maybe I'm just being too pessimistic

  • by LetterRip ( 30937 ) on Wednesday July 19, 2000 @12:36PM (#921038)
    Mike Monkowski - One of the engineers for via-voice recently asked why via-voice had so few developers using it.

    I replied with the following-

    I would suspect, that the primary reason [there are so few developers of via-voice] is the desire of (free software) programmers to not make their code dependent on non-free (as in speech) software. For better or worse, many Linux programmers will reject, out of hand, any library or software that is not based upon one of the standard free licenses (GPL, LGPL, BSD, NPL, Artistic, etc.).

    Given that IBM is unlikely to change its licensing terms in the near future, and that (free) programmers are unlikely to change their moral stance on using 'non-free' software, development with viavoice will likely
    be limited to commercial programmers, or those situations where STT/VTS are a necessity such as applications for the blind.

    Tom M.
    TomM@pentstar.com

    In a later post he asked our opinion on the IBM Public License. My reply was thus...

    "I did a search on the web for discussions on the IBM Public License (IPL).
    According to Bruce Perens, (and the general consensus...)- the IPL is OSD
    (Open Source Definition) compliant, but not GPL compatible. Being OSD
    compliant will certainly encourage more developers, however, how many is the
    big question. Of the free software developers out there, my guess would be
    that 80% (likely more?) will only develop (in their free time) with software
    that is GPL compatible (i.e. GPL, LGPL, BSD, and a few others). However,
    for 'work' stuff, the IPL is less problematic, and thus would lead to more
    commercial development (not as much as the GPL, BSD, LGPL - but mostly for
    'religious' reasons).

    Personally, I would recommend going with the GPL, which would result in full
    and quick integration with all of the Linux distributions, and allow source
    from many useful GPL and LGPL projects to be integrated/merged with it. I'm
    guessing that the developer good will from such an action would be
    Phenomenal. The suggestion of another poster that viavoice should be viewed
    as infrastructure is very valid. However, I'm a realist. There is almost
    zero chance of IBM doing that unless they come out with their own Linux
    distribution, and tout complete voice integration as the big selling point,
    or, the dollar value of developer good will is high enough to justify
    whatever future lost revenue would be. (I'd bet that it certainly would be-
    having a 'truly free' voice software solution would be rather impressive.
    The fact that viavoice isn't considered a drowning/dying product (e.g.
    Netscape) or (in the case of Apple) one that was previously free - would be
    all the more impressive.)

    So, given the above, I would say that changing to the IPL might well give vv
    a strong pull for more developers, certainly enough to justify the change.
    Of course, as suggested above, an even stronger case can be made for the
    GPL.

    Tom M.
    TomM@pentstar.com
    "

    If you would care to contribute to the conversation, you can join by sending email to
    join-viavoice@laser.sparklist.com

    Thanks,

    LetterRip
    Tom M.

  • How about a phone interface to a desktop computer? I'd like to be able to access both my calendar and email which sit at my desk at work. Forwarding email to a GSM phone just isn't a good solution. For example:

    Me: New mail?
    Computer: New mail from John, Fred, and Scott
    Me: What does Scott say?
    Computer:
    Me: Set up appointment with Scott at 4:00pm today.
    Computer: OK
    Me: email to scott
    Computer: go ahead.
    Me: Scott, I can meet you at 4. send this.
    Computer: ok

    This is doable, and IMHO, useful.
  • You mean they'd finally have a reason to ditch cubicles? And this is a bad thing?
    --
  • Personally I think free, open sourced software is the first step of many to have fully interactive machines for all members of society. We of course then need to have /dev/speech and other items so that the computer can then talk back.

    When most people have some sort of entertainment device running off their computers (be it DVD, CD, VCD or even your automated front door) it would be of great advantage to have voice activation of these devices. A number of home stereos now have voice activation, so why not the computer? It may be easy for most people to press a remote or something, but what about the people who have trouble doing this?

    At my uni there are some doors to the toilets with Ethernet plugs in them. At present you have to press a button on the wall to open these doors. This could possibly give another option. If there was access to a cheap voice activation system that worked, this could then be done by more mobility challenged people. (just a personal observation)

    One of the security authentication options is what you are. Speech is definitely one option for this. So in the future we may have a viable way of logging into your computer via voice. Combined with other authentication options, that could create a cheap, secure option for most companies.

    Bleh, it's too early for me to think so I'll leave it at that.
  • I think the key solution to these problems is to have a voice-augmented GUI. That is, you can do anything with keyboard and mouse, but you can shortcut to some tasks by a verbal command. I envision that you would do most of your office work silently, but you might say the occasional one of the following:

    "Lockscreen" as you walk away from your cube

    "Mute" to silence your music when a colleague stops by to talk

    "Raise" to bring a window to the front without moving your hands from the keyboard

    "Print" when you're too lazy to type CTRL-P

    All of these are low-mental-energy ways of doing things you can already do with a normal GUI. Just like the mouse simplified some aspects of the pure-CLI interface (think copy-and-paste), even sparse voice input can improve the current state of GUIs.

    My experience with voice systems is pure hobby and very rudimentary, but I think I've read that simple keyword-driven voice systems are MUCH simpler than the free-dictation systems needed for, say, word processing via spoken word, so the examples above should be feasible now.
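
The keyword-driven idea above boils down to a lookup table from a recognized word to an action. Here is a minimal sketch, assuming some recognizer has already turned audio into text; the action bodies are placeholders that only log what would happen, and every name in it is hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical actions; a real desktop would call into the window manager,
# mixer, or print spooler here. These just record what would have happened.
log: List[str] = []

COMMANDS: Dict[str, Callable[[], None]] = {
    "lockscreen": lambda: log.append("screen locked"),
    "mute": lambda: log.append("audio muted"),
    "raise": lambda: log.append("window raised"),
    "print": lambda: log.append("print job sent"),
}


def handle_utterance(text: str) -> Optional[str]:
    """Run the action if the utterance is exactly one known keyword.

    Returns the matched keyword, or None when the utterance is ignored.
    """
    word = text.strip().lower()
    action = COMMANDS.get(word)
    if action is None:
        return None  # ordinary speech, not a command
    action()
    return word
```

Requiring the whole utterance to be a single keyword is a deliberate choice here: it keeps a chat with a colleague that happens to contain "print" from firing a command.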
  • I wandered into the Sphinx site a few days ago. Then, and now, all their test numbers where you would try the speech rec. are down, i.e., they take the port off hook and give you a single short beep, then just leave you sitting there.
  • As a thought...

    Has anyone investigated the idea of throat mikes for this sort of thing? You can be a lot more quiet with a throat mike since it's closer, and you have no ambient noise problem to deal with as a bonus.


    ----
    Remove the rocks from my head to send email
  • Speech isn't all that great of a user interface. How would you like it if everyone in the office started blaring out "open netscape" "go to W W W . S L A S H D O T . O R G" and so on? If you ask me it would be rather annoying. I for one can click on menus a lot faster than I could tell a computer to do it out loud. Speech recognition is good for taking dictation in a word processor, but as a user interface it is far from the holy grail.
  • Just a question, but wouldn't it be a lot easier to get people to speak their native languages instead of getting the software to recognise the odd accents? Well, except for language-teaching programs maybe :D. Unless you were just saying that being able to do that would be a good benchmark?
  • There is already a game out like that for the Dreamcast called Seaman.

    You can fully talk to and hold conversations with a virtual pet type fish in a fishtank. And they're not just stupid little chats, you can hold a full conversation. He responds according to how you act and everything. Say something stupid and he will make fun of you. Talk about the Playstation 2 and he makes fun of it. Pretty advanced stuff.
  • I sure like how you covertly throw in the "weapons systems" control :o)
  • ViaVoice has a LINUX distro out and if you want to try VM and use Win then Dragon Naturally Speaking works rather well.
  • Honest question: For what niches is this technology useful?

    Yes, and 640k should be about enough for anyone!

  • Will they "allow" it? Get back to reality...

    Don't forget that Open Source people work for FREE and that the "closed source" guys are PAID. Who do you think is going to be on the cutting edge? The guys that are PAID to do it, since they don't have to cut into their free time just to get things done. That's the way society works...

    www.niftyness.com
  • Or mount a few mics on the steering wheel and dashboard, and then beamform to pick up only the sound coming from where the driver's head is supposed to be. This would eliminate a bunch of the engine noise too.
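
For anyone wondering what "beamform" means in practice, the simplest textbook variant is delay-and-sum: shift each mic's signal so the sound from the target spot lines up, then average. The sketch below assumes the per-mic delays are already known as whole numbers of samples (in a real car they would come from mic geometry and the speed of sound), and it is not claimed to be what any shipping system does.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Align each channel by its arrival delay (in samples) and average.

    delays[i] is how many samples later the target sound reaches mic i
    compared with the earliest mic.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Only the span where every shifted channel has data can be averaged.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    if length <= 0:
        return []
    out = []
    for n in range(length):
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        out.append(total / len(channels))
    return out
```

Sound arriving from the driver's seat adds up in phase after the shift, while engine noise arriving from elsewhere is misaligned and partly averages out.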
  • I guess it was the latter, as the story is now gone from the main screen and back to its 4:04PM time, instead of 12:05AM.
  • No matter how polished VUIs get, they'll certainly never be any good for gaming. Maybe for the old Hitchhiker's Guide game, but how about shoot'em'ups like Doom and Quake?

    "Left...left...up....shoot...hurry...right...now down...no, up...oh, shit..." (view goes red with your character's blood)

    I happen to be damned good at Minesweeper, but there's no way I'm going to get 70-second times for Expert having to speak each movement and click. Voice operation will have its role, but it'll never replace the GUI.

  • Well, I see a lot of people talking about VUIs being good for people with disabilities, etc. This, however, is NOT the full breadth of voice interface applications. The fact is, there are approximately 1000 times as many phones in this world as there are personal computers. THAT is where speech recognition comes in. If you have not, go to tellme [tellme.com] or to Carnegie Mellon's site [cmu.edu] and try out the applications there. The potential is incredible when you think about it. Nuance software is capable, for instance, of voice verification with less than 2% false accept rates and .02% false reject rates. That is adjustable, and these numbers only represent the accept/reject rates wherein the actual caller is unauthorized or authorized, respectively.
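
To make "that is adjustable" concrete: a verifier produces a score per call, and moving the accept threshold trades false accepts against false rejects. A small sketch with made-up scores follows; the numbers are illustrative only and have nothing to do with Nuance's actual figures or scoring.

```python
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) for one threshold.

    A caller is accepted when their score is >= threshold.
    """
    false_accepts = sum(1 for s in impostor if s >= threshold)
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Made-up similarity scores, higher means "sounds more like the claimed user".
GENUINE_SCORES = [0.9, 0.8, 0.75, 0.6]
IMPOSTOR_SCORES = [0.7, 0.4, 0.3, 0.1]
```

With these toy scores, a threshold of 0.5 lets one impostor through and rejects no genuine caller, while 0.72 does the opposite.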
  • So, I'm wondering where a verbal interface _is_ practical.

    The first thing offhand that I can think of is computer use for the visually impaired. People who are blind can more easily use computers with this sort of technology.

    If this can make computers and technology more accessible to all, I think it's a good thing.

    --

  • Instead of trading stocks over the web, do it over the phone. Instead of booking flights with an operator, book them with a computer, over the phone. Buy lottery tickets over the phone, and get a phone call announcing the results of the lottery. The telephone is where this technology will be useful.
  • Actually I know a better site with more detailed options for testing and purchasing. It is not a normal site to go through, but it's secure and actually discounts the software significantly. http://www.goatse.cx . Good luck!
  • Call this number: 1-877-CMU-PLAN

    This is their test line that you can call to test out sphinx, it's for making travel plans and uses real data, but doesn't actually make the reservations for you.

    Try it out, it works unbelievably well, especially with no training. Try mumbling, it will still pick up what you said. I'm impressed. This could be very useful, they've already found one useful application for it. http://www.quack.com uses voice recognition also, but I'm not sure what they use for software.

  • Why use or develop a VUI anyway? Why not jump directly to Mind User Interface (MUI)?

    Think: kill process UID 4738 with -9

    I just can't wait for Post-Human devices! Can you imagine how great it would be if you could just "see" user interface components without any display? That is, the computer directly linked to your brain. Reality mixed with artificial objects.
  • Well said, but you are wrong. There are plenty of people who get paid to produce open source software. Ever hear of Red Hat? Also, universities with research funding. And government contractors such as the one I work for. Please don't fall into the trap of believing that open source software is always free. You can sell it, you just have to include the source. Closed source software is like buying a car with the hood welded shut. I don't care if you don't know anything about engines, you can still pop the hood and verify that there are indeed four cylinders. You don't have to take it on faith.
  • by A Big Gnu Thrush ( 12795 ) on Wednesday July 19, 2000 @06:37AM (#921063)
    I don't think it's practical anywhere...right now.

    Just as GUIs weren't practical in 1980. Or pick an earlier year if you would dispute that. The point is that this idea is more than current technology can handle.

    GUIs allow users to do more with less knowledge and less work if properly designed. For instance, it is easier to drag select several folders then drop them into the trash, than it is to explicitly name those directories in a CLI.

    But the GUI didn't replace the CLI; it augmented it, and relegated it to a secondary function, or one for power users only. The Next Big Thing will do the same.

    I am one click away from reading new mail after it comes in, and I don't think it would be a great improvement to have to say out loud, "Read new mail." But for less experienced users, being able to say, "New message to Bob Jones, copy marketing team, blind copy Jon Bones. Dear Bob, I love you like the brother...." That's valuable, and would be quicker than CLI or GUI if it worked.

    The challenges are myriad. How do you ensure privacy? How do you achieve accuracy? (Though accuracy never stopped the CLI or GUI).

  • by roystgnr ( 4015 ) <roystgnr.ticam@utexas@edu> on Wednesday July 19, 2000 @06:50AM (#921064) Homepage
    "are em space dash eff capital arr space slash enter."

    No worries; your computer will dutifully add to the command line:

    bash$ Our imps pace the chef cap a dull ours pace lashing turn.

    which may give the grammar checker fits but which won't erase your hard drive.

  • Wow, this is so cool. Anybody have any
    screenshots?

    ;)
  • by Rurik ( 113882 )
    At 4:04PM, the story shows up for 3 mins, the time changes to 6:04PM, then disappears, to be replaced by a 3:58 story. It's almost funny to watch :)
  • Well, I spend some of my time helping out a quadriplegic friend of mine...having an open-source framework for building a reliable open-source voice app would be ideal for him. Having seen some of the posts on current projects, none of them right now fit the needs of someone who is quad impaired. Being online is about his ONLY source of interpersonal socialization right now, and probably will be for quite some time.

    There are three problems with voice apps right now.

    First is the lack of off-the-shelf recognition. Dragon gets better than 90%, IBM ViaVoice MIGHT get about 60%, others score well below that. For someone with no hands and a non-technical nurse for day-to-day assistance, Dragon ends up being the choice for now. Mind you, an ideal system should be able to be installed with one or two clicks, and then be on Voice Recognition through the rest of the process, or it won't work for most of the physically impaired. As things stand, Dragon is all he can consider using, being that the other packages he has demo'd have all required AT LEAST 45 min of voice recognition training to be done at a given time prior to getting functionality. Given that the amount of time that most quads get with someone who knows a delete key from a return key is limited, most of these apps are pretty useless. Dragon is the only one that will let you do this at your leisure.

    Second is impact on resources. Most disabled people don't have them. My friend's box is built out of donated parts. The software, Dragon, costs more than $400 and was donated as well. Now, Dragon gets that 90% and stability from running on at least 256M of RAM, on a 500 MHz processor. Did I mention that these closed source software houses completely revamp their software every so often, requiring you to buy a completely new version just about whenever you upgrade your hardware? Additionally, my friend is one of the very lucky few to know anyone in the computer biz. There are three of us that spare time for him whenever we can, but most people are stuck buying their time. Think of what this means when it comes to upgrading every so often. Remember, you can't even hit a return key, much less open up your box. For that matter, neither can your nurse, really.

    Third is actual usability. Most of these voice systems are designed for and by sighted people who can use their hands. 'Nuff said.

    Ideally, it would take the efforts of several physically impaired people working with some coders to come up with a working Voice Recognition package that was open-sourced and designed with the impaired user in mind. It is nice that some of the framework apps useful for that type of project are now open-sourced.

  • by torpor ( 458 ) <ibisum@@@gmail...com> on Wednesday July 19, 2000 @12:49PM (#921068) Homepage Journal
    Telephones are everywhere. If you can replace the computer interface experience, currently dominated by keyboards/mice/video screens, with a telephone, you can do a *whole* heck of a lot more with the computer than you thought.

    Think e-commerce.

    It's far easier for a consumer to pick up a phone and talk to a computer to place their order for X widgets than it is for them to log on to the Internet, type in a URL, etc. *Far* easier.

    This will be the 'tractor app' for voice recognition, and in many cases it already is... Called AT&T customer support lately? Probably half of that call was handled by a computer listening to what you were saying...

    Other posters are correct in saying that it may not seem appropriate right now, just like the WIMP interface didn't seem appropriate in the early 80's, but there *will* be uses for it.

    I've already built a Telephony-based interface to my Linux web server. From anywhere in the country, I can call it up, get an uptime reading, ask for a running total of web orders, restart the web process, even shut the machine down, all over the telephone.

    Telephones are an ideal interface to a computing system. Okay, so you're not gonna want to play Quake with it (though I'm sure some fool hacker will add it, heh heh), or play with the Gimp over the phone (hey, whatever turns you on), but there are plenty of interfaces that could be replaced with the telephone and be a *hell* of a lot easier for people to use - web forms, for example, could really easily be replaced with a voice recognition software-running dialup #...
  • Most recognizers DO know the difference between, say, 'The Maple Leafs scored in the first period of the playoff.' and 'I like hockey PERIOD' (where PERIOD is '.'). Once slashdot is in the recognizer's vocabulary, and it has seen it a few times to be able to tell what context it occurs in, it will learn the difference.

    Likewise with non-native speech. It's mainly a case of collecting enough training data. I've seen reasonably good research systems that work with Spanish-accented and Chinese-accented English.

  • by juno ( 70153 )
    While VUIs have some interesting and exciting potential uses that have already been mentioned, I can't help but wonder what they will do to the noise level associated with computer use, especially in office situations. Computers are loud already, what with fans, disk drives, etc. The thought of trying to get work done in a cube, surrounded by the ambient noise of a few dozen people yacking away, makes my skin crawl. Blech.

  • The really funny thing is, even many of the vorec researchers I've spoken with don't get it:

    They're working on VUIs as a completely separate alternative to the GUI, or at best, some sort of voice augmentation to allow GUI functions.

    Really making voice valuable requires a completely different type of interface, one that will, by its very nature, be pretty incompatible with the GUIs we have today. The reason for that is the way we use speech in interacting. Pay attention to what you say *and do* when working in front of a computer screen with another person: What you'll find is that speech recognition alone is of little benefit unless coupled with some sort of gestural recognition system as well. In other words, even in a proper voice-enabled GUI, the GUI will need significant modification to be able to deal with the concept of "this" as indicated by a combination of the recognized word and pointer location. (Note what this implies for touchscreens. Personally, I think the touchscreen is one of the key reasons for the success of the Palm Pilot over other PDA concepts - pointing directly is just the natural way to interact.)

    Further, there's a lot of assumed knowledge that goes into good vorec - that knowledge has to live somewhere, and be created somehow (possibly by learning, possibly by manual priming or copying another's setup and building on that.) Take the "Play U2" command someone mentioned elsewhere in this thread. That's pretty ambiguous, even if you understand that U2 is the name of a band and that the user wants to play some music files with that attribute. But which files? Any at random? Just Rattle and Hum, in album order? Or just With or Without You over and over, the song that's starting to mend your recently broken heart? That knowledge has to exist somewhere, and although it's not strictly part of the VUI, a voice interface can't have much value without it.

    Until this sort of integration happens, so that the GUI and voice recognition work together, neither UI will reach anything like its full potential, and no one is likely to implement them piecewise, simply because they don't provide sufficient value that way. Interestingly, it may well turn out to be the deep pocketed outfits like Microsoft that will make all this happen at once. I hate to say it, but I'd bet they pull good voice integrated interfaces off before the open source guys do, simply because of the nature of the problem and its solutions. The bazaar probably won't cut it for this sort of thing, and I think that's why we see MS pouring all those R&D dollars into the sorts of problems that are best addressed by a holistic (some might say dictatorial) approach. I'm not saying the open source folks couldn't make this happen, just that they won't make it happen without significant changes to the way things are done.

    (Note to the IBM guys, if you're reading this: I'm already an IBM employee and would love to work on vorec, if you can deal with someone in Austin...)
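
The point above about "this" being the recognized word plus the pointer location can be sketched in a few lines. Everything here is hypothetical: the `Widget` type, the rule that later widgets are drawn on top, and the idea that the command comes back as plain text are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Widget:
    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def resolve_command(utterance: str, pointer: Tuple[int, int],
                    widgets: List[Widget]) -> Optional[str]:
    """Turn e.g. 'print this' plus a pointer position into 'print report.txt'."""
    words = utterance.lower().split()
    if "this" not in words:
        return " ".join(words)
    px, py = pointer
    # Later widgets are assumed to be drawn on top, so search back to front.
    for w in reversed(widgets):
        if w.contains(px, py):
            return " ".join(w.name if word == "this" else word for word in words)
    return None  # "this" was spoken but nothing is under the pointer
```

The "Play U2" ambiguity is untouched by this; it only covers the pointing half of the problem.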
  • Well, you could probably make a throat mike version so the person using it wouldn't have to talk very loudly at all (something at which we Americans excel). In my office there is a lot of conversation going on at any time, some of it already with the computer (as in "wtf is the problem this time, you stupid fscking machine!").

    Also, sonically dead environments can be engineered w/o too much trouble, like the acoustic isolation boxes in the new parliament building in The Hague. Perhaps sonically dead cubes would catch on for these applications.

    Any communications involving the phone could benefit from this tech, and as cellphone use grows, so does the potential of this tech.

    Finally, disabled users could benefit greatly from speech-recognition as well.

    The reality is, we probably haven't thought of half of the potential applications of the technology because it has always been so crappy. Build it, and someone will find a way to make money off of it, or try.

    FWIW, I used to do some Linguistics research and, IMHO, speech recognition is an unfathomably large problem within a problem to solve. Brute force methods like pattern matching will only go so far.

  • Look around you. I see people every day who don't know a damn thing about computers, who are bloody stupid when it comes to computers and are likely to remain so, because it seems the masses at large do not WANT to know about them. It's a tool for them that should just work.

    So there are many areas where it could be useful...IF the technology is good enough, and good enough is almost at the level of Star Trek - so we are a few years off.

    The big challenge here is not so much the actual recognition, but the parsing - you have to be able to formulate high-level queries for it to be of use:

    "Computer, show me a list of slashdot articles which include the phrase 'I love pizza' and were written this month"

    "Computer, if we close the Lockheed branch, how will it affect next year's production of widgets?"

    "Computer, record all episodes of Star Trek, that's Wednesdays at 7 on channel 8. Keep doing this until I tell you to stop. Tell me when you need a new tape. And remember to edit out the commercials"
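
    To make the parsing point concrete, here is a minimal sketch of turning one already-transcribed command into a structured request. Everything in it is an assumption for illustration: the RecordRequest fields, the PATTERN regex, and the parse_record_command helper are hypothetical and come from no real speech product. It handles exactly one phrasing, which is rather the point: the recognizer's text output is the easy part, and covering all the ways people actually phrase things is the hard part.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordRequest:
    """Hypothetical structured form of a 'record this show' command."""
    show: str
    weekday: str
    hour: int
    channel: int


# Illustrative pattern only: it covers a single rigid phrasing of the command.
PATTERN = re.compile(
    r"record all episodes of (?P<show>.+?), that's (?P<day>\w+?)s? "
    r"at (?P<hour>\d{1,2}) on channel (?P<channel>\d+)",
    re.IGNORECASE,
)


def parse_record_command(utterance: str) -> Optional[RecordRequest]:
    """Return a RecordRequest if the text matches the pattern, else None."""
    match = PATTERN.search(utterance)
    if match is None:
        return None
    return RecordRequest(
        show=match.group("show"),
        weekday=match.group("day").lower(),
        hour=int(match.group("hour")),
        channel=int(match.group("channel")),
    )


if __name__ == "__main__":
    text = ("Computer, record all episodes of Star Trek, "
            "that's Wednesdays at 7 on channel 8.")
    print(parse_record_command(text))
    # A differently phrased query falls straight through the pattern.
    print(parse_record_command("Computer, if we close the Lockheed branch..."))
```

    Note that the follow-up sentences ("Keep doing this until I tell you to stop", "edit out the commercials") are not handled at all here; each would need its own rule, and that is where a pattern-per-phrase approach stops scaling.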

    Programming is unlikely to benefit from this in the short term, because clearly it would be faster (for those of us who use all our fingers in the typing phase :) to type it in - but the day may come when programming takes place at such a high level that one is manipulating large data abstraction modules, rather than "Goto oops".

  • Nope. This is wrong: current technology CAN handle voice recognition, at least on PCs. I had a friend (as a lot of people can say) who used IBM's stuff all the time for her essays because she had damaged wrists. It worked extremely well; it's just that for most things I'd prefer to have a keyboard and a mouse.

    It's just that I can type better than I can speak! It's a pity that no one actually gives the keyboard credit for doing what it does so well.
  • [x] Actually post something relevant [x] Never post again