Technology

TuVox Voice Interface

pablos writes: "NYTimes has an article about TuVox, who set up Handspring and Activision with voice interfaces for tech support. Apparently they can do away with the annoying 'press # now' menus. I've used things like TellMe, which played an ad every time it didn't understand you, but I'm wondering if this sort of thing is starting to work anywhere. Anybody called Handspring for tech support lately?"
This discussion has been archived. No new comments can be posted.

  • by Chuck Chunder ( 21021 ) on Wednesday February 13, 2002 @07:33AM (#2999252) Journal
    I wonder....
  • AT&T has a similar service for their "easy reach 800" customers, you can speak your 5 digit combination, or opt to speak to a representative, all without the keypad. Pretty basic, but it's been around for at least 4 or 5 years.
  • by JoeShmoe ( 90109 ) <askjoeshmoe@hotmail.com> on Wednesday February 13, 2002 @07:36AM (#2999255)
    I noticed starting about two months ago that whenever I called the main number for AT&T Broadband, I would get the message:

    "For digital cable, press or say 1" etc.

    A lot of times to avoid complicated and looping voicemail, I just don't press anything to fake like I have a rotary phone and get transferred to the first available agent.

    Well, that trick is no more! Since even rotary phone users can say their choices, not doing or saying anything disconnects you. Pretty crafty.

    - JoeShmoe

    .
    • Orange UK have had an optional service for mobile phone voicemail which is entirely voice activated. Not just the "menu" - no more 1,2,3,#, etc but real commands. And it even tries to recognise the voice of you and your regular callers and talks to them by name after they leave a voice message for you (asking extra options).

      Trouble is the time when you most need "hands-free" is when driving, but the background noise makes it difficult to use (unless you're in some fancy limo).

      Now a HAL style lip reader ... that would be something!
      • I assume you mean Wildfire? Never have I laughed so much as when a friend of mine in a pub attempted to manage his voicemail. It must have taken 10 minutes to delete a message 'THROW IT AWAY!' Oh god, he looked such a fool.

        And voice activated dialing - same person (this time at a club) tried to voice dial another friend - ended up calling his parents at 2:00am. They were not happy bunnies.

        In the club this could be expected, but the pub was not too loud. The technology that Orange is using for Wildfire is just not up to scratch for normal use.

        PS. There are some interesting 'features' in Wildfire (these phrases will not be exact, but play around with them): 'Do me a favour' gets the response 'What kind of favour?' you can then say 'I'm feeling depressed' which gets the response 'Why don't you tell someone who cares' or 'What does a cow say?' which gets the response 'MOOOOO!'
    • Xerox also does this, at least on the support line for their copiers. It's fairly slick and was able to extract the English out of the horribly mangled "English" that I talked to it in :). I had to read the machine a string of numbers and it picked every one of them up correctly.
    • fake like I have a rotary phone and get transferred to the first available agent. [...] Well, that trick is no more!

      Simple solution.. just use gibberish. works for me :)

      To sign up for new services, press or say 1 ... To find out about existing...

      brr?

      I'm sorry. I do not understand that command. Please try again. To sign up for new...

      ack!

      I'm sorry, I still can not understand your command. Please stay on the line and one of our operators will assist you.

      • I'm sorry, I still can not understand your command. Please stay on the line and one of our operators will assist you.

        Wouldn't that be a great job? Spending all day handling complaints / fixing problems for people who don't pronounce and articulate properly.

        Sounds like my old tech support days. :-)

    • If you think that's impressive, try United Flight Information, Airtran FLIFO, American FLIFO, or Continental FLIFO. Amtrak, Thrifty, HeyAnita, TellMe, Audiopoint, AOL, and many others all have more impressive systems than the one you describe. "Press or say" is child's play.

      Systems right now do ok in noisy environments, but they're improving all of the time. Just think of where text-to-speech was a few years ago, and look where it is now... recognition will catch up.

      Todd
    • UPS has it too, when you're doing package tracking - it apparently understands both letters and numbers, also. Pretty slick. Too bad the rest of the service sucks ass :P
  • by The Smith ( 305645 ) on Wednesday February 13, 2002 @07:42AM (#2999264) Homepage
    Please insert all hilarious voice recognition puns in this thread....

    "Thank you for calling 999, which service do you require?"

    FIRE!!

    "[pause]Your request has been passed on. In order to optimise future use of this service, please repeat the following list of words in a steady voice: cat, dog, bar, sky, foo..."

    • Bender: Listen, buddy, I'm in a hurry here. Let's try for a twofer. Hehe.
      Suicide Booth: Please select mode of death. Quick and painless or slow and horrible.
      Fry: Yeah, I'd like to place a collect call?
      Suicide Booth: You have selected slow and horrible.
      Bender: Great choice!
  • by coupland ( 160334 )

    >I'm wondering if this sort of thing is starting to work anywhere

    Voice recognition works great in real world applications. Directory assistance in the city I live in uses voice recognition to find out what language you speak, the city for which you want a listing, and it can even do voice recognition on common businesses. (No doubt for a fee) All without any operator intervention. It's pretty cool.

    • I got one of those e-mails a year ago that offered free magazines and decided what the hell! 2 free magazines for a whole year, not bad, just need to call a certain 800 # to cancel them or they'll bill your credit card. So I called a few weeks ago. Lo and behold it was totally voice automated! Took my 12+ digit ID with no errors. Recognized my saying the full name of a magazine for the correct abbreviation on their list of publications. Would repeat anything it just said by simply saying "repeat". Understood "yes", "no", and "correct". Actually sounded decent, not at all like those services that have pre-recorded phrases and it has to fill in certain blanks. This sounded natural! And it all worked on the first try. Two magazines canceled and the only buttons to push were the 800#. I was impressed.

      Werelock
  • It can be done (Score:2, Interesting)

    by Spooker ( 22094 )
    Last year the company I work for got into a project that used a Cisco 2600 with VOIP module and a product from IBM that allowed you to interact with a website via the phone...we used PHP to create the VoiceXML documents to drive the voice menus and we were scraping the data from a local site that had weather and traffic info on it...worked pretty well considering that it was also done in German :)

    I would have to agree that the technology is getting closer to replacing human beings...maybe I should go check my retirement plan now ;)
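The setup described above (a server-side script that emits VoiceXML for a voice gateway to render) can be sketched roughly as follows. This is only an illustration in Python rather than the PHP the poster used; the function name, the field name `city`, the prompt wording, and the `weather.php` submit target are all hypothetical. Only the element names (`vxml`, `form`, `field`, `prompt`, `option`, `filled`, `submit`) are taken from VoiceXML 2.0.

```python
import xml.etree.ElementTree as ET


def build_city_menu(cities, submit_url):
    """Return a VoiceXML 2.0 document (as a string) that asks the caller for a city."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": "pick_city"})
    field = ET.SubElement(form, "field", {"name": "city"})

    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Which city would you like weather and traffic for?"

    # Each <option> adds one phrase the recognizer will accept for this field.
    for name in cities:
        option = ET.SubElement(field, "option", {"value": name.lower()})
        option.text = name

    # Once the field is filled, hand the recognized value back to the web server.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "city"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical city list and submit target, for illustration only.
    print(build_city_menu(["Berlin", "Hamburg", "Munich"], "weather.php"))
```

In a real deployment the city list would presumably come from the scraped site, and the gateway would fetch this document over HTTP just as a browser fetches HTML.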

  • SJ's system (Score:5, Funny)

    by Mike Connell ( 81274 ) on Wednesday February 13, 2002 @07:49AM (#2999274) Homepage
    The train company in Sweden has one of these systems. It's always amusing listening to my other half battle with it when she wants to buy a ticket:

    OtherHalf (in very clear voice): Stockholm
    Computer : click, click,... Kiruna!
    OtherHalf : Stockholm!
    Computer : click, click,... Moscow!
    OtherHalf : Stockholm!
    Computer : click, click,... Alpha Centauri!
    etc...

    To be fair, it does eventually work, it just takes a while. It probably also takes less total time than the alternative (short conversation with a human, but a long wait to get to talk to them).

    The best thing about them was a recent radio program. They had done some research to find out what words sound (to the system) like destinations. During the show they'd phone SJ up and say things like "I want to go to FsckingBastardVille", to which the computer would reply "Northern or central Stockholm?" and other such amusements.

    Hours of fun :-)

    • A friend and I tried a similar thing with my Ericsson T29. It has a simple form of voice recognition, just comparing a saved name with the name you just said. Anyway, we tried all these abusive words, and it would either ring my home number, or my Dad's number... what can you do?
      • When I lived with three of my friends while in college, we signed up for phone service and got the full range of services free for the first three months or so...including voice dialing. However, it always worked best (sometimes only worked) when the key word(s) for a number was followed by 'bitch'; e.g. "pizza shop, bitch!"
    • I sympathize with your "Other Half" - I've had a similar experience with American Airlines' flight information service. You call it to find out when a flight is arriving. My conversation went something like this:

      Computer: Where is your flight arriving?
      Me: Dallas
      Computer: When is your flight arriving?
      Me: six p.m.
      Computer: When is your flight arriving?
      Me: SIX P.M.
      Computer: When is your flight arriving?
      Me: six o'clock
      Computer: When is your flight arriving?
      Me: six
      etc. etc.
    • apparently your Other Half should have said "FsckingBastardVille" to go to Stockholm...
  • by Anonymous Coward
    We inquired with TuVox how much it would cost to set up a solution for our level 1 help desks. The cost was mind-boggling. So, we trained one monkey for each group of techs to answer the phone and AUTOMATICALLY READ FROM A SCRIPT!!! Can you imagine? A revolution in help desk support. The script includes such high-tech TTS sounding shit such as "Press 2 for Customer Service". Then, in our mind blowing second step - we trained the monkey to pick out DTMF tones BY EAR ALONE!!! So our customers hit 2, and the monkey transfers them to customer services. Truly the wave of the future.

    • The only problem was, the monkey kept trying to throw poo through the phone.
    • Sounds like part of "Demolition Man" where Rob Schneider is sitting at the San Angeles police desk and starts doing that when answering the phone.

      I think I laughed more at that one throwaway scene than any other Rob Schneider movie :)
  • In a way, this is sad. Helpdesk functions have been a way for new people to pick up experience for technology-oriented jobs. As these functions become more and more automated (and fewer helpdesk people are required), it will become harder for people to use a helpdesk function as a stepping stone in a career.

    I'm enough of a realist to understand that the evolution of swapping jobs with technology is unstoppable but still: With the current recession, that's not really a thing to be looking forward to.

    • Having worked help desk at a gym, computer lab and ISP, I'd say the problem is not that simple. Help desk positions tend to turn over every 3 months on average. If you figure it takes 1-2 months to sufficiently train help desk staff, it is really quite expensive and not very efficient. Not only do the HR, support and IT departments have to deal with the high turnover rate, but finding appropriate people to fill those positions is hard (low-paying jobs).

      For example, it costs Sprint PCS and AT&T Wireless approximately 40/month to service an account, and half of that is support. In many cases, a lot of specific domain knowledge is lost because of the high turnover rate. Building knowledge bases to maximize the quality of support is a very difficult task. I know someone who built a natural language support system using knowledge base and expert system shells. It was far from trivial and isn't a foolproof solution. In most cases, the actual deployment doesn't lead to drastic cuts in support staff. Rather, it allows them to work more efficiently. Support calls usually come in herds and are totally overwhelming.

      I'm sure there are people that are displaced by support systems, but from my knowledge, it's not as bad as one would guess.

  • I have been looking at a product called InterVoice Brite [intervoice-brite.com] that appears to have a similar function. Not only do they have the software available for use in-house, but also an ASP offering. From listening to their sample sound files, they are way ahead of a lot of the basic "say or press one" implementations I have seen.

  • This isn't.. (Score:3, Interesting)

    by saqmaster ( 522261 ) <stuNO@SPAMhotmail.com> on Wednesday February 13, 2002 @08:00AM (#2999290) Homepage
    ..exactly new, is it?

    It's been a while since there was really much media hype about voice recognition technologies. Sure, the whole voice activated menus "1, 2 etc." have been around for quite a few years, but I suppose there is a huge difference between repeating a few numbers and describing technical problems. I mean, is this literally a flowchart menu with various diagnostic paths or does it actually try and understand a sentence? If it's the former, then that is nothing more advanced than what is currently available and probably in use elsewhere.

    I wonder what would be more frustrating, repeating yourself twenty times to a computer to battle through a menu, or sitting for twenty minutes trying to explain your problem to an ex-Kmart 1st line support engineer. The choice is yours :)
    • Re:This isn't.. (Score:2, Interesting)

      by Mr. Slippery ( 47854 )
      I mean, is this literally a flowchart menu with various diagnostic paths or does it actually try and understand a sentence?

      Somewhere in between, but more the former than the latter.

      TuVox uses Nuance to perform the speech reco; I'm currently integrating Nuance into a new product for the division of IBM that originally developed Sprint's "Voice Command" voice dialing system.

      In this sort of speaker-independent voice reco system, you provide a grammar of the utterances you expect the caller to say at each step of the way. For example (in JSGF format, which isn't what Nuance uses but is more BNF-like):

      <command> = [please] (help | [make] [a] (plane | train | automobile) reservation | destroy (<city> | <nation>));

      It's fancier than a menu, but it's far from free-form speech. For example, the above would understand "please destroy Redmond", but not "let's nuke New Jersey". Still, with clever grammars, you can do pretty well.
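To make the accept/reject behaviour of that grammar concrete, here is a minimal sketch that hand-translates the `<command>` rule into a regular expression and tests text utterances against it. It is only a text-level illustration: a real recognizer works on audio and scores competing hypotheses. The contents of `<city>` and `<nation>` are not given in the comment, so the two small lists below are hypothetical.

```python
import re

CITY = r"(?:redmond|seattle)"    # hypothetical contents of the <city> rule
NATION = r"(?:canada|france)"    # hypothetical contents of the <nation> rule

# [x] in JSGF is an optional item, (a | b) is a choice between alternatives.
COMMAND = re.compile(
    r"(?:please )?"
    r"(?:help"
    r"|(?:make )?(?:a )?(?:plane|train|automobile) reservation"
    rf"|destroy (?:{CITY}|{NATION}))"
)


def in_grammar(utterance):
    """Return True if the whole utterance is covered by the <command> rule."""
    words = " ".join(utterance.lower().split())  # normalize case and spacing
    return COMMAND.fullmatch(words) is not None


if __name__ == "__main__":
    for said in ["please destroy Redmond", "make a train reservation",
                 "let's nuke New Jersey"]:
        print(said, "->", in_grammar(said))
```

The last utterance is rejected for the reason the comment gives: neither "let's" nor "nuke" appears anywhere in the grammar, so no path through the rule can cover it.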

  • Airlines use voice recognition for flight reservations and confirmations (something like this was actually one of the DARPA benchmark tasks). It works reasonably well. The long distance companies are using it as well.
    • I called up United's arrival/departure information line when I was in the USA. Sure, it's nice, but at least the tech that *they* are using has pretty far to go because it was having trouble understanding my New Zealand accent a lot of the time.
    • This is definitely not new. In fact, there was a company I worked for back in '97 that contracted with Northwest Airlines to do their website support. It was more like doing travel-agent work, and not real tech support, but we also had to do some work with the then-new automated telephone reservations system.

      As I seem to recall, it was using IBM's ViaVoice and interfaced with the reservations system. Since it was a pilot product, it was only offered to specific WorldPerks members (about a thousand). Any time that the recognition failed, it kicked them out to our support lines for a manual reservation. It did get better over time, though. Sitting at the call center alone at 2 a.m. gives you a pretty good feel for call volumes...

      And now Sprint PCS is going the same route with their support, as well. There's Claire, your Personal Digital Assistant. All you need to do is press *2, *3, or *4 on a Sprint PCS phone to get into it. Pretty lame, really, as it just hinders the process of getting actual support. But their process is rather unique in that you can say, "I want Customer Service," and it will forward you to the real support queue. Rather ineffectual, if you ask me. For a demo, you can go to either a local Sprint PCS store or your local Radio Shack (they both have phones available for demo calls, and remember that long-distance is free ;-).
  • Heh... (Score:1, Interesting)

    by $0 31337 ( 225572 )
    Anybody called Handspring for tech support lately?

    That's the good thing about a Handspring.. You have no need to call tech support :)
  • For more than a year now there has been a (beta-phase) phone-number where a voice recognition program tells you the best available train-connection between two cities, at a given time.

    It's nice to realize that they've made an attempt to recognize polite customers: words like "please" are ignored.
    • The great leap will be when words like "please" are not ignored but understood.

      I couldn't believe it when the article ended with

      "We're actually getting people saying thank you -- to this robot," he said.

      He didn't say whether his system answers this with "You're welcome!" The more that systems like these resemble normal human speech, the better. If I use pleases and thank yous, the system should be - I don't know - just a little bit nicer.

      If the system could detect the tone of the caller, that would be even better. A caller who seems frustrated could be transferred to a human. An angry caller could have a different tree of prompts.

      As hard as it must have been to talk to answering machines when they first appeared, it doesn't compare to talking with a computer. Ten years ago we had the technology to interact with our computers with voice, but most of us felt silly doing so in front of others. We still do, in certain cases. This is changing as more of us use these systems in public.

      The social barriers to voice human-computer interactions seem to be larger than the technological ones.

      On TuVox's director of worldwide customer relations calling his system "a robot": about 15 years ago I worked directory assistance for the local phone company. At this time they had the technology to let the computer read out the number once we had found and confirmed it. One woman, after confirming the address, added "And don't give me to the robot!" I think she actually pictured me passing the phone to a tin robot beside me. "[whirr] [click] The new number is 555-3111 [bzzzzzzzzt] [click] [whirr] [bzonk]"

      yo
  • Leading Edge NLSR (Score:3, Informative)

    by kelv ( 305876 ) on Wednesday February 13, 2002 @08:28AM (#2999330)
    (Warning: I worked for the following company during my undergraduate EE degree)

    For people interested in seeing how far NLSR (Natural Language Speech Recognition) can be pushed for specific applications, go and look at VeCommerce [vecommerce.com.au] and their demo clips. The betting system I helped build can take betting sentences of over 100 words with 96% accuracy. (Data from a live system with 1200 lines)

    Customers HATE DTMF-based systems; this sort of thing is the way of the future.

  • by TekkonKinkreet ( 237518 ) on Wednesday February 13, 2002 @08:33AM (#2999340) Homepage
    ...keeping the customer from costing you any money.

    CRM is *expensive*. Forrester Research did a study a while back on the average cost of handling customer calls by various means:

    Telephone: $33.00/incident
    Email: $9.99/incident
    Chat: $7.80/incident
    Message Boards: $4.57/incident
    Knowledge Base: $1.17/incident

    The technology of this article shifts a call from the top to the bottom of this list. They admit that the advance is not in AI or voice tech, but in making the experience "resemble a conversation". So at its best, this will still let grandma have *some* access to the information she could have had before from a live human. At its worst, it's a puppet show to distract us from the fact that we're not getting very good service.
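The arithmetic behind that "top to bottom of the list" claim can be worked through as a quick sketch. The per-incident figures are the ones quoted above; the monthly call volume is purely hypothetical, and whether an automated voice call really costs as little as a knowledge-base lookup is the poster's assumption, not something the figures establish.

```python
# Per-incident costs quoted above, stored in cents to avoid float rounding.
COST_CENTS = {
    "telephone": 3300,
    "email": 999,
    "chat": 780,
    "message boards": 457,
    "knowledge base": 117,
}


def savings_cents(deflected_incidents, src="telephone", dst="knowledge base"):
    """Saving from handling `deflected_incidents` via `dst` instead of `src`."""
    return deflected_incidents * (COST_CENTS[src] - COST_CENTS[dst])


def dollars(cents):
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100:,}.{cents % 100:02d}"


if __name__ == "__main__":
    print(dollars(savings_cents(1)))      # per incident: $31.83
    print(dollars(savings_cents(1000)))   # hypothetical 1,000 deflected calls: $31,830.00
```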
  • Tuvox system online:

    you: "I have a problem with my handspring treo"

    Tuvox: "I fail to see your problem ensign. It would appear you have selected an inferior technology."

    graspee

  • Bart: [watching Flanders] An ax. He's got an ax! I'll save you, Lisa!
    [tries to walk on his leg, falls back] Uh, I'll save you by
    calling the police. [dials 911]


    Voice: Hello, and welcome to the Springfield Police Department Resc-u-
    Fone[tm]. If you know the name of the felony being committed,
    press one. To choose from a list of felonies, press two. If you
    are being murdered or calling from a rotary phone, please stay on
    the line.


    Bart: [growls, punches some numbers]


    Voice: You have selected regicide. If you know the name of the king or
    queen being murdered, press one.



    Thanks, SNPP

  • Ay, 'tis the grand convergence [virtualentity.com] of TuVox [tuvox.com] and Tellme Networks [tellme.com] and all these other speech technologies [sourceforge.net] leading us inexorably onwards towards the Technological Singularity. [caltech.edu]

    Speech technology [scn.org] for Open Source Artificial Intelligence [sourceforge.net] is now at a critical point, because the free Open Source Robot AI Mind [scn.org] has become capable of immortally self-rejuvenating perpetual mentation [scn.org] and therefore any Linux maven with speech-tech know-how may vie for the distinction of hosting the longest-running Artificial Mind [sourceforge.net] and of equipping the AI with truly phonemic speech recognition and generation.

  • by nsanit ( 153392 ) on Wednesday February 13, 2002 @08:41AM (#2999357) Homepage
    I went to Seattle a few years ago, but my bags didn't. Outside of the Beast being based next door to Seattle, it is a wonderful city. I called the airline (United) and was asked to 'press or say' whichever number was to get an update for lost luggage.

    It then asked me to speak the destination city and the departure city, then asked for the claim number I got when I reported the bags and it would let me know that they'd still not found my luggage.

    This was 2 or 3 years ago and it worked pretty flawlessly, and I'm pretty sure the technology has come along since then too. There were times I had to repeat myself, but that's better than sitting on hold forever just to be told by the person on the other end, whose day, in their mind, is worse than yours, that you should stop worrying about it and get on with your life.

  • I've used things like TellMe, which played an ad everytime it didn't understand you

    Regardless of whether TellMe or something like this would be able to understand German, I still prefer the keyboard, which doesn't spam someone like me who speaks with a rather strong Bavarian accent ("Scheissglump vareckts, sacklzement, hoit dei Mai und moch hi...") that gets even worse when something annoys me ;-)
  • We've got a system on the Odeon cinemas ticket booking line in the UK. First, it asks you which cinema you would like to book the ticket at:

    Computer: Which cinema would you like to book tickets at?
    You: Kensington
    Computer: You chose Kensington. Say yes if this is correct.
    You: Yes
    Computer: Which cinema would you like to book tickets at? Please speak clearly.
    You: Kensington
    Computer: You chose Kensington. Say yes if this is correct.
    You: YES
    Computer: Which cinema would you like to book tickets at? Please speak clearly, or hold for an operator.
    You: Kensington
    Computer: You chose Kensington. Say yes if this is correct.
    You: FUCK OFF
    Computer: Kensington is correct.

    It can recognise hundreds of cinema names, but always has difficulty with yes.

    When voice recognition first came out on voicemail boxes, we'd derive great amusement from saying random stuff into the phone and seeing what number it would guess...
  • Sprint PCS has a similar feature. When you call customer services (pound-something), you get greeted by your "digital assistant!". She says to "interrupt me at any time." Still, it feels weird cutting off a very human-sounding voice with an authoritative, "tell me how many minutes I have left!". It's actually a high-quality and accurate system, in my experience.

    -Dan
    unixpunx.org - punks, computers, technology
  • ... is the Odeon Film Line. You get this message that says:

    "Welcome to the Odeon Film Line! To pick the cinema you want just say the name!"

    Which you do and, in my experience, it's got it right every single time. Including stuff like "Odeon Leicester Square", "Mezzanine", "Wimbledon" and "Manchester".

    From what I understand they use software by Vocalis [speechtml.com].

    • Thanks for the positive feedback on the system. We developed it here at Telephonetics in the UK. It actually uses the Nuance speech rec engine. If anybody else has any experience with the Odeon system (good or otherwise) I'd be very interested to hear it.
      • In reply to your comment, it's great on the cinema names, but often has difficulty confirming it afterwards with a simple "Yes or no".
        • The recognition should be improving substantially shortly, especially on yes/no!

          You may be interested to know that it will only ask you to confirm if either (a) you're calling from outside what is deemed the catchment area of the cinema, (b) if you're calling from a mobile or (c) if it wasn't confident on its interpretation of what you said.
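The three confirmation conditions listed above amount to a simple rule, which can be sketched as follows. This is only a guess at the shape of the logic, not Telephonetics' actual code: the `Call` fields and the 0.8 confidence threshold are hypothetical, and the comment does not say how "confident" is measured.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off; the real value is not stated


@dataclass
class Call:
    in_catchment_area: bool   # caller's number falls inside the cinema's catchment area
    from_mobile: bool         # call placed from a mobile phone
    confidence: float         # recognizer's confidence in the cinema name, 0.0 to 1.0


def needs_confirmation(call, threshold=CONFIDENCE_THRESHOLD):
    """Ask "is this correct?" only when one of conditions (a), (b) or (c) holds."""
    return (
        not call.in_catchment_area        # (a) outside the catchment area
        or call.from_mobile               # (b) calling from a mobile
        or call.confidence < threshold    # (c) low recognition confidence
    )


if __name__ == "__main__":
    print(needs_confirmation(Call(True, False, 0.95)))   # local landline, clear speech
    print(needs_confirmation(Call(True, True, 0.95)))    # same call from a mobile
```

Skipping the yes/no step for confident local landline calls would also sidestep the weak yes/no recognition reported elsewhere in this thread.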

  • I had to call UPS once to complain about one of those yellow sticky notes they leave, and it said "Please speak your tracking number." This number is like 10 digits long, both alpha and numeric. Sure enough, it got 100% the first time I said it!

    duane

    (Note, I still don't like them. The package I was complaining about had been left in a puddle near my garage, and the guy wrote "delivered at front door" on the slip.)

  • by acb ( 2797 ) on Wednesday February 13, 2002 @09:31AM (#2999474) Homepage
    A friend of mine (from Australia) went to the US a year or two ago, and found himself needing to call a service which used such a system. When he did, he found that it could not understand his accent; after three unsuccessful attempts at doing an "American" accent, he gave up.

    The moral of this story: make sure that there's a touch-tone menu to fall back on.
  • I'm not sure about the dig on TellMe; I've always found their services (both their direct public system, and their use in part of American Airlines's phone tree) quite well executed. The voice recognition has generally been good, even with background noise (the one failing I experienced was asking for the movie "Amelie", when "Ali" was also playing). I never got an ad when it didn't understand--what kind of brain-damaged company would intentionally aggravate an already annoyed customer? Moreover, I found the flow of the "conversation"--timing, appropriateness of response, integration of voice samples--to be excellent, which I think is key to making a system like this palatable. And I know first-hand that they have quality people in engineering.

    By contrast, I once called SprintPCS and ended up on a similar system, but it was terrible. The VR was flaky, and it did not degrade gracefully when it didn't understand, leaving me disoriented. I confirmed from a friend at TellMe that SprintPCS used someone else.

    I don't know anything about Tuvox, but I question whether they will have success against TellMe, which not only has good tech, but is very well backed. If they're betting on their "AI", they're probably dead as soon as people find out it sucks. If they're just trying to be a better TellMe, they have a challenge--but I hope they come out with a competing public service to get publicity!

  • SprintPCS Voice Dial (Score:2, Interesting)

    by dcocos ( 128532 )
    I've been using Sprint's voice dial service since it came out and it's pretty effective. I occasionally have to repeat myself, but I've uploaded my whole address book and it is very good at figuring out names.

    For those of you unfamiliar with the system it works like this.
    User :*
    SPCS :Ready
    User :Call John Smith at Home
    SPCS :Calling John Smith at Home Correct
    User :Yes

    Done

    It's super convenient when you are in the car or running through an airport and don't have the time to look down at the phone. The reason I'm impressed with it is because you don't have to "train it" to your voice.
    • The problem isn't the voice-dialing with Sprint PCS (the recognition domain is pretty limited, so number recognition is quite good in many VoRec systems now), but rather "Claire, your electronic customer service representative".

      In my experience (as recent as yesterday), Claire is both hard of hearing and pretty darn stupid. The most frustrating thing is that the entire system is designed to prevent you from ever getting to a real person, so if Claire can't help you, you're SOL. I did notice that after about a half-dozen failed attempts to go through Claire, I got a different answer (I was dialing 611 to try to get them to do the PRL update they've been unable to do since August), this time asking for the last 4 digits of the SSN of the account holder. I'm not sure if this was programmed, or if Claire just happened to go off-line at that time.

      I *really* hate companies that use IVR systems to *prevent* you from getting customer service. Sprint is probably the worst offender I've encountered at this, but then they have by far the worst customer service I've ever encountered in a service company. (Although Home Depot clearly takes the cake for worst customer service overall - they direct all complaints to "Ben Hill". Like Arlington Hewes [tpc.int], Mr. Hill does not exist - the phone is answered by a rotation of store assistant managers, who do not track or follow up on complaints. It's essentially a well-designed and deliberate bit-bucket for Home Depot's customer complaints. Once they have your money, they're not really interested in hearing if you have a problem - the height of poor customer service.)
  • Oops (Score:3, Funny)

    by Evanrude ( 21624 ) <david AT fattyco DOT org> on Wednesday February 13, 2002 @10:57AM (#2999877) Homepage Journal
    Well, the other day when I accidentally baked my Visor, I had to call their support line....
    • Was it in an oven?
      Seriously, I do wonder how much tech support they hand out... My Visor installed flawlessly from step one. I've even bounced it off the pavement once. Still chirping at me!
  • I've used TellMe's service quite a lot in the past. Driving directions, movie listings, and just generally wasting time on the phone. It is a great service. I even played around with VXML, where I came up against the greatest current limitation with non-speech-to-text voice recognition systems:

    They seem to be pretty much exclusively based on grammar files. Basically, you write out a grammar that lists all the possible things you think the person speaking would utter and then match them up to different branches in your system. Unfortunately, you can't easily take free-form speech and store it as anything other than a sound file. This makes it difficult to do something such as allow the user to speak a message to send as an e-mail. The VXML engines have a great deal of heuristics to handle differences in speech style and tone, but without the grammar, you pretty much need to go through voice profile training to get decent results.

    If anyone knows of kewl advances in this particular area, I'd love to hear them!
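The grammar-to-branch matching described above can be sketched as a lookup table plus a no-match fallback. Everything here is hypothetical: the phrases, the branch names, and the policy of handing off to an operator after two misses (which mirrors the "gibberish" trick reported earlier in the thread rather than any documented TellMe behaviour).

```python
# Hypothetical grammar: every phrase the designer expects, mapped to a branch.
GRAMMAR = {
    "driving directions": "directions",
    "directions": "directions",
    "movie listings": "movies",
    "movies": "movies",
}


def route(utterance):
    """Return the branch for a recognized phrase, or None if it is out of grammar."""
    key = " ".join(utterance.lower().split())
    return GRAMMAR.get(key)


def run_menu(utterances, max_misses=2):
    """Walk through what the caller says; escalate after repeated no-matches."""
    misses = 0
    for said in utterances:
        branch = route(said)
        if branch is not None:
            return branch
        misses += 1                  # out-of-grammar speech cannot be interpreted
        if misses >= max_misses:
            return "operator"        # give up and hand off to a human
    return "hangup"                  # caller went silent before matching anything


if __name__ == "__main__":
    print(run_menu(["Movie Listings"]))
    print(run_menu(["brr?", "ack!"]))
```

Anything outside the table simply fails to match, which is why free-form input such as a dictated e-mail does not fit this model.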
  • Check out www.1-800-555-TELL.com. It's a very cool application of vml stuff.
  • Is it just me? Did anyone else read this as 'TuxVox'?
  • by Dan Crash ( 22904 ) on Wednesday February 13, 2002 @12:20PM (#3000396) Journal
    I've been working with VXML, CallXML, and other voice-oriented IVR solutions for a while on a hobby basis, and I've been really frustrated that no workable open source VXML solution exists.

    SpeechWorks' OpenVXI [cmu.edu], originally promoted as an open source VXML interpreter, has turned out not to be a good one. SpeechWorks developers maintain the code and refuse to incorporate the patches and requests of the open source community, in favor of keeping OpenVXI tied to SpeechWorks products. The codebase could be forked, but it's really not worth investing the effort in such a brittle product tied to proprietary solutions.

    Bayonne [sourceforge.net], the GNU telephony server, is great and getting better all the time. It currently supports a strong scripting language for DTMF applications, and Bayonne's XML plugin structure and built-in support for multiple telephony cards make it the logical choice for open source VXML.

    All that's needed at this point is to finish integrating Bayonne with an open source Text-To-Speech engine (the most likely candidates are Flite [cmu.edu] or Festival [festvox.org]) and an Automatic Speech Recognition engine (in this case, Sphinx [cmu.edu]), and to write the XML plugin. But there is a shortage of coders with the skill and time to do this.

    I really think small business and the average Slashdotter could benefit from an open source VXML solution. Small businesses could create professional telephony apps that could make them much more competitive (from accepting credit cards securely over the phone to providing dedicated 24-hr support numbers for their products), while creative coders could use it for everything from Eliza-style chatbot answering machines to having your boxen call you up and describe a hack attempt as it's being made.

    I'd love to see a VXML-enabled Bayonne blow TellMe and others out of the water. If you're intrigued and you'd like to get involved, check out Bayonne's Sourceforge site [sourceforge.net] and sign up for the mailing list.

  • Call 1-800-PICK-UPS (1-800-742-5877) and select the option to track a package. Most likely, you'll be presented with a (very American-sounding) voice asking you to "Please say the tracking number now".

    Rattle off a legit UPS tracking number fairly rapidly (still understandably, though), and it repeats back to you what it thought it heard. Even calling from my car, with a handsfree car kit, it's never once gotten it wrong.

    Fairly impressive, if I may say so myself...
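
    For anyone wondering how 1-800-PICK-UPS turns into 1-800-742-5877, it is just the standard letter layout on a phone keypad. A minimal sketch (the function name is made up for illustration):

```python
# Standard letter groups printed on a telephone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def phoneword_to_digits(number):
    """Replace each letter with its keypad digit; leave digits and dashes alone."""
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in number.upper())


converted = phoneword_to_digits("1-800-PICK-UPS")
print(converted)                   # dashes stay where the phoneword had them
print(converted.replace("-", ""))  # digits only
```

    The dashes land where the letters had them (7425-877 rather than 742-5877), but the digits are the same ones given above.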
  • I do not remember if it was calling the phone company, or my car insurance or the local ticket master, but I've had the answering machine ask me to tell it what I wanted. (tickets, sales, I don't remember now)... I replied, thinking that it boded poorly for me actually getting where I wanted to go. Surely enough though it worked great. I've also a tendency to mumble my words, but it worked fine.
  • I work for Empirix.com programming test systems that test voice recognition/response.

    To write these tests you often record what the user would say to the system ("1017" for flight 1017), then play it back at the correct time in the menu. We like to get our customers to record the message so there is no question about how it sounds or is said. But sometimes we record the system itself. Often the system has trouble even understanding its own voice.

    It is also amazing how fast people who test these systems manually learn to speak so that the system understands them. Automated testing using recorded prompts makes a difference.

    We collect some of the prompts we get back from systems:
    http://performance.empirix.com/VoiceIndex/hear-right.htm?page=vpi_home&link=hear-right_ad
  • I've never used any voice recog system, but I read of someone using it with a word processor to dictate a memo. While doing so he greeted two co-workers "Hi, Nick and Ben" and the word processor dutifully wrote "Hi, naked men"
  • They're great from landlines, but as soon as you go on the road to use your mobile phone, it doesn't work that well.

    Road noise from the car, the cell phone cutting in and out, etc. all screw up the experience. The human ear (and brain) is great at piecing together something meaningful heard when someone is cutting in and out, but unfortunately, voice interfaces aren't there yet.

    Until mobile phones become as reliable as landlines, I think voice interfaces will be handicapped. It's like trying to type SMS messages on a number keypad....
  • when i graduated college (a year ago) i worked for a company that was focused on providing wireless and converged applications. it didn't take long to realize that the technology for creating these types of applications was readily available and not too difficult to use.

    the primary shortcoming in developing these types of applications is access to the equipment needed. we used voxeo for voice applications, and simplewire for sms services. (we had a couple of other suppliers for various other services, but those are the two that i remember.) the applications themselves are very simple to write. voxeo (and i think most of their competitors) all use voiceXML. any competent html author can learn how to write an interactive voice application without too much trouble - if they have an agreement with voxeo or somebody else (tellMe and beVocal come to mind). most of these have free developer programs, but to actually deploy an application on them for commercial use costs money.

    in the end we got out of the application development because the engineers spent all of our time writing one-off demos that never got used, and our sales guy spent all of his time setting up "strategic partners" (translated: we do work for them so we can say we have clients, but they don't give us any money) and trying to sell to other companies that were as broke as we were. we changed our focus to create a platform to develop the applications we were spending all of our time writing. we developed a visual studio-like ide for converged applications, with wizards, template applications, tutorials, etc. more importantly, we made access to all of our secondary providers transparent. in one environment you could send sms messages through simplewire, run voice applications through voxeo, deliver wap and pqa applications, and have them all pass information from one to another more or less transparently.

    for a while we were giving out free logins for people to sign up and write applications and give us feedback on the system. we almost posted it here on slashdot once upon a time, but at the time, we weren't sure our system would handle that much traffic (or that voxeo wouldn't start charging us more for that much traffic). that, and there were a few security concerns in our system that we weren't ready to expose (you could embed php, perl, or c code in your application inline. very handy to have for trusted developers, but certainly not something you want to hand out to everyone...)

    unfortunately at this point our sales person still hadn't sold anything, despite the fact that everyone who saw our environment was drooling over it. we fired him when he (literally) told our ceo "i couldn't give this away". plus we were almost out of money, and (i'll still never understand why) he was the highest paid person in our company. on top of this, we were unable to get any further investment because (1) we didn't have any paying clients and (2) our ceo never actually demonstrated any of our working technology to the investors, and was too busy talking about how zondigo (our company) leveraged adMonitor technology - adMonitor was written by our ceo, frank addante, the former ceo of l90 (check out l90 on fuckedcompany for a good laugh). half of our seed round of funding went towards a licence to use admonitor technology. of course the only thing we ever used it for was tracking hits to our corporate website. and even that was at the insistence of frank, as that didn't tell us anything we couldn't have found from the server logs.

    anyway, around june, the company ran out of cash and shut down. all of our technology was purchased by a company named voxicom that is using it to develop a voice recognition based email service (similar to the wildfire somebody else mentioned somewhere)

    anyway, the point of this gargantuan post is that

    1) the technology to create these applications is readily available, and usable by anyone competent at writing html

    2) the technology to create these applications has been readily available for at least a year now

    3) the technology to create these applications is only available through a few select vendors due to backend hardware and software requirements

    4) my old ceo and head of sales are both idiots.
  • by shivan ( 12148 )
    just for those of us who don't like to give out our info to read an article

    the article without registration [nytimes.com]
  • Hello from TuVox (Score:2, Informative)

    by TuVox ( 558625 )
    Hello everyone. I'm Ashok, the CTO and cofounder of TuVox - the one with the Frankenstein green skin in the New York Times article ;-) It's really great to see fellow slashdotters interested in our technology. Some comments/thoughts/observations to offer. I'll try and add notes over the course of today.

    We provide automated technical support using speech recognition as an underlying modality. Speech-based technical support is a very different kind of problem than a more conventional speech application. Most speech applications are "few turns" and low ambiguity. It only takes a few interactions with the system to get a train schedule or a stock quote, and there is little ambiguity - you either want to go to City A or City B. Companies that provide few-turn, low-ambiguity applications spend literally tens of thousands of dollars (or even hundreds of thousands of dollars) getting each turn to be as accurate as possible. The content of such an application rarely changes, and if it does, the rollout/testing period can be very long. A final note - these callers generally use the system frequently (i.e. calling for a stock quote). Because of this, callers are willing to be educated on the commands/VUI to drive the system.

    We, on the other hand, have to deal with long conversations (10 minutes), with users fumbling with their equipment, confused and angry, etc. You can imagine a call: System: "How can we help" - Caller: "My $#@% machine doesn't work".... We have to get the caller to their answer, in spite of the fact that they don't know what the answer is. Additionally, this is probably the first time the user has ever used the system. Finally, we have to make literally thousands of answers available in a conversational style, the day the product ships. That's when the highest call volume occurs (in the few weeks after the product ships). Oh...by the way - we use real humans for the voices, not text-to-speech. That makes the production schedule even more interesting!

    Callers can leave us messages about their experience. It's really heartwarming (in the words of one of our customers) to hear what callers say - we got a call a few nights ago (at 1 in the morning) where a caller said he was glad that he was able to solve his battery recharging problem, because he thought he was going to lose all his data. The part of the New York Times article talking about people saying thank you occurs very frequently. They say thank you in so many places we have started to put thank you responses into the system. Callers don't have to wait. Callers get answers at any time. Callers don't get rude, untrained agents abusing them (everyone at TuVox has had a horrible experience with an ISP tech support agent)! Our customers like that proposition!

    Last point - It's not a choice between a live agent and automated support. We're offering the alternative to no agent at all. People think we're replacing agents. We're not. Our technology is designed to work with and support a tech support agent. Right now, our initial rollouts with customers are after hours because that's where the call volume is lowest and where we can fix any unforeseen problems. But there are even more interesting technologies in our pipeline.

    Kindest Regards, Ashok
  • I called up the other day and you just speak your card number and/or say "customer support" and you are dispatched. Quite refreshing from hitting the sometimes non-responsive buttons on my phone.
  • There is a very old (since '95 at least) example of a private Interactive Voice Response (IVR) system that you can check out at: 1-800-WILDFIR. I have nothing to do with this group, but have encountered it from time to time.
  • The FedEx system works amazingly well, from my experience. It's a heluvalot easier to read my department's 3000-digit account number right off the card rather than typing it into my phone. I'm used to a keyboard's numpad, which is upside-down from a phone, so I inevitably mistype the account number...

    I've never had the system misunderstand me, unless I obviously mumbled into the phone. I guess my only complaint is that it's a tad slow when processing what you say, but otherwise it's pretty convenient.
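
    The "upside-down" numpad problem is easy to show. A rough sketch of what comes out when numpad muscle memory is used on a phone keypad; the function name is invented for the example, and it assumes 0 stays put, which glosses over the fact that the two layouts place it a little differently.

```python
# Rows from top to bottom on each device.
PHONE_ROWS = ["123", "456", "789"]
NUMPAD_ROWS = ["789", "456", "123"]

# Map the digit you meant (its numpad position) to the digit that sits
# at the same position on a phone keypad.
POSITION_SWAP = {
    numpad_digit: phone_digit
    for phone_row, numpad_row in zip(PHONE_ROWS, NUMPAD_ROWS)
    for phone_digit, numpad_digit in zip(phone_row, numpad_row)
}


def numpad_habit_to_phone(digits):
    """Digits actually pressed on a phone when fingers follow numpad positions."""
    return "".join(POSITION_SWAP.get(d, d) for d in digits)


print(numpad_habit_to_phone("7410"))
```

    Only the middle row (4, 5, 6) and 0 come out right; the top and bottom rows trade places, so a long account number gets mangled quickly.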
  • (Full disclosure: I have worked with most of these companies).

    Telephony-based voice-recognition is going to be the Next Great Thing (tm). The main companies that are involved in this stuff are SpeechWorks [speechworks.com], Nuance [nuance.com] (both work on the main speech recognition/software stuff), HeyAnita [heyanita.com] (which works with Sprint [sprint.com]), and TellMe [tellme.com].
    • There are more to the list, like BeVocal (BeVocal.com) and Telera (telera.com). TuVox seems to have a well-defined market space: tech support. I have tried many such applications; it seems to be a great usage of human talent that takes advantage of tech to save time and money for call centers. -Wiselink
  • Could /. please stop providing free PR to random companies? Thanks.
  • I always just say "Agent! Human! Agent!" and then hit zero repeatedly when these come up. They are NEVER but NEVER accurate enough, and they are a huge waste of time. I remember when Moviefone tried them for a little while -- it was a disaster as the thing was completely unusable.

    Sorry, but there is no substitute for a human at the other end of the line. Charge more if you have to, but answer the phone!
