AI Input Devices XBox (Games) Technology

The Uncanny Valley of Voice Recognition 83

An anonymous reader writes: We've often seen the term "uncanny valley" applied to the field of robotics — it's easy to get unsettled when robots act close to being human, yet fail completely in a few key ways. GitHub Engineer Zach Holman writes that we've now reached uncanny valley territory in speech recognition as well, though the results are more frustrating than they are disturbing. He says, "Part of this frustration is the user interface itself is less standardized than the desktop or mobile device UI you're used to. Even the basic terminology can feel pretty inconsistent if you're jumping back and forth between platforms.

Siri aims to be completely conversational: Do you think the freshman Congressman from California's Twelfth deserved to sit on HUAC, and how did that impact his future relationship with J. Edgar? Xbox One is basically an oral command line interface, of the form: Xbox (direct object). ...it's these inconsistencies that are frustrating as you jump back and forth between devices. And we're only going to scale this up."

  • by turkeydance ( 1266624 ) on Monday February 09, 2015 @10:07PM (#49022983)
    i thought so.
  • by msobkow ( 48369 ) on Monday February 09, 2015 @10:07PM (#49022987) Homepage Journal

    I fail to see how the "inconsistency" of speech recognition UIs is any more earth-shattering than the inconsistency between graphical UIs. People learn to use what they have, no more, no less. Anyone who "expects" device Y to behave like device X when they're from different vendors is a fool.

    Hell, even Android devices aren't consistent between vendors, and they start off with the same core code!

    • by Shadow of Eternity ( 795165 ) on Monday February 09, 2015 @10:36PM (#49023087)

      Exactly. The problem isn't frustration with different interface schemes, the problem is that they don't fucking work. I use several different programs with buttons and menus in different arrangements, but when I click a button the button is bloody well clicked regardless of where exactly it is. Voice recognition on the other hand is simply too unreliable.

      • by Anonymous Coward

        I think it's the same basic problem as handwriting recognition. The algorithms are still too primitive to handle the way people actually communicate.

        If you talk like a computer with overly enunciated words and stick to common words the likelihood of the computer getting it right is substantially better than if you talk the way that people actually talk. I overly enunciate my Mandarin because I don't yet trust myself to make the tones if I don't, and the computer usually picks up exactly what I want most of

        • Exactly. Clicking a button is clicking a button no matter how someone moves the mouse or presses on it but even if we had 100% perfect voice recognition we'd still be stuck when it comes time to do something with that. There's an enormous number of ways to say the same thing in most languages, and often many phrases that can mean entirely different things depending on context.

      • but when I click a button the button is bloody well clicked

        Looks like you don't have much experience with cheap touch screens.

        • by jc42 ( 318812 ) on Tuesday February 10, 2015 @09:39AM (#49025535) Homepage Journal

          but when I click a button the button is bloody well clicked

          Looks like you don't have much experience with cheap touch screens.

          Heh. You obviously haven't worked with any of the more expensive ones. I have a small collection of different portable gadgets for web testing, and that statement about buttons definitely isn't true for the various Apple tablets or phones. For example, there's a little "x" icon whose function is to close the tab/window. I've learned to just start tapping it about twice per second, and maybe by the 3rd or 4th or 6th or 10th tap, it'll close.

          Of course, the little monster might know very well that I'm tapping it, but wants to see how serious I am about it.

          Of course, Apple's gadgets aren't the only ones like this. They're just one of the worst of a bad lot. And often it's a good idea to not tap too fast, because when the window finally closes, it usually gets replaced with another that'll do something totally unexpected when you tap it in that newly-exposed spot.

        • I said click for a reason.

      • by jez9999 ( 618189 )

        I just do not think you have had enough experiment with. delete that, delete that, dear mom let's set so double the killer delete select all

    • There's a standard for speech recognition already, as long as you're talking about "intelligent agents", which the Xbox One is certainly not: Natural English (or insert your language here) conversation. The gold standard, no pun intended, should be to phrase queries or commands in such a manner that any reasonably intelligent native speaker could easily understand your intent, and the computer should perform those tasks or retrieve that information for you.

      At this point, the only reason there's jarring inc

    • by Rideak ( 180158 )

      As far as voice commands go, people are inconsistent too! Someone from the Deep South is going to have a bit of trouble talking to someone from the UK, especially when using slang and idioms. People from different age groups say things differently as well!

      Hell, even when people are from the same area and are same age they'll need to hear something more than once before they can make sense of what's being said.

      It goes well beyond just accents though. phrasing, tone, cadence, context and a bunch of other thin

      • by ihtoit ( 3393327 )

        hah, I just watched "Expendables 3" again, and came across this bit:

        Christmas: "Status of enemy?"
        Drummer (over satphone): "What fuckin' language is that??"
        Ross (slurred): "What's the status of the enemy?"
        Drummer: "Local mostly..."

        made I chuckle.

    • Anyone who "expects" device Y to behave like device X when they're from different vendors is a fool.

      Not half as much of a fool as anyone who forgets that human beings aren't a collection of logic gates, however much we like to consider ourselves rational agents. Human beings generalise, which makes it difficult to remember whether the application you're using uses shift-ctrl-z or ctrl-y for redo... which is why most programmers have standardised on shift-ctrl-z.

      Penny Arcade's Extra Credits did a video a couple of years back on why Microsoft Kinect was the uncanny valley of input devices. (Kinect Disconnect [penny-arcade.com])

    • Ugh. Creating reminders with Siri sucks, because I regularly set a reminder for two hours in the future, and saying "Remind me in two hours to..." generally results in Siri decoding "Remind me into to...", which produces an untimed reminder starting with "into to"

  • That is the problem: human language is very ambiguous and context-sensitive, which is the whole reason we invented programming languages instead of trying to express it in English. Either you limit yourself to a set of simple unambiguous commands or you try to parse what we're really trying to say, which is like giving the computer the business requirements document and telling it to program itself. Fortunately for our job security that "valley" won't be crossed any time soon, people imagine it'll be like

    • by lucm ( 889690 )

      I guess we're making advances on answering trivia questions and adding appointments to the calendar, but it's not exactly ready to hold a conversation.

      It's a good thing. If I have to start holding a conversation with my computer to get it to manage my calendar it will become higher maintenance than my secretary, who only needs a cheap gift basket on Secretary Day and a small smack on the butt when she remembers the extra espresso shot in my latte.

    • Even if voice recognition reaches the quality of Star Trek, we still don't have the AI quality to back it up. Sure your PC can turn your speech to text just as well as the Enterprise computer can, but as we have seen with Cleverbot and the likes, it's still just conversationally a complete idiot!
  • I find Siri very annoying. It has a few tricks and tries to act cute, but its cuteness means that it gives the wrong answer
    half the time. For instance, a simple question like "Can you get chickenpox from chickens?" gets a reply of "Who, me?"
    Any human can easily understand that this question isn't directly addressed to them, and Google voice
    search, not trying to have a persona of its own, is smart enough to just do a search for an answer it doesn't know instead
    of being a smart aleck. I've a

    • by Anonymous Coward
      I didn't understand your question at all. Who is the "you" to whom you are referring? Obviously Siri can't get chickenpox from a chicken; she's a piece of software. Next time, ask proper questions.
      • Using an imprecise mechanism like language requires verification

        If the 'user' wants to deride that verification, then they will get the same response as any ass-hat that demands instant response to ambiguous statements

        Going beyond the uncanny valley will require both conversation and 'training' to the individual, just like any working relationship with a human

      • My kids ask me questions like this all the time. Most people with normal intelligence
        realize that the "you" should really be replaced with "a person", as it refers
        to a generic you, not a specific you. For many of my kids' questions, I had
        gotten used to just asking Google before switching to an iPhone last month and
        quickly discovered that Siri tried to be a smart aleck instead of just doing a search.
        On a random side note, while on my android, my kids always used to ask me if I
        was talking to siri e

    • Ummm, yeah,

      Just asked Siri on my iPod "Can you get chickenpox from chickens" and all it did was come up with a list of ~15 websites, the top being WebMD, as well as ~15 images of chickenpox rashes.

      So, tl;dr version, pretty much the same results as using Google voice search in my GNote2.

      • Ummm, yeah,

        Just asked Siri on my iPod "Can you get chickenpox from chickens" and all it did was come up with a list of ~15 websites, the top being WebMD, as well as ~15 images of chickenpox rashes.

        So, tl;dr version, pretty much the same results as using Google voice search in my GNote2.

        Not sure how that works. I'm using a one-month-old iPhone 6, so maybe Siri varies some from platform to platform.

  • Variety is different from the Uncanny Valley.

    • Specifically, if this was an Uncanny Valley then people would prefer lower quality voice recognition.

  • by bipbop ( 1144919 ) on Monday February 09, 2015 @10:49PM (#49023135)

    Do you think the freshman Congressman from California's Twelfth deserved to sit on HUAC, and how did that impact his future relationship with J. Edgar?

    It's hard to imagine anyone who's actually used Siri thinking that question could get a useful answer. Siri can't understand even far more basic English. It's not much more advanced than Dr. Sbaitso.

    • And why wouldn’t you just say “Nixon” instead of trying to trip up the parser?

    • "Siri, where is the nearest gas station" is too hard half the time. The only real use for Siri was on long trips - we would give the phone to my toddler-age son and let him ask Siri things and then argue with her about her nonsense answers for miles and miles LOL.
  • by R3d M3rcury ( 871886 ) on Monday February 09, 2015 @11:08PM (#49023175) Journal

    As I understand it, the "Uncanny Valley" refers to things that are very close to human behavior--close enough that the mind shifts from this being an imperfect representation of a human to being an imperfect human.

    Personally, I'm not sure there would really be an issue with "uncanny valley" in regards to speech recognition. It's good if it recognizes what you're saying. It's bad if it doesn't. There isn't really a middle ground where it's off in a way you can't really identify, which is where "uncanny valley" comes from.

    What he seems to be talking about is the "personification" of "digital assistants" like Siri and Alexa (Amazon Echo) which will eventually create an "uncanny valley." But I'm not sure that it's really that big of an issue. Just because I call something by name doesn't mean I expect it to behave in a human fashion. I don't get frustrated with my dog when I say, "Fido, change the oil in my car" and the dog just lies there and licks his balls, so I don't expect I'll ever get that frustrated because Siri can't tell me what time the sun will set next Tuesday--or, if I do, my frustration will be aimed at the people at Apple who believe that sunrise and sunset is part of the weather.

    Siri and Alexa have a long way to go before someone would mistake them for humans.

    • I can kind of see what he means, although I think the comparison with the uncanny valley is a bit weak.

      I've taken to using Google Now's voice commands to set timers while I'm cooking, so something like "Ok Google, set a timer for 20 minutes". I don't have to touch my phone and it works brilliantly even in the noisy environments of a kitchen.

      I've gotten used to talking to it in a very naturalistic way, which is where the problems occasionally crop up, and when they do they can be quite jarring.

      A good example was the last time I asked it to set a timer for "an hour and a half", which Now interpreted as 1:00:30s, i.e. an hour and a half *minute*.

      The jarring effect is at this edge where we feel like the speech recognition system is understanding what we say, but really it's just trying to use lots of different rules and patterns that have been coded in. If you happen to just fall outside of one of those rules it fails completely, and it can seem very arbitrary.
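
      The "hour and a half" failure above can be reproduced with a few hand-coded rules. This is only a minimal sketch under stated assumptions: the word table, the regular expression and both function names are hypothetical, and nothing here reflects how Google Now is actually implemented. The naive rule hard-codes "and a half" as half a minute; the second rule ties it to the unit that was just spoken.

```python
import re

# Hypothetical vocabulary; a real assistant's grammar would be far larger.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "twenty": 20}
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
# Matches "<count> <unit>" pairs such as "20 minutes" or "an hour".
PAIR = re.compile(r"\b(\d+|[a-z]+) (second|minute|hour)s?\b")
HALF = re.compile(r"\band a half\b")


def _count(token):
    """Turn '20' or 'an' into a number; unknown words give None."""
    return int(token) if token.isdigit() else NUMBER_WORDS.get(token)


def naive_seconds(phrase):
    """Rule set that assumes 'and a half' always means half a minute."""
    text = phrase.lower()
    total = 0
    for count, unit in PAIR.findall(text):
        n = _count(count)
        if n is not None:
            total += n * UNIT_SECONDS[unit]
    if HALF.search(text):
        total += 30  # the arbitrary guess that produces 1:00:30
    return total


def context_seconds(phrase):
    """Same rules, but 'and a half' means half of the last unit spoken."""
    text = phrase.lower()
    total, last_unit = 0, None
    for count, unit in PAIR.findall(text):
        n = _count(count)
        if n is not None:
            total += n * UNIT_SECONDS[unit]
            last_unit = unit
    if last_unit and HALF.search(text):
        total += UNIT_SECONDS[last_unit] // 2
    return total


def clock(seconds):
    """Format seconds as h:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    for phrase in ("set a timer for 20 minutes", "an hour and a half"):
        print(phrase, clock(naive_seconds(phrase)), clock(context_seconds(phrase)))
```

      Both rule sets agree on "20 minutes"; they only diverge on the phrasing that falls just outside what the naive rule anticipated, which is why the failure feels so arbitrary.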

      • by hitmark ( 640295 )

        Sounds like the problem that has haunted overly "smart" user interfaces since day one, as their smarts invariably fail to account for all the variables and thus fail exactly when the user is at the most irritable (hello Clippy).

        To me a UI works better when held static rather than trying to second guess the user. Then the user applies their "smarts" to integrate the UI into their tasks.

    • by hitmark ( 640295 )

      Indeed. Uncanny valley (a questionable, or perhaps cultural, phenomenon at best) is about animatronics getting so close to human likeness that we take them for being severely ill or corpse-like, and thus setting off various safety related instincts.

  • I actually find it a bit funny how big of a deal the uncanny valley still is. But maybe the low-point of the valley is dependent on the person, and I suspect people that have grown up with computers and video games are far less creeped out by it.

    For me, the low-point on the curve was from some of the characters in late 90's-early 2000's video games. Think Ocarina of Time or Deus Ex. Once it got past that, I was perfectly comfortable.

    As for voice, hell, I could sleep soundly with HAL 9000, GLaDOS, or
  • Comcast is trying this. How badly will it fail for them?

  • by Anonymous Coward

    The term "Uncanny Valley" has nothing to do with Pixar nor computer animation - it was originated by Masahiro Mori long before and is related to robotics.

    • The term "Uncanny Valley" has nothing to do with Pixar nor computer animation - it was originated by Masahiro Mori long before and is related to robotics.

      The part about Campbells' Soup didn't give you much of a jump on things, did it?

  • What gets me angry is when a voice command that Google understood perfectly clearly a week prior, in my car, with the radio playing and the fan running, is one it refuses to understand under any circumstances this week. It's great when you're driving and all of a sudden a command that was working fine dumps you to a search and you have to play "try to click three times while driving at speed in Twin Cities rush hour traffic" for something that used to work.

    I don't trust voice commands to work when I need
  • I find it hard to believe that there's an uncanny valley in voice recognition.

    Did you mean voice synthesis?

    • No, it's a fair view. The uncanny valley is all about intolerance of diversion from the expected norm. When VR was all stilted commands, all users quickly became accustomed to it (even if they didn't like it). The problem with Siri is that at first use, we're not expected to treat it like a formal system -- we're encouraged to interact with it in an unconstrained way... yet it doesn't respond to that. It is "broken human" rather than "stupid machine".
  • Me: Do you think the freshman Congressman from California's Twelfth deserved to sit on HUAC, and how did that impact his future relationship with J. Edgar?

    Siri: I think, therefore I am. But let's not put Descartes before the horse.

    I have a hard time believing that Siri knows about this Slashdot post yet (it will...) but that answer is still highly (uncannily?!) appropriate to the original article...

    • And its parsing probably stopped at "Do you think" and hit the canned response.
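
      That guess is easy to picture as code. The following is a minimal sketch, assuming a hypothetical table of canned replies keyed by the opening words of the utterance; the table, the `respond` function and the fallback message are illustrative only and say nothing about Siri's real internals.

```python
# Hypothetical canned replies, keyed by the phrase an utterance opens with.
CANNED_REPLIES = {
    "do you think": "I think, therefore I am. But let's not put Descartes before the horse.",
}


def respond(utterance):
    """Return a canned reply if the utterance opens with a known phrase.

    Everything after the matched prefix is ignored, which is the
    behaviour being guessed at above.
    """
    text = utterance.lower().strip()
    for prefix, reply in CANNED_REPLIES.items():
        if text.startswith(prefix):
            return reply
    # Assumed fallback: hand the whole utterance to a web search.
    return f"Searching the web for: {utterance}"


if __name__ == "__main__":
    print(respond("Do you think the freshman Congressman from California's "
                  "Twelfth deserved to sit on HUAC?"))
    print(respond("Can you get chickenpox from chickens?"))
```

      With a rule like this, the rest of the question never reaches the parser at all, so no amount of rephrasing the HUAC part would change the answer.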

      • by Dahamma ( 304068 )

        Could be. But then again, if you are starting a question to an automated information service with "Do you think...", you deserve whatever you get :)

        [I have to say my favorite horribly bad Siri response so far is to "Siri, where is the closest bagel shop?"]
