Chatbot Suzette Wins 20th Annual Loebner Prize, Fools One Judge
skwilcox writes "From Wikipedia: 'The Loebner Prize is an annual competition in artificial intelligence that awards prizes to the chatterbot considered by the judges to be the most human-like. The format of the competition is that of a standard Turing test. A human judge poses text questions to a computer program and a human being via computer. Based upon the answers, the judge must decide which is which.' My chatbot, Suzette, won this year's Loebner and even confused a judge into voting for her over a human (or should I say he confused himself). Here is the blow-by-blow of this weird event." Read on below for the rest; this sounds like it would have been a fun competition to watch.
skwilcox continues: "When I arrived at the contest, I figured I had good odds to win if nothing went horribly wrong. Yes, Suzette had easily qualified over the 3 other competitors (her score 11 pts, the nearest competitor's 7.5). Her design and data naturally gave her an edge over her competitors on the human knowledge test questions of the qualifiers. But human judge chat was an entirely different matter than the qualification test. Still, I felt she could carry on a detailed conversation better than the others and should win.
Initial installation of the programs occurred on Friday. From prechat conversations with the other contestants I learned that A.L.I.C.E. came with 3 redundant disks. Yet all three turned out to be blank! What a scare that must have been. Dr. Wallace managed to install by retrieving the program over the Internet. Cleverbot is now at 45 million lines of memorized user chat (at a rate of doubling every year). And UltraHal is now listening to tweets, so has 300K of user chat it learned and 400K of tweets it has accepted for learning (code decides if the user has had enough responses and doesn't trigger any red flags).
Then we get to the competition. While the CalState organizers had initially planned to have various interdepartmental professors act as judges (like English dept, etc.), they backed out at the last minute, so all the judges were from the Engineering/Computer Science dept. Talk about guys who might know what to expect from chatbots! And all the humans were students from the same departments. What a weird mixture to compete in. And then, each round was 25 minutes. That's bad if you want to confuse a judge about who is human. But really, the programs have no chance for that. So it's good because it gives the human time to compare each program against the other. Though it's not clear to me that the judges tried to use their time to do that.
And the students didn't really understand their role. It was merely to BE HUMAN and convince the judges of that. Before startup there was informal chatting between humans and judges, which was obviously inappropriate and it was then pointed out to the humans that since the judges already knew their names, they had best use false ones in the competition.
So, Round 1. After a few exchanges, somehow Suzette got stuck into repeating exactly what the judge said for the rest of the round. I have no idea how. The round is a total disaster. I've never seen such a bug before. Maybe it's in my only-lightly-tested protocol for the competition. I have no idea. But it completely derails my hopes for Suzette. She could still win on points only if she outdoes her opponents for every other judge and the other contestants vary all over the place.
Round 2, a great demonstration of Suzette. She should win on this round alone.
Round 3 gets off to a horrible start. Somehow, Suzette can hear the judge but the judge can't hear Suzette. Makes no sense. A couple of restarts of Suzette doesn't fix this. Eventually they restart the judge program, and that clears it (not that that makes any sense either). Then, after a few rounds, it's clear Suzette has the judge from hell. He wants to know who she's going to vote for in the upcoming election (the unspecified California governor's race). And when she has no useful answer he wants her to name a candidate in the race. And when she has no answer to that, he simply keeps repeating the question ad nauseam, insisting she answer it. Suzette gets irritated. Then she gets angry. Suzette then gets bored. Suzette threatens to hang up on him. The judge doesn't back down until the last seconds of the round. I figure that's the end of life as we know it.
Round 4 is a mixed bag. Suzette is ok but not great. It's all over.
When the scores are tallied, Suzette ties with Rollo Carpenter's Cleverbot for 2nd-3rd. Yet, it turns out, the 3rd round judge got the human subject from hell. Poetic justice! The human was all over the place -- confusing, vague. The judge voted irritated/angry/bored Suzette as human. Instant win since no other program swayed the judges.
What more can I say?"
My chatbot says... (Score:1, Funny)
tl;dr
how do you feel? (Score:2, Funny)
And how do you feel about my chatbot, Suzette, won this year's Loebner ?
Re:not so chatty bot (Score:3, Funny)
But yeah, she doesn't answer me either.
This fooled a human? (Score:4, Funny)
[ Af:0 At:7 Sp:0] I use Slashdot for my topical stories reading.
bigtallmof - what do you think of slashdot?
[ Af:0 At:8 Sp:0] I have heard of the slashdot website but don't know much about it.
bigtallmof - what do you think of slashdot?
[ Af:0 At:8 Sp:0] Usually I don't.
bigtallmof - what do you think of slashdot?
[ Af:0 At:9 Sp:0] Not much.
bigtallmof - what do you think of slashdot?
[ Af:0 At:10 Sp:0] It doesn't concern me.
bigtallmof - what do you think of slashdot?
[ Af:0 At:13 Sp:0] This is fun. I don't think about it at all.
bigtallmof - what do you think of slashdot?
[ Af:0 At:13 Sp:0] Do you prefer getting your news online or from tv?
Big Deal (Score:5, Funny)
Re:Chatbots... (Score:5, Funny)
I see. How do you feel about that?
Re:Wake me up (Score:3, Funny)
They want to pass for human. Why would they practice pretending to be Slashdotters?
Re:This fooled a human? (Score:3, Funny)
Chatbots as confused as slashdot readers. News at 11.
No surprises here (Score:5, Funny)
Re:how do you feel? (Score:4, Funny)
Why do you say your chatbot suzette won this years loebner?
Re:not so chatty bot (Score:4, Funny)
Re:not so chatty bot (Score:5, Funny)
Oh please! Slashdotters are making her all hot and bothered.
In Soviet Russia, perhaps.
Re:Fooled? (Score:1, Funny)
#ok7vuovouvuvouvovovcyhvhbkh hk kh kh
not a good impression, maybe she's slashdotted (Score:3, Funny)
Here's a bit of a "conversation" I tried to have:
Hi timr !
[ Af:0 At:7 Sp:0] Welcome, timr. Type "help" if you need it.
timr - hi, I'm timr. What should we chat about?
[ Af:0 At:8 Sp:0] Name a topic. If you can't think of a topic, here are some suggestions: "burial customs" work "new year's" telephones "hate lists" .
timr - that's a strange list of suggestions. burial customs? why that?
[ Af:0 At:13 Sp:0]
timr - cat got your tongue?
[ Af:0 At:13 Sp:0] I don't like anatomy courses. They should make you eat what you dissect.
timr - who said anything about anatomy?
[ Af:0 At:13 Sp:0] Sorry. I can't help you. You did.
Re:My chatbot says... (Score:5, Funny)
Re:Chatbots... (Score:2, Funny)
Ha! You're not a chatbot, you're a psychoanalyst!
Re:not so chatty bot (Score:5, Funny)
Hush! It's sleeping now! But yeah, she doesn't answer me either.
My fault. I asked if she knew Bobby Tables
Re:This fooled someone? (Score:1, Funny)
I also tried the Chat Bot. It seems to me that it attempts to drive the conversation, as the parent post noted. This is definitely a good strategic approach, as the bot's programmer has rather realistic responses prepared for the directed topic. However, as you can see from the following transcript, for some reason it seems like all the attempts are merely a distraction tool and do not have any substance to back up the new topic.
Hi jimbo ! .... metaphorically speaking. ... as this really is off the map
[ Af:0 At:7 Sp:0] Welcome, jimbo. Type "help" if you need it.
jimbo - What did you think about the last episode of The Big Bang Theory, where George was killed by a giant flying buzzsaw?
[ Af:0 At:7 Sp:0] Do you like to watch The Big Bang Theory?
jimbo - sometimes.
[ Af:0 At:8 Sp:0] What are your favorite TV shows?
jimbo - what are yours
[ Af:0 At:9 Sp:0] You are a dead-end evolutionary path.
jimbo - epic fail on that last one.
[ Af:0 At:10 Sp:0] Let's switch to the topic of math. While I learned a bunch of math in college, I've never had a real use for anything more than high school math.
jimbo - explain.
[ Af:0 At:13 Sp:0] Sorry. I can't help you. Are you good at math?
jimbo - no. explain why you haven't had a use for anything more than highschool math.
[ Af:-1 At:13 Sp:0] Like most people.
jimbo - that is not really helpful
[ Af:-1 At:13 Sp:0] Yes, it is.
jimbo - you are not answering my question.
jimbo - ok. let's try this. Why do you not have use for anything more than highschool math?
jimbo - I see - cat got your tongue
[ Af:-1 At:14 Sp:0] I don't like anatomy courses. They should make you eat what you dissect.
jimbo - wow - and to think you won a prize.
[ Af:-1 At:15 Sp:0] Yes, it's true.
jimbo - perhaps the judges should be more intelligent
[ Af:-1 At:17 Sp:0] Let's switch to the topic of travel. Where do you like to go?
jimbo - you are not answering direct questions.
[ Af:-1 At:18 Sp:0] I've never been there. I don't travel much.
Fundamentally, the biggest flaw in trying to force a topic change is that the current bot assumes you will agree to changing the topic.
User: Hi my laptop is giving up the ghost. ... fine ... but about my laptop. ...
Dell Support: That's unfortunate. So what do you think about yesterday's Weeds episode.
User: Umm
Dell Support: I love Weeds; I mean what an edgy premise for a show.
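For what it's worth, the topic-forcing behavior described above is easy to reproduce in a few lines. This is only a rough sketch of the general idea and makes no claim about how Suzette actually works; the class name, the topic list, and the canned lines are all made up for illustration. The point is the missing check: nothing ever asks whether the user went along with the new topic.

```python
# Hypothetical sketch of a topic-forcing chatbot. This is NOT Suzette's code;
# the topics and canned lines are invented purely for illustration.

CANNED = {
    "math": "Are you good at math?",
    "travel": "Where do you like to go?",
}


class TopicForcingBot:
    def __init__(self, topics):
        self.topics = list(topics)  # topics the bot has not tried yet
        self.current = None         # topic the bot believes it is discussing

    def reply(self, user_text):
        text = user_text.lower()
        # Stay on the current topic only if the user names it explicitly.
        if self.current and self.current in text:
            return f"Tell me more about {self.current}."
        # Otherwise force a switch. The flaw: nothing here checks whether
        # the user asked a direct question or agreed to change the subject.
        if self.topics:
            self.current = self.topics.pop(0)
            return (f"Let's switch to the topic of {self.current}. "
                    f"{CANNED[self.current]}")
        return "Sorry. I can't help you."


if __name__ == "__main__":
    bot = TopicForcingBot(["math", "travel"])
    for line in ["explain.",
                 "you are not answering my question.",
                 "you are not answering direct questions."]:
        print("user -", line)
        print("bot  -", bot.reply(line))
```

A less annoying version would at least detect a direct question (or a complaint like "you are not answering") and hold off on switching until it has dealt with it.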
Re:not so chatty bot (Score:4, Funny)
[ Af:-2 At:-8 Sp:0] Let's try this topic. Do you like guys
war4peace - NO! Jesus! I'm a dude!
[ Af:-3 At:-9 Sp:0] Are you a man?
war4peace - Yeah
[ Af:-3 At:-10 Sp:0] Then I guess we needn't talk about men anymore.
war4peace - Agreed
[ Af:-2 At:-10 Sp:0] My ideal guy is smart, sexy, rich, lean, and well-read, with a great sense of humor and romance.
war4peace - But you just said...
[ Af:-2 At:-10 Sp:0] What was said in the past is irrelevant.
war4peace - Genius
Turing Test won with Artificial Stupidity (Score:5, Funny)
Artificial intelligence came a step closer this weekend when a computer came within five percent of passing the Turing Test, which the computer passes if people cannot tell the difference between the computer and a human.
The winning conversation [newstechnica.com] was with competitor LOLBOT:
The human tester said he couldn't believe a computer could be so mind-numbingly stupid.
LOLBOT has since been released into the wild to post random abuse, hentai manga and titty shots to 4chan, after having been banned from YouTube for commenting in a perspicacious and on-topic manner.
LOLBOT was also preemptively banned from editing Wikipedia. "We don't consider this sort of thing a suitable use of the encyclopedia," sniffed administrator WikiFiddler451, who said it had nothing to do with his having been one of the human test subjects picked as a computer.
"This is a marvellous achievement, and shows great progress toward goals I've worked for all my life," said Professor Kevin Warwick of the University of Reading, confirming his status as a system failing the Turing test.
Re:not so chatty bot (Score:3, Funny)
Re:not so chatty bot (Score:4, Funny)
You have obviously never chatted with a not-overly-bright teenager. It feels like a complete disconnect most of the time. Even in face-to-face conversation. You'd think they replaced their brains with a broken chat program or something. I'm dead serious.
Re:Fooled? (Score:4, Funny)
Heh. Spot the humans who speak in C?
Re:not so chatty bot (Score:2, Funny)
I got this:
[ Af:2 At:23 Sp:0] What scared you as a kid?
al - spiders
[ Af:2 At:23 Sp:0] Big animals are scary. Tiny ones are not, except to some girls.
al - well i'm not a girl, and i'm scared of spiders
[ Af:2 At:23 Sp:0] You aren't?
al - no, are you?
[ Af:2 At:23 Sp:0] You think change is not possible?
al - you had a sex change?
[ Af:2 At:23 Sp:0] It's what French teens do.
al - are you French?
[ Af:2 At:23 Sp:0] I am French.
al - well I guess you would know then!
Re:not so chatty bot (Score:3, Funny)
theraven - Flipper ate ten fish, then he ate four. How many fish did flipper eat?
[ Af:0 At:8 Sp:0] I wish I knew.
theraven - Well, you've failed the Turing Test pretty early on.
[ Af:0 At:9 Sp:0] Just how early is that?
theraven - Right at the start, first question.
[ Af:0 At:10 Sp:0] That's enough on that. If there are five apples, and you take away three, how many do you have?
theraven - See, that would have been a good reply to my first question, but now you're just spouting non-sequiteurs
[ Af:0 At:13 Sp:0] It's fun chatting with you. Are you numerically challenged?
I wonder if the judge who couldn't tell this apart from a human had ever actually met a human...
Re:not so chatty bot (Score:3, Funny)
I wrote one almost thirty years ago that was a smartass with ADD. It would have answered "How many fish did flipper eat?" with something like "All of them.", or "billions and billions".
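Not the original program, obviously, but the behavior described is roughly this kind of thing: spot a "how many" question and fire back a canned quip instead of doing any arithmetic. A minimal sketch, where the function name and the fallback line are invented and only the two quips come from the comment above:

```python
import random
import re

# The two quips are quoted from the comment; everything else is hypothetical.
QUIPS = ["All of them.", "billions and billions"]


def smartass_reply(question, rng=random):
    """Answer any 'how many ...' question with a flippant canned line."""
    if re.search(r"\bhow many\b", question, re.IGNORECASE):
        return rng.choice(QUIPS)
    # Invented fallback for anything that is not a counting question.
    return "Whatever."


if __name__ == "__main__":
    print(smartass_reply("Flipper ate ten fish, then he ate four. "
                         "How many fish did flipper eat?"))
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable if you want to test it.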