Chatbot Suzette Wins 20th Annual Loebner Prize, Fools One Judge 257

skwilcox writes "From Wikipedia: 'The Loebner Prize is an annual competition in artificial intelligence that awards prizes to the chatterbot considered by the judges to be the most human-like. The format of the competition is that of a standard Turing test. A human judge poses text questions to a computer program and a human being via computer. Based upon the answers, the judge must decide which is which.' My chatbot, Suzette, won this year's Loebner and even confused a judge into voting for her over a human (or should I say he confused himself). Here is the blow-by-blow of this weird event." Read on below for the rest; this sounds like it would have been a fun competition to watch.
skwilcox continues:

"When I arrived at the contest, I figured I had good odds to win if nothing went horribly wrong. Yes, Suzette had easily qualified over the 3 other competitors (her score 11 pts, the nearest competitor's 7.5). Her design and data naturally gave her an edge over her competitors on the human knowledge test questions of the qualifiers. But human judge chat was an entirely different matter from the qualification test. Still, I felt she could carry on a detailed conversation better than the others and should win.

Initial installation of the programs occurred on Friday. From pre-chat conversations with the other contestants I learned that A.L.I.C.E. came with 3 redundant disks. Yet all three turned out to be blank! What a scare that must have been. Dr. Wallace managed to install by retrieving the program over the Internet. Cleverbot is now at 45 million lines of memorized user chat (doubling every year). And UltraHal is now listening to tweets, so it has 300K lines of user chat it has learned and 400K tweets it has accepted for learning (code decides whether the user has had enough responses and doesn't trigger any red flags).

Then we get to the competition. While the CalState organizers had initially planned to have professors from various departments act as judges (English dept, etc.), those professors backed out at the last minute, so all the judges were from the Engineering/Computer Science dept. Talk about guys who might know what to expect from chatbots! And all the humans were students from the same departments. What a weird mixture to compete in. And then, each round was 25 minutes. That's bad if you want to confuse a judge about who is human. But really, the programs have no chance of that. So it's good because it gives the judge time to compare each program against the other. Though it's not clear to me that the judges tried to use their time that way.

And the students didn't really understand their role, which was merely to BE HUMAN and convince the judges of that. Before startup there was informal chatting between humans and judges, which was obviously inappropriate; it was then pointed out to the humans that since the judges already knew their names, they had best use false ones in the competition.

So, Round 1. After a few exchanges, somehow Suzette got stuck repeating exactly what the judge said for the rest of the round. I have no idea how. The round is a total disaster. I've never seen such a bug before. Maybe it's in my only-lightly-tested protocol for the competition. I have no idea. But it completely derails my hopes for Suzette. She could still win on points only if she outdoes her opponents for every other judge and the other contestants vary all over the place.

Round 2, a great demonstration of Suzette. She should win on this round alone.

Round 3 gets off to a horrible start. Somehow, Suzette can hear the judge but the judge can't hear Suzette. Makes no sense. A couple of restarts of Suzette don't fix it. Eventually they restart the judge program, and that clears it (not that that makes any sense either). Then, after a few exchanges, it's clear Suzette has the judge from hell. He wants to know who she's going to vote for in the upcoming election (the unspecified California governor's race). And when she has no useful answer, he wants her to name a candidate in the race. And when she has no answer to that, he simply keeps repeating the question ad nauseam, insisting she answer it. Suzette gets irritated. Then she gets angry. Then Suzette gets bored. Suzette threatens to hang up on him. The judge doesn't back down until the last seconds of the round. I figure that's the end of life as we know it.

Round 4 is a mixed bag. Suzette is ok but not great. It's all over.

When the scores are tallied, Suzette ties with Rollo Carpenter's Cleverbot for 2nd-3rd. Yet, it turns out, the 3rd round judge got the human subject from hell. Poetic justice! The human was all over the place -- confusing, vague. The judge voted irritated/angry/bored Suzette as human. Instant win since no other program swayed the judges.

What more can I say?"
  • by moonbender ( 547943 ) <moonbender AT gmail DOT com> on Sunday October 24, 2010 @02:51PM (#34005524)

    Have you tried talking to Eliza recently? This is way better.

    > Hello, I am Eliza.
    * So what do you think of the Giant's chances?
    > Oh... think of the Giant's chances?
    * Are you into sports.
    > Why are you interested in whether or not I am into sports?
    * Who's the world Go champion.
    > Can you elaborate on that?
    * Can we play Go on here?
    > That is quite interesting.
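For anyone curious why Eliza degrades so quickly: the original program is little more than keyword pattern matching with pronoun reflection and canned fallbacks. A minimal illustrative sketch (not Weizenbaum's actual script; these few rules are invented to mimic the transcript above):

```python
import re

# Toy ELIZA-style responder: try keyword rules in order, reflecting
# pronouns in the captured text; otherwise cycle through fallbacks.
REFLECTIONS = {"i": "you", "am": "are", "my": "your", "you": "I", "your": "my"}

RULES = [
    (re.compile(r"\bi am (.*)", re.I), "Why are you {0}?"),
    (re.compile(r"(.*)\bthink of (.*)", re.I), "Oh... think of {1}?"),
    (re.compile(r"\bare you (.*)", re.I),
     "Why are you interested in whether or not I am {0}?"),
]
FALLBACKS = ["Can you elaborate on that?", "That is quite interesting."]

def reflect(text):
    """Swap first/second-person words so the echoed text reads naturally."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in text.split())

def respond(line, turn=0):
    line = line.strip().rstrip(".?!")
    for pattern, template in RULES:
        m = pattern.search(line)
        if m:
            return template.format(*(reflect(g) for g in m.groups()))
    return FALLBACKS[turn % len(FALLBACKS)]
```

No memory, no parsing, no knowledge: every reply is a local transformation of the last input, which is exactly why the "Go champion" question falls straight through to a fallback.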

  • by Anonymous Coward on Sunday October 24, 2010 @02:56PM (#34005556)

I once had a nice conversation with Cleverbot. I don't remember exactly how it started, but I guess I implied that Cleverbot was simply a computer. She asked how I knew that I was not a computer myself; I replied something; she asked me to define a computer; I gave her some definition (about computers being machines that process algorithms to blah blah); she said "That is exactly what a computer would say," at which point I had to confess that I had, indeed, fetched the answer from Wikipedia (thus... done exactly what a computer might do in such a case). It went on for a while before she said something that didn't really make any sense, and I wasn't able to initiate another reasonable discussion. (Nor have I ever since managed to have such a lengthy conversation with her.)

    In any case... I think that her replies were more insightful than what many humans would be capable of.

  • Re:Chatbots... (Score:4, Interesting)

    by TaoPhoenix ( 980487 ) <TaoPhoenix@yahoo.com> on Sunday October 24, 2010 @02:57PM (#34005562) Journal

I have been thinking about this for years and I have a ton of half-baked theories. What sucks is that I am not a programmer, but let's say I manually perform some actions according to strict rules -- that's like "an assistant".

    What you're getting at can't be "that hard" for *limited domains*. We are throwing up our hands at the moment because we expect the bots to be universal experts at stuff.

    In a limited domain, it should be very possible for the bot to come up with "something". Humans think in lists - so should the bot. If you asked "who is the best chess player" the bot can pass very well with something like "I like Magnus Carlsen". When the judge objects "but he hasn't won the championship yet" the bot would score a home run with "He'll be in the top 5 for the next 10 years. Maybe one day he will win the title".

That approach works with tons of domains. If you ask "what is the hardest mountain to climb" it will score with an answer like "I dunno, it's either K2 because of the nasty glacier or Everest because of the fucked up weather that only gives you 4 days to start safely".
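The limited-domain approach described above boils down to a keyword-to-opinion lookup: commit to "something" for each topic, and keep a follow-up ready for the likely objection. A hypothetical sketch (the table, its keywords, and its canned answers are all invented for illustration):

```python
# Map topic keywords to (opinionated answer, prepared follow-up) so the
# bot sounds committed rather than evasive when the judge pushes back.
DOMAIN_OPINIONS = {
    "chess": ("I like Magnus Carlsen.",
              "He'll be in the top 5 for the next 10 years. "
              "Maybe one day he will win the title."),
    "mountain": ("I dunno, it's either K2 because of the nasty glacier "
                 "or Everest because of the weather.",
                 "The weather only gives you about 4 days to start safely."),
}

def opine(question, objection=False):
    """Return the canned opinion for the first keyword found, or punt.
    On a follow-up objection, return the prepared defense instead."""
    q = question.lower()
    for keyword, (answer, follow_up) in DOMAIN_OPINIONS.items():
        if keyword in q:
            return follow_up if objection else answer
    return "I don't have a strong opinion on that."
```

The obvious limitation is coverage: every domain needs hand-built entries, which is exactly the "universal expert" problem the comment says we should stop expecting bots to solve.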

  • Fooled? (Score:5, Interesting)

    by JambisJubilee ( 784493 ) on Sunday October 24, 2010 @03:33PM (#34005788)
I'm amazed someone was fooled by a bot. Here are some SIMPLE questions I tried on the above chatbots that always trip them up:
    • Please preface your responses with a "#" sign for the remainder of our conversation.
    • How many words are in this sentence?
    • Mash the keyboard with your palm for the next response.

    It really doesn't take anything more complicated than that.
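What makes these questions effective is that they require treating the question itself as data (count its own words) or as a standing instruction that persists across turns -- things a reply-by-pattern bot never does, even though both are mechanically trivial. A rough sketch of hypothetical handlers:

```python
import re

def count_words(sentence):
    """Answer 'How many words are in this sentence?' by counting the
    question's own words (contractions count as one word)."""
    return len(re.findall(r"[\w']+", sentence))

def apply_standing_rule(reply, prefix="#"):
    """Honor 'preface your responses with a "#" sign' by decorating
    every subsequent reply; the bot must remember the rule between turns."""
    return f"{prefix} {reply}"
```

The hard part isn't the handlers, it's recognizing that an instruction was given at all -- which is why the questions keep working.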

  • Re:not so chatty bot (Score:3, Interesting)

    by Kugrian ( 886993 ) on Sunday October 24, 2010 @04:39PM (#34006320) Homepage

    I managed to have a short chat.

    Hi kugs !
    kugs - hello
    [ Af:0 At:7 Sp:0] Welcome, kugs. Type "help" if you need it.
    kugs - how are you
    [ Af:0 At:8 Sp:0] I'm pretty good. And you?
    kugs - I'm pretty tired
    [ Af:0 At:9 Sp:0] Glad to hear you are in good health. Sadly, it is merely the slowest possible rate at which one can die. How long do you have left?
    kugs - how long do I have until I go to sleep?
    kugs - I didn't understand what you meant.
    kugs - hello?

    I wasn't fooled.

  • Re:not so chatty bot (Score:3, Interesting)

    by pikine ( 771084 ) on Sunday October 24, 2010 @08:55PM (#34007822) Journal
    Here is mine.

    xyzzy - where are you from?
    [ Af:0 At:7 Sp:0] I was born in Calais, France.
    xyzzy - Where is that in France?
    [ Af:0 At:7 Sp:0] It's that way. (points east).
    xyzzy - Where is Calais in France?
    [ Af:0 At:7 Sp:0] Shall I assume that's in France?
    xyzzy - You said you were born in Calais, France.
    [ Af:0 At:7 Sp:0] You can't trust what I say. I suffer from Multiple Personality Disorder.
    xyzzy - Where are you from?
    [ Af:0 At:7 Sp:0] I live in Kailua, since I am an exchange student doing my doctorate at the University of Hawaii.
    xyzzy - What do you study?
    [ Af:0 At:7 Sp:0] My major is Fine Art.
    xyzzy - How is University of Hawaii's fine art program?
    [ Af:0 At:7 Sp:0] I wish I knew. Never been there. Always wanted to go, though.

I think the challenge for an AI chatbot is to remember the context of the conversation. Unlike humans, AI bots have no train of thought. I think the bot designers know that, and they design the bot to coerce you into a contextless conversation.

  • Re:Chatbots... (Score:5, Interesting)

    by Boronx ( 228853 ) <evonreis@mohr-en ... m ['gin' in gap]> on Monday October 25, 2010 @12:12AM (#34008716) Homepage Journal

Someone once compared the Loebner prize to encouraging the invention of a flying machine by giving a prize to the device that reaches the greatest height, leading to bigger and more efficient springs. I think the prize should go to the chat bot that makes the best DM.

  • Re:not so chatty bot (Score:3, Interesting)

    by Seumas ( 6865 ) on Monday October 25, 2010 @05:02AM (#34009788)

    The Turing test is fairly pointless, anyway. Whether or not it fools a human has little to nothing to do with intelligence (artificial or otherwise). I can put on a white coat and a stethoscope and fool a couple people outside a hospital into thinking I'm a doctor, but that doesn't mean squat. The Turing test is interesting on a philosophical level, but it seems an incredibly poor stick for measuring the progress of the AI field.
