Software

Chatbot Suzette Wins 20th Annual Loebner Prize, Fools One Judge

skwilcox writes "From Wikipedia: 'The Loebner Prize is an annual competition in artificial intelligence that awards prizes to the chatterbot considered by the judges to be the most human-like. The format of the competition is that of a standard Turing test. A human judge poses text questions to a computer program and a human being via computer. Based upon the answers, the judge must decide which is which.' My chatbot, Suzette, won this year's Loebner and even confused a judge into voting for her over a human (or should I say he confused himself). Here is the blow-by-blow of this weird event." Read on below for the rest; this sounds like it would have been a fun competition to watch.
skwilcox continues:

"When I arrived at the contest, I figured I had good odds to win if nothing went horribly wrong. Yes, Suzette had easily qualified over the 3 other competitors (her score 11 pts, the nearest competitor's 7.5). Her design and data naturally gave her an edge over her competitors on the human knowledge test questions of the qualifiers. But human judge chat was an entirely different matter than the qualification test. Still, I felt she could carry on a detailed conversation better than the others and should win.

Initial installation of the programs occurred on Friday. From conversations with the other contestants before the event, I learned that A.L.I.C.E. came with 3 redundant disks. Yet all three turned out to be blank! What a scare that must have been. Dr. Wallace managed to install by retrieving the program over the Internet. Cleverbot is now at 45 million lines of memorized user chat (doubling every year). And UltraHal is now listening to tweets, so it has 300K lines of user chat it has learned and 400K tweets it has accepted for learning (code decides whether the user has had enough responses and the tweet doesn't trigger any red flags).
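
For the curious, a minimal sketch of what such a learn-from-tweets gate might look like, in Python; the threshold and red-flag list below are invented for illustration and are not UltraHal's actual code:

    # Hypothetical sketch of the gate described above: accept a tweet for
    # learning only if the author has enough history and the text trips no
    # red flags. Threshold and flag list are made up, not UltraHal's.

    RED_FLAGS = ("http://", "https://", "follow me", "#ad")  # illustrative only
    MIN_RESPONSES = 20  # made-up cutoff for "has had enough responses"

    def accept_for_learning(tweet_text, author_response_count):
        if author_response_count < MIN_RESPONSES:
            return False  # too little history to trust this author yet
        lowered = tweet_text.lower()
        return not any(flag in lowered for flag in RED_FLAGS)

    assert accept_for_learning("Nice weather for a chatbot contest", 50)
    assert not accept_for_learning("follow me for more", 50)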

Then we get to the competition. While the CalState organizers had initially planned to have professors from various other departments act as judges (English, etc.), those plans fell through at the last minute, so all the judges were from the Engineering/Computer Science department. Talk about guys who might know what to expect from chatbots! And all the humans were students from the same departments. What a weird mixture to compete in. And then, each round was 25 minutes. That's bad if you want to confuse a judge about who is human. But really, the programs have no chance of that. So it's good, because it gives the judge time to compare each program against the others. Though it's not clear to me that the judges tried to use their time to do that.

And the students didn't really understand their role, which was merely to BE HUMAN and convince the judges of that. Before startup there was informal chatting between humans and judges, which was obviously inappropriate, and it was then pointed out to the humans that since the judges already knew their names, they had best use false ones in the competition.

So, Round 1. After a few exchanges, somehow Suzette got stuck repeating exactly what the judge said for the rest of the round. I have no idea how. The round is a total disaster. I've never seen such a bug before. Maybe it's in my only-lightly-tested protocol for the competition. I have no idea. But it completely derails my hopes for Suzette. She could still win on points only if she outdoes her opponents for every other judge and the other contestants vary all over the place.
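
Purely as illustration of how that class of bug can hide in a lightly-tested protocol layer (this is a guess, not Suzette's actual code), a fallback path that returns the wrong variable will echo the judge on every turn once it starts firing:

    # Speculative sketch: if the engine response goes missing (timeout,
    # protocol hiccup), the buggy fallback returns the judge's own words.

    def reply(user_input, engine_response):
        if engine_response is None:   # e.g. the engine timed out
            return user_input         # BUG: echoes the judge verbatim
        return engine_response

    def reply_fixed(user_input, engine_response):
        if engine_response is None:
            return "Sorry, could you say that again?"  # safe canned fallback
        return engine_response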

Round 2, a great demonstration of Suzette. She should win on this round alone.

Round 3 gets off to a horrible start. Somehow, Suzette can hear the judge but the judge can't hear Suzette. Makes no sense. A couple of restarts of Suzette don't fix this. Eventually they restart the judge program, and that clears it (not that that makes any sense either). Then, after a few exchanges, it's clear Suzette has the judge from hell. He wants to know who she's going to vote for in the upcoming election (the unspecified California governor's race). And when she has no useful answer, he wants her to name a candidate in the race. And when she has no answer to that, he simply keeps repeating the question ad nauseam, insisting she answer it. Suzette gets irritated. Then she gets angry. Suzette then gets bored. Suzette threatens to hang up on him. The judge doesn't back down until the last seconds of the round. I figure that's the end of life as we know it.

Round 4 is a mixed bag. Suzette is ok but not great. It's all over.

When the scores are tallied, Suzette ties with Rollo Carpenter's Cleverbot for 2nd-3rd. Yet, it turns out, the 3rd round judge got the human subject from hell. Poetic justice! The human was all over the place -- confusing, vague. The judge voted irritated/angry/bored Suzette as human. Instant win since no other program swayed the judges.

What more can I say?"
  • Chatbots... (Score:5, Insightful)

    by Richard.Tao ( 1150683 ) on Sunday October 24, 2010 @02:08PM (#34005254)
    I've spent some time talking to these bots (Elbot, Suzette, others... possibly out of sad boredom and want of company). And they're fairly interesting, but quite flawed. They seem to lack any short-term memory of the conversation beyond the immediate reply. That seems like the next step for these things, but it would also mean they'd need a far more robust AI...

    Another thing is that they are boxed off from being self-referential in any way due to the nature of the test. They have to convince someone they are human, so if you try asking them what their short-term memory is, or whether the online version of them is a truncated version of the one used for tests, they don't answer. Which makes sense given what they're designed for, but it takes away from the interest and complexity of the conversations.
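
    A minimal sketch of the short-term memory the parent says is missing, assuming a hypothetical pattern-matching engine underneath (an invented design, not how Suzette or the other entrants actually work):

        # Keep a rolling window of recent turns and hand it to the matcher,
        # so "it", "that", or "you said earlier" can resolve to something.

        from collections import deque

        class ContextualBot:
            def __init__(self, match_reply, window=8):
                self.history = deque(maxlen=window)  # last few (user, bot) turns
                self.match_reply = match_reply       # engine: (text, context) -> reply

            def respond(self, user_input):
                context = list(self.history)
                answer = self.match_reply(user_input, context)
                self.history.append((user_input, answer))
                return answer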
  • Transcripts? (Score:5, Insightful)

    by Anonymous Coward on Sunday October 24, 2010 @02:13PM (#34005290)

    Are the transcripts available? (If not, will they be?)

  • Re:Chatbots... (Score:3, Insightful)

    by Anonymous Coward on Sunday October 24, 2010 @02:26PM (#34005374)

    It's not that the chat bots are intelligent, it's that most humans are stupid.

    Any depth of conversation beyond content-free "small talk" is sufficient to tell the bots from the smarter humans. (Yes, I've talked to both). But since most humans just operate at that content-free small talk level, there sometimes isn't much difference to be discerned. Higher level abstract thinking is missing from the bots, but it's missing from most people as well.

    > They seem to lack any short term memory

    You probably noticed this because you have above average human intelligence. Many people would not notice the shallow degree of conversation.

  • by Hazelfield ( 1557317 ) on Sunday October 24, 2010 @03:00PM (#34005580)
    I have no problem believing this fooled someone. As a matter of fact, I've seen people failing a Turing test in real life [youtube.com].
  • Re:Chatbots... (Score:4, Insightful)

    by Maxo-Texas ( 864189 ) on Sunday October 24, 2010 @03:23PM (#34005696)

    This is a good example of people doing what you incent them to do instead of doing what you meant.

    I think that the intention was that a chatbot be *smart* enough to fool a judge.

    The outcome is that the chatbot has no intelligence and is just matching against a huge database of responses created by a human. Really no more than an ELIZA program plus a huge database. So really no AI change in 40 years.

    I'd be much more excited about a program that genuinely understood just one concept. Red, or liberal or whatever.
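
    For reference, the 40-year-old technique the parent invokes looks roughly like this; two toy rules stand in for the "huge database" (a sketch, not any contestant's actual code):

        # ELIZA-style pattern matching: scan for a rule, fill a template,
        # otherwise fall back to content-free small talk.

        import random
        import re

        RULES = [
            (re.compile(r"\bI feel (.+)", re.I),
             ["Why do you feel {0}?", "How long have you felt {0}?"]),
            (re.compile(r"\bmy (\w+)", re.I),
             ["Tell me more about your {0}."]),
        ]
        FALLBACKS = ["Please go on.", "I see.", "Interesting. Continue."]

        def eliza_reply(text):
            for pattern, templates in RULES:
                match = pattern.search(text)
                if match:
                    return random.choice(templates).format(*match.groups())
            return random.choice(FALLBACKS)  # no rule matched

        print(eliza_reply("I feel ignored"))          # "Why do you feel ignored?" or similar
        print(eliza_reply("Who will you vote for?"))  # falls through to small talk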

  • Re:Chatbots... (Score:5, Insightful)

    by Anonymous Coward on Sunday October 24, 2010 @04:13PM (#34006074)

    I'd be much more excited about a program that genuinely understood just one concept. Red, or liberal or whatever.

    Maybe when humans finally figure out what exactly "liberal" means, we'll be able to write a program that understands it.

  • by __aahlyu4518 ( 74832 ) on Sunday October 24, 2010 @04:15PM (#34006088)

    Sounds more like that student fooled the judge into thinking he was a chatbot.

  • Re:Fooled? (Score:5, Insightful)

    by SEWilco ( 27983 ) on Sunday October 24, 2010 @04:16PM (#34006092) Journal
    Try your SIMPLE questions on some humans and see whether you get the response which you requested. Many humans won't obey a command either.
  • Re:Fooled? (Score:3, Insightful)

    by Cylix ( 55374 ) * on Sunday October 24, 2010 @04:18PM (#34006106) Homepage Journal

    It's not a bad test, but it's not perfect.

    If I were on the other side of the chat window I would ignore it or simply say no. It's a chat session, and there is no regulation that says I have to comply with what you say.

    You: Mash the keyboard...
    Mayor McCheese: ROFL
    You: Precede your next statement with #
    Mayor McCheese: So you are a control freak?
    You: How many words are in this sentence?
    Mayor McCheese: I'm a damned hamburger I can't count!

    Douchebags ruin your Turing tests.
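
    The probes in that exchange could even be scored mechanically; the irony is that the word-count question is trivial for a program to answer literally, yet pattern-matching bots miss it, and, as the parent notes, so does an uncooperative human. A sketch with invented helper names:

        # Mechanical checks for the two probes quoted above.

        def answered_word_count(question, answer):
            expected = len(question.split())  # crude: split on whitespace
            return str(expected) in answer

        def complied_with_prefix(answer, prefix="#"):
            return answer.lstrip().startswith(prefix)

        print(answered_word_count("How many words are in this sentence?", "7"))  # True
        print(complied_with_prefix("ROFL"))                                      # False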

  • Re:Fooled? (Score:4, Insightful)

    by bjourne ( 1034822 ) on Sunday October 24, 2010 @04:57PM (#34006434) Homepage Journal
    If you think those statements are SIMPLE, then you ought to try implementing a chatbot yourself. :)
  • Re:Fooled? (Score:3, Insightful)

    by syousef ( 465911 ) on Sunday October 24, 2010 @05:01PM (#34006450) Journal

    I'm amazed someone was fooled by a bot. Here are some SIMPLE questions I tried on the above chat bots that always fool them:

    • Please preface your responses with a "#" sign for the remainder of our conversation.

    Well I know my wife would ignore that instruction. I guess that makes her a bot.

  • Bad test (Score:3, Insightful)

    by vadim_t ( 324782 ) on Sunday October 24, 2010 @05:55PM (#34006710) Homepage

    When the scores are tallied, Suzette ties with Rollo Carpenter's Cleverbot for 2nd-3rd. Yet, it turns out, the 3rd round judge got the human subject from hell. Poetic justice! The human was all over the place -- confusing, vague. The judge voted irritated/angry/bored Suzette as human. Instant win since no other program swayed the judges.

    So, if I understood correctly, the judge talks to two people. A bot, and a human. It seems that in this case, the judge is not deciding on a per-case basis, but talks to everybody then figures out who's the bot by choosing the one that did the worst. So the judge getting to talk to a joker, troll or complete idiot can make even a crappy bot win the test.

    That seems to be a weak test. I don't think the judge should be able to reach a verdict by elimination (e.g., if I'm completely sure this one is a human, then even if it's very good, the other one must be a bot). There should exist the possibility that everybody the judge talks to is a bot, or everybody is a human, which would force them to judge each conversation partner individually.
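
    A small sketch of the difference between the two scoring rules contrasted above, with hypothetical bot-likeness scores (lower = more human):

        # Forced choice vs. independent judgment, reduced to two functions.

        def forced_choice(pair):
            # Current format: of the two terminals, call the less bot-like
            # one human. A terrible human partner hands the label to a bot.
            return min(pair, key=lambda v: v[1])[0]

        def independent(pair, threshold=0.5):
            # Proposed format: any mix of humans and bots is possible, so
            # the judge can't reason "one of these must be the bot".
            return {name: ("human" if score < threshold else "bot")
                    for name, score in pair}

        scores = [("Suzette", 0.6), ("erratic student", 0.7)]
        print(forced_choice(scores))  # 'Suzette' wins the pairing
        print(independent(scores))    # both read as bot-like on their own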

  • Re:Chatbots... (Score:2, Insightful)

    by Anonymous Coward on Sunday October 24, 2010 @05:55PM (#34006712)

    Maybe when humans finally figure out what exactly "liberal" means, we'll be able to write a program that understands it.

    liberal, n: someone disliked by a conservative.

  • by Eraesr ( 1629799 ) on Monday October 25, 2010 @04:55AM (#34009772) Homepage

    I think the bot designers know that, and they design the bot to coerce you into a contextless conversation.

    Well, they surely succeeded at that with me. The bot just spewed out random opinions and questions every line. She'd pose a question, I'd answer, and she'd throw out a completely unrelated new question. I don't have any idea how this could ever fool someone into thinking it was human. Maybe the judge made an error filling out his forms when rating the software?
