Linguists Out Men Impersonating Women On Twitter
Hugh Pickens writes "Remember when the Gay Girl in Damascus revealed himself as a middle-aged man from Georgia? On a platform like Twitter, which doesn't ask for much biographical information, it's easy (and fun!) to take on a fake persona, but now linguistic researchers have developed an algorithm that can predict the gender of a tweeter based solely on the 140 characters they choose to tweet. The research is based on the idea that women use language differently than men. 'The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting, for instance,' reports David Zax. Other research corroborates these findings, showing that women tend to use emoticons, abbreviations, repeated letters and expressions of affection more than men. Linguists have also developed a list of gender-skewed words used more often by women, including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate. Remarkably, even when only provided with one tweet, the program could correctly identify gender 65.9% of the time. Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research."
Let's hope that 15%... (Score:5, Insightful)
I hope that extra 15% certainty didn't cost millions in research grants, as a blind guess has a 50% chance of being right.
Well depends on how it increases (Score:4, Insightful)
A statistically significant gain in accuracy from a single statement of at most 140 characters is not a small thing, so long as it scales with more text. If that means that with a few statements or a longer statement you get into the high 90s, then that would be quite interesting. If it is 65% right all the time, then yes it was rather a waste.
Re: (Score:2)
If it is 65% right all the time, then yes it was rather a waste.
Disagree - that's 2:1 odds, which is still pretty huge all on its own.
Re:Well depends on how it increases (Score:5, Funny)
I go to my congressional office, take my shirt off, arrange my family photos in the background, and take a picture to send to them.
Re: (Score:2)
hahaha, dammit, my mod points expired yesterday. You, sir, made me LOL. I assume you're a guy since there's no exclamation points or smileys! :-)
Re: (Score:2)
It's probably best I'm a guy. Have you seen the women they elect to congress?
Well, I guess if you got one of the Republicans you'd be OK. Democrats seem to vote for brains.
Re: (Score:3)
"How is this "huge?" What the hell are you going to do with it? Someone tweets and uses an exclamation point, so you... what now?"
You look at more of their tweets until you're 98% sure. Then target your advertising.
GONG! Thanks for playing.
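A rough Python sketch of the "look at more tweets" idea, for anyone who wants numbers. It assumes (unrealistically) that every tweet is an independent signal that is right 65.9% of the time and that all the tweets examined point the same way. The function name and the 50/50 prior are illustrative assumptions, not anything taken from the study.

```python
def agreeing_tweets_needed(per_tweet_accuracy: float, target: float,
                           prior: float = 0.5) -> int:
    """Count how many agreeing, independent signals lift the posterior to `target`.

    Each agreeing tweet multiplies the odds by accuracy / (1 - accuracy),
    which is the standard Bayesian update for a binary signal.
    """
    if not 0.5 < per_tweet_accuracy < 1.0:
        raise ValueError("per_tweet_accuracy must be between 0.5 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1")

    likelihood_ratio = per_tweet_accuracy / (1.0 - per_tweet_accuracy)
    odds = prior / (1.0 - prior)
    count = 0
    while odds / (1.0 + odds) < target:
        odds *= likelihood_ratio
        count += 1
    return count


print(agreeing_tweets_needed(0.659, 0.98))
```

Under those assumptions a handful of agreeing tweets is enough. The 75.8% all-tweets figure quoted elsewhere in this thread suggests real tweets from one account are far from independent, so treat this as an optimistic bound.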
Re: (Score:2)
How is this "huge?" What the hell are you going to do with it? Someone tweets and uses an exclamation point, so you... what now?
What's your point? 2:1 doesn't qualify as huge to you? If it were 10:1 odds what now?
Re: (Score:3)
How is this "huge?" What the hell are you going to do with it?
That's what she said. Or maybe it was a he -- I'm confused now.
Re: (Score:3, Funny)
Interesting stuff. I wrote the first revision of my best friend's profile for match.com (I'm a man, she's a woman) simply because she was just awful at putting her best foot forward. She tweaked it, but I wonder how that would have come out under such analysis.
;)
Noooo! She's not a lithe fifty year old target shooting yoga instructor, she's a MAN!
Re:Well depends on how it increases (Score:4, Insightful)
I wonder what proportion of the tweets are deliberately intended to mislead. Getting a 65% hit rate on people who are attempting to deceive is much more impressive than 65% on people who aren't making any attempt to obfuscate their gender.
Re: (Score:2)
Once you know to add hahaha, lol, :-) and such wouldn't the deception be better? In some ways this is like detecting fake accents -- you can learn to speak with another accent with practice.
Re: (Score:2)
They claim 75% accuracy when they analyze every tweet the account produced.
Re: (Score:3)
We'll just pay the researchers in bitcoins.
Re: (Score:2, Insightful)
It seems to me that there are more men than women posting on twitter, so guessing man on every tweet might yield a higher accuracy than this algorithm.
Re: (Score:2)
According to the study's (dubious) technique of identifying the gender of tweeters, the breakdown is 42% female, 35% male, 23% unidentified.
Re:Let's hope that 15%... (Score:4, Insightful)
oh, yea,
MOD PARENT UP (Score:2)
Re: (Score:3)
That's a 65% prediction rate based on a single tweet. The authors report a 92% success rate for the best classifier on the entire set. If they restrict the data set just to tweet texts (but more than one), they achieved a 76% success rate. That still might not satisfy you, but the authors also report that only 5 in 130 people correctly classified 100 tweets with a higher accuracy.
Re: (Score:2)
What's the male/female split of posters on Twitter?
If 65.7% of users are male I can guess what gender a poster is and I'll get it right 65.7% of the time.
Re: (Score:2)
What's the male/female split of posters on Twitter?
It's in the article, and it's closer than 60/40.
Re: (Score:2)
I hope that extra 15% certainty didn't cost millions in research grants, as a blind guess has a 50% chance of being right.
Besides, all they've done by publishing this is give people wanting to fake their gender some good pointers on how to do it.
Re: (Score:2)
I wonder how much the traders paid for their algorithms that had much less than 50% chance rate of predicting future housing prices?
It's even worse (Score:3)
It's even worse. The initial assumption was that 55% of the users were female, so basically a hardcoded 'return "female";' would already guess with 55% accuracy. Bumping it to 65% is actually only a 10% bump.
But that assumption is purely based on what people declared on their account on Twitter, i.e., basically trusting that everyone who labeled themselves "female" is actually female, and everyone who labeled themselves "male" is actually male. The caveat there needs not be detailed.
Basically, they have 100
Re: (Score:2)
That's 65% with just one tweet, though; presumably quality is better given more tweets as sample data.
I wouldn't be so sure. I think the uncertainty might have less to do with a given user's linguistic variation from one tweet to the next, and more to do with the fact that gender isn't actually the sole determinant of how people talk. If 30% of women consistently produce "male-sounding" language and 30% of men produce "female-sounding" language, according to whatever metric these researchers have come up with, that's quite different from if all women produce "male-sounding" language 30% of the time and vice
Re: (Score:2)
(this is all from merely RTFS, though. I'm not really interested enough to look at their data, but which stat they chose to put in the summary speaks volumes.)
Re: (Score:2)
I think this whole discussion is sort of missing the point.
First, someone is trying to fool the world into believing that they are the other gender. So, we have an intelligence gaming the system, for whatever reasons. Fun, scamming, whatever - we have an intelligence attempting to fool all other intelligences on the web.
The average person has no idea whether this individual is actually male or female, when they meet that person online. Previously, the primary indicators were the screen name, and whatever
Re: (Score:3)
*pout*
*sigh*
sooooo tired of the 'no women teh interwebs' meme
Re: (Score:3)
I thought Lay-D-Boy was a counterfeit armchair till I went to Bangkok.
Re: (Score:3)
I suppose I could always google it. Oops, I think I broke the algorithm.
Re: (Score:2)
They also had a handful of Amazon Mechanical Turk users identify gender for the same tweets and they were 67.3% accurate for single tweets compared to the automated system's 67.8%.
Re: (Score:2)
why not have the researchers break windows for a living?
there is good natural language research. this, however, could be done (given the data) by one person in a few hours with prepack software: http://cran.r-project.org/web/packages/textcat/index.html [r-project.org]
Who Knew! (Score:2)
Re: (Score:2)
I'm totally going to hell for this but in order to re-enforce my manhood, I must say:
My zipper was down and my wife found my gf. My nigga wanted my beer and my shorts! I took my jeep and my woman to my vegas timeshare.
(Here! [fastcompany.com])
Re: (Score:3)
I'm totally going to hell for this but in order to re-enforce my manhood, I must say:
My zipper was down and my wife found my gf. My nigga wanted my beer and my shorts! I took my jeep and my woman to my vegas timeshare.
(Here! [fastcompany.com])
You used an exclamation mark, so you are clearly a woman.
Re: (Score:2)
I hate the word "hubby". It's the linguistic equivalent of shopping for groceries in slippers and a muumuu.
Re: (Score:2)
Re:Who Knew! (Score:5, Funny)
The mere fact that you show emotion outs you. Real men only use periods and commas, AND TYPE IN ALL CAPS BECAUSE REAL MEN ARE ALWAYS SHOUTING.
Re: (Score:2)
chuckles
Re: (Score:3)
I don't have mod points, so I have to post a comment to tell you, "funniest thing I read all day". You boys are weird :P
Re:Who Knew! (Score:4, Funny)
We use a FULL STOP. Cus when I tell that sentence to end it motherfucking does. Bitches.
Re: (Score:2)
It's ok. The program filters for exclamation points used after the opening word or after all-caps.
Women do use lol more though, to use up some of the 120 characters left over after anything intelligent they had to say.
Or... (Score:2)
Or it can be used as a training tool for would-be impersonators.
Re: (Score:3)
> Or it can be used as a training tool for would-be impersonators.
Or to test gender-altering scripts. OMG! :)
Re: (Score:2)
I see your point. It won't fool the people who use common sense but it -will- fool the people who use the same software. :)
Re: (Score:2)
tits or gtfo
Linguists Need to Visit a Starbucks Occasionally (Score:5, Funny)
The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting
or a Mac user.
Uh Oh (Score:2)
Then again, the nickname probably isn't helping either...
Re: (Score:2)
Would playing one in an orchestra suffice?
The Internet... (Score:2)
V.S. Naipaul was right? (Score:2)
Nobel laureate V. S. Naipaul recently caused an uproar when he claimed, among other things, that he could identify the gender of an author from their work [care2.com]:
In what must have been an attempt to be as offensive as possible, he continued, saying that men’s and women’s writing is “quite different. I read a piece of writing and within a paragraph or two I know whether it is by a woman or not. I think [it is] unequal to me.”
I guess this means he was right? Although, for the record, he still seems like an arrogant, sexist SOB--just not for this particular reason.
Well... (Score:2)
I've spent a long time online. And I can pretty easily and reliably deduce gender from what someone "says". Women write differently. I wouldn't say they write worse, just differently.
Re: (Score:2)
Women write differently. I wouldn't say they write worse, just differently.
From my own vast experience, I predict you are a man.
man vs. machine (Score:2)
Re: (Score:2)
Off the cuff, I would suggest that it's unlikely that humans are good at it. We have a strong tendency to let previous biases or preconceptions of people get in the way.
So, for example, you see a tweeter's picture showing a female, and you impose that belief onto his/her text.
Re: (Score:2)
Re: (Score:2)
True. But there are people who are good at identifying those situations where the gender doesn't match the behavior. In real life, it's called 'gaydar'. Online, it could just be a phony picture and a poser.
The gender-behavior mismatch is evident (I've been told) from the writing of the subjects in question. Not just the choice of words or little hearts where the periods should be, but based on the style of writing and subject matter. Apparently, a transcript of a conversation (or series of e-mails) betwee
Re:man vs. machine (Score:4, Informative)
True. But there are people who are good at identifying those situations where the gender doesn't match the behavior. In real life, it's called 'gaydar'. Online, it could just be a phony picture and a poser.
The gender-behavior mismatch is evident (I've been told) from the writing of the subjects in question. Not just the choice of words or little hearts where the periods should be, but based on the style of writing and subject matter. Apparently, a transcript of a conversation (or series of e-mails) between individuals produces a more accurate determination than an essay.
Yes, humans widely use language differently based on their own subcultures. Women particularly in some cultures speak an entirely different language from the gender-neutral language spoken by everyone. In some languages such as Japanese gendered language is extremely readily apparent, and when I was chatting on Japanese chatrooms, it was nice to be able to identify the gender of the speaker in one or two lines of text from them.
In much the same way, while we often are of the belief that men and women use language the same way in English, because it's not readily apparent, we do actually use language differently. Here is another interesting one: women use fewer contractions than men. Weird but oddly true.
All of this has less to do with "gaydar" than that every subculture speaks a slightly different dialect. Gay men have a selection of words that set them off, (I actually commented to a gay-rights group, where I was an "ally" of gay-rights, that they were using "fabulous" like... A LOT. And I was all, "um... do you REALLY want to be projecting the notion that this stereotype is valid and accurate? Because that is what you are doing.") and this does not mean that gay men talk like women. They actually talk differently and distinctly from women, but in this world of false dichotomies that we live in, we presume that if gay men don't talk the same way as straight men, then they must talk like women. But, in reality, this isn't actually correct.
Oh this ought to be good (Score:2)
I wonder what would happen if you fed my stuff to this algorithm. I'm transsexual and hang out in very different environments depending on which of my friends I spend time with. It can range from LANs to baking parties. Overall I'd say I'm a poor fit for both male and female stereotypes. It would also be fun to see what it would do with my lesbian friends, many of which are immense tomboys.
Re:Oh this ought to be good (Score:4, Funny)
It would also be fun to see what it would do with my lesbian friends, many of which are immense tomboys.
I guess I don't quite see what their weight has to do with anything...
Recognizing gender 65.9% based on one tweet (Score:4, Insightful)
What was the gender distribution of the tweets this was tested against? If 65.9% of the tweets were from a male, the algorithm "return Gender.male;" will get the gender right 65.9% of the time...
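The point about `return Gender.male;` is the usual majority-class baseline. Here is a minimal Python sketch of how one might compute it; the label list is made up for illustration (549 of 1000 "female" to mirror the 54.9% baseline quoted elsewhere in the thread), and the function name is hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a 'classifier' that always returns the most common label."""
    if not labels:
        raise ValueError("labels must not be empty")
    # most_common(1) yields [(label, count)] for the most frequent label.
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


# Hypothetical sample: not the study's data, just a 54.9% / 45.1% split.
sample_labels = ["female"] * 549 + ["male"] * 451
print(f"always-guess-majority accuracy: {majority_baseline_accuracy(sample_labels):.1%}")
```

Any reported accuracy is only interesting to the extent that it beats this number on the same test set.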
Re: (Score:2)
Also, how did they know that the users claiming to be women were actually women? How were the 18000 users selected? Perhaps the article does clarify these things but I'd rather sleep now than read it.
Linguists don't "out" anyone. (Score:2)
Linguists know that "out" is not a verb.
Re: (Score:2)
http://en.wikipedia.org/wiki/Outing [wikipedia.org]
http://www.thefreedictionary.com/outing [thefreedictionary.com]
http://www.merriam-webster.com/dictionary/outing [merriam-webster.com]
Linguists know that their job is to document usage, not prescribe it.
What's the unicode ... (Score:2)
Re: (Score:2)
Don't know, but there's U+2763 "Heavy Heart Exclamation Mark Ornament".
Tried to type it, but the /. comment system still doesn't take Unicode... In 2011.... It wouldn't accept the HTML code for it either. Stripped it right out.
It's not on the whitelist (5:erocS) (Score:2)
It wouldn't accept the HTML code for it either. Stripped it right out.
Slashdot instituted a code point whitelist after the erocS incident [slashdot.org].
Re: (Score:2)
U+01D0 LATIN SMALL LETTER I WITH CARON (Score:2)
Gender Inequality (Score:4, Insightful)
Re: (Score:3, Informative)
From TFA (http://images.fastcompany.com/upload/a_variousfields.png):
Feature: Accuracy
Baseline (Female): 54.9%
One tweet: 65.9%
Description: 71.2%
All tweets: 75.8%
Screen Name: 77.1%
Full Name: 89.1%
Tweets + screen name: 81.4%
Screen name + description + all tweets: 84.3%
All four fields: 92%
Honestly, 77% based on screen name alone was the most interesting result to me.
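For comparing the rows above, it can help to subtract the baseline from each one. A small Python sketch follows; the figures are copied from the table in this comment, while the variable and function names are just illustrative.

```python
BASELINE_FEMALE = 54.9  # percent, the "always guess female" row

ACCURACY_BY_FEATURE = {
    "One tweet": 65.9,
    "Description": 71.2,
    "All tweets": 75.8,
    "Screen Name": 77.1,
    "Full Name": 89.1,
    "Tweets + screen name": 81.4,
    "Screen name + description + all tweets": 84.3,
    "All four fields": 92.0,
}


def gains_over_baseline(accuracy_by_feature, baseline):
    """Return percentage-point gain over the baseline for each feature set."""
    return {name: round(value - baseline, 1)
            for name, value in accuracy_by_feature.items()}


gains = gains_over_baseline(ACCURACY_BY_FEATURE, BASELINE_FEMALE)
for name, gain in sorted(gains.items(), key=lambda item: item[1]):
    print(f"{name:<40} +{gain:.1f} points")
```

These are percentage points over the baseline, not relative improvements.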
False Headline on Slashdot - News at 11 (Score:2)
This study in no way outs men impersonating women. In fact it specifically identifies gender for analysis by comparing it to the linked blog/website profile information and assuming that "the effort involved in maintaining this deception in two different places suggests that the blog labels on Twitter data are largely reliable". Basically it assumes that anyone attempting to impersonate the opposing gender is a tech ignorant moron that has made no effort to create a persona - something that is contrary to
Love the last line in TFS (Score:2)
Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research
Better written as
Depending on how successful the program is proven to be, it could be used for ad-targeting (EVIL), or for socio-linguistic research (GOOD)
Re: (Score:2)
it could be used for ad-targeting (EVIL),
Why? If I'm going to be subjected somehow to advertisements, it might as well be for something I'm actually interested in.
NOT Impersonating (Score:2)
I read this article and didn't find anything about impersonation of gender. If anything, part of the detection algorithm is based on looking for terms like "my wife/my gf" vs. "my hubby/my boyfriend". I think these terms are pretty self-explanatory (well, at least the former in most states; the latter may belong to a gay user). In any case, this is a fairly trivial method of determining gender and one that seems to be quite basic and naive, in a way that any impersonator, even not particularly determined, woul
Re: (Score:2)
even worse: without any malice whatsoever, the situation described in the headline is potentially broken unless lesbians have roughly the same likely/unlikely words as women-as-a-whole.
judging from the wordlist, any single male graduate student who practices yoga and eats yogurt may well be misclassified.
Hyperbole and Male Language Use (Score:2)
One of the gender differences in English usage that has interested me most is the male tendency to use absolutes more often. A lot of it seems to stem from the sort of "fish story" and humor-based phase of social bonding that begins for most boys in grade school. Men are more likely to say "always" when they mean "usually", "never" when they mean "rarely", etc... which tends to mean that those pedants among us who try to use more precise language sometimes end up appearing more effeminate, or weak (i.e. "You
Re: (Score:3)
Now THAT is insightful. That would explain things. It would explain why some people told me over the years that I "talked like a girl" because I spoke properly, and precisely in that nerdy way. By my standards, most men are sloppy speakers. Even my sister pointed this out to me at a drive thru some years back, she said most men would say "I wanna burger, fries, and coke." and then stop and drive on, while I said, "I would like a hamburger, medium order of fries and medium coke, please, and that will be
OMG! I as a man will not love this AT ALL!!!!! (Score:2)
Stop showing me ads for tampons, damnit. I'M A MAAAANNNNN!!!!!!
Number 1 Clue (Score:3, Funny)
Identify gender 65.9% of the time. Right. (Score:2)
program could correctly identify gender 65.9% of the time.
Vs. 50% for random?
Accurate (Score:3)
Re: (Score:2)
Yeah, it guessed I was male, so it got that wrong too. Looking at the details of the twitter algorithm, this, too, would probably mistake me for a man.
I find the girly squee stuff off-putting. Most of the women I follow on twitter are engineers and scientists, so I don't see that much of it. But when I was looking for a forum to get some support regarding pregnancy, I couldn't find anything remotely comfortable. I was looking for a bit of quiet reflection and rational advice, but all the forums were domina
Re: (Score:2)
The test set represented 18000 users. The probability of flipping 18000 coins and getting 65.9% heads or more is 8.1e-405.
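For anyone who wants to check that kind of tail probability, here is a hedged Python sketch. It assumes a fair coin, treats the 18000 users as 18000 independent flips, and uses exact integer arithmetic so nothing underflows; the function name is made up for illustration.

```python
import math


def log10_binomial_upper_tail(n: int, k_min: int) -> float:
    """log10 of P(X >= k_min) for X ~ Binomial(n, 1/2), via exact integers."""
    if not 0 <= k_min <= n:
        raise ValueError("k_min must be between 0 and n")
    term = math.comb(n, k_min)  # C(n, k_min)
    total = 0
    for k in range(k_min, n + 1):
        total += term
        # C(n, k + 1) = C(n, k) * (n - k) / (k + 1); the division is exact.
        term = term * (n - k) // (k + 1)
    # P = total / 2**n; math.log10 accepts arbitrarily large Python ints.
    return math.log10(total) - n * math.log10(2)


users = 18000
heads_needed = users * 659 // 1000  # 65.9% of 18000, computed without floats
print(f"log10 P = {log10_binomial_upper_tail(users, heads_needed):.2f}")
```

The printed exponent can be compared against the 8.1e-405 figure above. Note that this tests against a 50/50 coin, not against the 54.9% baseline.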
Re: (Score:2)
Quite seriously though, I'm a straight guy, and I make heavy use of exclamation marks, emoticons, "omg", "haha" and "love" in IM conversations, although not so much when blogging. I don't use Twitter, so one can't say whether I'd show these girly traits there.
Re: (Score:2)
I'm a lesbian trapped in a man's body.
Heh. In our house, my wife occasionally comments on how several well-known online companies (including netflix and google) seem to have decided that she's a gay male. If so, she's very good at impersonating a straight female when I'm around. ;-)
So far, we haven't actually found any downside to this, but it's not hard to imagine situations where it could cause serious problems. For example, the guys who killed Matt Shepard [wikipedia.org] might not believe that her "disguise" isn't a disguise. Such things happen in o
Re: (Score:2)
One thought, I suppose, might be "How can a lot of us work to sabotage things like this and poison their inferences?" Another might be "Is there a way we can learn about who is getting such inferred info about us, and what they're planning to do with it?" Or "Is there a way we can find out who has bought this information, and sue the perpetrators if the information is incorrect?"
There really isn't a way to be able to sue them, unless you consider being called the wrong sex defamation, but even if you do, I doubt that courts would really recognize it as an actionable claim.
Re: (Score:2)
Except when it's not.
Re:The only reason for the deduction is... (Score:5, Insightful)
Not entirely true I am afraid.
Several experiments were conducted in the 60s and 70s on children raised in gender neutral parenting conditions, that focused on toy choices.
The experiments were intended to show the impact of societal imperatives on children's gender identities and gender-specific behaviors, using toy preferences as the metric.
The result of the test STILL had little girls favoring dollies with bright colors, and boys favoring machines and soldier type toys, even when very carefully imposed gender neutrality parenting was in effect, even from very young ages.
This is somewhat reinforced by more modern research into the physiological differences between male and female nervous systems.
The idea that men and women might intrinsically focus more on different concepts (and thus, relate to their environments differently from each other, and as such, describe them differently in literature) is not really all that far-fetched.
It is simply politically incorrect to state that women might actually have a biological proclivity toward being the "Domestic" partner in relationships, given the current political climate of our western post-suffrage societies.
Somehow, "Staying home, taking care of babies, and doing the chores all day." is seen as a degrading thing, while "Standing in an assembly line inserting part A into assembly B ad nauseam all day" is somehow seen in an idealized fashion as a kind of "Freedom" -- however sick that might be in reality.
Now, if you want to complain about women being statistically paid less than men, I will strongly support your argument that it (the practice) is based on pure bull--- But the statement that men and women are innately gender neutral and get conditioned exclusively by stereotypes? That is not supported by behaviorists.
Gender stereotypes simply reinforce already existent behaviors, for better or for worse.
Re: (Score:2)
Let's see - sneakers, flip-flops, black formal, dark brown formal, light brown formal, tennis, golf, badminton, slip-ons. I'm at 9 and I don't even have a white pair!