Twitter Communications Social Networks

Linguists Out Men Impersonating Women On Twitter

Hugh Pickens writes "Remember when the Gay Girl in Damascus revealed himself as a middle-aged man from Georgia? On a platform like Twitter, which doesn't ask for much biographical information, it's easy (and fun!) to take on a fake persona, but now linguistic researchers have developed an algorithm that can predict the gender of a tweeter based solely on the 140 characters they choose to tweet. The research is based on the idea that women use language differently than men. 'The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting, for instance,' reports David Zax. Other research corroborates these findings, showing that women tend to use emoticons, abbreviations, repeated letters and expressions of affection more than men, and linguists have also developed a list of gender-skewed words used more often by women, including love, ha-ha, cute, omg, yay, hahaha, happy, girl, hair, lol, hubby, and chocolate. Remarkably, even when only provided with one tweet, the program could correctly identify gender 65.9% of the time (PDF). Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research."
This discussion has been archived. No new comments can be posted.

  • by Lead Butthead ( 321013 ) on Thursday July 28, 2011 @06:01PM (#36914946) Journal

    I hope that extra 15% certainty didn't cost millions in research grants; as a blind guess has 50% chance of being right.

    • I dunno'. What's 1 SD worth?
    • by Sycraft-fu ( 314770 ) on Thursday July 28, 2011 @06:09PM (#36915044)

      A statistically significant amount of accuracy based on a single, at most 140 character, statement is not a small thing, so long as it scales with more. If that means that with a few statements or a longer statement you get in to the high 90s then that would be quite interesting. If it is 65% right all the time, then yes it was rather a waste.

      • If it is 65% right all the time, then yes it was rather a waste.

        Disagree - that's 2:1 odds, which is still pretty huge all on its own.

        • How is this "huge?" What the hell are you going to do with it? Someone tweets and uses an exclamation point, so you... what now?
          • I go to my congressional office, take my shirt off, arrange my family photos in the background, and take a picture to send to them.

            • by mdf356 ( 774923 )

              hahaha, dammit, my mod points expired yesterday. You, sir, made me LOL. I assume you're a guy since there's no exclamation points or smileys! :-)

              • by raehl ( 609729 )

                It's probably best I'm a guy. Have you seen the women they elect to congress?

                Well, I guess if you got one of the Republicans you'd be OK. Democrats seem to vote for brains.

          • "How is this "huge?" What the hell are you going to do with it? Someone tweets and uses an exclamation point, so you... what now?"
             
            You look at more of their tweets until you're 98% sure. Then target your advertising.
             
            GONG! Thanks for playing.
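How many tweets "until you're 98% sure" might take can be roughed out with a back-of-the-envelope calculation. The sketch below is only an illustration under strong assumptions that are not from the paper: a 50/50 prior, tweets treated as independent evidence, each tweet carrying a likelihood ratio of p/(1-p) at the quoted 65.9% single-tweet accuracy, and every tweet pointing the same way. The function name `tweets_needed` is made up for the example.

```python
def tweets_needed(per_tweet_accuracy: float, target: float) -> int:
    """Count agreeing tweets needed to reach `target` posterior confidence.

    Assumes a 50/50 prior and independent tweets (an idealisation).
    """
    if not 0.5 < per_tweet_accuracy < 1.0:
        raise ValueError("per_tweet_accuracy must be between 0.5 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1")

    # Each agreeing tweet multiplies the odds by this likelihood ratio.
    ratio = per_tweet_accuracy / (1.0 - per_tweet_accuracy)

    odds = 1.0  # 50/50 prior expressed as odds of 1:1
    count = 0
    while odds / (1.0 + odds) < target:
        odds *= ratio
        count += 1
    return count


if __name__ == "__main__":
    print(tweets_needed(0.659, 0.98))
```

The independence assumption is almost certainly too generous: the thread reports only 75.8% accuracy when all of an account's tweets are used, so real tweets from one author add much less information than this idealised count suggests.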

          • How is this "huge?" What the hell are you going to do with it? Someone tweets and uses an exclamation point, so you... what now?

            What's your point? 2:1 doesn't qualify as huge to you? If it were 10:1 odds what now?

          • How is this "huge?" What the hell are you going to do with it?

            That's what she said. Or maybe it was a he -- I'm confused now.

      • Re: (Score:3, Funny)

        If that means that with a few statements or a longer statement you get in to the high 90s then that would be quite interesting.

        Interesting stuff. I wrote the first revision of my best friend's profile for match.com (I'm a man, she's a woman) simply because she was just awful at putting her best foot forward. She tweaked it, but I wonder how that would have come out under such analysis.

        Noooo! She's not a lithe fifty year old target shooting yoga instructor, she's a MAN! ;)

      • by LordLucless ( 582312 ) on Thursday July 28, 2011 @06:53PM (#36915484)

        I wonder what the proportions are on tweets that are deliberately intending to be misleading. Getting a 65% hit rate on people who are attempting to deceive is much more impressive than 65% who aren't making any attempt to obfuscate their gender.

        • by mdf356 ( 774923 )

          Once you know to add hahaha, lol, :-) and such wouldn't the deception be better? In some ways this is like detecting fake accents -- you can learn to speak with another accent with practice.

      • They claim 75% accuracy when they analyze every tweet the account produced.

    • We'll just pay the researchers in bitcoins.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      It seems to me that there are more men than women posting on twitter, so guessing man on every tweet might yield a higher accuracy than this algorithm.

      • According to the study's (dubious) technique of identifying the gender of tweeters, the breakdown is 42% female, 35% male, 23% unidentified.

    • by bwayne314 ( 1854406 ) on Thursday July 28, 2011 @06:31PM (#36915244)
      HAHA! omg, thats soooo cute! ....

      oh, yea, :)
    • by he-sk ( 103163 )

      That's a 65% prediction rate based on a single tweet. The authors report a 92% success rate for the best classifier on the entire set. If they restrict the data set just to tweet texts (but more than one), they achieved a 76% success rate. That still might not satisfy you, but the authors also report that only 5 in 130 people correctly classified 100 tweets with a higher accuracy.

    • by shermo ( 1284310 )

      What's the male/female split of posters on Twitter?

      If 65.7% of users are male I can guess what gender a poster is and I'll get it right 65.7% of the time.

      • by tepples ( 727027 )

        What's the male/female split of posters on Twitter?

        It's in the article, and it's closer than 60/40.

    • by EEPROMS ( 889169 )
      What about metrosexual or gay men/women as they cross the lines. Anyone trying to define sexuality as a yes or no (just ask a biologist what is natural and they will laugh at you) based on a set language subset is in for a demoralising hiding.
    • I hope that extra 15% certainty didn't cost millions in research grants; as a blind guess has 50% chance of being right.

      Besides, all they've done by publishing this is give people wanting to fake their gender some good pointers on how to do it.

    • I wonder how much the traders paid for their algorithms that had much less than 50% chance rate of predicting future housing prices?

    • It's even worse. The initial assumption was that 55% of the users were female, so basically a hardcoded 'return "female";' would already guess with 55% accuracy. Bumping it to 65% is actually only a 10-percentage-point bump.

      But that assumption is purely based on what people declared on their account on Twitter, i.e., basically trusting that everyone who labeled themselves "female" is actually female, and everyone who labeled themselves "male" is actually male. The caveat there needs not be detailed.

      Basically, they have 100

  • Huh...the word "hubby" is used more by women. Who knew!
    • I'm totally going to hell for this but in order to re-enforce my manhood, I must say:

      My zipper was down and my wife found my gf. My nigga wanted my beer and my shorts! I took my jeep and my woman to my vegas timeshare.

      (Here! [fastcompany.com])

      • I'm totally going to hell for this but in order to re-enforce my manhood, I must say:

        My zipper was down and my wife found my gf. My nigga wanted my beer and my shorts! I took my jeep and my woman to my vegas timeshare.

        (Here! [fastcompany.com])

        You used an exclamation mark you are clearly a woman.

    • I hate the word "hubby". It's the linguistic equivalent of shopping for groceries in slippers and a mumu.

    • by PPH ( 736903 )
      New York state just screwed up the statistics on that.
  • by Shillo ( 64681 )

    Or it can be used as a training tool for would-be impersonators.

  • by RobotRunAmok ( 595286 ) on Thursday July 28, 2011 @06:03PM (#36914964)

    The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting

    or a Mac user.

  • Apparently I have very feminine text messages/tweets, as I use excessive emoticons, exclamation points, and affectionate pet names (though those are directed towards females). And here I thought I had solidified my masculinity when I burnt all my pink shirts.

    Then again, the nickname probably isn't helping either...
  • where men are women, women are women and kids are cops.
  • Nobel laureate V. S. Naipaul recently caused an uproar when he claimed, among other things, that he could identify the gender of an author from their work [care2.com]:

    In what must have been an attempt to be as offensive as possible, he continued, saying that men’s and women’s writing is “quite different. I read a piece of writing and within a paragraph or two I know whether it is by a woman or not. I think [it is] unequal to me.”

    I guess this means he was right? Although, for the record, he still seems like an arrogant, sexist SOB--just not for this particular reason.

    • I've spent a long time online. And I can pretty easily and reliably deduce gender from what someone "says". Women write differently. I wouldn't say they write worse, just differently.

      • Women write differently. I wouldn't say they write worse, just differently.

        From my own vast experience, I predict you are a man.

  • Do we have a benchmark for how well a human can detect genders? I understand being automagic has some special applications, but it seems like a useful point of comparison for its accuracy.
    • I would off-the-cuff suggest that it's unlikely that humans are good at it. We have a strong tendency to let previous biases or preconceptions of people get in the way.

      So, for example, you see the picture of a tweeter as a female, and you impose that belief onto his/her text.

      • How other factors like handling of profile data come into play would be interesting, but we could isolate the profile factors by just letting the human read the text of the tweet, and nothing else.
      • by PPH ( 736903 )

        True. But there are people who are good at identifying those situations where the gender doesn't match the behavior. In real life, it's called 'gaydar'. Online, it could just be a phony picture and a poser.

        The gender-behavior mismatch is evident (I've been told) from the writing of the subjects in question. Not just the choice of words or little hearts where the periods should be, but based on the style of writing and subject matter. Apparently, a transcript of a conversation (or series of e-mails) betwee

        • Re:man vs. machine (Score:4, Informative)

          by snowgirl ( 978879 ) on Friday July 29, 2011 @12:51AM (#36918092) Journal

          True. But there are people who are good at identifying those situations where the gender doesn't match the behavior. In real life, it's called 'gaydar'. Online, it could just be a phony picture and a poser.

          The gender-behavior mismatch is evident (I've been told) from the writing of the subjects in question. Not just the choice of words or little hearts where the periods should be, but based on the style of writing and subject matter. Apparently, a transcript of a conversation (or series of e-mails) between individuals produces a more accurate determination than an essay.

          Yes, humans widely use language differently based on their own subcultures. Women particularly in some cultures speak an entirely different language from the gender-neutral language spoken by everyone. In some languages such as Japanese gendered language is extremely readily apparent, and when I was chatting on Japanese chatrooms, it was nice to be able to identify the gender of the speaker in one or two lines of text from them.

          In much the same way, while we often are of the belief that men and women use language the same way in English, because it's not readily apparent, we do actually use language differently. Here is another interesting one: women use fewer contractions than men. Weird but oddly true.

          All of this has less to do with "gaydar" than that every subculture speaks a slightly different dialect. Gay men have a selection of words that set them off, (I actually commented to a gay-rights group, where I was an "ally" of gay-rights, that they were using "fabulous" like... A LOT. And I was all, "um... do you REALLY want to be projecting the notion that this stereotype is valid and accurate? Because that is what you are doing.") and this does not mean that gay men talk like women. They actually talk differently and distinctly from women, but in this world of false dichotomies that we live in, we presume that if gay men don't talk the same way as straight men, then they must talk like women. But, in reality, this isn't actually correct.

  • I wonder what would happen if you fed my stuff to this algorithm. I'm transsexual and hang out in very different environments depending on which of my friends I spend time with. It can range from LANs to baking parties. Overall I'd say I'm a poor fit for both male and female stereotypes. It would also be fun to see what it would do with my lesbian friends, many of whom are immense tomboys.

  • by wurp ( 51446 ) on Thursday July 28, 2011 @06:46PM (#36915388) Homepage

    What was the gender distribution of the tweets this was tested against? If 65.9% of the tweets were from a male, the algorithm "return Gender.male;" will get the gender right 65.9% of the time...

    • by raynet ( 51803 )

      Also, how did they know that the users claiming to be women were actually women? How were the 18,000 users selected? Perhaps the article does clarify these things but I'd rather sleep now than read it.

    • Re: (Score:3, Informative)

      by ahziem ( 661857 )
      55% female according to the linked paper
  • Linguists know that "out" is not a verb.

  • ... for the glyph of an 'I' dotted by a little heart?

  • Gender Inequality (Score:4, Insightful)

    by FrootLoops ( 1817694 ) on Thursday July 28, 2011 @07:04PM (#36915638)
    From the paper, in their data set 47.7% of tweets were from females, 32.8% were from males, and the rest was unspecified. Tossing out the unspecified ones, guessing "female" all the time would then give ~59% accuracy. On the surface that makes the 65.9% figure in the summary very lackluster, though better figures are reported with more information elsewhere in the article.
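The arithmetic behind that ~59% figure can be sketched as follows. This is a minimal illustration that uses only the shares quoted in the comment above; the variable names are invented for the example, and the shares themselves are taken on the commenter's word.

```python
# Shares as quoted in the comment (percent of tweets in the data set).
female_share = 47.7
male_share = 32.8

# Drop the unspecified tweets and renormalise over the labelled ones.
labelled = female_share + male_share

# Accuracy of a "classifier" that always answers 'female'.
always_female_accuracy = female_share / labelled

print(f"Always guessing 'female': {always_female_accuracy:.1%}")
```

Any reported accuracy is only meaningful relative to this kind of majority-class baseline, which is the point several commenters in the thread are making.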
    • Re: (Score:3, Informative)

      by Demogoblin ( 249774 )

      From TFA (http://images.fastcompany.com/upload/a_variousfields.png):

      Feature: Accuracy
      Baseline (Female): 54.9%
      One tweet: 65.9%
      Description: 71.2%
      All tweets: 75.8%
      Screen Name: 77.1%
      Full Name: 89.1%
      Tweets + screen name: 81.4%
      Screen name + description + all tweets: 84.3%
      All four fields: 92%

      Honestly, 77% based on screen name alone was the most interesting result to me.
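To see how much each row actually adds over the always-guess-female baseline, one can subtract the baseline in percentage points. A small sketch using only the figures listed above; the dictionary and variable names are illustrative and not taken from the paper.

```python
# Accuracy figures (percent) as listed in the comment above.
baseline = 54.9
accuracy = {
    "One tweet": 65.9,
    "Description": 71.2,
    "All tweets": 75.8,
    "Screen Name": 77.1,
    "Full Name": 89.1,
    "Tweets + screen name": 81.4,
    "Screen name + description + all tweets": 84.3,
    "All four fields": 92.0,
}

# Gain over the baseline, in percentage points, rounded to one decimal.
gain = {feature: round(acc - baseline, 1) for feature, acc in accuracy.items()}

# Print from smallest to largest improvement.
for feature, points in sorted(gain.items(), key=lambda item: item[1]):
    print(f"{feature}: +{points} points over baseline")
```

Read this way, the single-tweet result is the smallest improvement in the list, and the name-based fields account for most of the gain.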

  • This study in no way outs men impersonating women. In fact it specifically identifies gender for analysis by comparing it to the linked blog/website profile information and assuming that "the effort involved in maintaining this deception in two different places suggests that the blog labels on Twitter data are largely reliable". Basically it assumes that anyone attempting to impersonate the opposing gender is a tech ignorant moron that has made no effort to create a persona - something that is contrary to

  • Depending on how successful the program is proven to be, it could be used for ad-targeting, or for socio-linguistic research

    Better written as

    Depending on how successful the program is proven to be, it could be used for ad-targeting (EVIL), or for socio-linguistic research (GOOD)

    • by Leebert ( 1694 ) *

      it could be used for ad-targeting (EVIL),

      Why? If I'm going to be subjected somehow to advertisements, it might as well be for something I'm actually interested in.

  • I read this article and didn't find anything about impersonation of gender. If anything, part of detection algorithm is based on looking for terms like "my wife/my gf" vs. "my hubby/my boyfriend". I think these terms are pretty self-explanatory (well, at least the former, in most states, the latter may belong to a gay user). In any case, this is a fairly trivial method of determining gender and one that seems to be quite basic and naive, in a way that any impersonator, even not particularly determined, woul

    • even worse: without any malice whatsoever, the situation described in the headline is potentially broken unless lesbians have roughly the same likely/unlikely words as women-as-a-whole.

      judging from the wordlist, any single male graduate student who practices yoga and eats yogurt may well be misclassified.

  • One of the gender differences in English uses that has interested me most is the male tendency to use absolutes more often. A lot of it seems to stem from the sort of "fish story" and humor-based phase of social bonding that begins for most boys in grade school. Men are more likely to say "always" when they mean "usually", "never" when they mean "rarely", etc... which tends to mean that those pedants among us who try to use more precise language sometimes end up appearing more effeminate, or weak (i.e. "You

    • Now THAT is insightful. That would explain things. It would explain why some people told me over the years that I "talked like a girl" because I spoke properly, and precisely in that nerdy way. By my standards, most men are sloppy speakers. Even my sister pointed this out to me at a drive thru some years back, she said most men would say "I wanna burger, fries, and coke." and then stop and drive on, while I said, "I would like a hamburger, medium order of fries and medium coke, please, and that will be

  • OMG! Awsoooooomeee! Ha-ha, TV. How cute.

    Stop showing me ads for tampons, damnit. I'M A MAAAANNNNN!!!!!!
  • by Greyfox ( 87712 ) on Thursday July 28, 2011 @08:15PM (#36916412) Homepage Journal
    It's pretty easy to tell if she often tweets about her penis.
  • program could correctly identify gender 65.9% of the time.

    Vs. 50% for random?

  • by ZombieBraintrust ( 1685608 ) on Friday July 29, 2011 @10:45AM (#36922060)
    Don't most people pretending to be female on twitter fill their tweets with stereotypical female language? This would only catch pretenders who are really lazy and incompetent.
