Project Anonymizes Your Writing Style To Hide Your Identity
mikejuk writes "An open source project to combat 'stylometry,' the study of attributing authorship to documents based only on the linguistic style they exhibit, is proving that it is possible to change writing style to evade detection. Artificial Intelligence techniques are routinely used to detect plagiarism and recently were employed to reveal that Harry Potter author J. K. Rowling is indeed the author of The Cuckoo's Calling, which was published under the byline of Robert Galbraith. Now software is tackling the opposite problem — anonymizing writing style to protect the identity of the originator. The JStylo-Anonymouth (JSAN) framework is a work in progress at the Privacy, Security and Automation Lab (PSAL) at Drexel University. It analyzes a written text and detects features which could be used to identify the author. It then suggests changes that need to be made to avoid the author's stylistic fingerprint appearing in the work."
I don't know (Score:5, Funny)
How will it disguise my terrible opinions that are obviously wrong?
Re:I don't know (Score:5, Funny)
Those blend right in with the rest of the internet.
Re: (Score:1)
Come on, I said "wrong" not "worse than a hypothetical Hitler-Stalin hybrid"
Re: (Score:1, Offtopic)
Sorry, but Obama is making the (R) GWB look like a saint.
The whole political disdain for all things (R) among the liberal elites here on /. is simply amazing to watch. What is unacceptable in an (R) is perfectly fine with Obama. The whole double standard of the political duopoly in the US is schizophrenic, and very telling.
Re: (Score:2)
Intelligent comment. Exactly what I expect from a (D) lemming.
Re:I don't know (Score:4, Informative)
Dude, let it go, this thread was started on a post about how everyone's opinions are wrong. Not a good context for debate.
Re: (Score:1)
Magical mystery: How it was an inside job, coordinated in 9 months? Followed by it was a bunch of people coordinating an attack in 9 months. Freaking Amazing how you believe that 9/11 was GWB's fault. And here I thought GWB was a bumbling fool who was too stupid to be president, and you have him being a freaking GENIUS who caused people to blow up buildings, so he could implement Patriot Act, while ignoring that Obama has done Patriot Act on Steroids. Let me guess, that too was GWB's fault.
And I am not a (R)
Re: (Score:2)
How will it disguise my terrible opinions that are obviously wrong?
It won't, it will just attribute them to Francis Bacon.
Re: (Score:2)
Great. Benjamin Franklin is going to end up the only person to have a valid opinion.
Re: (Score:3)
Cardinal Richelieu (supposedly) wrote: "If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him." Will the JStylo-Anonymouth mean that he'd be able to hang everyone who used it?
Re: (Score:2)
It will post them on slashdot.
Re: (Score:2)
Which person posted this?
There is simply not enough data in your post to find that out. You would probably have to write a few paragraphs of text in your natural style to give the algorithm any real chance.
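To illustrate why a one-line post gives an attribution algorithm so little to work with, here is a minimal sketch of one common stylometric feature: the relative frequency of function words. The word list, the `profile` and `distance` helpers, and the distance measure are all assumptions made for illustration; this is not JStylo's actual feature set.

```python
import re
from collections import Counter

# Illustrative list only (an assumption); real tools use far larger feature sets.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "but"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Manhattan distance between two profiles; smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(p, q))

# The four-word post contains none of the listed words, so its profile is all zeros.
print(profile("Which person posted this?"))
```

With only four words, every frequency is either zero or a very coarse fraction, so any comparison against a known author's profile is mostly noise. A few paragraphs give the frequencies room to settle.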
The Cuckoo's Calling (Score:5, Informative)
Uhm, what? It was revealed by someone at Rowling's agency tweeting it to a Sunday Times reporter, after the reporter commented on how good it was for a debut novel - that has all been confirmed by the agency.
Unless the above line is badly phrased and is meant to say "recently were employed to confirm prior reports that..." - it didn't reveal anything of the sort, the link had already been revealed by plain old journalism.
Re:The Cuckoo's Calling (Score:4, Informative)
No, it was revealed by a partner at the law firm who should have known better, and should now face sanctions from the Law Society. Being struck off the register would be about right.
On the other hand they have already reached an out of court settlement for a substantial sum, which probably came out of the partner's own pocket. I would also imagine the firm has lost the JKR account.
Re: (Score:1)
Re: (Score:2)
Well I heard it was revealed by the wife of a partner. Slightly better but not by much.
Was the wife legal counsel to J.K. Rowling? No? Well, then, it was revealed by the partner. That he revealed it to his wife first, or perhaps only, is completely irrelevant.
Hurry it up (Score:2)
Re: (Score:3)
Tools like this basically do: (step 1) build abstract representation of text - (step 2) rebuild it into a new text using random substitutions.
Plagiarism detection tool will just have to do step 1 and then compare it with database of saved essays in same abstract form.
How would that help if the plagiarism detection tool only has the randomized outcome of step 2?
Simple plagiarism detection tools just use string matching. If a person used popular quotes and phrases in an essay, it is entirely possible for the software to give a high plagiarism percentage. That's why all the good software packages highlight each match with a link to what they think was plagiarized.
More advanced tools can detect things like a student using a thesaurus for one to one word replacement. I do not know how much they can do in this regard though. String matching still works as long as the match
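As a rough sketch of the string-matching approach described in this comment, the snippet below scores a submission by the share of its word n-grams that also appear in a source text. The function names and sample sentences are made up for illustration, and no particular plagiarism product is claimed to work this way.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(submission, source, n=3):
    """Fraction of the submission's n-grams that also occur in the source."""
    sub = word_ngrams(submission, n)
    if not sub:
        return 0.0
    return len(sub & word_ngrams(source, n)) / len(sub)

source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over the lazy dog"
swapped = "the fast brown fox leaps over the idle dog"  # thesaurus-style swaps

print(overlap_ratio(copied, source))        # verbatim copy: 1.0
print(overlap_ratio(swapped, source))       # every trigram is broken: 0.0
print(overlap_ratio(swapped, source, n=1))  # single words still overlap: 0.625
```

Three word swaps are enough to break every trigram here, which is why exact matching alone misses one-to-one thesaurus replacement. Dropping to single words recovers some overlap, but also flags any two texts that merely share common vocabulary.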
Re: (Score:2)
Tools like this basically do: (step 1) build abstract representation of text - (step 2) rebuild it into a new text using random substitutions.
Those are easily spotted by their near-miss of English. It's called "content spinning" and it is easy to spot.
Re: (Score:3)
This assumes that they're as stupid as we all suspect, because the next thing the administration begins to do is check whether the student's written oeuvre is self-consistent without bunkering down under a blander identity than a Milli Vanilli cover of Valium Spice.
I'm so busted.
Re: (Score:2)
It was confusingly worded in TFA. What I eventually figured from it is that it was not used as a discovery mechanism. It looks like it was a test they performed after it was revealed, and the test only confirmed that she was the author.
It was not done to uncover any hidden truths, it was done to demonstrate the correctness of the tool.
AI doesn't do shit to detect plagiarism (Score:2)
Profit does. When your bottom line depends on keeping schools convinced that you're indispensable in the War On Plagiarism you damn well find plagiarism everywhere you can, whether or not it's actually there. There are approximately 80 MILLION students in the US. With our education system being as repetitive and formulaic as it is, it becomes a virtual certainty that out of 80,000,000 students a significant number will say the same thing the same way.
Re: (Score:1)
Re: (Score:1)
Long long ago, in a computer teaching lab 30 miles away, I had 20 assignments turned in to me for grading. Of them, I had seventeen identical, bizarre wrong answers. Seriously, people... if you're going to cheat, at least copy from someone who isn't high/psycho/retarded.
Re: (Score:1)
Re: (Score:2)
Finding plagiarism when it comes to coding is mainly a matter of style. Students should be encouraged to talk to each other about doing their homework. That doesn't mean that they should copy whole problems verbatim from one another though.
Look at the whole rangecheck(...) debacle. The algorithm wasn't secret by any means. The whole issue came about because the same coder wrote both functions. He has his own programming style that becomes immediately apparent when comparing small snippets of code like
Re: (Score:2)
Off topic, but the braces format question will get better answers if it's phrased differently, such as:
a)
if (...) {
} else {
}
b)
if (...)
{
}
else
{
}
c)
if (...) {
}
else {
}
Prior to "Perl Best Practices", I preferred to use an inconsistent style of:
if (...)
{
} else {
}
The different handling of elsif and else's compared to if's always bothered me, but I found the lined up braces much more pleasing. I didn't like option "b" because the else's take up WAY too much vertical room. Option "c" is now my personal preference.
YMM
Literature IS style! (Score:2)
I am sorry, but as far as literature goes, writing style anonymization (is that a word?) would harm the original intent of the author. A literary work is valuable (when so) due to author's style, among other factors, much like in movies, where a certain actor's voiceover is best for a certain character. The same character would become retarded if the actor's voice changes. Imagine Donkey (from Shrek) played by Morgan Freeman or Darth Vader played by Danny de Vito. Good characters, good actors, no match in s
Re: (Score:2)
Stephen King (Score:5, Insightful)
Stephen King seems to agree with you.
In his book "On Writing [amazon.com]", he explains (among many other good points) that one hallmark of good writing is finding the right combination of words for imagery.
He uses examples like "I lit a cigarette, tasted like a plumber's handkerchief" from Raymond Chandler and "It was darker than a carload of assholes" by George V. Higgins.
The Odyssey (IIRC) has the phrase "it was a wine dark sea", so this has been around for a very long time.
For casual writing the project may be useful, but I wonder how much imagery will be lost in translation.
Many of the works of revolutionaries, radicals, and dissenters are memorable for their specific imagery. Simon Sinek analyzed "I have a dream [wikipedia.org]", and noted the difference between "I have a dream" and "I have a plan". The two are very different, and have different effects on people. (Viz. TED talk "How Great Leaders Inspire Action" [ted.com])
I'm doubtful that AI has progressed to the point where the mood and emotional content will be preserved in such a translation.
To be effective, defiant writing will still require courage.
Re: (Score:2)
This isn't for people who want to be known by their writing.
Re: (Score:2)
Just one mention: I think I agree with Stephen King, not the other way around. After all, I heard of him (as a matter of fact, I just finished reading The Long Walk and started Misery) but I highly doubt he ever heard of me :)
Thanks. (Score:2)
An excellent point, I will try to remember this in future writing. It's the sort of thing you don't get in a writing course, for which I am grateful.
Thanks.
Re: (Score:2)
> For casual writing the project may be useful, but I wonder how much imagery will be lost in translation.
Except, did they not say it "suggests changes"? Doesn't that still leave the author free to either take the suggestion, or select a different phrasing or imagery choice?
I mean if it comes to "Wine dark sea" and suggests instead "deep red sea", or "sea of dark wine" I would assume the author would understand his original meaning and be able to work from there, and then iterate through it again to
Re: (Score:2)
"To be effective, defiant writing will still require courage."
Surviving to be defiant may require anonymity.
Re: (Score:1)
Imagine Donkey (from Shrek) played by Morgan Freeman or Darth Vader played by Danny de Vito.
I'd pay to watch either of those.
Re: (Score:2)
As a matter of fact, you'd probably pay to watch an excerpt of 2 minutes of either of those.
I once watched "Twins" (Arnold & DeVito) dubbed in Hungarian. It was hilarious... for a few minutes. Then it was annoying, then I couldn't handle it anymore.
Re: (Score:2)
Imagine Donkey (from Shrek) played by Morgan Freeman or Darth Vader played by Danny de Vito.
Imagine Eddie Murphy playing a Chinese Dragon in Mulan. Oh wait...
Re: (Score:2)
>>Darth Vader played by Danny de Vito
Which is one reason why Spaceballs was so darned funny. Rick Moranis as Dark Helmet... almost the exact opposite of James Earl Jones, voice- and style-wise.
Only if you remain anonymous... (Score:1)
... in the rest of your digital life.
In light of recent events -and I'm not only referring to the NSA-gate, but also to all the known ways to get your private information- it is hard for me to figure out a digital way of keeping your identity secret in a high profile incident.
Confirm (Score:1)
This is the next step in surveillance, if the government isn't doing it already. Binding together various accounts of yours based on statistics of phrases.
And it's redundant since they have a database of all IP connections, web pages, and stuff you type in anyway. Sigh. I suppose it will make confirmation of these AI techniques trivial. Yay.
Google translate? (Score:2)
Re: (Score:1)
Certainly one can simply translating their prose mechanism to another language and back to avoid identifying stylometric?
Surely, one can only auto-interpretation of their prose to another language and back to avoid stylometric identification?
Of course, you could just automatically translate your prose into another language and back again, in order to avoid the stylometric identification?
Surely one will simply start their prose-translation to other languages and back to avoid stylometric about yo
Re: (Score:3)
First of all, this: http://www.youtube.com/watch?v=LMkJuDVJdTw [youtube.com] (YouTube)
Second of all:
"Of course you can, just stylometric identification and back home in order to prevent another language is automatically translated prose?" -- (Haitian Creole -> Azerbaijani -> Slovenian -> English ...)
"Not even the same language at home and another stylometric can automatically translated into prose?" -- ( ... Irish -> Hebrew -> Czech -> English ...)
"Not even in the same language and prose automatically t
Re: (Score:2)
I got the order of translation mixed up, but same story. The Urdu-led translation trip was second, then led by the Irish, then the Japanese.
conversion to another's style (Score:3)
Re: (Score:3)
So, can any mediocre author convert his story to the style of a known good author using this?
There's hope for Slashdot's editors! Huzzah!
Re: (Score:2)
Speaking as someone who's done a little work in stylometry, I'm sure that it's a lot easier to make your work look like it's not yours than it is to make your work look like a specific different person's. I haven't looked at this project, but I'm guessing that it'll do the former. If I made software that could do the latter, then I'd be loudly advertising that fact, or I'd keep silent and make use of it...
Re: (Score:2)
yeahbutt (Score:2)
Wasn't used to out J. K. Rowling (Score:2)
Re: (Score:2)
Sounds like some company is trying to toot their own horn here or something, but AI didn't out J.K. Rowling. Her lawyer's friend did. http://www.businessinsider.com/russells-apologizes-to-jk-rowling-2013-7 [businessinsider.com]
This is a privacy related story on Slashdot. Facts have as much of a place here as in a Microsoft story.
Although Slashdot does hate lawyers, so maybe you can get some traction with this ...
Re: (Score:2)
Also, you think this is going to identify people that type very little? Or have multiple personalities, bipolar disorders or similar?
No, it probably can't. And there's likely to be many, many other scenarios in which it cannot detect the writer reliably. So what? It doesn't have to be completely perfect to be useful.
Re: (Score:2)
Right here [iwl.me]. :)
Looks like a typical web toy, so I wouldn't quit your job and start working on your Great American Novel based on the results.
This is a Dupe (Score:2)
Been done before (Score:2)
Dear aunt, let's set so double the killer delete select all
BUSTED! And on AOL! (Score:2)
Way back, in the dim, distant past of the bucolic walled gardens that preceded the Internet as we know it ... there was AOL. AOL had walled predator-free gardens within gardens, where only teens younger than 18 were supposed to be communicating.
There were rumors that evil pedophiles were lurking in these gardens, so I made a sub-account for a totally bogus 16-year old boy named Alex. And Alex went forth to play.
All was going well, Alex was quite a popular young man amongst his peers and had lured ZERO pedop
Facepalm... (Score:2)