Computers Paraphrase English (212 comments)

Posted by michael on Friday December 26, 2003 @02:45PM from the me-talk-pretty-one-day dept.

AhaIndia submits a link to a story discussing computerized paraphrasing of English news articles. This technology, destined to eventually replace most reporters with very small shell scripts, is thankfully still in its infancy.

This discussion has been archived. No new comments can be posted.

hmm... sounds familiar... (Score:4, Funny)

by Dorothy 86 ( 677356 ) * writes: on Friday December 26, 2003 @02:46PM (#7813642)

This technology, destined to eventually replace most reporters with very small shell scripts
This shirt? [thinkgeek.com]

- Re:hmm... sounds familiar... (Score:2)
  
  by tbone1 ( 309237 ) writes:
  
  Actually, I thought Gannett had already done that for their papers.
Automated slashdot? (Score:3, Funny)

by TwistedSquare ( 650445 ) writes: on Friday December 26, 2003 @02:46PM (#7813644)

This technology... thankfully still in its infancy.
So one day, instead of complaining about michael and co., everyone will be moaning about someone else's code; that seems more appropriate for a nerd site somehow ;)

- Re:Automated slashdot? (Score:2)
  
  by evanbd ( 210358 ) writes:
  
  At least it can use a spellchecker. And it can probably catch dupes on occasion, too, with some work. I don't know what you'd be complaining about.
  - - Re:Automated slashdot? (Score:4, Insightful)
      
      by Steve Franklin ( 142698 ) writes: on Friday December 26, 2003 @06:51PM (#7814854)
      
      "It might even know the proper use of to/too and your/you're."
      
      Yeah, but can it manage to use "There are" instead of "There is" with a plural subject?
      
      Actually, the long-known solution to most of these *oh so difficult* translation problems is to translate everything into a neutral interlanguage like Interlingua and then translate that into other languages, sending the Interlingua version along for the ride, thus preventing degradation in further translations. Then all that local linguists have to concentrate on is ONE set of problems: translating their local language into and out of Interlingua, and Interlingua, being tightly defined, is much easier to machine-translate into and out of other languages. So...all this lunacy of trying to machine-translate Chinese into English, German, Hungarian, Estonian...--you get the picture--is an incredible waste of time and resources and isn't the best way to solve the problem.
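The resource argument can be made concrete by counting one-way translation systems. With n languages, direct translation needs a system for every ordered pair, n(n-1) of them. A pivot language needs only one system into the pivot and one out of it per language, 2n in total. Below is a minimal Python sketch of that arithmetic. It counts systems only and says nothing about translation quality; the function names are illustrative.

```python
def direct_systems(n: int) -> int:
    """One-way translation systems needed when every ordered language pair is handled directly."""
    return n * (n - 1)


def pivot_systems(n: int) -> int:
    """One-way systems needed when each language translates into and out of a pivot language."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 3, 10, 20):
        print(f"{n:>2} languages: direct={direct_systems(n):>3}, pivot={pivot_systems(n):>2}")
```

For 10 languages that is 90 direct systems versus 20 via a pivot. The two schemes break even at three languages, and for just two languages direct translation is cheaper.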
      
      - Re:Interlingua, or Lojban? (Score:2)
        
        by Steve Franklin ( 142698 ) writes:
        
        Yes, I was specifically referring to IALA Interlingua. And no, Interlingua is not as tightly defined as Lojban (Loglan). But Lojban is fairly difficult to learn and is not instantly recognizable. Perhaps a good compromise would be the Interlingue (formerly Occidental) of the Interlingue-Union. This is similar to Interlingua but derives its vocabulary more precisely, as does Esperanto for that matter. The advantage of an interlanguage that is easily recognizable (at least to speakers of European languages--m
- Re:Automated slashdot? (Score:2, Funny)
  
  by Lord_Dweomer ( 648696 ) writes:
  
  "This technology is thankfully still in its infancy."
  I think Michael misspelled 'unfortunately'. But what am I saying...god forbid we have a day when scripts take over Slashdot. Of course, they'd probably program them to dupe and put in random M$-bashing statements.
- Re:Automated slashdot? (Score:2)
  
  by crapulent ( 598941 ) writes:
  
  I don't see why you'd need any fancy AI or genetic algorithms to mimic the slashdot submitters. Most of them just copy+paste the first two or three sentences of the article, without adding anything. That could easily be replaced by a perl script in about 20 minutes.
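The copy-the-lead approach crapulent describes takes only a few lines. Here is a minimal sketch in Python rather than Perl. It assumes a deliberately naive sentence splitter that will mis-split abbreviations such as "U.S.", and `lead_sentences` is a hypothetical name used for illustration.

```python
import re

# Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
# Abbreviations such as "U.S. officials" will be split incorrectly.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def lead_sentences(article: str, n: int = 3) -> str:
    """Return the first n sentences of an article, joined by single spaces."""
    sentences = [s.strip() for s in _SENTENCE_END.split(article.strip()) if s.strip()]
    return " ".join(sentences[:n])


if __name__ == "__main__":
    story = (
        "Two researchers built a paraphrasing program. "
        "It learns from news stories. It is still experimental."
    )
    print(lead_sentences(story, 2))
```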
Interesting use of Technology (Score:2, Troll)

by rkz ( 667993 ) * writes:

Google News [google.com] already uses a similar technique to decide what to put in the summary beneath the headline; it does not paraphrase, but it does actually extract a summary.

Also, if you have Microsoft Word [microsoft.com] lying about, there is a feature called Auto-summary which is surprisingly good, almost as effective as going through a document yourself looking for the main points.
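Neither Google News nor Word describes its summarizer here, so what follows is not their method. It sketches one classic extractive approach, word-frequency scoring. Each sentence is scored by how common its content words are across the whole document, and the top k sentences are kept in their original order. The stop-word list and the averaging choice are assumptions.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list; a real summarizer would use a much larger one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "that", "on", "for"}


def content_words(text: str) -> List[str]:
    """Lower-cased words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(document: str, k: int = 2) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s.strip()]
    freq = Counter(content_words(document))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average word frequency, so a sentence does not win merely by being long.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Averaging rather than summing keeps long sentences from winning purely on length. That is a design choice, not a requirement of the technique.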
- Re:Interesting use of Technology (Score:5, Informative)
  
  by Tim C ( 15259 ) writes: on Friday December 26, 2003 @02:51PM (#7813666)
  
  I've provided search engine functionality to a few sites using Verity's K2 product, which provides a similar piece of functionality. If you (programmatically) ask it to return a summary of each hit, what you get is what it considers to be representative of the document as a whole, not merely the first few lines, or a paragraph, or whatever. It actually works pretty well, but then it should, as (a couple of years ago) it cost almost as much as my house...
  
- Re:Interesting use of Technology (Score:3, Interesting)
  
  by znu ( 31198 ) writes:
  
  Mac OS X users can select text and choose 'Summarize' from the Services menu in any Cocoa or Services-enabled Carbon application. Summarization is also available to any application programmatically [apple.com] through the Find By Content API.
- Word Summary (Score:2)
  
  by cornjones ( 33009 ) writes:
  
  Hey, I went and played w/ this feature of Word. Here is the summary of the article. Hmmm... maybe if we set up an auto summary, more people would RTFS?
  
  Anyway, here it is:
  Now, computers can play along
  
  Computers can't do nearly that well at paraphrasing. Now, using several methods, including statistical techniques borrowed from gene analysis, two researchers have created a program that can automatically generate paraphrases of English sentences.
  The program gathers text from online news services on specifi
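The excerpt mentions "statistical techniques borrowed from gene analysis", and one technique from that field is sequence alignment. As an illustration only (this is not the researchers' program), the sketch below runs a plain Needleman-Wunsch global alignment over words. Aligning two news sentences about the same event lines up the shared words, and the mismatched slots are candidate paraphrase pairs. The scoring values are arbitrary assumptions.

```python
from typing import List, Optional, Tuple

MATCH, MISMATCH, GAP = 1, -1, -1  # illustrative scores, not tuned


def align(a: List[str], b: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Needleman-Wunsch global alignment of two token lists.

    Returns aligned pairs; None marks a gap on that side.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            MATCH if a[i - 1] == b[j - 1] else MISMATCH
        ):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return list(reversed(pairs))
```

Aligning "the rebels seized the town" with "the rebels captured the town" pairs "seized" with "captured" and matches every other word.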
- - Re:mod abuse?? (Score:2)
    
    by CrankyFool ( 680025 ) writes:
    
    I think it's mostly been modded down because the link in the sig is fairly solidly abusive, offensive, and misleading. You can't easily ignore something that someone is actively trying to obfuscate, and that link is a good example of such obfuscation. The post includes the sig, and as such the sig affects the post. It's perfectly reasonable to claim the post as a whole is a troll if the sig is egregious enough.
fox_news.sh (Score:5, Funny)

by sinclair44 ( 728189 ) writes: on Friday December 26, 2003 @02:47PM (#7813648)

#!/bin/sh
curl "$1" | paraphrase | slant -patriotic -stupid > fox_news_story.txt

- Re:fox_news.sh (Score:5, Funny)
  
  by drakaan ( 688386 ) writes: on Friday December 26, 2003 @03:00PM (#7813746)
  
  perl makestory.pl -slant "liberal dem party-line" -severity "raving" -subject "Cheney Halliburton motives"
  Fair is fair ;)
  
  - Re:fox_news.sh (Score:4, Funny)
    
    by niom ( 638987 ) writes: on Friday December 26, 2003 @03:43PM (#7813959)
    
    Fair is fair ;)
    Except when immediately followed by "and balanced".
    
    - Re:fox_news.sh (Score:2)
      
      by drakaan ( 688386 ) writes:
      
      Scroll up, and be enlightened. There's news aside from Fox's that's unfair, as well.
  - - Re:fox_news.sh (Score:4, Insightful)
      
      by drakaan ( 688386 ) writes: on Friday December 26, 2003 @03:19PM (#7813854)
      
      Well, some of us (me, for instance) listen to Fox News and NPR...my own personal take on fair and balanced...and see that the party line is alive and well on both major sides of the political fence. That's part of the reason I'll never be a Democrat or a Republican (or a Libertarian, or any other label you want to stick on a like-minded group of people). The news has information in it. Look for it, compare notes, and make up your own mind what's news.
      
      - Re:fox_news.sh (Score:3, Interesting)
        
        by EvilTwinSkippy ( 112490 ) writes:
        
        I'm a big fan of completely blocking out the major news outlets and simply investigating matters on my own. I take a mental highlighter to the actual facts as stated in an article, and disregard the interpretation.
        I have discovered there are very few people actually collecting news. In many cases I boil a dozen or so stories down to a single quote from the same source, or, even funnier, one reporter's misinterpretation of another reporter's work. My favorite is when an American reporter writes that "the bom
- - Re:bbc_news.sh (Score:2)
    
    by Bombcar ( 16057 ) writes:
    
    #!/bin/sh
    curl $1 | paraphrase | mispel -slashdot | slant -page-hits -linux | troll --max-troll=50 > slashdot_story.txt
Very small shell scripts (Score:5, Funny)

by bunnyman ( 121652 ) writes: on Friday December 26, 2003 @02:47PM (#7813650)

Yes, but until it can post duplicate articles with slightly different phrases, it will never replace CowboyNeal!

- Re:Very small shell scripts (Score:4, Funny)
  
  by EvilTwinSkippy ( 112490 ) writes: <yoda.etoyoc@com> on Friday December 26, 2003 @03:16PM (#7813834)
  
  Yes, but a system will not take the place of CowboyNeal until it posts duplicate articles with slighty different phrases!
  
  - Re:Very small shell scripts (Score:4, Funny)
    
    by matth ( 22742 ) writes: on Friday December 26, 2003 @03:18PM (#7813851)
    
    After taking the place of CowboyNeal will a system like posting duplicate articles, phrase slightly different? Yes!
    
    - Re:Very small shell scripts (Score:4, Funny)
      
      by EvilTwinSkippy ( 112490 ) writes: <yoda.etoyoc@com> on Friday December 26, 2003 @03:24PM (#7813877)
      
      CowboyNeal's system is posting slighty different phrases. Yes takeing me places!
      
- Re:Very small shell scripts (Score:2)
  
  by mabhatter654 ( 561290 ) writes:
  
  But can it post without RTFM??? well can it? At least /. will be safe.
But still.... (Score:5, Insightful)

by AgBullet ( 624575 ) writes: on Friday December 26, 2003 @02:50PM (#7813664)

Won't you need someone to write the stuff to be paraphrased in the first place?? Explain to me how that replaces reporters with small shell scripts.

- Re:But still.... (Score:2)
  
  by gregfortune ( 313889 ) writes:
  
  I think the joke was directed at "reporters" like those found on /. They would be first in line to be replaced as their entire job seems to be to collect interesting articles and repost. Heck, the shell script would probably get the whole dup problem figured out too ;o)
- Re:But still.... (Score:4, Insightful)
  
  by EvilTwinSkippy ( 112490 ) writes: <yoda.etoyoc@com> on Friday December 26, 2003 @03:07PM (#7813793)
  
  There are reporters? Crap, every other article in my local fishwrap is Reuters; the other half is AP. There are one or two articles for local color, generally homicides or documenting yet more ways our local government is a) corrupt, b) inept, and/or c) playing partisan politics with/against the state government.
  By the time it's printed in the "News" it's usually pretty old.
  
  - Re:But still.... (Score:2)
    
    by inKubus ( 199753 ) writes:
    
    No Crap.
    
    It used to be that a reporter was there, got the facts and then got the feeling too. But I guess with all the information these days, there's no way people could do everything. Still, it's just going to seem unhuman when one day a news story is a few lines of XML:
    
    And the "reporter" is just a software program that turns that into a readable "story". Then you can choose how you want the news displayed, with various schema. Like if you're in a good mood, you can put in the happy schema and it doe
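The XML inKubus had in mind is missing above, so the markup below is invented for illustration. This minimal sketch treats the story as a flat record of facts, and the "reporter" is a template chosen by mood. The element names, the facts, and both templates are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical story markup: element names and content are invented for illustration.
STORY_XML = """
<story>
  <who>Springfield City Council</who>
  <did>approved</did>
  <what>a new sanitation budget</what>
  <when>Tuesday</when>
</story>
"""

# Each "schema" is just a format string over the same set of facts.
TEMPLATES = {
    "neutral": "{who} {did} {what} on {when}.",
    "happy": "Good news! On {when}, {who} {did} {what}.",
}


def render(xml_text: str, mood: str = "neutral") -> str:
    """Turn a flat XML record of facts into one sentence using the chosen template."""
    root = ET.fromstring(xml_text.strip())
    facts = {child.tag: (child.text or "").strip() for child in root}
    return TEMPLATES[mood].format(**facts)


if __name__ == "__main__":
    for mood in TEMPLATES:
        print(render(STORY_XML, mood))
```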
  - Re:But still.... (Score:2)
    
    by benjamindees ( 441808 ) writes:
    
    documenting yet more ways our local government is... inept
    They don't have to try very hard. I remember a couple of years ago the local news had a shot of a police car that had run into a postal truck. No comment necessary, just ten seconds worth of footage. That image is with me every time I vote.
- yr comment's a journalism integrity question... (Score:2, Interesting)
  
  by geekpuppySEA ( 724733 ) writes:
  
  ...not necessarily a problem to be solved by the code. The code, BTW, is probably a leetle more complex than small shell scripts; see a good textbook like Jurafsky and Martin (pub. 2000) for why.
  Re journalistic integrity: there's the possibility that a single entity could issue the release to the wire services, and they could release it in some kind of 'compiled' form (where it's just the syntax/semantic relations). (How this would differ from how releases are issued now is a good question, but I guess there'd
- Re:But still.... (Score:2)
  
  by C10H14N2 ( 640033 ) writes:
  
  Just imagine all /. users replaced by very small shell scripts and you've got it. I suspect that process began long ago.
The Ultimate Tool For Plagiarism (Score:5, Interesting)

by popo ( 107611 ) writes: on Friday December 26, 2003 @02:51PM (#7813672)

All someone has to do now is marry this technology with a term-paper database, and "Hello Original Work!"

The question will then become, how many different unique "paraphrases" can the system ultimately generate?

- Re:The Ultimate Tool For Plagiarism (Score:3, Interesting)
  
  by EvilTwinSkippy ( 112490 ) writes:
  
  Actually, you can use topic maps to decompose a body of work into individual statements and then use a set of randomly generated "flavors" to reconstitute the facts into an original work. The rules about what goes where are pretty cut and dried.
  More stuff to help people avoid shitwork, only for humanity to discover our purpose in life IS to do shitwork.
- Re:The Ultimate Tool For Plagiarism (Score:3, Interesting)
  
  by KrispyKringle ( 672903 ) writes:
  
  This isn't necessarily the big problem it appears. I've heard of many college professors and high school teachers using automated plagiarism detectors in the news, and that strikes me as stupid, as well. I mean, if a student has to write a paper on _The Bell Jar_, I'm sure he can find one online. But in most classes, you expect some level of familiarity with the students on the part of the teacher. If a kid who sleeps in every class and whose comments tend to be off-topic or stupid turns in a paper worthy of T
  - Re:The Ultimate Tool For Plagiarism (Score:4, Interesting)
    
    by SurgeonGeneral ( 212572 ) writes: on Friday December 26, 2003 @04:33PM (#7814224)
    
    Yes, we've all heard the arguments against cheating.
    
    Especially the, 'you're only cheating yourself' one.
    
    It's irrelevant because this will not affect the way we cheat so much as the way we learn and the way we write. Think about it beyond your personal experience in high school.
    
    1. On the micro scale, an autosummarize feature like this will allow someone to take another's essay and put their facts into their own words. But I don't see how this makes any difference to the cheater other than saving him an hour. To see this tech as a problem on this level is to ignore the future.
    
    2. On the medium scale, it will allow someone to take multiple papers, extract all the facts and their sources, and then string them together again with their own interpretation. This will allow the learner to come up with a new argument and possibly a fresh insight based on the available information. In this case, it saves the learner a few hours of reading, though he has to do the same amount of thinking and logical reasoning. Is it a shame that the person doesn't have to waste time reading irrelevant information? Still, looking at it on this level is not thinking very deeply.
    
    I take history at university, and the essays we have to write are done by data mining books. Lots of books. We have to read large amounts of material in as short a time as possible. We have to find out what is important and what is relevant. Am I really learning how to analyze facts? I don't think so. I am learning how to write university papers and theorize based on incomplete information. I am learning how to make a lot of wasted time look like a lot of work.
    
    3. The macro scale. What if every book ever written were replicated in full electronically and available for parsing? What if I could extract every fact from every source even remotely relevant to a topic? I'm right back where I was before: hours and hours of reading. Yet my argument will be more solid and my information more complete than it ever could be using the outdated method of data mining: looking in the indexes of books. In this case, what am I learning? I am learning how to think. I am learning how to spot holes, inconsistencies, fallacies, and so on. In this case, the technology has eliminated cheating altogether because there is no single source to copy from. And if I want to understand how all these facts relate to each other, I either have to think about it or read another author's interpretation of it (thus I could still cheat in the classical sense).
    
    4. But let's look at it on one more level, the very tiniest level and the most futuristic. A well-constructed paragraph or sentence can't be parsed down, and wouldn't make sense if it were. The facts contained in a paragraph only become important in relation to one another. So in the end, it could just change the way we write. Enough with this puffed-up crap, enough with padding your papers: either state what's important or state nothing at all. A well-constructed essay in the future will be one that can't be "autosummarized" without losing all its intelligibility.
    
    - Re:The Ultimate Tool For Plagiarism (Score:2)
      
      by KrispyKringle ( 672903 ) writes:
      
      Your points all are valid, I think, and actually very interesting, but I don't think this means there's no such thing as cheating. If you plagiarize, the issue is not the text you copied, but the ideas. So yeah, perhaps this system would allow us to actually crack down on plagiarists, as well, by detecting copied ideas, even if restated. But I suspect we might find that a lot of papers aren't really as original as the authors thought.
  - - Re:you really are only cheating yourself (Score:2)
      
      by KrispyKringle ( 672903 ) writes:
      
      OK, so you're going to argue that there is no morality, that we all have free rein to do what we must to get ahead? I can't argue this position; if you don't feel that there is some imperative to do what is right--say, the greatest good for the greatest number--I certainly can't convince you otherwise. Or perhaps you don't feel that cheating harms anyone (and you may well be right; I acknowledged that most of those who cheated never took anything away from me, though conceivably they did take things away fr
- Re:The Ultimate Tool For Plagiarism (Score:2)
  
  by iabervon ( 1971 ) writes:
  
  And, if this technology is sufficient to write good term papers based on online information, what is the point of learning to write term papers? Certainly any students who have access to such technology will have no use for doing it themselves after school, when the technology will be more advanced and more money will be available for it.
  
  At that point, teachers ought to be teaching students how to get such software to produce the effect they want on the audience. For that matter, they could try teaching an
  - Re:The Ultimate Tool For Plagiarism (Score:2)
    
    by thrillseeker ( 518224 ) writes:
    
    if this technology is sufficient to write good term papers based on online information, what is the point of learning to write term papers ... teachers ought to be teaching students how to get such software to produce the effect they want on the audience.
    Since it's so comfortable one wonders why the baby ever leaves the womb.
  - - Re:The Ultimate Tool For Plagiarism (Score:2)
      
      by Saeger ( 456549 ) writes:
      
      The ability to do research and learn new things isn't going to be replaced by technology.
      Regurgitating boring facts and rote memorization WILL be replaced by technology eventually. A brain-computer interface -- which isn't that far off -- will, in essence, allow some future "Google" to be an extension of your brain's main memory. This still isn't the holy grail, though, because it only decreases access time to huge databases of information, but doesn't do much to decrease the time it takes to fully absorb
    - Re:The Ultimate Tool For Plagiarism (Score:3, Insightful)
      
      by iabervon ( 1971 ) writes:
      
      Since the point of term papers is not, in fact, to learn to write term papers, it is likely that, as the production of term papers becomes possible while missing the point, the assignment should be changed to retain the point.
      
      The ability to do research (of known information, at least) has already been changed by technology. Google, PubMed, and other sites make real literature research possible for high school students with just a web browser, and the kind of slogging through printed books that I learned in
Heh (Score:2)

by Mwongozi ( 176765 ) writes:

Just like the T-shirt says [livejournal.com]
Dupe (Score:5, Informative)

by greenhide ( 597777 ) writes: <jordanslashdotNO@SPAMcvilleweekly.com> on Friday December 26, 2003 @02:55PM (#7813707)

Unfortunately, there isn't yet a way to use computers to detect dupes [slashdot.org].

Or Is there?!? [google.com]
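Detecting near-duplicate stories is a well-studied problem. Below is a minimal sketch of one common technique, word shingling with Jaccard similarity. Each story is broken into overlapping three-word windows, and two stories are flagged when their window sets overlap beyond a threshold. The 0.5 threshold and k = 3 are assumptions, not tuned values, and the function names are illustrative.

```python
import re
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Set of k-word shingles (overlapping word windows) from lower-cased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_dupe(story_a: str, story_b: str, threshold: float = 0.5) -> bool:
    """Flag two story summaries as probable duplicates if their shingle sets overlap enough."""
    return jaccard(shingles(story_a), shingles(story_b)) >= threshold
```

Take two summaries that differ only by one trailing word (six words versus seven). They share 4 of 5 distinct shingles, a Jaccard score of 0.8, so they are flagged.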

School Reports (Score:4, Insightful)

by gregfortune ( 313889 ) writes: on Friday December 26, 2003 @02:56PM (#7813709)

So, will there be a difference between paraphrasing and copying now in an educational setting? Seems like this could make a report pretty easy...

1) Brainstorm some key points/ideas
2) Have this program data-mine for relevant articles online
3) Feed sections of each article into the program and have a finished paper

Granted, the tech isn't quite that powerful yet and probably wouldn't do a whole paper, but it sure looks like it could supply several paragraphs of material per page...

- Re:School Reports (Score:2, Interesting)
  
  by roninmagus ( 721889 ) writes:
  
  I do very much hope so; as a computer science major who hhaaatteess general studies classes, I hope very much that the English/History classes which so graciously waste my programming time with useless writings go down the drain. Of course, my website [daveandrews.org] is entirely such useless writings, so I stand trumped.
  
  However, I did meet my girlfriend and hopefully future wife in Sophomore English at MTSU. Go figure.
Rethink English ! (Score:4, Informative)

by Thinkit3 ( 671998 ) * writes: on Friday December 26, 2003 @02:58PM (#7813728)

Lojban is among the more interesting newer languages. It can be parsed just like C! Esperanto is somewhat interesting. English will be regarded in the future as a curious artifact--it was swept along with the technology revolution simply because ASCII didn't include accents and extra marks on letters. Eventually we'll get away from vocalization altogether and have purely numerical, written languages.

Right now, trying to work with English in computers deals way more with the strangeness of the language than the more interesting issues of cognition that lie underneath.

- Re:Rethink English ! (Score:3, Insightful)
  
  by TwistedSquare ( 650445 ) writes:
  
  English will be regarded in the future as a curious artifact
  One man's informative is another man's troll... Esperanto was interesting, and look where it got. Nowhere. People will speak in what's easiest. English is becoming a de facto standard that will continue to be the most spoken language in the world. People won't use oddly designed languages because they will be harder than current languages, which got where they are today through iterative refinement to be the best-suited language for us to communi
  - - Re:It's globalisation (Re:Rethink English !) (Score:2)
      
      by TwistedSquare ( 650445 ) writes:
      
      For example, the spelling system is just silly. For example, why are there five ways to write "k" (click, kick, suck, schedule, iraq)?
      And while we are at it, why not use "f.e." instead of "e.g."?
      And why such an irrelevant six-billionth of the whole deserves to be honored by the capital letter ("I")?
      I imagine almost every language has redundancy in its alphabet. In French, for example, ce and se would be pronounced the same. Over time this will probably be reduced, as I said, iterative refinement is
- Re:Rethink English ! (Score:3, Interesting)
  
  by Just Some Guy ( 3352 ) writes:
  
  Right now, trying to work with English in computers deals way more with the strangeness of the language than the more interesting issues of cognition that lie underneath.
  That's true. Computer languages that don't stick close to "regular" human expression are very popular [cloud9.net] and growing quickly. Languages that resemble written English [python.org] are dwindling rapidly.
  After all, code is meant to be written, not read [ioccc.org], and programmers should strive to write such that their work can't be understood [unsw.edu.au] by anyone not an expe
Fake literature (Score:4, Funny)

by MAPA3M ( 718897 ) writes: on Friday December 26, 2003 @02:59PM (#7813734)

Isn't this the way those trashy love novels are written?

- Re:Fake literature (Score:2)
  
  by amRadioHed ( 463061 ) writes:
  
  Not quite, but very similar [amazon.com]. That is by far the most stunningly dumb book I came across in my stint working at a bookstore.
- Re:Fake literature (Score:2)
  
  by Lord Kano ( 13027 ) writes:
  
  Isn't this the way those trashy love novels are written?
  
  Wow, it must take a huge shell script to turn "Mary wanted the big strong muscle man to solve all of her problems. Henry, the big strong muscle man, was horny. They had unprotected sex. Everything was perfect from then on."
  
  Imagine what that could do to a Hello World program.
  
  LK
Or games... (Score:3, Funny)

by A55M0NKEY ( 554964 ) writes: on Friday December 26, 2003 @02:59PM (#7813736)

Someone set up us the bomb!

Reference (Score:2)

by Epistax ( 544591 ) writes:

...most reporters with very small shell scripts...

I know I heard this phrase (loosely) before, but does someone know the name of the reference?
- Re:Reference (Score:2)
  
  by arth1 ( 260657 ) writes:
  
  ...most reporters with very small shell scripts...
  
  I know I heard this phrase (loosely) before, but does someone know the name of the reference?
  
  At ThinkGeek [thinkgeek.com] perhaps?
  Or one of myriads of signatures quoting this?
  
  Regards,
  --
  *Art
Replace reporters?? (Score:2, Insightful)

by dyj ( 590807 ) writes:

How is this going to replace reporters? Reporters don't just paraphrase other reports. They actually are supposed to search for stories (hopefully factual!) on their own.
- Re:Replace reporters?? (Score:2)
  
  by EvilTwinSkippy ( 112490 ) writes:
  
  I think someone just wanted Journalists to know what it feels like to be a tech in this day and age. What they can't get a computer or a trained chimp to do, they will find some guy in another country who will do it cheaper.
  We'll know we're in trouble when every commentary article begins with "I am thinking that..."
  - Re:Replace reporters?? (Score:2)
    
    by tbone1 ( 309237 ) writes:
    
    Until you can get computers to drink on the job, get paid by businesses to write advertisements for them (a la Enron, CART, etc.), and fall for any buncombe that someone says in a serious voice, they won't exactly 'replace' journalists.
Something similar existed on the Amiga (Score:4, Interesting)

by Serk ( 17156 ) * writes: on Friday December 26, 2003 @03:03PM (#7813768)

Back in the late 1980s I had a word processor for my Amiga that had a function whereby it would do a global search and replace of every Xth word (user-settable) with a synonym from the built-in thesaurus... Very handy for those term papers I so hated in high school...

I'm assuming this (of course, I didn't RTFA) is far more advanced than what we had back then, but the idea has been around for quite a while at least...
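The Amiga feature Serk describes (swap every Xth word for a thesaurus synonym) fits in a few lines. Below is a minimal sketch that assumes a tiny hypothetical thesaurus and always takes the first synonym listed. It ignores capitalization and word sense, which is one reason output from tools like this tends to read oddly.

```python
import re
from typing import Dict, List

# Hypothetical miniature thesaurus standing in for the program's built-in one.
THESAURUS: Dict[str, List[str]] = {
    "big": ["large", "huge"],
    "fast": ["quick", "rapid"],
    "said": ["stated", "remarked"],
    "report": ["account", "article"],
}


def synonym_swap(text: str, every: int = 3) -> str:
    """Replace every `every`-th word with its first listed synonym, when one exists.

    Case and word sense are ignored, so "Big" becomes "large" and "report"
    is swapped even when it is used as a verb.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    tokens = text.split()
    for i in range(every - 1, len(tokens), every):
        # Split off trailing punctuation so "report," still matches "report".
        m = re.fullmatch(r"(\w+)(\W*)", tokens[i])
        if m and m.group(1).lower() in THESAURUS:
            tokens[i] = THESAURUS[m.group(1).lower()][0] + m.group(2)
    return " ".join(tokens)
```

For example, `synonym_swap("the big dog said the report was fast", 2)` gives "the large dog stated the account was quick".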

Possible outcome of computerized paraphrasing (Score:2, Funny)

by CapnCarrot ( 655580 ) writes:

AhaIndia submits story discussing paraphrasing of articles. This technology, destined to replace reporters shell, is still in its infancy. Huh, perhaps we'll still need humans after all . . .
Do you know what reporters DO? (Score:5, Insightful)

by DavidinAla ( 639952 ) writes: on Friday December 26, 2003 @03:08PM (#7813802)

For you to say that this technology will someday replace reporters makes me think that you're clueless about what reporters do. Do you realize that the biggest parts of a reporter's job are gathering facts and making judgments about 1) which stories are worth reporting, 2) which are the relevant facts about a story and 3) who's lying and who's telling the truth about a story? The actual writing that you see is many times almost incidental to most of what a reporter does. You might not like the judgments that a reporter makes (and I could agree with that in many cases), but software can't go out into the world and talk to people and use judgment and intuition to find information to write about.

As an ex-reporter and editor, I find it laughable that anyone might think this technology will replace reporters. It's sort of like suggesting that machines that can read source code and interpret it can somehow figure out what new software people want and then write it. Both possibilities are equally insane.

- Maybe you're not sure what linguists do... (Score:2, Insightful)
  
  by geekpuppySEA ( 724733 ) writes:
  
  Hey, don't troll this stuff out quite yet; sure, it's futureware right now, but think ahead, and ... more to the point, read some about it. There's more to language and computational linguistics than you might think. Just because your (former) line of work stands to be partially replaced doesn't mean that the technology is insane.
  to wit, there are attributes of register, tone, and modality that can be applied not just to individual sentences, but to entire pieces of text that may be able to indicate a
  - Re:Maybe you're not sure what linguists do... (Score:3, Insightful)
    
    by DavidinAla ( 639952 ) writes:
    
    Maybe you're not clear about the difference between a reporter and an editor.
    
    It's theoretically possible that an editor could be replaced in some instances by software, but not the reporter. The reporter doesn't have anything to start with -- no sentences for software to analyze. A reporter normally starts with some vague thing like a source in the city clerk's office telling him that some bogus expenditures are being put into the sanitation department budget for next year, but nobody really knows what's
    - Re:Maybe you're not sure what linguists do... (Score:2)
      
      by Saeger ( 456549 ) writes:
      
      Until there is really perfect AI software -- which I think is so unlikely as to preclude reasonable speculation for the purpose of this conversation -- reporters won't be replaced by software.
      Whatever helps you sleep at night I guess.
      AI and IA is an inevitability in our lifetime as long as ancient exponential trends [kurzweilai.net] continue on track.
      --
    - - Re:Maybe none of are sure what Ashcroft does... (Score:2)
        
        by DavidinAla ( 639952 ) writes:
        
        Reporters would typically be happy if editors didn't exist, because they tend to believe their copy doesn't ever need ANY changes. ;-)
- Re:Do you know what reporters DO? (Score:2)
  
  by Linux_ho ( 205887 ) writes:
  
  The fact is that this technology is just replicating what people are already doing. The technology won't replace many reporters. It will replace the cut-n-paste people who have already replaced the reporters. Most "news organizations" are nothing more than AP chop-shops right now. You think there's a lot of fact-checking and analysis going on? I wish that were true, but it's just not.
- Re:Do you know what reporters DO? (Score:2)
  
  by ediron2 ( 246908 ) * writes:
  
  DavidinAla asked:
  
  Do you know what reporters DO?
  
  This is slashdot.
  This is Michael, Hemos, and Taco we're talking about. What kind of dumbass question is that?!
  Of *COURSE* they don't know. Heh, even avoiding dupes, spellcheck and fact checking are alien concepts...
- Re:Do you know what reporters DO? (Score:3, Interesting)
  
  by shaitand ( 626655 ) writes:
  
  "which stories are worth reporting"
  
  With this technology, ALL of the stories could be reported.
  
  "which are the relevant facts about a story"
  
  odd, I myself get very pissed about reporters who don't give ALL the facts. If you mean summarizing, that is EXACTLY what this is supposed to do.
  
  "who's lying and who's telling the truth about a story"
  
  That's for the reader to decide. A reporter who makes judgements concerning what they are reporting and expresses their view of the subject is a bad one. At least in
  - Re:Do you know what reporters DO? (Score:3, Insightful)
    
    by DavidinAla ( 639952 ) writes:
    
    I'm sorry, but you're SO ignorant about the way the process works that I can't begin to correct all of your misunderstandings. If you really and truly believe that it's even possible to give readers ALL of the available information every single day, you're completely unaware of how much information is out there.
    
    Do you want to report what is on the menu at every restaurant in town every day? What about an attendance list of who made it to school at every school in town? What about the results of every medic
- - Re:Do you know what reporters DO? (Score:3)
    
    by DavidinAla ( 639952 ) writes:
    
    But even BAD journalism requires abilities that software just doesn't have. Software can't have sources and the ability to call them on the phone. Software doesn't have the ability to differentiate between good information and patently false stuff. A bad journalist might write a sensational or even purely false story, but even doing THAT requires abilities that software can't have in anything like a foreseeable future.
Pair of phrase (Score:2)

by Effugas ( 2378 ) writes:

HeySubcontinent's story linkage analyzes the automatic stegoplagarization of documents written in the language derived from Britain. Expected to displace at some point journalists, these hacks presently bash with the force of a small child. Good.
Someone must research a story . . . (Score:5, Interesting)

by kfg ( 145172 ) writes: on Friday December 26, 2003 @03:11PM (#7813812)

conduct interviews and generate original copy. These people are called reporters.

The people who take this copy off the wire and paraphrase it for publication in the local paper are called copy writers.

This software will reduce the number of copy writers needed, not reporters.

This is certainly an issue to the copy writers and their families, but overall it's really just a blue collar worker being replaced by a robot issue.

The idea of a 'style dial' I find a bit more disturbing.

KFG

- Re:Someone must research a story . . . (Score:3, Funny)
  
  by Andy_R ( 114137 ) writes:
  
  You get news that isn't just a bunch of paraphrased press releases?
  
  Man, I gotta find the preferences checkbox for that stuff!
- Re:Someone must research a story . . . (Score:2)
  
  by Florian Weimer ( 88405 ) writes:
  
  This software will reduce the number of copy writers needed, not reporters.
  
  Just license your content from The New York Times, and you can lay off both copy writers and reporters.
  
  (You can use Google [google.com] to watch how many online news sites republish this story.)
Generation isn't that easy (Score:5, Insightful)

by Ezubaric ( 464724 ) writes: on Friday December 26, 2003 @03:13PM (#7813820) Homepage

The poster incorrectly assumes that this could be used to replace reporters. The problem is that computers have a difficult time generating new text. The methods that computers use to evaluate text (as any user of grammar-check would realize) aren't that great.

In fact, most language models cannot generate even a large portion of English text. Those that do have good coverage rarely have good accuracy, because there are many things that we "just don't say that way." This is why, when you're talking to a non-native speaker, you often cannot explain why something they said was wrong: there is no real grammar rule against saying it that way.

So if we rule out syntax-based models, that just leaves statistical models. I worked in an NLP lab during the summer of 2002, and my prof there said that syntax and statistics are like the two sides of the Force. Statistics are quick and easy, but they are seductive. They corrupt you and leave you unable to really think about the language itself. You only think in terms of bigrams and HMMs.
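To make "thinking in terms of bigrams" concrete, here is a minimal sketch of a bigram language model in Python. It uses plain maximum-likelihood counts with no smoothing (an assumption made for brevity; practical systems smooth their estimates), and the tiny parrot corpus in the usage note is invented for illustration. Any word pair that never appears in training drives a sentence's probability to zero, which is one reason statistical models keep demanding larger corpora.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count which word follows which, over whitespace-tokenized sentences.

    Each sentence is padded with <s> and </s> markers so the model also
    learns which words tend to start and end a sentence.
    """
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            bigrams[prev][curr] += 1
    return bigrams


def bigram_prob(bigrams, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 for unseen contexts."""
    followers = bigrams.get(prev, Counter())
    total = sum(followers.values())
    return followers[curr] / total if total else 0.0


def sentence_prob(bigrams, sentence):
    """Probability of a whole sentence as the product of its bigram probabilities."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= bigram_prob(bigrams, prev, curr)
    return prob
```

For example, trained on "the parrot is dead", "the parrot is no more" and "the parrot has expired", the model gives "the parrot is dead" a probability of 1/3, but "the parrot is expired" gets 0, because "is expired" never occurs in the training data.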

So even though these systems are doing well, they are mostly statistical. Thus, it's hard to get incremental improvement. You have to have larger corpora, and larger corpora usually have more errors, thus defeating any advantage you might get by capturing more aspects of a language.

In my opinion, only with well-developed language models that can effectively generate NL can we get anywhere. Which is what Barzilay is working on, but it's still a long, long, long way off.

The article, summarized by MacOS X (Score:5, Interesting)

by sakusha ( 441986 ) writes: on Friday December 26, 2003 @03:19PM (#7813856)

MacOS X has a summarization feature implemented in the Services menu. I decided to summarize the CNet article just to see what I got, and because I like the idea of summarizing an article about summarizing.

In the famous sketch from the TV show "Monty Python's Flying Circus," the actor John Cleese had many ways of saying a parrot was dead, among them, "This parrot is no more," "He's expired and gone to meet his maker," and "His metabolic processes are now history."

...The program gathers text from online news services on specific subjects, learns the characteristic patterns of sentences in these groupings and then uses those patterns to create new sentences that give equivalent information in different words.
The researchers, Regina Barzilay, an assistant professor in the department of electrical engineering and computer science at the Massachusetts Institute of Technology, and Lillian Lee, an associate professor of computer science at Cornell University, said that while the program would not yield paraphrases as zany as those in the Monty Python sketch, it is fairly adept at rewording the flat cadences of news service prose.

- Re:The article, summarized by MacOS X (Score:2)
  
  by 2nd Post! ( 213333 ) writes:
  
  I tried this on the last article that talked about summarizing. It's slightly less relevant to this article, because it talks about using multiple sources to cross-reference, correlate, and paraphrase rather than actually summarizing.
  - Re:The article, summarized by MacOS X (Score:2)
    
    by sakusha ( 441986 ) writes:
    
    I cut it down to only 3 paragraphs, the shortest version that contained the lead paragraph. The longer summaries did contain that content.
Hardly news... (Score:4, Informative)

by JayJay.br ( 206867 ) writes: <100jayto&gmail,com> on Friday December 26, 2003 @03:22PM (#7813867)

This article [slashdot.org] posted before already tells us all this, the paper that originated it [mit.edu] was mentioned in the comments, and this one is another of a series of papers by this researcher [mit.edu].

OK, nothing else to see here, move on to the next redundant post (Is that paraphrasing 'dupe'?)

well... (Score:2)

by ducomputergeek ( 595742 ) writes:

...that explains cowboyneal
Obligatory link (Score:2)

by Florian Weimer ( 88405 ) writes:

For your convenience, here's the link to the original article that requires registration [nytimes.com].
Bring on the Machines (Score:5, Interesting)

by DumbSwede ( 521261 ) writes: <slashdotbin@hotmail.com> on Friday December 26, 2003 @03:39PM (#7813936) Homepage Journal

I don't think many people read the article. While Michael suggests this could replace reporters, it is not about summarizing a whole article, but merely about paraphrasing individual sentences and elements. This would be useful for checking for plagiarism, where one author has merely paraphrased another line by line. Another useful area is language translation, where paraphrasing may make the translation more understandable. I don't think today's translation programs let you say the same thing two or three times, repeating it back differently (a paraphrase) if your listener didn't understand it the first time.
Of course the time will come when machines summarize articles, and I believe I have seen this tried already with mixed success. It would be kind of neat to see /. run both a summary engine and a paraphrase engine on submitted articles. Then we could have three article descriptions: the poster's description, a machine summary of the same article, and a machine paraphrase of the original poster's summary.

Share
twitter facebook
Paraphrase (Score:2, Insightful)

by JediDan ( 214076 ) writes:

Would be nice to be able to summarize + paraphrase large articles and documents. Not all of us have the necessary time to read 20+ page documents.

It won't replace original works, but it could help reduce a lot of extraneous data on the web :)
- Re:Paraphrase (Score:2)
  
  by arose ( 644256 ) writes:
  
  Open Text Summarizer [sourceforge.net] may be your friend.
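Tools like this and the Mac OS X Summarize service are extractive: they pick existing sentences rather than writing new ones. How those particular tools score sentences isn't described in this thread, so the following is only a minimal sketch of one simple, common approach: rank each sentence by the average document-wide frequency of its non-stopword words and keep the top few in their original order. The stopword list and the naive sentence splitter are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use much larger ones).
STOPWORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "is", "it",
             "of", "on", "that", "the", "to", "with"}


def tokenize(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order.

    A sentence's score is the mean document-wide frequency of its
    non-stopword words, so long sentences are not automatically favoured.
    """
    sentences = split_sentences(text)
    freq = Counter(w for w in tokenize(text) if w not in STOPWORDS)

    def score(sentence):
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Because it only reorders and drops existing sentences, this kind of summarizer never produces new wording, which is exactly the gap the paraphrasing research is trying to close.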
Typical /. story.. maybe they need the engine? (Score:5, Insightful)

by mattr ( 78516 ) writes: on Friday December 26, 2003 @03:48PM (#7814000) Homepage Journal

Slashdot needs to implement another new editorial policy: if you have nothing intelligent or really funny/biting to say, don't! An interesting topic with another half-assed presentation.
Obviously this is a developing field. The best models seem to use phrases from the original text anyway, and the Mac OS X example above shows that it is useful to users willing to take it with a massive grain of salt, even if we are not at full computational sentience yet.
When it works even a little better it will replace all those awful grade school teachers who assign paraphrasing as a homework assignment. The reporters who might have been replaced by it will have already lost their jobs, except for the ones in AhaIndia of course who will paraphrase for the rest of us, usually at a marginally better level than the machine.
The research is interesting, and I'd like to understand Barzilay's notation (is that APL or a calculus of statements?) in the paper (pdf) [jhu.edu] I found on Google. Also see the papers on her site [mit.edu].
Of course structured text is easier, and news stories are known to have most of the meat in the beginning, but this is great stuff.
One interesting older system is ThoughtTreasure [signiform.com] which was built to understand a story and answer questions about it. The author also did work on news analysis ("NewsForms") too. There are tools out there, I've been making a survey myself too. If anyone has information about practical NLP tools for real world tasks please post.

It's unlikely to catch on... (Score:3, Insightful)

by ChunKing ( 513714 ) writes: on Friday December 26, 2003 @03:56PM (#7814023)

The main problem is that languages, especially English, are so idiomatic that mechanical translators will be at too much of a disadvantage; take the Babelfish [altavista.com] translator, for instance.

Furthermore, the English language is so flexible that just about any word can arbitrarily substitute for anything else - for instance, take 'bad' meaning 'good'.

It would be impossible to program a machine to be able to understand the full spectrum of idiomatic phrases but the future may lie in employing neural net technologies so that computers can do some limited learning. [cornell.edu]

Columbia News Blaster (Score:3, Informative)

by Richard Allen ( 213475 ) writes: on Friday December 26, 2003 @04:04PM (#7814077)

I believe this was covered before in a related Slashdot story about this site: http://www1.cs.columbia.edu/nlp/newsblaster/

Here is a quote from their site:
Columbia Newsblaster is a system to automatically track the day's news. There are no human editors involved -- everything you see on the main page is generated automatically, drawing on the sources listed on the left side of the screen.

Every night, the system crawls a series of Web sites, downloads articles, groups them together into "clusters" about the same topic, and summarizes each cluster. The end result is a Web page that gives you a sense of what the major stories of the day are, so you don't have to visit the pages of dozens of publications.

Newsblaster is an academic project from the Natural Language Processing group at Columbia University's Department of Computer Science. It is designed to demonstrate the Group's technologies for multidocument summarization, clustering, and text categorization, among others. It is funded under DARPA TIDES and KDD and has been operational online since September 2001.

Current and future enhancements include international perspectives, multilingual capability, and tracking events across days.
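Newsblaster's description mentions grouping articles into "clusters" about the same topic. The project's actual method isn't given here, so the following is only a minimal sketch of one simple technique, single-pass clustering on bag-of-words cosine similarity. The threshold value and the sample headlines in the usage note are assumptions for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (lowercased, punctuation ignored)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def cluster(articles, threshold=0.3):
    """Single-pass clustering of article texts.

    Each article joins the existing cluster whose summed term vector is most
    similar to it, provided that similarity reaches `threshold`; otherwise
    it starts a new cluster. Returns clusters as lists of article indices.
    """
    clusters = []  # list of (member indices, summed term-count vector)
    for i, text in enumerate(articles):
        vec = vectorize(text)
        best, best_sim = None, 0.0
        for c, (_, centroid) in enumerate(clusters):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            members, centroid = clusters[best]
            members.append(i)
            centroid.update(vec)
        else:
            clusters.append(([i], Counter(vec)))
    return [members for members, _ in clusters]
```

With the made-up headlines "parrot dead pet shop complaint", "pet shop parrot complaint dead parrot", "stock market falls sharply" and "market stock prices fall sharply today", this sketch groups the first two and the last two together. Single-pass clustering is order-dependent, which is one reason real systems often use more careful methods.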

- Re:Columbia News Blaster (Score:2)
  
  by Frisky070802 ( 591229 ) * writes:
  
  Yeah, I saw a demo of this, and was pretty impressed, though eventually I decided that paraphrasing wasn't nearly as interesting as simply identifying the big news... so I went back to Google News. Now I just read CNN, the New York Times (print), and Slashdot, and I figure that between the three, everything's covered.
Creativity (Score:2)

by dacarr ( 562277 ) writes:

Silliness aside about an Apple 2 being able to gather the news for us and feed it, the thing about wordsmithery is that there is a certain amount of creativity that needs to go into it. Otherwise you have the literary equivalent of the Backstreet Boys and such. Not a good mix.
I'm a sexist, so what? (Score:3, Funny)

by Lord Kano ( 13027 ) writes: on Friday December 26, 2003 @04:24PM (#7814183) Homepage Journal

From the article:
The researchers, Regina Barzilay, an assistant professor in the department of electrical engineering and computer science at the Massachusetts Institute of Technology, and Lillian Lee, an associate professor of computer science at Cornell University, said that while the program would not yield paraphrases as zany as those in the Monty Python sketch, it is fairly adept at rewording the flat cadences of news service prose.

Two women came up with this! Why doesn't it surprise me in the least that women are officially researching ways to automate the process of saying the exact same thing in an infinite number of different ways?

LK

Hollywood (Score:2)

by wampus ( 1932 ) writes:

Now, correct me if I am wrong, but hasn't Hollywood been using this system for some time now? If a movie isn't a direct rip of something that was made in the past, then it takes familiar characters and tosses them in a blender with a dash of CG effects and frappes until smooth.

Television uses this system, too. The formula there seems to also involve borrowing a successful British TV show's concept, just to keep things a little fresher.
Old news! (Score:2)

by Alsee ( 515537 ) writes:

Slashdot has been using this system to generate its articles for a while now. Obviously it's still loaded with bugs.

-
Wake me up when... (Score:3, Funny)

by Megane ( 129182 ) writes: on Friday December 26, 2003 @10:35PM (#7815523)

...when we can replace upper level management with small shell scripts.

*WHEW* - Still in infancy (Score:2)

by ToadMan8 ( 521480 ) writes:

Boy, I'm glad that computers don't have their (hands?) in reporting news; it'd be terrible to get rid of all that slant in the media this way and that. I mean who wants fair, equitable stories?! You read the NYT to ra ra for the Bleeding Heart shit, or if you're a heartless republican the Journal is for you. Now how would they sell if they just told the facts as they were and left interpretation up to the readers?!

Well, at least Slashdot will always be biased, thank god for that.
I used the OS X Summarize function... (Score:2)

by constantnormal ( 512494 ) writes:

... (in the Services menu) to summarize the referenced article:
The program gathers text from online news services on specific subjects, learns the characteristic patterns of sentences in these groupings and then uses those patterns to create new sentences that give equivalent information in different words.
The researchers, Regina Barzilay, an assistant professor in the department of electrical engineering and computer science at the Massachusetts Institute of Technology, and Lillian Lee, an associate prof
- Re:Nice ad... (Score:1)
  
  by d3faultus3r ( 668799 ) writes:
  
  What about the Despair calendar motivation: if a pretty picture and a cute saying are all it takes to motivate you, you probably have an easy job, the kind robots will be doing soon. Is it really any surprise? After all, that T-shirt saying was probably invented by someone who reads /.
- Re:Nice ad... (Score:1)
  
  by danidude ( 672839 ) writes:
  
  Well, after all, they (/. and ThinkGeek) are owned by the same group [thinkgeek.com]:
  
  "A month or so later we were Slashdotted. And promptly thereafter ThinkGeek was acquired by the good folks at Andover.Net who have since been acquired by the great folks at VA Software. Andover.Net then became OSDN which is the central entry point for the Open Source community's favorite web sites such as ThinkGeek (hey that's us!), slashdot.org, linux.com, sourceforge.net, and freshmeat.net. Pretty nice company to be amongst, eh? We're
- Re:DUPE (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  It's not a dupe; it's a computer generated paraphrasing of the earlier story.