Graphics Software

Words That Speak a Thousand Pictures 102

venolius writes: "The New York Times (free registration required) has an article on TextArc (created by W. Bradford Paley), a site that "aids in the discovery of patterns and concepts in arbitrary text" (from the detailed overview at TextArc). The site serves an applet that performs the task (texts on which analysis is available include Alice in Wonderland, Hamlet, and thousands of others made available by Project Gutenberg). The NYTimes article reports that Paley found that "Dracula", which relies on a strong storyline, had a few keywords clustered hotly at the center, and that the metaphoric "Frankenstein" generated a circle of 50 words of modest intensity that faded towards the edges. "Portrait of the Artist as a Young Man", with evenly distributed key words, produces tight and round lines, and "Alice in Wonderland" produces loopier lines. Check it out! (The applet was tested on better hardware, but I did well enough with 98/IE6/550MHz/64MB.)"

  • by tessellation ( 133537 ) on Tuesday April 16, 2002 @07:46AM (#3349231)
    ...the one we already have, that is:

    map [lexfn.com] connections between two words, concepts, or famous names

    see [rhymezone.com] a word's rhymes, synonyms, definitions

    and I leave the rest to you.
  • Gutenberg (Score:5, Interesting)

    by proxybyproxy ( 561395 ) on Tuesday April 16, 2002 @07:56AM (#3349250)
    Once again Project Gutenberg shows its beautiful face. If you haven't heard about it before, then read a Wired feature here [wired.com]. Michael Hart started the project years ago and he wants to digitize anything which is out of copyright. The uses are infinite (think of the blind, who can feed texts to tactile printers, for example), as this story also shows.

    Anyway, Hart is a big supporter of sensible copyrights (read the feature) and if you can spare the time, help him by digitizing your favourite book.
  • Just so you know (Score:2, Interesting)

    by sielwolf ( 246764 ) on Tuesday April 16, 2002 @08:07AM (#3349279) Homepage Journal
    The Jargon File is out there and, oddly enough, it too looks pretty similar to the others described. I don't know whether that speaks highly of the JF or poorly about the rest of the work out there.
  • Market trends. (Score:3, Interesting)

    by Faux_Pseudo ( 141152 ) <Faux.Pseudo@gmail.cFREEBSDom minus bsd> on Tuesday April 16, 2002 @08:33AM (#3349344)
    This is very nice looking.
    Would make a really cool screen saver if it were in C and not Java. Any volunteers?
    But now I must put on my "think like a corp." hat.
    Some publisher goes out and maps all the great books and compares them with current best sellers. They correlate the patterns and then decide that Format X creates the best sellers that people buy. Now they refuse to print any book that does not fit their demographic of what they consider to be the next best seller.

    It's only a matter of time before these kinds of things are used like a DNA test to see whether a book has good "genes" or bad "genes".

    I know it sounds like a conspiracy, but I have seen corps do stranger things in attempting to repeat past successes. Just look at the movies. We are about to release Star Wars -2 in the name of working on a tried and true formula that started with the release of Jaws II. Did anyone else catch the special on PBS (Frontline, I think) that talked about how Jaws was the birth of the end of original movies as we knew them?
  • Word linkages (Score:3, Interesting)

    by Blue23 ( 197186 ) on Tuesday April 16, 2002 @08:55AM (#3349420) Homepage
    This is exceedingly fascinating. I've worked with word associations for computer authoring, mostly Markov chains of various lengths and phrase-structure stuff. While this takes works by human authors and works outward from them, there are some very interesting concepts in here which may be useful in the other direction.

    And on top, a wonderful way of displaying it, to catch the eye so the brain has time to engage. 8)

    =Blue(23)
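
For readers unfamiliar with the "Markov chains of various lengths" mentioned above, here is a minimal sketch of word-level Markov text generation. It is not the poster's code; the function names `build_chain` and `generate` and the `order` parameter (the chain "length") are assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, stopping early at a dead end."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed is reproducible
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # this state was only ever seen at the end of the text
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)
```

Feeding `build_chain` a Project Gutenberg text and calling `generate` yields locally plausible word sequences; a larger `order` copies longer runs of the source verbatim, while a smaller one drifts faster into nonsense.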
  • Rosetta Stone (Score:3, Interesting)

    by JJ ( 29711 ) on Tuesday April 16, 2002 @09:05AM (#3349464) Homepage Journal
    TextArc would certainly be a useful tool for analysis of undeciphered languages and texts. Ventris certainly could have used this for Linear B. The only big limitations would be requiring a suitably sized text and having a consistent meaning to that text. For instance, the Rosetta Stone probably was not a long enough text to analyse this way.
  • by iotk ( 547332 ) on Tuesday April 16, 2002 @09:09AM (#3349483)
    Does anyone besides me think that this kind of technique could provide stronger protection in cases of source code piracy such as GPL violations, theft of a codebase, etc.? By generating visual patterns based on the occurrence of keywords (or even compiled bytecodes), a signature of a codebase could be generated that is still recognizable even after comments have been stripped out or subtle changes introduced. This could be immensely valuable in GPL infringement cases.
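
One way to picture the keyword-signature idea above, minus the visualization, is a fingerprint built from keyword and operator n-grams. The sketch below is a swapped-in illustration, not anything TextArc does: it assumes the code under comparison is Python source, and the names `keyword_stream`, `fingerprint`, and `similarity` are hypothetical.

```python
import io
import keyword
import tokenize


def keyword_stream(source):
    """Keep only keywords and operators; drop comments, identifiers, and literals."""
    stream = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            stream.append(tok.string)
        elif tok.type == tokenize.OP:
            stream.append(tok.string)
    return stream


def fingerprint(source, n=3):
    """Set of all runs of n consecutive keyword/operator tokens."""
    stream = keyword_stream(source)
    return {tuple(stream[i:i + n]) for i in range(len(stream) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard overlap of two fingerprints, between 0.0 and 1.0."""
    fa, fb = fingerprint(a, n), fingerprint(b, n)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)
```

Because identifiers and comments never enter the stream, renaming variables or stripping comments leaves the score unchanged. Reordering or restructuring code does lower it, so a high score is at best a hint worth a closer look, not proof of copying.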
  • Re:You know.. (Score:3, Interesting)

    by big.ears ( 136789 ) on Tuesday April 16, 2002 @10:41AM (#3350110) Homepage
    You are probably thinking of LSA (Latent Semantic Analysis) [colorado.edu], which was 'invented' by former Bell Labs researcher (and current U of Colorado psych. prof) Tom Landauer. He uses it to grade his papers, and others probably do as well. It uses the same principle that some search engines (e.g., Excite) are based on, and essentially amounts to factor analysis on text. It maps every word in a text into about a 100-dimensional space, based on how often the words co-occur in similar contexts. If you feed those factors into a clustering algorithm or a multidimensional scaler in order to present them graphically, you probably get something very close to this trick.
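
The starting point for the approach described above can be sketched with raw term-by-context count vectors compared by cosine similarity. This is deliberately not full LSA: the factor-analysis step (a truncated singular value decomposition down to roughly 100 dimensions) is omitted, each context is assumed to be one sentence, and the names `term_context_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_context_vectors(contexts):
    """One sparse count vector per word: how often it appears in each context."""
    vectors = defaultdict(Counter)
    for idx, context in enumerate(contexts):
        for word in context.lower().split():
            vectors[word][idx] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

With raw counts, two words score above zero only if they share at least one context. The dimensionality reduction in real LSA is what lets words that never appear together, but keep similar company, end up close to each other.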
  • by tom's a-cold ( 253195 ) on Tuesday April 16, 2002 @11:11AM (#3350371) Homepage
    Notice the "patent pending" notice on the site.

    While this is a delightful little entertainment, and quite fun to play with (though a bit of a hog while it's running, not to mention my difficulty in getting it to run in Mozilla on Win32), semantic networks have been around forever. Let's hope the patent application is meant to keep things like this in the public domain, rather than fencing in yet another area of the commons.
