Words That Speak a Thousand Pictures
venolius writes: "The New York Times (free registration required) has an article
on TextArc (created by W. Bradford
Paley), a site that "aids in the
discovery of patterns and concepts in arbitrary text" (from the detailed
overview at TextArc). The site serves an applet that performs the task
(texts on which analysis is available include Alice
in Wonderland, Hamlet, and thousands
of others -made available by Project
Gutenberg-). The NYTimes article reports that Paley found that
"Dracula", which relies on a strong storyline, had a few keywords
clustered hotly at the center, and that the metaphoric "Frankenstein"
generated a circle of 50 words of modest intensity that faded towards the edges.
"Portrait of the Artist as a Young Man", with evenly distributed key
words, produces tight and round lines, and "Alice in Wonderland"
produces loopier lines. Check it out! (the applet was tested on better
hardware, but I did well enough with 98/IE6/550MHz/64MB)"
Free reg. (Score:2, Informative)
archive.nytimes to access the article without registration.
/. interview with Michael Hart (Score:2, Informative)
Nupedia and Project Gutenberg Directors Answer [slashdot.org] - a
The Emperor Is Naked! (Score:1, Informative)
Blind and copyright books (Score:1, Informative)
Re:Please... (Score:2, Informative)
Thanks for all the discussion!
Here are some notes from the perpetrator (Brad)...
>by morhoj on Tuesday April 16, @07:29AM (#3349188)
>Don't ever do that to my browser again...
Valuable feedback; perhaps more gracefully put by
>by Paradise Pete on Tuesday April 16, @09:34AM (#3349613)
>I think his complaint was that it did it unexpectedly.
I have put in a warning about the screen takeover; others say there are ample warnings about the research & speed issues, so I left that alone. I agree that
Hamlet.html and Thousands.html, where the warnings are, rather than directly to the page that opens the applets. Can this be changed now, so others don't have morhoj's problem?
---
>by reo_kingu on Tuesday April 16, @07:33AM (#3349201)
>is this really new? I think maybe some of my teachers have
>been using this thing to grade papers.
Don't know if it's new, but I haven't seen it before.
>by big.ears on Tuesday April 16, @10:41AM (#3350110)
>...factor analysis on text. It maps every word in a text into about a
>100-dimensional space, based on how often they co-occur in similar
>contexts. If you feed those factors into a clustering algorithm or a
>multi-dimensional scaler in order to present it graphically, you probably
>get something very close to this trick.
Flattering, but I was trying to come up with something easier to write and explain. This trick uses arithmetic (each word is drawn at its average position) not math. Net pull of a bunch of rubberbands is easier to explain _and_ conceptualize for a lot of my audience.
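For readers who prefer code to rubber bands, here is a minimal sketch of that averaging. It is not the applet's code: it assumes the text is laid out evenly around a circle (the real TextArc draws an ellipse), and the names arc_positions and average_positions are made up for illustration.

```python
import math
from collections import defaultdict


def arc_positions(n, radius=1.0):
    """Place n tokens evenly around a circle, starting at the top, clockwise."""
    points = []
    for i in range(n):
        theta = 2 * math.pi * i / n
        points.append((radius * math.sin(theta), radius * math.cos(theta)))
    return points


def average_positions(tokens):
    """Return each distinct word's mean (x, y) over all of its occurrences."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # word -> [sum_x, sum_y, count]
    for word, (x, y) in zip(tokens, arc_positions(len(tokens))):
        entry = sums[word]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {w: (sx / n, sy / n) for w, (sx, sy, n) in sums.items()}


if __name__ == "__main__":
    text = "the cat saw the dog and the dog saw the cat".split()
    for word, (x, y) in sorted(average_positions(text).items()):
        print(f"{word:>4}  x={x:+.2f}  y={y:+.2f}")
```

Under these assumptions, a word used evenly throughout the text is pulled toward the center, while a word used in only one passage stays near that passage on the rim.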
---
>by proxybyproxy on Tuesday April 16, @07:56AM (#3349250)
>Once again Project Gutenberg shows its beautiful face.
Hear, hear! Inspiring and generous work.
---
>Just ran Slashdot through it (Score:2, Funny)
>by Anonymous Coward on Tuesday April 16, @08:43AM (#3349375)
;)
---
>by TheCrunch on Tuesday April 16, @09:28AM (#3349577)
>But a word
>of warning to anyone else running Win98 on a P133 with 64MB RAM.
>This thing nuts your machine. I can't get it off my desktop. I'm gonna
>have to reboot again.. arg.
Sorry... That warning's now on the intro pages to each applet.
---
>The Emperor Is Naked! (Score:1, Informative)
>by robbway on Tuesday April 16, @09:47AM (#3349706)
>I have to say it: I see no value in this. The mathematical algorithms do
>more to shape the images than the words themselves. My opinion is
>that this is rather unartistic, uninspiring, and doesn't reveal anything
>about language at all.
A damning observation, if it were true. I also have little respect for artsy code that doesn't express the variability in the data. In fact, the only "algorithm" here is the averaging, so any variation _must_ come from the language. The images initially look similar, but so do leaves to people who don't get into the country a lot. For some people, developing a feel for how different texts reveal themselves here might be worth the time. But I expect that will take more than a few minutes.
As to unartistic--I'll weigh your opinion with Larry at the Whitney, Bruce at Columbia, Matt at the Times, Sara at Banff, and a few dozen others as I decide whether it's art. (I made it as an index/concordance).
I agree that it doesn't say anything about language, but leaves don't say anything about biology. _You_ gotta provide the intelligence.
Actually, it was built to tap into the human brain's pre-attentive processing abilities. (Oh no, do I need to provide a warning now that it'll take over your brain as well as your desktop?)
in a TextArc it wasn't jumping randomly, but to the next most "important" word, where "importance" is some function of brightness (frequency), position (distribution), and recency of concept activation,
or level of interest (in your own head). It seems to work especially well in the 32" x 20" printed versions. Different people read different things.
---
>Wishing I could see an example... (Score:1)
>by BobTheJanitor on Tuesday April 16, @10:33AM (#3350032)
Some screen shots are on the site, lower right button. (Guess I should make it more prominent.) http://textarc.org/Stills.html
---
>Dark grey text on black background? (Score:1)
>by an_mo on Tuesday April 16, @11:20AM (#3350462)
>If textarc.org [textarc.org] continues to publish their stuff
>with dark grey text on a black background they're not
>reaching for the masses.
Oops. Fixed, I think. (Do you?)