Words That Speak a Thousand Pictures
venolius writes: "The New York Times (free registration required) has an article
on TextArc (created by W. Bradford Paley), a site that "aids in the
discovery of patterns and concepts in arbitrary text" (from the detailed
overview at TextArc). The site serves an applet that performs the task
(texts on which analysis is available include Alice in Wonderland, Hamlet,
and thousands of others made available by Project Gutenberg).
The NYTimes article reports that Paley found that
"Dracula", which relies on a strong storyline, had a few keywords
clustered hotly at the center, and that the metaphoric "Frankenstein"
generated a circle of 50 words of modest intensity that faded towards the edges.
"Portrait of the Artist as a Young Man", with evenly distributed key
words, produces tight and round lines, and "Alice in Wonderland"
produces loopier lines. Check it out! (The applet was tested on better
hardware, but I did well enough with 98/IE6/550MHz/64MB.)"
Please... (Score:1)
You would expect... (Score:1)
guess not...
Re:You would expect... (Score:1)
However, when I loaded the Bible, it chewed up just about all of the available memory. The machine started choking a little and swapping to the hard disk, something I haven't heard it do in a *long* time.
See this [frontiernet.net].
Re:Please... (Score:2, Funny)
Re:Please... (Score:2)
Maybe you're using a crappy browser [microsoft.com]?
Re:Please... (Score:2, Funny)
Heh
Re:Please... (Score:5, Funny)
Point: Somehow if it uses *your* CPU it's different, but when Google's machines do all the work it's somehow OK. Next you'll be complaining about websites doing a DoS by forcing your browser to use CPU by rendering HTML and your TCP stack having to store a sliding window whenever you view a webpage. This selfish attitude is why all filesharing software must be redesigned to NOT allow anyone to kick uploaders. If you want to kick uploaders then shut down the filesharing App, but then you lose download karma.
My personal opinion: Due to the heavy-client structure of the majority of machines, this applet uses the correct CPU (yours). Reduced ad revenues mean that Internet companies can no longer afford massive server farms unless they require Cydoor installed on all client machines; alternatively, they can decrease their equipment costs by delegating processing to clients' (surfers' browsers') CPUs. *nix systems, especially if apps are in a Java sandbox, are heavily protected against most attacks, including process DoS attacks, via kill -4, kill -9 and a scheduler+VM designed to stand up under heavy loads. Poor Microshaft Win9x people have to do Ctrl-Alt-Del, which halts all processes while they look at the scheduler's contents, and it takes 30 seconds of a successful DoS for the offending process to be recognised as "unresponsive and kill -9'able"; otherwise it is merely "kill -4'able". If you don't want them to use your CPU, then don't visit their website, disable JVM, disable HTML parsing in the browser (this takes CPU), read the raw HTML yourself, disable TCP stack, write packets by hand to minimise CPU usage or even bypass the CPU completely by setting the modem's line inputs yourself with a logic probe, then probe for response using oscilloscope and logic probe. Do PPP-IP-TCP by hand. That won't use your CPU.
Pop quiz hotshot, this is an Informative flamebait, what mod do you do, what mod do you do?
Re:Please... (Score:1)
Right On! (Score:1)
I love Linux & have used it as my server OS of choice for 5 years now. But to this day, I can't deal with it on the desktop. This applet worked just fine for me on Win2k with JDK1.4 and Opera 6.1 as my browser.
By the way, the applet is really cool - I love this concept!
Re:Please... (Score:2)
I'm fairly certain that Google is inviting people to use their machines. If the original poster was doing something like SETI@home, and got mad because his CPU was bogged down, then yes, that is silly. But as that is not the case, and the behavior of that site was not in line with most (considerate) websites, it is reasonable that the poster would be annoyed.
I mean, it's like if someone tried to borrow $100 from you, and you got annoyed, and they said "oh, so when banks lend out money it's supposed to be okay, but if I want to borrow money from you, suddenly *that* isn't ok." Darn right it isn't!
If you say that the person should just not visit the site... well, I guess we can defend any crappy website that throws etiquette to the wind with that line of thought (not that this site was that bad).
mark
Java... 'nuff said (Score:2, Insightful)
How???? He had to go to the site and then go to the prefs in his browser to turn on Java and then click on the link that said it was going to analyze the entire text of some long book and make pretty pictures out of it...in Java. (and if he didn't have to turn on Java, then he's probably due for some more disappointment in the future) What alternative does the site have to make their research available to others? Should they have just put up this note?
We are doing some cool research, and we've
developed this really cool tool that we'd
love to let you play with, but we're worried
that some individuals may have unreasonable
expectations of how powerful their machines
are and we don't want to burst their bubbles,
so instead, we'll just keep it to ourselves.
That's just silly. I mean, the system recommendation contains the following:
Sounds like a good enough warning to me that if you're using a 486 with 32MB of RAM over a dialup, that, perhaps, you don't want to try running it.
IMHO,
Michael
Re:Java... 'nuff said (Score:1)
Sorry about that, I was talking more in general, and I didn't mean that this particular site necessarily fell into the "annoying" category.
I was thinking of other sites that just assume things, like that you want to go full screen or something like that.
mark
Re:Please... (Score:1)
1. Start flamewar on /. - equivalent to road rage
2. Ignore it, go home and bitch about it to your spouse/friends - normal reaction
What are they supposed to say anyway? "This java applet is going to use your CPU, please brace yourself". Quite a redundant dialogue box. I remember people on /. getting real unhappy at all these modal dialogue boxes in apps. I'm no expert, but "This app is now initialising to prepare for executing main(), please be advised your CPU is being used" modal dialogue is something we will all cuss, plus if it's in Swing it'll probably take 100 times more CPU than any sensible app anyway.
10 seconds into execution it could maybe show a quick status message that it's doing something complicated; after all, many JVMs seem to use 100% CPU for "Hello World" anyway, so I doubt anyone will notice the difference. I suggest you simply file a bug report with the app authors and/or the JVM authors.
Re:Please... (Score:2)
And then I was more arguing the principle of the matter in terms of doing unexpected things, which many other sites do. The post I was replying to suggested that if you are making their computer do things, don't complain if they do things to your computer. And I was pointing out how it's different.
I wasn't trying to say that this site here was actually so terrible, because it wasn't that bad.
mark
Re:Please... (Score:1)
Re:Please... (Score:2)
probably not properly load tested or something.
well we just took care of that.
Re:Please... (Score:1)
Re:Please... (Score:2)
Ran fine in JRE 1.4 on my Opera 6.01 at least, though I kinda failed to see the point of it.
Re:Please... (Score:1)
Current Statistics:
Logged in FPs: 4
AC FPs: 0
First Posters:
1 - morhoj
1 - Spanko
1 - teambpsi
1 - Tensor
Just fine on Konqueror (Score:1)
And that's not even Konqi 3...
Konqueror 2.2.2, Java 1.3.1, Linux.
Re:Please... (Score:2, Informative)
Thanks for all the discussion!
Here are some notes from the perpetrator (Brad)...
>by morhoj on Tuesday April 16, @07:29AM (#3349188)
>Don't ever do that to my browser again...
Valuable feedback; perhaps more gracefully put by
>by Paradise Pete on Tuesday April 16, @09:34AM (#3349613)
>I think his complaint was that it did it unexpectedly.
I have put in a warning about the screen takeover; Others say there are ample warnings about the research & speed issues, so I left that alone. I agree that
Hamlet.html and Thousands.html, where the warnings are, rather than directly to the page that opens the applets. Can this be changed now, so others don't have morhoj's problem?
---
>by reo_kingu on Tuesday April 16, @07:33AM (#3349201)
>is this really new? I think maybe some of my teachers have
>been using this thing to grade papers.
Don't know if it's new, but I haven't seen it before.
>by big.ears on Tuesday April 16, @10:41AM (#3350110)
>...factor analysis on text. It maps every word in a text into about a
>100-dimensional space, based on how often they co-occur in similar
>contexts. If you feed those factors into a clustering algorithm or a
>multi-dimensional scaler in order to present it graphically, you probably
>get something very close to this trick.
Flattering, but I was trying to come up with something easier to write and explain. This trick uses arithmetic (each word is drawn at its average position) not math. Net pull of a bunch of rubberbands is easier to explain _and_ conceptualize for a lot of my audience.
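A minimal sketch of that averaging idea, not the applet's actual code. It assumes the text is laid out word by word around a unit circle (an illustrative choice) and uses a hypothetical helper name, `average_positions`; each distinct word is then placed at the mean of the points where it occurs, which is where the "rubber bands" would pull it.

```python
import math
import re
from collections import defaultdict


def average_positions(text):
    """Place each distinct word at the mean of its occurrence points.

    Assumption: the i-th word of the text sits on a unit circle at angle
    2*pi*i/total. The real layout may use a different curve or unit.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # word -> [sum_x, sum_y, count]
    for i, word in enumerate(words):
        angle = 2 * math.pi * i / total  # where this occurrence sits on the rim
        entry = sums[word]
        entry[0] += math.cos(angle)
        entry[1] += math.sin(angle)
        entry[2] += 1
    # Mean position plus occurrence count (the count could drive brightness).
    return {w: (sx / n, sy / n, n) for w, (sx, sy, n) in sums.items()}


layout = average_positions("the cat sat on the mat")
```

Under these assumptions a word used once stays on the rim next to its only occurrence, while a word spread evenly through the text is pulled toward the center. That is arithmetic only; no clustering or factor analysis is involved.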
---
>by proxybyproxy on Tuesday April 16, @07:56AM (#3349250)
>Once again Project Gutenberg shows its beautiful face.
Hear, hear! Inspiring and generous work.
---
>Just ran Slashdot through it (Score:2, Funny)
>by Anonymous Coward on Tuesday April 16, @08:43AM (#3349375)
;)
---
>by TheCrunch on Tuesday April 16, @09:28AM (#3349577)
>(User #179188 Info | http://www.slippersandpipe.co.uk/) But a word
>of warning to anyone else running Win98 on a P133 with 64MB RAM.
>This thing nuts your machine. I can't get it off my desktop. I'm gonna
>have to reboot again.. arg.
Sorry... That warning's now on the intro pages to each applet
---
>The Emperor Is Naked! (Score:1, Informative)
>by robbway on Tuesday April 16, @09:47AM (#3349706)
>I have to say it: I see no value in this. The mathematical algorithms do
>more to shape the images than the words themselves. My opinion is
>that this is rather unartistic, uninspiring, and doesn't reveal anything
>about language at all.
A damning observation, if it were true. I also have little respect for artsy code that doesn't express the variability in the data. In fact, the only "algorithm" here is the averaging, so any variation _must_ come from the language. They initially look similar, but so do leaves to people who don't get into the country a lot. For some people developing a feel for how different texts reveal themselves here might be worth the time. But I expect that will take more than a few minutes.
As to unartistic--I'll weigh your opinion with Larry at the Whitney, Bruce at Columbia, Matt at the Times, Sara at Banff, and a few dozen others as I decide whether it's art. (I made it as an index/concordance.)
I agree that it doesn't say anything about language, but leaves don't say anything about biology. _You_ gotta provide the intelligence.
Actually, it was built to tap into the human brain's pre-attentive processing abilities. (Oh no, do I need to provide a warning now that it'll take over your brain as well as your desktop?)
in a TextArc it wasn't jumping randomly, but to the next most "important" word, where "importance" is some function of brightness (frequency), position (distribution), and recency of concept activation,
or level of interest (in your own head). It seems to work especially well in the 32" x 20" printed versions. Different people read different things.
---
>Wishing I could see an example... (Score:1)
>by BobTheJanitor on Tuesday April 16, @10:33AM (#3350032)
Some screen shots are on the site, lower right button. (Guess I should make it more prominent.) http://textarc.org/Stills.html
---
>Dark grey text on black background? (Score:1)
>by an_mo on Tuesday April 16, @11:20AM (#3350462)
>If textarc.org [textarc.org] continues to publish their stuff
>with dark grey text on a black background they're not
>reaching for the masses.
Oops. Fixed, I think. (Do you?)
You know.. (Score:1)
in my school.. (Score:1)
I heard a rumour that grades were assigned by how close the teacher got to the target while holding the paper in her hand in a game of "pin the tail on the donkey"
Re:You know.. (Score:3, Interesting)
Darn... (Score:1)
Let's try it on certain newsgroups postings (Score:3, Funny)
Analise this (Score:2, Funny)
"Somebody set up us the bomb"
Is there any pattern there?
Re:WTF? (Score:1)
Other tools for exploring the Semantic Web... (Score:4, Interesting)
map [lexfn.com] connections between two words, concepts, or famous names
see [rhymezone.com] a word's rhymes, synonyms, definitions
and I leave the rest to you.
Re:Other tools for exploring the Semantic Web... (Score:2)
While it doesn't paint pretty pictures, it shows some interesting results when pointed at a month of e-mail composed mostly by me. Given much more than a month, or mail from too many people, word use tends to normalize and the results seem very random.
Re:Other tools for exploring the Semantic Web... (Score:1)
Re:Other tools for exploring the Semantic Web... (Score:1)
Re:Other tools for exploring the Semantic Web... (Score:1)
Free reg. (Score:2, Informative)
archive.nytimes to access the article without registration.
Re:Free reg. (Score:1)
Re:Free reg. (Score:1)
-Fantastic Lad
Random New York Times Registration Generator (Score:2)
Dept. of Redundancy Department (Score:2)
Brought to you by the Associated Federation of Organizations.
Gutenberg (Score:5, Interesting)
Anyway, Hart is a big supporter of sensible copyrights (read the feature) and if you can spare the time, help him by digitizing your favourite book.
/. interview with Michael Hart (Score:2, Informative)
Nupedia and Project Gutenberg Directors Answer [slashdot.org] - a
Blind and copyright books (Score:1, Informative)
This is pretty interesting (Score:3, Insightful)
I also imagine that a college professor might be interested to run this against term papers!
Re:This is pretty interesting (Score:1)
This was exactly what I was thinking. I'm studying an authorship problem of a Sanskrit text, and would like to try this on the work and other works allegedly from the same author.
By the way, the applet rendered properly on my iBook with OmniWeb. Didn't need Windows or Pentium III or 256MB RAM.
Just so you know (Score:2, Interesting)
Netscape 4.79 on SGI IRIX (Score:2)
Got the latest versions from here:
http://www.sgi.com/products/evaluation/
Zipping thru some CS Lewis right now. Very, very cool!
[snazzy sig here]
/me (Score:1)
nice try. but staring at white noise on my TV is more fun.
or listening to "cat
Market trends. (Score:3, Interesting)
Would make a really cool screen saver if it were in C and not Java. Any volunteers?
But now I must put on my "think like a corp." hat.
Some publisher goes out and maps all the great books and compares them with current best sellers. They correlate the patterns and then decide that Format X creates the best sellers that people buy. Now they refuse to print any book that does not fit their demographic of what they consider to be the next best seller.
It's only a matter of time before these kinds of things are used like a DNA test to see whether a book has good "genes" or bad "genes".
I know it sounds like a conspiracy, but I have seen corps. do stranger things in attempting to repeat past successes. Just look at the movies. We are about to release Star Wars -2 in the name of working on a tried and true formula that started with the release of Jaws II. Did anyone else catch the special on PBS (Frontline, I think) that talked about how Jaws was the birth of the end of original movies as we knew it?
Just ran Slashdot through it (Score:2, Funny)
Word linkages (Score:3, Interesting)
And on top, a wonderful way of displaying it, to catch the eye so the brain has time to engage. 8)
=Blue(23)
TextArc (Score:2)
Rosetta Stone (Score:3, Interesting)
Could this be useful for source code watermarking? (Score:3, Interesting)
Very Interesting (Score:1)
Re:Very Interesting (Score:1)
The Emperor Is Naked! (Score:1, Informative)
Re:The Emperor Is Naked! (Score:1)
Amazing... (Score:3, Funny)
I like it!
Wishing I could see an example... (Score:1)
This reminds me.. (Score:1)
Unpossible! (Score:2, Funny)
Re:Unpossible! (Score:1)
Another Software Patent, I See (Score:2, Interesting)
While this is a delightful little entertainment, and quite fun to play with (though a bit of a hog while it's running, not to mention my difficulty in getting it to run in Mozilla on Win32), semantic networks have been around forever. Let's hope the patent application is meant to keep things like this in the public domain, rather than fencing in yet another area of the commons.
Dark grey text on black background? (Score:1)
I have to highlight the text in order to be able to read it.
Re:Dark grey text on black background? (Score:2)
erm. it's intentional that words fade away the longer it is since they were last used.
least, that's how i interpreted it.
The actual story (which appears along the bottom) is nicely highlighted as it goes along. If you can't read it, check your monitor contrast settings.
personally i think it was interesting, and would (as someone else mentioned) make a classy screen-saver. But I can't actually see a decent use for it.
~Cederic
W.Bradford Paley Bio (Score:2)
Semi-interesting javascripts - 1
Dates with real women in the last 6 months - 0
The Bible (Score:2)
Clicking on "God" linked to damn near everything. My screen lit up yellow like the sun. Well, I guess that's one book that knows its topic!
Unfortunately, the text is so large that it really didn't render very beautifully. It was really jumbled. It might be time to crank up to super-res...
-AP
TextArc analysis of Slashdot postings: (Score:2)
/--------\
|first post|
\--------/
grits
i love it (Score:1)
i do not know what this is exactly good for, but i do know that it is fscking beautiful!
carroll the old mathematician would have loved it too i believe.
A quantifiable measure added to English... (Score:1)