On Finding Semantic Web Documents
Anonymous Coward writes "A research group at the University of Maryland has published a blog post describing its latest approach to finding and indexing Semantic Web Documents. They published it in response to the view on the Semantic Web expressed by Peter Norvig, director of search quality at Google (Semantic Web Ontologies: What Works and What Doesn't): 'A friend of mine [from UMBC] just asked can I send him all the URLs on the web that have dot-RDF, dot-OWL, and a couple other extensions on them; he couldn't find them all. I looked, and it turns out there's only around 200,000 of them. That's about 0.005% of the web. We've got a ways to go.'"
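For illustration, here is a minimal Python sketch of the naive query described in the quote (URLs ending in dot-RDF or dot-OWL), plus a check of what Norvig's percentage implies. The extension tuple is an assumption, since the quote leaves the "couple other extensions" unnamed, and the example URLs are made up. Any RDF published at a URL without one of these extensions would slip past a filter like this.

```python
from urllib.parse import urlparse

# Extensions named in the quote; the "couple other extensions" are unspecified,
# so this tuple is an assumption and almost certainly incomplete.
SW_EXTENSIONS = (".rdf", ".owl")

def looks_like_sw_document(url: str) -> bool:
    """Naive test: does the URL path (ignoring any query string) end in a listed extension?"""
    path = urlparse(url).path.lower()
    return path.endswith(SW_EXTENSIONS)

def implied_web_size(found: int, share_percent: float) -> float:
    """Total page count implied if `found` documents are `share_percent` percent of the web."""
    return found / (share_percent / 100)

# Hypothetical example URLs, for illustration only.
urls = [
    "http://example.org/people/alice.rdf",
    "http://example.org/ontologies/wine.OWL",
    "http://example.org/index.html",
    "http://example.org/feed.rdf?format=full",
]
print([u for u in urls if looks_like_sw_document(u)])
print(f"{implied_web_size(200_000, 0.005):,.0f}")
```

Inverting Norvig's numbers, 200,000 documents at 0.005% corresponds to a web of roughly 4 billion pages.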
LiveJournal and other weblogging services (Score:3, Informative)
* LiveJournal.com: 5751567
* GreatestJournal.com: 717406
* DeadJournal.com: 474435
* Weedweb.net: 22650
* InsaneJournal.com: 12970
* JournalFen.net: 7629
* Plogs.net: 7086
* journal.bad.lv: 4530
(This list is most likely incomplete.)
In addition, every Typepad user has an account: according to the Six Apart merger stories, that's another million users. Add in the RDF from all the Typepad RSS files, and that's another million.
All WordPress blogs have a feed, located at
So, we've got, just as a guess, about 9 million RDF files out there in the blogging world alone. Throw in a hell of a lot of scientific data, and everything on RDFdata.org, and you start to get an idea that the world is a lot more Semantic Web enabled than you seem to think it is.
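The "about 9 million" guess can be reproduced from the figures in the comment. This Python sketch assumes, as the estimate itself appears to, that each listed account contributes one RDF file; the two Typepad figures are the comment's own round estimates, and the WordPress feeds are left out because no count is given for them.

```python
# Per-site counts copied from the list above (which the comment notes is incomplete).
livejournal_style_sites = {
    "LiveJournal.com": 5_751_567,
    "GreatestJournal.com": 717_406,
    "DeadJournal.com": 474_435,
    "Weedweb.net": 22_650,
    "InsaneJournal.com": 12_970,
    "JournalFen.net": 7_629,
    "Plogs.net": 7_086,
    "journal.bad.lv": 4_530,
}
typepad_users = 1_000_000  # comment's estimate of Typepad users
typepad_rss = 1_000_000    # comment's estimate of Typepad RSS (RDF) files

lj_total = sum(livejournal_style_sites.values())
grand_total = lj_total + typepad_users + typepad_rss
print(f"LiveJournal-style sites: {lj_total:,}")
print(f"Estimated total: {grand_total:,}")
```

The listed sites alone come to just under 7 million, so the two Typepad estimates bring the total close to 9 million.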
Re:LiveJournal and other weblogging services (Score:2, Informative)
Re:LiveJournal and other weblogging services (Score:3, Informative)
Care to venture a guess as to how many of those actually contain useful information? Really, who cares if Melanie in Oshkosh really, really loves Justin Timberlake, or Winthorpe in Des Moines really, really wants people to sign up so he can get an iPod?
Furthermore, once you start tying all this information together, doesn't that just make the work of corporate data miners that much easier?
Of course, you could salt in a bunch of useless, random data, which, of course, means that the whole shooting match is useless.