
On Finding Semantic Web Documents

Anonymous Coward writes "A research group at the University of Maryland has published a blog post describing their latest approach to finding and indexing Semantic Web documents. They published it in reaction to the view on the Semantic Web given by Peter Norvig (director of search quality at Google) in 'Semantic Web Ontologies: What Works and What Doesn't': 'A friend of mine [from UMBC] just asked can I send him all the URLs on the web that have dot-RDF, dot-OWL, and a couple other extensions on them; he couldn't find them all. I looked, and it turns out there's only around 200,000 of them. That's about 0.005% of the web. We've got a ways to go.'"
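As a quick back-of-the-envelope check of the quoted figures (a sketch in Python; the ~4 billion total is implied by the two numbers rather than stated anywhere):

    # If ~200,000 semantic-web URLs are ~0.005% of the web, the implied
    # total size of the web is:
    semantic_docs = 200_000
    fraction = 0.005 / 100                     # 0.005 percent as a fraction
    print(f"{semantic_docs / fraction:,.0f}")  # 4,000,000,000 -- ~4 billion URLs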
  • by faust2097 ( 137829 ) on Friday January 14, 2005 @07:02PM (#11368636)
    Semantic web stuff is cool and all, but I honestly don't believe it will ever take off in any meaningful way. For one, it takes a paradigm that people know and understand and adds a lot of complexity to it, on both the user end and the engineering end.

    Plus, a lot of the rah-rah booster club that's grown up around it sounds a whole lot like the Royal Society folks in Quicksilver who keep trying to catalog everything in the world into a 'natural' organization.

    What it basically comes down to for me is that it seems like a great framework for single-topic information organization, but at some point we need to keep our focus on the actual content of what we're producing rather than the packaging. For this to be ready for prime time, the value proposition needs to shrink from a 30-minute explanation involving diagrams and made-up words ending in '-sphere' to something even shorter than an elevator pitch: two sentences, say.
  • Two sentences, eh? (Score:3, Interesting)

    by misuba ( 139520 ) on Friday January 14, 2005 @09:08PM (#11369886) Homepage
    You're on.

    1) A simple human- and machine-readable schema is defined for marking up descriptions of items for sale or wanted.
    2) Google learns how to read them, thereby putting eBay, Craigslist, and other sundry companies out of business and putting your data back in your hands.

    Okay, so the second sentence is a bit of a run-on, and this use case has a whole lot of hairy details I'm leaving out. But the possibilities are pretty exciting nonetheless; a rough sketch of what such markup could look like follows below.
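    As a sketch of sentence 1, here is what such a listing could look like in Python using the rdflib library; the "ex:" vocabulary and every property name here are hypothetical placeholders, not an established schema:

        # Minimal sketch of machine-readable "for sale" markup.
        # The ex: namespace and its properties are invented for illustration.
        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF

        EX = Namespace("http://example.org/listing#")    # hypothetical schema
        g = Graph()

        item = URIRef("http://example.org/listings/42")  # hypothetical listing
        g.add((item, RDF.type, EX.ForSale))
        g.add((item, EX.title, Literal("Used bicycle")))
        g.add((item, EX.priceUSD, Literal(75)))

        print(g.serialize(format="turtle"))  # rdflib 6+ returns a string here

    Serialized as Turtle, the result stays readable to a human while remaining trivially parseable by a crawler, which is the "human- and machine-readable" property sentence 1 asks for.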
  • by old_guys_can_code ( 144406 ) on Friday January 14, 2005 @11:37PM (#11370869)
    I work at one of the few places that crawls billions of URLs each month, and I observed exactly the same thing as Peter. There just isn't that much XML/RDF/DAML/OWL on the web. At the point when we had crawled 6 billion URLs, I had found only 180,000 URLs whose MIME type or extension indicated that they were machine-readable metadata (a rough sketch of that kind of check appears after this comment).

    The reason is something that people in the semantic web community are loath to talk about: there isn't enough incentive for people to create metadata and put it out for others to read. When we write web pages or blogs, we get to express ourselves to other humans; when we publish raw data, there is no clear incentive (economic or otherwise) to justify the effort. That is probably why so little metadata is being published.

    If you want to dispute that the amount of data is small, feel free to put up a web server listing a million URLs of metadata created by others.
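    A rough sketch in Python of the extension/MIME-type test described above; the extension and content-type lists are illustrative assumptions, not any particular crawler's configuration:

        # Classify a crawled URL as Semantic Web metadata by its file
        # extension or by the Content-Type header the server returned.
        # Both lists below are illustrative assumptions.
        from urllib.parse import urlparse

        METADATA_EXTENSIONS = {".rdf", ".owl", ".daml", ".n3"}
        METADATA_MIME_TYPES = {"application/rdf+xml", "application/owl+xml"}

        def looks_like_semantic_web_doc(url: str, content_type: str = "") -> bool:
            path = urlparse(url).path.lower()
            if any(path.endswith(ext) for ext in METADATA_EXTENSIONS):
                return True
            # Content-Type may carry parameters, e.g. "...; charset=utf-8"
            return content_type.split(";")[0].strip().lower() in METADATA_MIME_TYPES

        print(looks_like_semantic_web_doc("http://example.org/foaf.rdf"))  # True
        print(looks_like_semantic_web_doc("http://example.org/x.html",
                                          "application/rdf+xml"))          # True

    Counted this way, 180,000 hits out of 6 billion crawled URLs works out to about 0.003% of the crawl.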

"Gravitation cannot be held responsible for people falling in love." -- Albert Einstein

Working...