
On Finding Semantic Web Documents

Anonymous Coward writes "A research group at the University of Maryland has published a blog post describing their latest approach to finding and indexing Semantic Web documents. They published it in reaction to the view on the Semantic Web given by Peter Norvig (director of search quality at Google) in 'Semantic Web Ontologies: What Works and What Doesn't': 'A friend of mine [from UMBC] just asked can I send him all the URLs on the web that have dot-RDF, dot-OWL, and a couple other extensions on them; he couldn't find them all. I looked, and it turns out there's only around 200,000 of them. That's about 0.005% of the web. We've got a ways to go.'"
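As a quick back-of-the-envelope check of the quoted figures (a sketch in Python; the ~4 billion total is implied by the two numbers rather than stated anywhere):

    # If ~200,000 semantic-web URLs are ~0.005% of the web, the implied
    # total size of the web is:
    semantic_docs = 200_000
    fraction = 0.005 / 100                     # 0.005 percent as a fraction
    print(f"{semantic_docs / fraction:,.0f}")  # 4,000,000,000 -- ~4 billion URLs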
  • by faust2097 ( 137829 ) on Friday January 14, 2005 @07:02PM (#11368636)
    Semantic web stuff is cool and all, but I honestly don't believe it will ever take off in any meaningful way. For one, it takes a paradigm that people know and understand and adds a lot of complexity to it, on both the user end and the engineering end.

    Plus, a lot of the rah-rah booster club that's grown up around it sounds a whole lot like the Royal Society folks in Quicksilver who keep trying to catalog everything in the world into a 'natural' organization.

    What it basically comes down to for me is that it seems like a great framework for single-topic information organization, but at some point we need to keep our focus on the actual content of what we're producing rather than the packaging. For this to be ready for prime time, the value proposition needs to shrink from a 30-minute explanation involving diagrams and made-up words ending in '-sphere' to something even shorter than an elevator pitch: two sentences, say.
  • Two sentences, eh? (Score:3, Interesting)

    by misuba ( 139520 ) on Friday January 14, 2005 @09:08PM (#11369886) Homepage
    You're on.

    1) A simple human- and machine-readable schema is defined for marking up descriptions of items for sale or wanted.
    2) Google learns how to read them, thereby putting eBay, Craigslist, and other sundry companies out of business and putting your data back in your hands.

    Okay, so the second sentence is a bit of a run-on, and this use case has a whole lot of hairy details I'm leaving out. But the possibilities are pretty exciting nonetheless; a rough sketch of what such markup could look like follows below.
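    As a sketch of sentence 1, here is what such a listing could look like in Python using the rdflib library; the "ex:" vocabulary and every property name here are hypothetical placeholders, not an established schema:

        # Minimal sketch of machine-readable "for sale" markup.
        # The ex: namespace and its properties are invented for illustration.
        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF

        EX = Namespace("http://example.org/listing#")    # hypothetical schema
        g = Graph()

        item = URIRef("http://example.org/listings/42")  # hypothetical listing
        g.add((item, RDF.type, EX.ForSale))
        g.add((item, EX.title, Literal("Used bicycle")))
        g.add((item, EX.priceUSD, Literal(75)))

        print(g.serialize(format="turtle"))  # rdflib 6+ returns a string here

    Serialized as Turtle, the result stays readable to a human while remaining trivially parseable by a crawler, which is the "human- and machine-readable" property sentence 1 asks for.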
  • by old_guys_can_code ( 144406 ) on Friday January 14, 2005 @11:37PM (#11370869)
    I work at one of the few places that crawls billions of URLs each month, and I observed exactly the same thing as Peter. There just isn't that much XML/RDF/DAML/OWL on the web. At the point when we had crawled 6 billion URLs, I had found only 180,000 URLs whose MIME type or extension indicated that they were machine-readable metadata (a rough sketch of that kind of check appears after this comment).

    The reason is something that people in the semantic web community are loath to talk about: there isn't enough incentive for people to create metadata and put it out for others to read. When we write web pages or blogs, we get to express ourselves to other humans; when we publish raw data, there is no clear incentive (economic or otherwise) to justify the effort. That is probably why so little metadata is being published.

    If you want to dispute that the amount of data is small, feel free to put up a web server listing a million URLs of metadata created by others.
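    A rough sketch in Python of the extension/MIME-type test described above; the extension and content-type lists are illustrative assumptions, not any particular crawler's configuration:

        # Classify a crawled URL as Semantic Web metadata by its file
        # extension or by the Content-Type header the server returned.
        # Both lists below are illustrative assumptions.
        from urllib.parse import urlparse

        METADATA_EXTENSIONS = {".rdf", ".owl", ".daml", ".n3"}
        METADATA_MIME_TYPES = {"application/rdf+xml", "application/owl+xml"}

        def looks_like_semantic_web_doc(url: str, content_type: str = "") -> bool:
            path = urlparse(url).path.lower()
            if any(path.endswith(ext) for ext in METADATA_EXTENSIONS):
                return True
            # Content-Type may carry parameters, e.g. "...; charset=utf-8"
            return content_type.split(";")[0].strip().lower() in METADATA_MIME_TYPES

        print(looks_like_semantic_web_doc("http://example.org/foaf.rdf"))  # True
        print(looks_like_semantic_web_doc("http://example.org/x.html",
                                          "application/rdf+xml"))          # True

    Counted this way, 180,000 hits out of 6 billion crawled URLs works out to about 0.003% of the crawl.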

"Gravitation cannot be held responsible for people falling in love." -- Albert Einstein

Working...