Semantic Web Getting Real
BlueSalamander writes "Tim O'Reilly just did an interview with Devin Wenig, the CEO-designate of Reuters. With no great enthusiasm I started to read yet another interview on how the semantic web was going to make everything great for everybody. Wenig made some good points about the end of the latency wars in news and the beginning of the battle for automatically detecting linkages and connections in the news. Smart news, not just fast news. Great stuff — but just more words? Nope — a little searching revealed that Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. I ran about 5,000 documents through it and played with a subset of them in RDF-Gravity. The results were impressive overall. Is this the start of the semantic web getting real? When big names and big money start to act, not just talk, it may be time to pay attention. Semantic applications anyone? The foundation appears to be here."
Semantic Spam (Score:2, Insightful)
Actually, I think it's beaten the rest of the content to the punch. =(
Re:Semantic Spam (Score:5, Funny)
Re:Semantic Spam (Score:5, Insightful)
It just doesn't seem like the best idea in the world to me.
Re:Semantic Spam (Score:5, Funny)
Re: (Score:3, Informative)
Re: (Score:2)
Re: (Score:2)
Re:Semantic Spam (Score:5, Informative)
I think you are missing the point of Semantic Web: you can refer or link to an object, not just document.
The company declares its URI. Now, if you are writing an article about this company, you can uniquely identify it and every web crawler knows *exactly* what company you are talking about. If the URI for the company is a hyperlink to its web site, then it can't be abused: the company itself declares what it is. The unique URI will in fact be a link to some file with information about the company (maybe an RDF file -- doesn't really matter for the concept).
The system can (and will) be abused in the same way as the old web: irrelevant links, words, concepts -- nothing new for the crawler, and it can be defeated with existing techniques.
Again, Semantic Web = Links between concepts, not just documents, so please do not bury the good idea under the pile of misunderstanding.
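To make the "links between concepts" point concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the URIs, the `mentions` and `label` predicates, and the `documents_about` helper are assumptions, not part of any real vocabulary. Two documents point at the same company URI, so a crawler can group them without guessing from the text.

```python
# Hypothetical identifiers: none of these URIs come from the thread.
COMPANY = "http://example.com/about#company"

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("http://news.example/article/1", "mentions", COMPANY),
    ("http://blog.example/post/7", "mentions", COMPANY),
    ("http://blog.example/post/8", "mentions", "http://other.example/#company"),
    (COMPANY, "label", "Example Corp"),
]


def documents_about(uri, statements):
    """Return every subject that links to the given concept URI."""
    return sorted(s for s, p, o in statements if p == "mentions" and o == uri)


print(documents_about(COMPANY, triples))
```

The third document mentions a different URI, so it is not grouped with the other two, whatever words it happens to use.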
Re: (Score:2)
Re: (Score:3, Interesting)
Re: (Score:2)
Re: (Score:2)
In Soviet Russia, the system abuses you !
Re: (Score:2, Interesting)
Re: (Score:2)
Re: (Score:2)
So the question I would have liked to pose is:
Since we can't filter out bias, how can the technology help to make the news biases more transparent and quantifiable?
For example, work like this about VP Cheney [newsbusters.org] deserves to be bagged, tagged, and ignored, for it is a blemish on the face of legitimate journalism.
Symantec Web? AHHHHHHH!!! (Score:2)
Re:Symantec Web? AHHHHHHH!!! (Score:4, Funny)
Semantic web getting real [player]
and immediately thought "it was bad enough when the original web got it"
Re: (Score:3, Funny)
The links in that article are neat. I am looking forward to watching the maturity of this!
Re: (Score:2)
What? (Score:1, Offtopic)
Re: (Score:3, Interesting)
And the only reason we moved from Web 1.0 to web 2.0, and the only reason we need to move from Web 2.0 to Web 3.0 is...
We are still stuck on Search 1.0
Well, ok, to be fair to Google -- Search 1.5
Sorry, but we won't see much improvement in utility until someone rolls out Search 2.0. That is a product LONG overdue.
Re: (Score:2)
Re: (Score:2)
Google fails to provide useful search results for a lot of searches. Some of them may well have no internet content available. For many others the content is swamped by a plethora of other site aggregators, link farms, or even genuine vendors selling the item you're searching for, but not giving you the information you're after.
Sure, I can refine my Google searches to cut out all these distractions. But I'm lazy; I want a two word search to give me the link I need straight away.
Re: (Score:2)
Then again, less than 1% of my searches have to do with buying something. Maybe that's the difference.
Re: (Score:3, Insightful)
If by that you mean "a collection of buzz-words that everyone uses without having Clue 1 what the hell they're talking about," yes.
Content? (Score:4, Insightful)
Command line vs GUI all over again (Score:4, Interesting)
Semantic webs might be OK for small document sets where you can visually search tags and click them. Want to look up something about monkeys? Look for the tag that says monkeys (or maybe find primates first, then monkeys) and click it.
But for huge data sets this sucks. After a smallish number of documents & subjects it must be far easier to type monkeys in search box and have Google etc do the search.
This might work for handling some queries, but will suck supremely for complex queries over large data sets (eg. the whole www).
Re: (Score:3, Interesting)
We need to couple the proposed "semantic web" with more than the single-box search page or rathe
Where's the Money? (Score:3, Interesting)
Re:Where's the Money? (Score:5, Insightful)
long way to go.. (Score:2)
Re: (Score:3, Insightful)
Asking a question and getting a sensible answer, that's the killer app.
Re: (Score:3, Funny)
Re: (Score:3, Insightful)
Here is a basic scenario for ten years down the line:
1. You build a profile probably through a combination of allowing your online activities to be profiled, filling out in-depth surveys, and rating certain types of web-content on a semi-regular basis.
2. A proxy identity is imbued with a 'personality' based on both your preferences as represented in step one, and ongoing analysis of content that causes you to register a strong reaction
Re: (Score:3, Interesting)
Anti- Semantic comments in 3 ... 2 ... 1 ... (Score:1, Funny)
Well, I am sure the authors will just call them Anti-Zio[a]ntic comments.
Yawn... (Score:5, Interesting)
Most websites have little to say, and take all day to say it.
Having a detailed graphical analysis of the blather seems unlikely to improve the situation. GI,GO.
It would seem spending just a tad more time writing for HUMANS would be way more productive than writing for machines. Having a thousand computers watching your 100 monkeys seems unlikely to bring enlightenment or useful knowledge out of a pile of garbage and human blathering that passes for information on the web these days.
People used to write web pages.
Now they write software to write web pages.
It's not surprising they now need to write software to understand the web pages.
What's the point?
Re: (Score:2, Informative)
Re:Yawn... (Score:5, Interesting)
Well, if we are very forgiving we can get this kind of thing happening with current technology, we just have to supply all the "content" in a form that our primitive algorithms can handle. The Semantic Web is that. Maybe around the 3rd generation of these algorithms we might be ready to do the translation to machine form automatically.. maybe not.. but at least the Semantic Web people are again talking about translation.. was a time when they all said it was a fruitless path and the best way was to just supply applications for creating machine readable content easily.
Re: (Score:1)
Re: (Score:2)
I can already ask Yahoo or Google a question and get a sensible answer. I guess I'm missing how this "semantic web" thing equates with AI that understands the meaning of English.
Besides that, if you rely on the "content providers" to provide the meta-data the system is less than useless. Legitimate sites won't use it or update it, and illegitimate sites will abuse the system.
Re:Yawn... (Score:4, Interesting)
When is the next shuttle launch? [google.com]
This is the first hit, not shuttle launch info. [nasa.gov]
This is the second hit.. [nasa.gov] ah hah! The next launch is on Feb 7.. wait a minute, it's Feb 10! Was it delayed or something? Oh, I see, it says "Launched".. great, when's the next one.. March 11 +.. hmm.. wtf does + mean? Apparently I need to read this [nasa.gov] and hmm.. nothing there about what the + means.. I guess it means it might get delayed, they do that.
See all that reasoning I had to do? See how long that took me? That's what the Semantic Web is for.
Re: (Score:2)
How does this stuff handle abuse? I mean, what's to stop Senior Spamalot from marking up all his machine-readable stuff for shuttle launches, but actually dishing you to a Viagra page? I don't understand how the "Semantic Web" won't be terribly abused.
Re:Yawn... (Score:4, Insightful)
How does Google's pagerank algorithm?
Re: (Score:3, Interesting)
But I wonder whether that approach is going to be any simpler or more effective than just developing better or more int
Re: (Score:2)
The answer, as I see it, is computer-generated metadata... at which point, why not just build that functionality into your search engine?
Yahoo are already doing that. If you go to their search page [yahoo.com], enter some search term (e.g. "linux") and search. Now, on the results page there should be a little arrow down at the bottom of the top bar; click on that and it will open up a panel that includes concepts linked to the search terms (and also possible refinements of the search). I know (from talking to the people at Yahoo) that they're deriving the concepts automatically from their spidered data, and it works really well.
How resistant is it to s
Re: (Score:2)
It's a cool trick, but it doesn't really do much useful right now. I tried the space shuttle example, and it didn't really add any value over and above what google does. On the other hand, it does a pretty good job when your search is not very specific - like just typing "Britney Spears".
They should make it more obvious that you need to push that little arrow! I never would have tried that!
Re: (Score:2)
Re:Yawn... (Score:5, Insightful)
So am I talking about search? Well, yes, but it's an algorithm that uses search to answer my questions.. instead of me having to do it.
Think about that soup question.. how would you do it now? I'd go to Google maps.. enter the location of my office, search businesses for restaurants, click on one of the top 5 to see if they have a daily updated menu, note the soup of the day, go back to Google maps, click on the next one, etc, until I had the answer I wanted. That's a pretty simple algorithm.. it's something a machine learning system could come up with.
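The manual procedure described above could be sketched roughly like this in Python, assuming the restaurants already publish their distance and soup of the day in machine-readable form. The `Restaurant` record, its field names, and `find_soup` are all hypothetical; no real maps or menu API is being called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Restaurant:
    name: str
    distance_km: float              # distance from the office
    soup_of_the_day: Optional[str]  # None if no daily menu is published


def find_soup(restaurants, wanted, limit=5):
    """Check the `limit` nearest restaurants; return the first one serving `wanted`."""
    nearest = sorted(restaurants, key=lambda r: r.distance_km)[:limit]
    for r in nearest:
        if r.soup_of_the_day and r.soup_of_the_day.lower() == wanted.lower():
            return r.name
    return None


# Placeholder data, invented for the example.
nearby = [
    Restaurant("Cafe One", 0.5, None),
    Restaurant("Cafe Two", 0.2, "Tomato"),
    Restaurant("Cafe Three", 0.9, "Minestrone"),
]
print(find_soup(nearby, "minestrone"))
```

The loop is the same click-through a person would do by hand; the only thing the structured data adds is that the menu no longer has to be read off a web page.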
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
We can't even ask questions about systems which are designed to be machine readable. Look at software debuggers.
Re: (Score:2)
Back to search and the semantic web, I think that we are using formal languages to ask questions in search every day. I would lo
Re: (Score:3, Insightful)
Sorry, there's a flaw in your reasoning: Who gets to pre-tag the data? Everybody. But you can't trust everybody on the net. So you'll get a lot of data that's specifically designed to confuse and subvert the weak algorithms, and by definition such algorithms aren't strong enough to rise to the challenge.
The Semantic Web people wi
Re: (Score:2)
Re: (Score:2)
2) "we have good ways of handling it" is a euphemism for human beings. Yes, just throw people at the problem and let them censor the bits of data that they don't like. Again, you're just letting in what you want to see coming out. Search engines have teams who get paid to scrub their data. It's not AI. We still
Re: (Score:2)
Re: (Score:2)
PageRank itself is merely about counting links, which is entirely independent of content, and not as useful on its own as you might think. For example, there's no guarantee that an index page will appear befor
Re: (Score:2)
Re: (Score:2)
The amount of garbage out there only makes these tools more necessary.
Re:Yawn... (Score:5, Interesting)
1. Let's say you work for a Fortune 500 company and you get over 10,000 emails a day from customers complaining. Do you think it is better to read each one or have a tool that abstracts it to graphically display key concepts that they are complaining about so management can do something about it today?
2. You are a clinical researcher in Cancer and have a terabyte of unstructured patient data. Can you think how text descriptions of pathology reports might be displayed graphically against outcomes to suggest some interesting insights?
There's a lot of useful information that isn't on blogs - although it would be useful for them too. You need to exercise a bit more imagination.
Re: (Score:2)
Re: (Score:2)
We also have software to write software (see [[Compiler]]). Now that is just lazy and decadent.
Re: (Score:3, Informative)
Here's one example, say I want to do a little semi-political research. I ask semantic google (which, for the sake of argument, has a more advanced query language) for the relationship between the price of RAM and the price of oil.
Right now, google could at best look for an article on that
Great, just great ... (Score:5, Funny)
Just what we need. Yet another version of RealPlayer.
you're not the only one who misread (Score:2)
Oblig. Matrix (Score:2)
"If real is what you can feel, smell, taste and see, then 'real' is simply electrical signals interpreted by your brain."
pfft... (Score:4, Funny)
Mr. Wenig must not be all that familiar with
Oops... (Score:1)
In case you have no clue what they're talking abou (Score:5, Informative)
According to the Wikipedia history, this concept has been around since at least 2001.
Re:In case you have no clue what they're talking a (Score:1)
hype, waste of time, or big mess (Score:4, Interesting)
On first read, I like what they are trying to do, but I see so many problems with what they are thinking, and I am not a web designer in any sense.
First, I don't have a problem finding things to buy on the internet. The problem is, signal to noise ratio. There are TOO MANY google results for something like 'plasma tv.' No matter what kind of RDF is used, it will be abused by people who want their URL to show up in your search for whatever reason. I think someone touched on this earlier a little in this thread, but it deserves repeating.
Second, can you imagine a scenario where, say, best buy or fry's uses some 'semantic web' application to do real time web searchable updates of their inventory? That's what would have to happen for this to work, and do something that isn't already possible.
Right now, I can search for 'plasma tv' in google or ebay. Then I can call my local retailers to see if they carry that item, and have it in stock. In order for this system to make any kind of tangible change in the example given, retail chains would have to update their inventories online, whenever a purchase is made, or new items delivered to the store.
It's an interesting idea. I wonder if the retailers would go for it? All it means for them is fewer people coming into their stores... sounds like that would hurt sales.
I also hate internet hype. It really fouls things up, more than some want to acknowledge. I try to keep my 64 year old dad educated enough to buy coffee beans on ebay, check email, look at news, etc. Every time he sees 'semantic web' or 'web 2.0' in the media, it just confuses him, and I imagine, people like him who just use the net for basics like online bill pay, ebay, etc. He doesn't need a new buzzword to be motivated to shop online or whatever.
He has the motivation already... silly contrived 'new media' buzzwords just waste time and confuse people.
Fallacy: Designing for Old People (Score:3, Insightful)
I try to keep my 64 year old dad educated enough to buy coffee beans on ebay, check email, look at news, etc. Every time he sees 'semantic web' or 'web 2.0' in the media, it just confuses him, and I imagine, people like him who just use the net for basics like online bill pay, ebay, etc.
I'm afraid whenever I see this argument I immediately tend to discredit all the rest that I've read in that post. Designing technology for those who are least able to uptake it is a losing proposition at best; at worst a
Fallacy: Not reading comment (Score:2)
I never mentioned Design! You didn't read my post very well, did you? I said that the HYPE of buzzwords like 'semantic web' or 'web 2.0' is lame, unnecessarily confusing, and annoying. The word hype was the first word in the subject of my post!
Here, I've copied the paragraph from my post that you read incorrectly, emphasis mine
Re: (Score:2)
I think that I spoke to the point you made, if not to the point you think you made. This 'discussion' - which I will leave vaguely defined, as it is - be it the design, the hype, or whatever, is taking place between people who are actively seeking this technology. It has really little to do with those people who due to some circumstance (of which age may be one) could care less until it's matured.
It is therefore a fallacy to use your grandfather as an example of why things are 'too confusing'. There IS a l
Re: (Score:2)
Calm down! (Score:2)
I don't know what provoked your vitriol. I'm not a troll - but the moderators are welcome to disagree. Since they haven't yet, I'm currently disposed to thinking you're overreacting to my disagreement with your viewpoint. I'm as happy as you seem to be to let it lay, however.
Re: (Score:2)
Not the Semantic Web (Score:5, Insightful)
natural language processing in search? (Score:3, Interesting)
But then if you're creating an addon for joomla (or any template elements really) to display event listings why not add a semantic tag so that a search engine could limit the domain by "tag:events". The extra effort involved is pretty minimal, especially when, if you code well, each event is probably in a "<div class="event eventtype">
Once people realise that searc
Why can't AI get the semantics from the plain text (Score:3, Insightful)
The massive corpus size, when measured carefully, acts to filter semantic signal from expressive difference "noise".
Combine that kind of latent semantic analysis of global human text with conceptual knowledge representation and inference technologies (which would use a combination of higher-order logic, bayesian probability, etc) and it should be possible to create a software program that could start to get a basic semantic understanding of documents and document relationships in the ordinary "dumb" web.
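As a very rough illustration of getting a relatedness signal out of plain text, here is a bag-of-words cosine similarity in Python. To be clear, this is a much simpler stand-in and not latent semantic analysis itself: it only counts shared words, and the `vectorize` and `cosine` helpers are names chosen for this sketch.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


doc1 = vectorize("Monkeys are primates that live in trees")
doc2 = vectorize("Many primates, including monkeys, live in trees")
doc3 = vectorize("Plasma televisions are on sale")
print(cosine(doc1, doc2) > cosine(doc1, doc3))
```

Word overlap like this says nothing about meaning on its own; the proposal above is about layering far more (a huge corpus, inference, probability) on top of signals of this kind.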
Could the proponents of the semantic web please tell me what it will add to this?
My basic proposition is that if an averagely intelligent human can infer the semantic essence (the gist, shall we say) of individual documents, and relationships between documents on the web, why can't we build AI software that does the same thing, and then reports its results out to people who ask?
Re:Why can't AI get the semantics from the plain t (Score:1, Insightful)
Re: (Score:3, Insightful)
Re:Why can't AI get the semantics from the plain t (Score:3, Informative)
Actually, the story is about a tool which does (a part of) what you are describing.
Because "AI" is a misnomer (Score:3, Informative)
We're so early in the development of this field that no one can even define what "self awareness" or "consciousness" really is, let alone how to create it or scale i
Re: (Score:2)
OpenCalais (Score:3, Funny)
Re: (Score:2)
Trains-on-rails, tunneled, would be the most secure: less chance of someone seeing your bytes ferried across, and a man-in-the-middle attack would be much more difficult !!
Really real this time? (Score:1, Flamebait)
Given that the Semantic Web is neither Semantic nor Web, I think we've got another data point for that theory.
Re: (Score:2)
Dude, you forgot the ending `Discuss'.
Kids...
Confusing terms. (Score:1)
http://www.w3.org/2001/sw/ [w3.org]
This article is about a news organization using semantic tools to help extract and manipulate certain data. Sure, they are related a little maybe, but if related meant equal, then every computer would break.
Just because the word "semantic" matches, they've confused the two domains, and if humans can't even do it, I wonder what our automated semantic web would look like with robots trying to make connections. I ca
"Free" for "anyone"? Not so fast. (Score:3, Informative)
Also, it's not yet for "anyone." According to the Calais roadmap [opencalais.com], only English documents are accepted: "Calais R3 [July 2008] begins
A Little too Cynical (Score:4, Insightful)
I understand being jaded about internet hype and buzzwords but I'm still surprised that after nearly eighty comments there doesn't seem to be anyone who has anything to say other than "vaporware" and "it won't work because of the spammers." Yes, maybe it has been overhyped and yes it is taking a while for the envisioned ideas to come to fruition but that doesn't mean that those ideas aren't worthwhile.
I'll use the following example because I recently had to do this with non-semantic tools. Let's say you wanted to see how good or bad a job a transit agency is doing in its city in comparison to other similar cities. A couple of metrics you might use to find similar cities would be population size, population density and land area. Google doesn't do a good job with something like that. You end up needing to search for cities individually and then finding their data points. Or you can find a list of cities ranked by population or population density. If you search on Google for something like that you end up at one of the Wikipedia lists. These lists are helpful but... still lacking. They don't contain all the cities you need or they don't provide a way to look at multiple data sets at the same time. The lists are also compiled by hand and aren't automatically updated when the information on the city page is changed. The data is in Wikipedia though. Every city page lists that information in a little box near the start of the article. But how do I take this data that is in Wikipedia from the form that it's in into a form that I can use to find what I need to know? Enter the semantic web.
Let's say that Wikipedia, or at least the parts dealing with geography, were semantic. Now, there are tens of thousands of pages describing countries, regions, states, counties, parishes, cities, towns and villages. Then those pages are translated into many other languages. Some of the data that these pages contain is of the same type. They all contain the name of the locality, latitude, longitude, size, population size and elevation. For data such as this it would be pretty easy to have a form to enter the data into, as opposed to using the usual markup, and the form could put the data into the proper markup for the page and the proper RDF. Once the data is in proper RDF form it would be easy to automate the process of updating translations of that page with the new data as well as updating any pertinent lists. It would also make it easier for people who want to analyze or use the data because they would be able to access it much more easily.
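Once the city data is available as structured records, the "find similar cities" step becomes a few lines. A minimal Python sketch, assuming each city exposes a population and a land area: the `City` record, the `similar_cities` helper, the 25% tolerance, and the figures are all placeholders invented for the example, not real data.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str
    population: int
    area_km2: float

    @property
    def density(self):
        """People per square kilometre."""
        return self.population / self.area_km2


def similar_cities(target, cities, tolerance=0.25):
    """Names of cities whose population and density are both within
    `tolerance` (as a fraction) of the target's values."""
    def close(value, reference):
        return abs(value - reference) <= tolerance * reference

    return [
        c.name
        for c in cities
        if c.name != target.name
        and close(c.population, target.population)
        and close(c.density, target.density)
    ]


# Placeholder records, not real figures.
cities = [
    City("City A", 600_000, 300.0),
    City("City B", 650_000, 310.0),
    City("City C", 620_000, 1200.0),
    City("City D", 2_000_000, 900.0),
]
print(similar_cities(cities[0], cities))
```

The hard part today is getting the records out of the page markup in the first place, which is exactly what the paragraph above wants the RDF to take care of.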
But nobody really wants machine readable access to this information, you might say, except for the random geek and researcher. I would disagree. Let's say you're using a program like Marble, which is similar to Google Earth in some ways but is completely open source. If they wanted to display the population of a city when you hover over it they would currently have to create and maintain their own dataset or they'd have to write a parser to extract it from Wikipedia. Neither of those options is particularly easy at the moment, but if the information was in semantic form on Wikipedia it would be a piece of cake.
The strength of the semantic web isn't, in my opinion, going to be AI-like personal agents or anything like that. It'll be things that in many ways are already here. Like Yelp putting geotags on the restaurants they review, and apps like Google Earth taking that data that's available in machine readable (Semantic!) form to overlay it on a map so that you can see what's nearby. It'll be applications doing the same with the geotags from flickr. It's really useful mashups like http://www.housingmaps.com/ [housingmaps.com]. It's the transit agency putting realtime bus data up in semantic form so you can see on your iPhone's Google map how far away the bus is. So yeah, maybe the semantic web is overhyped but that doesn't mean there isn't a lot of substance there, too.
Cheers,
Greg
Semantic MediaWiki already exists (Score:2)
It's basically just a matter of tweaking it and putting some real data in.
Vapourware my arse (Score:4, Insightful)
The company I work for, Garlik [garlik.com] has two products that are run off semantic web technology. DataPatrol [garlik.com] (for pay) and QDOS [qdos.com] (free, in beta).
We use RDF stores instead of databases in some places as they are very good at representing graph structures, which are a real pain to deal with in SQL. You often hear the "what can RDF do that SQL can't" type arguments, which are all just nonsense. What can SQL do that a field database, or a bunch of flat files can't? It's all about what you can do easily enough that you will be bothered to do it.
A fully normalised SQL database has many of the attributes of an RDF store, but
a) when was the last time you saw one in production use?
b) how much of a pain was it to write big queries with outer joins?
RDF + SPARQL [w3.org] makes that kind of thing trivial, and has other fringe side benefits (better standardisation, data portability) that you don't get with SQL.
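To give a feel for why graph-shaped data fits the triple model, here is a toy in-memory sketch in Python. It is not SPARQL and not any real RDF store: `match`, `reachable`, and the `knows` data are invented for illustration. A triple pattern with wildcards plays the role of a basic query, and following one predicate transitively is the kind of traversal that tends to need recursive queries or repeated self-joins in SQL.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def reachable(triples, start, predicate):
    """Follow `predicate` edges transitively from `start` (cycles are handled)."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, nxt in match(triples, s=node, p=predicate):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


# Made-up social graph.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("dave", "knows", "alice"),
]
print(match(triples, p="knows", o="alice"))
print(sorted(reachable(triples, "dave", "knows")))
```

Adding a new kind of relationship here means adding triples with a new predicate, with no schema change, which is part of the "easily enough that you will be bothered" argument above.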
I guess it shouldn't be a surprise to see the comments consisting of the usual round of more-or-less irrelevant jokes and snide commentary - this is Slashdot after all - but I can't help responding.
Re: (Score:2)
Without knowing the details of your circumstances, it sounds like, maybe, the real point is that what you wa
Re: (Score:2)
For the record; I am a researcher working in the Semantic Web area, and I am primary developer of the system IkeWiki [salzburgresearch.at] and the reasoning language Xcerpt. Since this discussion seems to pop up again and again on Slashdot, I didn't want to add comments to the same issues (trust, search) again. But your comment might add something new to the discussion:
Without knowing the details of your circumstances, it sounds like, maybe, the real point is that what you want is an object oriented database rather than rela
Re: (Score:2)
I work for a research lab in the Netherlands; we've also finished quite a few projects using Semantic Web technology. Our use case is large heterogeneous data sets in agrotech, like representing all knowledge on growing tomatoes and tomato quality in the Dutch agro sector.
Finally a comment that compares Semantic Web technology to RDBMS technology. It's very unfortunate that it has "Web" in the name. Makes the clueless think it's supposed to be a try for WWW 3.0, or something...
Re: (Score:2, Insightful)
Re: (Score:2)
One shortcoming is the lack of interactivity on large datasets. Most web searchers iterate through a few queries until they get what they want,