Extracting Meaning From Millions of Pages
freakshowsam writes "Technology Review has an article on a software engine, developed by researchers at the University of Washington, that pulls together facts by combing through more than 500 million Web pages. TextRunner extracts information from billions of lines of text by analyzing basic relationships between words. 'The significance of TextRunner is that it is scalable because it is unsupervised,' says Peter Norvig, director of research at Google, which donated the database of Web pages that TextRunner analyzes. The prototype still has a fairly simple interface and is not meant for public search so much as to demonstrate the automated extraction of information from 500 million Web pages, says Oren Etzioni, a University of Washington computer scientist leading the project." Try the query "Who has Microsoft acquired?"
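To make "analyzing basic relationships between words" concrete: open information extraction of this kind turns sentences into (argument, relation, argument) tuples that can then be queried. The sketch below is a deliberately naive, pattern-based toy illustration of that idea; TextRunner itself learns its extractor without hand-labeled data and is far more sophisticated than a single regular expression. The pattern and example sentences are my own illustrative assumptions, not part of the actual system.

```python
import re

# Toy open-information-extraction sketch (NOT TextRunner's method):
# match a crude "CapitalizedPhrase verb-group CapitalizedPhrase" pattern
# and emit (arg1, relation, arg2) tuples.
TRIPLE = re.compile(
    r"([A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)"    # arg1: capitalized phrase
    r" ((?:is |was |has )?[a-z]+(?: [a-z]+)?)" # relation: crude verb group
    r" ([A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)"   # arg2: capitalized phrase
)

def extract_triples(text):
    """Return (arg1, relation, arg2) tuples found in the text."""
    return [m.groups() for m in TRIPLE.finditer(text)]

for sentence in [
    "Microsoft acquired Skype in 2011.",
    "Seattle is located in Washington.",
]:
    print(extract_triples(sentence))
# [('Microsoft', 'acquired', 'Skype')]
# [('Seattle', 'is located in', 'Washington')]
```

A real system would index millions of such tuples, so a query like "Who has Microsoft acquired?" becomes a lookup for tuples whose arg1 is "Microsoft" and whose relation matches "acquired".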
Re:Not entirely helpful (Score:3, Insightful)
The major problem is that it assumes the presence of meaning in Web pages in the first place.
Towards a web with only one page: Google (Score:1, Insightful)
Are we moving towards a web in which Google centralises everything on its own pages? These new engines present content without the need to visit the pages it originates from. Is Google basically mooching off other people's websites with hardly anything - if anything at all - in return?
It could be dangerous if the only visitor a web site can expect is the Google bot.
Re:Not entirely helpful (Score:3, Insightful)
it just repeats what other people have said
I don't see anything new here, most people have done this since the beginning of time.
Re:Wikipedia tried and failed (Score:4, Insightful)
That is how Wikipedia was meant to be: a group of statements about subjects, all of which can be referenced to some original source, so that people can look up something quickly and then consult the sources for more definitive information.
Seeing how many people cite Wikipedia directly, use it as the main source for their research, and how many newspapers have been caught quoting inaccurate facts straight from Wikipedia... I don't think it is working properly. It takes a lot of optimism to believe "people will use it as an initial source and then verify the information."
That's not Wikipedia's failure. Without it, those same people would just be citing nothing at all, or some website with zero public review or commentary.
Correction.... (Score:5, Insightful)
"...that pulls together facts by combing through more than 500 million Web pages."
Correction:
"...that pulls together assertions by combing through more than 500 million Web pages."
Whether those assertions are correct or even reasonable is a completely different issue.
It might be interesting to then take those assertions and have some means to validate or invalidate them, but currently that's going to require meat, not metal.
Now, if you could come up with some form of AI^W algorithm to do that automatically, then you would have something.