Extracting Meaning From Millions of Pages
freakshowsam writes "Technology Review has an article on a software engine, developed by researchers at the University of Washington, that pulls together facts by combing through more than 500 million Web pages. TextRunner extracts information from billions of lines of text by analyzing basic relationships between words. 'The significance of TextRunner is that it is scalable because it is unsupervised,' says Peter Norvig, director of research at Google, which donated the database of Web pages that TextRunner analyzes. The prototype still has a fairly simple interface and is not meant for public search so much as to demonstrate the automated extraction of information from 500 million Web pages, says Oren Etzioni, a University of Washington computer scientist leading the project." Try the query "Who has Microsoft acquired?"
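To get a feel for the kind of output involved, here is a toy sketch of (entity, relation, entity) triple extraction of the sort TextRunner produces. This is not TextRunner's actual algorithm (which uses a self-supervised classifier over parsed text rather than fixed patterns); the verb list and corpus below are invented for illustration.

```python
import re

# Hypothetical pattern: a capitalized subject phrase, one of a few
# relation verbs, then a capitalized object phrase.
TRIPLE_PATTERN = re.compile(
    r"([A-Z]\w*(?:\s+[A-Z]\w*)*)"        # subject: capitalized word(s)
    r"\s+(acquired|founded|invented)\s+"  # a tiny, hand-picked verb set
    r"([A-Z]\w*(?:\s+[A-Z]\w*)*)"        # object: capitalized word(s)
)

def extract_triples(text):
    """Return (subject, relation, object) tuples found in the text."""
    return [m.groups() for m in TRIPLE_PATTERN.finditer(text)]

corpus = (
    "Microsoft acquired Skype after long negotiations. "
    "Bell Labs invented Unix."
)
triples = extract_triples(corpus)

# A query like "Who has Microsoft acquired?" then reduces to
# filtering the extracted triples:
acquisitions = [o for s, r, o in triples if s == "Microsoft" and r == "acquired"]
```

The scalability point in the summary is that TextRunner needs no per-relation patterns or hand-labeled training data, which is exactly what this fixed-pattern toy would require to cover more relations.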
Re:Not entirely helpful (Score:2, Informative)
There are some that incorporate intention or opinion-polarity detection, but even those are not capable of sorting "truth" from "conspiracy".
Additionally, semantic extraction outputs, such as named entities [wikipedia.org] and semantic relations [wikipedia.org], are useful for many other applications.
Why the WTC name is spelled in American (Score:3, Informative)
Damn my correct spelling of English words!
Because the World Trade Center was located on American soil, its name is spelled in American dialect.