Language Models Like GPT-3 Could Herald a New Type of Search Engine (technologyreview.com)
An anonymous reader quotes a report from MIT Technology Review: In 1998 a couple of Stanford graduate students published a paper describing a new kind of search engine: "In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems." The key innovation was an algorithm called PageRank, which ranked search results by calculating how relevant they were to a user's query on the basis of their links to other pages on the web. On the back of PageRank, Google became the gateway to the internet, and Sergey Brin and Larry Page built one of the biggest companies in the world. Now a team of Google researchers has published a proposal for a radical redesign that throws out the ranking approach and replaces it with a single large AI language model, such as BERT or GPT-3 -- or a future version of them. The idea is that instead of searching for information in a vast list of web pages, users would ask questions and have a language model trained on those pages answer them directly. The approach could change not only how search engines work, but what they do -- and how we interact with them.
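For readers who have not seen it, the link-based ranking idea behind PageRank can be sketched in a few lines. This is a minimal illustration, not Google's production algorithm: it is the textbook power-iteration form, in which a page scores higher when well-ranked pages link to it. The function name `pagerank`, the `toy_web` graph, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative sketch only).

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three pages, "d" by none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with the most incoming links, "c", ends up ranked first, and "d", which nothing links to, ends up last. Note that the score depends only on link structure, not on the query text, which is part of why the proposal in the summary treats ranking and answering as separate problems.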
[Donald Metzler and his colleagues at Google Research] are interested in a search engine that behaves like a human expert. It should produce answers in natural language, synthesized from more than one document, and back up its answers with references to supporting evidence, as Wikipedia articles aim to do. Large language models get us part of the way there. Trained on most of the web and hundreds of books, GPT-3 draws information from multiple sources to answer questions in natural language. The problem is that it does not keep track of those sources and cannot provide evidence for its answers. There's no way to tell if GPT-3 is parroting trustworthy information or disinformation -- or simply spewing nonsense of its own making.
Metzler and his colleagues call language models dilettantes -- "They are perceived to know a lot but their knowledge is skin deep." The solution, they claim, is to build and train future BERTs and GPT-3s to retain records of where their words come from. No such models are yet able to do this, but it is possible in principle, and there is early work in that direction. There have been decades of progress on different areas of search, from answering queries to summarizing documents to structuring information, says Ziqi Zhang at the University of Sheffield, UK, who studies information retrieval on the web. But none of these technologies overhauled search because they each address specific problems and are not generalizable. The exciting premise of this paper is that large language models are able to do all these things at the same time, he says.
Idea! (Score:2)
2. Use adversarial training of AI tools to defeat the bullshit detectors in Google's Expert Mode
4. Profit!
Re: (Score:2)
Re: (Score:2)
are you describing TFA? :)
Faking it. (Score:3)
"There's no way to tell if GPT-3 is parroting trustworthy information or disinformation -- or simply spewing nonsense of its own making."
In a world where "the truth" is as malleable as a deep-fake, and as sponsored as a nation-state, how can a machine do better?
Re: (Score:1)
not what I want (Score:3)
Re: (Score:2)
Transporter = lots of movie references
Transporter -movie = better, but not what I was searching for
Oxygen transporter = This is what I was looking for. Hemoglobin oxygen transportation.
Re: (Score:3)
Yes, that often helps. In this case I searched for "Star Trek transporter", since that is what I was looking for, but page after page about the movie still came up, because Google seems to prioritize things based on popularity instead of actual relevance.
Re: (Score:3)
Same here. I have stopped using Google as my primary search engine; it just wastes my time.
GPT is ??? (Score:2)
Global Pranking Theory version 3?
This will go down well on Slashdot (Score:3)
As it stands, you see an incredible number of people here still treating Google like a techie search engine and trying to craft complex keyword queries, when over the years Google has very much changed its model to be far more of a natural language processor.
This will be the next step in helping grandma while returning ever worse results to techies.
awful, mixes use cases (Score:2)
Not a fan of the old "everyone gets it this way" approach.