Google Experiments With Local Filesystem Search
Teoti writes "No, Puffin is not the next name of your favorite email client but, according to the New York Times (NSA reg. req.), the project codename for a new Google search application coming directly to your desktop that will let you search your local filesystem efficiently. This is different from, but complementary to, the Google DeskBar, which already lets you search the Web. The article also says a few words about the end of the stand-alone browser in Longhorn."
Also on CNET... No NYT Registration (Score:5, Informative)
Competing with Microsoft? (Score:5, Informative)
X1 [x1.com] seems to be the most popular one out there.
DiskMeta [diskmeta.com]: they had this project in beta for a while, and according to the site, the Windows product went into release just last week.
DT Search [dtsearch.com]: I remember their ads in a bunch of computer magazines, although I have never used it myself.
EFS [com.com]: I found it on download.com. It supports MS Office and PDF as well as other formats.
NYT Article (Score:5, Informative)
Re:Also on CNET... No NYT Registration (Score:5, Informative)
SAN FRANCISCO, May 18 - Edging closer to a direct confrontation with Microsoft [slashdot.org], Google, the Web search engine, is preparing to introduce a powerful file and text software search tool for locating information stored on personal computers.
Google's software, which is expected to be introduced soon, according to several people with knowledge of the company's plans, is the clearest indication to date that the company, based in Mountain View, Calif., hopes to extend its search business to compete directly with Microsoft's control of desktop computing.
Improved technology for searching information stored on a PC will also be a crucial feature of Microsoft's long-delayed version of its Windows operating system called Longhorn. That version, which is not expected before 2006 at the earliest, will have a redesigned file system, making it possible to track and retrieve information in ways not currently possible with Windows software.
Google's move is in part a defensive one, because the company is concerned about Microsoft's ability to make searching on the Web as well as on a PC a central part of its operating system. By integrating more search functions into Windows, Microsoft could conceivably challenge Google the way it threatened, and destroyed, an earlier rival, Netscape, by incorporating Web browsing into the Windows 98 operating system.
A Google spokesman declined to comment about the new search tool.
Although Google's core business rests on huge farms of server computers that permit fast searching on the Internet, the company has already taken several steps to move beyond that business.
Last year, Google began testing a free program called the Google Deskbar that makes it possible to search the Web by entering words and phrases in a small dialog box placed in the Windows desktop taskbar at the bottom of the computer screen.
Google also sells a computer search system designed to index and retrieve information created and stored by a single organization.
There is a rich history of less-than-successful attempts to create information search tools for personal computers. In the 1980's, for example, Mitchell Kapor's On Technology developed On Location for retrieving information on Macintosh computers and Bill Gross, a prominent software developer, led a group of programmers to create Lotus Magellan for the PC.
Digital Equipment's Alta Vista search engine group also developed a search tool for data stored on desktop PC's. Today there are a number of commercial products for desktop searches like X1 and dtSearch. Moreover, both the Macintosh and Windows operating systems have file and text retrieval capabilities.
The Google software project, which is code-named Puffin and which will be available as a free download from Google's Web site, has been running internally at the company for about a year.
The project was started, in part, to prepare Google for competing with Windows Longhorn, which according to industry analysts will dispense with the need for a stand-alone browser.
The disappearance of the Web browser and the integration of both Web search and PC search into the Windows operating system could potentially marginalize Google's search engine. Google, well aware of this threat, hired a Microsoft product manager last year to oversee the Puffin project as part of its strategy to compete with Microsoft's incursion into its territory.
Microsoft has shown demonstrations of its new search technology, which emphasizes the use of natural language in queries like "Where are my vacation photos?" or "What is a firewall?" Microsoft believes that Longhorn users will no longer think about where information is stored; they will ins
Actually yes (Score:5, Informative)
If you have followed Microsoft's developments around Longhorn, you might have noticed that search is one of the top-priority features that Microsoft is going to integrate directly into the operating system. So once Longhorn is released, Microsoft would become the biggest competitor to Google's search applications on the Web as well as on the desktop (with this application).
Search is the next big thing, and a lot of players are concentrating on it. Microsoft's entry into the field has skewed the competition towards the desktop, and everyone, including Google, is preparing for battle.
sorry here (Score:2, Informative)
Re:Windows + F = useless (Score:5, Informative)
Or so I'm told. My personal experience with letting the Windows Indexing Service run in the background has been that it's more trouble than it's worth. Yes, on the rare occasion that it's actually -not- indexing when I search, the search is blazingly fast (compared to a non-indexed search).
But if the index is currently being modified, then the Windows search feature can't use it. Period. So when you search, you get the text "Windows is currently building an index of the files on drive C:" and it falls back to the regular, non-indexed search. In addition, the indexer consumes massive amounts of RAM while indexing, so a search run while the index is being modified ends up being about twice as slow as usual.
It also doesn't seem to be able to tell when the user is idle, and no amount of tweaking seems to fix this without leaving you with a days-old index. If the index is complete but you've saved a file since it was completed, that file will not show up in the search at all. I've had it kick in while I was in the middle of working on something else so often that I finally just turned it off entirely and have resigned myself to slow(er) searches in Windows.
In the interest of fairness I will say that the search seems to work quite well when searching a remote server that is running the indexing service. But running it locally is just a pain.
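The stale-index complaint above is easy to illustrate. A minimal Python sketch, assuming plain-text files and whitespace-separated words (all function names are hypothetical, and this is not how the Windows Indexing Service works internally): answer from the index, but re-read any file whose modification time is newer than the index, so a freshly saved file still shows up.

```python
import os
import tempfile
import time
from collections import defaultdict


def iter_files(root):
    """Yield every file path below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def read_words(path):
    """Return the set of lowercase, whitespace-separated words in a file."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return set(f.read().lower().split())


def build_index(root):
    """Build a word -> paths index and remember when the build started.

    The timestamp is back-dated a little because some filesystems store
    modification times with coarse (up to 2 s) resolution.
    """
    built_at = time.time() - 2.0
    index = defaultdict(set)
    for path in iter_files(root):
        for word in read_words(path):
            index[word].add(path)
    return dict(index), built_at


def search(root, index, built_at, word):
    """Answer from the index, then re-check files modified after the build."""
    word = word.lower()
    hits = {p for p in index.get(word, set()) if os.path.exists(p)}
    for path in iter_files(root):
        if os.path.getmtime(path) > built_at:
            hits.discard(path)            # the indexed copy may be outdated
            if word in read_words(path):  # so read what is on disk now
                hits.add(path)
    return hits


def demo():
    """Index one file, save another afterwards, and compare the results."""
    def names(paths):
        return sorted(os.path.basename(p) for p in paths)

    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "old.txt")
        with open(old, "w") as f:
            f.write("vacation photos")
        past = time.time() - 100
        os.utime(old, (past, past))
        index, built_at = build_index(root)

        # Saved after the index was built; the future mtime keeps the demo
        # independent of timestamp resolution.
        new = os.path.join(root, "new.txt")
        with open(new, "w") as f:
            f.write("vacation notes")
        future = time.time() + 100
        os.utime(new, (future, future))

        return (names(index.get("vacation", set())),
                names(search(root, index, built_at, "vacation")))
```

The catch is that search() still stats every file on each query. That is far cheaper than reading contents, but on a big drive it is exactly the kind of work a background indexer is supposed to save you.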
Similar ideas (Score:5, Informative)
Second, indexing software that does the same thing is already available today, and it worked very well when I tried it out. It's almost perfect, except that it causes occasional hard drive thrashing as it tries to keep the index up to date. This is unfortunately a rather major downside, but if you can bear with it, you'll get literally instant file searches on your entire hard drive: it narrows down the possible matches as you type each letter. It even indexes file contents for small files. I'm talking about X1 [x1.com].
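X1's internals aren't described here, but narrow-as-you-type behavior can be sketched with a sorted name list: each keystroke becomes a binary search for the first name with the typed prefix. A Python sketch under that assumption; PrefixFinder and narrow_as_typed are hypothetical names, and it matches file-name prefixes only, not file contents.

```python
import bisect


class PrefixFinder:
    """Narrow a list of file names by prefix, one keystroke at a time."""

    def __init__(self, names):
        # Sort once, case-insensitively; each lookup is then a binary search.
        self._entries = sorted((name.lower(), name) for name in names)
        self._keys = [key for key, _name in self._entries]

    def matches(self, prefix):
        prefix = prefix.lower()
        # All keys starting with prefix form one contiguous run that begins
        # at the first key >= prefix.
        lo = bisect.bisect_left(self._keys, prefix)
        hi = lo
        while hi < len(self._keys) and self._keys[hi].startswith(prefix):
            hi += 1
        return [name for _key, name in self._entries[lo:hi]]


def narrow_as_typed(finder, typed):
    """Return the match list after each keystroke of `typed`."""
    return [finder.matches(typed[:i]) for i in range(1, len(typed) + 1)]


if __name__ == "__main__":
    finder = PrefixFinder(["Report.doc", "resume.pdf", "README.txt",
                           "budget.xls", "reports-2003.zip"])
    for step in narrow_as_typed(finder, "rep"):
        print(step)
```

Sorting once up front is what makes each keystroke cheap; the expensive part in a real product is keeping that sorted list current as files come and go.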
Re:Will we see something like this on linux? (Score:3, Informative)
Re:Will we see something like this on linux? (Score:3, Informative)
Re:interesting (Score:1, Informative)
See this article [com.com]
wingrep (Score:5, Informative)
Re:site:localhost search (Score:3, Informative)
"My Documents"... [google.com]
(Not really mine)
Re:What operating systems does it work on? (Score:3, Informative)
"locate" does, but the index is never up to date.
Re:What operating systems does it work on? (Score:5, Informative)
See Lucene for a good open-source search engine built on an inverted text index.
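To make "inverted text index" concrete: map each term to the set of documents containing it, and a multi-word AND query becomes a set intersection. A minimal Python sketch of the idea only; it does not use Lucene's API, and real Lucene indexes add term positions, relevance scoring, and on-disk segments on top of this. InvertedIndex and tokenize are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Start from the smallest posting list so intermediate sets stay small.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(lists[0])
        for plist in lists[1:]:
            result &= plist
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("a.txt", "Firewall setup notes")
    idx.add("c.txt", "Firewall photos? No, firewall logs.")
    print(idx.search("firewall photos"))
```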
Re:Coming from the company... (Score:3, Informative)
Re:Coming from the company... (Score:4, Informative)
I use Enfish Find (Score:4, Informative)
I purchased Find from Enfish [enfish.com], and it saves me several minutes every day. They have fancier products, but the $50 Find application is all I needed.
Re:wingrep (Score:1, Informative)
enjoy
Jakarta Lucene (Score:2, Informative)
This sounds like a great place for Jakarta Lucene [apache.org].
Lucene is written in Java and is open source, so an app written to search a workstation should be able to run on any OS with a Java VM, and you can be sure it's not reporting any personal information to anyone.
I'd love to see it on my taskbar. And, heck, it could probably be ready before Puffin.
Re:I'd use it (Score:2, Informative)
Re:I think most of us already know... (Score:4, Informative)
Not to mention that if he ever gets raided, I am *sure* there are at least a few child pr0n photos in there (even accidentally).
I decided long ago that keeping around lots of pr0n is just a bad idea. Binge and purge! That's my new motto!
Re:What operating systems does it work on? (Score:1, Informative)
This is a popular misunderstanding. Keyboard shortcuts work with Safari. Try the ctrl key instead of alt.
Re:I can't frickin' wait (Score:4, Informative)
Ultimately, yes, but there's searching and then there's searching. Searching a hashed index, for example, is much faster than just searching through files in a filesystem. You could generate an index of data and metadata for all files on the system and incrementally update it during idle times, or do certain kinds of updates on an as-needed basis.
GNOME used to have something like this, called Medusa. I think it was dropped because the existing implementation had performance problems (and possibly security issues?). However, it seems to be under redevelopment [cox.net], and it looks like it will be quite useful when it gets a bit further along.
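The incremental-update idea above can be sketched by remembering each file's modification time and re-reading only files whose mtime changed, dropping their old entries first. A Python sketch, assuming plain-text files; IncrementalIndex is a hypothetical name, and scheduling update() for idle time is left out.

```python
import os
import tempfile
from collections import defaultdict


class IncrementalIndex:
    """Word -> paths index that re-reads only files whose mtime changed."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of paths
        self.seen = {}                    # path -> (mtime, words indexed)

    def _drop(self, path):
        """Remove every posting contributed by path."""
        _mtime, words = self.seen.pop(path)
        for word in words:
            self.postings[word].discard(path)
            if not self.postings[word]:
                del self.postings[word]

    def update(self, root):
        """One refresh pass (e.g. run at idle time); returns files re-read."""
        current = {}
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                current[path] = os.path.getmtime(path)

        for path in list(self.seen):
            if path not in current:       # deleted since the last pass
                self._drop(path)

        reread = 0
        for path, mtime in current.items():
            if path in self.seen:
                if self.seen[path][0] == mtime:
                    continue              # unchanged: skip the expensive read
                self._drop(path)
            with open(path, encoding="utf-8", errors="ignore") as f:
                words = set(f.read().lower().split())
            for word in words:
                self.postings[word].add(path)
            self.seen[path] = (mtime, words)
            reread += 1
        return reread

    def lookup(self, word):
        """A hash lookup instead of a scan through the files."""
        return set(self.postings.get(word.lower(), set()))


def demo():
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "a.txt")
        b = os.path.join(root, "b.txt")
        for path, text in ((a, "budget draft"), (b, "budget final")):
            with open(path, "w") as f:
                f.write(text)
        idx = IncrementalIndex()
        first = idx.update(root)   # both files are new
        second = idx.update(root)  # nothing changed
        with open(a, "w") as f:
            f.write("holiday plans")
        os.utime(a, (1e9, 1e9))    # force a distinct mtime for the demo
        third = idx.update(root)   # only a.txt is re-read
        hits = sorted(os.path.basename(p) for p in idx.lookup("budget"))
        return first, second, third, hits
```

Each pass still walks the directory tree for mtimes, but only changed files pay the cost of being read and re-tokenized.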
Re:About time (Score:2, Informative)
I've had it running for a while now, and personally, I can't tell you how much better it feels to have a powerful local search engine at my beck and call.
Plus, it solved the 'endless bookmark menu' problem too: instead of bookmarking a site, I have SWISH++ spider it, and all my future searches give me what I need.
Alta Vista used to have this. (Score:2, Informative)
I loved this product, and I'm pleased to see that Google's going to try a similar product. With 200+ GB hard drives commonplace, this can be very useful.
Re:Coming from the company... (Score:5, Informative)
I wish I could beat the creator of google-watch.org, and every person who ever linked to it, with a gigantic clue stick.
First of all, the creator of google-watch.org has a really big axe to grind [google-watch-watch.org] with Google.
Second, HTTP is a stateless protocol. If you want a user's preferences to persist within a session, you need to use cookies or attach a lot of state information to each GET/POST request. If you want the preferences to persist after you close and re-open your browser, you have to either have the user log in every time and store the prefs on the server, or store the prefs on the client side in a cookie like Google does. This simple fact seems to fly right over the head of google-watch.org and their ridiculous cookie conspiracy theories.
But hey, we've been over this in every Google story since the anti-Google FUD crowd started coming out of the woodwork. Here's a thought: if you really need a tinfoil hat, then disable cookies, don't use Orkut, and sleep better at night. But please stop subjecting people to google-watch.org FUD.
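The statelessness point above, in code: a server can only remember a preference across requests, and across browser restarts, if the browser sends it back in a cookie. A Python sketch using the standard library's http.cookies; the "num" preference name and the 10-result default are hypothetical, not Google's actual cookie layout.

```python
from http.cookies import SimpleCookie

ONE_YEAR = 365 * 24 * 3600


def set_pref_header(name, value, max_age=ONE_YEAR):
    """Build a Set-Cookie header so a preference survives browser restarts."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    # Without max-age (or expires) this would be a session cookie,
    # discarded when the browser closes.
    cookie[name]["max-age"] = max_age
    return cookie.output()


def read_pref(cookie_header, name, default):
    """Recover a preference from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return cookie[name].value if name in cookie else default


def handle_request(headers):
    """A stateless handler: all it knows about the user is in `headers`."""
    return int(read_pref(headers.get("Cookie", ""), "num", "10"))


if __name__ == "__main__":
    print(set_pref_header("num", "50"))
    print(handle_request({"Cookie": "num=50"}))  # preference sent back
    print(handle_request({}))                    # no cookie, default used
```

Nothing in handle_request remembers earlier calls; drop the cookie and the preference is simply gone, which is the whole "conspiracy".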
Re:Microsoft will Lose (Score:5, Informative)
Create HKEY_CURRENT_USER\Software\Microsoft\Windows\Curr
You'll have the old Windows 2000 search dialog.
The next step for this is IIS integration/SDK (Score:3, Informative)
I had a client who chose an implementation of Index Server that I set up to run searches on his public website, but I have doubts about my solution's resource use.
I replaced a guy who wanted to build a complicated MySQL/spidering solution, simply because my solution, apart from the aesthetics of the search page, was largely quick and easy: it was fairly trivial to demo a rudimentary solution to the client using Microsoft's Index Server while the other guy was still in the starting gate.
What would be interesting is if Google builds an SDK into their local filesystem search that is more robust than Microsoft's Indexing Service, and if it can somehow "talk" to Google on the Web, really leveraging their intarweb leadership position to enhance any possible IIS-linked implementation of this new product.
Existing Google search appliance... (Score:4, Informative)
It's sweet. Some features include...
1. Find the highest quality and most relevant documents; Google factors in more than 100 variables for each query.
2. Search for secure information and view only those documents to which you have access; results are returned securely for documents protected by either NTLM or basic HTTP authentication.
3. Judge relevance of results more easily via dynamically generated snippets showing your query in the context of the page.
4. Navigate search results easily and clearly using intelligent grouping of documents residing in the same narrow subdirectories.
5. Avoid missing results through typos or misspellings as Google automatically suggests corrections with startling accuracy, even on company-specific words and phrases.
6. View search results even when the sites are down via cached copies of pages included in the search results.
7. Quickly find the most relevant section of a document via highlighted query terms displayed on cached documents.
8. Glimpse documents without needing the original client application of the file format via automatic reformatting of over 220 file types into HTML.
9. Access time-sensitive information first via date sorting.
10. Perform complex and sophisticated queries with over 10 special query terms, including Boolean AND, OR, and NOT searches.
More details are available at the appliance page on Google.
#2 above probably won't show up in the personal desktop version of the search, though it really is handy for the appliance, even if you manage a modest-sized office.
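Two of the listed features, snippets that show query terms in context and spelling suggestions, can be approximated in a few lines. A Python sketch: the snippet window, the <b> markup, and the use of difflib's similarity ratio for suggestions are all assumptions for illustration, not Google's method.

```python
import difflib
import re


def snippet(text, terms, width=40):
    """Show text around the first query term, with terms wrapped in <b>...</b>."""
    lowered = text.lower()
    found = [lowered.find(t.lower()) for t in terms]
    found = [pos for pos in found if pos >= 0]
    if not found:
        return None
    first = min(found)
    start = max(first - width, 0)
    end = min(first + 2 * width, len(text))
    # Case-insensitive highlighting of every query term inside the window.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    piece = pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", text[start:end])
    return ("..." if start > 0 else "") + piece + ("..." if end < len(text) else "")


def suggest(word, vocabulary, cutoff=0.8):
    """Offer the closest indexed word when a query term isn't in the index."""
    if word in vocabulary:
        return None
    close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return close[0] if close else None


if __name__ == "__main__":
    text = "Notes: the office firewall blocks port 80 unless the VPN is up."
    print(snippet(text, ["firewall"], width=10))
    print(suggest("firewal", ["firewall", "vacation", "budget"]))
```

Drawing suggestions from the index's own vocabulary is what lets a scheme like this handle company-specific words that no general dictionary would contain.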