
Google Experiments With Local Filesystem Search 482

Teoti writes "No, Puffin is not the next name of your favorite email client, but, according to the New York Times (NSA reg. req.), the project codename for a new Google search application coming directly to your desktop, one that will let you search your local filesystem efficiently. This is different from, but complementary to, the Google DeskBar that already lets you search the Web. The article also gives a few words on the end of the stand-alone browser in Longhorn."
This discussion has been archived. No new comments can be posted.

  • by Mz6 ( 741941 ) * on Wednesday May 19, 2004 @02:11PM (#9196979) Journal
  • by prostoalex ( 308614 ) * on Wednesday May 19, 2004 @02:13PM (#9197015) Homepage Journal
    NYT claims the Google PC search competes with Microsoft's, although Microsoft has never been particularly strong in this area, with either the Search window in 2000 or that doggie in XP. For me, in 1 case out of 10 the text search (searching inside documents for specific text) just does not work. There are other vendors that Google will be competing against, not necessarily Microsoft.

    X1 [x1.com] seems to be the most popular one out there.

    DiskMeta [diskmeta.com]: they had this project in beta for a while, and the Windows product went into release just last week, the site says.

    DT Search [dtsearch.com]: I remember their ads in a bunch of computer magazines, although I have never used them myself.

    EFS [com.com], found it on download.com, supports MS Office and PDF as well as other formats.

  • NYT Article (Score:5, Informative)

    by OverlordQ ( 264228 ) * on Wednesday May 19, 2004 @02:14PM (#9197033) Journal
    No-Reg Link [nytimes.com]
  • by (54)T-Dub ( 642521 ) * <[tpaine] [at] [gmail.com]> on Wednesday May 19, 2004 @02:16PM (#9197045) Journal
    The Reuters version [reuters.com] you linked is shorter than the NYtimes one. Here is the full version:

    SAN FRANCISCO, May 18 - Edging closer to a direct confrontation with Microsoft [slashdot.org], Google, the Web search engine, is preparing to introduce a powerful file and text software search tool for locating information stored on personal computers.

    Google's software, which is expected to be introduced soon, according to several people with knowledge of the company's plans, is the clearest indication to date that the company, based in Mountain View, Calif., hopes to extend its search business to compete directly with Microsoft's control of desktop computing.

    Improved technology for searching information stored on a PC will also be a crucial feature of Microsoft's long-delayed version of its Windows operating system called Longhorn. That version, which is not expected before 2006 at the earliest, will have a redesigned file system, making it possible to track and retrieve information in ways not currently possible with Windows software.

    Google's move is in part a defensive one, because the company is concerned about Microsoft's ability to make searching on the Web as well as on a PC a central part of its operating system. By integrating more search functions into Windows, Microsoft could conceivably challenge Google the way it threatened, and destroyed, an earlier rival, Netscape, by incorporating Web browsing into the Windows 98 operating system.

    A Google spokesman declined to comment about the new search tool.

    Although Google's core business rests on huge farms of server computers that permit fast searching on the Internet, the company has already taken several steps to move beyond that business.

    Last year, Google began testing a free program called the Google Deskbar that makes it possible to search the Web by entering words and phrases in a small dialog box placed in the Windows desktop taskbar at the bottom of the computer screen.

    Google also sells a computer search system designed to index and retrieve information created and stored by a single organization.

    There is a rich history of less-than-successful attempts to create information search tools for personal computers. In the 1980's, for example, Mitchell Kapor's On Technology developed On Location for retrieving information on Macintosh computers and Bill Gross, a prominent software developer, led a group of programmers to create Lotus Magellan for the PC.

    Digital Equipment's Alta Vista search engine group also developed a search tool for data stored on desktop PC's. Today there are a number of commercial products for desktop searches like X1 and dtSearch. Moreover, both the Macintosh and Windows operating systems have file and text retrieval capabilities.

    The Google software project, which is code-named Puffin and which will be available as a free download from Google's Web site, has been running internally at the company for about a year.

    The project was started, in part, to prepare Google for competing with Windows Longhorn, which according to industry analysts will dispense with the need for a stand-alone browser.

    The disappearance of the Web browser and the integration of both Web search and PC search into the Windows operating system could potentially marginalize Google's search engine. Google, well aware of this threat, hired a Microsoft product manager last year to oversee the Puffin project as part of its strategy to compete with Microsoft's incursion into its territory.

    Microsoft has shown demonstrations of its new search technology, which emphasizes the use of natural language in queries like "Where are my vacation photos?" or "What is a firewall?" Microsoft believes that Longhorn users will no longer think about where information is stored; they will ins

  • Actually yes (Score:5, Informative)

    by Pranjal ( 624521 ) on Wednesday May 19, 2004 @02:18PM (#9197073)

    If you have followed Microsoft developments around Longhorn you might have noticed that search is one of the top-priority features that Microsoft is going to integrate directly into the operating system. So once Longhorn is released, Microsoft would become the biggest competitor to Google's search applications on the web as well as the desktop (with this application).

    Search is the next big thing on which a lot of players are concentrating and Microsoft entering the field has skewed the competition towards the desktop and everyone including Google is preparing for the battle.
  • sorry here (Score:2, Informative)

    by Prince Vegeta SSJ4 ( 718736 ) on Wednesday May 19, 2004 @02:19PM (#9197095)
    HERE [nytimes.com]
  • by Verteiron ( 224042 ) * on Wednesday May 19, 2004 @02:22PM (#9197124) Homepage
    It works a lot better when you enable indexing.

    Or so I'm told. My personal experience with allowing the Windows Indexing service to run in the background has been that it's more trouble than it's worth. Yes, on the rare occasion that it's actually -not- indexing when I search, the search is blazingly fast (compared to a non-indexed search).

    But if the index is currently being modified, then the Windows search feature can't use it. Period. So when you search, you get the text "Windows is currently building an index of the files on drive C:" and it falls back to the regular, non-indexed search. In addition, the indexer consumes massive amounts of RAM while indexing, so a search run when the index is being modified ends up being about two times slower than usual.

    It also doesn't seem to be able to tell when the user is idle. No amount of tweaking seems to fix this, without leaving you with a days-old index. If the index is complete, but you've saved a file since it was completed, that file will not show up in the search at all. I've had it kick on while in the middle of working on something else so often that I finally just turned it off entirely and have resigned myself to slow(er) searches in Windows.

    In the interest of fairness I will say that the search seems to work quite well when searching a remote server that is running the indexing service. But running it locally is just a pain.
  • Similar ideas (Score:5, Informative)

    by Jugalator ( 259273 ) on Wednesday May 19, 2004 @02:23PM (#9197135) Journal
    Well, first this idea is part of Microsoft's WinFS plans. The idea with WinFS was partially born when Microsoft developers realized that major parts of the web can be searched faster than a user's hard drive. It will be interesting to see how this application will collide with Microsoft's plans, that's for sure. It's basically fast searches and enhanced metadata support that are the key parts of WinFS, which is in turn a key part of Longhorn.

    Second, an indexing software that does the same thing is already available today and worked very well when I tried it out. It's actually almost perfect, except for the fact that it causes occasional hard drive thrashing as it tries to keep the index up-to-date. This is unfortunately a rather major downside, but if you can bear with this, you'll get literally instant file searches on your entire hard drive -- it narrows down the possible matches as you type each letter. It even indexes file contents for small files. I'm talking about X1 [x1.com].
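    The "narrows down the possible matches as you type each letter" behaviour can be sketched roughly as below. This is only an illustration under assumptions, not how X1 actually works: the function names and the sample file names are hypothetical, and a real tool would query a prebuilt index rather than a Python list.

```python
def narrow(candidates, typed):
    """Keep only the names that contain the text typed so far."""
    needle = typed.lower()
    return [name for name in candidates if needle in name.lower()]


def search_as_you_type(names, keystrokes):
    """Yield (typed_text, matches) after each keystroke."""
    typed = ""
    matches = list(names)
    for ch in keystrokes:
        typed += ch
        # Anything matching the longer string already matched the shorter
        # one, so only the previous matches need to be re-checked.
        matches = narrow(matches, typed)
        yield typed, matches


# Hypothetical file names, used only for illustration.
names = ["report.doc", "Resume.pdf", "photo.jpg", "notes.txt"]
steps = list(search_as_you_type(names, "rep"))
```

    Each keystroke only filters the previous result list, which is why the list can shrink quickly as the query grows.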
  • by xutopia ( 469129 ) on Wednesday May 19, 2004 @02:23PM (#9197139) Homepage
    all those utilities take a long time when searching on a 200G partition. I'd love to have something blazingly fast. Is that too much to ask for?
  • by ConsumedByTV ( 243497 ) on Wednesday May 19, 2004 @02:26PM (#9197174) Homepage
    Locate takes a while to build its database, but after that locate is very quick.
  • Re:interesting (Score:1, Informative)

    by Anonymous Coward on Wednesday May 19, 2004 @02:26PM (#9197176)
    This is one of the things that makes Google great: they allow (expect?) their employees to spend 20% of their time working on projects that are unrelated to their main job. Basically, this 20% just needs to be focused on stuff that can benefit Google.

    See this article [com.com]
  • wingrep (Score:5, Informative)

    by (54)T-Dub ( 642521 ) * <[tpaine] [at] [gmail.com]> on Wednesday May 19, 2004 @02:27PM (#9197179) Journal
    As a developer trapped in Windows I find this little tool [wingrep.com] incredibly useful.
  • by Theaetetus ( 590071 ) <theaetetus,slashdot&gmail,com> on Wednesday May 19, 2004 @02:29PM (#9197211) Homepage Journal
    Here's another fun one...
    "My Documents"... [google.com]

    (Not really mine)

  • Grep and find don't pre-index the files.

    "locate" does, but the index is never up to date. :-/

  • by rcpettengill ( 200363 ) on Wednesday May 19, 2004 @02:42PM (#9197310) Homepage
    find and grep are orders of magnitude slower than the inverted text index techniques that Google uses.

    See Lucene for a good open source inverted text index search engine.
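    For anyone wondering what an inverted text index looks like, here is a toy sketch. It is not Lucene's or Google's implementation; the function names and the in-memory sample "files" are made up for illustration. The point is that a query looks words up in a dictionary instead of re-reading every file the way grep does.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word of the query (AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical in-memory "files"; a real tool would read them from disk.
docs = {
    "notes.txt": "Puffin is the codename for the Google desktop search",
    "todo.txt": "rebuild the locate database tonight",
    "mail.txt": "Google Deskbar searches the web from the taskbar",
}
index = build_index(docs)
```

    The cost is paid once at indexing time; after that, a search is a few dictionary lookups and set intersections regardless of how much text was indexed.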
  • by Ummagumma ( 137757 ) on Wednesday May 19, 2004 @02:50PM (#9197385) Journal
    Have you tried ZoneAlarm? It has this basic functionality.
  • by Stigmata669 ( 517894 ) on Wednesday May 19, 2004 @02:53PM (#9197394)
    If you are worried about your privacy, don't accept these cookies, or regularly clean out your cookies. Maybe Google is being invasive but that doesn't keep you from looking out for yourself.
  • I use Enfish find (Score:4, Informative)

    by Therlin ( 126989 ) on Wednesday May 19, 2004 @02:55PM (#9197416)
    I have hundreds of word documents, PDF files, text files, e-mails in two different systems, etc.

    I purchased Find from Enfish [enfish.com] and it saves me several minutes every day. They have fancier products, but $50 for the Find application is all that I needed.
  • Re:wingrep (Score:1, Informative)

    by Anonymous Coward on Wednesday May 19, 2004 @03:02PM (#9197477)
  • Jakarta Lucene (Score:2, Informative)

    by JLavezzo ( 161308 ) on Wednesday May 19, 2004 @03:03PM (#9197483) Homepage

    This sounds like a great place for Jakarta Lucene [apache.org].

    Lucene is Java and Open Source, so an app written to search a workstation should be able to run on any OS with a Java VM, and you can be sure it's not reporting any personal information to anyone.

    I'd love to see it on my task bar. And, heck, it could probably be ready before Puffin

  • Re:I'd use it (Score:2, Informative)

    by dzd12 ( 736551 ) on Wednesday May 19, 2004 @03:11PM (#9197548)
    I was under the impression that recent versions of Windows had fairly good fine-grained access controls. Sure, Windows 98 doesn't offer a whole lot in terms of security, but 2000 and XP aren't so bad. So I guess I'd have to disagree... Windows does have such a (working) thing. Why do you say it doesn't?
  • by jkabbe ( 631234 ) on Wednesday May 19, 2004 @03:18PM (#9197595)
    BTW, I'm not sure why you'd want to collect that much porn. I know for a fact that a lot of it he's never seen before, and what I've seen of it suffers from porn's usual problem, a lot of repetitiveness

    Not to mention that if he ever gets raided I am *sure* there has to be at least a few child pr0n photos in there (even accidentally).

    I decided long ago that keeping around lots of pr0n is just a bad idea. Binge and purge! That's my new motto!
  • by Anonymous Coward on Wednesday May 19, 2004 @03:28PM (#9197669)
    > It also seems to work with Safari (minus the keyboard shortcuts)

    This is a popular misunderstanding. Keyboard shortcuts work with Safari. Try the ctrl key instead of alt.
  • by YellowBook ( 58311 ) on Wednesday May 19, 2004 @03:37PM (#9197747) Homepage
    Wouldn't the speed of the search be influenced mostly by the capabilities of your own computer?

    Ultimately, yes, but there's searching and then there's searching. For example, searching a hashed index is much faster than just searching through files in a filesystem. You could generate an index of data and metadata for all files on the system and incrementally update it during idle times, for example, or do certain kinds of updates on an as-needed basis.

    GNOME used to have something like this, called Medusa. I think it was dropped because the existing implementation had performance problems (and possibly security issues?). However, it seems to be under redevelopment [cox.net], and it looks like it will be quite useful when it gets a bit further along.
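    The "incrementally update it during idle times" idea above might look something like this sketch. It is an assumption-laden illustration, not Medusa's design: the function name `update_index` and the choice of (size, mtime) as the change signal are mine. The index is a plain dict (a hash table) from path to metadata, and each pass only reports files that are new or changed.

```python
import os


def update_index(root, index):
    """Re-scan `root`, refreshing only entries whose size or mtime changed.

    `index` maps path -> (size, mtime_ns). Returns the list of paths that
    were added or changed, which is where a content indexer would do work.
    """
    seen = set()
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            seen.add(path)
            meta = (st.st_size, st.st_mtime_ns)
            if index.get(path) != meta:
                index[path] = meta
                changed.append(path)
    # Drop entries for files that no longer exist.
    for path in list(index):
        if path not in seen:
            del index[path]
    return changed
```

    A background job could call this whenever the machine is idle and hand the returned paths to whatever extracts the file contents, so unchanged files are never re-read.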

  • Re:About time (Score:2, Informative)

    by torpor ( 458 ) <ibisum.gmail@com> on Wednesday May 19, 2004 @03:47PM (#9197822) Homepage Journal
    I dunno, I think SWISH++ does a pretty good job ... [mac.com]

    I've had it running now for a while, and I can't say how much better it feels to have a local, powerful search engine at my beck and call, personally ...

    Plus, it solved the 'endless bookmark menu' problem too, since instead of bookmarking, I get the site spidered by SWISH++, and all my future searches give me what I need ... sweet!
  • by callipygian-showsyst ( 631222 ) on Wednesday May 19, 2004 @03:48PM (#9197835) Homepage
    Years ago, Alta Vista [altavista.com] had a product that they sold called the "Alta Vista Personal Search Engine". I have the installation CD right here.

    I loved this product, and I'm pleased to see that Google's going to try a similar product. With 200+GB hard drives commonplace, this can be very useful.

  • by irix ( 22687 ) on Wednesday May 19, 2004 @03:49PM (#9197845) Journal

    I wish I could beat the creator of google-watch.org and every person who ever linked to it with a gigantic clue stick.

    First of all, the creator of google-watch.org has a really big axe to grind [google-watch-watch.org] with Google.

    Second, HTTP is a stateless protocol. If you want a user's preferences to persist within a session you need to use cookies or attach a lot of state information to each GET/POST request. If you want the preferences to persist after you close and re-open your browser, you either have to have the user log in every time and store the prefs on the server, or store the prefs on the client side in a cookie like Google does. This simple fact seems to fly right over the head of google-watch.org and their ridiculous cookie conspiracy theories.

    But hey, we've been over this in every Google story since the anti-Google FUD crowd started coming out of the woodwork. Here's a thought: if you really need a tinfoil hat then disable cookies, don't use Orkut and sleep better at night. But please stop subjecting people to google-watch.org FUD.
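    To make the stateless-HTTP point concrete, here is a small sketch using Python's standard http.cookies module. The preference names ("lang", "results") and the one-year lifetime are hypothetical and are not Google's actual cookie format. The server sends Set-Cookie headers once, and the browser hands the values back on every later request.

```python
from http.cookies import SimpleCookie

ONE_YEAR = 60 * 60 * 24 * 365  # cookie lifetime in seconds (an assumption)


def set_cookie_headers(prefs):
    """Server side: build one Set-Cookie header value per preference."""
    cookie = SimpleCookie()
    for name, value in prefs.items():
        cookie[name] = value
        cookie[name]["path"] = "/"
        cookie[name]["max-age"] = ONE_YEAR  # survive a browser restart
    return [morsel.OutputString() for morsel in cookie.values()]


def prefs_from_request(cookie_header):
    """Server side: recover preferences from the request's Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}


headers = set_cookie_headers({"lang": "en", "results": "50"})
# A browser echoes back only the name=value part of each cookie.
echoed = "; ".join(h.split(";")[0] for h in headers)
prefs = prefs_from_request(echoed)
```

    Without the cookie, the same information would have to ride along in every URL or form, or sit on the server behind a login.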

  • by mathd ( 656476 ) on Wednesday May 19, 2004 @03:59PM (#9197947)
    3. Google is clean. If I see that damn dog show up one more time I'll kill myself. When I search my file system I don't want to hide the stupid mutt, change my options so that subfolders are searched, then click through three screens to say I want to search my file system. Google will cut through this nonsense because they believe in simple/clean interfaces.
    The dog problem is easy to fix.
    Create HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\CabinetState\Use Search Asst as a new String Value and use the value "no".

    You'll have the old windows 2000 search dialogue.
  • microsoft's index server (a service on most installations of win2000/ winxp) does what this google product purports to do, but has a limited and clunky sdk, and i've found it to crap out and delay indexing new pages too much if i try to throttle its resource use

    i had a client who chose an implementation of index server i set up to do searches on his public website, but i have doubts about my solution's resource use

    i replaced a guy who wanted to make a complicated mysql/ spidering solution, simply because my solution, apart from the aesthetics of the search page, was largely quick and easy, and it was fairly trivial to demo to the client a rudimentary solution for him using microsoft's index server while the other guy was still in the starting gate

    what would be interesting is if google builds an sdk into their local file system search that is more robust than microsoft's index service, and if maybe it can somehow "talk" to google on the web, really leveraging their intarweb leadership position to enhance any possible iis-linked implementation of this new product
  • by Spoing ( 152917 ) on Wednesday May 19, 2004 @04:43PM (#9198475) Homepage
    This rack mounted search engine [google.com] is probably what the desktop search will be based on.

    It's sweet. Some features include...

    1. Google Quality and Ranking
      1. Find the highest quality and most relevant documents; Google factors in more than 100 variables for each query.

    2. Secure Search
      1. Search for secure information and view only those documents to which you have access; results are returned securely for documents protected by either NTLM or basic HTTP authentication.

    3. Dynamic Page Summaries
      1. Judge relevance of results more easily via dynamically generated snippets showing your query in the context of the page.

    4. Results Grouping
      1. Navigate search results easily and clearly using intelligent grouping of documents residing in the same narrow subdirectories.

    5. Automatic Spellcheck
      1. Avoid missing results through typos or misspellings as Google automatically suggests corrections with startling accuracy, even on company-specific words and phrases.

    6. Cached Pages
      1. View search results even when the sites are down via cached copies of pages included in the search results.

    7. Highlighted Query Terms
      1. Quickly find the most relevant section of a document via highlighted query terms displayed on cached documents.

    8. View as HTML
      1. Glimpse documents without needing the original client application of the file format via automatic reformatting of over 220 file types into HTML.

    9. Sort by Date
      1. Access time-sensitive information first via date sorting.

    10. Advanced Boolean Search
      1. Perform complex and sophisticated queries with over 10 special query terms, including Boolean AND, OR, and NOT searches.

      More details are available at the appliance page on Google.

      #2 above probably won't show up in the personal desktop version of the search, though it really is handy for the appliance -- even if you manage a modest sized office.
