Google Revises Usenet Search
Posted by michael on Thu Dec 02, 2004 10:09 AM
from the what's-usenet? dept.
michaelmalak writes "Wednesday night, Google Groups announced in a thread the rollout of their revised 20-year Usenet archive search engine. Among the various 'improvements': ability to search by date has been eliminated, as has the ability to deep link to a single post. See the announcement thread for others' reaction." An anonymous reader writes "ZDNet has published some interesting insights into what makes Google tick. In this lengthy article, Google's vice-president of engineering, Urs Hölzle delves into the nuts and bolts behind Google's operations, what back-up mechanisms and hardware setup is in place and even some interesting homegrown technology like the Google File System (GFS)."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Progress? (Score:5, Funny)
Well damn - I hope they don't "improve" it too much more.
Re:Progress? (Score:3, Insightful)
Re:Progress? (Score:5, Interesting)
This is the feature request/bug reporting form.
They claim to read every mail generated by this link.
I just submitted a question about this.
I wonder what they'd do if the full power of the
Parent
Re:Progress? (Score:5, Interesting)
Of course, their HTML still doesn't validate [w3.org]...
Parent
Re:Progress? (Score:5, Interesting)
Parent
Re:Progress? (Score:5, Interesting)
Parent
Re:Progress? (Score:4, Insightful)
Er, well, not quite; Microsoft gets hammered much more for "embracing and extending" standards and then preventing other implementations from using those "extensions", thereby forcing everyone who wants to be compatible with Microsoft to use Microsoft products. Google not including the doctype, on the other hand, is fairly innocuous; it's not like IE or Firefox have issues with it.
Parent
Re:Progress? (Score:3, Funny)
Pray we don't alter it any further.
Re:Progress? (Score:5, Informative)
Alright people, you can stop overreacting. They just rearranged some things, that's all.
There's a link at the top of the thread to turn on the left-hand tree frame.
Deep-linking to a single post [google.com] is still very much possible.
And I highly doubt that a search-by-date feature is going to go missing for long in a 20-year archive. This is, after all, a BETA.
As per usual, Slashdot editors didn't even think it worth their time to follow a single link to see if the submitter wasn't trolling.
Parent
Re:Progress? (Score:4, Funny)
And from torrents I get the added benefit of not only downloading the file, but uploading to everyone else and broadcasting my IP address all over the place.
Parent
Re:RTFM (Score:5, Informative)
Parent
Google changed within the past three hours (Score:5, Interesting)
Parent
Re:RTFM (Score:4, Funny)
Parent
Re:Evil? Re:Progress? (Score:5, Insightful)
Search the web, newsgroups, your desktop etc. It may be all free and good now, but how long before someone pays the right price to access/control what people see.
My experience is that Google search seems to be turning up more noise now than before. Two years ago I could do a search and be certain of getting the page I wanted. Now it seems I must scroll through pages of commercial sites and such to get to the meaty part of the Internet... those little novelty sites that people put up themselves.
Oh well, that's progress.
Parent
Re:Evil? Re:Progress? (Score:5, Insightful)
I remember when they originally took over the archive from Deja. I was devastated, convinced they were going to totally screw it up. They didn't, or I got used to the screwed-up version.
Also, regarding noise appearing in searches, this is a standard cycle that all search engines go through, and Google's experiences are well documented. They are constantly changing their search engine to give the most relevant results. Gradually, commercial sites that depend on high search rankings spend enough time and money optimizing their sites to climb back up. Google is constantly changing their tech to push that noise down, but it always gradually floats back to the top. It's in Google's best interest to show commercial sites in their paid ads, not in the valid search results.
Parent
Re:Evil? Re:Progress? (Score:5, Interesting)
Parent
hmmmm (Score:5, Funny)
Next up: Grammar and Content
500 error? (Score:4, Funny)
(Gathers canned goods, candles, heads for cave)
Dumb (Score:5, Insightful)
Re:Dumb (Score:4, Informative)
I haven't the slightest idea where the original poster got their information.
Parent
Improvements??? (Score:5, Interesting)
Gee, nice "improvements"... I personally have linked to individual posts on a web page summarizing a lawsuit I was involved in that was directly related to posts in a newsgroup. I know others who have linked to posts in similar situations. I just checked my web page and the links to those posts no longer work.
Google just took a HUGE step backwards in my opinion.
OMG.. it's truly awful. (Score:4, Informative)
Luckily the rot hasn't spread to the national Googles yet, so you can still use Google UK [google.co.uk] if you need it.. at least until they ruin that too.
Re:OMG.. it's truly awful. (Score:5, Informative)
I believe that the "new groups" are not new Usenet groups, but merely a Yahoo Groups clone on the side, which gets the same interface as the one they provide for Usenet groups.
The old groups interface rocked. This is a major step in the wrong direction in my book.
Parent
HW summary overview (Score:5, Informative)
- Over four billion Web pages, each an average of 10KB, all fully indexed.
- Up to 2,000 PCs in a cluster.
- Over 30 clusters.
- One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
- Sustained transfer rates of 2Gbps in a cluster.
- An expectation that two machines will fail every day in each of the larger clusters.
- No complete system failure since February 2000.
Now, 2,000 machines in a cluster, plus 1PB data, plus 2Gbps in a cluster times 30 clusters comes to:
- "Over" 60,000 PCs (!)
- "Over" 30PB data storage
- "Over" 60Gbps bandwidth
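The totals above are easy to reproduce. This is only a rough sketch: it assumes all 30 clusters are at the quoted per-cluster maximums (the article only says "up to" and "over"), and, for the last figure, that the quoted disk error rate is 10^-15 per bit read and that a petabyte is 10^15 bytes. Neither of those is stated in the summary.

```python
# Per-cluster figures quoted in the summary above. Treating every cluster as
# being at these maximums is an assumption, so the totals are rough estimates.
PCS_PER_CLUSTER = 2_000
CLUSTERS = 30
PB_PER_CLUSTER = 1
GBPS_PER_CLUSTER = 2

total_pcs = PCS_PER_CLUSTER * CLUSTERS      # machines across all clusters
total_pb = PB_PER_CLUSTER * CLUSTERS        # petabytes across all clusters
total_gbps = GBPS_PER_CLUSTER * CLUSTERS    # aggregate sustained Gbps

# Assumption: the error rate is per bit read, and 1 PB = 1e15 bytes.
BIT_ERROR_RATE = 1e-15
bits_per_pb = 1e15 * 8
expected_errors_per_pb_read = bits_per_pb * BIT_ERROR_RATE

print(total_pcs, total_pb, total_gbps)
print(expected_errors_per_pb_read)
```

Under those assumptions, reading one cluster's worth of data end to end would be expected to hit a handful of bit errors, which is presumably why the error rate "begins to be a real issue" at this scale.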
Also interesting:
- An expectation that two machines will fail every day in each of the larger clusters.
- No complete system failure since February 2000.
Hey Google: you're being evil... (Score:5, Interesting)
Also, it creeped me out to no end discovering this morning that my Gmail cookie is really a "Google Accounts" cookie which will now be attached to my Usenet forays via Google as well. I personally don't want the line between public and private conversations to be muddied like that, and I definitely don't want a unified cookie straddling both domains.
Finally, the interface leaves a lot to be desired. The layout is cluttered and junky now whereas it was clean and simple before. I'm not enthralled by the Javascript hooks. Threading seems to be worse than ever (and still not done by message-ID or References - when I asked Google why this was via email, the response was "too difficult"... *boggle*) and the CLI-esque search ability is degenerating into a GUI mess; where one line of text and a CR would before get you to the page you wanted, it now can take that plus several additional mouse gestures and clicks.
This is a sad day, to see a useful tool become so f**ked up for no apparent good reason. I can only hope and pray for a reversion.
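For what it's worth, threading by Message-ID and References does not have to be elaborate. Below is a minimal sketch of the idea, not a description of anything Google does. The article dictionaries and their keys ("message_id", "references") are made up for illustration, and the References list is assumed to be ordered oldest ancestor first, as is usual for Usenet articles.

```python
from collections import defaultdict


def build_threads(articles):
    """Group articles into reply trees using Message-ID and References.

    `articles` is a list of dicts with the hypothetical keys "message_id"
    and "references" (ancestor Message-IDs, oldest first).
    Returns (roots, children): a list of top-level Message-IDs and a dict
    mapping each Message-ID to the IDs of its direct replies.
    """
    known = {a["message_id"] for a in articles}
    children = defaultdict(list)
    roots = []
    for art in articles:
        # Walk References from newest to oldest and attach to the nearest
        # ancestor we actually hold, so missing articles do not break a thread.
        parent = next(
            (ref for ref in reversed(art["references"]) if ref in known), None
        )
        if parent is None:
            roots.append(art["message_id"])
        else:
            children[parent].append(art["message_id"])
    return roots, dict(children)


# Illustrative data only: <d@x> replies to an article that is not in the archive.
sample = [
    {"message_id": "<a@x>", "references": []},
    {"message_id": "<b@x>", "references": ["<a@x>"]},
    {"message_id": "<c@x>", "references": ["<a@x>", "<b@x>"]},
    {"message_id": "<d@x>", "references": ["<a@x>", "<missing@x>"]},
]
roots, children = build_threads(sample)
print(roots)
print(children)
```

A real archive would also have to cope with missing or mangled headers, which may be part of what the "too difficult" reply was getting at.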
Google's improvements (Score:3, Interesting)
I know this is a liiiittle bit offtopic, but here's a story about how the little guy (or little country) can still reach a huge company like Google and get them to change something.
> Date: Mon, 29 Nov 2004 13:04:02 +0100
> Hi,
>
> I wanted to post a question to Google Answers,
> but my VISA credit card was not accepted,
> because its expiry date is 09/12 and you only
> allow up to 2009, not 2012.
>
> How do I solve this problem? I live in Denmark.
> I use the same card to shop on the internet all
> the time.
>
> Kind regards,
Hello Jakob,
Unfortunately, because the expiration date is not listed on our billing page, we must ask that you use a different credit card.
Sincerely,
The Google Answers Team
> Date: Tue, 30 Nov 2004 12:00:27 +0100
>
> Dear Google Answers Team,
>
> That is the only credit card I have. This is
> very unfortunate, but since others have solved
> the problem, I'm sure that so could you?
>
> Regards, Jakob
Hello Jakob,
Thank you for your reply. We will extend our expiration date options. The
billing page should update in 24-48 hours.
Sincerely,
The Google Answers Team
So still: HURRAY FOR GOOGLE!!!
First real deviation (Score:3, Insightful)
Work around for filtering search by date (Score:3, Informative)
http://groups.google.com/advanced_group_search?hl
Re:Work around for filtering search by date (Score:3, Informative)
Total catastrophe, a complete and utter misstep (Score:4, Insightful)
Re:Total catastrophe, a complete and utter misstep (Score:4, Funny)
Parent
No Escape! (Score:4, Insightful)
ARRRRRRRRGH (Score:5, Insightful)
I have bookmarks to specific articles/threads it took me a long time to find and to which I refer now and then and if they stop working the usefulness of google groups for me will be much reduced...
As much as I understand why they would want to make USENET look more like a message board for people who never really grew up with it (usenet and gopher were mostly all we had back when I first went online) I still think that not having this functionality available for people who know how to make the most of it is very backward thinking.
Direct Linking is still possible... (Score:5, Informative)
Navigate to the thread, for example this [google.com] comp.arch thread. Choose the post you want to link to, and click on "Show Options". Two of the options are "print", which is a link to a "printable" version of the article, and "Show original", which is a link to the article with all the headers.
One more step (or simple URL hack) from this display is "view parsed" which gives a friendly HTML version -- for example, try this link [google.com].
What Google Hardware Actually Looks Like (Score:5, Informative)
Anyway, as we were walking around the 150,000+ square foot datacenter floor, a guy came by, pushing a very odd-looking rack.
It resembled a bread tray, 20 shelves if I counted correctly, with completely naked main boards sitting on them. It looked to be 4 machines per row (counting the power supplies). Each had one IDE disk sitting on a gel pad, strapped in with velcro. I personally watched them wheel 4 of these racks right by me back into the dark "Google" corner of the datacenter. Our tour guide finally gave in.
Him: "Well, you've seen them now!"
Me: "What do you mean?"
Him: "That's Google!"
Definitely the highlight of my day!
Re:What Google Hardware Actually Looks Like (Score:4, Informative)
Once, said Hölzle, "someone disconnected an 80-machine rack from a GFS cluster, and the computation slowed down as the system began to re-replicate and we lost some bandwidth, but it continued to work. This is really important if you have 2,000 machines in a cluster." If you have 2000 machines then you can expect to see two failures a day.
Looks like my numbers were correct. 20 shelves * 4 machines per shelf = 80 machines per rack.
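The same figures also give a rough feel for rack count and failure rates. This is a sketch only: it assumes every rack in a 2,000-machine cluster holds 80 machines and that the two failures a day are spread evenly across the cluster, neither of which the article claims.

```python
shelves_per_rack = 20
machines_per_shelf = 4
machines_per_rack = shelves_per_rack * machines_per_shelf

cluster_size = 2_000
failures_per_day = 2

# Assumption: all racks in the cluster are 80-machine racks.
racks_per_cluster = cluster_size / machines_per_rack

# Assumption: failures are spread evenly over machines and over time.
daily_failure_fraction = failures_per_day / cluster_size
days_between_failures_per_machine = cluster_size / failures_per_day

print(machines_per_rack, racks_per_cluster)
print(daily_failure_fraction, days_between_failures_per_machine)
```

On those assumptions, pulling one rack takes out about 4% of a cluster, and any single machine would be expected to fail roughly once every thousand days.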
Parent
Google File System (Score:5, Interesting)
Goodbye Google? (Score:5, Interesting)
There were other issues as well as the rapacious spidering (which reminded me of some of the worst spambots [neilgunton.com] out there), but I won't go into the details here. I didn't get any satisfactory resolution from Google when I tried contacting them.
Website suicide? I don't know. All I do know is that Google seems to be fulfilling my biggest fears - they are going downhill as they get bigger. Funny how the bigger a company gets, the more it tends to suck. Also, having an IPO is never a good thing, in my experience - it always leads to short-termism and corporate decisions based more on the bottom line than what's actually good for the users. Sure, any company has to look after its shareholders and investors, but they never seem to really grok that being so focused on the short-term negatively impacts things in the longer term, particularly if it loses you goodwill in the userspace. Also, as a company grows you do tend to get the sort of braindead, clueless decisions coming out that we apparently see here.
So now we have Google restricting what we can do with old Usenet posts... didn't they buy up all the archives for this stuff a while back? This would appear to give them some amount of power, but also (they should realize) responsibility as stewards of the past. This is not something that they are simply indexing on someone else's website, it's data that they actually own. But in this case it's not really their data at all - it's the community's.
Google seems to be slowly using up the goodwill they built up since 1998 when they came onto the scene, a small, fast, simple, charming and relevant search engine that kicked ass. Why can't a company just keep doing what it does well, and be satisfied with that? Why does everything have to eventually grow, expand, gobble up other companies, and then inevitably start to suck?
Never mind... for now, Goodbye Google.
First use of "spam" on USENET, found via Google (Score:5, Interesting)
http://groups.google.com/groups?q=ken+weaverling+spam+usenet+first&hl=en&selm=9v6d5h%245pg%241%40news.dtcc.edu&rnum=1 [google.com]
According to Ken and his search of Google, I was the first person ever to use the word "spam" to refer to unwanted electronic communication. Obviously, I didn't know it at the time and was quite surprised to learn of my "fame." Yeah, that and $7 will get me a cup of mocha-something, I know.
Anyhow, the whole point is that Ken's research was aided by the search-by-date feature. It will be a shame if that is removed.
(And for the curious, I changed my name from Czarnecki when I got married.)
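As an aside for anyone wondering what that long link actually encodes: the query string can be pulled apart with the standard library. This sketch assumes the `selm` parameter carries the Message-ID of the selected post, which is how it looks but is not documented here, and it uses the URL with the stray spaces taken out.

```python
from urllib.parse import parse_qs, urlsplit

# The link from the post above, with the forum's inserted spaces removed.
url = (
    "http://groups.google.com/groups?q=ken+weaverling+spam+usenet+first"
    "&hl=en&selm=9v6d5h%245pg%241%40news.dtcc.edu&rnum=1"
)

params = parse_qs(urlsplit(url).query)

# Assumption: "selm" holds the Message-ID of the post being linked to.
message_id = params["selm"][0]
search_terms = params["q"][0].split()

print(message_id)
print(search_terms)
```

The percent-escapes decode to "$" and "@", so the value has the usual shape of a Usenet Message-ID, which is what makes single-post links like this one possible.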
Don't like how the Google Usenet archive evolves? (Score:5, Interesting)
I already have an archive of around 600 million messages (nearly everything sans binaries from 2000 till today; just a couple of terabytes) and intend to create a public Usenet search engine. As I am using Usenet myself on a daily basis, I know what *I* want in a Usenet search engine, and that's quite different from what Google gives us.
Here's how you can help: Contact me at martin-k (at) softmaker.de if you have a private collection of Usenet postings that you want me to put in the database.
-mk
Re:A little respect (Score:5, Insightful)
We're the users. That's our right as users. If nobody questions the decision to remove features, then how does Google know what features we liked?
There's absolutely nothing wrong with constructive criticism, even with respect to a "free" service.
Parent
Respect is earned (Score:5, Insightful)
Excuse me, but their Google Groups feature is based entirely on profiting from others' work (and copyrighted work at that). If you're providing a properly searchable index, you might (might) have a public interest defence to the copyright infringement. If you're providing a useful service, most people might (might) not mind you using their work. But if you're going to take away useful searching facilities and provide a service that doesn't even allow proper citation (i.e., deep-linking to a specific post), you're going to be both unpopular and almost certainly breaking the law. I don't know about you, but personally I don't have much respect for people who are either of those things.
Parent
Re:A little respect (Score:5, Insightful)
I'm not sure what motivated such changes, but usually you don't remove enhancements to software unless they are causing major problems or they somehow affect your financial bottom line. Somehow I think it's related to the latter of the two, because I don't see how the former would cause problems.
You don't do something like collect nearly all the usenet postings ever made, make it searchable by date and then take it away. Basically people have lost the ability to do historical internet research using google groups. Sort by date is not even close to the same.
Parent
Re:WTF? (Score:5, Interesting)
Well, I suppose it makes it easier to hide the stupid things some of us may have posted (especially in university) to Usenet back in the 80s and early 90s. Mind you, those "features" allowed me to resurrect some semi-useful postings I had made:
Reading C Declarations: A Guide for the Mystified [ericgiguere.com]
The ANSI Standard: A Summary for the C Programmer [ericgiguere.com]
Eric
Parent
two of the most useful features (Score:5, Interesting)
And linking to a single post is the whole point. I know it costs money to keep that stuff online, but surely they could find a way to put ads on deeplinked posts.
Google just used up all its goodwill with me.
Parent
Re:WTF? (Score:4, Informative)
Of course, it makes it difficult to sort by relevance *within* a date range.
Parent
are you sure? (Score:4, Interesting)
Also, you would no longer be able to do things like searching, out of historical interest, for posts on Freddie Mercury's (or Ayrton Senna's) death from the month after it happened.
Not to mention that when you sort by date things are not sorted by relevance at all, which means you likely will get A LOT more crap you have to wade through: limiting by date means that you can ignore time periods you're not interested in *AND* still sort by relevance.
Parent
Google is still adjusting their site (Score:4, Informative)
Parent
Deep linking is still very much possible! (Score:5, Informative)
Each message in a thread has a named HTML anchor, try this [google.com] for instance. It will show the whole thread, but position you at an exact message in the middle.
The only problem is there is no easy way to get this URL, you have to find the anchor by looking at the HTML source (Firefox's "View Selection Source" feature helps a lot).
Also, if you click on the "Options" link by the individual message, you get a "Show original" link, which shows just the message, verbatim [google.com].
And from there, you can click on "View parsed", and see just the pretty message [google.com], without the rest of the thread.
So there's your deep-linking. I agree it's not obvious how to do it at the moment, but the ability is obviously still there. Give it some time, it's still a beta!
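If digging through the HTML source by hand gets tedious, the anchor hunt can be scripted. This is a minimal sketch using only the standard library. The thread URL and the HTML in the example are placeholders, and the anchor names on the real pages may well look different.

```python
from html.parser import HTMLParser


class NamedAnchorFinder(HTMLParser):
    """Collects the name attribute of every <a name="..."> in a page."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def deep_links(thread_url, html):
    """Return one thread_url#anchor link per named anchor found in html."""
    finder = NamedAnchorFinder()
    finder.feed(html)
    base = thread_url.split("#", 1)[0]  # drop any existing fragment
    return [f"{base}#{name}" for name in finder.names]


# Placeholder page and URL, for illustration only.
page = '<a name="msg_1"></a><p>first</p><a href="x">link</a><A NAME="msg_2"></A>'
links = deep_links("http://example.com/thread", page)
print(links)
```

Ordinary links (those with only an href) are skipped, so what comes back is just the list of positions within the thread that can be linked to.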
These quirks and the "Server Error" bugs are to be expected, they'll work it out.
As for the new browsing interface itself, I kinda like it. It integrates and borrows some stuff from their excellent Gmail interface.
It hides quoted text by default (you can expand it with a single click), so you don't have to scroll through some moron's quoting of a whole message just to add a few words. It keeps a history of groups you recently visited, it allows you to bookmark topics you are interested in, etc. I do find it an improvement over the old interface.
The only thing is the missing date search. I agree there, that was definitely a useful feature. If enough people complain, maybe they'll bring it back.
Also, someone else complained that you cannot browse by group anymore... bullshit, it's staring you right in the face, it's the "Browse all of Usenet" link.
Parent
Re:Deep linking is still very much possible! (Score:4, Interesting)
> The only problem is there is no easy way to get this URL, you have to find the anchor by looking at the HTML source (Firefox's "View Selection Source" feature helps a lot).
I put this in my userContent.css file (the client-side stylesheet) in Mozilla:
Any anchor that has a name attribute will disclose that attribute on the page. The file is in your ~/.mozilla//*/chrome/ folder, unless you use Windows, where I don't know its location offhand. You may have to create it. (Your browser will need to be restarted for this change to take effect.)
It likely works for Firefox too.
Parent