Google Revises Usenet Search
michaelmalak writes "Wednesday night, Google Groups announced in a thread the rollout of their revised 20-year Usenet archive search engine. Among the various 'improvements': ability to search by date has been eliminated, as has the ability to deep link to a single post. See the announcement thread for others' reaction." An anonymous reader writes "ZDNet has published some interesting insights into what makes Google tick. In this lengthy article, Google's vice-president of engineering, Urs Hölzle delves into the nuts and bolts behind Google's operations, what back-up mechanisms and hardware setup is in place and even some interesting homegrown technology like the Google File System (GFS)."
Progress? (Score:5, Funny)
Well damn - I hope they don't "improve" it too much more.
Re:Progress? (Score:3, Insightful)
Re:Progress? (Score:5, Interesting)
This is the feature request/bug reporting form.
They claim to read every mail generated by this link.
I just submitted a question about this.
I wonder what they'd do if the full power of the
Re:Progress? (Score:5, Interesting)
Of course, their HTML still doesn't validate [w3.org]...
Re:Progress? (Score:5, Interesting)
Re:Progress? (Score:5, Interesting)
Re:Progress? (Score:4, Insightful)
Er, well, not quite; Microsoft gets hammered much more for "embracing and extending" standards and then preventing other implementations from using those "extensions", thereby forcing everyone who wants to be compatible with Microsoft to use Microsoft products. Google not including the doctype, on the other hand, is fairly innocuous; it's not like IE or Firefox have issues with it.
Re:Progress? (Score:3, Funny)
Re:Progress? (Score:3, Funny)
Pray we don't alter it any further.
Re:Progress? (Score:3, Insightful)
Re:Progress? (Score:3, Insightful)
Re:Progress? (Score:3, Insightful)
Re:Progress? (Score:3, Funny)
alt.binaries.pictures.erotica.female.genitalia.
bastards.
Re:Progress? (Score:5, Informative)
Alright people, you can stop overreacting. They just rearranged some things, that's all.
There's a link at the top of the thread to turn on the left-hand tree frame.
Deep-linking to a single post [google.com] is still very much possible.
And I highly doubt that a search-by-date feature is going to go missing for long in a 20-year archive. This is, after all, a BETA.
As per usual, Slashdot editors didn't even think it worth their time to follow a single link to see if the submitter wasn't trolling.
Re:Progress? (Score:4, Funny)
And from torrents I get the added benefit of not only downloading the file, but uploading to everyone else and broadcasting my IP address all over the place.
Re:RTFM (Score:5, Informative)
Google changed within the past three hours (Score:5, Interesting)
Re:RTFM (Score:4, Funny)
Re:RTFM (Score:3, Informative)
He's wrong, and not informative at all.
Re:Evil? Re:Progress? (Score:5, Insightful)
Search the web, newsgroups, your desktop etc. It may be all free and good now, but how long before someone pays the right price to access/control what people see.
My experience is that Google search seems to be turning up more noise now than before. Two years ago I could with certainty do a search and get the page I wanted. Now it seems I must scroll through pages of commercial sites and such to get to the meaty part of the Internet...those little novelty sites that people put up themselves.
Oh well, that's progress.
Re:Evil? Re:Progress? (Score:5, Insightful)
I remember when they originally took over the archive from deja. I was devastated - convinced they were going to totally screw it up. They didn't, or I got used to the screwed up version.
Also, regarding noise appearing in searches, this is a standard cycle that all search engines go through and Google's experiences are well documented. They are constantly changing their search engine to give the most relevant results. Gradually commercial sites that depend on high search results spend enough time and money optimizing their site. Google is constantly changing their tech to push that noise down, but it always gradually floats back to the top. It's in Google's best interest to show commercial sites in their paid ads, not in the valid search results.
Re:Evil? Re:Progress? (Score:5, Interesting)
Re:Evil? Re:Progress? (Score:3, Interesting)
The very last paragraph of the zdnet article might make you slightly happier then:
WTF? (Score:2, Insightful)
What the hell? That was probably two of the most useful features.
Damn you google!
Re:WTF? (Score:2)
Re:WTF? (Score:2)
Re:WTF? (Score:5, Interesting)
Well, I suppose it makes it easier to hide the stupid things some of us may have posted (especially in university) to Usenet back in the 80s and early 90s. Mind you, those "features" allowed me to resurrect some semi-useful postings I had made:
Reading C Declarations: A Guide for the Mystified [ericgiguere.com]
The ANSI Standard: A Summary for the C Programmer [ericgiguere.com]
Eric
Usenet anonymity (Score:3, Insightful)
Amen... I posted some stuff to Usenet in the early to mid 90s that, given the choice, I'd rather weren't around today. Mainly due to their naive and juvenile nature...
Problem with Usenet nowadays is you *know* it will be archived, and for that reason I use it much less (also because of the worse signal:noise ratio). When I do, it's never under my r
two of the most useful features (Score:5, Interesting)
And linking to a single post is the whole point. I know it costs money to keep that stuff online, but surely they could find a way to put ads on deeplinked posts.
Google just used up all its goodwill with me.
Re:WTF? (Score:4, Informative)
Of course, it makes it difficult to sort by relevance *within* a date range.
are you sure? (Score:4, Interesting)
Also now you wouldn't be able to do things like, for example, if you were interested in it for historical reasons, searching posts on Freddie Mercury's (or Ayrton Senna's) death for the month after it happened.
Not to mention that when you sort by date things are not sorted by relevance at all, which means you likely will get A LOT more crap you have to wade through: limiting by date means that you can ignore time periods you're not interested in *AND* still sort by relevance.
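The difference between "sort by date" and "limit by date, then sort by relevance" can be sketched in a few lines. This is only an illustration, not how Google implements it: the `Post` class, the `relevance` scores and the sample posts are all made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    subject: str
    posted: date
    relevance: float  # hypothetical score handed back by some search backend


def search_in_range(posts, start, end):
    """Keep only posts dated within [start, end], most relevant first."""
    in_range = [p for p in posts if start <= p.posted <= end]
    return sorted(in_range, key=lambda p: p.relevance, reverse=True)


# Invented sample data: the month after Ayrton Senna's death (1 May 1994).
posts = [
    Post("Senna crash at Imola", date(1994, 5, 1), 0.92),
    Post("Senna tribute, ten years on", date(2004, 5, 1), 0.95),
    Post("Re: Senna crash at Imola", date(1994, 5, 3), 0.71),
    Post("F1 1994 season preview", date(1994, 3, 20), 0.40),
]

hits = search_in_range(posts, date(1994, 5, 1), date(1994, 5, 31))
for p in hits:
    print(p.posted, p.relevance, p.subject)
```

A plain sort by date would put the 2004 post first and ignore relevance entirely; filtering first drops the periods you don't care about and still ranks what is left.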
Google is still adjusting their site (Score:4, Informative)
Deep linking is still very much possible! (Score:5, Informative)
Each message in a thread has a named HTML anchor, try this [google.com] for instance. It will show the whole thread, but position you at an exact message in the middle.
The only problem is there is no easy way to get this URL, you have to find the anchor by looking at the HTML source (Firefox's "View Selection Source" feature helps a lot).
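Digging the anchor out of the HTML source can be scripted. A minimal Python sketch, assuming each message is marked with an `<a name="...">` tag as described above; the sample HTML, the anchor names and the example.invalid URL are invented for illustration and are not Google's actual markup.

```python
from html.parser import HTMLParser


class NamedAnchorFinder(HTMLParser):
    """Collects the name attribute of every <a name="..."> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def deep_links(thread_url, html):
    """Return one thread_url#anchor link per named anchor found in html."""
    finder = NamedAnchorFinder()
    finder.feed(html)
    return [f"{thread_url}#{name}" for name in finder.names]


# Hypothetical page fragment: two named anchors and one ordinary link.
sample_html = (
    '<a name="msg_1"></a><p>first post</p>'
    '<a href="/other">unrelated link</a>'
    '<a name="msg_2"></a><p>second post</p>'
)
links = deep_links("http://example.invalid/thread/abc", sample_html)
print(links)
```

Ordinary `href` links are skipped because they carry no `name` attribute.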
Also, if you click on the "Options" link by the individual message, you get a "Show original" link, which shows just the message, verbatim [google.com].
And from there, you can click on "View parsed", and see just the pretty message [google.com], without the rest of the thread.
So there's your deep-linking. I agree it's not obvious how to do it at the moment, but the ability is obviously still there. Give it some time, it's still a beta!
These quirks and the "Server Error" bugs are to be expected, they'll work it out.
As for the new browsing interface itself, I kinda like it. It integrates and borrows some stuff from their excellent Gmail interface.
It hides quoted text by default (you can expand it with a single click), so you don't have to scroll through some moron's quoting of a whole message just to add a few words, it keeps a history of groups you recently visited, it allows you to bookmark topics you are interested in, etc. I do find it an improvement over the old interface.
The only thing is the missing date search, I agree there, that was definitely a useful feature. If enough people complain, maybe they'll bring it back.
Also, someone else complained that you cannot browse by group anymore... bullshit, it's staring you right in the face, it's the "Browse all of Usenet" link.
Re:Deep linking is still very much possible! (Score:4, Interesting)
The only problem is there is no easy way to get this URL, you have to find the anchor by looking at the HTML source (Firefox's "View Selection Source" feature helps a lot).
I put this in my userContent.css file (the client-side stylesheet) in Mozilla: Any anchor that has a name attribute will disclose that attribute on the page. The file is in your ~/.mozilla//*/chrome/ folder, unless you use Windows where I don't know its location offhand. You may have to create it. (Your browser will need to be restarted for this change to take effect.)
It likely works for Firefox too.
Re:WTF? (Score:3, Insightful)
The bastards.
hmmmm (Score:5, Funny)
Next up: Grammar and Content
A little respect (Score:2, Insightful)
For all the years of good service we've had from google, who are we to question the removal of features? What the bearded terminal hackers at Google giveth, the bearded terminal hackers at Google may taketh away. Certainly, if we can embrace their advertising as the GNU/Linux community has done en-masse, we can understand that they have their reasons for these changes.
Perhaps you'd like to start your own archive of the USENET message boards?
Re:A little respect (Score:5, Insightful)
We're the users. That's our right as users. If nobody questions the decision to remove features, then how does Google know what features we liked?
There's absolutely nothing wrong with constructive criticism, even with respect to a "free" service.
Re:A little respect (Score:2)
We are their customers, their clients and their users. Without us, they're nothing.
Respect is earned (Score:5, Insightful)
Excuse me, but their Google Groups feature is based entirely on profiting from others' work (and copyrighted work at that). If you're providing a properly searchable index, you might (might) have a public interest defence to the copyright infringement. If you're providing a useful service, most people might (might) not mind you using their work. But if you're going to take away useful searching facilities and provide a service that doesn't even allow proper citation (i.e., deep-linking to a specific post), you're going to be both unpopular and almost certainly breaking the law. I don't know about you, but personally I don't have much respect for people who are either of those things.
Re:Respect is earned (Score:3, Insightful)
Re:Respect is earned (Score:3, Insightful)
Re:Respect is earned (Score:3, Insightful)
Some basic copyright law / Usenet (Score:3, Insightful)
You're obviously trolling, but in the interest of myth-dispelling: under most jurisdictions, everything you write is your copyright by default. What matters is any permission you give (implicitly or explicitly) for it to be copied, and any exemptions to which someone copying it without permission may appeal (e.g., fair use).
There is an implicit permission for something you post to Usenet to propagate and stay around for a few days. Whether there's an implicit permission for others to archive those posts,
Re:Some basic copyright law / Usenet (Score:3, Insightful)
There is an implicit permission for something you post to Usenet to propagate and stay around for a few days. Whether there's an implicit permission for others to archive those posts
Unless you have some hard legal definition for how long "a few days" is supposed to be, you *do* in fact give implicit permission to archive those posts.
Re:A little respect (Score:5, Insightful)
I'm not sure what motivated such changes, but usually you don't remove enhancements to software unless they are causing major problems or if they somehow affect your financial bottom line. Somehow I think it's related to the latter of the two because I don't see how the former would cause problems.
You don't do something like collect nearly all the usenet postings ever made, make it searchable by date and then take it away. Basically people have lost the ability to do historical internet research using google groups. Sort by date is not even close to the same.
I do hope you were kidding (Score:3, Insightful)
Their bread and butter? Without us (the millions of people who use google rather than a competitor) they don't have a business.
I read your post and thought I could detect a tongue firmly in cheek. I don't know what is more disturbing
Re:A little respect (Score:3, Insightful)
Uh, we're the people who pay google's paychecks?
Who is Google to question what its users want?
Perhaps you'd like to start your own archive of the USENET message boards?
Considering Google bought up all the significant USENET archives in existence, wouldn't that be a bit hard?
If Google had come up with a service and now they were scaling it back, I would consider it silly to complain about this, sin
500 error? (Score:4, Funny)
(Gathers canned goods, candles, heads for cave)
Re:500 error? (Score:2)
Dumb (Score:5, Insightful)
Re:Dumb (Score:2)
Re:Dumb (Score:4, Informative)
I haven't the slightest idea where the original poster got their information.
Re:Dumb (Score:3, Informative)
Improvements??? (Score:5, Interesting)
Gee, nice "improvements"... I personally have linked to individual posts on a web page summarizing a lawsuit I was involved in that was directly related to posts in a newsgroup. I know others who have linked to posts in similar situations. I just checked my web page and the links to those posts no longer work.
Google just took a HUGE step backwards in my opinion.
OMG.. it's truly awful. (Score:4, Informative)
Luckily the rot hasn't spread to the national Googles yet, so you can still use Google UK [google.co.uk] if you need it.. at least until they ruin that too.
Re:OMG.. it's truly awful. (Score:2)
Re:OMG.. it's truly awful. (Score:5, Informative)
I believe that the "new groups" are not new usenet groups, but merely a yahoo-groups clone on the side, which gets the same interface as the one they provide for usenet groups.
The old groups interface rocked. This is a major step in the wrong direction in my book.
Re:OMG.. it's truly awful. (Score:3, Informative)
In the majority of WHAT? The only people using Usenet nowadays (or even knowing about it) ARE those who appreciate it. Usenet is MUCH less well known than you might believe, and I would say that well over 90% of Google users have NO IDEA what that third link above the query box means. It's frightening even how many programmers I meet that have never heard of Usenet, and amongst those that know about it, how few really use it for research. Frankly, without Usenet I'd
HW summary overview (Score:5, Informative)
- Over four billion Web pages, each an average of 10KB, all fully indexed.
- Up to 2,000 PCs in a cluster.
- Over 30 clusters.
- One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
- Sustained transfer rates of 2Gbps in a cluster.
- An expectation that two machines will fail every day in each of the larger clusters.
- No complete system failure since February 2000.
Now, 2,000 machines in a cluster, plus 1PB data, plus 2Gbps in a cluster times 30 clusters comes to:
- "Over" 60,000 PCs (!)
- "Over" 30PB data storage
- "Over" 60Gbps bandwidth
Also interesting:
- An expectation that two machines will fail every day in each of the larger clusters.
- No complete system failure since February 2000.
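For anyone who wants to check the multiplication, here is a rough back-of-the-envelope sketch in Python. The inputs are the article's figures as quoted above; the assumptions are mine: every cluster is treated as full-size (the article says "up to" 2,000 PCs and "over" 30 clusters, so the totals are ballpark only), the disk error rate is read as 10^-15 per bit, and decimal units are used throughout.

```python
# Figures quoted from the article summary above.
PAGES = 4_000_000_000
AVG_PAGE_KB = 10
MACHINES_PER_CLUSTER = 2_000
CLUSTERS = 30
PB_PER_CLUSTER = 1
GBPS_PER_CLUSTER = 2
FAILURES_PER_DAY = 2          # per large cluster

# Assumption: the "10^-15" disk error rate is per bit read.
BIT_ERROR_RATE = 1e-15

total_machines = MACHINES_PER_CLUSTER * CLUSTERS   # machines across all clusters
total_pb = PB_PER_CLUSTER * CLUSTERS               # petabytes across all clusters
total_gbps = GBPS_PER_CLUSTER * CLUSTERS           # aggregate sustained transfer

# Raw size of the indexed pages: KB -> TB (decimal, 1 TB = 10^9 KB).
index_tb = PAGES * AVG_PAGE_KB / 1e9

# Expected bit errors when reading one petabyte once (1 PB = 8 * 10^15 bits).
expected_bit_errors = 8 * 10**15 * BIT_ERROR_RATE

# If 2 of 2,000 machines fail per day, an average machine fails about
# once every this many days.
days_between_failures = MACHINES_PER_CLUSTER / FAILURES_PER_DAY

print(total_machines, total_pb, total_gbps)
print(index_tb, round(expected_bit_errors), days_between_failures)
```

Under those assumptions a single full read of one cluster's petabyte would hit a handful of bad bits, which is presumably why the error rate "begins to be a real issue" at that scale.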
Re:HW summary overview (Score:2)
What is a petabyte? (Score:2)
Uhh, ok what is a petabyte? Is that like half of a veggie burger or something? I'm guessing 1000 terabytes?
Re:What is a petabyte? (Score:2)
I'm guessing you won't even want to KNOW about Yottabytes and Zeptometers [nist.gov]. It's true, Dr Seuss has come back from the grave and taken over the ISO.
Separation of posts (Score:2, Insightful)
Hey Google: you're being evil... (Score:5, Interesting)
Also, it creeped me out to no end discovering this morning that my Gmail cookie is really a "Google Accounts" cookie which will now be attached to my Usenet forays via Google as well. I personally don't want the line between public and private conversations to be muddied like that, and I definitely don't want a unified cookie straddling both domains.
Finally, the interface leaves a lot to be desired. The layout is cluttered and junky now whereas it was clean and simple before. I'm not enthralled by the Javascript hooks. Threading seems to be worse than ever (and still not done by message-ID or References - when I asked Google why this was via email, the response was "too difficult"... *boggle*) and the CLI-esque search ability is degenerating into a GUI mess; where one line of text and a CR would before get you to the page you wanted, it now can take that plus several additional mouse gestures and clicks.
This is a sad day, to see a useful tool become so f**ked up for no apparent good reason. I can only hope and pray for a reversion.
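For what it's worth, the basic idea of threading by Message-ID and References is not much code. A simplified Python sketch follows; it is not Google's code, and a real threader would also have to cope with missing headers, duplicate IDs and reference loops. The message IDs and subjects are invented, and each message is assumed to carry its References list oldest-first, as Usenet headers do.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages into trees using their References lists.

    Each message is a dict with 'id' and 'references' (list of IDs,
    oldest first). The parent is the most recent reference that is
    actually present in the set; otherwise the message is a root.
    """
    known = {m["id"] for m in messages}
    children = defaultdict(list)
    roots = []
    for m in messages:
        parent = next((r for r in reversed(m["references"]) if r in known), None)
        if parent is None:
            roots.append(m["id"])
        else:
            children[parent].append(m["id"])
    return roots, children


def render(ids, children, depth=0):
    """Return indented lines, one per message, depth-first."""
    lines = []
    for mid in ids:
        lines.append("  " * depth + mid)
        lines.extend(render(children.get(mid, []), children, depth + 1))
    return lines


# Invented sample: <d> references a message (<x>) that was never archived.
messages = [
    {"id": "<a@example>", "references": []},
    {"id": "<b@example>", "references": ["<a@example>"]},
    {"id": "<c@example>", "references": ["<a@example>", "<b@example>"]},
    {"id": "<d@example>", "references": ["<a@example>", "<x@example>"]},
    {"id": "<e@example>", "references": []},
]

roots, children = build_threads(messages)
tree = render(roots, children)
print("\n".join(tree))
```

Falling back to the nearest surviving ancestor keeps a reply attached to its thread even when an intermediate post is missing from the archive.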
Re:Hey Google: you're being evil... (Score:3, Insightful)
Your conversations in standard email are not private (unless you pgp them)
People along the transmission path, and sysadmins with access to the mail spool can snoop on them, yes. But (1) they are not intended to be shown to the whole world, while usenet posts are - by posting to usenet you are giving explicit permission for your post to be public, and (2) they are not visible to every single person in the world with a web browser.
Google's improvements (Score:3, Interesting)
I know this is a liiiittle bit offtopic, but here's a story about how the little guy (or little country) can still reach a huge company like Google and get them to change something.
> Date: Mon, 29 Nov 2004 13:04:02 +0100
> Hi,
>
> I wanted to post a question to Google Answers,
> but my VISA credit card was not accepted,
> because its expiry date is 09/12 and you only
> allow up to 2009, not 2012.
>
> How do I solve this problem? I live in Denmark.
> I use the same card to shop on the internet all
> the time.
>
> Kind regards,
Hello Jakob,
Unfortunately, because the expiration date is not listed on our billing page, we must ask that you use a different credit card.
Sincerely,
The Google Answers Team
> Date: Tue, 30 Nov 2004 12:00:27 +0100
>
> Dear Google Answers Team,
>
> That is the only credit card I have. This is
> very unfortunate, but since others have solved
> the problem, I'm sure that so could you?
>
> Regards, Jakob
Hello Jakob,
Thank you for your reply. We will extend our expiration date options. The
billing page should update in 24-48 hours.
Sincerely,
The Google Answers Team
So still: HURRAY FOR GOOGLE!!!
Re:Google's improvements (Score:3, Insightful)
Perhaps we have our reason right there. Google+ accounts anyone?
Disclaimer: I know nothing about Google groups.
First real deviation (Score:3, Insightful)
Re:First real deviation (Score:3, Insightful)
They've been very close several times before. But the last time I cited the other cases I was modded into oblivion (though also Insightful) and you've already been modded (-1, Offtopic) despite the fact that you're clearly not. So, you just get the quick version this time: Groups itself, Google Cache and Google's image search are all potentially (or almost certainly) illegal in many jurisdictions, and all on dubious moral ground at times, too.
Work around for filtering search by date (Score:3, Informative)
http://groups.google.com/advanced_group_search?hl
Re:Work around for filtering search by date (Score:3, Informative)
Total catastrophe, a complete and utter misstep (Score:4, Insightful)
Re:Total catastrophe, a complete and utter misstep (Score:4, Funny)
No Escape! (Score:4, Insightful)
ARRRRRRRRGH (Score:5, Insightful)
I have bookmarks to specific articles/threads it took me a long time to find and to which I refer now and then and if they stop working the usefulness of google groups for me will be much reduced...
As much as I understand why they would want to make USENET look more like a message board for people who never really grew up with it (usenet and gopher were mostly all we had back when I first went online) I still think that not having this functionality available for people who know how to make the most of it is very backward thinking.
We got fooled again (Score:2)
Google Groups != usenet anymore (Score:2)
System borked (Score:2)
Grumble.
And what's with the tab for 'add a new group' - are they planning to allow any user to unilaterally create new usenet groups? Or are they planning to make usenet indistinguishable from their own (yet another bulletin board type) forums?
And after they'd finally got good. (Score:3, Insightful)
When Google first bought up the old DejaNews archives I was ticked. They took something with which I could get the information I was after and returned something with which I could not.
Over the past few years they finally got it back to being something useful. I had heard about this "Make It Into Yet Another Glorified Web Groups" effort, and was less than impressed. But as long as it didn't interfere with it being a decent Usenet search engine...
No sort-by-date and no direct-article-linking? WTF? So if I want to get only the most recent posts for a certain query or if I want to pass someone a direct link to a specific post then I'm now SOL? How is that an "improvement"?
Is there anywhere else with an exhaustive archive of Usenet? I think I'm about to jump ship. I neither need nor want another web-groups option, and I want more search flexibility rather than less.
Ugh (Score:3, Interesting)
One thing that's horrible is trying to find a group in the new system. I was looking for news.admin.net-abuse.email. (Fortunately, I have it bookmarked.) After going to "news." from the top-level Google Groups page, I was taken to a category selection page that included things like "Arts & Entertainment" and even "Adult". There are no such groups under the Usenet news. hierarchy. And under those categories the individual groups are ordered in what's probably their Google PageRank order, not alphabetically, not by size, not by any obvious means.
The big change seems to be they are integrating the Usenet archive with their own Groups stuff, and the two really aren't the same.
Sigh. Should've seen this coming... (Score:3, Insightful)
Direct Linking is still possible... (Score:5, Informative)
Navigate to the thread, for example this [google.com] comp.arch thread. Choose the post you want to link to, and click on "Show Options". Two of the options are "print", which is a link to a "printable" version of the article, and "Show original", which is a link to the article with all the headers.
One more step (or simple URL hack) from this display is "view parsed" which gives a friendly HTML version -- for example, try this link [google.com].
Re:Direct Linking is still possible... (Score:3, Informative)
http://groups-beta.google.com/groups?selm=modera
That URL isn't linked from the discussion, and it refreshes to the "proper" location, so you have to construct it yourself by cutting/pasting the message-ID. But it still works.
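Constructing the link by hand mostly comes down to percent-encoding the message-ID. A small Python sketch, assuming the `selm=` parameter takes the message-ID without its angle brackets (that is how it appears in another Google Groups link posted in this discussion, which is where the sample ID comes from); the `selm_url` helper is hypothetical, and whether the beta site keeps honoring this form is anyone's guess.

```python
from urllib.parse import quote

# URL prefix as given in the comment above.
BASE = "http://groups-beta.google.com/groups?selm="


def selm_url(message_id):
    """Build a selm= link from a raw Message-ID header value."""
    # Assumption: angle brackets are dropped before encoding.
    mid = message_id.strip().lstrip("<").rstrip(">")
    # safe="" forces '$' and '@' to be percent-encoded as well.
    return BASE + quote(mid, safe="")


url = selm_url("<9v6d5h$5pg$1@news.dtcc.edu>")
print(url)
```

The `$` and `@` characters in message-IDs are the ones most likely to get mangled when pasted raw into a URL, hence the explicit encoding.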
Deep linking to a single post can work (Score:3, Informative)
I had a link to usenet post in a recent blog [amon-hen.com] entry. Try this [google.com] (sometimes there's a server error, but otherwise it seems to work). The trick is to click on "Show Original" and use that link.
What Google Hardware Actually Looks Like (Score:5, Informative)
Anyway, as we were walking around the 150,000+ square foot datacenter floor, a guy came by, pushing a very odd looking rack.
It resembled a bread tray, 20 shelves if I counted correctly, with completely naked main boards sitting on them. It looked to be 4 machines per row (counting the power supplies). Each had one IDE disk sitting on a gel pad, strapped in with velcro. I personally watched them wheel 4 of these racks right by me back into the dark "Google" corner of the datacenter. Our tour guide finally gave in.
Him: "Well, you've seen them now!"
Me: "What do you mean?"
Him: "That's Google!"
Definitely the highlight of my day!
Re:What Google Hardware Actually Looks Like (Score:4, Informative)
Once, said Hölzle, "someone disconnected an 80-machine rack from a GFS cluster, and the computation slowed down as the system began to re-replicate and we lost some bandwidth, but it continued to work. This is really important if you have 2,000 machines in a cluster." If you have 2000 machines then you can expect to see two failures a day.
Looks like my numbers were correct. 20 shelves * 4 machines per shelf = 80 machines per rack.
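Taking the rack count one step further, as a rough sketch only: the shelf and machine counts are the eyeballed numbers above, the cluster size is the "up to 2,000" figure from the article, and the result assumes every rack is fully populated.

```python
SHELVES_PER_RACK = 20      # counted by eye on the datacenter tour
MACHINES_PER_SHELF = 4     # likewise an eyeball estimate
CLUSTER_SIZE = 2_000       # "up to 2,000 PCs in a cluster" per the article

machines_per_rack = SHELVES_PER_RACK * MACHINES_PER_SHELF

# Number of full racks needed for a maximum-size cluster.
racks_per_cluster = CLUSTER_SIZE // machines_per_rack

# Fraction of such a cluster that disappears when one rack is unplugged.
share_lost_per_rack = machines_per_rack / CLUSTER_SIZE

print(machines_per_rack, racks_per_cluster, share_lost_per_rack)
```

So pulling one rack, as in the Hölzle anecdote, would take out a few percent of a full-size cluster at once.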
Send a complaint message here (Score:3, Informative)
If everyone who posted a comment took out 60 seconds to send a complaint message, I think it would make a difference.
Groups is back to the old format (Score:3, Informative)
There's still hope... (Score:3, Informative)
Give them feedback (Score:3, Insightful)
http://groups-beta.google.com/support/bin/request.py [google.com]
If you don't like how they've changed it, let them know about it. If enough of us do it, maybe they'll do something about it.
Google File System (Score:5, Interesting)
Goodbye Google? (Score:5, Interesting)
There were other issues as well as the rapacious spidering (which reminded me of some of the worst spambots [neilgunton.com] out there), but I won't go into the details here. I didn't get any satisfactory resolution from Google when I tried contacting them.
Website suicide? I don't know. All I do know is that Google seems to be fulfilling my biggest fears - they are going downhill as they get bigger. Funny how the bigger a company gets, the more it tends to suck. Also, having an IPO is never a good thing, in my experience - it always leads to short-termism and corporate decisions based more on the bottom line than what's actually good for the users. Sure, any company has to look after its shareholders and investors, but they never seem to really grok that being so focused on the short-term negatively impacts things in the longer term, particularly if it loses you goodwill in the userspace. Also, as a company grows you do tend to get the sort of braindead, clueless decisions coming out that we apparently see here.
So now we have Google restricting what we can do with old Usenet posts... didn't they buy up all the archives for this stuff a while back? This would appear to give them some amount of power, but also (they should realize) responsibility as stewards of the past. This is not something that they are simply indexing on someone else's website, it's data that they actually own. But in this case it's not really their data at all - it's the community's.
Google seems to be slowly using up the goodwill they built up since 1998 when they came onto the scene, a small, fast, simple, charming and relevant search engine that kicked ass. Why can't a company just keep doing what it does well, and be satisfied with that? Why does everything have to eventually grow, expand, gobble up other companies, and then inevitably start to suck?
Never mind... for now, Goodbye Google.
First use of "spam" on USENET, found via Google (Score:5, Interesting)
http://groups.google.com/groups?q=ken+weaverling+spam+usenet+first&hl=en&selm=9v6d5h%245pg%241%40news.dtcc.edu&rnum=1 [google.com]
According to Ken and his search of Google, I was the first person ever to use the word "spam" to refer to unwanted electronic communication. Obviously, I didn't know it at the time and was quite surprised to learn of my "fame." Yeah, that and $7 will get me a cup of mocha-something, I know.
Anyhow, the whole point is that Ken's research was aided by the search by date feature. It will be a shame if that is removed.
(And for the curious, I changed my name from Czarnecki when I got married.)
Re:First use of "spam" on USENET, found via Google (Score:3, Funny)
I have a feeling of deja-vu... (Score:3, Insightful)
The new system sucks. No fixed-width fonts by default, that horrible floating group name at the right of the screen when scrolling, a far slower user interface (it was slow when I first noticed the change about 7 hours ago). I can go on.
They'll be underlining words with links next.
They shoulda checked their logs (Score:3, Interesting)
William
"Less commercial results" button (Score:3, Interesting)
That's a good idea. Other useful capabilities for advanced search:
Google may end up becoming a major player in spam control, because they process large volumes of mail through search systems and can potentially recognize almost all bulk mail.
Don't like how the Google Usenet archive evolves? (Score:5, Interesting)
I already have an archive of around 600 million messages (nearly everything sans binaries from 2000 till today; just a couple of terabytes) and intend to create a public Usenet search engine. As I am using Usenet myself on a daily basis, I know what *I* want in a Usenet search engine, and that's quite different from what Google gives us.
Here's how you can help: Contact me at martin-k (at) softmaker.de if you have a private collection of Usenet postings that you want me to put in the database.
-mk
Re:Big brother trips (Score:2)