
Dvorak on Google and Wikipedia

cryptoluddite writes "PC Magazine has an article by John C. Dvorak expanding on the community discussion of Google's offer of free web hosting for Wikipedia. Those against the deal point out that Google may be planning to co-opt the encyclopedia as Googlepedia (by restricting access to the complete database). In a revealing speech given by the Google founders, Larry Page says he would 'like to see a model where you can buy into the world's content. Let's say you pay $20 per month.' Should public domain information be free?" The scenario painted is a pretty scary one, but one can hardly take a speech from 2001 as serious evidence these days. Update: 02/16 20:16 GMT by T: This story inadvertently links to the second page of the column; here's a link to the first page.
This discussion has been archived. No new comments can be posted.

  • by StateOfTheUnion ( 762194 ) on Tuesday February 15, 2005 @12:03PM (#11677817) Homepage
    Those against the deal point out that Google may be planning to co-opt the encyclopedia as Googlepedia (by restricting access to the complete database).

    Can they do that? Wikipedia is governed by the GNU Free Documentation License... Wikipedia details here [wikipedia.org].

  • by Neophytus ( 642863 ) on Tuesday February 15, 2005 @12:03PM (#11677831)
    Speculation runs rife. I guess security through, well... not-very-much obscurity is bound to get someone chatting in the end.

    The deal with Wikipedia, in the short to medium term, is expected to be the provision of about a dozen caching servers. No actual database work would be done by Google. There is already a small (3-server) squid [squid-cache.org] cluster in Paris [wikimedia.org] that does this for users in the UK and France, saving on some transatlantic bandwidth.
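    A caching node of that sort is typically a Squid reverse proxy ("accelerator") sitting in front of the real web servers. A minimal sketch, using hypothetical hostnames and accel-mode syntax from later Squid releases (not the actual Wikimedia configuration):

```
# Hypothetical squid.conf sketch for a front-end caching node.
# Requests hit this box first; cache hits are answered locally,
# and misses are fetched from the origin web server and stored.
http_port 80 accel vhost
cache_peer origin.example.org parent 80 0 no-query originserver
cache_mem 256 MB

# Only accelerate the site we are fronting for.
acl wiki dstdomain en.wikipedia.org
http_access allow wiki
http_access deny all
```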
  • by GillBates0 ( 664202 ) on Tuesday February 15, 2005 @12:06PM (#11677868) Homepage Journal
    As I understand it, Google Groups is just one more interface to Usenet, like the zillion others offered by ISPs, schools, and other servers. The propagation mechanism of messages is still the same, and they just offered a way for people to access News using a web-based interface (lots of other sites offer this) rather than through a regular News reader (rtin, etc).

    I'm fine with Google offering a faster mirror/interface to Wikipedia, because mirroring of information is always good. From the last /. article on the subject, I gathered that Google would offer their faster processing power and ub3r bandwidth to Wikipedia....but that doesn't necessarily mean they get to hijack the content....they'd just provide a faster way to get to information that's mirrored elsewhere.

  • by saddino ( 183491 ) on Tuesday February 15, 2005 @12:11PM (#11677933)
    In a revealing speech given by the Google founders, Larry Page says he would 'like to see a model where you can buy into the world's content. Let's say you pay $20 per month.

    The only thing "revealing" about that article is that Page continues "Somebody else needs to figure out how to reward all the people who create the things that you use. " In other words, what Page would like to see is a system where "users" pay for accessing content and "contributors" are paid for providing it.

    This /. story could have equally read "Does Google Want to Pay Wiki authors?" but of course, that would have derailed cryptoluddite's agenda to smear Google.

    To the editors: when you see the words "may be planning", just ignore the submission in the future. TIA.
  • by tdvaughan ( 582870 ) on Tuesday February 15, 2005 @12:18PM (#11678014) Homepage
    Since the copyrights are owned by the people who contribute to the articles, Google would have to contact each of them and ask them to relicense their contributions under a less permissive one. It's a bit like when that dude asked if a Linux kernel snapshot could be released under a BSD license for $50,000. Not going to happen.
  • Re:Hmm (Score:2, Informative)

    by Anonymous Coward on Tuesday February 15, 2005 @12:21PM (#11678053)
    Google seem to be attempting to become the Microsoft of the internet.

    Except Google is providing useful services that people want to use.

    For free.

    We get value out of using their services.
    The advertisers get value out of the exposure they get, which is great, because the advertising still isn't annoying.

    Google isn't squashing competitors with shady business practices, they are simply providing the best, most innovative services for the time being.

  • by BrettJB ( 64947 ) on Tuesday February 15, 2005 @12:23PM (#11678068)
    "You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute."

    --From the GNU FDL

    So, Google could not legally prevent access, nor could they prevent content from being mirrored. Don't like Google? Then help maintain a wiki mirror elsewhere.

    Seems to me Google just wants to co-opt an information resource that has become extremely popular of late. They'd like to be able to serve it up as they do now, but with Google ads sprinkled in the sidebar. Since I already ignore the Google ads in their searches and in my Gmail account, this wouldn't be an issue for me personally.
  • Re:Licensing? (Score:5, Informative)

    by pohl ( 872 ) on Tuesday February 15, 2005 @12:29PM (#11678131) Homepage
    Not only that, but the open content license also allows Google to profit from providing premium access (read: low-latency) to their own instance of the content. This sort of scenario was anticipated from the beginning when the content license was discussed, and it was considered to be an indicator of success.
  • by Infonaut ( 96956 ) <infonaut@gmail.com> on Tuesday February 15, 2005 @12:31PM (#11678141) Homepage Journal
    "One risk of that is that people don't get paid for their content, which is clearly a problem. I'd personally like to see a model where you can buy into the world's content. Let's say you pay $20 per month and get access to the world. Somebody else needs to figure out how to reward all the people who create the things that you use."

    It seems to me that they're talking about copyrighted content here. Rather than concocting a plan to bundle up free content and make people pay Google for access, it looks to me like Page was actually talking about reasonable means of access to copyrighted information.

  • This is all fud (Score:5, Informative)

    by Raul654 ( 453029 ) on Tuesday February 15, 2005 @12:40PM (#11678254) Homepage
    First, full disclosure - I'm a long-time Wikipedia user and I probably accidentally played a peripheral role in breaking this story. I first heard about the Google deal back in July. Google is not the first company to offer to host Wikipedia. The typical offer comes from "Mom and Pop ISPs" (Jimbo's words) that really don't have any idea what they're getting themselves into (1,400 hits/sec is a helluva lot to do for free). What I have to say in reply to this story is - it is, IMHO, totally FUD. It's completely hypothetical, and it's unrealistic. You have to remember - all the text on Wikipedia is licensed under the GNU Free Documentation License or in the public domain; all the images and audio are licensed under the GNU Free Documentation License, or CC-by-SA, or something equivalently liberal. So even if, on the off chance, Google succumbs to the corporate pressure to be evil, anyone can take the text and reuse it in less evil ways. Furthermore, I trust Jimbo, Angela, and Anthere (the visible members of the board) in dealing with Google to make sure the deal is done right by the rest of us contributors. There's a long history on Wikipedia of being against ads of any form - the Spanish Wikipedia forked several years ago over a hypothetical discussion of it.
  • Wrong. (Score:5, Informative)

    by Raul654 ( 453029 ) on Tuesday February 15, 2005 @12:44PM (#11678304) Homepage
    You're wrong. The problem wasn't that we didn't have enough servers, but that the servers we had were misconfigured. The slowness experienced in January was resolved when the configuration bugs were ironed out. The real problem is a lack of skilled sysadmins and developers. (And for the record, we just put in an order for 10 more servers.)
  • Re:Licensing? (Score:5, Informative)

    by Raul654 ( 453029 ) on Tuesday February 15, 2005 @12:47PM (#11678331) Homepage
    They cannot. This article is nonsensical FUD from someone who doesn't know what he is talking about. (--A wikipedia admin)
  • Re:Dvorak is stale (Score:3, Informative)

    by llywrch ( 9023 ) on Tuesday February 15, 2005 @01:21PM (#11678657) Homepage Journal
    > 15 or so years ago Dvorak had some insightful articles, even if they didn't always come 100% true. Nowadays
    > he's another has-been from a past era trying to pimp his FUD and general tech conspiracy theories. IMO, if you
    > steadily bet AGAINST Dvorak you'll come out ahead over the long run.

    You got it in one. Dvorak must have remembered that he had a column due, indulged in his intoxicant of choice, picked some random news items & used them as an excuse to indulge in some superficial reflections.

    Speaking as someone who has contributed for a long time to Wikipedia, there is no simple way that Google could take control of Wikipedia. (There are days when I wonder if *anyone* could control Wikipedia.) Because its content is licensed under a form of the GPL (as well as many parts under the Creative Commons license), if you can read it, you can copy it & fork it. Making a mirror of Wikipedia is not only possible, it has been done: there are countless websites that mirror Wikipedia's content, some more up to date than others, some using the material as a starting place for their own encyclopedias. Further, many of the non-English Wikipedias have their own communities & are establishing their own servers; even if Google somehow got control of the main servers in Florida, it would be trivial for the groups in Europe to immediately fork.

    And community is an important part of this. Were Google to start limiting access to Wikipedia, *many* volunteers would leave -- either to a fork, or stop contributing entirely. In a very short time, what was left of the original Wikipedia site would have minimal value, ravaged by bitrot, out-of-date information, & unchecked vandalism.

    IMHO, the best Google could do here is offer a better interface to Wikipedia than Wikipedia has. The ability to edit Wikipedia will undoubtedly remain on their servers; to attempt to share this ability would result either in a technological mess or a fork.

    Lastly, having exchanged emails with Jimbo Wales, the de facto leader of this project, & having read much of what he has written about Wikipedia, I sincerely doubt he has any interest in converting it into a for-profit Internet venture. For one thing, he has been working to push the legal responsibility off of his shoulders & onto an international board of directors. And for another, he seems to be having too much fun travelling around the world on behalf of Wikipedia & its related projects: he clearly gets far more satisfaction from this being open & free to everyone than he would if he converted this project into a big pile of cash. At present, more people listen to & value what Jimbo has to say than what Dvorak writes; that fact alone must stick in this has-been Ziff-Davis columnist's craw & color anything Dvorak has to say about Wikipedia.

    Geoff
  • Re:Licensing? (Score:3, Informative)

    by Raul654 ( 453029 ) on Tuesday February 15, 2005 @01:52PM (#11679089) Homepage
    It means that if you copy it and modify it, you are required to license the new version under the GFDL and acknowledge Wikipedia (and a hyperlink satisfies our acknowledgement requirement). Is that supposed to be scandalous?
  • by RazzleFrog ( 537054 ) on Tuesday February 15, 2005 @02:06PM (#11679258)
    You could copy all of the information from the site, but you wouldn't be able to copy the meta information created by Google (or whomever) that is used to organize the information; you obviously wouldn't be able to use any logos or other trademarked names; and you probably would get in trouble if you made the look and feel too similar. Add to that the fact that you would have to host it and then compete against a well-known brand.

    Barnes and Noble actually sells dozens of public domain works in nicely matching hardcover sets. Anybody can get the works for free online, but having them in such a nicely bound package makes them worth the $5-10 they charge.
  • by IBeatUpNerds ( 827376 ) on Tuesday February 15, 2005 @02:55PM (#11679832)
    It's no different than the GPL

    It's quite different from the GPL. Public domain works are free for any use and not bound by restrictions (typically). You can do things with public domain software that you can't with GPL'd software. "Public domain" typically means there is no license attached other than a disclaimer saying so.
  • by maveric149 ( 250323 ) on Tuesday February 15, 2005 @03:12PM (#11680027) Homepage
    First, any offer of hosting by Google (or anybody else, for that matter) will not make the 40 or so servers that the Wikimedia Foundation already owns go away, or stop the foundation from paying its own hosting costs for those servers. Nor will it stop donations from coming in so the foundation can buy more hardware and bandwidth. And the foundation is *not* going to rely on just one hosting partner, but will instead seek out and act upon multiple offers (this is in fact necessary due to the exponential growth of traffic to the sites it operates, such as Wikipedia.org).

    The most glaring omission Dvorak makes is the simple fact that, due to the license Wikipedia uses, it would be impossible for any one company to control it. If the 'end' were really near, somebody with better intentions could just download the *whole* Wikipedia and host it. But it would never come to that, because the foundation would not allow it; its very mission is to ensure free access to the projects it runs.

    I'm very disappointed in Dvorak.
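    As a rough sketch of that "just download the *whole* Wikipedia" escape hatch: the full database is published as periodic dumps, and anyone can fetch and rehost one under the GFDL's terms. The host and file naming below are assumptions for illustration, not the project's documented layout:

```python
# Hypothetical sketch: mirroring Wikipedia from a public database dump.
# The dump host and file-naming scheme are assumptions.
import urllib.request

DUMP_HOST = "https://dumps.wikimedia.org"  # assumed public dump server


def dump_url(wiki: str, date: str) -> str:
    """Build the URL for a full-article XML dump (naming assumed)."""
    return f"{DUMP_HOST}/{wiki}/{date}/{wiki}-{date}-pages-articles.xml.bz2"


def mirror(wiki: str, date: str, dest: str) -> None:
    """Fetch the dump to a local file; anyone may then rehost it."""
    urllib.request.urlretrieve(dump_url(wiki, date), dest)


print(dump_url("enwiki", "20050215"))
```

    The point is that the license, not any one host, is what keeps the content free: a mirror is one HTTP fetch away.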
  • by Teancum ( 67324 ) <robert_horning AT netzero DOT net> on Tuesday February 15, 2005 @09:03PM (#11684519) Homepage Journal
    While I don't speak for the Wikimedia Foundation Board of Trustees, I am a regular follower of and poster to the Wikimedia Foundation mailing list, where this proposal has taken on a bit of urgency.

    The main point that needs to be looked at is the fact that Wikipedia has been experiencing some absolutely explosive growth in demand, both from people trying to add articles and from people simply accessing it via the numerous cross-links to Wikipedia in various /. articles and references in the news media. All of this crushing demand to view content (Wikipedia could produce a slashdot effect on /. itself) is taking up bandwidth that simply requires money to be able to serve up the content.

    The current proposed budget for maintaining the servers [wikimedia.org] is on the order of $130,000, and all of that comes from the voluntary donations of the community. (BTW, please give some $$$ [wikimediafoundation.org] if you are a regular user of Wikipedia.)

    Google has quietly given an offer to not only co-locate some Wikimedia servers at their facilities, but also to pay for the servers themselves as part of the general Google server farm.

    From what I've seen, nothing in the proposal is to have Google "take over" the Wikipedia content. Just like Google uses data [google.com] in the Open Directory Project [dmoz.org] for their Google web directory, they are free to use the content of Wikipedia as long as they comply with the terms of the GNU Free Documentation License [gnu.org].

    This is not a way to "lock up" the content, but rather a way to browse Wikipedia in a way where you can be assured that the bandwidth is available to view the content. Basically, a mirror of the Wikipedia project. This is not even a new idea. [wikipedia.org]

    I would imagine that the fine points of negotiation right now are that links to add content would be folded back into the main-line Wikipedia database. This is just what the Open Directory Project has been doing for a number of years, so the precedent is definitely there, even for Google. I don't deny that there is a valid business rationale for Google to host Wikipedia, but don't read more into it than is there: Google is offering to host Wikipedia content.

    John Dvorak absolutely does not speak for the Wikimedia Foundation, or even as a member of the community in general; his comments are just an attempt to inflame the issue by an otherwise uninterested technology journalist trying to improve the sales of the publications he works for. Having been through similar publicity flare-ups in the past with other "open source" groups, Mr. Dvorak is not showing behavior consistent with even a mediocre journalist, who would at least contact members of the community he is reporting on. He is just engaging in raw speculation, and that is it.

    This article is disingenuous, and I hope that Dvorak gets taken to task for the comments he has made. I also hope that people like him don't kill a good-faith proposal that, frankly, Wikipedia could really use, nor "poison the water" for other potential offers to help relieve the crushing bandwidth needs of Wikipedia and other related projects. It is articles like this that give journalists an awful name and destroy what is left of their profession's credibility.
  • Re:Harsh on Google (Score:3, Informative)

    by jarich ( 733129 ) on Wednesday February 16, 2005 @12:51AM (#11686046) Homepage Journal
    You could probably download the entirety of Google's usenet archives and set up your own service.

    There's a leap of logic... well, I guess if it's on the web, I can just download it? E-Donkey doesn't host everything for free. ;)

    Here's a little more substantial info: http://www.google.com/terms_of_service.html [google.com]

    The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance.

  • by Jamesday ( 794888 ) on Wednesday February 16, 2005 @02:55AM (#11686593)
    You're both a bit right. Here's a highlight view of some of the things happening during that week:
    • New squid cache servers in Paris. After network bandwidth issues there were resolved, they sped up access in parts of Europe. But they also slowed down all page saves, because saves also tell the Squids to remove (flush) related pages from their cache, and those more distant servers took longer to flush. There can be tens of thousands of flushes to do. Squid flushing/purging is now much faster, and longer term it's being modified to be taken completely out of the save loop. This one really hurt save speed for a while, as the developers sorted out what was happening and improved the way purging was done. Net result: all purging of Squids is much more efficient, and saves remain faster than they used to be. The way pages are delivered to those who aren't logged in was also improved, so style sheets are served from the Squids for them now - that makes page views for those not logged in less sensitive to Apache web server load.
    • Load balancing pain. Load balancing is what chooses which Apache web server gets the next request. The previous system wasn't very even, so many requests were getting sent to the most heavily loaded Apaches when they should have been sent to less loaded ones instead. It's been a long-standing problem for us. During the week or so you're talking about, we were testing several different replacement load balancing systems to find one that would give a good result:
      • Pen seemed better than running modified Squids on the Apaches but wasn't good enough.
      • Perlbal from the Livejournal people worked very well and gave a nicely even load balance. Brad and Mark from LJ were very helpful in getting it described and set up. We took two Apache web servers to use for this in case they used too much CPU. As it turned out they used only about 10% on each machine. But that left us two Apaches not building pages...
      • Our Squid expert wrote a replacement ICP client to run on the Apaches. That also produced an even load balance so around the end of the week we switched to using it. Freed up those two Apaches to go back to page building duty. So, until we need its other features, Perlbal isn't in use - not quite the best solution for us today (but may be in the future).

      So, we left that week with much improved load balancing for the Apaches. Much more consistent page load times now.

    • Bugs in MediaWiki 1.4 beta left the Apaches filling their available child slots. That combined with some specific web crawlers could sometimes let the crawlers take the site down by leaving no or very few free children to handle requests. Also increased Apache load a bit. The most important ones have been fixed but there's still an occasional stuck child. To deal with that we have a script restarting one Apache web server every 5 or so minutes, to ensure that it can't rise to a troublesome level.
    • The crawler/Apache child problem and a too high setting for maximum children would let the Apache server on some Memcached machines take so much RAM that the system swapped. Very bad news because that caused Memcached to respond very slowly - far too slowly, so all Apaches filled all child slots and the site appeared dead. That's been dealt with now. Not a Memcached issue as such - just the usual don't let the box swap to death situation. While tracking this down assorted other Memcached-related things were improved.
    • As a temporary workaround, the two Memcached machines we were using at that time had Apache stopped on them. That left us four Apaches short for a few days. That took us beyond the critical Apache CPU shortage point and Apache load and wait times rose significantly. Better than a dead site but not at all good. Response time drop with loss of machines isn't linear beyond a certain point and four more Apaches out of service took us beyond that point.
    • Also during that week the handling of database updates was substantially improved, so lock waits are mostly gone, wh
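    As a very rough illustration of the purge-on-save mechanism described in the first bullet above: when an edit is saved, each front-end Squid can be told to drop its cached copies of the affected URLs via an HTTP PURGE request. Everything here (hostnames, the exact set of purged paths) is hypothetical, not Wikimedia's actual code:

```python
# Hypothetical sketch of purge-on-save: after an edit is saved, every
# front-end Squid is asked to discard its cached copies of the pages
# that just changed. Hostnames and the purged-path set are assumptions.
import http.client

SQUIDS = ["cache1.example.org", "cache2.example.org"]  # hypothetical caches


def paths_to_purge(title: str) -> list:
    """URL paths invalidated when `title` is saved (variant set assumed)."""
    t = title.replace(" ", "_")
    return [f"/wiki/{t}", f"/w/index.php?title={t}&action=history"]


def purge(title: str) -> None:
    """Send an HTTP PURGE for each affected path to every Squid."""
    for host in SQUIDS:
        conn = http.client.HTTPConnection(host, 80, timeout=5)
        for path in paths_to_purge(title):
            conn.request("PURGE", path)
            conn.getresponse().read()  # drain before reusing the connection
        conn.close()


print(paths_to_purge("Main Page"))
```

    With tens of thousands of such flushes per burst of edits, and distant caches adding round-trip latency to each one, it's easy to see why moving this work out of the save loop helped.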
