Huge Traffic On Wikipedia's Non-Profit Budget

Catch up on stories from the past week (and beyond) at the Slashdot story archive

Huge Traffic On Wikipedia's Non-Profit Budget 240

Posted by timothy on Tuesday June 24, 2008 @01:11PM from the optimizing-smartitude dept.

miller60 writes "'As a non-profit running one of the world's busiest web destinations, Wikipedia provides an unusual case study of a high-performance site. In an era when Google and Microsoft can spend $500 million on one of their global data center projects, Wikipedia's infrastructure runs on fewer than 300 servers housed in a single data center in Tampa, Fla.' Domas Mituzas of MySQL/Sun gave a presentation Monday at the Velocity conference that provided an inside look at the technology behind Wikipedia, which he calls an 'operations underdog.'"

This discussion has been archived. No new comments can be posted.

Huge Traffic On Wikipedia's Non-Profit Budget

Load All Comments

Search 240 Comments Log In/Create an Account

Comments Filter:

Impressive (Score:5, Insightful)

by locokamil ( 850008 ) writes: on Tuesday June 24, 2008 @01:19PM (#23920399) Homepage

Given that their topic sites are generally in the top three for any search engine query, the volume of traffic they're dealing with (and the budget that they have!) is very impressive. I always thought that they had much beefier infrastructure than the article says.

Share
twitter facebook
- Re:Impressive (Score:5, Funny)
  
  by VeNoM0619 ( 1058216 ) writes: on Tuesday June 24, 2008 @01:21PM (#23920457)
  
  Yes, and seeing how slashdot decided to try and slashdot them also helps...
  
  Parent Share
  twitter facebook
  - Wikipedia = much more traffic than slashdot (Score:5, Interesting)
    
    by Anonymous Coward writes: on Tuesday June 24, 2008 @02:28PM (#23921887)
    
    Slashdot does .. what? 40 mbit of traffic at peak? Wikipedia
    is roughly 100 times larger [nedworks.org]. (And WP has three datacenters, not one)
    Slashdot traffic hasn't created noticeable blips on Wikipedia's radar for years.
    OTOH, if Wikipedia linked slashdot on every page slashdot would go down, if do to nothing else but bandwidth exhaustion.
    
    Parent Share
    twitter facebook
    - Re:Wikipedia = much more traffic than slashdot (Score:5, Funny)
      
      by hostyle ( 773991 ) * writes: on Tuesday June 24, 2008 @03:38PM (#23923099)
      
      OTOH, if Wikipedia linked slashdot on every page slashdot would go down, if do to nothing else but bandwidth exhaustion.
      
      Sounds like a dare to me. Gentlemen, start your packets!
      
      Parent Share
      twitter facebook
      - Re:Wikipedia = much more traffic than slashdot (Score:5, Funny)
        
        by BooRolla ( 824295 ) writes: on Tuesday June 24, 2008 @10:27PM (#23928021)
        
        If only there were some way to put links on to Wikipedia!
        
        Parent Share
        twitter facebook
    - Re:Wikipedia = much more traffic than slashdot (Score:4, Funny)
      
      by beav007 ( 746004 ) writes: on Tuesday June 24, 2008 @09:45PM (#23927575) Journal
      
      bandwidth exhaustion
      Welcome to ************ broadband tech support. How can I help?
      
      "My internet is running very slowly tonight. Why is that?"
      
      Well sir, it looks like you've been downloading from the other side of the continent. I'd say that your packets are just very tired by the time they reach you...
      
      Parent Share
      twitter facebook
- Re:Impressive (Score:4, Interesting)
  
  by sm62704 ( 957197 ) writes: on Tuesday June 24, 2008 @01:27PM (#23920571) Journal
  
  I was always impressed with how fast pages loaded, after seeing how small their operation is I'm even more impressed now!
  Go to any newspaper from the NYT to any one in a smaller city (say, Springfield's State Journal-Register) and the difference in load times is HUGE. Probably has to do with all the ads served from third party servers in the newspapers, what's the use of having a humungous server with giant pipes if your readers' pages have to wait for a flash ad served from a 486 powered by gerbils?
  If I link to the SJR form one of my journals it slows down! I mean, I can see if it's a front page slashdotting a little paper like that but come on, a user journal?
  And Wikipedia isn't all their servers serve; iinm the uncyclopedia shares servers. Impressive, indeed.
  
  Parent Share
  twitter facebook
  - Re:Impressive (Score:5, Informative)
    
    by David Gerard ( 12369 ) writes: <slashdot.davidgerard@co@uk> on Tuesday June 24, 2008 @03:03PM (#23922491) Homepage
    
    No, actually - the Wikimedia servers serve all Wikimedia projects (all the Wikipedias, Wikimedia Commons, all the other projects), but Uncyclopedia is part of Wikia, which is a private company owned by Jimmy Wales to do wikis and isn't actually linked to the Wikimedia Foundation in any way.
    
    Parent Share
    twitter facebook
  - Re:Impressive (Score:5, Informative)
    
    by kv9 ( 697238 ) writes: on Tuesday June 24, 2008 @06:10PM (#23925311) Homepage
    
    I was always impressed with how fast pages loaded, after seeing how small their operation is I'm even more impressed now!
    you can skip TFA entirely and look here for detailed info on their servers [wikimedia.org], locations, pictures, software, pretty graphs [wikimedia.org] and charts [wikimedia.org]. and lots more, just keep clicking.
    
    Parent Share
    twitter facebook
- Re:Impressive (Score:4, Interesting)
  
  by Bandman ( 86149 ) writes: <`bandman' `at' `gmail.com'> on Tuesday June 24, 2008 @01:48PM (#23921085) Homepage
  
  Yea, a single datacenter seems really risky, especially considering some of the shenanigans [google.com] that have been going on
  
  Parent Share
  twitter facebook
  - Re:Impressive (Score:5, Informative)
    
    by Achromatic1978 ( 916097 ) writes: <robert.chromablue@net> on Tuesday June 24, 2008 @02:05PM (#23921443)
    
    Except there's not. There's data centers in Europe and Asia, too, including one at some Yahoo facilities - at least on this note, the article (or summary) is utterly wrong. Single datacenter? No.
    
    Parent Share
    twitter facebook
    - Re: (Score:3, Interesting)
      
      by Bandman ( 86149 ) writes:
      
      That would make a lot more sense.
      Given the sheer amount of people who access it, it seems like the perfect use for GSLB [networkcomputing.com]
    - Re:Impressive (Score:5, Informative)
      
      by David Gerard ( 12369 ) writes: <slashdot.davidgerard@co@uk> on Tuesday June 24, 2008 @03:05PM (#23922539) Homepage
      
      Single database, though. All the databases for all the projects are in Tampa - one master for English Wikipedia and two for all the other 700+ Wikimedia projects.
      (They tried running the databases for Asian languages from the Yahoo!-sponsored datacentre in Seoul for a while, but it didn't actually work much faster than it did with everything in Tampa.)
      
      Parent Share
      twitter facebook
- Re: (Score:3, Insightful)
  
  by mcrbids ( 148650 ) writes:
  
  As somebody who has been serving the Internet for a good length of time, I remember when busy web servers serving a 10 Mb stream were "ultra-high capacity" with a Pentium II 350 Mhz chip and 256 MB of RAM.
  The reality is that today, if you pay any attention at all to performance and a reasonable architecture, modern commodity hardware has just utterly incredible delivery capacity. A cheap, 1U 4-core x86 with 8 GB of RAM and a couple of SCSI 10k drives can easily saturate a 1 Gb stream of static pages, or eve
I've always wondered... (Score:4, Insightful)

by mnslinky ( 1105103 ) writes: on Tuesday June 24, 2008 @01:22PM (#23920471) Homepage

It would be neat to have a deeper look at their budget to see how I can save money and boost performance at work. It's always nice having the newest/fastest systems out there, but it's rarely the reality.

Share
twitter facebook
- Re:I've always wondered... (Score:5, Funny)
  
  by Anonymous Coward writes: on Tuesday June 24, 2008 @02:06PM (#23921481)
  
  "It would be neat to have a deeper look at their budget to see how I can save money and boost performance at work."
  Since they are using LAMP, obviously they could save money by following Microsoft's "Get The Facts" advice!
  
  Parent Share
  twitter facebook
- Re:I've always wondered... (Score:5, Informative)
  
  by midom ( 535130 ) writes: on Tuesday June 24, 2008 @02:43PM (#23922127) Homepage
  
  I covered most of Wikipedia technology bits at my previous year MySQL Conference presentation: http://dammit.lt/uc/workbook2007.pdf [dammit.lt] (thats quite detailed report)
  
  Parent Share
  twitter facebook
- It's easy... (Score:2)
  
  by CarpetShark ( 865376 ) writes:
  
  If wikipedia is anything to go by, you just don't include a decent search engine.
  - Re:It's easy... (Score:4, Insightful)
    
    by Hillgiant ( 916436 ) writes: on Tuesday June 24, 2008 @04:23PM (#23923811)
    
    Why? If you want search, go to google. If you want an encyclopedia, go to wikipedia. Its pretty simple, really.
    
    Parent Share
    twitter facebook
The power of low standards (Score:5, Insightful)

by Itninja ( 937614 ) writes: on Tuesday June 24, 2008 @01:22PM (#23920477) Homepage

From TFA: "But losing a few seconds of changes doesn't destroy our business."

Our organizations' databases (also a non-profit) get several thousand writes per second. Losing 'a few seconds' would mean potentially hundreds of users' record changes were lost. If that happened here, it would be a huge deal. If it happened regularly, it would destroy the business.

Share
twitter facebook
- Re:The power of low standards (Score:5, Insightful)
  
  by robbkidd ( 154298 ) writes: on Tuesday June 24, 2008 @01:37PM (#23920799)
  
  Okay. So pay attention to the sentence before the one you quoted which read, "I'm not suggesting you should follow how we do it."
  
  Parent Share
  twitter facebook
- Re:The power of low standards (Score:5, Insightful)
  
  by Anonymous Coward writes: on Tuesday June 24, 2008 @01:47PM (#23921057)
  
  Don't be too harsh -- the standards are dependent on the application. Your application, by the nature of the information and its purposes, requires a different standard of reliability than Wikipedia does. You're certainly entitled to be proud of yourself for maintaining that standard.
  But don't let that turn into being derogatory about the Wikipedia operation. Wikipedia has identified the correct standard for their application, and by doing so they have successfully avoided the costs and hassle of over-engineering. To each his own...
  
  Parent Share
  twitter facebook
  - Re:The power of low standards (Score:5, Interesting)
    
    by WaltBusterkeys ( 1156557 ) * writes: on Tuesday June 24, 2008 @02:01PM (#23921379)
    
    Exactly. A bank requires "six nines" of performance (i.e., right 99.9999% of the time) and probably wants even better than that. Six nines works out to about 30 seconds of downtime per year.
    It seems like Wikipedia is getting things right 99% of the time, or maybe even 99.9% of the time ("three nines"). That's a pretty low standard relative to how most companies do business.
    
    Parent Share
    twitter facebook
    - Re:The power of low standards (Score:5, Informative)
      
      by Nkwe ( 604125 ) writes: on Tuesday June 24, 2008 @02:21PM (#23921765)
      
      A bank requires "six nines" of performance (i.e., right 99.9999% of the time) and probably wants even better than that.
      
      Banks don't require "six nines"; banks require that no data (data being money), once committed, get lost. The "nines" rating refers to the percentage of time a system is online, working, and available to its users. It does not refer to the percentage of acceptable data loss. It is acceptable for bank systems to have downtime, scheduled maintenance, or "closing periods" -- all of these eat into a "nines" rating, none of which lead to data loss.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by WaltBusterkeys ( 1156557 ) * writes:
        
        The nines can refer to both.
        I agree that banks can't withstand data loss, but they can withstand data errors. If there's a 30-second period per year when data doesn't properly move, and that requires manual cleanup, that's acceptable.
        
        Re:The power of low standards (Score:4, Insightful)
        
        by PMBjornerud ( 947233 ) writes: on Tuesday June 24, 2008 @04:36PM (#23923997)
        
        If there's a 30-second period per year when data doesn't properly move, and that requires manual cleanup, that's acceptable.
        And if there is a 1-hours downtime, EVER, you just blew through the scheduled downtime for the next 120 years.
        "Six nines" is meaningless. Unrealistic.
        It is a promise that you cannot be hit by a single accident, fuckup, pissed-off-employee or act of god.
        
        Parent Share
        twitter facebook
        
        Re: (Score:3, Insightful)
        
        by Qzukk ( 229616 ) writes:
        
        You can achieve 100% service availability by clustering
        Is that where when I run "DROP TABLE reallyimportanttable;" it drops it on all the servers at once?
      - Re: (Score:3, Insightful)
        
        by Waffle Iron ( 339739 ) writes:
        
        Indeed. Some of us are old enough to remember the days of "banker's hours" and before ATMs, when banks used to make their customers deal with less than "one two" (20%) availability.
        
        Re:The power of low standards (Score:5, Funny)
        
        by AK Marc ( 707885 ) writes: on Tuesday June 24, 2008 @04:25PM (#23923843)
        
        Screw that, I want a bank with six twos of performance. 22.2222%. Of course, any number of nines is easy to achieve. Want six nines? 9.99999% is easy.
        
        Parent Share
        twitter facebook
        
        Re: (Score:3, Funny)
        
        by Kingrames ( 858416 ) writes:
        
        Screw that, it needs to be a prime number.
        or at least irrational.
      - Re: (Score:2, Interesting)
        
        by Anonymous Coward writes:
        
        Right, banks actually traditionally used such techniques as planned downtime to allow for maintenance. The "banker's hours" allowed for a large period of time, daily, where little-to-no 'data' was changing in the system and the system could be 'balanced'.
    - Re: (Score:3, Insightful)
      
      by astrotek ( 132325 ) writes:
      
      Thats amazing considering I get an error page on bank of america around 5% of the time if I move to quickly though the site.
- Re: (Score:2)
  
  by ericspinder ( 146776 ) writes:
  
  Losing 'a few seconds' would mean potentially hundreds of users' record changes were lost. If that happened here, it would be a huge deal.
  If you don't deal with financial data, it's likely that even your business would survive should such an event like that happens. Sure if it happens all the time users would flee, but I haven't seen such problems at Wikipedia. He wasn't talking about doing it regularly, just that when disaster does strike, no pointy haired guy appears to assign blame.
- Re: (Score:2, Informative)
  
  by MinuteElectron ( 1179725 ) writes:
  
  Changes are never just lost, when an error does happen and the action cannot be completed then it is rejected and the user notified of this so they can try what they were doing again. You have vastly overstated the severity of such issues.
- Re: (Score:3, Interesting)
  
  by az-saguaro ( 1231754 ) writes:
  
  Your reasoning may be a bit specious. If your databases get "several thousand writes per second", it sounds like this may be massive underuse of your bandwidth - i.e. your servers or databases may be able to handle hundreds of thousands or millions of writes per second. If a few seconds were lost or went down, then the incoming traffic might get cached or queued, waiting for services to come back on line. Once the connection is re-established, the write backlog might take only a few seconds or a few frac
I was just thinking that (Score:3, Funny)

by imstanny ( 722685 ) writes: on Tuesday June 24, 2008 @01:22PM (#23920481)

Every time I Google something, Wikipedia comes near the top most of the time. Maybe that's why Google doesn't want to disclose its processing power, it may very will be a lot smaller than people assume.

Share
twitter facebook
- Re: (Score:2)
  
  by Bandman ( 86149 ) writes:
  
  Ever pay attention to the render times, though?
  Their infrastructure is scary-massive, from almost every report [datacenterknowledge.com]
- Re:I was just thinking that (Score:5, Interesting)
  
  by Chris Burke ( 6130 ) writes: on Tuesday June 24, 2008 @02:32PM (#23921969) Homepage
  
  I don't actually know anything about the total computing power Google employs, but I do know that they will purchase on the order of 1,000-10,000 processors merely to evaluate them prior to making a real purchase.
  
  Parent Share
  twitter facebook
  - Re:I was just thinking that (Score:4, Interesting)
    
    by kiwimate ( 458274 ) writes: on Tuesday June 24, 2008 @05:14PM (#23924517) Journal
    
    You know what I thought was interesting? This story [cnet.com] (which was linked to from this /. story titled A Look At the Workings of Google's Data Centers [slashdot.org] contained the following snippets.
    On the one hand, Google uses more-or-less ordinary servers. Processors, hard drives, memory--you know the drill.
    and
    While Google uses ordinary hardware components for its servers...
    But this was immediately followed by:
    it doesn't use conventional packaging. Google required Intel to create custom circuit boards.
    For some reason I'd always believed they used pretty much standard components in everything.
    
    Parent Share
    twitter facebook
- - Re: (Score:3, Interesting)
    
    by imstanny ( 722685 ) writes:
    
    But why would they think it was a bad thing to expose? The whole "Look what we can do with so little" angle seems appealing; efficiency is something to boast about nowadays.
    On one hand, you're right, efficiency is admirable. But on the other hand, if Google has insane amounts of processing power, it would likely mean much higher barriers to entry for its competitors. The threat of Google's power in processing such data could deter others from even attempting to compete with Google. After all, when Google started it was only funded with a few hundred thousand dollars.
  - Re:I was just thinking that (Score:5, Insightful)
    
    by dubl-u ( 51156 ) * writes: <2523987012@pota . t o> on Tuesday June 24, 2008 @03:08PM (#23922613)
    
    But why would they think it was a bad thing to expose? The whole "Look what we can do with so little" angle seems appealing; efficiency is something to boast about nowadays.
    Turn it around. What does Google gain from exposing data about their internal performance?
    Maybe they do well because they are amazingly CPU-efficient on a per-query basis. Maybe it's the opposite; they may be masters at lavishing CPU on every query, but know how to do that very cheaply. Most likely, it's a clever mix of the two.
    Regardless, Google's engineering-fu and operations-fu are mighty, and a major competitive advantage. Releasing detailed data doesn't boost their reputation, as everybody already knows they are great. But it does give potential competitors an idea of what works well, making it easier for them to catch up with Google. As a rule, expect that any details you see from inside Google are old, boring, or vague. As Intel's Andy Grove said, "Only the paranoid survive."
    
    Parent Share
    twitter facebook
- - Re: (Score:2)
    
    by Albanach ( 527650 ) writes:
    
    Yes, but anytime anyone googles anything, google has to do processing.
    
    I'd have thought they'd use a caching solution just like wikipedia. After all, just as Wikipedia has some very popular pages and some less so, Google has many popular searches and many less so. Wouldn't they cache these? After all if you're dealing with millions of searches for 'george carlin' you wouldn't want to go query your entire DB every time, would you?
Easy to Increase the budget or add servers (Score:5, Funny)

by Subm ( 79417 ) writes: on Tuesday June 24, 2008 @01:23PM (#23920513)

How hard can it be to increase the budget or add more servers?
Just go to the Wikipedia page with those numbers and change them. You don't even need to have an account.

Share
twitter facebook
- Re:Easy to Increase the budget or add servers (Score:5, Funny)
  
  by elrous0 ( 869638 ) * writes: on Tuesday June 24, 2008 @02:31PM (#23921929)
  
  In their defense, if you're going to run your entire site off a single server farm,a coastal city in Florida is the logical place to put it.
  
  Parent Share
  twitter facebook
- - - Re: (Score:2)
      
      by khellendros1984 ( 792761 ) writes:
      
      I'm fine with truthiness, myself.
Maybe... (Score:3, Funny)

by nakajoe ( 1123579 ) writes: on Tuesday June 24, 2008 @01:28PM (#23920577)

Datacenterknowledge.com might want to take lessons from Wikipedia as well. Slashdotted...

Share
twitter facebook
Note to self (Score:5, Funny)

by Anita Coney ( 648748 ) writes: on Tuesday June 24, 2008 @01:28PM (#23920591) Homepage

If you ever find yourself in a flamewar on Wikipedia you cannot win, bomb Tampa, Florida out of existence.

Share
twitter facebook
- Re:Note to self (Score:5, Funny)
  
  by canajin56 ( 660655 ) writes: on Tuesday June 24, 2008 @01:43PM (#23920949)
  
  That's your solution to everything.
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Funny)
    
    by TubeSteak ( 669689 ) writes:
    
    That's your solution to everything.
    I did ask if you wouldn't prefer a nice game of Chess.
    -WOPR
- Re:Note to self (Score:5, Interesting)
  
  by Ron Bennett ( 14590 ) writes: on Tuesday June 24, 2008 @01:48PM (#23921073) Homepage
  
  Or do a hurricane dance, and let nature do its thing...
  Having all their servers in Tampa, FL (of all places given hurricanes, frequent lightning, flooding, etc there) doesn't seem too smart - I would have thought, given Wikipedia's popularity, their servers would be geographically spread out in multiple locations.
  Though to do that adds a level of complexity and costs that even many for-profit ventures, such as Slashdot, likely can't afford / justify; Slashdot's servers are in one place - Chicago ... to digress a bit, I notice this site's accessibility (ie. more page not found / timeouts lately) has been spotty since the servers move.
  Ron
  
  Parent Share
  twitter facebook
  - Re:Note to self (Score:5, Informative)
    
    by OverlordQ ( 264228 ) writes: on Tuesday June 24, 2008 @02:22PM (#23921791) Journal
    
    They're not all in Tampa, they have a bunch in Netherlands and a few more in South Korea.
    
    Parent Share
    twitter facebook
    - - Re: (Score:3, Funny)
        
        by Rick Genter ( 315800 ) writes:
        
        an invading country which the one superpower of the world already hates?
        China hates North Korea?
  - Re: (Score:2)
    
    by LWATCDR ( 28044 ) writes:
    
    Tampa hasn't been hit by many Hurricanes. They don't have issues with flooding that I know about and lightning is lightning. It can happen anywhere just do your best to protect your systems from it.
    If you are a few miles inland in Florida Hurricanes are not that big of an issue. If you have a good backup generator then it isn't that big of a problem.
    Oh did I mention I was born, live, and work in Florida. My office was hit by Frances, Jean, and Wilma. Total damage to the office... Nothing. Total Damage to my
  - Re: (Score:2)
    
    by skeeto ( 1138903 ) writes:
    
    Tampa is pretty safe from all that. I have grandparents that live in St. Petersburg (right next to Tampa) and they have never had any damage or been in danger from the weather. If Tampa had major flooding, then pretty much the whole state of Florida will be submerged too. At that point Wikipedia is low on the list of things to worry about.
  - Re: (Score:2)
    
    by colfer ( 619105 ) writes:
    
    FutureQuest is a highly rated web host with its data center in Orlando, FL. It has never gone down, even in hurricanes. Very occasionally the network connects or upstreams fritz, but not due to storms (usually it's BGP, etc.).
    If you recall there was some heroic blogging out of New Orleans after Katrina. Some guys at an ISP in a tall building downtown kept themselves wired, and described hard core telecom types patrolling the streets. Surreal.
More importantly (Score:5, Interesting)

by wolf12886 ( 1206182 ) writes: on Tuesday June 24, 2008 @01:36PM (#23920755)

I don't care how few servers they have, whats more interesting to me is that they run an ultra-high traffic site, which they aren't having trouble paying for, and do it without adds.

Share
twitter facebook
- Simplicity (Score:5, Interesting)
  
  by wsanders ( 114993 ) writes: on Tuesday June 24, 2008 @02:01PM (#23921373) Homepage
  
  Although much of the Mediawiki software is a hideous twitching blob of PHP Hell, the base functionality is fairly simple and run perpetually and scale massively as long as you don't mess with it.
  What spoils a lot of projects like this is the constant need for customization. Wikimedia essentially can't be customized (except for plugins obviously, which you install at your own peril) and that is a big reason why it scales so massively.
  As for Wikipedia itself, I suspect it is massively weighted in favor of reads. That simplifies circumstances a lot.
  
  Parent Share
  twitter facebook
- Sure they do it without ads... (Score:4, Informative)
  
  by DerekLyons ( 302214 ) writes: <fairwater@@@gmail...com> on Tuesday June 24, 2008 @02:20PM (#23921751) Homepage
  
  Sure they do without ad income. But they also do it without having to pay salaries, or co location fees, or bandwidth costs... (I know they pay some of those, but they also get a metric buttload of contributions in kind.)
  When your costs are lower, and your standard of service (and content) malleable, it is easy to live on a smaller income.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by quanticle ( 843097 ) writes:
    
    But they also do it without having to pay salaries, co location fees, or bandwidth costs...
    Well, as far as salaries go, yeah, they don't have to pay for a full team of developers and administrators for the business, but they do need to pay people to go and check on the servers, replace faulty hardware, etc. Also, as far a co-location costs go, I'd say that running your own data center (i.e. providing your own electricity, cooling, backup power supplies, etc.) can't be cheap either.
- That's easier than it sounds (Score:3, Funny)
  
  by Cajun Hell ( 725246 ) writes:
  
  I don't care how few servers they have, whats more interesting to me is that they run an ultra-high traffic site, which they aren't having trouble paying for, and do it without adds.
  I can do that too; I just emulate the adds. x+y is the same as x-(0-y). You have to be careful to use signed numbers for everything (or else have a lot of casting), but that's not really all that hard.
Out like a light (Score:2)

by Joebert ( 946227 ) writes:

300 servers housed in a single data center in Tampa, Fla.

Did Wikipedia go down when hurricanes Chralie/etc came through a few years ago ?
I lost power for about a week when that happened and I only live about 15 miles from Tampa, right over the Courtney Campbel Causeway actually.
- Re: (Score:2, Informative)
  
  by timstarling ( 736114 ) writes:
  
  We've never lost external power while we've been at Tampa, but if we did, there are diesel generators. Not that it would be a big deal if we lost power for a day or two. There's no serious problem as long as there's no physical damage to the servers, which we're assured is essentially impossible even with a direct hurricane strike, since the building is well above sea-level and there are no external windows.
  - Re: (Score:2)
    
    by Joebert ( 946227 ) writes:
    
    Well then, I guess we all know where I'm going next time a hurricane rolls through. :)
Off-topic, I know, but...what about /.'s hardware? (Score:5, Interesting)

by kiwimate ( 458274 ) writes: on Tuesday June 24, 2008 @01:44PM (#23920963) Journal

I.e. the promised follow-up to this story [slashdot.org] about moving to the new Chicago datacenter? You know, the one where Mr. Taco promised a follow-up story "in a few days" about the "ridiculously overpowered new hardware".
I was quite looking forward to that, but it never eventuated, unless I missed it. It's certainly not filed under Topics->Slashdot.

Share
twitter facebook
Works great because it's not "Web 2.0" (Score:5, Insightful)

by Animats ( 122034 ) writes: on Tuesday June 24, 2008 @01:45PM (#23921009) Homepage

Most of Wikipedia is a collection of static pages. Most users of Wikipedia are just reading the latest version of an article, to which they were taken by a non-Wikipedia search engine. So all Wikipedia has to do for them is serve a static page. No database work or page generation is required.
Older revisions of pages come from the database, as do the versions one sees during editing and previewing, the history information, and such. Those operations involve the MySQL databases. There are only about 10-20 updates per second taking place in the editing end of the system. When a page is updated, static copies are propagated out to the static page servers after a few tens of seconds.
Article editing is a check-out/check in system. When you start editing a page, you get a version token, and when you update the page, the token has to match the latest revision or you get an edit conflict. It's all standard form requests; there's no need for frantic XMLHttpRequest processing while you're working on a page.
Because there are no ads, there's no overhead associated with inserting variable ad info into the pages. No need for ad rotators, ad trackers, "beacons" or similar overhead.

Share
twitter facebook
- Re: (Score:2)
  
  by internic ( 453511 ) writes:
  
  Oh really? Because O'Reill seems to think it is [oreillynet.com], and I thought he was the main pusher of this terminology. Is the term Web 2.0 actually meaningful?
  - Re: (Score:2, Informative)
    
    by Tweenk ( 1274968 ) writes:
    
    If you haven't noticed, "Web 2.0" is a long estabilished buzzword [wikipedia.org] - which means it carries little meaning, but it looks good in advertising. Just like "information superhighway", "enterprise feature" or "user friendly".
- so what's "Web 2.0"? (Score:2)
  
  by gbjbaanb ( 229885 ) writes:
  
  I take it that "Works great because it's not "Web 2.0" " means its fast and dynamic, whereas Web 2.0 generally means slow and dynamic.
  The technology behind it is irrelevant, if content is provided by users then its web 2.0 (as I understan the term), so Wikipedia definitely is web 2.0, its just that they have some fancy caching mechanism to get the best of both worlds. If only more systems were built in a pragmatic way instead of worrying about what its "supposed" to be.
  - Re: (Score:2)
    
    by quanticle ( 843097 ) writes:
    
    I take it that "Works great because it's not "Web 2.0" means that its fast and dynamic, whereas Web 2.0 generally means slow and dynamic.
    Web 2.0 is a shorthand version of saying "dynamic pages served using Asynchronous JavaScript and XML (AJAX)". Now, if you reread the parent, you'll see that he says:
    Most of Wikipedia is a collection of static [emphasis mine] pages. Most users of Wikipedia are just reading the latest version of an article... So all Wikipedia has to do for them is serve a static page.
    In other words, the parent is saying that Wikipedia is effective because avoids any sort of dynamism for the majority of use cases. Heck, even article editing isn't dynamic on Wikipedia. When you click the edit link, you're taken to a separate page which has a prepopulated form with the wikitext of the article. The only bit of dynamic co
- Nonsense. Wikipedia is THE web 2.0 (Score:5, Insightful)
  
  by Nicolas MONNET ( 4727 ) writes: <nicoaltiva@gm a i l.com> on Tuesday June 24, 2008 @02:23PM (#23921799) Journal
  
  Web 2.0 is not just about flashy Ajax or what not, it's about user generated dynamic content. WP's "everything is a wiki" architecture might /look/ a bit archaic compared to fancy schmancy dynamic rotating animated gradient-filled forums, but it's much more powerful.
  Moreover, WP is not a collection of static pages, if you're logged in at least, every pages is dynamically generated, and every page's history is updated within a few seconds.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by quanticle ( 843097 ) writes:
    
    Moreover, WP is not a collection of static pages, if you're logged in at least, every pages is dynamically generated, and every page's history is updated within a few seconds.
    That's not how it works. If you're just browsing Wikipedia, you're just looking at a collection of static pages that were generated earlier and cached. Only when you actually edit the page and save it is the page updated.
    If Wikipedia had to freshly create every page for every user, even computational power on the order possessed by Google wouldn't be up to the task.
Confused by the title (Score:5, Insightful)

by Just Some Guy ( 3352 ) writes: <kirk+slashdot@strauser.com> on Tuesday June 24, 2008 @01:52PM (#23921179) Homepage Journal

What does "Non-Profit Budget" mean, anyway? There are non-profits bigger than the company I work for. Non-profit isn't the same as poorly financed.

Share
twitter facebook
- Re: (Score:3, Interesting)
  
  by quanticle ( 843097 ) writes:
  
  Good point. Perfect example: the Bill and Melinda Gates Foundation has a budget of billions of dollars, easily exceeding the budget of many private corporations.
Link to wikipedia? (Score:5, Funny)

by Luyseyal ( 3154 ) writes: <swaters@NoSpAM.luy.info> on Tuesday June 24, 2008 @01:54PM (#23921229) Homepage

The summary was wrong to include a link to the Wikipedia homepage without a Wikipedia link about Wikipedia [wikipedia.org] in case you don't know what Wikipedia is. I myself had to Google Wikipedia to find out what Wikipedia was so I am providing the Wikipedia link about Wikipedia in case others were likewise in the dark regarding Wikipedia.
-l
P.s., Wikipedia.

Share
twitter facebook
- Re:Link to wikipedia? (Score:5, Funny)
  
  by hansamurai ( 907719 ) writes: <hansamurai@gmail.com> on Tuesday June 24, 2008 @02:53PM (#23922305) Homepage Journal
  
  Wait, what's this Google thing you're talking about?
  
  Parent Share
  twitter facebook
  - Re:Link to wikipedia? (Score:5, Funny)
    
    by hansamurai ( 907719 ) writes: <hansamurai@gmail.com> on Tuesday June 24, 2008 @02:56PM (#23922359) Homepage Journal
    
    Nevermind, found it:
    http://www.google.com/search?q=google [google.com]
    
    Parent Share
    twitter facebook
  - Re: (Score:2, Funny)
    
    by felipekk ( 1007591 ) writes:
    
    http://en.wikipedia.org/wiki/Google [wikipedia.org]
- Re: (Score:2, Funny)
  
  by srollyson ( 1184197 ) writes:
  
  [citation needed]
Distributed computing? (Score:2)

by Bombula ( 670389 ) writes:

I'm kind of surprised there's not been more talk about a distributed computing effort for wikipedia. Seems like it would be a good candidate. I'm more of an honorary geek than an actual hardcore tech-savvy person - does anyone know if a distributed computing effort could work? I don't really see any problem with data integrity, since it's not confidential and is open to editing by definition (except maybe user info?), so it'd basically be a big assymetric RAID, right? I would worry more about it having f
Servers and locations (Score:2, Informative)

by Anonymous Coward writes:

According to http://meta.wikimedia.org/wiki/Wikimedia_servers [wikimedia.org] Wikimedia (and by extension, Wikipedia):
"About 300 machines in Florida, 26 in Amsterdam, 23 in Yahoo!'s Korean hosting facility."
also: http://meta.wikimedia.org/wiki/Wikimedia_partners_and_hosts [wikimedia.org]
Obviously if you're not in Silicon Valley (Score:2)

by heroine ( 1220 ) writes:

Obviously U can pay much less outside Silicon Valley. If you want investment capital & lots of customers you have to be physically in Silicon Valley and pay the millions of dollars. Even Kiwipedia had to move its office to San Francisco & the data center is going to follow if they can get enough donations.
What about the Internet Archive (Score:5, Informative)

by Xtifr ( 1323 ) writes: on Tuesday June 24, 2008 @03:51PM (#23923321) Homepage

Wikipedia's pretty impressive, but how about the Internet Archive [archive.org]? Also a non-profit that doesn't run ads, and not only do they, like Google and Yahoo, "download the Internet" on a regular basis, but the Archive makes backups! Plus, they have huge amounts of streaming audio and video (pd or creative-commons). The first time I ever heard the word "Petabyte" being discussed in practical, real world terms (as in, "we're taking delivery next month") was in connection with the Internet Archive. Several years ago. And it was being used in the plural! :)
They may not have as much incoming traffic as Wikipedia, but the sheer volume of data they manage is truly staggering. (Heck, they have multiple copies of Wikipedia!) When I do download something from there, it's typically in the 80-150 MB range, and 1 or 2 GB in a pop isn't unusual, and I know I'm not the only one downloading, so their bandwidth bills must still be pretty impressive.
The fact that these two sites manage to survive and thrive the way they do never ceases to amaze me.

Share
twitter facebook
Where are the PHP/MySQL doom criers? (Score:3, Insightful)

by trawg ( 308495 ) writes: on Tuesday June 24, 2008 @09:48PM (#23927599) Homepage

I notice they are conspicuously absent in the comments. They tend to jump up and down in any other post about PHP and MySQL. This is such a great example of the scalability and performance of it WHEN USED CORRECTLY.

Share
twitter facebook
- Re: (Score:2)
  
  by Itninja ( 937614 ) writes:
  
  you need to focus on a handful highly-talented IT people rather than an army of droids
  As long as these IT people are willing to work well below the industry pay-scale (often for free), then yeah, that would work great. Notice that most of the Wiki IT staff also have to have 'day jobs' to feed/clothe/house themselves.
  - Re: (Score:3, Insightful)
    
    by bsDaemon ( 87307 ) writes:
    
    Which is somehow different from any other open source project how?
    - Re: (Score:2)
      
      by Itninja ( 937614 ) writes:
      
      It's not. But the parent was implying that corporation should follow the same model. I was just pointing out that for-profit companies need to pay their people a bit more than non-profit love-in projects like Wikipedia.
- Re:Some thoughts (Score:5, Insightful)
  
  by TheLazySci-FiAuthor ( 1089561 ) writes: <thelazyscifiauthor@gmail.com> on Tuesday June 24, 2008 @01:35PM (#23920727) Homepage Journal
  
  "... you need to focus on a handful highly-talented IT people rather than an army of droids."
  This is so true; I've always said, "you get what you pay for."
  Do you want to pay for software, or do you want to pay for people?
  Only one can create the other.
  
  Parent Share
  twitter facebook
  - Re:Some thoughts (Score:5, Funny)
    
    by morgan_greywolf ( 835522 ) * writes: on Tuesday June 24, 2008 @01:44PM (#23920983) Homepage Journal
    
    Do you want to pay for software, or do you want to pay for people?
    Only one can create the other.
    Oh, gods, let's hope so!
    
    Parent Share
    twitter facebook
- Re:What is the role of Open Source (Score:5, Interesting)
  
  by KokorHekkus ( 986906 ) writes: on Tuesday June 24, 2008 @01:46PM (#23921027)
  
  The wiki software, MediaWiki, was written for Wikipedia and is licensed under the GPL ( http://www.mediawiki.org/wiki/How_does_MediaWiki_work%3F [mediawiki.org]. According to Wikipedia they use MySQL as their database and run it all on Linux servers.
  
  Parent Share
  twitter facebook
- Re: (Score:3, Insightful)
  
  by guruevi ( 827432 ) writes:
  
  I don't know what else but open source you could use especially on the database side. You have only a few choices:
  Microsoft ($$$) (approx. $50,000 per server per year in licensing costs since it's a public (unlimited CAL) enterprise-level site)
  IBM ($$) (approx. $500,000 per year for leasing the whole operation, another load for support)
  Oracle ($) (approx. $20,000 per backend and about 30 contractors for the next 5 years for the implementation)
  Linux, MySQL, PHP (Free)
  Not to mention, with Microsoft you'll nee
- Re: (Score:2)
  
  by Carnildo ( 712617 ) writes:
  
  I wonder how much of a role open source software is playing in Wikipedia's operations. How much is it? Anyone in the know?
  I'm not aware of any software that Wikipedia uses that isn't open-source. They've got a very strong commitment to the free-content movement -- sometimes a little too strong: the only sound format they accept is Ogg Vorbis, the only video format Ogg Theora
- Re: (Score:2, Funny)
  
  by nickull ( 943338 ) writes:
  
  Not to mention hurricanes and faulty electronic voting machines.... ;-)
- Re: (Score:2, Informative)
  
  by midom ( 535130 ) writes:
  
  add power costs, difficulty to travel to, possible flooding, etc. it is all historic reasons, we can't just migrate datacenters at wish - that requires quite a high investment. and the datacenter choice was simply because the founder lived there in 2001, when all we needed was single server. --Domas
- Re:What amazes me... (Score:5, Interesting)
  
  by ceejayoz ( 567949 ) writes: <cj@ceejayoz.com> on Tuesday June 24, 2008 @02:08PM (#23921533) Homepage Journal
  
  Slashdot is great at taking down sites on crappy shared hosting, but anything with a decently configured dedicated server will likely survive just fine.
  Wikipedia's probably getting hit with hundreds of times the traffic Slashdot is at all times.
  
  Parent Share
  twitter facebook
  - Re:What amazes me... (Score:4, Insightful)
    
    by dubl-u ( 51156 ) * writes: <2523987012@pota . t o> on Tuesday June 24, 2008 @03:38PM (#23923113)
    
    Slashdot is great at taking down sites on crappy shared hosting, but anything with a decently configured dedicated server will likely survive just fine.
    Sounds right to me. I don't have any terribly recent data on a slashdotting, but I think the Slashdot-as-server-killer meme is pretty stale.
    Looking at some old data and extrapolating, I'd guess a modern slashdotting would peak at 200 pageviews/min, or ~3 pv/sec. Get mentioned on Good Morning America or Oprah, on the other hand, and you're looking at 20-200 pageviews/sec. I'd guess that getting on Digg's front page is somewhere in the 20-40 pv/sec range.
    A slashdotting was a big deal back when every nerd used it and the Internet was mainly nerds. Neither is true anymore.
    
    Parent Share
    twitter facebook
- Re: (Score:2)
  
  by quanticle ( 843097 ) writes:
  
  To be quite honest, I'd say that the Slashdot surge is probably a drop in the bucket as far as Wikipedia is concerned. I mean, they're the top result for loads of Google queries, and plenty of people go straight to Wikipedia when they need to look something up.
- - Re: (Score:2)
    
    by Doug Neal ( 195160 ) writes:
    
    Correct [alexa.com]
    - Re: (Score:3, Insightful)
      
      by HarvardAce ( 771954 ) writes:
      
      (link to Alexa graph)
      One problem about Alexa is that it only gathers statistics from those who install the Alexa toolbar...I would tend to think that the Slashdot crowd would be a group that predominantly avoids installing that sort of thing. I actually think there was a discussion on this on Slashdot many months ago.
      
      That said, I'm sure that the traffic to Wikipedia is probably several orders of magnitude higher than that of Slashdot.
- Re: (Score:2)
  
  by IamTheRealMike ( 537420 ) * writes:
  
  Lots of ISPs run transparent caching proxy servers so wikipedia could be cached if they wanted. They set their headers to prevent that though, presumably so changes show up immediately.
  - - Re: (Score:3, Informative)
      
      by adri ( 173121 ) writes:
      
      It exists. Its called "validators". There are strong and weak validators. You can Vary on your validators, and thus have multiple copies of the same object but in different forms (so given a text document, you can have it in different languages, compressed/uncompressed, etc.)
      Your browser will then quite happily ask the origin server (which may not be the "origin" origin) for an object and provide validators. (Last-Modified -> If-Modified-Since; ETag->If-None-Match) which the origin (or the cache which

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Impressive (Score:5, Insightful)

Re:Impressive (Score:5, Funny)

Wikipedia = much more traffic than slashdot (Score:5, Interesting)

Re:Wikipedia = much more traffic than slashdot (Score:5, Funny)

Re:Wikipedia = much more traffic than slashdot (Score:5, Funny)

Re:Wikipedia = much more traffic than slashdot (Score:4, Funny)

Re:Impressive (Score:4, Interesting)

Re:Impressive (Score:5, Informative)

Re:Impressive (Score:5, Informative)

Re:Impressive (Score:4, Interesting)

Re:Impressive (Score:5, Informative)

Re: (Score:3, Interesting)

Re:Impressive (Score:5, Informative)

Re: (Score:3, Insightful)

I've always wondered... (Score:4, Insightful)

Re:I've always wondered... (Score:5, Funny)

Re:I've always wondered... (Score:5, Informative)

It's easy... (Score:2)

Re:It's easy... (Score:4, Insightful)

The power of low standards (Score:5, Insightful)

Re:The power of low standards (Score:5, Insightful)

Re:The power of low standards (Score:5, Insightful)

Re:The power of low standards (Score:5, Interesting)

Re:The power of low standards (Score:5, Informative)

Re: (Score:2)

Re:The power of low standards (Score:4, Insightful)

Re: (Score:3, Insightful)

Re: (Score:3, Insightful)

Re:The power of low standards (Score:5, Funny)

Re: (Score:3, Funny)

Re: (Score:2, Interesting)

Re: (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2, Informative)

Re: (Score:3, Interesting)

I was just thinking that (Score:3, Funny)

Re: (Score:2)

Re:I was just thinking that (Score:5, Interesting)

Re:I was just thinking that (Score:4, Interesting)

Re: (Score:3, Interesting)

Re:I was just thinking that (Score:5, Insightful)

Re: (Score:2)

Easy to Increase the budget or add servers (Score:5, Funny)

Re:Easy to Increase the budget or add servers (Score:5, Funny)

Re: (Score:2)

Maybe... (Score:3, Funny)

Note to self (Score:5, Funny)

Re:Note to self (Score:5, Funny)

Re: (Score:3, Funny)

Re:Note to self (Score:5, Interesting)

Re:Note to self (Score:5, Informative)

Re: (Score:3, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

More importantly (Score:5, Interesting)

Simplicity (Score:5, Interesting)

Sure they do it without ads... (Score:4, Informative)

Re: (Score:2)

That's easier than it sounds (Score:3, Funny)

Out like a light (Score:2)

Re: (Score:2, Informative)

Re: (Score:2)

Off-topic, I know, but...what about /.'s hardware? (Score:5, Interesting)

Works great because it's not "Web 2.0" (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2, Informative)

so what's "Web 2.0"? (Score:2)

Re: (Score:2)

Nonsense. Wikipedia is THE web 2.0 (Score:5, Insightful)

Re: (Score:2)

Confused by the title (Score:5, Insightful)

Re: (Score:3, Interesting)

Link to wikipedia? (Score:5, Funny)

Re:Link to wikipedia? (Score:5, Funny)

Re:Link to wikipedia? (Score:5, Funny)

Re: (Score:2, Funny)

Re: (Score:2, Funny)

Distributed computing? (Score:2)

Servers and locations (Score:2, Informative)