PostgreSQL on Big Sites? 89

Posted by Cliff
from the a-little-dog-on-big-sites dept.
An anonymous reader asks: "I've been using PostgreSQL for years on small projects, and I have an opportunity to migrate my company's websites from Oracle to an open-source alternative. It would be good to be able to show the PHBs that PostgreSQL is a viable candidate, but I'm unable to find a list of high-traffic sites that use it. Does anyone know of any popular sites that run PostgreSQL?"

  • Several examples (Score:5, Informative)

    by IO ERROR (128968) * <error&ioerror,us> on Monday March 21, 2005 @12:15PM (#12000720) Homepage Journal
    See, for instance, PostgreSQL Case Studies [postgresql.org]. The pgsql-advocacy [postgresql.org] mailing list has some more: "Finally, a list of *big* companies using PostgreSQL for *serious* projects." Why use PostgreSQL? Here's why [postgresql.org], with some examples.
    • Finally, a list of *big* companies using PostgreSQL for *serious* projects.

      I genuinely like PostgreSQL and have used it extensively. I want to see a list of big name users as much as anyone. But that "list of *big* companies" is topped by:

      * Affymetrix
      * Afilias
      * BASF
      * Cognitivity
      * Journyx
      * Royal
      * The American Chemical Society
      * Tsutaya

      With the possible exception of BASF, these aren't exactly household names. I have no doubt that these are huge organizations with serious database needs, but what Postgr
      • Re:Several examples (Score:1, Informative)

        by Anonymous Coward
        Northrop Grumman, US Navy:

        Uses a J2EE application to store user profiles for the Navy Enterprise Portal and the Fleet Numerical Meteorology and Oceanography Center(FNMOC) Portal using JBoss application server and PostgreSQL database. Expose application via SOAP and RMI interfaces.

        http://kennethbowen.com/kbresume.html

  • The .org registry? (Score:5, Informative)

    by tzanger (1575) on Monday March 21, 2005 @12:17PM (#12000752) Homepage

    Story [computerworld.com.au]

    • by jadavis (473492)
      and the .info registry

      Both are run by Afilias, which is a big user and big developer of PostgreSQL. They're the ones that did the work on the Slony-I [slony.info] replication server.
  • by oreaq (817314) on Monday March 21, 2005 @12:25PM (#12000869)
    MadPenguin has an interview [madpenguin.org] with Josh Berkus, one of the core team members of PostgreSQL.
    • I want to draw attention to this interview. This is a great interview that has good answers to the original question.

      In particular, Josh talks about Fujitsu's involvement (a $43 billion company).

      A good quote:

      Josh Berkus: Basically, the head of their open source business applications division, Mr. Takayuki Nakazawa, said, "We are committed to helping make PostgreSQL the leading database management system."
  • by Naikrovek (667) <jjohnson@nospaM.psg.com> on Monday March 21, 2005 @12:27PM (#12000904)
    I've never used PostgreSQL so I can't and won't say anything about it other than this: Make sure Postgres does everything you need and can perform similarly to Oracle in your environment.

    We momentarily thought about dropping Oracle for PGSQL at my last company, but after we hired a consultant to do everything he could with Postgres to improve performance, Oracle was still a clear winner for us.

    I don't know if he was incompetent or what, but the performance numbers weren't even close with what we needed it to do.

    If your database will run just as well on PostgreSQL, I say go for it. If you go with PostgreSQL and it doesn't perform as well as Oracle in your environment, your management will have serious doubts about open-source software from then on, and that's a stain that is hard to get rid of.

    in short: choose based on your needs, not based on the fact that one is open and the other isn't.
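One way to ground the "can it perform similarly in your environment" question is a small timing harness run against each candidate with your own queries. The sketch below is only an illustration: `time_query` and `summarize` are hypothetical helper names, and an in-memory SQLite database stands in for the real Oracle and PostgreSQL connections, which would each be reached through their own DB-API driver.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), repeats=50):
    """Run one query `repeats` times and return per-run latencies in seconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        # conn.execute is a sqlite3 shortcut; other drivers usually need a cursor.
        conn.execute(sql, params).fetchall()  # fetch so the work is really done
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce raw latencies to the figures worth comparing across databases."""
    ordered = sorted(latencies)
    p95_index = max(0, int(len(ordered) * 0.95) - 1)
    return {
        "runs": len(ordered),
        "median_s": statistics.median(ordered),
        "p95_s": ordered[p95_index],
        "max_s": ordered[-1],
    }


# Stand-in database and made-up table; replace with the systems being compared.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

report = summarize(
    time_query(conn, "SELECT SUM(total) FROM orders WHERE customer = ?", (42,))
)
print(report)
```

Reporting the median alongside the 95th percentile and the maximum makes it harder for one lucky or unlucky run to decide the comparison.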
    • PGSQL (Score:3, Informative)

      by hey! (33014)
      There's no reason however to write all your SPs in PLSQL. Oracle supports stored procedures in Java, as does Postgres.

      This not only makes it easier in some instances to migrate some applications to PGSQL, it also improves performance (JIT compiling). You don't say exactly where the performance bottlenecks are, but this could improve performance and close the gap between PGSQL and Oracle.

      That said, if you've been working for years on tuning your Oracle physical design to a fare-thee-well, it's going to
    • Your post was going along pretty well until your conclusion. It's reasonable to make sure any program does what you need it to do before you switch to using it exclusively. However, one has to apply that advice consistently, not just in a manner that stumps for proprietary software.

      However, it's naive and wrong to say that being free software or open source cannot be one of one's needs ("choose based on your needs, not based on the fact that one is open and the other isn't.").
    • I forgot to mention this in my previous follow-up: your description of why you rejected PostgreSQL lacked many salient details including: any detail on what your organization used a database for, how Oracle was able to cater to that need better than PostgreSQL, why performance matters so much as to outweigh placing your client's data into a proprietary program, how performance was being measured, and what the figures for performance were.

      It's not possible to draw insight from such a description, hence I q
  • Need more info (Score:3, Informative)

    by The Slashdolt (518657) on Monday March 21, 2005 @12:30PM (#12000970) Homepage
    Is your company's website essentially read-only page loading? If so, why not just go with MySQL? Do you really need MVCC in a read-only scenario?

    On the other hand, if your company is doing transaction processing, like a customer-facing product ordering system (think Amazon), it's a lot more than just having to sustain certain volumes. The reputation of your company and its ability to make money by selling products will rely entirely on your database. In a best case scenario there may be no difference between Oracle and Postgres. But imagine the worst case scenario. Peak volume, company is making $1M/hour in sales on the web, db dies and won't come up....who you gonna call?

    There's more to the equation than up front cost and ability to handle volumes....

    • Re:Need more info (Score:5, Insightful)

      by snorklewacker (836663) on Monday March 21, 2005 @12:41PM (#12001153)
      > Is your company's website essentially read-only page loading? If so, why not just go with MySQL?

      MyISAM can't handle a database of larger than 2 gigs. Once you switch to another table backend, MySQL's vaunted performance advantage pretty much evaporates.

      > Peak volume, company is making $1M/hour in sales on the web, db dies and won't come up....who you gonna call?

      My DBA, assuming I'm running point-in-time recovery. That's all Oracle is going to tell you to do. The unemployment office if I'm not. Although PITR in pgsql is something of a PITA [postgresql.org], which just might go to recommend Oracle for the time being.
      • Re:Need more info (Score:3, Interesting)

        by The Slashdolt (518657)
        Assuming the relational database is storing relational data, it's hard to imagine a 2G database needed to store read-only pages. Of course, some people store PDFs, Word docs, etc. in relational DBs as CLOBs. This is a complete waste of resources. I would use MySQL in a read-only scenario where transactions are unneeded.

        I am also assuming that the guy who is posing this question IS the DBA. At least I sure hope so, for whoever is the DBA's sake. Your scenario is a best case recovery scenario using poin
        • Re:Need more info (Score:4, Insightful)

          by snorklewacker (836663) on Monday March 21, 2005 @01:43PM (#12002110)
          > What if it's the point-in-time recovery that is broken/buggy? As a DBA, who do you want to deal with?

          If I'm doing a million bucks an hour, I damn well had better be running a replica, so let's add that to the solution menu too. pgsql's replication ain't terrific either. Works, but not too flexible. Score another for Oracle.

          Anyway, if Oracle's PITR is broken/buggy, you are screwed screwed screwed. First, let's forget the fanciful notion that you can sue them. Now you're part of the support machine, the wheels of which grind exceedingly slowly and roughly.

          I don't often like to plug source access because it's extremely overrated, but as a last resort, if you can instrument your database startup with a debugger and trace the point of failure, you now have an advantage FAR greater than anything Oracle is going to give you while your trouble ticket clears through the dozen support techs who repeat the same useless advice and tie up your time.

          I also don't like to sling the term "FUD" around, because it's so often this shibboleth of the open source crowd, anything they disagree with, but what Oracle employs against solutions like PostgreSQL is often pure FUD. "Who you gonna call? Who's behind your data? What will you do WHEN it breaks? Scary scary scary, you just don't knooooowwww!!" I could probably turn around to an Oracle rep and say "right, that's about the same sort of feeling I get when dealing with YOUR support organization as well."

          If I'm doing a million bucks an hour, I'm probably picking Oracle too, because it's had more years to shake out PITR, hot backup, and clustering than pgsql has, so there's more of a body of knowledge accumulated on it. I just don't like the climate of fear going around when there's plenty of Oracle disasters to look at and learn from as well.

          • Given enough redundancy, you could make MS Access work ;-) kidding kidding kidding

            It's all going to depend on your SLA, with Oracle or whoever. If you want/need to have a 4 hour response you can get that if you've got the money. I assume you could get this level of support from a Postgres-related company as well, but based on the number of employees those companies have, I'd say it's not typical.

            Back to my original point, if your company is in the scenario where your business critical revenue generating
          • I don't often like to plug source access because it's extremely overrated, but as a last resort, if you can instrument your database startup with a debugger and trace the point of failure, you now have an advantage FAR greater than anything Oracle is going to give you while your trouble ticket clears through the dozen support techs who repeat the same useless advice and tie up your time.

            I agree with nearly all of your post, though I do think the issue of support is an important one to consider, but I hav
        • Re:Need more info (Score:1, Interesting)

          by Anonymous Coward
          Assuming the relational database is storing relational data, it's hard to imagine a 2G database needed to store read-only pages

          Not that imaginative, are you.

          A geospatial database that holds merely lat/long & address info (street names, city&state codes, zips, address ranges) and related tables containing information about demographics, etc can easily get into the 90GB range. One I used for analyzing targeted marketing campaigns was about 270GB.

        • Your scenario is a best case recovery scenario using point-in-time recovery. What if it's the point-in-time recovery that is broken/buggy? As a DBA, who do you want to deal with? Who are you gonna call?

          Apparently, if you are a major national bank [theregister.co.uk], you eventually give up [danskebank.com] on your vendor's support, and just fix it yourself.

      • And you are incorrect. The 2GB limit is on file size. Either the table data or the table index. The entire database can be much bigger.

        If you do have a table using up 2Gb in data, it's probably a good idea to optimize it a little and pull out some of that data with a table_detail.

        • Oh, sorry to correct myself, but the 2Gb file size limit is only with some operating systems. The default settings allow for a 4Gb table data file, and it can be increased if needed.
          • it can be increased if needed.

            By configuration or recompilation? Do you need to dump/reload or will it work with the old data files? What OSes and filesystems have what limitations?
            • If you're running tables that big and aren't reading the MySQL manual or don't know how to google "myisam file size limit", you have bigger problems.

              If the OS supports bigger files, you can increase the 4Gb limit with a SQL alter table. Just increase the maximum number of rows.

              I'm not an expert on OSes and file sizes, but if you're needing greater than 2Gb files, with any database, I would think that your OS would need to support it.
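The 4 GB default and the "increase the maximum number of rows" advice can be tied together with a little arithmetic. The sketch below is a simplified model, assuming the data-file limit comes purely from the width of MyISAM's internal row pointer; row format and operating-system file-size limits also matter in practice. The helper names are hypothetical. In MySQL itself the hint is normally given with the `MAX_ROWS` and `AVG_ROW_LENGTH` table options on `ALTER TABLE`.

```python
def max_data_file_bytes(pointer_bytes):
    """Largest offset addressable by a row pointer of the given width."""
    return 256 ** pointer_bytes


def pointer_bytes_needed(max_rows, avg_row_length):
    """Smallest pointer width (in bytes) that can address the requested size."""
    target = max_rows * avg_row_length
    width = 1
    while max_data_file_bytes(width) < target:
        width += 1
    return width


GIB = 1024 ** 3

# A 4-byte pointer corresponds to the 4 GB default discussed above.
print(max_data_file_bytes(4) // GIB, "GiB with a 4-byte pointer")

# Hypothetical table: one billion rows averaging 100 bytes each.
print(pointer_bytes_needed(1_000_000_000, 100), "bytes of pointer needed")
```

Under this model, telling the server to expect more rows simply makes it pick a wider pointer, which is why no recompilation is involved.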

              • If you're running tables that big and aren't reading the MySQL manual or don't know how to google "myisam file size limit", you have bigger problems.

                I had a legitimate question; that kind of response was unnecessary. I am a PostgreSQL user, but I try to keep knowledge of other databases handy so I can avoid pitfalls if I need to use another DB.

                2GB isn't a big table really. What is that, like $2 worth of disk? If I had set up an archive or log or something a year ago, it would probably have bitten me by now
                • Take a look at this, for answers to your questions. I do believe the filesystem limitations of most common platforms upon which MySQL runs are listed...

                  http://dev.mysql.com/doc/mysql/en/table-size.html

                  Note it's a bit dated (though that's the current reference manual), but I still suspect most of that information is accurate.

                  hth
      • Although PITR in pgsql is something of a PITA

        That's what I thought at first, but after I learned how to do it I thought to myself: "Can I come up with an easier way to administer PITR?" and I couldn't think of anything. PITR is a complicated concept (time warping, multiple timelines; it starts to get a little weird), and I'm impressed that they are supporting the feature. If you can think of an easier way to administer it, let the lists know, and I wouldn't be surprised if some tools appeared.
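For readers who have not met PITR before, the core idea is a base backup plus an ordered log of later changes, replayed only up to a chosen moment. The sketch below is a toy model of that idea and not PostgreSQL's implementation, which archives write-ahead log segments rather than key/value records. The function name and the data are made up for illustration.

```python
from datetime import datetime


def recover_to(base_snapshot, log, target_time):
    """Rebuild state by replaying logged changes up to and including target_time."""
    state = dict(base_snapshot)  # never mutate the backup itself
    for stamp, key, value in sorted(log, key=lambda record: record[0]):
        if stamp > target_time:
            break  # everything after the target is deliberately discarded
        if value is None:
            state.pop(key, None)  # a logged delete
        else:
            state[key] = value
    return state


base = {"alice": 100, "bob": 50}  # contents of the base backup
log = [
    (datetime(2005, 3, 21, 12, 0), "alice", 120),
    (datetime(2005, 3, 21, 12, 30), "carol", 10),
    (datetime(2005, 3, 21, 13, 0), "bob", None),  # the mistake: bob deleted
    (datetime(2005, 3, 21, 13, 5), "alice", 0),
]

# Recover to just before the bad delete at 13:00.
restored = recover_to(base, log, datetime(2005, 3, 21, 12, 59))
print(restored)
```

The "multiple timelines" weirdness follows from the discarded tail: once work continues from the restored state, the log holds two different histories after 12:59, and the system has to keep them apart.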
      • Re:Need more info (Score:4, Informative)

        by yamla (136560) <chris@@@hypocrite...org> on Monday March 21, 2005 @02:22PM (#12002598)
        FUD. MyISAM certainly can handle a database of larger than 2 gigs. It can even handle _tables_ larger than 2 gigs. "As you can see, MySQL remembers the create options exactly as specified. And it chose a representation capable of holding 91 GB of data." (p.38, High Performance MySQL: Optimization [sic], Backups, Replication & Load Balancing, by Jeremy D. Zawodny & Derek J. Balling, published by O'Reilly, April 2004.)
        • 1) I stand corrected. I was reading old documentation, it applied to table size, and it's probably tied to word size on the machine.

          2) "FUD" is not a synonym for "wrong".

          3) I have a raft of other reasons MySQL is inadequate, including data integrity ones.

          • My apologies, I figured you were just posting misinformation deliberately. Looks like it was an honest mistake.

            I, too, have a host of reasons to dislike MySQL. Unfortunately, I cannot switch to PostGreSQL because MySQL supports master-master replication (albeit not well) but PostGreSQL does not.
            • Replication is a complicated issue. I would highly recommend that you actually test the replication to make sure it's doing exactly what you think it is.

              Also, what about PgPool? That is a popular form of master-master replication for postgresql. It's not the be-all-end-all, but no single replication system is right for all situations. That's why PostgreSQL has so many replication options.

              I would be very interested to read a case study of your master-master replication usage in MySQL. I understand that the
              • And when a marketroid starts talking as if their solution is the be-all-end-all without examining your details specifically, you know they're BSing you and you have no reason to think that you're safe.

                BSing, that refers to Business Speak, does it? :-P

              • I tend to think of the PostgreSQL replication problem the same way people approach any problem: None of the solutions are endorsed as the "official" answer to the problems (because there is no absolute authority on these issues.) All have their shortcomings. All have their benefits. It's up to you to decide which combination of problems and benefits you want.

                PostgreSQL, like Linux, is more like an ecosystem of software, where you can go and pick and choose or even write your own stuff. It's not as diverse
              • I have of course tested the replication in MySQL to make sure it is doing what I think it is. Working with MySQL's master-master replication involved designing my application from the ground up to support multipart keys, so this solution is a LONG way away from just dropping in the replication and expecting it to work. It does have some advantages such as not requiring a hundred percent reliable network connection between the two MySQL servers. It can survive brief outages without issue.
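The poster does not show a schema, so the following is only one plausible reading of "multipart keys": each master builds its primary keys from its own server id plus a local counter, so rows inserted independently on the two masters can never collide when they are later exchanged. `KeyAllocator` and the sample rows are hypothetical.

```python
import itertools


class KeyAllocator:
    """Hands out (server_id, sequence) keys that cannot collide across masters."""

    def __init__(self, server_id):
        self.server_id = server_id
        self._sequence = itertools.count(1)

    def next_key(self):
        return (self.server_id, next(self._sequence))


# Two masters accept inserts independently, e.g. during a brief network outage.
master_a = KeyAllocator(server_id=1)
master_b = KeyAllocator(server_id=2)

rows_a = {master_a.next_key(): f"order from A #{n}" for n in range(3)}
rows_b = {master_b.next_key(): f"order from B #{n}" for n in range(3)}

# When replication catches up, the two sets merge without key conflicts.
merged = {**rows_a, **rows_b}
print(sorted(merged))
```

With a single auto-increment column instead, both masters would have issued keys 1, 2 and 3 during the outage and the merge would have overwritten or rejected rows.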

                There are signif
                • Interesting. My MySQL knowledge leaves something to be desired. I looked briefly in the documentation and most of what I found was master-slave replication. Could you point me to a master-master doc?

                  PgPool is fairly primitive, it's just query based replication. So your application definitely needs to account for that and it isn't perfect for all situations. I mentioned it because it's the only master-master replication software that I've used (and I haven't used it except to mess around). It isn't nearly as
        • Hear, hear; FUD. We have MyISAM tables larger than 2G in size operating fine; and they are very quick to access in a mostly read-only pattern, as expected with MySQL.

          The MySQL vs. PG vs. INSERT-FAVORITE-DB-HERE debate is tiring enough without the FUD being so readily cast about, and when the FUD doesn't change over the years it's even more so. Though it does make responses rather simple:

          <fx: pastes standard response>
          Use what's appropriate to your needs. Don't jump in head-first without unders

      • Incorrect about MyISAM. First, that limitation was tables larger than 2GB, and it's been gone for some time. In dealing with support for Bacula, I've heard reports of 34GB bacula catalogs in MySQL.
      • Bullshit.

        We [skyblog.com] use MyISAM databases that are over 20 gigs with no issue so far (except myisamchk time...).
    • First off, I think he's less worried about PostgreSQL's ability and more worried about making his boss comfortable with the idea. Others have already pointed out some case studies, so that's a good start.

      To say that you're jumping the gun in your post would be an understatement. Read only databases aren't hard; there are many ways to accomplish that and with MySQL you would still have to convince your boss to use F/OSS. I hardly see how that answers his original question, it seems more like you're trying t
      • Re:Need more info (Score:3, Interesting)

        by The Slashdolt (518657)
        That wasn't my point. My point was that given the current information his question cannot be answered. I then gave two examples. If his company wants a database for a scenario that is a read-only webpage scenario, it doesn't really matter if you use an open source db or a commercial one. Whatever works best and cheapest.

        On the other hand, if his company's business is reliant on this database for its core revenue generation then this is a business decision and not a technical one. Cost is only a minor fact
        • His question was about big name companies using PostgreSQL so that his boss could sleep at night. For anyone to make anything close to a reasonable suggestion, he would have to give us a huge amount of information, which is unreasonable for the ask-/. format. In short, his question was not a technical one.

          You're right though, different situations drastically change the requirements. However, it looks like he's already made that decision himself, otherwise he would be trying to convince himself and not his
  • How's .org and .info (Score:5, Informative)

    by snorklewacker (836663) on Monday March 21, 2005 @12:35PM (#12001051)
    SPI, the authoritative .org registrar, and Afilias, the authoritative .info registrar both use PostgreSQL for their registration databases.
  • by bendsley (217788) <brad[at]floabie[dot]com> on Monday March 21, 2005 @12:53PM (#12001337) Homepage
    Their website shows that BASF uses PostgreSQL as their DB.

    www.basf.com

    They're an enormous company. I've always heard too that PostgreSQL is much better for larger sites. Cannot say for sure though as I have never used it.
  • Apple Remote Desktop (Score:2, Informative)

    by dadragon (177695)
    Apple's remote desktop 2 package uses PostgreSQL for its data store.

    link [apple.com]
  • OpenACS (Score:3, Informative)

    by aquarian (134728) on Monday March 21, 2005 @01:16PM (#12001704)
    OpenACS has been Postgres-based for a long time, as a free alternative to Oracle. You can get plenty of Postgres information at www.openacs.org. The folks there have been using it for years for all kinds of sites, so it's pretty well tested. OpenACS is a unique system using AOLServer and TCL, but the database performance should translate to whatever server/scripting platform you're using.
    • OK, you're inducing the facial tics again... Been there, done that, it wasn't fun.

      The only sane way to be making this sort of decision is to benchmark, because everything relies on how the existing SQL is written. If you don't have time and money to benchmark, then follow this simple metric:

      a) who's the lead database programmer?
      b) which database does s/he write for first?

      Pick that one, because you can bet that the programmer has been picking up all sorts of database-specific tips and tricks for the last
  • ...on some hefty hardware these days. This post [postgresql.org] talks about running it on a 16 CPU machine...
  • by Anonymous Coward
    We are an e-Learning company which started 4 years back with very little startup budget. We have been using Postgres for 4 years now and it has never let us down. We never imagined our company would grow so big so fast. Today we provide an ASP solution for over 10,000 users from around 20 companies. Postgres scales very well and is quite responsive. In the past we have had periods of 100% CPU utilization but Postgres did not crash on us. You have to know how to configure it correctly and it will perform as
  • postgresql goodness (Score:3, Interesting)

    by mcguyver (589810) on Monday March 21, 2005 @01:52PM (#12002235) Homepage
    My company uses PostgreSQL and we are pretty happy with the performance. The only problem we had was in November when the Google spider went crazy and hit us a few million times a day for a few weeks. After a few hours of optimization, the sites were running smoothly. A few years ago we had to come up with a db platform and we were a small company. We could have used Oracle but it's expensive all around: Oracle software, support, licensing, and engineers are expensive. MySQL's transaction support was too bleeding edge at the time. What I like most about PostgreSQL is that the transition from Oracle to PostgreSQL is smooth, and most of our engineers come from an Oracle environment. Plus PostgreSQL has adequate transaction support, subselects and functions...and it's free.

    (In defense of Google, their spider did not intentionally go crazy - we have distributed webservers on separate IPs so the spider can't tell if it's pounding one particular site. However Google only spidered more pages as a publicity stunt before MSN search was released so maybe they are to blame...)
  • by jbrayton (589141) on Monday March 21, 2005 @01:57PM (#12002313) Homepage

    This question really requires more data. How much traffic are we talking about? How much data are we talking about? And then there are all sorts of variables, like the type of content being stored in the database, the number and types of queries that are done on each page, and the type of caching your application is doing.

    Also, if Oracle is already purchased and paid for, you will have a difficult time making a business case for PostgreSQL.

    Don't get me wrong, I like PostgreSQL. But you will want to have a reason for switching, aside from PostgreSQL being open source.

    • Also, if Oracle is already purchased and paid for, you will have a difficult time making a business case for PostgreSQL.
      I thought that until I saw the running costs and how much the extra users cost. We started to use MySQL instead. PostgreSQL may have more possibilities (also with licensing) but it was easier to use MySQL under the circumstances.
  • cdbaby, baby. check out that dude's weblog on O'Reilly.
  • MobyGames runs on PostgreSQL and has done so for over 5 years.
  • I work at WhitePages.com [whitepages.com], one of the top 100 US websites, and we use PostgreSQL in our mix of databases. We have the entire US and CA white and yellow page data loaded into PostgreSQL and we see awesome performance from our configuration. We've got over 250,000,000 rows of data and a *lot* of indexes making our database about 375G. We run over 1,500,000 queries per server per day which is about 100 per second at peak. Under load tests, we've seen almost triple that volume from the same servers. However
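The figures quoted above can be put in perspective with some quick arithmetic. This is only a back-of-the-envelope sketch: it assumes "375G" means GiB and that queries are counted over a full 24-hour day, neither of which the post states.

```python
queries_per_day = 1_500_000
peak_qps = 100
rows = 250_000_000
database_bytes = 375 * 1024 ** 3  # assuming "375G" means GiB

average_qps = queries_per_day / 86_400  # seconds in a day
peak_to_average = peak_qps / average_qps
bytes_per_row = database_bytes / rows  # includes index overhead

print(f"average: {average_qps:.1f} queries/s")
print(f"peak is {peak_to_average:.1f}x the average")
print(f"about {bytes_per_row:.0f} bytes per row, indexes included")
```

The gap between the daily average and the quoted peak is a reminder that capacity has to be sized for the busiest hour rather than the mean.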
  • by ArnIIe (814978)
    Yes! We (the company I work for, Clock [clock.co.uk]) use PostgreSQL for some large-ish popular sites:

    Eddie Izzard (link [eddieizzard.com])
    JD Wetherspoon (link [jdwetherspoon.com])
    Bill Bailey (link [billbailey.com]) and others

    We find it by far the best solution for us.
