PostgreSQL on Big Sites?
An anonymous reader asks: "I've been using PostgreSQL for years on small projects, and I have an opportunity to migrate my company's websites from Oracle to an open-source alternative. It would be good to be able to show the PHBs that PostgreSQL is a viable candidate, but I'm unable to find a list of high-traffic sites that use it. Does anyone know of any popular sites that run PostgreSQL?"
Several examples (Score:5, Informative)
Re:Several examples (Score:2)
Re:Several examples (Score:2)
I genuinely like PostgreSQL and have used it extensively. I want to see a list of big name users as much as anyone. But that "list of *big* companies" is topped by:
* Affymetrix
* Afilias
* BASF
* Cognitivity
* Journyx
* Royal
* The American Chemical Society
* Tsutaya
With the possible exception of BASF, these aren't exactly household names. I have no doubt that these are huge organizations with serious database needs, but what Postgr
Re:Several examples (Score:1, Informative)
Uses a J2EE application to store user profiles for the Navy Enterprise Portal and the Fleet Numerical Meteorology and Oceanography Center (FNMOC) Portal using JBoss application server and PostgreSQL database. Exposes the application via SOAP and RMI interfaces.
http://kennethbowen.com/kbresume.html
The .org registry? (Score:5, Informative)
Story [computerworld.com.au]
Re:The .org registry? (Score:3, Informative)
Both are run by Afilias, which is a big user and big developer of PostgreSQL. They're the ones that did the work on the Slony-I [slony.info] replication server.
recent interview with Josh Berkus (Score:4, Informative)
Re:recent interview with Josh Berkus (Score:2)
In particular, Josh talks about Fujitsu's involvement (a $43 billion company).
A good quote:
Re:recent interview with Josh Berkus (Score:2)
I thought that Oracle was the second biggest software company out there. I didn't know whether that put them in the $40 billion range or not.
Re:recent interview with Josh Berkus (Score:1, Informative)
They make submarine/undersea telecommunication networks, the world's fastest optical amplifiers, semiconductors, nanotech/quantum-dots, point-of-sale systems, Sparcstations, ATM machines, LCDs, supercomputers (#14 on the "Top 500" supercomputers list), Sparc chips for Sun etc.
No doubt Oracle sells more software.
Re:recent interview with Josh Berkus (Score:2)
well oracle IS pretty good (Score:4, Insightful)
We momentarily thought about dropping Oracle for PGSQL at my last company, but after we hired a consultant to do everything he could with Postgres to improve performance, Oracle was still a clear winner for us.
I don't know if he was incompetent or what, but the performance numbers weren't even close to what we needed it to do.
If your database will run just as well on PostgreSQL, I say go for it. If you go with PostgreSQL and it doesn't perform as well as Oracle in your environment, your management will have serious doubts about open-source software from then on, and that's a stain that is hard to get rid of.
In short: choose based on your needs, not based on the fact that one is open and the other isn't.
PGSQL (Score:3, Informative)
This not only makes it easier in some instances to migrate some applications to PGSQL, it also improves performance (JIT compiling). You don't say exactly where the performance bottlenecks are, but this could improve performance and close the gap between PGSQL and Oracle.
That said, if you've been working for years on tuning your Oracle physical design to a fare-thee-well, it's going to
Re:well oracle IS pretty good (Score:2)
However, it's naive and wrong to say that being free software or open source cannot be one of one's needs ("choose based on your needs, not based on the fact that one is open and the other isn't.").
More info needed on PostgreSQL survey. (Score:2)
It's not possible to draw insight from such a description, hence I q
Need more info (Score:3, Informative)
On the other hand, if your company is doing transaction processing, like a customer-facing product ordering system (think Amazon), it's a lot more than just having to sustain certain volumes. The reputation of your company and its ability to make money by selling products will rely entirely on your database. In a best case scenario there may be no difference between Oracle and Postgres. But imagine the worst case scenario. Peak volume, company is making $1M/hour in sales on the web, db dies and won't come up....who you gonna call?
There's more to the equation than up-front cost and ability to handle volumes....
Re:Need more info (Score:5, Insightful)
MyISAM can't handle a database larger than 2 gigs. Once you switch to another table backend, MySQL's vaunted performance advantage pretty much evaporates.
> Peak volume, company is making $1M/hour in sales on the web, db dies and won't come up....who you gonna call?
My DBA, assuming I'm running point-in-time recovery. That's all Oracle is going to tell you to do. The unemployment office if I'm not. Although PITR in pgsql is something of a PITA [postgresql.org], which just might go to recommend Oracle for the time being.
Re:Need more info (Score:3, Interesting)
I am also assuming that the guy who is posing this question IS the DBA. At least I sure hope so, for whoever is the DBA's sake. Your scenario is a best case recovery scenario using poin
Re:Need more info (Score:4, Insightful)
If I'm doing a million bucks an hour, I damn well had better be running a replica, so let's add that to the solution menu too. pgsql's replication ain't terrific either. Works, but not too flexible. Score another for Oracle.
Anyway, if Oracle's PITR is broken/buggy, you are screwed screwed screwed. First, let's forget the fanciful notion that you can sue them. Now you're part of the support machine, the wheels of which grind exceedingly slowly and roughly.
I don't often like to plug source access because it's extremely overrated, but as a last resort, if you can instrument your database startup with a debugger and trace the point of failure, you now have an advantage FAR greater than anything Oracle is going to give you while your trouble ticket clears through the dozen support techs who repeat the same useless advice and tie up your time.
I also don't like to sling the term "FUD" around, because it's so often a shibboleth of the open source crowd for anything they disagree with, but what Oracle employs against solutions like PostgreSQL is often pure FUD. "Who you gonna call? Who's behind your data? What will you do WHEN it breaks? Scary scary scary, you just don't knooooowwww!!" I could probably turn around to an Oracle rep and say "right, that's about the same sort of feeling I get when dealing with YOUR support organization as well."
If I'm doing a million bucks an hour, I'm probably picking Oracle too, because it's had more years to shake out PITR, hot backup, and clustering than pgsql has, so there's more of a body of knowledge accumulated on it. I just don't like the climate of fear going around when there are plenty of Oracle disasters to look at and learn from as well.
Re:Need more info (Score:2)
It's all going to depend on your SLA, with Oracle or whoever. If you want/need to have a 4 hour response you can get that if you've got the money. I assume you could get this level of support from a Postgres-related company as well, but based on the number of employees those companies have, I'd say it's not typical.
Back to my original point, if your company is in the scenario where your business critical revenue generating
Re:Need more info (Score:1)
I agree with nearly all of your post, though I do think the issue of support is an important one to consider, but I hav
Re:Need more info (Score:1, Interesting)
Not that imaginative, are you.
A geospatial database that holds merely lat/long & address info (street names, city&state codes, zips, address ranges) and related tables containing information about demographics, etc can easily get into the 90GB range. One I used for analyzing targeted marketing campaigns was about 270GB.
Re:Need more info (Score:1)
Apparently, if you are a major national bank [theregister.co.uk], you eventually give up [danskebank.com] on your vendor's support, and just fix it yourself.
Re:Need more info (Score:2)
If you do have a table using up 2GB in data, it's probably a good idea to optimize it a little and pull out some of that data with a table_detail.
Re:Need more info (Score:2)
Re:Need more info (Score:2)
By configuration or recompilation? Do you need to dump/reload or will it work with the old data files? What OSes and filesystems have what limitations?
Re:Need more info (Score:2)
If the OS supports bigger files, you can increase the 4GB limit with a SQL alter table. Just increase the maximum number of rows.
I'm not an expert on OSes and file sizes, but if you're needing files greater than 2GB, with any database, I would think that your OS would need to support it.
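For what it's worth, the "alter table" approach can be sketched roughly like this. It is only an illustration under assumptions: `max_data_file_bytes` and `build_myisam_resize_sql` are hypothetical helper names, the table name and figures are made up, and it assumes the size limit comes from the byte width of MyISAM's internal row pointer, with MySQL's `MAX_ROWS` and `AVG_ROW_LENGTH` table options acting as hints for choosing that width. The code only builds the statement text and never connects to a server.

```python
def max_data_file_bytes(pointer_bytes: int) -> int:
    """Largest byte offset a row pointer of the given width can address."""
    if pointer_bytes < 1:
        raise ValueError("pointer width must be at least one byte")
    return 2 ** (8 * pointer_bytes) - 1


def build_myisam_resize_sql(table: str, max_rows: int, avg_row_length: int) -> str:
    """Build (but do not run) an ALTER TABLE statement with size hints.

    Hypothetical helper: the caller decides how and where to execute it.
    """
    if not table.isidentifier():
        raise ValueError("refusing to quote an unusual table name")
    if max_rows <= 0 or avg_row_length <= 0:
        raise ValueError("size hints must be positive")
    return (
        f"ALTER TABLE `{table}` "
        f"MAX_ROWS={max_rows} AVG_ROW_LENGTH={avg_row_length}"
    )


# A 4-byte pointer tops out just under 4 GiB, which matches the limit above.
four_byte_limit = max_data_file_bytes(4)
statement = build_myisam_resize_sql("access_log", 1_000_000_000, 200)
```

Whether the rebuilt table can actually grow past the old limit still depends on the OS and filesystem supporting large files, as noted above.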
Re:Need more info (Score:2)
I had a legitimate question; that kind of response was unnecessary. I am a PostgreSQL user, but I try to keep knowledge of other databases handy so I can avoid pitfalls if I need to use another DB.
2GB isn't a big table really. What is that, like $2 worth of disk? If I had set up an archive or log or something a year ago, it would probably have bitten me by now
Re:Need more info (Score:1)
http://dev.mysql.com/doc/mysql/en/table-size.ht
Note it's a bit dated (though that's the current reference manual), but I still suspect most of that information is accurate.
hth
Re:Need more info (Score:2)
That's what I thought at first, but after I learned how to do it I thought to myself: "Can I come up with an easier way to administer PITR?" and I couldn't think of anything. PITR is a complicated concept (time warping, multiple timelines; it starts to get a little weird), and I'm impressed that they are supporting the feature. If you can think of an easier way to administer it, let the lists know, and I wouldn't be surprised if some tools appeared.
Re:Need more info (Score:4, Informative)
Re:Need more info (Score:2)
2) "FUD" is not a synonym for "wrong".
3) I have a raft of other reasons MySQL is inadequate, including data integrity ones.
Re:Need more info (Score:2)
I, too, have a host of reasons to dislike MySQL. Unfortunately, I cannot switch to PostgreSQL because MySQL supports master-master replication (albeit not well) but PostgreSQL does not.
Re:Need more info (Score:2)
Also, what about PgPool? That is a popular form of master-master replication for postgresql. It's not the be-all-end-all, but no single replication system is right for all situations. That's why PostgreSQL has so many replication options.
I would be very interested to read a case study of your master-master replication usage in MySQL. I understand that the
Re:Need more info (Score:1)
And when a marketroid starts talking as if their solution is the be-all-end-all without examining your details specifically, you know they're BSing you and you have no reason to think that you're safe.
BSing, that refers to Business Speak, does it? :-P
PostgreSQL Replication & Culture (Score:3, Insightful)
PostgreSQL, like Linux, is more like an ecosystem of software, where you can go and pick and choose or even write your own stuff. It's not as diverse
Re:Need more info (Score:2)
There are signif
Re:Need more info (Score:2)
PgPool is fairly primitive; it's just query-based replication. So your application definitely needs to account for that and it isn't perfect for all situations. I mentioned it because it's the only master-master replication software that I've used (and I haven't used it except to mess around). It isn't nearly as
Re:Need more info (Score:1)
The MySQL vs. PG vs. INSERT-FAVORITE-DB-HERE debate is tiring enough without the FUD being so readily cast about, and when the FUD doesn't change over the years it's even more so. Though it does make responses rather simple:
<fx: pastes standard response>
Use what's appropriate to your needs. Don't jump in head-first without unders
Re:Need more info (Score:2)
Re:Need more info (Score:2)
We [skyblog.com] use MyISAM databases that are over 20 gigs with no issue so far (except myisamchk time...).
Re:Need more info (Score:2)
To say that you're jumping the gun in your post would be an understatement. Read-only databases aren't hard; there are many ways to accomplish that and with MySQL you would still have to convince your boss to use F/OSS. I hardly see how that answers his original question; it seems more like you're trying t
Re:Need more info (Score:3, Interesting)
On the other hand, if his company's business is reliant on this database for its core revenue generation then this is a business decision and not a technical one. Cost is only a minor fact
Re:Need more info (Score:2)
You're right though, different situations drastically change the requirements. However, it looks like he's already made that decision himself, otherwise he would be trying to convince himself and not his
How's .org and .info (Score:5, Informative)
Companies that use PostgreSQL (Score:4, Informative)
www.basf.com
They're an enormous company. I've always heard too that PostgreSQL is much better for larger sites. Cannot say for sure though as I have never used it.
Apple Remote Desktop (Score:2, Informative)
link [apple.com]
OpenACS (Score:3, Informative)
Re:OpenACS (Score:2)
The only sane way to be making this sort of decision is to benchmark, because everything relies on how the existing SQL is written. If you don't have time and money to benchmark, then follow this simple metric:
a) who's the lead database programmer?
b) which database does s/he write for first?
Pick that one, because you can bet that the programmer has been picking up all sorts of database-specific tips and tricks for the last
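If you do find the time to benchmark, even a crude timing harness beats guessing. The following is a minimal sketch, not a rigorous methodology: `time_workload` is a hypothetical helper, and the callable you pass in is assumed to run one representative batch of your existing SQL against whichever database is under test (the connection code is left out on purpose, since it depends on your driver).

```python
import statistics
import time
from typing import Callable, Dict


def time_workload(run_once: Callable[[], None], repeats: int = 5) -> Dict[str, float]:
    """Run the workload several times and summarize wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()  # one full pass of the representative queries
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Stand-in workload so the sketch runs anywhere; swap in real queries.
summary = time_workload(lambda: sum(range(10_000)), repeats=3)
```

Run the same callable against each candidate with the same data and compare the medians; a single run tells you very little because of caching and warm-up effects.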
Note that PostgreSQL is being benchmarked... (Score:3, Informative)
PostgreSQL on Big Sites? (Score:2, Informative)
Re:If you can afford Oracle (Score:5, Insightful)
If I was a PHB type for an online retailer and I looked at the costs and noticed that 50% of our profits are going to Oracle rather than to our pockets, I'd have some questions for the IT guys like:
(1) Are we a retailer or a data warehousing company?
(2) What is Oracle and why is it so expensive?
(3) Can you get the same job done with less money? If so, what costs, benefits, and risks might we see?
(4) My friend's IT guys use this thing called Post-whatever-SQL, and it costs $0. Is Oracle kinda like that?
postgresql goodness (Score:3, Interesting)
(In defense of Google, their spider did not intentionally go crazy - we have distributed webservers on separate IPs so the spider can't tell if it's pounding one particular site. However Google only spidered more pages as a publicity stunt before MSN search was released so maybe they are to blame...)
How much traffic are we talking about? (Score:5, Insightful)
This question really requires more data. How much traffic are we talking about? How much data are we talking about? And then there are all sorts of variables, like the type of content being stored in the database, the number and types of queries that are done on each page, and the type of caching your application is doing.
Also, if Oracle is already purchased and paid for, you will have a difficult time making a business case for PostgreSQL.
Don't get me wrong, I like PostgreSQL. But you will want to have a reason for switching, aside from PostgreSQL being open source.
Re:How much traffic are we talking about? (Score:2)
Re:Well... (Score:2)
Just to make sure, you didn't leave the postgresql shared_buffers setting at its default, did you?
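A quick way to check is to look for an uncommented `shared_buffers` line in postgresql.conf. Here is a rough sketch: `configured_shared_buffers` is a hypothetical helper, the sample text is made up, and it assumes the usual `name = value` syntax with `#` comments and that a later setting overrides an earlier one. It ignores include files and values containing `#` inside quotes.

```python
import re
from typing import Optional


def configured_shared_buffers(conf_text: str) -> Optional[str]:
    """Return the explicitly configured shared_buffers value, or None.

    None means no uncommented setting was found, i.e. the server is
    presumably running with its built-in default.
    """
    value = None
    for line in conf_text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        stripped = line.split("#", 1)[0].strip()
        match = re.match(r"shared_buffers\b\s*=?\s*(\S+)", stripped)
        if match:
            value = match.group(1).strip("'")  # assume the last entry wins
    return value


sample_conf = """\
#shared_buffers = 1000   # still commented out
max_connections = 100
"""
left_at_default = configured_shared_buffers(sample_conf) is None
```

If it comes back as None, that setting is probably the first thing to tune before drawing any conclusions from performance numbers.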
Re:Well... (Score:2)
(a) notice; and
(b) believe that that is his real email address; and
(c) assume that it summarily discredits every word he writes?
Personally, I didn't make it past (a), so the compiler's optimizer in my brain never bothered to check (b) or (c).
Re:Well... (Score:2)
For mostly read depositories, MySQL is pretty good. When you start mixing in more and more writes, it tends to not do so well with MyISAM tables, and InnoDB doesn't quite keep up with PostgreSQL. But they're pretty good.
Re:Well... (Score:2)
Re:Yahoo, Google, etc. (Score:3, Insightful)
Also, the application matters a lot. MySQL is very effective as a cache to hold a relation. It would not surprise me if many of those companies use Oracle/DB2/MSSQL/PostgreSQL as a backend database, and then use MySQL to cache some of the data for fast access. If you list the companies using PostgreSQL extensively,
cdbaby (Score:1)
MobyGames (Score:2)
PostgreSQL powers the WhitePages Network (Score:2)
Sites (Score:1)
Eddie Izzard (link [eddieizzard.com])
JD Wetherspoon (link [jdwetherspoon.com])
Bill Bailey (link [billbailey.com]) and others. We find it by far the best solution for us.