Large Scale Web Apps Built on Open Source

prostoalex writes "Brad Fitzpatrick presented at OSCON with an overview of his little project. Interesting facts about the evolution of the LiveJournal back-end architecture."

  Salesforce.com

    by Anonymous Coward on Tuesday September 21, 2004 @03:32PM (#10311808)
    It's all LAMP.
  • lamp! (Score:2, Informative)

    You can do a lot with LAMP, just look at... SLASHDOT!

  /. effect

    by mreed911 ( 794582 )
    LiveJournal? Not anymore...
  • by Anonymous Coward on Tuesday September 21, 2004 @03:37PM (#10311880)
    • by Anonymous Coward on Tuesday September 21, 2004 @03:48PM (#10312036)
        That's just sad. That someone is that wrong on their history about such a major point. I would post a correction, but I'd rather not post my IP addr to the world on her blog. I'll have to do it when I get home from work.
        • by MC Negro ( 780194 ) on Tuesday September 21, 2004 @04:38PM (#10312618) Journal
          • You left the best part out. She was upset because she got a D on the paper (from which that came) and she thinks she is a good writer. Her explanation, of course, is not that she has a greatly inflated opinion of her abilities but that her teacher is anti-Christian.

            We laugh about this but the really scary part is that there are a lot of people who think like her. People hate Bush so much because of the war but I am much more scared about his connection to the zealots of the religious right. The war in Ir
        • Or recent history...

    • by seanmeister ( 156224 ) on Tuesday September 21, 2004 @03:56PM (#10312127)

      Brad Fitzpatrick apparently agrees with your take on LJ, judging from the sample user data shown on page 24 of the presentation:

  • My company's backend is mostly Java.
    We are using Oracle as the database, and Solaris as the UNIX, but we could be using MySQL and Linux.

    In fact, we are investigating that right now :)
    • by Kainaw ( 676073 ) on Tuesday September 21, 2004 @03:46PM (#10312014) Homepage Journal
      We are using Fedora, Postgres, and PHP for what I consider a rather large-scale application. It is a storage and query system for research on a few million patients. We could have gone with Oracle and Java (...shiver...), or even MSSQL and a Windows server, but why waste money? The only real headache I've had is figuring out that Apache2 is threaded and Postgres/PHP sits on top of some low-level linux code that is not. I could use Apache instead of Apache2 to fix the problem, but I fixed the non-threaded code instead.
    • Absolutely. J2EE projects running on Linux servers with Apache, JBoss, and Postgres aren't unheard of... and most J2EE developers prefer to use Eclipse.

      That's 100% open source, people... and we are talking large corporate intraweb apps and such.

      I work mostly with financial institutions... they prefer IBM-backed Linux servers with WebSphere... but still like Eclipse (or WSAD, which is Eclipse with a WebSphere test server plugin), and a commercial DB (Oracle, DB2, or Informix are popular)... but the
      • Just out of curiosity, how scalable and popular is DB2, anyway? I remember IBM sending our LUG a bunch of free DB2 CDs for Linux, a long long time ago. I tried getting it up and running, but didn't have the patience to work on it or do anything useful.

        I was just wondering how it compares to other existing commercial DBs out there.
        • DB2 is about as scalable as it gets (for general purpose DBMS)

          Not sure about popular, but it runs on some fairly big IBM hardware, and would historically be used by larger organizations. I guess with the porting to Linux, they are hoping for more small scale business.
  • Uh, the Web itself (Score:4, Insightful)

    by FunWithHeadlines ( 644929 ) on Tuesday September 21, 2004 @03:39PM (#10311910) Homepage
    Large Scale Web Apps Built on Open Source

    Uh, like, you mean the Web itself? That's large scale, certainly was built, and is most certainly built on open source.

    So, yeah, I reckon it can be done. I'm using the proof-of-concept to submit this comment.

    • by yintercept ( 517362 ) on Tuesday September 21, 2004 @03:52PM (#10312088) Homepage Journal
      The web is really a mixed bag that allows a mix of open standards, and proprietary software. To claim it is all open source is misleading. It is a dynamic network that allows development on multiple layers.

      The most important aspect of the web is that the interfaces between the different layers were well defined and exposed... not that each line of code in the different layers is exposed.
    • by drouse ( 34156 )
      Sure, Slashdot is just Apache, some Linux boxes, some Perl maybe some C -- not a big deal...

      The LJ folks faced scaling problems and had financial limits on how much money they could throw at the problem. So they used smarts and OS software instead of huge piles of money. They also built some new tools that are OS themselves, thus contributing back to the community (I hate that phrase, but this is Slashdot).

      The presentation is actually interesting technically, and good news for Linux/MySQL/Perl/etc.

  • by numbski ( 515011 ) * <numbskiNO@SPAMhksilver.net> on Tuesday September 21, 2004 @03:40PM (#10311916) Homepage Journal
    Anyone know what that document format is since it's roughly half the size of the pdf?
  • Why is there a password on this sxi file (star office presentation)... is the file not open source?
  • by tcopeland ( 32225 ) * <tom@NosPam.thomasleecopeland.com> on Tuesday September 21, 2004 @03:45PM (#10312007) Homepage
    ...right here [helixcommunity.org].

    It's powered by GForge [gforge.org], so it's backed by PHP and PostgreSQL [postgresql.org].

    There are a bunch of other sites running GForge listed here [gforge.org]...
    • GForge [gforge.org] really is great. We're using it internally at my workplace for request tracking and project management. Now, if only 4.0 would come out soon... :)
    • It's powered by GForge
      That looks pretty nice. How much effort is involved in setting up something like that? (For internal use, I mean.)
      Not being one to want to re-invent the wheel, do you know of anyone packaging that for Fedora Core?
  • Maypole! (Score:5, Informative)

    by Anonymous Coward on Tuesday September 21, 2004 @03:47PM (#10312022)
    Maypole [perl.org] is a Perl framework for MVC-oriented web applications, similar to Jakarta's Struts. Maypole is designed to minimize coding requirements for creating simple web interfaces to databases, while remaining flexible enough to support enterprise web applications.
  • by tinla ( 120858 ) on Tuesday September 21, 2004 @03:50PM (#10312056) Homepage Journal
    Ok, so most of the Journals lack even a scrap of entertainment value... but the data feeds are normally fun. Is there anyone left that hasn't wasted a few bytes on the following url?

    http://www.livejournal.com/stats/latest-img.bml [livejournal.com]

    Hint - it's a constantly updating list of all the new images posted to journals. After a while you give up waiting for a hot chick to post and decide crazy survey graphics are as good as it gets. And then some hot chick posts her birthday party pictures, but she's only 14 and suddenly you wish you'd spent the day doing something else.

  • Porn (Score:5, Interesting)

    by Neil Blender ( 555885 ) <neilblender@gmail.com> on Tuesday September 21, 2004 @03:51PM (#10312069)
    Back in the .com days, I worked at a huge (now defunct) porn site. We had about 50,000 active hosted sites, 500,000 hit counters and a bunch of other stuff. We were getting tens of millions of page views daily, maxing out two 100 megabit circuits at times. It was all FreeBSD, a little Redhat, Perl, mysql, squid, apache, mod_perl and C. The only real closed stuff we used were BigIPs and traffic monitoring software.
  • How is this "large scale?" Maybe it's medium-scale as far as the web goes, but otherwise, it's very much a lightweight app. From livejournal.org:

    Per Hour: 6818
    Per Minute: 114

    That's 2 inserts a second, and maybe a hundred queries a second. Quite honestly, that could be handled by MySQL & PHP. Definitely not what I'd call "large scale".
    • Re:Large scale? (Score:2, Informative)

      by Anonymous Coward
      That's only posts; it doesn't take into account comments (which are probably most of the traffic), userpics, etc.
      Your assumption would be correct if it was 1 select for each page view, but since there are about 4-5 just for 1 page view (userpic, friends, info, etc.), that number is misleading. Fortunately, most of that static content is memcached and not hitting the DBs.
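
The point about cached content not hitting the DBs can be illustrated with a minimal cache-aside sketch. This is not LiveJournal's code: `CountingDB`, `CacheAside`, and the key names are hypothetical, and a plain Python dict stands in for a memcached pool. It assumes the database remains the source of truth, so an emptied cache costs extra reads rather than data.

```python
class CountingDB:
    """Hypothetical stand-in for the database; counts reads that reach it."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def select(self, key):
        self.reads += 1
        return self.rows[key]


class CacheAside:
    """Read-through helper: check the cache first, fall back to the DB on a miss."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # assumption: a dict standing in for memcached

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # hit: the DB is not touched
        value = self.db.select(key)     # miss: one DB read
        self.cache[key] = value         # populate so the next read is a hit
        return value

    def flush(self):
        """Simulate the cache losing its contents (e.g. a restart)."""
        self.cache.clear()


db = CountingDB({
    "userpic:42": "pic.png",
    "friends:42": ["brad", "kenny"],
    "info:42": "bio",
})
store = CacheAside(db)
page_keys = ["userpic:42", "friends:42", "info:42"]

# Ten views of the same page: only the first view reaches the DB.
for _ in range(10):
    page = [store.get(k) for k in page_keys]
reads_warm = db.reads

# After the cache is emptied, the next view misses on every key,
# but the page content itself is unchanged.
store.flush()
page_after_flush = [store.get(k) for k in page_keys]
reads_after_flush = db.reads
```

In this sketch a cold cache only means more database reads until it warms up again; whether that holds for a real deployment depends on what is stored only in the cache.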
    • Re:Large scale? (Score:5, Interesting)

      by xb95 ( 815446 ) on Tuesday September 21, 2004 @04:56PM (#10312797) Homepage
      Most of the time those numbers are four or more times that high. It's early in the afternoon, this isn't a peak time.

      Anyway, those are only the number of entries being posted. For every entry being posted, there are a ton of inserts actually going on:

      * log2 table to contain some metadata about the entry
      * logtext2 table to contain the actual text
      * logprop2 table (multiple rows, 3-5) containing other metadata about entry

      So, four times the traffic, about 6 inserts each, roughly 2400 updates per minute (around 40 per second)--and that's just for posting entries. We get a lot more traffic from people posting comments (which also do 3 or 4 update/inserts each comment), plus people editing their userinfo, uploading new userpics, ...

      While LiveJournal definitely isn't a huge site, it's not a lightweight, and definitely doing pretty good for having around 80 machines and doing 30-40 million fully dynamic page views a day.
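
The figures in this sub-thread can be worked through as a back-of-envelope calculation. The sketch below uses only numbers quoted above (114 posts per minute off-peak, a peak factor of four, 30-40 million page views a day); the value of 6 inserts per post is an assumption taken from "about 6 inserts each", and the function names are made up for illustration.

```python
POSTS_PER_MINUTE = 114   # off-peak figure quoted from the stats page
PEAK_FACTOR = 4          # "four or more times that high"
INSERTS_PER_POST = 6     # assumption: log2 + logtext2 + several logprop2 rows
SECONDS_PER_DAY = 86_400


def post_write_rate(posts_per_minute, peak_factor, inserts_per_post):
    """Return (writes per minute, writes per second) caused by new entries."""
    per_minute = posts_per_minute * peak_factor * inserts_per_post
    return per_minute, per_minute / 60


def page_views_per_second(views_per_day):
    """Average page views per second, assuming an even spread over the day."""
    return views_per_day / SECONDS_PER_DAY


writes_per_minute, writes_per_second = post_write_rate(
    POSTS_PER_MINUTE, PEAK_FACTOR, INSERTS_PER_POST
)
views_low = page_views_per_second(30_000_000)
views_high = page_views_per_second(40_000_000)

print(f"entry writes: {writes_per_minute}/min, {writes_per_second:.1f}/s")
print(f"page views:   {views_low:.0f} to {views_high:.0f} per second on average")
```

Note that the product of the three posting figures comes out in writes per minute; dividing by 60 gives the per-second rate. The page-view averages understate peak load, since traffic is not spread evenly over the day.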
  • by TheSync ( 5291 ) on Tuesday September 21, 2004 @04:15PM (#10312330) Journal
    If you are looking for scalable OSS solutions, also look into Zope with Zope Enterprise Objects (ZEO) [zope.org].
  • by WreckDiver ( 685191 ) on Tuesday September 21, 2004 @04:26PM (#10312473)
    Not large enough scale to survive a Slashdotting...
  • by tshak ( 173364 ) on Tuesday September 21, 2004 @05:06PM (#10312904) Homepage
    As a paying subscriber of Livejournal, I can say the only reason I even have an account is because of the friends that I have who use it. I would never use it as a case study for any technology. It's got huge performance problems, data loss issues, and usability issues. This may not be the fault of using OSS, but it definitely doesn't help it look good.
    • Performance problems? Data loss?

      Do you even watch the evolution of the site, or are you just throwing stuff out for the hell of it? :)
    • how about you read the damn pdf, which mainly is about dealing with those problems on the huge scale of things?
      • You should really take some time to investigate the tech they use - it would explain the data loss, memcached in particular.

        I was at OSCON, heard all the rap about memcached, and instantly thought to myself:

        "What happens when memcached shits itself?"

        The answer is: data loss. I'm not talking about a transaction here - I'm talking about the whole frickin' database, guaranteed. It has to be repopulated before you see any performance gain, and it's incapable of populating it automatically. Also, it's method
  • Most were moderately budgeted -- $10m/year -- and based on some combination of Tomcat/Apache and other tools though the DB was not OSS (2 Oracle, 1 MSSQL). ASCI White must be many times that $-wise as well as scale, and I'm sure other folks can think of a few more easily.

    Is there something interesting here?

  • by otisg ( 92803 ) on Tuesday September 21, 2004 @06:14PM (#10313604) Homepage Journal
    Some may find it interesting that Wikipedia (covered earlier today on Slashdot) uses some code that came out of LiveJournal for caching: memcached [danga.com].
  • Kenny (Score:5, Funny)

    by Trejkaz ( 615352 ) on Tuesday September 21, 2004 @07:02PM (#10314050) Homepage
    It's somewhat amusing that in the first load balancing example, one of the points of failure was Kenny. Especially since Kenny ALWAYS DIES.
  • Opening the PDF I am greeted with the following:

    Inside LiveJournal's Backend or, "holy hell that's a lot of hits!"

    Believe me, it is taking all my strength to avoid making a certain obvious joke about this title..
