Scaling Large Projects With Erlang

Delchanat points out a blog entry which notes, "The two biggest computing-providers of today, Amazon and Google, are building their concurrent offerings on top of really concurrent programming languages and systems. Not only because they want to, but because they need to. If you want to build computing into a utility, you need large real-time systems running as efficiently as possible. You need your technology to be able to scale in a similar way as other, comparable utilities or large real-time systems are scaling — utilities like telephony and electricity. Erlang is a language that has all the right properties and mechanisms in place to do what utility computing requires. Amazon SimpleDB is built upon Erlang. IMDB (owned by Amazon) is switching from Perl to Erlang. Google Gears is using Erlang-style concurrency, and the list goes on."
This discussion has been archived. No new comments can be posted.

  • Scala (Score:5, Informative)

    by fils ( 88044 ) on Sunday July 06, 2008 @10:09AM (#24074545) Homepage

    People may also want to check out Scala at:
    http://www.scala-lang.org/ [scala-lang.org]

    It also uses the Erlang-style concurrency approach and runs on the JVM, with class compatibility with other JVM languages, i.e. Java, Groovy, etc.

  • TFA is misleading (Score:1, Informative)

    by Anonymous Coward on Sunday July 06, 2008 @10:22AM (#24074619)

    TFA implicitly states that Google is using Erlang or some concurrent programming languages. That's wrong: they use C++, Java and Python, and pretty much nothing else (apart from specialized stuff for MacOS apps, for instance).

  • Stupid article (Score:5, Informative)

    by IamTheRealMike ( 537420 ) * on Sunday July 06, 2008 @10:39AM (#24074697)

    Wow, it's not often I strongly criticise articles around here, but that was total garbage.

    For the smart ones that didn't RTFA, here's a quick summary:

    • I like Erlang.
    • Big companies like Google and Amazon make things fast by using concurrency.
    • Erlang supports (one type of) concurrency.
    • Thus Google and Amazon are [probably] using Erlang.
    • Thus everyone should learn Erlang.

    For the record, I work for Google and we don't use Erlang anywhere in the codebase. Google Gears restricts you to message passing between threads because JavaScript interpreters are not thread-safe, so it's the only way that can work. Visual Basic threading works the same way for similar reasons. It's not because eliminating shared state is somehow noble and pure, regardless of what the article would have you believe, and in fact systems like BigTable use both shared-state concurrency and message passing based concurrency.
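
    A minimal sketch of the "message passing between threads" restriction described above, written in Python rather than JavaScript or Erlang. It is not Gears code; the function names and the `None` sentinel are assumptions made for illustration. The worker owns its counter, and the only way to reach it is through a queue.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Own a private counter; the only way in or out is a message."""
    total = 0  # private state, never touched by another thread
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel (an assumption of this sketch): stop and report
            outbox.put(total)
            return
        total += msg


def run_queue_demo() -> int:
    """Send the numbers 1..100 to the worker and return the total it reports."""
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for n in range(1, 101):
        inbox.put(n)
    inbox.put(None)
    result = outbox.get(timeout=5)
    t.join()
    return result
```

    No lock appears in the sketch because the two threads never touch the same variable; the queue is the only shared object.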

    The article says this:

    Architects (but also university-professors for that matter) still think they can build current and future industrial-grade and internet-grade systems with the same technologies as they did 10-15 years ago.

    But in fact the Google search engine, which is one of the larger "industrial-grade, internet-grade" systems I know of, is written entirely in C++. A language which is much the same as it was 10-15 years ago. Thus the central point of his argument seems flawed to me.

    Seeing as the article is merely an advert for Erlang, I'll engage in some advocacy myself. If you have an interest in programming languages, feel free to check out Erlang, but be aware that such languages are taking options away from you, not giving you more. A multi-paradigm language like version two of D [digitalmars.com] is a better way to go imho - it supports primitives needed to write in a functional style like transitive invariance, as well as a simple lambda syntax, easy closures and first class support for laziness.

    However it also compiles down to self-contained native code in an intuitive way, or at least, a way that's intuitive to the 99.9% of programmers used to imperative languages, unlike Erlang or Haskell. It provides garbage collection but doesn't force you to use it, unlike Java. It doesn't rely on a VM or JIT, unlike C#. It provides some measure of C and C++ interoperability, unlike most other languages. And it has lots of time-saving and safety-enhancing features done in a clean way too.

  • Re:Scala (Score:4, Informative)

    by bonefry ( 979930 ) on Sunday July 06, 2008 @10:44AM (#24074725)

    There is a significant difference between Scala and Erlang.

    Erlang uses green threads. And green threads have advantages and disadvantages over native threads.

    For instance Erlang is bad at IO but on the other hand it can spawn millions of threads, something that the JVM has a hard time doing because native threads are limited by the kernel.
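
    To illustrate why user-space ("green") tasks are cheap to spawn in bulk, here is a rough Python analogy using `asyncio`. This is only a sketch: asyncio tasks are scheduled cooperatively on one thread, whereas Erlang's scheduler preempts its processes, and the function names and task count are assumptions.

```python
import asyncio


async def tiny_task(i: int) -> int:
    """A task that yields to the scheduler once and then returns its index."""
    await asyncio.sleep(0)
    return i


async def spawn_many(n: int) -> int:
    """Create n lightweight tasks on a single OS thread and sum their results."""
    tasks = [asyncio.create_task(tiny_task(i)) for i in range(n)]
    results = await asyncio.gather(*tasks)
    return sum(results)


def run_spawn_demo(n: int = 10_000) -> int:
    return asyncio.run(spawn_many(n))
```

    Each task here is an ordinary heap object rather than a kernel thread with its own stack, which is the property that lets this style scale to very large task counts.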

  • by GatesDA ( 1260600 ) on Sunday July 06, 2008 @10:52AM (#24074761)

    Can't point you to a comparison article, but one language you should consider is Scala. It compiles to the Java platform, and thus can interact almost transparently with existing Java code and libraries, and uses Erlang's concurrency model. It can do both functional and imperative, object-oriented tasks. It's statically-typed, but with features I didn't think were possible outside a dynamic language, such as duck-typing (only compile-time checked!)

    It's very powerful, but sometimes hard to figure out. Not my ideal language, but the closest I've found.

    Official site:
    http://www.scala-lang.org/ [scala-lang.org]

    The busy Java developer's guide to Scala:
    http://www.ibm.com/developerworks/views/java/libraryview.jsp?search_by=scala+neward [ibm.com]

    Scala for Java refugees:
    http://www.codecommit.com/blog/scala/roundup-scala-for-java-refugees [codecommit.com]

  • Re:Scala (Score:4, Informative)

    by Cyberax ( 705495 ) on Sunday July 06, 2008 @11:01AM (#24074813)

    Scala has actors, which allow you to do something _like_ green threads: http://lamp.epfl.ch/~phaller/doc/ActorsTutorial.html [lamp.epfl.ch]

  • Re:Scala (Score:4, Informative)

    by jonabbey ( 2498 ) * <jonabbey@ganymeta.org> on Sunday July 06, 2008 @11:43AM (#24075005) Homepage

    Modern JVMs on the modern Linux Kernel can spawn quite a hellacious amount of threads these days, actually.

    The problem with Java is the shared-state synchronization that is often necessary, and the extra work required to distribute state to threads across different VMs. A functional language and programming style could work quite well on top of the JVM, though, and could leverage RMI and some kind of message port facility for the distribution.

  • by stonecypher ( 118140 ) <stonecypher@@@gmail...com> on Sunday July 06, 2008 @11:45AM (#24075013) Homepage Journal

    1) Actually, there are quite a few good reasons for this, largely around the complete elimination of mutexing and locks. Just because you don't understand the purpose doesn't mean there wasn't one.

    2) Oooooh, a language is faulty because it has a syntax with which you are not familiar. Immediately kill all non-Java clones!

    3) They're just lists of numbers; they're neither ASCII nor Latin1. There is unicode parsing in the XMERL module.

    Please wait until you know a language before criticizing it.

  • by Anonymous Coward on Sunday July 06, 2008 @11:46AM (#24075017)

    Just a minor correction: Erlang has native code compilation on quite a few architectures -- try the "+native" flag. Most projects seem content with just using the VM interpreter, though.

    Best,
    Thomas Lindgren

  • by Anonymous Coward on Sunday July 06, 2008 @12:00PM (#24075115)

    1. Invariable variables.
    This appears to have been done for no reason other than the designer's preference. In fact, it's not strictly true -- variables can be unbound, and later bound. They just can't be re-bound once bound.

    On the contrary, there are very good reasons for having single-assignment variables. It makes the code more similar to plain mathematics, which makes it easier to reason about, and significantly reduces the number of programming errors. And you don't have to take that from me - there are some 20 years of experience at Ericsson and elsewhere with writing huge telecom applications in Erlang.

    2. Weird syntax.
    Why, exactly, are there three different kinds of (required) line endings? It seems as though the syntax is designed to be as different from C as possible, while maintaining at least as many quirks. Moreso, even -- when constructing normal, trivial programs, you're going to hit most language features head-on and at their worst. Where's my 'print "hello\n"' that works most other places?

    I don't believe the important features of Erlang are mutually-exclusive with the sane syntax of, say, Ruby or Python.

    The syntax is certainly different from C, Ruby, or Python, but this is because it is derived from the Prolog syntax. Furthermore, it is actually pretty systematic, once you get over those initial differences. It is a poor programmer who cannot master both worlds.

    3. Not Unicode-ready.
    Strings are defined as ASCII -- maybe latin1. But there's no direct unicode support in the language -- if you're lucky, there are functions you can pipe it through.

    A standard unicode library is still missing, but can be hoped for. At least, there is nothing in the basic representation of strings that prevents full unicode support (*cough* Java *cough*).

    There are other things I haven't mentioned, mostly implementation-specific -- things like the fact that function-reloading cannot be done when you natively-compile (with hipe) for extra speed.

    That's simply wrong. Dynamic code upgrade still works, native or not. It's the unloading of older native code from memory that is not being done (this is safe, but could be a memory leak in a very long running server).

    My plan is to take the features I actually like from Erlang and implement them elsewhere, in a language I can actually stomach for its real tasks.

    Well, good luck, and see you in 20 years. Meanwhile, the rest of us will be over here, getting stuff done with the language we have. For my part, I don't know anything else that lets me be as productive, at least for general problem solving.

  • by SanityInAnarchy ( 655584 ) <ninja@slaphack.com> on Sunday July 06, 2008 @12:45PM (#24075413) Journal

    Actually, there are quite a few good reasons for this, largely around the complete elimination of mutexing and locks.

    ...What? No, the elimination of mutexing and locks is made possible by a shared-nothing architecture.

    Oooooh, a language is faulty because it has a syntax with which you are not familiar.

    Hey, I mentioned Ruby. I don't mind LISP, either.

    The point is not that the language is unfamiliar, the point is that it's inconsistent (and unfamiliar) for no good reason. I use English, but I could make a lot of the same criticisms about it.

    They're just lists of numbers;

    In that case, the argument becomes, "Erlang has very poor text-processing, if any at all."

    If Erlang has text-processing functions that are designed to operate on these "lists of numbers", then yeah, it's pretty much going to be ASCII. And how are Erlang source files read? Could be "neither ASCII nor Latin1" if you like, but they can't be Unicode unless the parser is actually Unicode-aware.

  • by thanasakis ( 225405 ) on Sunday July 06, 2008 @01:03PM (#24075515)

    TFA more or less says that IMDB is switching from Perl to Erlang. So I looked at the link and here's what I got:

    (From here [computerjobs.com])

    We are looking for developers with experience building web scale distributed systems. We are currently working in Perl but have plans to use Java, Erlang and any other language that we think will suit our purposes. We aren't looking for expertise in any of those, particularly, but we expect that you will be an expert in the systems you know. We do require that you be passionate about testing (unit, integration, fault-injection) and code quality. Experience with relational databases (Oracle, MySQL, etc), embedded databases (BerkeleyDB, CDB, MonetDB, etc) and Linux are a big plus.

    I'll leave anyone to draw his own conclusions.

  • by GaryOlson ( 737642 ) <.slashdot. .at. .garyolson.org.> on Sunday July 06, 2008 @01:11PM (#24075559) Journal

    Erlang vs. Stackless python: a first benchmark [wordpress.com] is a very good discussion of lots of niggling details in benchmarking a concurrency language. The comments are quite good.

  • Mozart/Oz (Score:2, Informative)

    by synthespian ( 563437 ) on Sunday July 06, 2008 @01:26PM (#24075689)

    http://www.mozart-oz.org/ [mozart-oz.org]

    I'll just cite another "competitor":


    "The Mozart Programming System is an advanced development platform for intelligent, distributed applications. The system is the result of a decade of research in programming language design and implementation, constraint-based inference, distributed computing, and human-computer interfaces. As a result, Mozart is unequalled in expressive power and functionality. Mozart has an interactive incremental development environment and a production-quality implementation for Unix and Windows platforms. Mozart is the fruit of an ongoing research collaboration by the Mozart Consortium.

    Mozart is based on the Oz language, which supports declarative programming, object-oriented programming, constraint programming, and concurrency as part of a coherent whole. For distribution, Mozart provides a true network transparent implementation with support for network awareness, openness, and fault tolerance. Mozart supports multi-core programming with its network transparent distribution and is an ideal platform for both general-purpose distributed applications as well as for hard problems requiring sophisticated optimization and inferencing abilities. We have developed many applications including sophisticated collaborative tools, multi-agent systems, and digital assistants, as well as applications in natural language understanding and knowledge representation, in scheduling and time-tabling, and in placement and configuration."

  • by aconbere ( 802137 ) on Sunday July 06, 2008 @01:27PM (#24075697)

    Actually, there are quite a few good reasons for this, largely around the complete elimination of mutexing and locks.

    ...What? No, the elimination of mutexing and locks is made possible by a shared-nothing architecture.

    Oooooh, a language is faulty because it has a syntax with which you are not familiar.

    Hey, I mentioned Ruby. I don't mind LISP, either.

    The point is not that the language is unfamiliar, the point is that it's inconsistent (and unfamiliar) for no good reason. I use English, but I could make a lot of the same criticisms about it.

    It's not that its syntax is /inconsistent/; Erlang is actually incredibly consistent, it's just very different. Once you learn the 3 or 4 quirks that separate it from other languages, those 3 or 4 quirks are very consistently applied.

    Take for instance the punctuation (not line ending characters as is suggested).

    Commas separate arguments in function calls, data constructors, and patterns. Periods separate functions.

    Semicolons separate clauses. (This is the trickiest, but it can be thought of as signifying the existence of multiple cases of pattern matching.)

    They're just lists of numbers;

    In that case, the argument becomes, "Erlang has very poor text-processing, if any at all."

    If Erlang has text-processing functions that are designed to operate on these "lists of numbers", then yeah, it's pretty much going to be ASCII. And how are Erlang source files read? Could be "neither ASCII nor Latin1" if you like, but they can't be Unicode unless the parser is actually Unicode-aware.

  • by Anonymous Coward on Sunday July 06, 2008 @02:09PM (#24076007)

    I think you don't understand "shared-nothing". Immutable variables and objects are the core of a shared-nothing architecture. When your variables are immutable, they are not "shared state", and there can be no race conditions in accessing them.

    You can have a "shared-nothing architecture" in a language with mutable variables by establishing and enforcing conventions, but then you have to think about it all the time and trust other people not to mess it up. In a language like Erlang, where all variables are immutable, a shared-nothing architecture is automatic. Your code (and more importantly, other people's code) is always automatically thread-safe and parallelizable.
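
    The difference between "immutable by convention" and "immutable by construction" can be sketched in Python with a frozen dataclass. The `Account` type and helper names are hypothetical, and Python's guarantee is weaker than Erlang's: `object.__setattr__` can still bypass it, which rather supports the point about having to trust other people's code.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Account:
    owner: str
    balance: int


def deposit(acct: Account, amount: int) -> Account:
    """Return a new value; the original is left untouched."""
    return replace(acct, balance=acct.balance + amount)


def try_mutate(acct: Account) -> bool:
    """Return True if an in-place update was rejected."""
    try:
        acct.balance = 0  # ordinary assignment raises on a frozen dataclass
    except FrozenInstanceError:
        return True
    return False
```

    Because `deposit` never changes its argument, any number of threads can hold a reference to the same `Account` without coordinating.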

    Considering that thread safety is perhaps the hardest thing to achieve in traditional languages, and concurrency is the next hot topic in computer science, having it built into the language starts to sound like a good idea.

  • by Anonymous Coward on Sunday July 06, 2008 @02:36PM (#24076207)

    Where is Lisp today? Smalltalk?

    Running Orbitz.com and DabbleDB. Lisp also allowed Paul Graham to make a lot of money.

  • by Anonymous Coward on Sunday July 06, 2008 @02:37PM (#24076211)

    You wrote a SMP-capable micro-stack green thread scheduler in Ruby, over the weekend?

    Nope. I used the existing Thread API. It is theoretically, but not actually, SMP-capable. It should be portable to JRuby, which uses Java threads.

    What I did was port the message-passing to a Ruby paradigm -- so, much simpler than it sounds. Essentially, I wrote a proxy for objects -- every method call to such an object stacks up in a queue, and is sent (in the object's thread).

    It's not so much a direct port of Erlang as a mapping of what I found valuable in the Erlang philosophy. A class can be written with almost no thought to concurrency, and then used concurrently.
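
    The proxy idea described here (every method call is queued and then run on the object's own thread) can be sketched as follows. This is written in Python rather than Ruby and is not the poster's code; `ActorProxy`, `Counter`, and the use of `Future` objects for results are all assumptions of the sketch.

```python
import queue
import threading
from concurrent.futures import Future


class ActorProxy:
    """Wrap a plain object; run every method call on one dedicated thread."""

    def __init__(self, target):
        self._target = target
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            item = self._inbox.get()
            if item is None:  # stop request
                return
            name, args, kwargs, fut = item
            try:
                fut.set_result(getattr(self._target, name)(*args, **kwargs))
            except Exception as exc:
                fut.set_exception(exc)  # the caller sees it on fut.result()

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for the target's methods.
        if name.startswith("_"):
            raise AttributeError(name)

        def send(*args, **kwargs):
            fut = Future()
            self._inbox.put((name, args, kwargs, fut))
            return fut

        return send

    def stop(self):
        self._inbox.put(None)
        self._thread.join()


class Counter:
    """Written with no thought to concurrency."""

    def __init__(self):
        self.value = 0

    def incr(self, by=1):
        self.value += by
        return self.value
```

    Calls are served strictly in arrival order on one thread, so `Counter` needs no locking. The sketch uses one OS thread per wrapped object and works only within a single process.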

    Your class can't be used concurrently, because Ruby is not actually SMP-capable. It can't be used on multiple systems, because the messaging can't be used across nodes.

    Single-machine, non-concurrency queue-based message passing systems are what we usually call "linked lists". Comparing your hobby-horse linked-list message queues to Erlang is foolish hubris.

    It supports transparent multi-node messaging?

    No, Ruby supports that on its own, with DRb.

    DRb is just another RPC implementation. By your logic, I could call Python's xmlrpc libraries an implementation of transparent multi-node concurrent messaging.

    However, you missed the "transparent" and "concurrent".

    In erlang, processes and (immutable) messages are first class entities, whether they're operating locally or remotely. Sending a local process a message is the same as sending a remote process a message. The message and protocol formats are well-defined, such that one can implement an Erlang "node" in any language -- including Java, Ruby, or C. Unlike most RPC systems, Erlang doesn't attempt to hide (or ignore the possibility of) underlying network/system failure, but instead, is built to handle failure through the use of supervisors.
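
    As a very rough illustration of the "handle failure through supervisors" idea, here is a toy restart loop in Python. It is an assumption-laden sketch, not OTP: it supervises a single child inside one OS process, has no notion of remote nodes, and the names `supervise` and `make_flaky` are invented for the example.

```python
import threading


def supervise(child, max_restarts: int = 3) -> int:
    """Run `child` in a thread; restart it if it raises. Return restarts used."""
    restarts = 0
    while True:
        failure = []

        def run():
            try:
                child()
            except Exception as exc:  # stands in for an exit signal
                failure.append(exc)

        t = threading.Thread(target=run)
        t.start()
        t.join()
        if not failure:
            return restarts  # child finished normally
        if restarts == max_restarts:
            raise RuntimeError("restart limit reached") from failure[0]
        restarts += 1


def make_flaky(fail_times: int):
    """Hypothetical child that crashes `fail_times` times, then succeeds."""
    state = {"calls": 0}

    def child():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise ValueError("simulated crash")

    return child
```

    The child contains no recovery code of its own. It simply crashes, and the decision about what to do next lives in the supervisor.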

    A language-specific reflective RPC library doesn't really count as a replacement, though it does make a pat answer.

    What I haven't figured out is the exception handling.

    You mean some of the most complex pieces of code in Erlang -- the software that manages a supervision tree of processes, handling failure across the local system as well as remote nodes?

    Good luck with that

    If you pulled this off in Ruby, which doesn't even -support- native threads

    Ruby 1.9, which does. Unfortunately, it takes a page out of Python and uses a global interpreter lock, so only one Ruby bytecode can be executing at a time -- but again, I believe JRuby does real (actually concurrent) threads.

    So no concurrency.

    Or, more likely, you're just another know-it-all shithead who implemented 1% of the solution to a problem you don't even understand, and are now trumpeting it on the blogowww-twittersphere.

    That's exactly why I'm not -- no blog, no big announcements until it's done.

    Then please stop commenting on these systems, because I don't think you're going to understand what makes this problem space so complex until you've actually finished a solution for the whole space, instead of the small percentage of it that you actually understand.

  • Re:Scala (Score:5, Informative)

    by Richard W.M. Jones ( 591125 ) <rich AT annexia DOT org> on Sunday July 06, 2008 @02:37PM (#24076215) Homepage

    "Last time you checked" was some time last century in that case. Linux kernels have been able to support at least 100,000 threads [wikipedia.org] for ages.

    That doesn't mean that using shared memory concurrency is a good idea though. When your computer comes with 10s or 100s of cores you'll realise that maybe SMP wasn't the best model of concurrency to choose. That's where models such as map-reduce, Erlang's shared nothing concurrency, message passing, and MPI come into their own. Even today they are useful because you'll be able to scale your program across multiple machines.

    Rich.

  • by Anonymous Coward on Sunday July 06, 2008 @04:08PM (#24076879)

    Your class can't be used concurrently, because Ruby is not actually SMP-capable.

    But JRuby is.

    I should note that, in a simple test, Erlang got about three times slower when I enabled SMP mode.

    JRuby is a non-standard Ruby implementation. From their feature list: "Most builtin Ruby classes provided". Emphasis mine. Additionally, see below why support for OS threads isn't enough to implement M:N actor:thread scheduling.

    As for Erlang SMP speed -- first, they implemented SMP support. Then, they started to optimize it -- It has the architecture necessary to improve performance. Ruby, in contrast, still has a GIL, which means every single component and extension will need to be reviewed for thread-safety before the GIL can be removed (which means, never).

    It can't be used on multiple systems, because the messaging can't be used across nodes.

    Except it can, with DRb. Which also provides sloppy SMP support -- just run two nodes on the same machine.

    How many "nodes" are you going to run on the same machine? Each Ruby process consumes a great quantity of resources; eliminating that resource drain is the entire purpose behind Erlang's tiny-stack microthreads -- they consume very few resources, so you can have a great many of them.

    Single-machine, non-concurrency queue-based message passing systems are what we usually call "linked lists".

    Erm, what? I know what a linked list is, and I don't see the comparison.

    You've created a dead-simple message queue based on delivery of messages to a thread-owned linked-list. Erlang implements externally scheduled pattern-based actor message delivery, using thread-scheduled microstacks to allow for an enormous number of concurrently deliverable processes. One key element of these implementations is that they do NOT consume a thread per actor. Since you're so up on JRuby, I recommend Philipp Haller's [lamp.epfl.ch] papers on his actor library for Scala.

    Assuming the implementation of thread-scheduled actor message delivery in Ruby (rather than JRuby), you'll still be stuck with the GIL, making the whole exercise, well, moot. If you use JRuby, you still don't control scheduling of microthreads, leaving you with 1:1 correspondence between an executing actor and a real thread, whereas Erlang is capable of maintaining M:N scheduling of *executing* actors to real threads.

    This M:N correspondence truly matters -- without it, blocking operations inside of an actor mean that the entire thread is blocked and useless for other purposes. It's not possible to implement the M:N model without control over your execution stack, which means true microthreads are unimplementable in both JRuby and Ruby.
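
    The cost of an actor that never gives up its thread can be shown with a small Python sketch. It uses `asyncio`, which is cooperative and runs everything on one thread, so it is an analogy for the scheduling problem rather than a model of Erlang's M:N scheduler; all names are assumptions.

```python
import asyncio


async def actor(name: str, steps: int, log: list, cooperative: bool) -> None:
    """Record each step; optionally hand control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}{i}")
        if cooperative:
            await asyncio.sleep(0)  # give other actors a turn


async def run_pair(cooperative: bool) -> list:
    log: list = []
    await asyncio.gather(
        actor("a", 3, log, cooperative),
        actor("b", 3, log, cooperative),
    )
    return log


def scheduling_demo(cooperative: bool) -> list:
    """Return the order in which two actors' steps actually ran."""
    return asyncio.run(run_pair(cooperative))
```

    With `cooperative=True` the two actors interleave step by step. With `cooperative=False` actor `a` runs to completion before `b` gets any time, because nothing can take the thread away from it. A runtime that controls its own execution stacks can avoid this by suspending the running actor itself.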

    DRb is just another RPC implementation.

    ...and Erlang's RPC isn't?

    No. RPC is a limited form of message passing -- RPC is something you build ON TOP OF the message passing model. If you skip the "message passing" step and go straight to RPC, you wind up with something that ignores the complexity of the network -- the network fails, messages get lost, dropped, or ignored. Given this ignorance of the medium, RPC interfaces assume synchronous message delivery and response, which then precludes asynchronous delivery and response.
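
    A sketch of what "RPC built on top of message passing" can look like, in Python. The `Mailbox` class, its `send`/`call` methods, and the `echo_or_drop` handler are hypothetical names; the point is only that the synchronous call is a thin layer over an asynchronous send, and that the caller has to decide what happens when no reply arrives.

```python
import queue
import threading


class Mailbox:
    """A process-like endpoint: asynchronous send, synchronous call built on it."""

    def __init__(self, handler):
        self._handler = handler
        self._inbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            payload, reply_to = self._inbox.get()
            result = self._handler(payload)
            if reply_to is not None and result is not None:
                reply_to.put(result)

    def send(self, payload):
        """Fire and forget: returns immediately, with no reply expected."""
        self._inbox.put((payload, None))

    def call(self, payload, timeout: float):
        """RPC layered on send: attach a reply queue and wait, with a timeout."""
        reply_to = queue.Queue(maxsize=1)
        self._inbox.put((payload, reply_to))
        try:
            return reply_to.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no reply") from None


def echo_or_drop(payload):
    """Hypothetical handler: 'drop' simulates a lost message (no reply sent)."""
    return None if payload == "drop" else f"echo:{payload}"
```

    `send` never blocks, and `call` makes the failure case explicit through its timeout instead of assuming that a response always comes back.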

    In erlang, processes and (immutable) messages are first class entities, whether they're operating locally or remotely.

    Again, I have to go, "huh?"

    DRb allows objects to be created and "passed" between the machines. I could do:

    remote_object = DRB::whatever
    remote_object.some_method :arg1, :arg2 ...

    How is this different than Erlang's PIDs, other than being syntactically cleaner?

  • Re:Scala (Score:4, Informative)

    by TheRaven64 ( 641858 ) on Sunday July 06, 2008 @05:43PM (#24077633) Journal

    Note that I said 'per process' - each process has its own LDT, and so each one can support 8K threads, so you can get 100K threads with 13 processes easily. This might not still be the case - implementing TLS using an extra register would avoid this limitation but would remove one GPR, and they are quite scarce on x86. The other option, updating the LDT every few thread context switches, introduces a lot of TLB churn.

    I quite agree that shared memory concurrency is a bad idea, however. Unfortunately, until you have message passing instructions in the hardware, you're stuck emulating message passing on top of shared memory, which leads to cache coherency issues and a host of other problems.

  • Re:Scala (Score:5, Informative)

    by Jamie Lokier ( 104820 ) on Sunday July 06, 2008 @07:25PM (#24078349) Homepage

    Linux threads stopped using the LDT on x86 in 2002. This change went mainstream over subsequent years, and is nowadays always used on x86.

    There was once a limit on the number of processes, too, due to each process having an entry in the GDT. That has long been removed too.

  • by A Numinous Cohort ( 872515 ) <raybaqNO@SPAMgmail.com> on Sunday July 06, 2008 @09:59PM (#24079357)

    Have you had a look at Clojure [clojure.org]? It is a Lisp dialect that runs on the JVM with good Java interop and has built-in support for STM [wikipedia.org] concurrency.
