Project Aims For 5x Increase In Python Performance 234

cocoanaut writes "A new project launched by Google's Python engineers could make the popular programming language five times faster. The project, which is called Unladen Swallow, seeks to replace the Python interpreter's virtual machine with a new just-in-time (JIT) compilation engine that is built on LLVM. The first milestone release, which was announced at PyCon, already offers a 15-25% performance increase over the standard CPython implementation. The source code is available from the Google Code web site."
This discussion has been archived. No new comments can be posted.

  • by KnightElite ( 532586 ) on Friday March 27, 2009 @04:33PM (#27363011) Homepage
    I hope this translates into further speed ups for EVE online down the road.
  • Re:Kill the GIL! (Score:5, Insightful)

    by dgatwood ( 11270 ) on Friday March 27, 2009 @04:55PM (#27363339) Homepage Journal

    The key is to find the right balance of granularity in locking. A big giant mutex is always a bad idea, but having tens of thousands of little mutexes can also be bad due to footprint bloat and the extra time needed to lock all those locks. The right balance is usually somewhere in the middle: each lock should see a moderate level of contention. Too little contention and you're wasting too much time locking and unlocking the mutex relative to the time spent doing the task; too much contention and you're likely wasting time waiting on somebody doing something that wouldn't really have interfered with what you're doing at all. Oh, and reader-writer locks for shared resources can be a real win, too, in some cases.
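    The middle-ground granularity the comment argues for can be sketched in Python with lock striping: one lock per bucket group instead of one global mutex or one per key. The `StripedCounter` class and the stripe count of 8 are hypothetical choices for illustration, not from the original discussion.

    ```python
    import threading

    class StripedCounter:
        """Counter map guarded by a handful of striped locks.

        One big lock would serialize everything; a lock per key would
        bloat memory. Eight stripes is an arbitrary middle ground.
        """

        def __init__(self, stripes=8):
            self._locks = [threading.Lock() for _ in range(stripes)]
            self._buckets = [dict() for _ in range(stripes)]

        def _stripe(self, key):
            return hash(key) % len(self._locks)

        def incr(self, key):
            i = self._stripe(key)
            with self._locks[i]:  # contend only with keys in the same stripe
                self._buckets[i][key] = self._buckets[i].get(key, 0) + 1

        def get(self, key):
            i = self._stripe(key)
            with self._locks[i]:
                return self._buckets[i].get(key, 0)
    ```

    Threads touching keys in different stripes never block each other, which is the balance-of-contention point in practice.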

  • by zindorsky ( 710179 ) <zindorsky@gmail.com> on Friday March 27, 2009 @05:19PM (#27363671)

    Yes, I realise the right tool for the job argument.

    Exactly. Most applications are not CPU bound. If yours is, then I don't know why others are trying to get you to use Python.

  • Re:Kill the GIL! (Score:5, Insightful)

    by Just Some Guy ( 3352 ) <kirk+slashdot@strauser.com> on Friday March 27, 2009 @05:28PM (#27363757) Homepage Journal

    Good luck with that. The last time someone tried that, they slowed Python down by half.

    Yes, good luck with that! Because the current implementation slows it down by 7/8ths on my 8-core server.
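    The usual workaround for the multicore problem being complained about here is processes instead of threads: each process gets its own interpreter and its own GIL. A minimal sketch with the standard library's `multiprocessing.Pool` (the `busy` function and pool size are illustrative only):

    ```python
    from multiprocessing import Pool

    def busy(n):
        # CPU-bound stand-in: threads running this would serialize on the
        # GIL, but separate processes can run it on all cores at once.
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        with Pool(processes=4) as pool:
            results = pool.map(busy, [100_000] * 4)
    ```

    The trade-off is that data passed between processes is pickled and copied, so this works best when the work units are large relative to their inputs.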

  • It all depends (Score:5, Insightful)

    by mkcmkc ( 197982 ) on Friday March 27, 2009 @05:38PM (#27363905)

    I find Python is about 20x slower (and about 10x faster to implement) than C, with the number varying quite a bit depending on how CPU-bound the code is. Given the speed of modern processors, this is plenty fast for many tasks.

    Beyond that, many Python programmers employ a strategy of writing just the CPU-intensive inner loops in C or C++. This gives you most of the speed of an all-compiled solution but with much of the easier programming (and shorter programs) of the all-Python approach.

    My particular scientific application runs on 1500 cores, is about 75% Python/25% C++, is 4-5x smaller than similar all-C/C++ programs, and runs at about 95-99.99% of the speed of an all C++ solution.

    (Somewhat ironically, some of the worst performance bottlenecks in this app had to do with the overhead of some of the STL containers, which I ended up having to replace with C-style arrays, etc. to get best performance.)

    Not all apps will fall out this way, but you definitely can't assume that just because something's written in Python that it will be slow.

    (Going beyond that, we all know that better algorithms usually trump all of this anyway. If writing in Python gives you the time and clarity to be able to use an O(n)-better algorithm, that may pay off in itself.)
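    The "inner loops in C" strategy above doesn't always require writing your own extension; `ctypes` can call into an already-compiled C library. A small sketch against libm, assuming a POSIX system where `find_library("m")` resolves the C math library:

    ```python
    import ctypes
    import ctypes.util

    # Load the C math library and declare sqrt's signature so ctypes
    # converts arguments and results correctly.
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.sqrt.restype = ctypes.c_double
    libm.sqrt.argtypes = [ctypes.c_double]

    def roots(values):
        sqrt = libm.sqrt  # bind the C function once, outside the loop
        return [sqrt(v) for v in values]
    ```

    For a real hot loop you would push the whole loop, not just one call, across the boundary; per-call overhead is exactly what the 75%-Python/25%-C++ split is arranged to avoid.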

  • by mkcmkc ( 197982 ) on Friday March 27, 2009 @05:43PM (#27363979)

    You're not CPU bound until you: add all the features, handle the special cases, add the error checking, scale up beyond trivial test data, etc.

    Then what? Rewrite?

    Yes. If you didn't know all of that was going to happen, you're prototyping. If you're prototyping, you should be doing it in a prototyping language.

    Rewriting from Python to C++ is not particularly difficult. Completely overhauling the design of a project written entirely in C++ is really unpleasant and takes a long time. So much so that many early design decisions on large C++ projects simply cannot be undone.

    Model in clay first, then in stone later if you have to.

  • Re:Kill the GIL! (Score:5, Insightful)

    by jd ( 1658 ) <imipak@ y a hoo.com> on Friday March 27, 2009 @05:56PM (#27364145) Homepage Journal

    If developers were working from a clean slate and didn't have the problems of excessive legacy code to work with, I suspect Digital Mars' D, Inmos' Occam and Ericsson's Erlang would be the three main languages in use today.

    If hardware developers were working from a clean slate, you'd probably also see a lot more use of Content Addressable Memory, Processor-In-Memory and Transputer/iWarp-style "as easy as LEGO" CPUs.

    Sadly, what isn't patented was invented 30 years too late and 20 years before the technology existed to make these ideas really work, so we're stuck with neolithic monoliths in both the software and hardware departments.

    (Remember, Y2K was worth tens of billions, but wasn't worth enough to get people to stop using COBOL, and that was practically dead. To get people to kick their current habits would need a kick in the mind a thousand times bigger.)

  • Re:Any Hope? (Score:3, Insightful)

    by /dev/trash ( 182850 ) on Friday March 27, 2009 @08:34PM (#27366131) Homepage Journal

    You are free to use C and Assembler where required.

  • by bnenning ( 58349 ) on Friday March 27, 2009 @10:18PM (#27367009)

    If your organization has 50 Java developers, the effort needed to train them to be Python developers is not trivial. Then you can't just rewrite everything because you still have all that Java code to maintain.

    Yes, you shouldn't rewrite working Java code in Python just for kicks, or vice versa. I'm not sure how that's relevant.

    It's not like Python is significantly fewer lines of code than Java or anything. Especially now with annotations. Maybe 2x as many LOC.

    I'll agree that 2x is in the ballpark, and I find that to be quite significant, considering that studies have found that developers tend to produce lines of (debugged, working) code at the same rate regardless of language. Doubling developer productivity will very often be worth sacrificing performance, especially when the software isn't CPU-bound. Why do you think Java took over from C?

    Plus, I don't think fewer LOC means greater maintainability.

    All I can say is that I've been developing in Java for 12 years and Python for 2, and that's been my experience.

    Let me give an example using a pizza recipe instead of a programming language.

    I don't agree with that, because the short version leaves out critical information so of course it's not as useful. What I like about Python is that it largely lets me deal with *only* the stuff that matters to my application. In my questionable metaphor Python would be "Bake at 400 degrees for 15 minutes", and Java would be "Turn the temperature dial to 400, open the oven door, insert the pan in the oven, close the oven door, wait 15 minutes, open the oven door...". Ok not quite that bad, but the essential details are often obscured by unimportant boilerplate. And yes, you can get tools that automatically create and hide some of it, but that should just make you question why the language can't do that itself.

    The main problem I see, though: in 5 years, a lot of those Python developers are probably going to be working in a different language altogether.

    A fine argument for COBOL :)

  • Re:Kill the GIL! (Score:3, Insightful)

    by ultranova ( 717540 ) on Saturday March 28, 2009 @03:44AM (#27368505)

    I thought it was pretty interesting because reference counting can have more cache-friendly behavior than copying gc or mark-sweep approaches.

    That's Java's biggest problem, IMHO: once the data spills into swap, it'll take forever to run garbage collection.
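    The reference-counting behavior under discussion is easy to observe in CPython: an object is reclaimed the instant its last reference dies, with no collector pass needed. A tiny sketch (the `Blob` class is hypothetical; this is CPython-specific behavior, not a language guarantee):

    ```python
    import weakref

    class Blob:
        pass

    b = Blob()
    r = weakref.ref(b)   # observe the object without keeping it alive
    assert r() is b

    del b                # refcount hits zero: freed immediately,
    assert r() is None   # so the weak reference is already cleared
    ```

    A tracing collector would only reclaim the object at some later collection cycle, which is the cache- and swap-unfriendliness the comment is pointing at.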

  • by daver00 ( 1336845 ) on Saturday March 28, 2009 @04:38AM (#27368655)

    The thing about Python is you are replacing every lost hour in runtime with a day gained in development time. That is the point of Python. Numpy (formerly scipy I think) is mostly written in C anyway and provides fast n-dimensional array objects for vector and matrix operations, there are really only a few bottlenecks for maths/science purposes. Generally anything that is going to take a seriously long amount of time you would be doing in C over anything else anyway, what Python is is a viable alternative to Matlab etc, and a damn sight less expensive!

    Where I study Engineering they teach Python for this very reason. It has a gentle syntax which appeals to engineers and scientists who often aren't bargaining to become coders, and it is so much cheaper than Matlab that any missing features are rendered a moot point.

    Seriously, sitting on the sidelines and saying "I'm not gonna use Python because it is slow" is silly; it is so damn easy to code in Python that you would learn it in a weekend if you already have coding experience. And as I said before, any time lost running Python scripts over other languages is made up at least ten times over in the ridiculously short development times that go with Python scripts. Yes, it really is THAT easy to do anything in Python, there is a reason people bug you to try it out. Just give it a weekend, Python deserves it!
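    The NumPy point above is worth making concrete: array arithmetic is one pass through compiled C code, not a Python-level loop over a million elements (assumes NumPy is installed; the numbers are arbitrary):

    ```python
    import numpy as np

    # A million-element array; the expression below is evaluated in C,
    # element by element, without the interpreter in the inner loop.
    x = np.arange(1_000_000, dtype=np.float64)
    y = 2.0 * x + 1.0
    ```

    This is why heavily numeric Python code can land within a small constant factor of C: the interpreter only dispatches a handful of operations, and the per-element work happens in compiled loops.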

  • Re:Kill the GIL! (Score:3, Insightful)

    by dkf ( 304284 ) <donal.k.fellows@manchester.ac.uk> on Sunday March 29, 2009 @03:48AM (#27377299) Homepage

    Not necessarily; there are plenty of application areas where you can easily design your data structures and access rules so that multiple threads accessing them are not a problem. Consider the application I'm currently working on, a parallel artificial neural network trainer. I have one copy of the weights, and 4 threads with a different training set each. Each runs through its training set, totalling changes to make to the weights, then passes [a pointer to] those changes off to a coordinator thread which waits until all 4 have finished before adjusting the weights and then telling them to resume with the next epoch. The weight matrix is in the range of 50-100MB, so we really don't want to have to copy it each time around. This is a much more efficient way of achieving this result than anything I can think of without shared data, and I'd love to know if anyone else can see a better solution.

    Sounds like a reasonable medium-scale approach to me - I've done similar things. But you are aware that you're, in effect, using a locking solution? And that shared memory scheme won't scale up to a cluster? (To scale it up, consider whether you can only transmit the diffs to the weight table or change the axis on which you're splitting things up so that you get better data locality. Another possibility might be to compute the weights twice or more in different threads, which trades more computation for less lock contention. Don't know which is right for your case though, since scaling up isn't easy; requires real thought sometimes.)

    I suppose it might help you to have a bit more background. In many types of traditional supercomputer, a lot of effort was put into supporting a shared memory model over very large numbers of processors (e.g., a thousand or so). That's really what made them so stupendously expensive, especially through the '90s. (The CPUs themselves weren't that much better than normal desktop ones by comparison; better floating point units typically, but not by that much.) Of course, it wasn't sustainable; the memory hardware was just too much of a bottleneck (in effect there was a lock for every memory access!) so that had to go and the cluster is now king. But to take proper advantage of that, you have to start minimizing the amount of locking and communication of big memory structures; get that right (with clever algorithms, etc.) and you can go up to internet-scale apps, some of which are so big that we don't usually think of them that way.
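    The wait-until-all-workers-finish coordination described above maps directly onto `threading.Barrier`, whose `action` callable runs exactly once, in one thread, when the last party arrives. A sketch with the worker logic as a stand-in for a real training pass (names and sizes are illustrative):

    ```python
    import threading

    N = 4
    weights = [0.0] * 10       # the one shared copy of the weights
    deltas = [None] * N        # each worker's privately totalled changes

    def apply_updates():
        # Plays the coordinator: merges every worker's delta into the
        # shared weights while all workers are parked at the barrier.
        for d in deltas:
            for i, v in enumerate(d):
                weights[i] += v

    barrier = threading.Barrier(N, action=apply_updates)

    def worker(rank):
        # Stand-in for running through a training set and totalling changes.
        deltas[rank] = [float(rank)] * len(weights)
        barrier.wait()  # last arrival triggers apply_updates, once

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ```

    Only pointers to the deltas cross threads, never the big weight matrix, which is the comment's reason for preferring shared memory over copying here.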
