
Project Aims For 5x Increase In Python Performance 234

Posted by ScuttleMonkey
from the do-stupid-things-faster-and-with-more-energy dept.
cocoanaut writes "A new project launched by Google's Python engineers could make the popular programming language five times faster. The project, which is called Unladen Swallow, seeks to replace the Python interpreter's virtual machine with a new just-in-time (JIT) compilation engine that is built on LLVM. The first milestone release, which was announced at PyCon, already offers a 15-25% performance increase over the standard CPython implementation. The source code is available from the Google Code web site."

  • by KnightElite (532586) on Friday March 27, 2009 @03:33PM (#27363011) Homepage
    I hope this translates into further speed ups for EVE online down the road.
  • Kill the GIL! (Score:5, Informative)

    by GlobalEcho (26240) on Friday March 27, 2009 @03:36PM (#27363035)

    The summary misses one of the best bits -- the project will try to get rid of the Global Interpreter Lock that interferes so much with multithreading.

    Also, it's based on v2.6, which they are hoping will make 3.x an easy change.

    • Re:Kill the GIL! (Score:4, Interesting)

      by eosp (885380) on Friday March 27, 2009 @03:42PM (#27363131) Homepage

      The summary misses one of the best bits -- the project will try to get rid of the Global Interpreter Lock that interferes so much with multithreading.

      Good luck with that. The last time someone tried that, they slowed Python down by half.

      • by Anonymous Coward on Friday March 27, 2009 @03:50PM (#27363261)

        0.5x slower is like 2x faster, right? Reciprocals?

        • I interpreted it as "now the Python interpreter takes 150% as much time as it used to". The half being added, rather than multiplied.

          • by jd (1658)

            Or it could mean they used half-and-half in the developer's tea, causing them to slow down.
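
The two readings of "slowed down by half" in this sub-thread give different numbers. A small worked sketch, with the old run time normalised to 1 (an assumption made purely for illustration):

```python
from fractions import Fraction

old_time = Fraction(1)  # normalised run time before the change

# Reading 1: throughput is halved, so the same work takes twice as long.
time_if_speed_halved = old_time / Fraction(1, 2)

# Reading 2: run time grows by half of the old run time ("150% as much time").
time_if_half_added = old_time + old_time / 2

# Relative throughput under reading 2.
speed_if_half_added = old_time / time_if_half_added

print(time_if_speed_halved, time_if_half_added, speed_if_half_added)
```

Under the first reading the interpreter runs at half speed; under the second it runs at two thirds of its former speed.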

      • Re:Kill the GIL! (Score:5, Insightful)

        by dgatwood (11270) on Friday March 27, 2009 @03:55PM (#27363339) Journal

        The key is to find the right balance of granularity in locking. A big giant mutex is always a bad idea, but having tens of thousands of little mutexes can also be bad due to footprint bloat and the extra time needed to lock all those locks. The right balance is usually somewhere in the middle. Each lock should have a moderate level of contention---not too little contention or else you're wasting too much time in locking and unlocking the mutex relative to the time spent doing the task---not too much contention or else you're likely wasting time waiting for somebody else that is doing something that wouldn't really have interfered with what you're doing at all. Oh, and reader-writer locks for shared resources can be a real win, too, in some cases.
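
One common middle ground between a single big mutex and one lock per item is lock striping. The sketch below is only an illustration of that idea, not anything from CPython or Unladen Swallow; the class name `StripedCounter` and the `num_stripes` knob are invented for the example.

```python
import threading


class StripedCounter:
    """Counter table guarded by a fixed number of locks (lock striping).

    num_stripes is a hypothetical tuning knob: 1 behaves like one global
    lock, while a very large value approaches one lock per key.
    """

    def __init__(self, num_stripes=16):
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._shards = [{} for _ in range(num_stripes)]

    def _index(self, key):
        # Keys that hash to the same stripe share a lock and a shard.
        return hash(key) % len(self._locks)

    def increment(self, key, amount=1):
        i = self._index(key)
        with self._locks[i]:
            shard = self._shards[i]
            shard[key] = shard.get(key, 0) + amount

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i].get(key, 0)


def hammer(counter, keys, repeats):
    """Increment every key `repeats` times from the calling thread."""
    for _ in range(repeats):
        for key in keys:
            counter.increment(key)


counter = StripedCounter(num_stripes=4)
keys = ["a", "b", "c", "d", "e"]
threads = [
    threading.Thread(target=hammer, args=(counter, keys, 1000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Threads touching keys in different stripes never wait on each other, while the number of locks stays fixed no matter how many keys exist.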

        • Re: (Score:3, Interesting)

          by Viking Coder (102287)

          Hrmph.

          Maybe I'm just drinking the Kool-Aid, but Software Transactional Memory sounds much, much better to me.

          The "D" programming language from Digital Mars sounds very interesting, for example.

          • Re:Kill the GIL! (Score:5, Interesting)

            by Nevyn (5505) * on Friday March 27, 2009 @04:47PM (#27364017) Homepage Journal

            Then you probably want to read: Patrick Logan on why STM isn't "awesomez" [blogspot.com].

            • Nice food for thought.

              I think all I really need is multi-thread safe Persistence, in my use case, with as little memory duplication as possible, of course.

              Hrm - the hamster is definitely running in the wheel right now...

            • That's a rant on STM by an Erlang fan that has little in the way of rational arguments, but a lot of mentions of the word "wrong" (and also "Erlang"). So?

          • Re:Kill the GIL! (Score:5, Insightful)

            by jd (1658) <imipak@nOSPam.yahoo.com> on Friday March 27, 2009 @04:56PM (#27364145) Homepage Journal

            If developers were working from a clean slate and didn't have the problems of excessive legacy code to work with, I suspect Digital Mars' D, Inmos' Occam and Ericsson's Erlang would be the three main languages in use today.

            If hardware developers were working from a clean slate, you'd probably also see a lot more use of Content Addressable Memory, Processor-In-Memory and Transputer/iWarp-style "as easy as LEGO" CPUs.

            Sadly, what isn't patented was invented 30 years too late and 20 years before the technology existed to make these ideas really work, so we're stuck with neolithic monoliths in both the software and hardware departments.

            (Remember, Y2K was worth tens of billions, but wasn't worth enough to get people to stop using COBOL, and that was practically dead. To get people to kick their current habits would need a kick in the mind a thousand times bigger.)

            • The fascinating thing about the LLVM architecture is that you can bolt any language on the front end, and still benefit from a mountain of hardware-specific optimizations on the back end, without the need to figure them out and implement them yourself. Erlang, D, and Occam front ends for LLVM are just some code away... just a shout away, just a kiss away... kiss away... kiss away, hey, hey-ya...
        • Re: (Score:3, Interesting)

          by Secret Rabbit (914973)

          Or one could keep *A* GIL and largely ignore it. Here's the model I would use.

          Separate python into per thread instances yet keep a larger overall memory space to be shared between threads. But, one must explicitly state that they want to go to the global space. That way, when one uses a single threaded application, everything is as it should be, nothing in its way to slow it down. So, those locks won't even get invoked. However, when one is programming a multi-threaded application, then one has the *ch

      • Re:Kill the GIL! (Score:5, Insightful)

        by Just Some Guy (3352) <kirk+slashdot@strauser.com> on Friday March 27, 2009 @04:28PM (#27363757) Homepage Journal

        Good luck with that. The last time someone tried that, they slowed Python down by half.

        Yes, good luck with that! Because the current implementation slows it down by 7/8ths on my 8-core server.

        • Re:Kill the GIL! (Score:4, Informative)

          by Nevyn (5505) * on Friday March 27, 2009 @04:44PM (#27363993) Homepage Journal

          That's funny, because os.fork() etc. work fine on my version of python.

          • They work great here, too, but each process model has its place. There are times when I really, really wish I could use effective threading.
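
For readers who have not used the process-based route mentioned here, a minimal sketch of spreading CPU-bound work over several processes with `os.fork()` and pipes follows. It assumes a POSIX system (there is no `os.fork()` on Windows), and the function name `parallel_sum_squares` is made up for the example.

```python
import os


def parallel_sum_squares(chunks):
    """Sum i*i over each (start, stop) range, one forked child per range."""
    children = []
    for start, stop in chunks:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: separate interpreter state, separate GIL.
            os.close(read_fd)
            total = sum(i * i for i in range(start, stop))
            with os.fdopen(write_fd, "w") as out:
                out.write(str(total))
            os._exit(0)  # exit at once, skipping the parent's cleanup code
        # Parent process: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    grand_total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd) as inp:
            grand_total += int(inp.read())
        os.waitpid(pid, 0)  # reap the child
    return grand_total


result = parallel_sum_squares([(0, 1000), (1000, 2000)])
```

The cost is visible in the sketch: every result has to be serialised and copied through a pipe, which is why the reply above still wishes for effective threading with shared memory.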

      • Re:Kill the GIL! (Score:5, Informative)

        by Red Alastor (742410) on Friday March 27, 2009 @04:39PM (#27363925)

        Good luck with that. The last time someone tried that, they slowed Python down by half.

        Only because Python uses a refcounting garbage collector. When you get many threads, you need to lock all your data structures because otherwise you might collect them when they are still reachable. This project plans to change the garbage collection strategy first. Once it's done, killing the GIL is easy.
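
The reason refcounting and threads interact badly is that even a plain name binding writes to the object's reference count. A small CPython-specific sketch using `sys.getrefcount` makes those writes visible (the absolute numbers vary by interpreter version, so only the differences matter):

```python
import sys

payload = []                         # a fresh object to observe
before = sys.getrefcount(payload)    # includes the call's own temporary reference

alias = payload                      # binding another name adds one reference
after_bind = sys.getrefcount(payload)

del alias                            # dropping the name removes it again
after_del = sys.getrefcount(payload)

print(after_bind - before, after_del - before)
```

Without a global lock, each of those count updates would need to be atomic or otherwise protected, since two threads adjusting the same count at once could free an object that is still in use.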

        • > This project plans to change the garbage collection strategy first.

          Shouldn't be too hard, should it?

          Not so long ago I wrote a simple mark&sweep GC over a weekend, with no previous practical experience in this area at all.
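
A toy mark-and-sweep collector really is small. The sketch below works on an explicit list of hypothetical `Obj` nodes rather than on real interpreter objects, so it says nothing about how hard it would be to retrofit one into CPython and its C extensions.

```python
class Obj:
    """Heap object for the toy collector; `refs` lists the objects it points to."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark phase: flag everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    """Sweep phase: keep marked objects, drop the rest, reset the marks."""
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False
    return survivors


def collect(heap, roots):
    mark(roots)
    return sweep(heap)


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)
c.refs.append(d)
d.refs.append(c)   # c and d form a cycle that nothing else references
heap = [a, b, c, d]
live = collect(heap, roots=[a])
```

Note that the unreachable cycle between `c` and `d` is collected, which pure reference counting cannot do on its own.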
        • Re: (Score:3, Interesting)

          by Waffle Iron (339739)

          Only because Python uses a refcounting garbage collector.

          Refcounting itself isn't necessarily the problem; it's using a rudimentary implementation that bogs down. I read a paper a while back where they successfully experimented with a high-tech refcounting gc algorithm specifically because it was amenable to parallel operation on multiple CPUs.

          By using a variety of tricks, they were able to avoid actually having to update refcounts for the vast majority of writes (most notably all stack references), and the mutex acquisition was limited to a couple per thread per

          • Re: (Score:3, Insightful)

            by ultranova (717540)

            I thought it was pretty interesting because reference counting can have more cache-friendly behavior than copying gc or mark-sweep approaches.

            That's Java's biggest problem, IMHO: once the data spills into swap, it'll take forever to run garbage collection.

    • The summary misses one of the best bits -- the project will try to get rid of the Global Interpreter Lock that interferes so much with multithreading.

      Thanks for that. I was about to say, that the main issue for me is the GIL, not interpreter performance. Improvement of both is good, of course, but the GIL can be a show-stopper much more easily.

  • by LingNoi (1066278) on Friday March 27, 2009 @03:36PM (#27363041)

    They say five times faster however it really depends on if they're talking about a European or African Python Interpreter.

    • by ArsonSmith (13997) on Friday March 27, 2009 @03:38PM (#27363079) Journal

      Java spokesperson: "5x faster? We already do that."

      Java spokesperson to other java people: "(whisper)Hehe, I told them we already do that. Hehe."

      • by rackserverdeals (1503561) on Friday March 27, 2009 @03:56PM (#27363345) Homepage Journal

        I know you're trying to be funny but... If you're talking plain Java vs Python [debian.org], Java looks to be quite a bit faster. You don't have to look hard to find benchmarks that show java is faster [timestretch.com].

        Jython [jython.org] seems to be about 2-3 times faster than CPython [warwick.ac.uk] according to those tests.

        This could give CPython the performance edge over Jython, but it still has a way to go to catch up to Java.

        • IronPython too. Not quite as fast as Jython last I eval'd it, but at the time it had plenty of room to improve.

          The only place I currently use Python is embedding the IronPython system in a Mono app, though, so I'll take what I can get.

        • This could give CPython the performance edge over Jython, but it still has a way to go to catch up to Java.

          Except that jdk 1.7 is getting all sorts of improvements that will help with Jython speed... like a dynamic method call opcode and stack-allocated objects.

          So it's doubtful that llvm python will be faster than Jython, or at least not for long.

        • Re: (Score:2, Interesting)

          by kpainter (901021)

          If you're talking plain Java vs Python [debian.org], Java looks to be quite a bit faster

          The first link above refers to Java used with "Hotspot" and it is really fast. If you select the Java Xint, they are a lot closer although Java is still faster. But that "Hotspot" option looks to me to provide about a 10x speed improvement over plain interpreted Java. http://shootout.alioth.debian.org/u32q/benchmark.php?test=all&lang=javaxint&lang2=java&box=1 [debian.org] If Python were to do something similar, I would expect a significant improvement in its performance too.

          • The Xint option is used in very rare cases if you encounter a bug with the compiler. I have never run into one case where I needed it.

            I think some people are working on JIT compilers for Python and other interpreted languages but I'm not sure of the status.

        • by wisty (1335733)

          How is Java faster? If it's a trivial program, then it just doesn't matter. Actually, if it's a trivial program, for your own use, a Pythoneer will write the script and run the interpreter (no compile!) before you can fire up Eclipse and type "private static void".

          If we are talking about a non trivial program, then algorithms, data structures, caching, micro-optimization (like re-writing bits in C) and profiling can improve things by many many orders of magnitude. Too bad if the code has so many layers and ad

          • by rackserverdeals (1503561) on Friday March 27, 2009 @06:56PM (#27365767) Homepage Journal

            How is Java faster? If it's a trivial program, then it just doesn't matter. Actually, if it's a trivial program, for your own use, a Pythoneer will write the script and run the interpreter (no compile!) before you can fire up Eclipse and type "private static void".

            You know you can write trivial java programs without using an IDE such as Eclipse. I started out in the late 90's writing Servlets in vi and notepad. The time it takes to compile is meaningless. You only need to do it once. You don't have to recompile every time you run the application.

            If we are talking about a non trivial program, then algorithms, data structures, caching, micro-optimization (like re-writing bits in C) and profiling can improve things by many many orders of magnitude. Too bad if the code has so many layers and adapters that any real change will be prohibitively expensive.

            Or they could use any of the many java libraries available so they don't have to write those parts of the code. Since they've been around for years, they've already been optimized.

            The productivity gains of writing fewer lines of code seem stupid to me. Programmers aren't secretaries. I can type maybe 90wpm but a few lines of code might take an hour to get right. It doesn't matter what the language is.

            • by bnenning (58349)

              The productivity gains of writing fewer lines of code seem stupid to me.

              Correct. The win is *maintaining* fewer lines of code.

              • Correct. The win is *maintaining* fewer lines of code.

                Still I consider that a bogus argument. If your organization has 50 Java developers, the effort needed to train them to be Python developers is not trivial. Then you can't just rewrite everything because you still have all that Java code to maintain.

                It's not like Python is significantly fewer lines of code than Java or anything. Especially now with annotations. Maybe 2x as many LOC [codemonkeyism.com] for a significant increase in performance and using your existing developer pool.

                The example in the link is simple but there

                • by bnenning (58349) on Friday March 27, 2009 @09:18PM (#27367009)

                  If your organization has 50 Java developers, the effort needed to train them to be Python developers is not trivial. Then you can't just rewrite everything because you still have all that Java code to maintain.

                  Yes, you shouldn't rewrite working Java code in Python just for kicks, or vice versa. I'm not sure how that's relevant.

                  It's not like Python is significantly fewer lines of code than Java or anything. Especially now with annotations. Maybe 2x as many LOC

                  I'll agree that 2x is in the ballpark, and I find that to be quite significant, considering that studies have found that developers tend to produce lines of (debugged, working) code at the same rate regardless of language. Doubling developer productivity will very often be worth sacrificing performance, especially when the software isn't CPU-bound. Why do you think Java took over from C?

                  Plus, I don't think fewer LOC means greater maintainability.

                  All I can say is that I've been developing in Java for 12 years and Python for 2, and that's been my experience.

                  Let me give an example using a pizza recipe instead of a programming language.

                  I don't agree with that, because the short version leaves out critical information so of course it's not as useful. What I like about Python is that it largely lets me deal with *only* the stuff that matters to my application. In my questionable metaphor Python would be "Bake at 400 degrees for 15 minutes", and Java would be "Turn the temperature dial to 400, open the oven door, insert the pan in the oven, close the oven door, wait 15 minutes, open the oven door...". Ok not quite that bad, but the essential details are often obscured by unimportant boilerplate. And yes, you can get tools that automatically create and hide some of it, but that should just make you question why the language can't do that itself.

                  The main problem I see though. In 5 years, a lot of those Python developers are probably going to be working in a different language altogether.

                  A fine argument for COBOL :)

                • The difference is more like between:


                  Prepare the bread.
                  Put the sauce on the bread.
                  Put the cheese on the sauce on the bread.
                  Bake.

                  And:

                  define PizzaDoughFactory : AbstractDoughFactory{
                          sub PizzaDoughFactory( PizzaDoughFactory cls, Integer thickness ){
                                  cls.AbstractDoughFactory( thickness )
                          }

                          sub Sauce ( PizzaDoughFactory cls, Topping top){
                                  cls.toppings = org.coolpace.JavaSmart.List( -1 )
                                  cls.toppings.appendToTop( top )
                          }
                  }

                  define PizzaCreator : AbstractApplication {
                          def main( Integer argc, String *argv ){
                                  new pizza = PizzaFactory()
                                  pizza.set_dough = PizzaDoughFactory()
                                  sauce = SauceFactory()
                                  cheese = CheeseFactory()
                                  pizza.dough.Sauce( sauce )
                                  pizza.dough.Sauce( cheese ) // historically all toppings are called sauces as well
                                  new ready_pizza = PizzaBakery( pizza )
                          }
                  }

      • by meringuoid (568297) on Friday March 27, 2009 @04:27PM (#27363749)
        Joking aside, though, I find this target to be overambitious. Speeding up by a factor of three would be plausible; two would be OK, but I'd hope they'd keep working on it to get it up to three. Four strikes me as unlikely, and five is right out.
        • by ArsonSmith (13997)

          I mean if I went around claiming to be faster just because I was hard coded in C they'd put me away. We have to take it in terms that a JIT engine can optimize code in real time much better than a precompiled binary. Now you see the slowdown inherent in the system.

        • I think the speed improvements will be reached in stages, roughly equal to:

          • x1
          • x2
          • x5
  • by Max Romantschuk (132276) <max@romantschuk.fi> on Friday March 27, 2009 @03:36PM (#27363043) Homepage

    I read about what they intend to do, and they seem to have quite a few interesting ideas... But there are also major drawbacks:

    - No Windows support (apparently a Linux-only VM in the plans)
    - No Python 3.0 support

    And thus no guarantees most of the work will merge back into CPython.

    But competition is good, I can't really see a problem with having an alternative faster Python runtime, even if it's not as compatible as CPython. :)

    • Re: (Score:3, Informative)

      by ianare (1132971)

      - No Python 3.0 support

      They are using v2.6, which has been designated as the official migration step towards 3.0, so it should be easy to port over to 3.0. Anyway, right now very few projects are using 3.0.

      • by maxume (22995) on Friday March 27, 2009 @04:00PM (#27363423)

        It might be easy to port over to 3.0, but not because it is using 2.6. Basically, they are planning on ripping out a big chunk of the internals of 2.6 and replacing it with a LLVM based system. To the extent that those internals changed for 3.0 (there wasn't necessarily effort put into making them compatible across 2.6 and 3.0...), the code would need to be updated for 3.0. The python level portability between 2.6 and 3.0 isn't a huge factor for something like this.

        They are targeting 2.6 because that is what made sense for Google (who is paying for the work). Or so they say:

        http://code.google.com/p/unladen-swallow/wiki/FAQ [google.com]

    • Re: (Score:3, Interesting)

      I'm not quite sure what benefits this gives that Psyco doesn't already.

      • by MightyYar (622222) on Friday March 27, 2009 @03:50PM (#27363273)

        Psyco is x86 only and uses a lot of memory. It also requires additional coding... you have to actively use it, so you don't automatically get the speedup that a faster interpreter gets you. You also have to pick-and-choose what you want to get compiled with Psyco - the extra overhead isn't always worth it.

        To be fair, I don't know what the memory requirements of this new project are.

        • Psyco may be x86-only, but this is Linux-only. That kills a lot of the appeal this might have in much the same way.

          • by MightyYar (622222) on Friday March 27, 2009 @08:57PM (#27366863)

            I think it's only Linux-only right now, because the developers currently use Linux. But they consider loss of Windows support a "risk", not a design goal:

            Windows support: CPython currently has good Windows support, and we'll have to maintain that in order for our patches to be merged into mainline. Since none of the Unladen Swallow engineers have any/much Windows experience or even Windows machines, keeping Windows support at an acceptable level may slow down our forward progress or force us to disable some performance-beneficial code on Windows. Community contributions may be able to help with this.

        • by schmiddy (599730)

          Psyco is x86 only and uses a lot of memory

          Even worse, Psyco is 32-bit only [sourceforge.net] : Psyco does not support the 64-bit x86 architecture, unless you have a Python compiled in 32-bit compatibility mode. There are no plans to port Psyco to 64-bit architectures. This

          However, as far as "requires additional coding", I think you're a little off-base... unless you consider "import psyco" to be a lot of work.
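
For context on how little code that is, the usual opt-in looked roughly like the sketch below. It assumes Psyco's documented `psyco.full()` entry point; on any interpreter where Psyco is not installed (every modern Python, and any 64-bit build per the note above) the import fails and the program simply runs uncompiled. The `hot_loop` function and the `HAVE_PSYCO` flag are made up for the example.

```python
try:
    import psyco      # only ever available for Python 2 on 32-bit x86
    psyco.full()      # ask Psyco to compile whatever it can
    HAVE_PSYCO = True
except ImportError:
    HAVE_PSYCO = False  # fall back to the plain interpreter


def hot_loop(n):
    """A CPU-bound loop of the kind Psyco was meant to speed up."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(HAVE_PSYCO, hot_loop(10))
```

The earlier point about picking and choosing still stands: compiling everything this way trades memory for speed, so larger programs tended to opt in per function instead.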

      • by bnenning (58349)

        Psyco only works for 32-bit x86, and many Python features are unsupported [sourceforge.net].

      • by Tumbleweed (3706) * on Friday March 27, 2009 @04:11PM (#27363567)

        I'm not quite sure what benefits this gives that Psyco doesn't already.

        It doesn't get as stabby.

      • You can still have Janet Leigh over for a shower.

    • Or BSD, or several other important platforms.

      • Re:No windows (Score:4, Informative)

        by Anonymous Coward on Friday March 27, 2009 @04:03PM (#27363461)

        Quite to the contrary, the FreeBSD guys have been building with clang [llvm.org]+llvm [llvm.org] for a while now, and they seem to like it [freebsd.org]. The kernel boots, init inits, filesystems mount, the shell runs.

        What other platforms, Darwin? Apple employs the largest number of LLVM developers. Windows? Both MinGW and Visual Studio based builds are tested for each release.

        It's still not as portable as the python interpreter, but that will come if and when developers who are interested in working on it start to contribute.

    • Re: (Score:3, Informative)

      by orclevegam (940336)

      - No Windows support (apparently a Linux-only VM in the plans)

      The article says it's going to be based on LLVM which most definitely is cross-platform (and being touted as the logical successor to GCC). Unless they go out of their way to use some Linux only calls while implementing their Python VM on top of LLVM it should be trivially easy to get it running in Windows.

    • by negative3 (836451)

      From what I've seen, Python 3.0 is not supported by a good number of Python packages whereas Python 2.6 is which would make the "no Python 3.0 support" a minor issue for me. Python 3.0 is also not shipping as the default interpreter for Fedora, Ubuntu, or openSuSE yet so it won't really affect basic users for a while. I have also seen benchmarks (but I don't have references, so I welcome contradictions and corrections) that show that 3.0 is considerably slower than 2.6 so if the speed of Python is an iss

    • by samkass (174571)

      Now that JDK7 is adding invokedynamic, it would be interesting to see this target the JVM instead of LLVM. The JVM is ported everywhere and is extremely fast. I smell some upcoming bake-offs...

      • by ishobo (160209)

        Except the JDK uses GPL while LLVM uses a modified BSD license (hence why a few projects are hoping to replace gcc with clang). Lack of reciprocity is the key if this is intended to be imported into CPython.

  • FTFA:

    Adopting LLVM could also potentially open the door for more seamlessly integrating other languages with Python code, because the underlying LLVM intermediate representation is largely language-neutral.

    So much for Parrot.

    • by Abcd1234 (188840) on Friday March 27, 2009 @04:03PM (#27363471) Homepage

      Not really. Parrot is a much higher-level VM, providing things like closures, multiple dispatch, garbage collection, infrastructure to support multiple object models, and so forth, whereas LLVM really models a basic RISC instruction set with an infinite number of write-once (single-assignment) registers.

      In fact, it would make a fair bit of sense to actually use LLVM as the JIT-compiling backend for Parrot...

  • IronPython speed (Score:3, Informative)

    by icepick72 (834363) on Friday March 27, 2009 @04:03PM (#27363459)
    Word has it [slashdot.org] that Microsoft created a speedy IronPython implementation on their Common Language Runtime and JIT technology for .NET. Here are benchmarks for it [codeplex.com]. Failing to find similar benchmarks for comparison; can anybody else contribute to this info?...
  • by Theovon (109752) on Friday March 27, 2009 @04:04PM (#27363475)

    It sounds like they're going to take Python, which already gets translated to some kind of p-code (right?), and either translate the original Python or the p-code into LLVM code, which is then JIT-compiled to the native architecture.

    The translation from Python to LLVM is going to lose some specificity and require that extra code be added to implement whatever needs to be done in Python that isn't trivially implemented by LLVM. Then the LLVM code needs to be compiled to native, introducing yet more "glue" code in the process.

    Wouldn't a more direct compile yield a better result?

    And don't give me any junk about compiling dynamic languages. LISP and Self are highly dynamic languages, yet they're compiled. If they can be compiled, then so can Python. I mean, the fact that it can be done through multiple levels of translation proves that it can be done, although possibly inefficiently. I just think that a more direct approach would reduce some of the superfluous glue code and a variety of other inefficiencies in translation that result from a loss of knowledge about what the original program was actually trying to implement.
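
On the "some kind of p-code" question: CPython does compile source to bytecode, and the standard `dis` module shows it. A minimal sketch (the exact opcode names differ between Python versions, so no listing is reproduced here):

```python
import dis


def add(a, b):
    return a + b


# Collect the opcode names of the compiled function body.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# dis.dis(add) prints the same information as a formatted listing.
```

This bytecode is what the existing interpreter loop executes, and it is the natural input for a translation step into LLVM IR.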

    • by Abcd1234 (188840) on Friday March 27, 2009 @04:21PM (#27363689) Homepage

      Wouldn't a more direct compile yield a better result?

      No, it wouldn't.

      The entire point of LLVM is that it provides an easy-to-target machine (it's basically a RISC instruction set) that you can use as your intermediate representation (the p-code you described). You then use the LLVM backends to compile the IR down to machine code. And because of the way the IR is structured (for example, it has write-once, single-assignment registers, which makes certain classes of optimizations much easier), you can do a really good job of optimizing.

      Basically, you "direct compile" to the LLVM IR, and then let LLVM take care of the details of generating the machine code. This gives you better abstraction (no more machine-specific code generation in Python itself), portability (to whatever LLVM targets), and you get all the sophisticated optimization that LLVM provides for free. That's a huge potential win.
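
LLVM IR is in static single assignment (SSA) form, where each virtual register is assigned exactly once. The toy renamer below sketches that idea for straight-line code only; the tuple format and the `to_ssa` name are invented for the example, and branches (which need phi nodes) are deliberately left out.

```python
def to_ssa(statements):
    """Rename straight-line assignments so each name is written exactly once.

    `statements` is a list of (target, op, left, right) tuples. Operands are
    variable names (str) or integer literals.
    """
    version = {}  # variable name -> latest version number
    out = []

    def use(operand):
        # Reads refer to the most recent version of a variable.
        if isinstance(operand, str) and operand in version:
            return f"{operand}{version[operand]}"
        return operand

    for target, op, left, right in statements:
        left, right = use(left), use(right)  # resolve reads before the write
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", op, left, right))
    return out


program = [
    ("x", "add", 1, 2),
    ("x", "mul", "x", 3),
    ("y", "add", "x", "x"),
]
ssa = to_ssa(program)
```

Because every name has a single definition, an optimizer can find where a value came from without tracking which of several writes is live, which is part of why this form makes many optimizations easier.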

    • by MtHuurne (602934) on Friday March 27, 2009 @08:59PM (#27366873) Homepage

      The Python object files are just a more convenient way to store the program compared to text files. No information is lost or glue is added in that first step.
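
As a rough illustration of that first step, the standard library can compile source text to a code object and serialise it with `marshal`, which is the payload format used inside `.pyc` files (after a small header). The source string and names below are made up for the example.

```python
import marshal

source = "def square(x):\n    return x * x\n"

code = compile(source, "<example>", "exec")  # source text -> code object
blob = marshal.dumps(code)                   # serialised bytes, as stored in a .pyc
restored = marshal.loads(blob)               # back to an equivalent code object

namespace = {}
exec(restored, namespace)                    # run the restored module body
print(namespace["square"](7))
```

The round trip changes nothing about how the program behaves, which matches the point that this step is just a storage convenience.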

      LLVM is, like its name suggests, really low level. You should think of it as a kind of portable assembly. It's much closer to actual hardware architectures than for example Java byte code. I don't expect much overhead from the LLVM to native step. A while ago I ran some tests with C++ compiled by GCC directly to native and compiled by GCC to LLVM byte code and then by LLVM to native; sometimes one approach was faster and sometimes the other, but they were pretty close.

      So that leaves the glue added in the Python object to LLVM step. I expect this to have a significant overhead, but I don't see it becoming a smaller overhead by going directly to native. The advantage of using LLVM is that you only have to write this step once, instead of once for each architecture.

      With LLVM it is possible to compile parts of the interpreter to LLVM byte code in advance and then inline that into the program being JIT-compiled. That way, you can be sure that the JIT and the interpreter actually do the same thing. Apple did this for their OpenGL driver, there is a nice presentation (PDF) [llvm.org] about it.

  • Binspam (Score:5, Funny)

    by Thelasko (1196535) on Friday March 27, 2009 @04:09PM (#27363553) Journal
    I get emails claiming to increase my python's performance all of the time, I just delete them.
    • by oldhack (1037484)

      I get emails claiming to increase my python's performance all of the time, I just delete them.

      Then why are your pants smoking?

  • Any Hope? (Score:2, Funny)

    by Anonymous Coward

    Is there any hope that we will move away from these boutique programming languages and back to "real languages" that seriously consider size and performance?

    I for one am completely sick and tired of 3GHz multicore processor machines with gigabytes of RAM running like a 486. Languages like Python don't help in the bloat arena and the scripting languages made out of frameworks on top of other scripting languages are just ludicrous!

  • I do my best here not to offend, but I can see clearly now why I don't use Python.

    I keep getting pressured by others to adopt it rather than my C or C++ but if they are touting a possible 5x increase, that means it was really, really slow to begin with. And how much further is there to go? I suspect it is not even worth benchmarking it yet.

    Since all I mostly do is big matrix and vector work why would I use python? And no, scipy doesn't count as I can get MPI going pretty quickly.

    Yes, I realise the right

    • Re: (Score:2, Insightful)

      by zindorsky (710179)

      Yes, I realise the right tool for the job argument.

      Exactly. Most applications are not CPU bound. If yours is, then I don't know why others are trying to get you to use Python.

    • It all depends (Score:5, Insightful)

      by mkcmkc (197982) on Friday March 27, 2009 @04:38PM (#27363905)

      I find Python is about 20x slower (and about 10x faster to implement) than C, with the number varying quite a bit depending on how CPU-bound the code is. Given the speed of modern processors, this is plenty fast for many tasks.

      Beyond that, many Python programmers employ a strategy of writing just the CPU-intensive inner loops in C or C++. This gives you most of the speed of an all-compiled solution but with much of the easier programming (and shorter programs) of the all-Python approach.

      My particular scientific application runs on 1500 cores, is about 75% Python/25% C++, is 4-5x smaller than similar all-C/C++ programs, and runs at about 95-99.99% of the speed of an all C++ solution.

      (Somewhat ironically, some of the worst performance bottlenecks in this app had to do with the overhead of some of the STL containers, which I ended up having to replace with C-style arrays, etc. to get best performance.)

      Not all apps will fall out this way, but you definitely can't assume that just because something's written in Python that it will be slow.

      (Going beyond that, we all know that better algorithms usually trump all of this anyway. If writing in Python gives you the time and clarity to be able to use an O(n)-better algorithm, that may pay off in itself.)
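
      As a hedged illustration of an algorithm that is a factor of n better (the function names are hypothetical, and the items are assumed to be hashable), here is duplicate detection done by comparing every pair and then done with a set:

```python
def has_duplicate_quadratic(items):
    """Compare every pair of elements: O(n^2) comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_linear(items):
    """Remember seen elements in a set: O(n) expected time, O(n) extra space."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


sample = [3, 1, 4, 1, 5]
assert has_duplicate_quadratic(sample) == has_duplicate_linear(sample)
```

      For large inputs the second version wins in any language, which is the point: the choice of algorithm can matter more than the choice of implementation language.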

      • Re: (Score:2, Flamebait)

        by master_p (608214)

        (Somewhat ironically, some of the worst performance bottlenecks in this app had to do with the overhead of some of the STL containers, which I ended up having to replace with C-style arrays, etc. to get best performance.)

        I smell bullshit. There is no overhead from using STL containers.

        If you used an std::list or an std::map for random access, then you certainly had a bottleneck, because those containers are not for random access.

        If you used an std::vector, you couldn't have a bottleneck, for the simple rea

        • by tkinnun0 (756022)

          If you used an std::vector, you couldn't have a bottleneck, for the simple reason that the std::vector is an array.

          If you say so, but try to tell that to the compiler. Or rather, try to let the compiler figure that out.

        • Re:It all depends (Score:4, Informative)

          by mkcmkc (197982) on Friday March 27, 2009 @06:55PM (#27365733)

          I smell bullshit. There is no overhead from using STL containers.

          If you used an std::vector, you couldn't have a bottleneck, for the simple reason that the std::vector is an array.

          That was my impression, too, but careful timing and profiling suggested otherwise.

          In addition, we can by simple reasoning determine that there's gotta be some overhead involved with vector implementations. First, vectors know their size; in particular, they know it in constant time. This means that they essentially must include a size field and update it whenever size changes. Also, I can have a pointer to a vector, and that vector can grow arbitrarily without invalidating the pointer. That means that there pretty much has to be an indirect pointer to the vector's storage. It also means that the vector's storage must more or less be coming from a heap, which definitely slows things down. ("more or less" because one can imagine certain optimizations that might be possible if you somehow knew an upper bound on the vector's lifetime size)

          All of this stuff costs you in time and space.

          Suppose I have a function I'm going to call a million times and it needs a temporary array of ints, of a size I can bound (maybe even small enough to be cache-beneficial). I can allocate that array in the parent function and pass in a pointer each time. Overhead to create and destroy the array in the inner function each time: zero. If you do this with a vector, the implementation has to zero the length, which costs time. Or you can delete and recreate it by letting it go out of scope, but that also costs time.

          Most of the time these minor effects don't matter, but if it's in the innermost loop and is going to run billions of times, it can be quite noticeable.

          It could conceivably be that gcc's implementation of STL is a little slow. Doesn't matter why, though, because that's my target, and that's where my program has to run.

          It's been a while since I went through this exercise, so I don't have the exact scenario. But the code is GPL'ed and available here [greylag.org]. If you can replace any of the arrays with an as-simple, as-fast use of vectors, I'd be happy to have it.

        • by kramulous (977841) *

          No bullshit ... at least in my case. std::vector does a lot of array bounds checking and various other things that involve 'if' statements. You don't want them inside large loops. So I write my own vector classes - I make assumptions.

          Now, I used to have some kickarse vector (and matrix) templates but have had to ditch them with the release of the Harpertown processor because templates don't vectorise. This was ok for the Clovertown but the 256 bit wide register in the Harpertown (as opposed to the 128bi

          • Kramulous: Where's a good place to learn about this stuff?

            • by kramulous (977841) *

              That's what sucks .. I haven't seen a definitive place yet, and so far it has been a matter of stumbling around the IEEE website, the Intel site (whitepapers) and just generally playing with the Intel compiler.

              Give me a little more time (a day or two) while I chase up the urls of some of the pdfs and codes I have :)

              I've been meaning to put this stuff together for a while now with code fragments to back it up.

        • I smell bullshit. There is no overhead from using STL containers.

          Ehm... that's a nice theory, but I second GP's experience in finding otherwise in practice. I don't know the specific reasons -- maybe there were memory fragmentation issues and it wasn't really STL's "fault" -- but I was doing some large-for-my-laptop (with 3GB ram) data processing, and initially used vectors for everything. I eventually had to give up and rewrite it all with arrays just like GP, because I kept having difficult-to-debug and impossible-to-fix memory issues as a result.

          Really, why would so

    • by portscan (140282)

      If all you are doing is linear algebra, why would you use C++ instead of Fortran, which is still top dog in that area?

      • by kramulous (977841) *

        It's been on the agenda for a while. I just haven't learnt all the little tweaks yet. But you are right. It amazes me how often another white paper comes out with further compiler optimisations for fortran. 40 years of optimisations (at least!) have to make it the superior language.

    • by daver00 (1336845) on Saturday March 28, 2009 @03:38AM (#27368655)

      The thing about Python is that you trade every hour lost in runtime for a day gained in development time. That is the point of Python. NumPy (the array library that SciPy is built on) is mostly written in C anyway and provides fast n-dimensional array objects for vector and matrix operations, so there are really only a few bottlenecks for maths/science purposes. Generally, anything that is going to take a seriously long time you would be doing in C anyway. What Python offers is a viable alternative to Matlab etc., and a damn sight less expensive!

      Where I study Engineering they teach Python for this very reason. It has a gentle syntax which appeals to engineers and scientists who often aren't bargaining to become coders, and it is so much cheaper than Matlab that any missing features are rendered a moot point.

      Seriously, sitting on the sidelines and saying "I'm not gonna use Python because it is slow" is silly. It is so damn easy to code in Python that you would learn it in a weekend if you already have coding experience. And as I said before, any time lost running Python scripts instead of other languages is made up ten times over at least by the ridiculously short development times that go with Python scripts. Yes, it really is THAT easy to do anything in Python, and there is a reason people bug you to try it out. Just give it a weekend, Python deserves it!

  • So whatever happened to 'Stackless' Python? Is that ever going to be merged into CPython? And would it work with this?

  • by Animats (122034) on Friday March 27, 2009 @06:24PM (#27365365) Homepage

    This is disappointing. Shed Skin [google.com] has shown speed improvements of 2 to 220x over CPython. Going for 5x over CPython is lame. But Shed Skin is a tiny effort, and needs help.

    PyPy got a lot of press, but they tried to do an optimizing compiler with "agile programming" and "sprints", and, at six years on with substantial funding, it's still not done.

    The fundamental problem with running Python fast is its gratuitous dynamism. In CPython, almost everything is late-bound, and most of the time goes into name lookups. This makes it easy to treat everything as dynamic. You can store into the local variables of a function from outside the function, for example. In order to make Python go fast, the compiler has to be able to detect the 99.99% of the time when that isn't happening and generate pre-bound code accordingly.
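
    A small illustration of that late binding, using only the standard interpreter; the names Greeter and call_greet are made up for the example. The method name is resolved on every call, so rebinding it on the class changes the behaviour of code that was already defined:

```python
class Greeter:
    def greet(self):
        return "hello"


def call_greet(obj):
    # The attribute name "greet" is looked up at call time, on every call;
    # nothing here is bound to a particular function in advance.
    return obj.greet()


g = Greeter()
before = call_greet(g)

# Rebind the method on the class after call_greet and g already exist.
Greeter.greet = lambda self: "patched"
after = call_greet(g)

print(before, after)
```

    A compiler that wants to pre-bind the call inside call_greet has to prove, or guard against, a reassignment like the one above.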

    Dynamic typing requires similar handling. Most variables never change type. Recognizing int and float variables that will never contain anything else creates a significant speedup. In CPython, all numbers are "boxed", stored in an object structure. This is general but slow.
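
    A rough sketch of both points, assuming CPython (sys.getsizeof is implementation-specific, so exact sizes are not shown). A name can be rebound to a value of a different type, and a boxed float object is larger than the raw C double stored by the standard array module:

```python
import sys
from array import array

# The same name can refer to an int and later to a str.
x = 42
kind_before = type(x).__name__
x = "forty-two"
kind_after = type(x).__name__

# A Python float is a full object (header plus payload), i.e. "boxed".
boxed_float_bytes = sys.getsizeof(1.0)

# array("d") stores raw C doubles back to back, without per-element objects.
packed = array("d", [1.0, 2.0, 3.0])
raw_float_bytes = packed.itemsize

print(kind_before, kind_after)
print("boxed float:", boxed_float_bytes, "bytes; raw double:", raw_float_bytes, "bytes")
```

    A compiler that can show a variable only ever holds a float can use the raw representation instead of the boxed one.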

    CPython is nice and simple, but slow. Serious speedup requires global analysis of the program to detect the hard cases and generate fast code for the easy ones. Shed Skin actually does this, but has to place some limitations on the language to do it. If someone did everything right, Python could probably achieve the speed of C++.

    There's also the problem that if you want to be compatible with existing C modules for CPython, you're stuck with CPython's overly general internal representation.

    • Re: (Score:3, Informative)

      # Maintain source-level compatibility with CPython applications.
      # Maintain source-level compatibility with CPython extension modules.

      vs.

      Shed Skin will only ever support a subset of all Python features.

  • by lkcl (517947) <lkcl@lkcl.net> on Saturday March 28, 2009 @11:06AM (#27370475) Homepage

    The experimental combination of the Python-to-Javascript compiler, http://pyjs.org/ [pyjs.org] and the Python Bindings to Google's V8 Engine, http://code.google.com/p/pyv8 [google.com] brings a ten times performance increase over standard python, already.

    not - "10% now and 5x in the future" - that's a 1000% increase NOW.

    When V8 supports the ECMAScript "Harmony" standard, which will include support for basic integer types, then there will be "correct" support in the PyJS + PyV8 combination for numerical types, and the word "experimental" can be dropped.

    http://pyjs.org/ [pyjs.org] also includes an experiment showing the bindings of the PyJS compiler with the Python-Spidermonkey project. The Spidermonkey JS engine has the advantage of running on generic platforms instead of just ARM and 32-bit x86 platforms, but has the disadvantage of being slightly slower.

    JavaScript is a _really_ interesting language, in many ways highly suitable as an intermediate target for compiling dynamic languages such as Ruby and Python.

"How to make a million dollars: First, get a million dollars." -- Steve Martin
