Time to Get Good At Functional Programming?
prone2tech writes "From an article at Dr. Dobb's: Chipmakers have essentially said that the job of enforcing Moore's Law is now a software problem. They will concentrate on putting more and more cores on a die, and it's up to the software industry to recraft software to take advantage of the parallel-processing capabilities of the new chips. As is argued in this article, this means becoming proficient in parallel functional programming. The bad news? Getting good at functional programming is hard, harder than moving from iterative Pascal or Basic or C coding to object-oriented development. It's an exaggeration but a useful one: 'When you move to FP, all your algorithms break.'"
Why would the Algorithm break? (Score:3, Interesting)
I don't see why an algorithm would break just because you're changing language type; the whole point of an algorithm is that it's programming-language independent.
it's always a good time to try functional (Score:5, Interesting)
It's been said in the comments on Slashdot many times: learning functional programming techniques will improve your programming skills. There are many good functional languages out there, and many have imperative features for ease of transition. No, functional programming will not solve all of your problems, but it will give you that most valuable of all lessons: how to think about a problem _differently_.
You don't need an excuse, start today.
Question (Score:3, Interesting)
So a quick question before I go and master FP...does the compiler automatically compile the code that can be done in parallel in the proper "way", or do I have to specify something?
Also, if I rewrote an app written in an imperative language in an FP one like Haskell, would I see that much of a difference on a multi-core processor?
Re:it's always a good time to try functional (Score:3, Interesting)
What language would you recommend for an OO programmer to start with?
Re:it's always a good time to try functional (Score:5, Interesting)
You don't need an excuse, start today.
The excuse is: it's fun. But if you do start, choose the right language for the job. Python, for example, seems good for FP, but was not designed for the task. Don't choose a language that simply supports functional programming. Choose a language that was designed specifically for functional programming. You'll be happier in the long run when you don't run into limitations of the language you chose.
My 2c.
Re:Convince your boss. (Score:5, Interesting)
Another question you might ask yourself is: are you going to let the CPU designers push you into a programming paradigm you are not effective in? Personally, I can see a machine being quite useful with, say, 16 or 32 cores, just because these machines do more than one thing at a time. But I'd much rather see them speed the cores up than endlessly multiply the number of them. There is a *lot* of room left to do this. 3D architectures offer more connectivity than is currently being used. The number and type of one-cycle instructions within a CPU can be increased until the summary is "all of 'em" (which I doubt they're ever going to get to). Orthogonality can be increased until, again, the answer is that all instructions are available to the same degree for all registers and addressing modes, no matter what. Compilers like broad orthogonality (and so do assembly programmers, not that there are a lot of us left.)
If CPU designers run off towards the horizon making many-core designs, what if no significant number of people follow them there? Which... frankly... pretty much seems to be the case. I've an 8-core machine, and just about the only things that actually use those cores together are the easy ones: graphics and waveform encoding/decoding. Aperture sure enough uses all my cores in a well-balanced fashion and can build a JPEG in a snap; but that's a far cry from my web browser doing the same thing while trying to render a page.
I'm just saying that the direction the CPU folks have chosen lately doesn't have to be the direction we actually end up going, and there are points in favor of this as the best choice.
Just some thoughts.
Re:Question (Score:3, Interesting)
The way it should work is that if you're implementing a parallel algorithm like merge sort http://en.wikipedia.org/wiki/Merge_sort [wikipedia.org], it should just go and get a new processor core each time you make a recursive call.
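Roughly what that looks like in code: a purely functional merge sort sketch (Python here just for illustration; the language is incidental, the point is that the two recursive calls share no state):

```python
def merge(left, right):
    # Merge two sorted lists into one sorted list, without mutating either.
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort(xs):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    # The two recursive calls touch disjoint halves and share no state,
    # so in principle each could run on its own core.  In practice,
    # spawning a task per call costs more than it saves, so a real
    # implementation would only fork above some size threshold.
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))
```

That last caveat matters: "a new core per recursive call" is the idealized picture, and the overhead of task creation is exactly where naive versions of this fall down.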
Re:Broken Algorithm BS (Score:1, Interesting)
Turing complete doesn't mean the running times are the same. Your algorithms don't break in the sense that they no longer produce the correct output. They break in the sense that their running time is multiplied by some polynomial function of the input size; i.e. they are broken for practical purposes.
Sounds like BYTE magazine in 1985 (Score:5, Interesting)
Look at the table of contents of this BYTE magazine from 1985 [devili.iki.fi]. In a nutshell it said the same thing as this article: Functional languages are the great hope for solving the parallel programming problem. Only then the languages were different: Hope, Linda, and Prolog were among them.
My response back then was to get excited about FP. My response now is: Where is the proof? Can anyone name a single instance where a functional paradigm has yielded the best measured performance on a parallel computing problem? In other words, take the best functional programmers in the world, and pair them up with the best tools in existence. Can they actually create something superior, on any problem running on any hardware? This is a very low bar, but until it's demonstrated FP will be confined mostly to the lab.
IMHO the path forward is to treat parallel programming like just another optimization. As we know, the vast majority of your code doesn't need to run fast, and you get most of the performance benefit by optimizing small bits of code that really matter. I suspect the same thing will happen with parallel programming: In a given application only a few areas will benefit much from parallelism, and these tasks will probably be very similar across applications. Graphics rendering, large matrix math, video encoding/decoding, and speech recognition would be examples. People will treat these as special cases, and either develop special-purpose hardware (e.g., GPUs), or libraries that encapsulate the nitty-gritty details. The interesting question to me is what is the best runtime model to support this.
example (Score:5, Interesting)
As an example of the learning curve, I wanted to learn a little OCaml. I played around with this [inria.fr] insertion sort example. I used it to sort a list of 10,000 integers, and it took 10 seconds, versus <1 second in C with linked lists. Not too horrible. But changing it to 100,000 integers made it die with a stack overflow, so I'm guessing that its memory use goes like n^2. However, it's not at all obvious to me from looking at the code that this would be the case. I think if I wanted to do a lot of OCaml programming I'd have to develop "FP Eye for the Straight Guy." Probably if you wanted to make it perform better on big arrays you'd want to make it tail-recursive, but it's not totally obvious to me from the code that it's *not* tail-recursive; although the recursive call isn't the very last line of code in the function, it is the very last thing in its clause...?
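For what it's worth: if it's the usual list-based OCaml insertion sort, the recursive call sits under a cons (x :: insert elem l), so there is still work to do after the call returns; being the last expression in its clause isn't enough to make it a tail call. The overflow is most likely plain O(n) stack depth hitting the limit, not n^2 memory. The same trap is easy to reproduce in Python (no tail-call elimination there either), which makes for a quick demo; names here are mine, not from the OCaml example:

```python
def insert_rec(x, xs):
    # Naive structural recursion: one stack frame per element walked,
    # because the list rebuild happens *after* the recursive call returns.
    # Fine for small inputs; a stack overflow (RecursionError) for big ones.
    if not xs or x <= xs[0]:
        return [x] + xs
    return [xs[0]] + insert_rec(x, xs[1:])

def insert_iter(x, xs):
    # The accumulator-style fix: collect the prefix explicitly, then
    # splice.  Constant stack depth, same result.
    acc = []
    i = 0
    while i < len(xs) and xs[i] < x:
        acc.append(xs[i]); i += 1
    return acc + [x] + xs[i:]
```

Once you see the pattern, "rewrite with an accumulator" is the standard FP move to make a function tail-recursive, and it's the kind of thing the "FP eye" learns to spot at a glance.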
I know of at least one well known OSS project in Haskell, written by a very smart guy, that is really struggling with performance issues. I wonder whether bad performance is to FP as null-pointer bugs are to C. Sure, a sufficiently skilled programmer should theoretically never write code that will dereference a null pointer, but nevertheless my Ubuntu system needs a dozen security patches every month, many of which are due to null pointers, buffer overflows, etc.
Re:Why would the Algorithm break? (Score:2, Interesting)
The reason quicksort is quick is that in a language with modifiable random-access arrays, partitioning can easily be done in-place (without having to copy the array). You can't do that in functional programming.
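For comparison, here is the purely functional version, sketched in Python: the partition allocates fresh lists instead of swapping elements in place, which is exactly the cost being described. The consolation prize is that the two recursive calls become trivially independent.

```python
def qsort(xs):
    # Purely functional quicksort: no in-place partition.  Each level
    # allocates new lists for the two sides, so imperative quicksort's
    # O(1)-extra-space partition is gone; that's the price of immutability.
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    # These two calls share nothing and could run on separate cores.
    return qsort(smaller) + [pivot] + qsort(larger)
```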
Re:Broken Algorithm BS (Score:3, Interesting)
Moore's law dealt specifically with hardware. To say that "Moore's law is now a software problem" shows as much of a misunderstanding as attributing "any mention of Nazis causing someone to immediately lose the argument" to Godwin's Law[1].
It's reasonable to suggest that increasing the speed at which software runs will start requiring learning multi-threaded programming techniques, but to say that software will allow Moore's Law to continue is incorrect.
The idea that performance will increase at the same rate as doubling of transistors is attributed to a colleague of Moore. I have not heard of a moniker for this "law."
[1] Godwin's law states that as a USENET thread increases in length, the probability of a comparison to Hitler or a Nazi goes to 1. Godwin's law states nothing about the thread ending or about a winner or loser.
Re:Sounds like BYTE magazine in 1985 (Score:3, Interesting)
Functional programming is not just a buzzword that is supposed to be better at parallelism. When coding in a stateless fashion (which is what FP is all about), function reduction can be split transparently across many computers. There are no locks, no funny semaphores and mutexes, no deadlocks, no nothing. It just works, because of its uncoupled nature: "no side effects, ever".
There is one kind of early optimization that is not premature: architectural optimization. If you design your whole system to be synchronous, you trade off scalability for simplicity. If you design your whole system around the imperative paradigm, there will be a significant amount of work involved in making things work in parallel environments. No amount of later optimization will fix that kind of architectural issue.
Functional design is an architectural decision.
Re:Broken Algorithm BS (Score:4, Interesting)
Think of it as being "Moore's law is now also a software problem".
In the past, Moore's law meant that you could buy new hardware and have your stuff go way faster without any further work.
Now, Moore's law mainly means that you get more parallelism. Without software work, this means that your stuff runs at the same speed it used to. Thus software also needs to change in order to obtain the benefit.
Re:Convince your boss. (Score:5, Interesting)
Actually, the reason for this is heat. As the clock speed of a chip grows, its power draw and heat output grow much faster, and more cores are a much better way to get more speed with less power consumption and heat.
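The back-of-the-envelope version of that argument (standard CMOS reasoning, not specific to any vendor): dynamic power scales roughly as

    P ≈ C · V² · f

and since reaching a higher clock f generally requires a higher supply voltage V, power grows closer to f³ in the worst case. So doubling one core's clock can cost up to ~8x the power, while adding a second core at the same clock costs ~2x the power for ~2x the peak throughput. That asymmetry is the whole case for multi-core.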
Re:Broken Algorithm BS (Score:5, Interesting)
Well we've been using the same basic single-core architecture for the last, what, 30 or 40 years? Now programmers have a much bigger challenge in front of them - taking a program and making it work in a new environment.
I don't honestly believe there's been much in the way of innovation in the programming world for a while. Sure, you might have new coding languages that can do some things better than others or process it a different way, but don't they all operate on the same basic principle? Now programmers are faced with a complete paradigm change - the old style of programming isn't going to cut it 10 years from now when everything from your computer to your coffeemaker has a multi-core processor.
Engineers deal with stuff like this all the time. More than a few programmers use the term "software engineer". It's finally time for them to prove they can live up to that name and innovate.
Re:Convince your boss. (Score:2, Interesting)
Lots of tasks could be done in parallel.
* Voice recognition
* Image recognition
* File search
* Text search
* Find&Replace
* virus scanning
* video games
The better question is: How many CPU intensive applications run best with sequential execution?
I think the list is pretty small.
----
All this talk about functional programming is just hype. We have had multithreading for a LONG LONG time. Until now, multithreading a simple search would have been pointless: multiple threads on a single CPU? Absurd. As multiple cores become more standard I think we will re-evaluate many common programming practices and in fact see that doing them in parallel is very natural.
that hasn't materialized, though (Score:5, Interesting)
Auto-parallelization of functional programs has been proposed for decades now, and every attempt has fallen on its face as the overhead has killed any gains. Current parallel FP research isn't even putting that much effort into auto-parallelization, because most PL researchers consider it a dead end: taking a random functional program and evaluating all its thunks in parallel as futures, or some similar mechanism, is not going to solve the problem.
Instead, most of the current research is in programmer-level primitives for designing and specifying inherently parallel algorithms. There is some of this in both the FP and non-FP communities.
Re:Convince your boss. (Score:4, Interesting)
More CPU horsepower is the obvious choice for non-bus-limited operations, but things are starting to get expensive there, so I welcome a few extra cores. The real question is whether Intel and AMD will save some cash from their MMC (massive multi-core) projects and deliver a more sensible number of faster cores. You just can't depend on a given user's programs being set up to run efficiently in parallel.
Re:Broken Algorithm BS (Score:3, Interesting)
It's rather funny that you say that the same hardware architecture has been used for the last 30 to 40 years and then observe that there hasn't been any innovation in the software world "for a while".
Multicore isn't an innovation, it's a product of a lack of innovation in the hardware world.
Re:Formal Methods Initiative (Score:1, Interesting)
And functional programming will only make it *simpler*, not harder.
I hear this sentiment quite frequently, but only from grad students and professors that specialize in the area of compilers/programming languages. The rest of us, who suffer through these courses only to fulfill a requirement, are firmly convinced these peers are quite deluded.
Re:Convince your boss. (Score:5, Interesting)
personally, i think in terms of commodity computing, we don't really need to squeeze any more power out of the CPU than we've already got. use fully pipelined superscalar architectures, perhaps multithreaded dual or quad cores (for high-end workstations) and VLIW to maximize ILP efficiency. even at current processor speeds, 99% of the applications people (especially casual computer users) use have bottlenecks elsewhere (like memory & disk I/O speeds, internet bandwidth, user response time, etc.).
for the really resource-intensive stuff, like image/video/audio processing, cryptography, 3D graphics, CAD/engineering applications, scientific modeling, processing/manipulating financial data, etc. you would be much better off using a specialized dedicated vector coprocessor (as opposed to a general-purpose scalar processor that commodity CPUs tend to be). this way you can have a relatively low-power (and low clock rate) CPU for processing common SISD instructions that constitute 99% of all computing tasks, greatly cutting the cost of consumer and pro-sumer systems. and by using highly specialized coprocessors to handle the heavy lifting, those applications can be processed more efficiently while using less power (and at lower clock speeds) than trying to get a general-purpose scalar CPU to do the same work.
that is why GPGPUs are generating so much interest these days. it just so happens that most of the really processor-intensive applications consumers run greatly benefit from stream processing. game developers have long taken advantage of dedicated vector coprocessors with highly-specialized instruction sets and architecture made specifically for 3D gaming. DSPs with specialized architectures are also commonly used for hardware-accelerated video encoding, audio processing, etc. and now companies like Adobe are also seeing the advantages to using specialized vector coprocessors for their resource-intensive applications rather than having the CPU handle it.
and, honestly, how many different kinds of processor-intensive applications do most users run on a regular basis? if you're a graphic designer, your main processing power concern is only in relation to 2D/3D graphics. if you're an audio engineer or musician, then you're only going to use audio-related resource-intensive software. likewise, if you're a cryptographer/cryptanalyst, you probably won't ever run any audio editing software or 2D graphics software. therefore, it makes sense to pair moderately powered general-purpose scalar CPUs up with a powerful & highly specialized vector coprocessor like a GPU/DSP/Stream Processor/etc.
Re:Convince your boss. (Score:3, Interesting)
You're cherry-picking your data there (compilers, etc.). To see what's out there, we must look at what is commonly done and whether those things can benefit from parallel processing (and I see lots and lots of places, including the browser, where things could go parallel).
When we do that, we notice that what goes on in the gaming industry soon becomes standard everywhere else. And both modern platforms, the PS3 and Xbox 360 (I'm not including the Wii, as Nintendo has different goals than having bleeding-edge tech), have multi-core processors. Radically different architectures, but multi-core nonetheless. We are also seeing multi-core entering the desktop, and have been for a while.
This isn't a coincidence. Moore's law has effectively ended for individual processors. Anything that might go against that is nothing more than a *very* short-term dodge of the inevitable. Multi-core is the only thing that is going to see our computers get faster. This does make it a software problem. And a problem it is.
Why? Because for the longest time threading has been available, but basically no one has been teaching it or developing better techniques for it. So we have grossly under-experienced, under-trained people out there who are now having to deal with the reality of the situation. This is going to cause some initial problems and lots of people are going to bitch about it. But that is relatively short term.
The gaming industry will work out many of the problems and will share a lot of that information; Insomniac Games is a good example of this. There will still be growing pains, but they shouldn't be perceived as anything other than growing pains.
1985 vs. 2008 - no more free speedup (Score:3, Interesting)
The difference between 1985 and today is the end of the free lunch in clock rates. 1985 was roughly the middle of the last great clock rate stall - during the shift from minicomputers to micros, Moore's Law expressed itself not as speedup, but as decrease in cost per bit/gate.
Then we entered the great die-shrink and clock rate increase era, where for about 20 years processors really did get twice as fast every 12-18 months. Why expend the effort to bury a problem in parallel hardware when you can see faster serial hardware coming over the hill?
Clock rates have stalled again, and we're reaching physical limits for our current fabrication methods and physical chip designs. We're seeing renewed interest in functional programming because it looks like a way to make use of parallel hardware safely compared to imperative coding. Traditional coding methods are easier to understand, and are probably more efficient when they work, but...
how fast do you want the wrong answer?
Re:Which is more likely? (Score:3, Interesting)
Corollary to B:
Visual Studio 2010 will make parallel and multithreaded programming easier to accomplish. Essentially, instead of just
for (x = 0; x < 1000; x++) dosomething();
you'll have
parallel_for (x = 0; x < 1000; x++) dosomething();
and 2-1000 threads will kick off at once. I'm sure there can be thread pooling, locking, etc., but if the code is segmented well enough then parallel programming for 32 cores is probably not that traumatic for 90% of the world's code needs. Get into some high performance stuff or time-critical code, and you're probably already past the point where VS2010's new parallel code library is interesting.
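For comparison, several languages already ship a rough analogue of that hypothetical parallel_for in their standard libraries. A sketch in Python (dosomething here is just a stand-in loop body, not any real API):

```python
from concurrent.futures import ThreadPoolExecutor

def dosomething(x):
    # Stand-in for the loop body; it must not depend on other iterations.
    return x * x

# The serial loop: for (x = 0; x < 1000; x++) dosomething();
serial = [dosomething(x) for x in range(1000)]

# The "parallel_for": the executor farms iterations out to a pool of
# workers and handles the thread pooling and joining for you.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(dosomething, range(1000)))
```

Same caveat as above: this only works painlessly when the iterations are independent; the moment they share mutable state you're back to locks.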
Re:Convince your boss. (Score:5, Interesting)
That's not really true. Supporting instruction-level parallelism is reasonably straightforward - you just analyse the per-instruction data dependencies and output things ordered in such a way that the hardware instruction scheduler can do its thing. There's only one problem - optimally ordering a set of instructions is NP-hard. So if you want to do a good job ordering 5 instructions you're fine, 15 instructions is obnoxious, and 150 instructions is basically impossible.
Further, instruction level re-ordering doesn't change the basic algorithm. If you're compiling a heapsort routine, the re-ordered instructions are still going to implement heapsort. And heapsort doesn't parallelize well. Maybe you could build logic into your compiler to detect heapsort and automatically replace it with parallel quicksort, but that doesn't help you when you run into a non-sorting routine.
Realistically, programmers have to write parallel code for many-processor platforms. It's not amazingly difficult (given reasonable training and a reasonable set of tools), but it is different. But it's not something that's going to go away when the gcc guys / java team / VS.NET team implement some clever optimization.
Re:Convince your boss. (Score:3, Interesting)
It's true that standard hash algorithms are generally serial in nature. Luckily, you can use well-tested hash algorithms to build hash trees and get the same security properties with excellent parallelism. Some modern hash algorithms are even designed for tree modes to support parallel computation.
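A sketch of that hash-tree construction (Python's stdlib MD5 here purely for convenience; real tree modes additionally pin down leaf sizes and domain-separate leaves from interior nodes, which this toy skips):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.md5(data).digest()

def tree_hash(chunks):
    # Leaf hashes are independent of each other, so this first step is
    # the part that can be farmed out across cores.  The serial remainder
    # is only the O(log n) combining passes.
    level = [h(c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2:               # odd count: one common convention
            level.append(level[-1])      # is to duplicate the last node
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```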
Re:Convince your boss. (Score:3, Interesting)
You don't have to be CPU-bound to benefit from extra cores. More cores means less OS-driven task switching, since all of the active processes can be isolated. The result is (slightly) lower latency for everything involved.
Plus, you're underestimating the obvious parallel processes on a machine. Right now I have around 50 Firefox tabs open. I'm struggling to close them fast enough (I tend to hang onto amazon.com pages and whatnot), but what if I was using a browser that separated that into 50 separate processes (instead of the Firefox approach, which is to hang whenever I open too many Youtube windows)? They're all periodically vying for CPU time when their Ajaxified Web 2.0 Timer goes off and re-downloads crap.
Why some languages don't catch on. (Score:3, Interesting)
Between picking a programming language that is powerful (expressive/concise) for all the code you have to write (insert random FP language here) and picking a programming language that is powerful for all the code you don't have to write (perl, python etc), I'd pick the latter in most cases.
Even if an FP language is 2x more concise than perl for a given task (AFAIK it isn't), without lots of decent libs, you usually have to write a lot more code.
The more code you have to write, the more code someone else later on has to read and understand. And worse the more code that has to be documented and maintained.
Standard libraries are important, even if they are only de facto standards. If you use a standard library, even a buggy one, then when an experienced programmer looks at your code, they know what the lib does (and hopefully its bugs), so they only need to read your code; they don't have to spend time looking at the lib.
Whereas if you were writing your own custom libraries, or using one of 100 possible libraries (with no defacto standard) out on the internet, the person taking over or helping out will have to spend extra time trying to debug/understand it.
Some of the FP languages are catching up in terms of standard libraries, but for many you either have to write your own crap or have to waste lots of time finding and evaluating libraries to see if they are worth using.
Nobody in their right mind wants to deal with 10 different print commands (ok maybe the php people are different).
I'd rather be able to get on with writing the more "interesting" bits ASAP.
Re:No, look at the scope (Score:3, Interesting)
We have also seen the merging of those paradigms over the years. Every mainstream language today, with the exception of C, has some form of OOP. Every mainstream language either has or is getting (e.g. C++0x lambdas) first-class function values - with the unfortunate (for Java) exception of Java, now that Neal Gafter has moved from Google to Microsoft. Many languages are introducing lazy sequences and libraries centered around them (Scala, LINQ, Python 3000). Also related is syntactic sugar for list comprehensions (Scala, Python). Then, there are languages that are specifically designed to be combined OO/FP - Scala, OCaml, F#.
Actually, it's worth learning FP today solely for this reason. FP has found its way into mainstream by piggybacking on top of existing OO solutions. For example, if you want to be a good C# or VB programmer today, you have to learn LINQ - and that is all about FP [microsoft.com]. You may not even be fond of it yourself, but there will be code written using it, and you'd better understand it well enough to be able to maintain and extend it!
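For anyone who hasn't run into these features yet, here is roughly what that FP seepage looks like in Python, one of the languages listed above (the C#/LINQ version reads almost the same; the data here is made up for illustration):

```python
orders = [("alice", 30), ("bob", 5), ("alice", 12), ("carol", 7)]

# List comprehension: a filter-and-map query with no explicit loop state.
big_spenders = [name for (name, total) in orders if total >= 10]

# Generator expression: the lazy-sequence version.  Nothing is computed
# until something actually consumes it.
lazy_totals = (total * 1.1 for (name, total) in orders)
first = next(lazy_totals)   # only now is the first element produced
```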
Re:it's always a good time to try functional (Score:2, Interesting)
Please. If you're writing that condescendingly I'd wager I've written more ML than you've written code full stop. The example you give is an extremely unnatural way of doing a loop control structure, and in a language that has more normal flow control there is truly no need for it. That one doesn't have to go through the contortions you do to get regular flow control in a pure functional language in python doesn't make it unsuitable for functional programming; on the contrary, it makes it the best language for functional programming. Once you've rewritten a perfectly clear function to a convoluted version using an accumulator so that you can get reasonable performance doing something that wouldn't require any of that in a slightly less dogmatic language, it quickly loses its appeal.
don't bother (Score:3, Interesting)
When people say that they want to learn "functional programming", they usually mean some functional programming language. Common functional programming languages (OCaml, SML, Haskell, Lisp, Scheme, etc.) are duds when it comes to performance: in principle, they encourage parallel-friendly programming; in practice, they get many basic language design and performance issues wrong.
It's also silly to rewrite large amounts of code when usually only a small percentage of the code is performance critical. If you need to speed up C code to run fast on a multicore machine, optimize the inner loops. All the "functional programming" you need for that is available right within C in the form of OpenMP. You'll understand OpenMP a little better if you have used a "real" functional programming language, but the concepts aren't rocket science and you can do just fine as a C programmer.
Re:don't bother (Score:3, Interesting)
i kind of agree, don't bother about fp.
the thing is, our applications are today running hundreds of processes in a box, even if you get 256 cores on a cpu, we'll still keep them all busy without a change.
whoever wrote the logic for the article was probably on coke or smth, seriously, no company with over 50 employees (and hence reasonable salaries for non-owners), will migrate to any functional language any time soon, they can't afford it, they don't have time for it and most of all, they don't want a software that only works due to the knowledge of "those 2 guys" ...
people want software that a lot of people can understand, people want to hire people that can be replaced (in case they get hit by a train or go over to the competitor). nobody wants "this unique piece of s.it that is 5% faster". welcome to the "enterprise".
big players just don't care. they want simple solutions that can be extended and handled cheaply. buying an extra box with a network connection behind it is cheap. hiring niche people is expensive as hell.
enough complained, i think those who have ever worked more than a few years in big companies understood the point already while reading the title or article...
Re:"everything that has been invented will be..." (Score:4, Interesting)
Sorry, I didn't mean to imply that we had come to the end of invention, even in a small area.
But all of the effective techniques *that we know about* have been implemented, and the chip makers have been banging their heads on the wall for years trying to come up with new ones. They went to multi-core processors as an absolute last resort, and 5 years later it's probably time for most programmers to accept that and learn to target multi-core platforms.
Re:Convince your boss. (Score:3, Interesting)
And for what it's worth, evolutionary algorithms and genetic algorithms are going to be all over the place in 5 or 10 years, because they can handle 1000 or 10,000 cores without any problems. They're sort of like 'meta-algorithms' for parallelizing a linear algorithm. The only requirement is that the linear algorithm has to be set up to compete with other copies of itself. So you then have to deal with a bunch of stuff like encoding populations of potential answers, creating mutation and crossover operators, defining selection thresholds, and creating competition scenarios.
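A toy version of that pipeline (encoding, mutation, crossover, selection threshold), with the classic OneMax problem standing in for a real fitness function; all names and parameters here are illustrative. The fitness evaluations are independent per individual, which is the embarrassingly parallel part:

```python
import random

rng = random.Random(42)   # seeded for reproducibility
GENOME = 32               # encoding: fixed-length bitstrings

def fitness(g):
    # Stand-in objective: count of 1-bits.  In a real system each of
    # these calls is independent, which is why GAs soak up cores so well.
    return sum(g)

def mutate(g, rate=0.02):
    # Flip each bit with small probability.
    return [b ^ 1 if rng.random() < rate else b for b in g]

def crossover(a, b):
    # Single-point crossover of two parent genomes.
    cut = rng.randrange(1, GENOME)
    return a[:cut] + b[cut:]

def evolve(generations=60, pop_size=40):
    pop = [[rng.randint(0, 1) for _ in range(GENOME)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]          # selection threshold
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children                   # elitist: best survive
    return max(pop, key=fitness)
```

Because the top half always survives, the best individual never gets worse from one generation to the next; the mutation and crossover operators are what supply new candidates to climb with.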
The most difficult part, in many respects, is creating an intuitive user interface for all of it.
Re:Convince your boss. (Score:3, Interesting)
True. OpenMP is great. But it's not the only solution.
MPI and the like are pretty much only useful if you are programming for a multi-computer (not multi-core, or multi-proc) environment - e.g. a supercomputer. (OpenMP, for what it's worth, is actually aimed at the shared-memory multi-core case.)
When dealing with just one computer and either multi-proc or multi-core, there's a much simpler solution: locks. Works great when doing multi-threaded and even multi-process programming.
Most programs would simply benefit from having a second thread. (Ever wonder why your program's interface hangs? Usually because it's only one thread and it's in a processing loop and not updating the GUI. A second thread solves that.)
And multi-document programs (e.g. FF, IE, etc.) benefit by having 1 main thread, and another for each document. (Thread, Process, take your choice. Either way...it benefits.)
But you also have to remember - your program is not the only program on the system. So additional cores that you are not using can be provided to a different program so it can run in parallel, thus making the computer more useful for the user - and letting them do more, faster - even if it's not with your program.