Donald Knuth Rips On Unit Tests and More 567
eldavojohn writes "You may be familiar with Donald Knuth from his famous Art of Computer Programming books but he's also the father of TeX and, arguably, one of the founders of open source. There's an interesting interview where he says a lot of stuff I wouldn't have predicted. One of the first surprises to me was that he didn't seem to be a huge proponent of unit tests. I use JUnit to test parts of my projects maybe 200 times a day but Knuth calls that kind of practice a 'waste of time' and claims 'nothing needs to be "mocked up."' He also states that methods to write software to take advantage of parallel programming hardware (like multi-core systems that we've discussed) are too difficult for him to tackle due to ever-changing hardware. He even goes so far as to vent about his unhappiness toward chipmakers for forcing us into the multicore realm. He pitches his idea of 'literate programming' which I must admit I've never heard of but find it intriguing. At the end, he even remarks on his adage that young people shouldn't do things just because they're trendy. Whether you love him or hate him, he sure has some interesting/flame-bait things to say."
Literate programming... (Score:3, Insightful)
Using "MULTIPLYBY" instead of "*" isn't going to make your code easier to read.
Programming for the human VM (Score:5, Insightful)
It interleaves code and documentation in the same files, and provides specialized tools to tell the two kinds of content apart. Just as Doxygen and Javadoc can extract the comments from a source code project, the "tangle" process can extract all the code from a literate program and pass it to a classic compiler.
Now that C and C++ seem to be declining in popularity [slashdot.org], maybe we can look for better ways of getting away from the bare metal (which, don't forget, is why those languages became popular in the first place). Don't get me wrong, they served us well for 36 years, but I think it's time to start caring more about the developers' requirements and less about the hardware's requirements.
Re:Literate programming... (Score:4, Insightful)
Unit Tests are not wasteful (Score:5, Insightful)
Re:What? (Score:5, Insightful)
Re:Literate programming... (Score:2, Insightful)
Re:Literate programming... (Score:4, Insightful)
I'd kill any colleague of mine who wrote such a vacuous comment. With a golf club, in front of their cow-orkers, to drive the lesson home.
Re:Literate programming... (Score:3, Insightful)
Re:Literate programming... (Score:5, Insightful)
I'd kill any colleague of mine who wrote such a vacuous comment. With a golf club, in front of their cow-orkers, to drive the lesson home.
Re:Unit Tests are not wasteful (Score:5, Insightful)
Unit tests, especially if written and organized in an intelligent fashion, can be of tremendous value in eliminating small coding errors that were not intended but are bound to creep in if the project is large enough. If you are clever about your tests, you can usually inherit the same test multiple times and get multiple uses out of a test or part of a test. If unit tests are not used, it is more likely that several small errors in seemingly unrelated classes or methods will combine in an unforeseen way to produce nasty and unexpected emergent behavior that is difficult to debug when the whole system is put together. Unit tests will not make up for a crappy design, but they will help detect minor flaws in a good design that might otherwise have gone undetected until final system integration, where they would be much more difficult to debug.
I actually have a great deal of respect for Knuth, but I think that he is wrong about unit tests. Perhaps it is the difference between the academic computer scientist and the career programmer who admires the ivory tower, but is willing to make concessions in the name of expedience and getting work done on time.
He's right (Score:5, Insightful)
If you have a function that multiplies two integers, most coders will write a test that multiplies two numbers. That's not good enough. You need to consider boundary conditions. For example, can you multiply MAX_INT by MAX_INT? MAX_INT by -MAX_INT? Etc. With real world functions you are going to have boundaries up the whazoo. In addition, if you have a function that takes data coming from the user, check for invalid cases even if another function is validating. Check for null or indeterminate values. Write tests that you expect to fail.
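The boundary-condition point above can be sketched with Python's unittest (the `checked_mul` helper and `INT32_MAX` constant are hypothetical stand-ins for whatever multiply routine you actually ship):

```python
import unittest

INT32_MAX = 2**31 - 1

def checked_mul(a, b):
    """Multiply two ints, raising OverflowError if the result does not
    fit in a signed 32-bit range. (Hypothetical example function.)"""
    result = a * b
    if not -2**31 <= result <= INT32_MAX:
        raise OverflowError("product out of 32-bit range")
    return result

class TestCheckedMul(unittest.TestCase):
    def test_typical_case(self):
        # The "multiply two numbers" test most coders stop at.
        self.assertEqual(checked_mul(6, 7), 42)

    def test_boundary_overflow(self):
        # MAX_INT * MAX_INT must fail loudly, not wrap silently.
        with self.assertRaises(OverflowError):
            checked_mul(INT32_MAX, INT32_MAX)

    def test_negative_boundary(self):
        # MAX_INT * -MAX_INT is a boundary too.
        with self.assertRaises(OverflowError):
            checked_mul(INT32_MAX, -INT32_MAX)

    def test_zero_edge(self):
        self.assertEqual(checked_mul(INT32_MAX, 0), 0)

if __name__ == "__main__":
    unittest.main(exit=False)
```

The last three tests are the ones that earn their keep: they exercise exactly the boundaries and expected-failure cases the comment describes.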
Re:Literate programming... (Score:5, Insightful)
I've seen code blocks several hundred lines long, and it was never ambiguous where they started and ended.
Re:Shocked (Score:5, Insightful)
The reason I think literate programming doesn't catch on has mostly to do with the fact that a great many programmers don't bother to think through what they want to do before they code it: they are doing precisely what Knuth mentions he does use unit testing for -- experimental feeling out of ideas. Because they don't start with a clear idea in their heads, of course they don't want to start by writing documentation: you can't document what you haven't thought through. This is the same reason why things like design by contract [wikipedia.org] don't catch on: to write contracts it helps to have a clear idea of what your functions and classes are doing (so you can write your pre-conditions, post-conditions and invariants) before you start hammering out code. The "think first" school of programming is very out of favour (probably mostly because it actually involves thinking).
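The design-by-contract idea above can be sketched in a few lines of Python (a minimal illustration; the `contract` decorator and `smallest_gap` function are invented for this example, not part of any real DbC library):

```python
def contract(pre=None, post=None):
    """Minimal design-by-contract decorator: check a precondition on
    the arguments and a postcondition on the result."""
    def wrap(fn):
        def inner(*args):
            if pre is not None:
                assert pre(*args), "precondition failed"
            result = fn(*args)
            if post is not None:
                assert post(result), "postcondition failed"
            return result
        return inner
    return wrap

@contract(pre=lambda xs: len(xs) > 0,   # caller must supply data
          post=lambda r: r >= 0)        # a gap is never negative
def smallest_gap(xs):
    """Smallest difference between consecutive sorted values."""
    s = sorted(xs)
    return min((b - a for a, b in zip(s, s[1:])), default=0)

print(smallest_gap([9, 3, 7]))  # 2
```

Writing the `pre` and `post` lambdas first forces exactly the "think first" step the comment describes: you cannot state the contract until you know what the function is for.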
Re:He's right (Score:4, Insightful)
Re:Unit Tests are not wasteful (Score:2, Insightful)
I think it's possible that this person, despite his earlier genius, has ceased to be as useful as his previous self. Genius is often like that: someone produces a great body of work at one point in their life, and that success seems to alter them to the point that later work is suspect or just wrong. Sometimes it's ego, other times it's just being stuck in a mental rut, or whatever other reason there may be.
On multicore (Score:3, Insightful)
But to do this we need operating systems that can efficiently and reliably schedule code across cores. Add an ability to 'bind' threads together, so that they always schedule at the same time but on separate physical processors. This gives the program the ability to know that an operation split between these threads will always complete faster than running it sequentially, without the vagaries of scheduling starving one thread and making it run much slower.
Once you have this then you can automatically get some speedups from multiple cores on programs that are designed to only run sequentially, and more speedup on programs with just minor tweaks. You aren't going to get perfect scaling this way, but you will get substantial improvements at virtually no cost to the programmer.
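The "bind to separate real processors" idea above can be sketched with Python's CPU-affinity call (a best-effort illustration: `os.sched_setaffinity` is Linux-only, and `pin_to_cores` is an invented helper name):

```python
import os

def pin_to_cores(cores):
    """Best-effort: bind the calling process to the given CPU set.
    The Linux-only sched_setaffinity API is used; on other platforms
    we simply report the feature as unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # unsupported platform
    available = os.sched_getaffinity(0)
    wanted = set(cores) & available
    if wanted:
        os.sched_setaffinity(0, wanted)
    return os.sched_getaffinity(0)

# Ask the scheduler to keep us on core 0, if that core exists.
print(pin_to_cores({0}))
```

A real gang-scheduling facility would go further than per-thread affinity, co-scheduling the bound threads at the same instant, which is exactly the OS support the comment is asking for.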
In other news, Chuck Norris rips on safety gear... (Score:5, Insightful)
Dismissive of DAK (Score:3, Insightful)
That's a brave stance. He's old, but he hasn't reached his dotage yet. The good doctor has contributed more to the science of information than most, and almost certainly more than you.
One of the reasons we're reinventing so much over and over with nuisances like VB and C# is that developers are architecting grand toolchains based on ideas that were proven incorrect back in the 1960s. They profit handsomely from their workarounds, and then we burn it all down and start over because they all contain the same fatal flaws.
That would be because you haven't installed Vista on it yet.
Having watched this tragedy unfold for a quarter century, I've often shaken my head and wondered what y'all were thinking. And then I remember that I once thought my parents were fools too. If you can read TAOCP and understand a good fraction of it, you will come away with a firmer foundation for the way all things work. It's a tough slog, though, and not everybody is capable.
Re:Unit Tests are not wasteful (Score:4, Insightful)
People in the first group end up with a project full of tests where many are valid, many end up testing basic language features, and many end up not helping due to oversimplification of behavior in the mocked interfaces.
People in the second group end up missing simple problems that could have been caught had they exercised their code.
Both groups waste a lot of time.
Perhaps this is what you were trying to say when you said "TDD guys are overzealous". I think there are other problems with TDD. Namely, you can only use it when you don't need to learn how to solve a problem as you go, and most interesting programs require exactly that kind of exploration.
Really, people need to use good testing judgement.
Knuth is hardcore (Score:2, Insightful)
Now that's breadth AND depth.
Spaghetti-O Code (Score:5, Insightful)
Re:On multicore (Score:5, Insightful)
There are a few techniques to be mastered and using a language designed with parallelism in mind helps hugely with the picky details.
Re:Spaghetti-O Code (Score:3, Insightful)
>more than a dozen lines long, you create a whole new type of spaghetti code.
There is also a risk that you or a maintenance programmer might re-use such a "function" that was created simply to make a while loop more aesthetically pleasing, and introduce a bug, because that function was never designed or tested for use in isolation.
And in the spirit of the topic, such functions become awkward to unit test, since you're extracting a unit of work out of the loop or control structure it logically lives in.
Worst Summary Ever (Score:5, Insightful)
The summary sounds like it was written by the headline-producing monkeys at Fox, CNN -- or hell, at the Jerry Springer show. Donald Knuth is not "playing hardball." Nobody needs to call the interview "raw and uncut," or "unplugged."
The interview has almost nothing to do with unit testing and the little Knuth does have to say about the practice is hardly "ripping."
When will people stop sullying peoples' good names by sensationalizing everything they say?
Knuth is a well-respected figure who makes moderate, thoughtful statements. From the summary, you'd think he was a trash-talking pro-wrestler.
heresy (Score:4, Insightful)
After initially being a proponent, I've come to the same conclusion about unit tests myself. I don't think they're worthless, but the time you spend developing or maintaining unit tests could be more profitably spent elsewhere. Especially maintaining.
That's my experience, anyway. I suppose it's pretty heavily dependent on your environment, your customers, and exactly how tolerant your application is of bugs. Avionics software for a new jet fighter has a different set of demands than ye olde "display records from the database" business application. More applications fall in the second category than the first.
Re:He's right (Score:3, Insightful)
Re:Literate programming... (Score:5, Insightful)
I don't think there's anything elite about writing short concise functions and breaking things up. The problem is when people first go into programming, they make these kinds of mistakes unless there are proper code reviews/training (things which often don't happen). When you are at university, the programs you write tend to be quite short and because of that, you don't realise how bad a programmer you actually are at that stage. It's only when you leap into the workplace and start writing a lot that your inadequacies become evident.
To me, programming is a discipline requiring a fair bit of intelligence, but all too often companies hire programmers as if they were just hiring shelf-stackers or something. I think there is a lot more professionalism in open source projects than in many software houses.
Re:he's from another era (Score:2, Insightful)
Moore's law says (well, indirectly at least) that machines from 2007 should be roughly 256 times as powerful as machines from 1995.
Somehow, the actual performance difference (starting the computer, starting a web browser, editing text, etc.) between running Win95 on hardware from its time and running Vista on today's hardware seems nowhere near a 256-times improvement.
I can only conclude that while the hardware industry has improved itself again and again, the software industry has eaten almost all of those improvements instead of passing them on to the users.
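The 256x figure above can be checked directly, assuming the usual one doubling every 18 months:

```python
# Doubling every 18 months over the 12 years from 1995 to 2007:
doublings = (2007 - 1995) / 1.5   # = 8.0 doublings
print(2 ** doublings)             # 256.0
```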
Knuth dismisses multicore but MMIX is poor design (Score:3, Insightful)
I expected much, much more from Knuth than what I've just seen in that interview and after reading the design of MMIX.
Knuth dismisses multi-core and multi-threading as a fad and an excuse by manufacturers to stop innovating. I'm amazed someone of his intelligence has managed not to read up on exactly WHY this is happening: clock speeds have run into physical power and heat limits.
So he dismisses the technical problems that manufacturers have been falling over for the last few years as merely a lack of imagination. No - parallelism is here to stay, and people need to realise it rather than just wishing up some magical world where physics aren't a problem.
He dismisses multi-threading as too hard. It isn't, if you're not unfair to the concept. Nobody is getting 100% out of their single-threaded algorithms. You always have stalls due to cache misses, branching, the CPU not having exactly the right instructions you need, linkage, whatever. Nobody EXPECTS you to get 100% of 1 CPU's theoretical speed. So why do people piss all over multi-core/multi-threading when it doesn't achieve perfect speed-ups?
If you achieve only a 50% speed-up using 2 cores compared to 1, you've done a good job, in my opinion. That means you could have a dual-core 3GHz CPU or a single-core 4.5GHz CPU. Spot which of those actually exists. Getting a 25-50% speed-up from multi-core is easy. The 100% speed-up is HARD. If you stop concentrating on perfection, you'll notice that multi-threading is a) actually not hard to implement, and b) worthwhile.
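One way to frame that 50%-on-2-cores figure (my framing, not the poster's) is Amdahl's law, which gives overall speedup when only part of the work parallelizes:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when a fraction of the work
    parallelizes perfectly and the rest stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# If 2/3 of the work parallelizes, 2 cores give a 1.5x speedup --
# the poster's "50% speed-up", i.e. a 3GHz dual-core behaving
# like a (nonexistent) 4.5GHz single-core part.
print(amdahl_speedup(2 / 3, 2))   # ~1.5
```

The same formula shows why the 100% speed-up is HARD: a 2x speedup on 2 cores requires the parallel fraction to be exactly 1.0.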
Then there's MMIX. Knuth thinks that simplicity has to work all the way down to the CPU design. Yes, but not simplicity by way of having instructions made up of an 8-bit opcode and three 8-bit register indexes. A CPU doesn't give a crap how elegant that looks. It's also BAD design - 256 registers makes for a SLOW register file. It'll either end up being the slow critical path in the CPU (limiting top clock speed) or taking multiple cycles to access. There's also no reason to have 256 opcodes. He should have a look at ARM - it manages roughly the same functionality with far fewer opcodes.
It almost pains me to see the MMIX design and how it's a) not original, b) done better in existing systems already on the market, e.g. ARM, and c) doesn't solve any of the performance limit problems he complains about. What's going on with Knuth?
Re:Worst Summary Ever (Score:2, Insightful)
Calling something a "complete waste of time" is, in my book at least, "ripping" on something. I didn't "sully his good name," I posted what I found interesting. You should also point out he has prostate cancer and I left that out. God, what horrible spin I used! You'd think I was talking about someone whose life wasn't at risk, the way I spun that summary!
And also, people are claiming he said these things "in passing." Which I find to be a phrase used when you want to avoid owning up to something you said. If I call you a "whiney bitch in passing" that doesn't lessen it one bit. Knuth claims no one should listen to him. Why is he publishing books if no one should listen to him?
The guy said some inflammatory comments. If you read the following posts, you'll realize that I wasn't the only one that found them inflammatory or controversial.
Declining popularity means nothing (Score:1, Insightful)
foreach is a parallelism killer (Score:2, Insightful)
...which the compiler can't discover, because foreach describes a mechanism (looping through a sequence, in order), and not a high-level transformation.
Compare foreach with map. Map is a higher order function that takes a function and a collection, and results in a collection of the same size and structure as the original, but with each element replaced by the result of applying the supplied function to it. Note that the value of each element in the result depends only on the corresponding element of the input. It's trivial to parallelize map.
You can parallelize map easily because it has a favorable contract that specifies the relationship between its inputs and its outputs, and it just so happens that this contract is amenable to parallel execution. A smart compiler, upon seeing a use of map, can trivially tag it as a parallelism candidate.
But since foreach specifies a sequential looping mechanism, there are no guaranteed relations between the input and output (in fact, not even any simple way to determine what should be treated as inputs and outputs). When you write a foreach loop to perform the equivalent of a map, you're underspecifying the transformation you're performing on your collection, and overspecifying the mechanism. That's bad programming.
You mention Parallel LINQ, and this is very relevant. LINQ is based on operations similar to map, that transform sets into sets. LINQ queries, since they abstractly describe the relation between an input and the desired output, can be executed in a number of ways: (a) the system can translate them into SQL queries and send them to a database server to execute; (b) the system can execute them serially; (c) the system can execute them in parallel.
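The map/foreach contrast above can be sketched in Python (my sketch, not the poster's LINQ; `multiprocessing.Pool` stands in for PLINQ's parallel executor):

```python
from multiprocessing import Pool

def f(x):
    return x * x

def foreach_square(xs):
    # foreach style: an explicit sequential loop. The runtime sees
    # only a mechanism (iterate in order, mutate a list), not a
    # transformation, so it cannot safely parallelize this.
    out = []
    for x in xs:
        out.append(f(x))
    return out

if __name__ == "__main__":
    data = [1, 2, 3, 4]
    # map style: only the element-wise relation is specified, so the
    # system may execute it serially...
    print(list(map(f, data)))      # [1, 4, 9, 16]
    # ...or in parallel, with no change to f or to the call's meaning,
    # because each output depends only on its own input.
    with Pool(2) as pool:
        print(pool.map(f, data))   # [1, 4, 9, 16]
```

Swapping `map` for `pool.map` is exactly the "favorable contract" the comment describes: the caller never promised an execution order, only a result.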
"Bug" is a relative term... (Score:4, Insightful)
But most people -- and certainly the majority of open source projects these days -- define "bug" as "undesirable behaviour"; and by that standard, TeX is chock full of bugs. To pick a couple of obvious examples: incorrect handling of ASCII 0x22 quotation marks, and treating "etc." as the end of a sentence. These aren't "bugs" to Knuth, since the incorrect behaviour is well documented, but by many people's standards they are.
Re:computer programming (Score:5, Insightful)
No offence meant, but I think your preconceptions may be clouding your judgement here.
You claim that today's programming field is not about clever tricks and fast algorithms. I claim that if more people understood these old-fashioned concepts, we would have much better software today. For a start, anyone developing those "libraries implemented by specialists" you mentioned had better be very good, since a lot of other people's code is going to depend on them. Having worked in groups that develop various kinds of library, I can assure you that a little more general programming knowledge about clever tricks and fast algorithms wouldn't go amiss.
You claim that today's programming field is about big systems with many programmers. I claim that this is because management and technical leadership in most places isn't competent enough to divide up a big system in modular fashion and allow smaller, more flexible teams to solve the little problems before multiplying them all up to solve the big ones. Instead, the guys at the top tend to reduce all problems to the least common denominator, "throw enough people at it and we'll win eventually" philosophy. This explains how a small company with a few dozen employees can produce software that is better in every way than the competing offering from a larger company with hundreds of developers. You don't even need a few dozen genius programmers; you just need to understand the concept that there are O(n^2) lines of communication between n individuals working in a single large team, but if your project is divided hierarchically then logarithms start coming into play, and if you can split a problem into several properly independent smaller ones this becomes a constant factor overhead. This elementary mathematics seems to be beyond a lot of senior management in the software business, and that has far more to do with the need to build monolithic systems maintained by zillions of developers than any actual project requirements do.
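The communication-lines arithmetic above works out like this (the `hierarchical_links` model is my own illustrative simplification, not a formal result):

```python
import math

def flat_links(n):
    """Pairwise communication lines in one flat team of n people:
    n*(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2

def hierarchical_links(n, team_size):
    """Rough count when n people are split into independent teams of
    team_size, with one lead per team coordinating at the top."""
    teams = math.ceil(n / team_size)
    return teams * flat_links(team_size) + flat_links(teams)

print(flat_links(100))               # 4950 lines in one 100-person team
print(hierarchical_links(100, 10))   # 10*45 + 45 = 495
```

A tenfold reduction in communication paths from nothing but structure, which is the commenter's point about modular division.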
Re:Literate programming... (Score:3, Insightful)
#define TRUE_VAL true
#define FALSE_VAL false
if (theVar == TRUE_VAL) {
theVar = FALSE_VAL;
}
(I made this up, but sadly it is not that far removed from actual examples...)
I also worked with a guy (another one) who left a blank line between every two lines of code. ALWAYS.
Anyway, if you are in the neighbourhood, feel free to come over to our office. If you forgot your golf club I'm sure we can rig something using parts from the paper cutter or something...
Re:Literate programming... (Score:3, Insightful)
Furthermore, I don't have the luxury of rewriting millions of lines of existing code. I document the parts I touch and I try not to make anything worse, but rewriting "crap code" is not always an option.
Re:Spaghetti-O Code (Score:3, Insightful)
Oh, please. Don't put on object-oriented airs. A "method" ain't nothing more than a function associated with a class.
A function (method, procedure, subroutine) should be just as long as it has to be to encapsulate the work it's doing. Sometimes that's one line. Sometimes it's pages.
Breaking those pages of code into a bunch of other subroutines, solely on some misguided notion that a function shouldn't be longer than N lines, makes for code that is harder to understand and maintain.
Re:Literate programming... (Score:3, Insightful)
I will say one thing, though: After haml, [hamptoncatlin.com] I never want to write any raw HTML, or any XML, by hand again. Ever.
Re:Out of favor (Score:3, Insightful)
Re:Spaghetti-O Code (Score:1, Insightful)
There is no "final" version (Score:3, Insightful)
If you clean up and refactor as you go, rather than at "the end", what you have described is Agile/XP development.