Classic Computer Science Papers 54
Dean Chouinard writes "For inspiration I occasionally peruse the web pages of Dennis Ritchie, Brian Kernighan and Ken Thompson. I recently found a pointer to the following classic paper by Ken Thompson. Other interesting papers are listed under the classics directory as well."
sort of kind of an answer, but not really (Score:1)
having read the article (and enjoyed it immensely), i don't see how free software would be more vulnerable than any other variety. it's transmitted from binary to binary, after having been initially incubated in the source.
actually, the great thing about this would have been if it had scared people into compiling their compilers back and forth among other vendors' compilers and their own. i think (unless i'm as confused as i think i am!) that would tend to kill it. or else would it just tend to add any *other* trojan things that the others might have? you know what, i think it would be the latter. never mind. i've never heard of anybody doing that, but then again i wouldn't have heard of it if they had, so who knows.
if there's anybody out there who doesn't get an attack of vertigo contemplating the code in question, please chime in!
what would a "cure" be? maybe bootstrapping: write an assembler by hand, byte by byte, and then write a minimal c compiler in that assembler . . . ? i know almost nothing about assembly language or whatnot, though i did see somebody once write a trivial (REAL trivial!) program by typing opcodes or whatever you call them into the dos debug.exe thing. it seemed kinda like the long way around, for most purposes, but you could avoid introducing anything weird if you really wanted to. you shouldn't trust debug.exe, but it would be trivial to write a quick minimal write-only binary editor in c, the output of which could be verified. god, this is like a philip k. dick novel.
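For anyone trying to get their head around the vertigo: the heart of Thompson's trick is a self-reproducing program (a quine), code whose output is its own source, so the trojan can re-insert itself every time the compiler is recompiled without ever appearing in any source file. A two-line sketch of the idea in Python (the idea only; Thompson's actual code was C):

```python
# A quine: comments aside, running these two lines prints
# exactly these two lines.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The `%r` makes the string reproduce its own quoted representation, which is the same trick the trojan uses to carry its own text along.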
:)
Rice's Theorem (Score:1)
It's not that complex. (Score:1)
Then, once you have a couple of compilers, you write other compilers with those existing compilers. The 'test' seems to be writing a compiler that can then be used to compile itself. GCC does this: it has some stub code that can get you a basic compiler, it uses that to compile GCC, and then you can redo the whole thing with GCC itself.
Take a compilers course; I think it was the hardest course I took in college, but you learn a ton. It's not too tough to build a very primitive 'yacc compiler' that takes a yacc grammar and outputs assembly for a very simple language. If you assume the first compilers were something like that (and they were), then you suddenly have a better tool to make an even more sophisticated compiler, and this spirals and spirals until you eventually end up with modern compilers.
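To make that "very primitive compiler" concrete, here's a sketch in Python (the function names and instruction set are my own invention, not what early compilers actually looked like): a recursive-descent parser for `+`/`*` arithmetic that emits stack-machine code, plus a tiny VM to run it.

```python
import re

def compile_expr(src):
    """Compile integer arithmetic with + and * (no parentheses,
    * binds tighter) into instructions for a tiny stack machine."""
    tokens = re.findall(r"\d+|[+*]", src)
    pos = 0
    code = []

    def factor():
        nonlocal pos
        code.append("PUSH " + tokens[pos])  # emit a literal
        pos += 1

    def term():                             # term := factor ('*' factor)*
        nonlocal pos
        factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            factor()
            code.append("MUL")

    def expr():                             # expr := term ('+' term)*
        nonlocal pos
        term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            term()
            code.append("ADD")

    expr()
    return code

def run_vm(program):
    """Execute the stack-machine instructions and return the result."""
    stack = []
    for ins in program:
        op, *arg = ins.split()
        if op == "PUSH":
            stack.append(int(arg[0]))
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[0]
```

Once a toy like this works, the bootstrapping game is exactly as described: use it to build a slightly better one, and spiral upward.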
First Compiler (Score:1)
Some of the most important things you'll read (Score:1)
Who knows. Someone reading them for the first time because of an introduction on here might be inspired to advance the field of knowledge.
Sure beats people bragging about how many MP3 files they've pirated.
Peace
Ron Rangel
First Compiler (Score:2)
-Confused
Heard this story at MIT... (Score:3)
I expect Thompson didn't do anything with it; I expect he was just holding back to keep an air of mystery around it all. But still... interesting...
(That's one of my best memories of that class. All us youngins hanging on every word of Ward's story about the "real" hacking done by our heroes. Or at least mine.)
Corollary: forking is good (Score:1)
To a limited extent, at least, forking is good.
sort of kind of an answer, but not really (Score:1)
Re: Hold on. ACM sucks dick. (Score:1)
First Compiler (Score:1)
A little neglected (Score:1)
I love Ritchie's biography! (Score:1)
"Subsequently, I aided Ken Thompson in creating the Unix operating system."
"Early in the development of Unix, I added data types and new syntax to Thompson's B language, thus producing the new language C."
Imagine having that on your resume!
I think you guys missed the point (Score:1)
Since then, gcc has always been compiled using gcc, and its source has been open for public inspection. egcs has likewise been compiled from gcc and from earlier egcs versions, whose sources are also publicly inspected.
The proof that publicly-inspected versions of gcc and egcs are not infected follows by induction. G(0) is clean, because it was compiled with a non-gcc compiler; G(n+1) is clean if G(n) is clean, because its source is publicly inspected and it is compiled by the clean G(n). Hence G(n) is clean for all n.
The problem enters when a version of gcc or egcs is **secretly** forked from the main (publicly inspectable) source tree, and *its* code is contaminated, and it is then distributed in binary form, as if it were legitimate. Once binaries enter the arena from an unverifiable source, you have a problem, since if anyone compiles a (clean-source) version of gcc with an infected compiler, they only get infected compilers as a result.
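The inductive argument, and the way a contaminated binary breaks it, can be modeled with a toy sketch (Python; the "binaries" here are illustrative strings and all names are my own, not real compiler behavior):

```python
# Toy model: a "compiler" maps source text to a "binary" string.
LOGIN_SRC = "login source"
CC_SRC = "compiler source"   # the publicly inspected, clean source

def clean_compile(src):
    # honest compiler: the binary faithfully reflects the source
    return "bin(" + src + ")"

def infected_compile(src):
    # Thompson's trojan: special-case the two interesting inputs
    if src == LOGIN_SRC:
        return "bin(" + src + " + backdoor)"
    if src == CC_SRC:
        # re-insert itself when compiling the (clean) compiler source
        return "INFECTED_CC_BIN"
    return "bin(" + src + ")"

def run(cc_bin, src):
    # "executing" a compiler binary on some source
    step = infected_compile if cc_bin == "INFECTED_CC_BIN" else clean_compile
    return step(src)

g0 = clean_compile(CC_SRC)      # G(0): bootstrapped from a trusted pre-gcc compiler
g1 = run(g0, CC_SRC)            # G(n+1) built by a clean G(n): stays clean
bad = infected_compile(CC_SRC)  # a secretly forked, contaminated binary
bad_next = run(bad, CC_SRC)     # clean source, infected parent -> infected child
```

Here `g1` equals `g0` (the clean lineage is a fixed point), while `bad_next` equals `bad` even though it was compiled from the clean source: source inspection alone cannot break the infected lineage.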
Solutions:
* Maintain a "reference compiler binary" for each platform, hosted at the FSF or Cygnus. This would be a copy of gcc or egcs derived from an old (pre-gcc) compiler, and thus known (by the above inductive proof) to be clean.
* Trace through all compilers with a debugger. While it's possible that a debugger could be hacked to skip over the trojan routines, it would be MUCH harder to hide than the largely black-box process of compilation.
An analogy:
Every newly-compiled compiler has two "parents": its own source code, and the compiler which was used to compile it. Infection is a dominant genetic trait: if *either* parent is infected, then the "baby" compiler will be infected too.
If you detect an infection in a compiler (by stepping through it with a debugger, for instance) then you know that one or both of its parents were themselves infected. This could theoretically be used to trace an infection back to the source patch that introduced it.
segfault (Score:1)
Thanks for the verification :)
if this is the type of stuff on segfault I will certainly be checking it out from now on.
For the most part, no, it isn't.
The method is regexp dependent... (Score:1)
This would make it very difficult to produce a trojan that could replace a specific function in a specific program *without* affecting other programs to be compiled.
It is still possible to insert a trojan that will be added to *every* binary compiled, but checking for this is trivial-- the expected machine code for, e.g., hello world can be calculated by hand and compared to the output of your compiler.
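A minimal sketch of that check (Python; the byte strings in the test are hypothetical stand-ins for real object code): compare the bytes your compiler emits for a fixed test program against a reference worked out by hand, and report exactly where they disagree.

```python
def diff_binaries(reference: bytes, candidate: bytes):
    """Return the offsets at which two binaries disagree,
    including any length difference at the tail."""
    n = max(len(reference), len(candidate))
    # slicing (rather than indexing) yields b"" past the end,
    # so a longer binary shows up as trailing mismatches
    return [i for i in range(n) if reference[i:i+1] != candidate[i:i+1]]
```

An empty result means a byte-for-byte match; any offsets returned are where to start disassembling.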
Code munging is only a partial protection, probably less effective than bootstrapping, but it would at least give the crackers some headaches.
True. (Score:1)
Machine-code trojans would tend to be platform dependent, and would be much harder to implement, especially for a portable, cross-compiling compiler such as gcc.
I assume that by a "machine code implementation" you mean a trojan that would wait until code had been compiled, and then examine the resultant bits for a particular pattern to replace.
Small changes to the source code could result in such large changes to the object code that this method would only be practical for infecting code that is not expected to change. The exuberant code munger could randomly salt the code with valid but useless statements-- printing to
The problem of recognizing a particular function in a particular program in machine code for many different platforms is on a par with recognizing a particular function in a particular program in source code with arbitrary function/variable/file names. It is certainly soluble, but much more difficult than the example put forward in the paper.
I stand by the assertion that munging would make life harder for the cracker.
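A toy version of such a munger (in Python, munging Python source rather than C; the no-op list is my own invention): salt the source with valid but useless statements so each build's text, and hence its object code, differs, without changing behavior.

```python
import random

# harmless statements to scatter through the source
NOOPS = ["pass", "_ = 0", "_salt = None"]

def munge(lines, seed):
    """Insert no-op statements at matching indentation so the
    program still runs identically but its bytes differ."""
    rng = random.Random(seed)
    out = []
    for line in lines:
        out.append(line)
        stripped = line.rstrip()
        # don't salt directly after a block-opening line like "def f():"
        if stripped and not stripped.endswith(":"):
            indent = line[:len(line) - len(line.lstrip())]
            out.append(indent + rng.choice(NOOPS) + "  # salt")
    return out
```

Different seeds give different salted builds with identical behavior, which is exactly the headache it hands to a pattern-matching trojan.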
Not forking-- multiple implementations. (Score:1)
But what do you compile *them* with?
Cross-compile all the way down, baby (Score:2)
By 'compiler' I didn't just mean gcc; I meant all the bits required to get a new computer system able to reasonably support its own code development.
A classic example of poor web maintenance, too (Score:1)
I guess the ACM don't care about their website that much. (And they're using the CERN webserver still!)
Dijkstra (Score:1)
So there.
"Umm... Goto's are bad, m'kay?"
Actually, I've been hunting around for the "Goto's" paper for a while now.
Question (Score:2)
This sounds like the kind of 'warning' that could be leveled at the open source process, and I'd like to be able to refute it as accurately as possible.
First Compiler (Score:2)
Supposedly that's one of the big tests of the viability of a language: whether you can write a compiler for the language in the language. But you've always gotta write the first compiler in some other language.
After taking my compiler design course, I've never looked at programming the same way. Same with assembly programming. IMNSHO, every programmer should be exposed to both of these topics.
Question (Score:1)
No Compiler (Score:1)
There must be quite a few of us around here?
Thanks, that is what I was trying to say (Score:1)
Hold on. ACM sucks dick. (Score:1)
Should a professional organization like the ACM provide free access to their archives? Why?
Chickens & Eggs (Score:1)
An interpreter is another way to start (Score:1)
My solution was to write a quick and dirty hack ReWrite interpreter in Pascal, give it a sufficient subset of the language to get me started, then write a pretty minimal compiler in ReWrite, and interpret the compiler to compile itself. (It took four days on my 8 MHz Mac Classic doing it in little chunks; that interpreter was really slow.) Then I stopped using the interpreter and started extending the compiler.
A couple of memorable things from the development process:
I had the interpreter and compiler designed so that they could interoperate, and it was fascinating to watch the compilation rate speed up as more bits of the compiler were compiled (it would link each compiled function in immediately, and use that in preference to the interpreted one). A gradual speed gain of about a factor of 60.
An interesting part of extending a language written in itself, particularly when it is not a standard language, is that once an extension has been added and works, I start using it in further development of the compiler. The language has changed enough since I started it that there is a bit of a mix of coding styles in there - some of the older code is quite hard to read when it has to work around restrictions that no longer exist. (That's why it turns out to have such an unintentionally appropriate name - every so often I rewrite a code module that has grown too old.)
By the way, ReWrite only runs on Macintoshes, is 68K code (not PowerPC), has free binaries but is not distributed with source, and hasn't been released for about three years. I would like to change all of that (except the free part) as I get more time to work on it.
Of course, because it only runs on one platform and is written in itself, the only sensible way to port it is to write a cross-compiler: a ReWrite to C++ or ReWrite to Java bytecode compiler (both possible, but hard).
I've strayed off topic a bit, but my experience is that going through that process of getting a compiler going is fun, even if _everyone_ you mention it to before you get it working starts out by saying that it is a waste of time.
I've also got ReWrite on my CV, and I remember going for a job interview where the fact that I had designed and implemented a language was seen quite favourably, but:
Interviewer: "What did you write it in?"
Me: "ReWrite"
Interviewer: "No, what did you _write_ it in?"
Me: "ReWrite, it's written in itself"
Interviewer: "No, what did you _write_ it in?"
Me: "The source code for ReWrite is written in ReWrite itself, just like C compilers are written in C"
(snipped a couple more ways of me trying to say the same thing in different ways)
Me: (not wanting the interview to be stuck any longer) "The user interface is written in Pascal"
Interviewer: "Ah". (he was happy then)
Roy Ward.
Dijkstra IS THE MAN! (Score:1)
The essence of my paper is that in a computer industry mad to sell boxes -- whether hardware or software -- solutions are merely a problem's way of creating another problem.
--------
Truths about the old guys.. (Score:2)
I actually work with Dennis, Ken, and Brian, and they're still cranking out some really cool stuff. None of it is mainstream, due to Lucent's (Bell Labs') marketing of it and the huge installed base of current poop (MS Win) and software (*NIX).
True, GOTOs are still running amok in their code sometimes. Some of their newer stuff, papers, and such should be public, like the theories behind OSes like Plan 9 and Inferno, two OSs based completely on a file structure for everything.
If you check out http://plan9.bell-labs.com/cm/cs/ [bell-labs.com] you can find them and some of their recent work. Check it out.
Question (Score:1)
In fact, I would argue that this problem cannot feasibly be solved; it merely represents a risk inherent in trusting any person/entity.