Imparting Malware Resistance With a Randomizing Compiler
First time accepted submitter wheelbarrio (1784594) writes with this news from the Economist: "Inspired by the natural resistance offered to pathogens by genetically diverse host populations, Dr Michael Franz at UCI suggests that common software be similarly hardened against attack by generating a unique executable for each install. It sounds like a cute idea, although the article doesn't provide examples of what kinds of diversity are possible whilst maintaining the program logic, nor what kind of attacks would be prevented with this approach." This might reduce the value of MD5 sums, though.
Cute but dumb (Score:5, Insightful)
Re:Cute but dumb (Score:5, Insightful)
Re: (Score:3, Informative)
And a broken clock tells the right time twice a day. How often do you expect the randomization could/would help rather than hinder bug zapping?
Bug reporting itself would become a much bigger problem due to the greater difficulty in reproducing them.
Re:Cute but dumb (Score:5, Insightful)
Re: (Score:2, Insightful)
And would make that buggy software nearly impossible to patch.
Every time there's a security vulnerability found, you'd essentially have to reinstall the whole application.
Knock on wood, but I've not had enough bad experiences with malware to think the tradeoff is worth it.
Re: (Score:2)
And would make that buggy software nearly impossible to patch. Every time there's a security vulnerability found, you'd essentially have to reinstall the whole application.
Is there any way to run the patch through the same process (using the same per-install key, of course) so that the result is a locally-transmuted patch that can be applied to the locally-transmuted application?
(Not that updating the entire application is necessarily a deal-breaker anyway; we all have broadband now, right?)
Re: (Score:3)
If that were possible, then malware could do the same thing (because we all know the random seed isn't going to be stored securely by average users).
Re: (Score:1)
>> And would make that buggy software nearly impossible to patch.
A patch applies to the source, recompile, and there you are.
>> Every time there's a security vulnerability found, you'd essentially have to reinstall the whole application.
No, you have to patch the source and recompile the exe. It's a much saner workflow than to patch a binary (who does this anyway?).
Re: (Score:2)
Re: (Score:1)
>> and in 1 in 1000 installs that cases has some weird behavior.
Get the compiler rand seed with the bug report.
Reproduce the compilation and then test the bug.
Profit.
This could help to force coders to write tidier code.
Re: (Score:1)
Not really, this is simple to do. We already do it to a minor degree. Every time we make a change and recompile, the order gets shifted a little, because most (nearly all) modern programs are modular (meaning they are segmented, often into methods or functions that can be rearranged in any order without changing the program's logic or flow). All we need to do is reorder the program. It would even be possible to encrypt or sign parts or the whole of a program. This would make more of a challenge for hackers. (bo
Re: (Score:2)
And make Heisenbugs the norm: Just compile, and your bug may vanish, multiply or behave completely differently. Not smart at all...
Would cause major debugging headaches (Score:5, Insightful)
Can you imagine parsing a stack trace or equivalent from one of these? Each stack is different.
Ignoring the fact that Heisenbugs would be much more prevalent.
Part of programming is paring of states. The computer is an (effectively) infinite-state machine. When you add bounds and checks you're reducing the number of states. This would add a great deal, making bugs more prevalent. Since a lot of attacks are based on bugs, this may increase the likelihood of some attacks.
Re:Would cause major debugging headaches (Score:5, Funny)
Ahh, but don't forget the benefits! If random bugs could appear or disappear on installs, think of how much tech support time you can save by just saying "Re-install it and you'll be fine."
Half the time that's what they do now anyways, now you can replace ALL the calls with that!
Re:Would cause major debugging headaches (Score:5, Interesting)
The randomizing compiler could easily be designed to base its randomizations on a seed, and then include that seed in the obj headers and stack dump trace library of the libc it links against. Then the bug would be just as reproducible as with a standard compiler.
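To make the seed idea concrete, here is a minimal sketch. It is not the UCI compiler's actual mechanism, just an illustration that a layout derived from a recorded seed can be regenerated exactly. The function names, the seed value, and the `randomized_layout` helper are all made up for the example, and it assumes the same Python version is used for both runs.

```python
import random

# Hypothetical function names standing in for a program's link units.
FUNCTIONS = ["main", "parse_args", "read_config",
             "open_socket", "handle_request", "log_error"]

def randomized_layout(functions, seed):
    """Return the order a (hypothetical) seeded build would emit functions in."""
    rng = random.Random(seed)   # private generator, so the seed alone fixes the result
    layout = list(functions)
    rng.shuffle(layout)
    return layout

# The shipped build records its seed; a crash report carries it back.
field_seed = 0x5EED1234
field_build = randomized_layout(FUNCTIONS, field_seed)

# The developer rebuilds with the reported seed and gets the same layout.
repro_build = randomized_layout(FUNCTIONS, field_seed)
print(field_build == repro_build)  # True
```

The catch raised elsewhere in the thread still applies: whoever holds the seed can regenerate the layout, so the seed has to be treated as sensitive.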
Re: (Score:3)
although the article doesn't provide examples of what kinds of diversity are possible whilst maintaining the program logic, nor what kind of attacks would be prevented with this approach."
I don't know if TFA actually didn't, but the UCI group has published some papers on the multicompiler work, including this one from CGO last year [uci.edu]. The main goal for this is to provide defence against return-oriented programming (ROP) [wikipedia.org] attacks, where you chain together 'gadgets' (small chunks of code
the crutch of determinism (Score:5, Interesting)
I must respectfully disagree with you on every point you raise.
A randomised stack would cause certain types of bugs to manifest themselves much earlier in the development process. Nothing decreases the cost of a bug hunt more than proximity to the actual coding event.
Such an environment rewards programmers who invest more to validate their loops and bounds more rigorously in the first place. Nothing reduces the cost of a bug more than not coding it in the first place.
There's nothing that stops the debugging team from debugging against a canonical build, if they wish to do so. If they have a bug that the canonical build won't manifest, they wouldn't even have known about the bug without this technique added to the repertoire. If many such bugs become known early in the development process (bugs that manifest on some randomised builds, but not on the canonical debug build), you've got an excellent warning klaxon telling you what you need to know: your coding or management standards suck. Debugging suck, if instigated soon enough to matter, returns 100x ROI as compared to debugging code.
Certainly the number of critical vulnerabilities that exist against some compiled binary can only increase in number. So what? The attacker most likely doesn't know in advance which version any particular target will run. The attacker must now develop ten or one hundred exploits where previously one sufficed (or one exploit twice as large and ten times more clever).
If the program code mutated on every execution, you would have some valid points. That would be stupid beyond all comprehension. An attacker could just keep running your program until it comes up cherries.
The developer controls the determinism model. It's an asset in the war. There can be more when it helps our own cause, and less when it assists our adversaries.
Determinism should not be reduced to a crutch for failing to code correctly in the first place. Get over it. Learn how. Live in an environment that punishes mistakes early and often.
Re:the crutch of determinism (Score:4, Insightful)
All in all, your post reads like a smug "Code better, noob!" while completely ignoring the tremendous extra costs that are going to be necessary to properly test hundreds of thousands of randomized builds for consistency.
So you are arguing to leave bugs in place ? (Score:5, Interesting)
What kinds of bugs do you think would manifest earlier using this technique ...
The GP mentioned a randomized stack. An uninitialized variable would be one, something that often accidentally has a value that does no harm (a zero possibly).
... and why do you think that earlier manifestation of that class of bugs will outweigh the tremendous burden of chasing down all the heisenbugs that only occur on some small percentage of randomized builds?
You do realize that your argument for the status quo and not dealing with the "heisenbugs" is essentially arguing to leave a coding bug in place? Recompiling will not necessarily introduce new bugs, rather change the behavior of existing bugs.
I've seen many of the sort of bugs this recompiling technique may expose, I spent some years porting software between different architectures. Not only did we have different compilers but we had different target CPUs. It was a friggin awesome environment for exposing unnoticed bugs. Software that had run reliably under internal testing for weeks on its original platform failed immediately when run on a second platform. And it kept failing immediately after several crashing bugs were fixed. The original developers, who were actually quite skilled, looked at several of the bugs eventually found and wondered how the program ever ran at all. I've seen this repeated on multiple teams at multiple companies over the years.
Also developers working on one platform eventually learned to visit a colleague working on the "other" platform when they had a bug that was hard to reproduce. There was a good chance that a hard to manifest bug on one platform would be easier to reproduce on the other.
There is nothing like cross platform development to help shake out bugs.
This recompilation idea would seem to offer some of these same benefits. Yes, it complicates reproducibility of crashes in the field, but if one can get a recompilation seed with that crash dump/log, it's more like dealing with an extra step than some impossible hurdle.
Plus recompiling with a different seed each time the developer does a test run at their workstation could help find bugs in the first place, reducing the occurrences of these pesky crashes in the field.
I'm not saying these proposed recompilations in the field are definitely a good idea, just that the negatives seem to be exaggerated. It looks like something interesting, worth looking into a bit more.
Re: (Score:2)
I mean how the costs don't outweight the benefits. (Score:2)
Re: (Score:2)
I mean how the costs don't outweight the benefits. Dammit, I always proof-read what i think I wrote, not what I actually wrote.
Me too. That is when I bother to proofread. :-)
Re: (Score:2)
Re:Would cause major debugging headaches (Score:4, Interesting)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Can you imagine parsing a stack trace or equivalent from one of these? Each stack is different.
Ignoring the fact that Heisenbugs would be much more prevalent.
Part of programming is paring of states. The computer is an (effectively) infinite-state machine. When you add bounds and checks you're reducing the number of states. This would add a great deal, making bugs more prevalent. Since a lot of attacks are based on bugs, this may increase the likelihood of some attacks.
I don't know about you, but with the limited programming I have, I'd save this new compiler for release version and use a normal compiler for internal version, so I can debug and make sure it's working great. Then I'd use the new compiler for the .exe I'm going to produce and give to people (sell/whatever).
Hopefully by then most of the major bugs are found. If not, I can compile the source code on a normal compiler and do normal debugging.
Swear to Gog no one uses their brains anymore.
Formal verification costs money (Score:2)
Re: (Score:3)
....why? (Score:5, Insightful)
..would a professor of CompSci think this is a good idea, despite the hundreds of problems it *causes* with existing practices and procedures?
Oh, wait.. maybe because the idea is patented and he'll get paid a lot.
http://www.google.com/patents/US8239836
University of California requires patents ... (Score:4, Informative)
..would a professor of CompSci think this is a good idea, despite the hundreds of problems it *causes* with existing practices and procedures? Oh, wait.. maybe because the idea is patented and he'll get paid a lot.
http://www.google.com/patents/... [google.com]
As an employee of the University of California a professor is *required* to report any discovery or method that *might* be patentable to the University.
The University takes it from there, it has an office that researches viability, handles the process and then licenses the patents to "industry". With respect to licensing small local companies are given a better deal than larger internationals. As for the licensing fees collected, 50% goes to the University, 25% to the department (UC Irvine's Computer Science department in this case) and 25% to the employee(s).
At least that is how it was a few years ago when I was a grad student at UC.
So we're stuck with the source then? (Score:4, Insightful)
So we should use something like ABS with that randomisation enabled? Or should we trust to download distinct blobs for every download? For the latter, nice try NSA, but I don't want you to be able to incorporate spyware into my download and not be noticed.
It's already a pity software gets signed only by so few entities (usually one at a time, at least for deb). Perhaps I know that the blob came from Debian, but I can't verify whether it is the version the public gets, or the special version with some ... extra features. The blobs should be signed by more entities, so then all would have to be NSLed.
Re: (Score:2)
So we should use something like ABS with that randomisation enabled? Or should we trust to download distinct blobs for every download? For the latter, nice try NSA, but I don't want you to be able to incorporate spyware into my download and not be noticed.
It's already a pity software gets signed only by so few entities (usually one at a time, at least for deb). Perhaps I know that the blob came from Debian, but I can't verify whether it is the version the public gets, or the special version with some ... extra features. The blobs should be signed by more entities, so then all would have to be NSLed.
I wouldn't trust it either way. A randomized binary from some site would be insanely dangerous. But even a randomized binary that you compiled yourself is questionable. Who's to say your compiler isn't compromised? Without being able to compare binaries against other people's with identical checksums, you've now turned the effort to verify a file from a global one to just you. You're far more at risk.
Re: (Score:2)
But even a randomized binary that you compiled yourself is questionable. Who's to say your compiler isn't compromised? Without being able to compare binaries against other peoples with identical checksums you've now turned the effort to verify a file from a global one to just you. You're far more at risk.
Do you mean Trusting trust? You don't have to also randomize the compiler. Instead of the resulting programs, you can compare the compiler binaries, and check whether they are globally the same. There is only a small loss in security as you would need to globally ensure the compiler works right.
Copying the Bad Guys (Score:3, Informative)
Nice idea (Score:2)
by generating a unique executable for each install
... and cloning a unique customer support team for each install!
Gentoo (Score:5, Funny)
You can already do this with Gentoo, you're highly unlikely to use the same combination of compiler, kernel, assembler, libraries, use flags, compiler flags etc as anyone else...
Re: (Score:2)
Gentoo isn't about speed. It's about control and configurability.
All those packages with optional Gnome support? Turned on in every other distribution, but turned off for me.
Want to add patches to a package? Just put the patch file under /etc/portage/patches// and it gets included. I currently have 9 patches applied. I can upgrade the packages, and keep my patches as long as they continue to merge cleanly.
Re: (Score:1)
As another poster has pointed out, I give you an example of that in a real-world scenario:
For my home virtualization server, I run CentOS, throw VirtualBox and phpvirtualbox on it. However, the act of installing VB pulls in a bunch of library files that are related to managing and displaying the VMs on the server itself (particularly X, Qt? and some other stuff) that I will never use, as I am running headless VMs, and using phpvirtualbox for all my remote management.
Short of rolling my own version of Virtua
Re: (Score:2)
Machine time is cheap. What do I care that it takes a couple of hours to rebuild some binaries over night? The speed benefit, which might be minor in many cases, is real but not the biggest benefit. The biggest benefit is being able to say system-wide that I'd rather use Qt and not Gtk and have all my current and future binaries built to order.
I'm not wasting my time for a speed benefit, I am spending my machine's down time reducing my surface area and moving parts which has several benefits.
Re: (Score:1)
Do you funroll your loops and fomit pointers?
Increased resistance, just not the right kind. (Score:3)
The real solution to the problem he is trying to solve is not having a monoculture. This does nothing to solve it. If you have different code bases for operating systems, browsers, etc., the ability to infect all of them may be hampered. That's the same advantage of humans and dogs and snakes not being susceptible to the same pathogens. His form of diversity is more of an environmental one, so it's like different potatoes in a bag looking different despite the fact that they are almost certainly clones of each other. That does nothing against a blight.
Re: (Score:2)
It blocks ROP. So it is an effective way of preventing a primary attack vector.
It's not a defense against resident malware.
Trojans are already doing live randomization. But ROP attacks like predictable software so the attack can be developed offline.
Re: (Score:2)
You are missing why it's a boon to trojans. I can confirm that my software is legit by using a hash. If it doesn't match the hash, I know it's likely a trojan.
Re: (Score:2)
I saw Prof. Franz give his talk last year and got a few minutes to pick his brain on this. The details were quite clear. Given the audience he wasn't holding back on details. The delivered software is unchanged. You can randomize at install time or (maybe) at load time. So your hashes are fine. Your local file integrity is a local problem.
The shortcoming that I see is shared libraries. Shared libraries are evil from a security context and in the current invocation they don't get randomized (because they are
Re: (Score:2)
I've heard that said on multiple occasions, but I haven't seen much to back it up. I suspect that even if there are theoretical advantages, in practice, it's worse security. Out of date software remains one of, if not the, biggest sources of vulnerabilities. If multiple instances of the same library need to be updated, the likelihood that at least one of them will go unupdated is a great
Trusting trust (Score:4, Informative)
The problem with any nondeterministic compiler is that it prevents use of diverse double-compiling [dwheeler.com], a method to detect the sort of compiler backdoor described by Ken Thompson in "Reflections on Trusting Trust" [bell-labs.com]. You'd have to bootstrap the compiler with nondeterminism turned off (and with GUIDs, timestamps, and multithreaded allocation of symbols for anonymous objects turned off too) in order for the DDC bootstrap construction to converge.
In any case, I've implemented a technique like this on the Nintendo Entertainment System. I wrote a preprocessor that shuffles the order of functions in the file, the order of opcodes within a function that don't depend on each other's results, and the order of global variables (or the order of fields in an object). One reason I implemented it was to use one variable as another's canary [wikipedia.org] to make buffer overflows easier to detect in an assembly language program. The other is watermarking the binary [nesdev.com] so that I can tell who leaked a particular copy of the beta version to the public. If you're interested, you can find my shuffle tool in the source code of Concentration Room [pineight.com].
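As a rough illustration of the watermarking use (not the actual Concentration Room tool, which I haven't reproduced here), one way to do it is to derive the shuffle seed from each beta tester's name and later check which tester's expected layout matches a leaked copy. The tester names, symbol names, and both helper functions below are hypothetical.

```python
import hashlib
import random

# Hypothetical global variables whose order in memory is free to vary.
SYMBOLS = ["score", "lives", "cursor_x", "cursor_y", "frame_count",
           "rng_state", "music_ptr", "pad_state", "level", "timer"]
TESTERS = ["alice", "bob", "carol"]

def layout_for(tester, symbols):
    """Shuffle symbol order using a seed derived from the tester's name."""
    digest = hashlib.sha256(tester.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    order = list(symbols)
    random.Random(seed).shuffle(order)
    return order

def identify_leak(leaked_order, testers, symbols):
    """Return every tester whose expected layout matches the leaked one."""
    return [t for t in testers if layout_for(t, symbols) == leaked_order]

leaked = layout_for("bob", SYMBOLS)   # pretend this was recovered from a leaked binary
print(identify_leak(leaked, TESTERS, SYMBOLS))
```

A real tool would have to recover the variable order from the binary itself; the sketch skips that step and compares layouts directly.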
Re: (Score:2)
Anti Cheat Maybe (Score:3)
ASLR (Score:3)
Re: (Score:2)
If you think a bit further... An operating system could load an executable at a different address [wikipedia.org] every time it is used, without recompilation!
The problem with ASLR is that it involves Position Independent Code [wikipedia.org]. The absolute addresses may change, but functions are called by their relative addresses to each other. When you know where one function is, you know where all the others are as well. A mild example of this new randomization technique is to randomize the file order being fed into the linker. Different file order means different function layout. Then even if you know where one function is you don't know where all the others are without loo
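The difference between sliding the whole image and shuffling link order can be shown with a toy model. This is only a sketch: the object file names and sizes are invented, and `link`, `with_aslr`, and `distance` are hypothetical helpers, not any real linker's interface.

```python
import random

# Hypothetical object files and the code size (in bytes) each contributes.
OBJECTS = [("main.o", 0x240), ("net.o", 0x1A0), ("parse.o", 0x310),
           ("crypto.o", 0x580), ("log.o", 0x0C0)]

def link(objects, seed=None):
    """Place objects back to back; with a seed, shuffle the link order first."""
    order = list(objects)
    if seed is not None:
        random.Random(seed).shuffle(order)
    addresses, cursor = {}, 0
    for name, size in order:
        addresses[name] = cursor
        cursor += size
    return addresses

def with_aslr(addresses, base):
    """Model ASLR as adding one load-time base to every address."""
    return {name: base + offset for name, offset in addresses.items()}

def distance(addresses, a, b):
    """Relative offset from a to b, which is what an attacker relies on."""
    return addresses[b] - addresses[a]

canonical = link(OBJECTS)
slid = with_aslr(canonical, 0x7F0000000000)
# ASLR moves everything together, so relative distances survive.
print(distance(canonical, "main.o", "crypto.o") == distance(slid, "main.o", "crypto.o"))
# A shuffled link order can change the distances themselves.
print(distance(link(OBJECTS, seed=1), "main.o", "crypto.o"))
```

In this model, leaking one address under plain ASLR reveals every other address, while under a shuffled link order it reveals only that one.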
The beef (Score:3)
Re: (Score:1)
Exactly.
Plus the advantage that generalized local compilation is good for avoiding backdoors: you can compare your source with an audited one, which is not so easy for a bloody binary...
I'm a step ahead (Score:2)
I swapped all the data bits around on my motherboard!
Hahaha!
Good luck!
Oh wait...
Overengineered for its eventual use.. (Score:3)
Why bother with this at the compiler level?
Just find 10,000 instruction pairs that can be reordered as they have no interdependencies, and reorder each of the pairs at random during the install phase. That gives you 2^10,000 unique executables, but all the debugging symbols and so on will remain the same.
I guess that doesn't help you against stack-smashing and so on. But will allow you to fingerprint who leaked your binary onto bittorrent - which would be its eventual use.
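The arithmetic behind the 2^10,000 figure is one bit per pair: swapped or not. A small sketch of using those bits to carry an ID, with 8 pairs instead of 10,000; the instruction strings and register names are placeholders, and it assumes the two instructions in each pair really are independent and distinct.

```python
# Hypothetical independent instruction pairs (placeholder register names).
PAIRS = [(f"mov r{2 * i}, {i}", f"mov r{2 * i + 1}, {i}") for i in range(8)]

def fingerprint(pairs, customer_id):
    """Swap pair i exactly when bit i of customer_id is set."""
    out = []
    for i, (first, second) in enumerate(pairs):
        if (customer_id >> i) & 1:
            out.append((second, first))
        else:
            out.append((first, second))
    return out

def recover_id(canonical_pairs, leaked_pairs):
    """Read the ID back by checking which pairs differ from the canonical order."""
    customer_id = 0
    for i, (canon, leaked) in enumerate(zip(canonical_pairs, leaked_pairs)):
        if leaked != canon:
            customer_id |= 1 << i
    return customer_id

copy_for_customer = fingerprint(PAIRS, 0b10110010)
print(recover_id(PAIRS, copy_for_customer) == 0b10110010)  # True
```

With n pairs this distinguishes 2^n copies, so 8 pairs give 256 IDs. It does nothing against someone who diffs two copies and normalizes the pairs before leaking.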
Re: (Score:2)
That's a nice idea, but it won't work everywhere.
In x86, for instance, the majority of instructions affect global flag registers. You can have two instructions that operate on entirely different memory locations and GP registers, but when you swap them the flags will end up set differently.
You'll find very few instruction pairs that you can do this to without some ability to perform local analysis of the code.
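The "local analysis" can be as simple as comparing what each instruction reads and writes, with the flags register counted as one more thing written. A conservative sketch follows; the `Insn` type and the read/write sets are filled in by hand for illustration, and it ignores memory aliasing entirely.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insn:
    text: str
    reads: frozenset
    writes: frozenset

def insn(text, reads=(), writes=()):
    """Convenience constructor for the hand-written examples below."""
    return Insn(text, frozenset(reads), frozenset(writes))

def can_swap(a, b):
    """Adjacent instructions commute if neither writes what the other reads or writes."""
    write_conflict = a.writes & (b.reads | b.writes)   # a's result seen or clobbered by b
    read_conflict = b.writes & a.reads                 # b clobbers something a needs
    return not write_conflict and not read_conflict

# On x86, ADD updates the flags while MOV leaves them alone.
add_eax = insn("add eax, 1", reads={"eax"}, writes={"eax", "flags"})
add_ebx = insn("add ebx, 1", reads={"ebx"}, writes={"ebx", "flags"})
mov_ecx = insn("mov ecx, 5", writes={"ecx"})
mov_edx = insn("mov edx, 6", writes={"edx"})

print(can_swap(add_eax, add_ebx))  # False: both write the flags
print(can_swap(mov_ecx, mov_edx))  # True: nothing shared
```

This is stricter than necessary. Two flag-writing instructions can still be swapped when nothing reads the flags before they are overwritten again, but proving that needs a look at the following code, which is the parent's point.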
Re: (Score:3)
It isn't that hard.... there is plenty of low-hanging fruit - the classic easy case is the NOPs that are used to align jump destinations. Just find:
[NON PC RELATIVE INSTRUCTION]
NOP
NOP
and replace it with
NOP
[NON PC RELATIVE INSTRUCTION]
NOP
You could even patch the PC relative offset if you wanted to...
Re: (Score:1)
OS X applications are only sandboxed if the developer chooses them to be, such as by wanting them to be in the Mac App Store.
Explain Like I'm Five (Score:5, Insightful)
The problem with this in "Explain like I'm Five" terms:
You can have no idea what the program you are running does.
You cannot trust it. You cannot know it hasn't been tampered with. You cannot know a given copy works the same as another copy. You cannot know your executable has no back doors.
On the security minded front we have a trend towards striving for deterministic build capability; so that we have some confidence and method of validating that a source code to executable transformation hasn't been tampered with, that the binaries you just downloaded were actually generated from the source code in a verifiable way.
Another technique I'm seeing in security-conscious areas is executable whitelisting, where IT hashes and whitelists executables, and stuff not on the whitelist is flagged and/or rejected.
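Hash whitelisting in miniature, to show why per-install binaries collide with it. This is a sketch only: the byte strings stand in for executables and the `Whitelist` class is invented for the example, not any product's API.

```python
import hashlib

def sha256_of(blob: bytes) -> str:
    """Hex SHA-256 digest of a binary's contents."""
    return hashlib.sha256(blob).hexdigest()

class Whitelist:
    """Approve executables strictly by content hash."""
    def __init__(self, approved_blobs):
        self.approved = {sha256_of(blob) for blob in approved_blobs}

    def allows(self, blob: bytes) -> bool:
        return sha256_of(blob) in self.approved

# Stand-ins for a vendor binary and a per-install variant of the same program
# whose functions were emitted in a different order.
OFFICIAL = b"\x55\x48\x89\xe5" + b"func_a" + b"func_b"
RANDOMIZED = b"\x55\x48\x89\xe5" + b"func_b" + b"func_a"

whitelist = Whitelist([OFFICIAL])
print(whitelist.allows(OFFICIAL))    # True
print(whitelist.allows(RANDOMIZED))  # False: same program, different bytes
```

If the randomization happens at install or load time from an unchanged delivered file, as described elsewhere in the thread, the hash of the delivered file can still be checked before it is transformed.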
Now this guy comes along and runs headlong in the other direction suggesting every executable should be different. And I'm not sure I see any real benefit, nevermind a benefit that offsets the losses outlined above.
Re: (Score:1)
It's simple. You use signed source code instead of signed binaries.
Then you use a compiler and linker that does some simple things like randomly ordering variables and functions in the executable and on the stack. That makes it impossible for an attacker to know where some key variable is and exploit it through an overflow (whether on the stack or elsewhere). The attacker is far more likely to crash your program than to exploit a bug, which is much easier to recover from.
Also, as pointed out elsewhere, wh
Re:Explain Like I'm Five (Score:5, Interesting)
It's simple. You use signed source code instead of signed binaries.
That doesn't really help.
If every executable is different, then I have no information about the binaries i downloaded. I have to download the source, verify that its the 'audited trusted source' by checking its hash and signatures, and then I have to compile it myself. Most people don't want to compile all their own code.
It is good enough that OpenBSD released the source code, trusted auditing group audited the source code, and trusted build validation group verifies that the binaries on the OpenBSD site were generated from the audited source. I can just download the binaries check the hash/signatures and I'm good to go.
And in the case of a corporate IT department, you use the randomizing compiler to build the binary that you push out to your clients. It may be the same throughout your company, but it will be different from anything anyone outside would have access to, which is probably good enough.
The technique can be expanded to the home market; whereby joe-sixpack is running executable whitelist-reputation subscription software that will flag anything on his system that isn't "known good". Antivirus software is starting to head in this direction -- where it maintains databases of 'known good' executables; you've probably even seen them say "this executable is not known... submit it for analysis" -- take that system to its logical conclusion; and we could see community sites maintain executable whitelists that are as effective as adware blockers. (And they'd have no qualms about flagging "technically not illegal malware but nobody actually wants to run this shit", e.g. toolbar search redirections through popup advertising portals that the AV guys are currently too scared to just block outright.)
Community managed executable whitelists with operating system level enforcement support could potentially make a serious dent in malware on the average uninformed user's computer. It would help close a lot of attack vectors. More effective, I think, than 'randomizing' variable layout in the compiled executable.
Also re:
Then you use a compiler and linker that does some simple things like randomly ordering variables and functions in the executable and on the stack.
Stronger ASLR and DEP type features in the OS to do executable layout randomization at runtime I think represents a better approach to this than randomization at compile time.
This isn't new (Score:1)
Er... no. (Score:1)
I worked in this field a good many years ago, and I remember how we hoped that new Windows environments would suppress the prevalence of viral executables.
Then Macro Viruses turned up.
Now, Macro Viruses work at a higher level than machine code. They will therefore work on ANY machine that recognises, for instance, the WORD macro language - a mainframe, if WORD was ported to it. And you can't change macro languages - they are standardised.
I've seen many academics propose the 'answer' to viruses, and watched
Making safe code un-recognizable. (Score:1)
The anti-virus product makers are really going to hate this.
Or deal with pointer arithmetic properly (Score:1)
Re: (Score:2)
Re: (Score:2)
Already done... (Score:2)
This is what polymorphic software does, and I think you'll find it on pretty much every computer that's part of a botnet.
By this measure, botnet software should be really difficult to detect and compromise -- and yet it isn't.
Also, it's worth noting that while government-sponsored and targeted attacks would be more difficult using this method, most malware depends on whatever the current security flaws are and/or human failure to initially get its foot in the door.
And the logic path wouldn't be changing, ev
It won't help (enough) (Score:2)
Viruses in nature mutate randomly. Computer viruses don't.
Computer virus designers are intelligent, hostile, and evil in intent.
If there's a way around it, they'll find it and it's game over.
Besides, many if not most attack vectors wouldn't care a whit - tricking a user into executing code would still work, SQL injection, cross site scripting...
Re: (Score:2)
Yes, and virus designers already use this technique to defeat signature scanners in AV programs.
At a different level (Score:2)
This seems to me the wrong level for software diversity, too low. A bug in the source will be executed in all variants (think sql injection), while an exploit that depends on particular bytes in particular locations can already be made difficult by ASLR.
What about having higher level protocols that the software of a given category must adhere to, and various programs that treat data according to those protocols? You know, like that internet thing before the prevalence of web2.0 megasites, or like posix. The
This would so piss off law enforcement (Score:1)
Re: (Score:2)
Sounds like a plus to me.
Genetically diverse host population .. (Score:1)
What a good idea, isn't this what they did with the Space Shuttle?
Just scramble the Microcode .. (Score:1)
dont do it (Score:1)
As a professional software tester let me be the first to say noooooooooooo !
Are you going to trust a 99% solution? (Score:2)
This doesn't fix the problem. It makes the chances of exploitation a bit smaller, on a "per-try" basis.
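To put the "per-try" point in numbers, assume (and this is an assumption, not something from the article) that each attempt succeeds independently with the same small probability. Then the chance of at least one success over many attempts is 1 - (1 - p)^n:

```python
def success_probability(per_try: float, tries: int) -> float:
    """Chance of at least one success in `tries` independent attempts."""
    return 1.0 - (1.0 - per_try) ** tries

# A defence that stops 99% of individual attempts leaves per_try = 0.01.
for tries in (1, 10, 100, 500):
    print(tries, round(success_probability(0.01, tries), 4))
```

With a 1% per-try chance, the cumulative chance passes 99% somewhere before 500 attempts. Whether real attempts are independent depends on the attack; a crash that gets noticed after the first failure changes the picture.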
Back in the old days, some daemons or setuid programs would do insecure things with /tmp. So the hacker would make a program:
target = "/tmp/somefile";
while (1) {
    unlink (target);
    link ("/etc/passwd", target);
    unlink (target);
    link ("/tmp/myfile", target);
}
The daemon would check access permissions of the "target", hopefully
malware with randomisation (Score:2)
huh. this sounds very similar to the theoretical virus designs i came up with many years ago. yes, you heard right: turn it round. instead of the programs on the computer being randomised so that they are resistant to malware attacks, randomise the *malware* so that it is resistant to *anti-virus* detection. the model is basically the flu or common cold virus.
here's where it gets interesting: comparing the use of randomisation in malware vs randomisation in defense against malware, it's probably going t
Re: (Score:1)
That's not theoretical at all. You're over 20 years late to the party. It's called polymorphism or metamorphism (depending on whether it changes individual instructions for similar ones, or actually self-modifies its code).
The idea was first predicted by the computer scientist Fred Cohen. The Slovenian VXer Lucky Lady demonstrated it in 1988 on the Atari ST, and around about the same time, Mark Washburn with V2PX/1260 on the PC, a Vienna modification; more practically, the first widely released version of s
IT Crowd (Score:2)
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
Instructions unclear, anus stuck in ceiling fan.