Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software 245
another random user writes "The claim here is that the encrypted software can be executed, but not reverse-engineered. To quote from the article: 'UCLA computer science professor Amit Sahai and a team of researchers have designed a system to encrypt software so that it only allows someone to use a program as intended while preventing any deciphering of the code behind it. According to Sahai, previously developed techniques for obfuscation presented only a "speed bump," forcing an attacker to spend some effort, perhaps a few days, trying to reverse-engineer the software. The new system, he said, puts up an "iron wall," making it impossible for an adversary to reverse-engineer the software without solving mathematical problems that take hundreds of years to work out on today's computers — a game-change in the field of cryptography.'"
Deciphering != Reverse Engineering (Score:4, Informative)
Deciphering/Decrypting is not the same thing as Reverse Engineering.
Reverse Engineering is duplicating the functionality without seeing the source code. That should still be possible if you have the ability to run the program.
Re:I Call BS (Score:5, Informative)
When I was writing Windows device drivers, we really "loved" stupid games that detected a debugger and refused to run.
Fortunately most developers stopped doing that once they realised what it meant for their customers: a bug fix had to wait while the customer reported the problem to us, we contacted the developers, and they sent us a special build without the misguided code that stopped us debugging the problem.
The original paper (Score:5, Informative)
Cryptology ePrint Archive: Report 2013/451
Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits
Sanjam Garg and Craig Gentry and Shai Halevi and Mariana Raykova and Amit Sahai and Brent Waters
Abstract: In this work, we study indistinguishability obfuscation and functional encryption for general circuits:
Indistinguishability obfuscation requires that given any two equivalent circuits C_0 and C_1 of similar size, the obfuscations of C_0 and C_1 should be computationally indistinguishable.
In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually.
We give constructions for indistinguishability obfuscation and functional encryption that support all polynomial-size circuits. We accomplish this goal in three steps:
- We describe a candidate construction for indistinguishability obfuscation for NC1 circuits. The security of this construction is based on a new algebraic hardness assumption. The candidate and assumption use a simplified variant of multilinear maps, which we call Multilinear Jigsaw Puzzles.
- We show how to use indistinguishability obfuscation for NC1 together with Fully Homomorphic Encryption (with decryption in NC1) to achieve indistinguishability obfuscation for all circuits.
- Finally, we show how to use indistinguishability obfuscation for circuits, public-key encryption, and non-interactive zero knowledge to achieve functional encryption for all circuits. The functional encryption scheme we construct also enjoys succinct ciphertexts, which enables several other applications.
Category / Keywords: public-key cryptography / Obfuscation, Functional Encryption, Multilinear Maps
Date: received 20 Jul 2013, last revised 21 Jul 2013
Contact author: amitsahai at gmail com
Available format(s): PDF [iacr.org] | BibTeX Citation [iacr.org]
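The functional-encryption interface the abstract describes (Dec(SK_C, Enc(x)) yields C(x) and nothing else) can be sketched as a toy Python model. This is completely insecure, just an illustration of the API shape; all names here are my own, not from the paper:

```python
# Toy, completely insecure model of the functional-encryption interface
# from the abstract: decrypting CT_x = Enc(x) with the key SK_C for a
# circuit C yields C(x). A real scheme hides x cryptographically; here
# "encryption" is just a wrapper, so only the interface is meaningful.

class ToyFE:
    def encrypt(self, x):
        return {"ct": x}                 # stand-in for CT_x = Enc(x)

    def keygen(self, circuit):
        return {"circuit": circuit}      # stand-in for SK_C

    def decrypt(self, sk, ct):
        # Yields C(x); in a real scheme this reveals nothing else about x.
        return sk["circuit"](ct["ct"])

fe = ToyFE()
ct = fe.encrypt(6)
sk = fe.keygen(lambda x: x * 7)          # key for the circuit C(x) = 7x
print(fe.decrypt(sk, ct))                # → 42
```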
Re:I Call BS (Score:5, Informative)
Which isn't to say you can't look at some variable in RAM, realize it is the boolean for 'authorized,' and then overwrite it. So it's essentially a complex obfuscation technique, which may or may not be useful. (That is my understanding of the article and abstract, if I am wrong please correct me).
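The poster's point, that a running program still exposes state you can flip, is easy to show in miniature. This sketch (entirely hypothetical names) stands in for locating an 'authorized' boolean in RAM with a debugger and overwriting it:

```python
# Miniature stand-in for patching an 'authorized' flag in a running
# process. In practice you would locate the flag in RAM with a debugger;
# here we just flip the attribute directly on a toy license check.

class LicenseCheck:
    def __init__(self):
        self.authorized = False          # the boolean an attacker hunts for

    def run_feature(self):
        return "feature unlocked" if self.authorized else "access denied"

check = LicenseCheck()
print(check.run_feature())               # access denied
check.authorized = True                  # the "memory patch"
print(check.run_feature())               # feature unlocked
```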
Re:I Call BS (Score:5, Informative)
Not exactly. Reverse engineering starts with the analysis of a running program with the goal of obtaining enough data about the locations of executable code vs data, major variable locations/identifiers etc to allow you to start running automated disassembly of the total code. All you can get out of a run-time analysis is a specific execution path, which does not embody the full code.
Now yes, "if your CPU can read it, then you can read it", but unfortunately in this situation your CPU can only read it in certain circumstances, so you'll only be able to read it in those same circumstances: execution or simulated execution, leading back to the situation where you're stuck looking at traces of specific executions...
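The parent's point, that a run-time trace only shows you one execution path, can be demonstrated with Python's own tracing hook (a toy stand-in for watching a real CPU execute):

```python
# One run of a branching program exercises only one path: a run-time
# trace shows that path and nothing else. Here sys.settrace records
# which lines actually execute for a single input.
import sys

executed = set()

def tracer(frame, event, arg):
    if event == "line":
        executed.add(frame.f_lineno)
    return tracer

def program(x):
    if x > 0:
        return "positive branch"
    else:
        return "negative branch"

sys.settrace(tracer)
result = program(5)          # exercises only the x > 0 path
sys.settrace(None)

print(result)                # positive branch
# 'executed' holds line numbers from a single path; what the other
# branch does is still unknown to the analyst.
```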
Re:I Call BS (Score:5, Informative)
You can make the algorithm as complex as you'd like, but at the end of the day, you have an input and output(s). You may decide to take a long time to get there, but at the end of the day, I know what you did when the code ran.
This is referred to as "black-box" reverse engineering in the paper. You know what the code did for one input. And if you injected every possible input into the program, at the end you'd have worked out a complete specification of the function. But how long will that take? It's not always "the end of the day". For functions with a wide range of inputs, it could take the lifetime of the universe to map them all unless you know what you're looking for in advance.
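The blowup the parent describes is concrete: recovering a function's full input/output table over n-bit inputs takes 2^n queries, doubling with every added bit. A quick sketch (the black box here is my own toy function):

```python
# Black-box mapping: recover a function's full behavior by querying
# every input. The table doubles with each input bit, so this only
# scales for tiny functions.

def black_box(x):                        # the "obfuscated" program:
    return (x * 37 + 5) % 256            # queryable, internals unseen

def map_black_box(bits):
    return {x: black_box(x) for x in range(2 ** bits)}

table = map_black_box(8)                 # 2**8 = 256 queries: feasible
print(len(table))                        # 256
# A 64-bit input would need 2**64 (about 1.8e19) queries for the same
# loop -- this is where "the end of the day" stops arriving.
print(2 ** 64)
```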
Right now, obfuscation approaches for software usually have some small choke point to attack. It might be an encryption wrapper around the main binary. Bust that with a debugger and you can get to unobfuscated code for the main system, and then really start to work your way through the program.
But if you have to fight this every step of the way, where all you do is inject inputs and get an output to figure out what the program does, it will take you forever to reverse engineer things. That's the claim of the paper. The code itself is so obfuscated that you can't read it straight and understand it, ever. It looks like random junk. All you can do is run it with an input and see what comes back, and from that reverse engineer what it does. Assemble enough of those and you can see how the program really works. But that black-box teardown process--determining possible inputs, injecting them, collecting outputs, and then deriving the function behavior--is time-intensive enough that it may not be practical to actually do anymore. Nothing you learn from reversing one component speeds up handling the next; you have to attack them all like this.
There's a great line from the seminal paper on this subject, On the (Im)possibility of Obfuscating Programs [weizmann.ac.il]: "Any programmer knows that total unintelligibility is the natural state of computer programs (and one must work hard in order to keep a program from deteriorating into this state)". Extracting meaning from source code, being able to predict what some lines of code will do, is hard. Ideally you'll just be able to read the code, make sense of it, and then reverse from there. Most systems that are thoroughly cracked have some sections where it's hard, but once you get past those the remaining code is straightforward to read. And in fact, it's impossible to make something that cannot be reversed at all. The question, though, is how long it will take.
If no sense is ever made of the code, you can only apply the "black-box" reverse engineering, where you inject inputs, watch outputs, and from there determine what the code does. You can't prevent that, but you can make the box so big that such work is impossible to do. That's what this technique tries to accomplish. You never find an easy part to the code you can read; all you ever find are ones where you have to map every input into an output to figure out what the code does.
Re:Deciphering != Reverse Engineering (Score:4, Informative)
Yes, it is robust. I read the paper a few days ago.
All these comments about how you can "just look at the CPU instructions" are made by people who haven't been following developments in the field. The program never gets decrypted into CPU instructions. Heck, it was never even compiled into CPU instructions in the first place. It gets compiled into a form of boolean circuit, a mathematical equivalent of an electronic circuit that is composed of AND, NOT, OR, XOR gates and wires between them. Then that circuit is itself again transformed into a series of matrices and at that point I hit the limit of what I could understand without needing to read some of the cited papers.
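The gate-level representation the poster describes can be illustrated with a tiny evaluator. The circuit below is my own toy example, not from the paper: it computes a XOR b from AND/OR/NOT gates, the kind of form a program gets compiled into before the further transformation into matrices:

```python
# Tiny boolean-circuit evaluator: a circuit is a list of gates, each
# combining earlier wires. This toy circuit computes a XOR b using only
# AND/OR/NOT, the gate-level form the comment describes programs being
# compiled into.

def evaluate(circuit, inputs):
    wires = dict(inputs)                     # wire name -> bit
    for out, op, args in circuit:
        vals = [wires[a] for a in args]
        if op == "AND":
            wires[out] = vals[0] & vals[1]
        elif op == "OR":
            wires[out] = vals[0] | vals[1]
        elif op == "NOT":
            wires[out] = 1 - vals[0]
    return wires

xor_circuit = [
    ("na",  "NOT", ["a"]),
    ("nb",  "NOT", ["b"]),
    ("t1",  "AND", ["a", "nb"]),             # a AND (NOT b)
    ("t2",  "AND", ["na", "b"]),             # (NOT a) AND b
    ("out", "OR",  ["t1", "t2"]),            # their OR is a XOR b
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, evaluate(xor_circuit, {"a": a, "b": b})["out"])
# prints the XOR truth table: 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```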
This is a very, very complicated technique that builds upon decades of cryptographic research. If they say it's secure in the cryptographic sense, I think it's very likely to be so.
Re:I Call BS (Score:5, Informative)
I know a fellow who worked on a system that would take IBM 360 machine code, convert it into graph format, and re-engineer it as C/C++ code. They made a killing on Y2K, because so many companies didn't have the source code for some of their software any more.
So yes, I call BS as well. Computationally complex, yes, as graph theory often is. But far from impossible to reverse engineer.
If it can execute, it can be reverse engineered. Only the most obtuse and purposefully warped hand-written assembler can even approach the ability to hide an algorithm from deep inspection, and even that can be largely overcome by applying graph theory to restructure the code graphs.
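The graph step the poster alludes to starts with lifting a linear instruction listing into basic blocks. A minimal sketch, using a made-up toy instruction format rather than real IBM 360 code:

```python
# Sketch of the first graph step in machine-code re-engineering: split a
# toy linear instruction listing into basic blocks (the nodes of a
# control-flow graph) by finding "leaders": the entry point, every jump
# target, and every instruction after a jump.

code = [
    (0, "cmp", "x, 0"),
    (1, "jle", 4),        # conditional jump: 4 and 2 become leaders
    (2, "mov", "r, 1"),
    (3, "jmp", 5),        # unconditional jump: 5 and 4 become leaders
    (4, "mov", "r, 0"),
    (5, "ret", ""),
]

def basic_blocks(code):
    leaders = {0}
    for addr, op, arg in code:
        if op in ("jmp", "jle"):
            leaders.add(arg)              # jump target starts a block
            leaders.add(addr + 1)         # fallthrough starts a block
    leaders = sorted(l for l in leaders if l < len(code))
    # Each block runs from its leader to the next leader (or the end).
    return [(l, min([x for x in leaders if x > l] + [len(code)]))
            for l in leaders]

print(basic_blocks(code))   # [(0, 2), (2, 4), (4, 5), (5, 6)]
```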