Forgot your password?
typodupeerror
Software Security

Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software 245

Posted by samzenpus
from the looking-for-edges dept.
another random user writes "The claim here is that the encrypted software can be executed, but not reverse-engineered. To quote from the article: 'UCLA computer science professor Amit Sahai and a team of researchers have designed a system to encrypt software so that it only allows someone to use a program as intended while preventing any deciphering of the code behind it. According to Sahai, previously developed techniques for obfuscation presented only a "speed bump," forcing an attacker to spend some effort, perhaps a few days, trying to reverse-engineer the software. The new system, he said, puts up an "iron wall," making it impossible for an adversary to reverse-engineer the software without solving mathematical problems that take hundreds of years to work out on today's computers — a game-change in the field of cryptography.'"
This discussion has been archived. No new comments can be posted.

Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software

Comments Filter:
  • I Call BS (Score:5, Insightful)

    by MightyMartian (840721) on Thursday August 01, 2013 @12:36AM (#44443399) Journal

    I'm sure they can further obfuscate the actual code, but at the end of the day the processor is going to have to run machine code, and one way or the other you can tap the processor's activity to read the "decrypted" code. Beyond that, I imagine the performance penalties involved will be monstrous. Even normal obfuscation techniques have pretty heavy penalties.

    • by shentino (1139071)

      TPM could put a stop to that by requiring auditing of voltages.

      MS already requires this of PVP drivers for vista and will revoke them if you allow copyrighted content to leak.

    • Re:I Call BS (Score:5, Informative)

      by phantomfive (622387) on Thursday August 01, 2013 @01:10AM (#44443599) Journal
      The title is wrong. It's not talking about encrypting the software, it's talking about obfuscating the software. They put the compiled code through a function that obfuscates it thoroughly. Their function is complex enough that de-obfuscating the code (that is, returning it to the original form) would be computationally expensive. The paper also talks about encryption, but that is a different part of the paper.

      Which isn't to say you can't look at some variable in RAM, realize it is the boolean for 'authorized,' and then overwrite it. So it's essentially a complex obfuscation technique, which may or may not be useful. (That is my understanding of the article and abstract, if I am wrong please correct me).
      • Re:I Call BS (Score:5, Informative)

        by msobkow (48369) on Thursday August 01, 2013 @06:51AM (#44444835) Homepage Journal

        I know a fellow who worked on a system that would take IBM 360 machine code, convert it into graph format, and re-engineer it as C/C++ code. They made a killing on Y2K, because so many companies didn't have the source code for some of their software any more.

        So yes, I call BS as well. Computationally complex, yes, as graph theory often is. But far from impossible to reverse engineer.

        If it can execute, it can be reverse engineered. Only the most obtuse and purposefully warped hand-written assembler can even approach the ability to hide an algorithm from deep inspection, and even that can be largely overcome by applying graph theory to restructure the code graphs.

        • by msobkow (48369)

          You can contact professor Thomas Dean at Queen's University in Kingston for details of how such graph theory applies -- he was involved with the development of this reverse engineering tool.

      • Yeah, but the real problem I see with this method is that once you found an entry point and a set of conditions to one specific "tile of the jigsaw" you might be able to build up an entire tree of solutions quite quickly and solve it. It obviously requires some intelligence, hundreds of years is a brute force, but nobody said you can't use a smart attack vector on this one either. And the moment you have a point you can latch onto most obfuscation techniques tend to fail quite quickly. At least if they do n
    • by tlhIngan (30335)

      I'm sure they can further obfuscate the actual code, but at the end of the day the processor is going to have to run machine code, and one way or the other you can tap the processor's activity to read the "decrypted" code. Beyond that, I imagine the performance penalties involved will be monstrous. Even normal obfuscation techniques have pretty heavy penalties.

      Not really. I've seen memory encryption units that ensure that all data hitting memory is encrypted, and it's possible to have the startup code also

    • Yeah I agree, there is something fundamentally wrong with the claims being made. If I have byte code, I can rebuild loops and conditionals with a decompiler. Sure, I don't have comments or var names, but those can be guessed or worked out in something less than 'several hundred years'.

      Suppose you build something that throws in a lot of crazy random jmp calls to make this harder, and I cant be bothered to re-construct a program I want to steal. At some point a single Boolean decision hits the call stack
      • by superwiz (655733)
        You are missing the point of code being data. What it means in this context is that there is a duality between functions and the data they operate on. In other words, you can think of data as operating on the code instead of code operating on data. So your data becomes your maps (in the mathematical sense of the word map) and your functions become the mapped objects.
      • by greg1104 (461138)

        You're focused on a crack of one function in this sort of program. Given enough time, it is always possible to map all the inputs and outputs of a function, and therefore replace it with another that does the same thing--but with a change like a crack installed. The question, though, is how much time will it take?

        There's not an easy to spot boolean on the line here; we're past when idiots built these things now. Let's say the output from the DRM checker is a 1024 bit key that unlocks the next function in

    • "at the end of the day the processor is going to have to run machine code, and one way or the other you can tap the processor's activity to read the "decrypted" code"

      Sure, and one way to reconstruct the program is to provide it every possible input and map the outputs. For that matter, one way to reconstruct the program is simply to load it up, see what it does, and code your own version that does the same thing.

      But the question is how deeply you can inspect the algorithms based on what you see happening i

    • by jhol13 (1087781)

      I do not think this the aim. The aim is to hide the "high level" algorith so that you cannot find it out. The obfuscation seems to be done by transforming the source, not binary.

      This way if you have some "interesting" high level algorithm (to solve some real life problem) you can make reverse-engineering it not only a tad hard, but extremely difficult. For example, suppose I can solve NP in polynomial time. You can see from the virtual machine traces how one case of the problem is solved, but you cannot get

  • by Dynedain (141758) <slashdot2NO@SPAManthonymclin.com> on Thursday August 01, 2013 @12:37AM (#44443405) Homepage

    Deciphering/Decrypting is not the same thing as Reverse Engineering.

    Reverse Engineering is duplicating the functionality without seeing the source code. That should still be possible if you have the ability to run the program.

    • by wierd_w (1375923) on Thursday August 01, 2013 @12:52AM (#44443501)

      One way around this (for reverse engineering) would simply be to run it inside a VM with a built in stack trace debugger, like Bochs.

      You can peek at the raw instructions percolating through the VM's emulated CPU that way. The application itself is then the decryption key, since the system has to be able to run the application.

      PITA, but I don't see how this is anything more than just a "bigger" speedbump, at least as far as disassembly goes. Now, making said decryption require nasty bits of hardware to thwart using a VM, and clinging tightly to treacherous computing with a TPM and a hardware hypervisor on top of this? That's more dicey.

      The question here is... why do this? Is preventing Cracker Jack from the warez scene from simply disabling your "authenticity checks" so horrifically important? That's the major application for this that I really see, besides making epically horrific malware. (Though if you ask me, they are both the same thing.)

      Seriously, other than making personal computing become something from communist russia, what is the benefit of this?

      • Yes, the only way this works is with hardware to back it up. But if we're back to effectively using some sort of dongle to execute, well that road has been well-traveled as well.

        • by wierd_w (1375923)

          I was thinking more along the lines of "special processor instructions", that make use of quirks of real silicon. Given how intel and amd both have taken to cramming lots of extra devices onto the die, a small "inside the CPU" black box designed for this very application would do the trick just fine, and would likewise ensure its presence on modern rigs.

          A virtual machine halting execution would be detected by the running software, because it wouldn't get the proper responses from said black box. You'd have

        • Hardware that can't be emulated you mean.

          Even high grade encryption hardware that refuses to allow debugging can be emulated, if slowly, once you have keys.

      • Seriously, other than making personal computing become something from communist russia, what is the benefit of this?

        Wait, isn't that considered to be feature enough?

      • by bondsbw (888959)

        Seriously, other than making personal computing become something from communist russia, what is the benefit of this?

        Security. One major benefit of obfuscation is making it much more difficult to find local data store encryption keys, service endpoints, etc. It makes it harder to find bugs/exploits such as SQL injection.

        Remember that not all attacks are aimed at the software in general. Many, many attacks on medical/banking/government systems are aimed at finding specific data on specific computers, and the attacker isn't running it on a VM. These attacks rely on perhaps a trojan or backdoor. The harder it is to buil

      • You don't need to emulate the hardware to see a program's output, you can just look at your screen. There is little point in hiding the output of a program from the user who's running it so I don't see that as the point (if one cannot observe the output of a program, it is of little utility).

        Which instructions a given program executes depends on the inputs to said program. For any given input, most programs only execute a tiny portion of their code. Therefore, in order to completely reverse engineer a pr

        • by wierd_w (1375923)

          Agreed, for each nested brach, complexity increases at LEAST geometrically. (A branch always has at least 2 paths.) However, one may not NEED to know *all* branches.

          Say for instance, the behavior you want to modify (we will assume you are a cracker making a memory poking drm defeat patch) is rigidly defined in where it gets called from (by this, I mean what code branches spawn the check). You don't really need to look at the other execution paths, just the ones that trigger the drm check, or better still, t

  • by Anonymous Coward

    Clearly the journalist had no idea what they were writing about.

    • by MrEricSir (398214)

      You mean a link to the paper [iacr.org], like the one in the third paragraph of the article?

      • When you get to that, then it's the GP turn to have no idea what the article is writing about.

        I took a glance at it. Probably don't really understand it unless I spend a week or two seriously reading it thoroughly and its cited articles...

  • by technosean (169328) on Thursday August 01, 2013 @12:38AM (#44443417)

    Who gains from this? Most of all? Virus writers.

    Perhaps the real solution is to have the OS never execute code looking anything like this.

    • Heh. No fucking shit. It would be a malware writer's holy grail; software that has no identifiable signature.

      I still think it's BS, at least on any kind of processor or operating system we're running now.

      • by ls671 (1122017)

        Heh. No fucking shit. It would be a malware writer's holy grail; software that has no identifiable signature.

        I don't think anti-virus software needs to understand, reverse-engineer or decipher a virus program in order to identify its signature. That's not how it works. That's why there is false positives with programs that do not do what the anti-virus software thinks it does.

        • As long as the code is always identical, you're right. It is likely, though, that this machine is able to produce vastly different results from minuscle changes in code, resulting in a very easy way to generate an arbitrary amount of copies of your virus code that doesn't look anything like the others to a scanner.

    • by c0lo (1497653) on Thursday August 01, 2013 @02:59AM (#44444021)
      Sell a program protected like this in Europe [europa.eu] and you may end paying hundreds of millions [forbes.com]:

      (14) A person having a right to use a computer program should not be prevented from performing acts necessary to observe, study or test the functioning of the program, provided that those acts do not infringe the copyright in the program.

      (15) [...]Nevertheless, circumstances may exist when such a reproduction of the code and translation of its form are indispensable to obtain the necessary information to achieve the interoperability of an independently created program with other programs.
      It has therefore to be considered that, in these limited circumstances only, performance of the acts of reproduction and translation by or on behalf of a person having a right to use a copy of the program is legitimate and compatible with fair practice and must therefore be deemed not to require the authorisation of the rightholder. An objective of this exception is to make it possible to connect all components of a computer system, including those of different manufacturers, so that they can work together. [...].

  • make a crackme (Score:3, Insightful)

    by ZeroNullVoid (886675) on Thursday August 01, 2013 @12:41AM (#44443433)

    If they really think it is so good, then they should put their money where their mouth is.
    Make it into a crackme, issue a large award for solving it.
    Post it online. I give it a few weeks max, if that.
    And who is to say it can't still be manipulated once running?
    Think of the performance cost.

    Either way, I have no faith in an article with little details.

  • and programs will continue to run like it's 1987

  • Or is there an original unscrambled source code retained somewhere?
  • by Sean (422) on Thursday August 01, 2013 @12:53AM (#44443503)

    Call the kernel to access files, sockets, etc.

    Also unless the developer is super 31337 and likes to write everything I expect shares library calls too.

    By watching calls to those interfaces we can figure out what it does.

    • Even if all the libraries are statically compiled, it's still going to be making system calls. Yes, it's behavior could be very hard to determine, but unless it's a standalone operating system that runs entirely on its own, it's going to be traceable. Even then, one could still run it in a VM. Perhaps harder to get useful information out of, but still...

    • by TheLink (130905)
      It could be a custom/modified cryptographic hash used to verify and create signatures. In which case it won't make that many syscalls.

      Reverse engineering something like that will be hard if its obfuscated.

      If you can't reverse engineer it you can't make a compatible version without breaking copyright by copying the hash module as is.
  • The original paper (Score:5, Informative)

    by HatofPig (904660) <<moc.liamg> <ta> <keegehtnotnilc>> on Thursday August 01, 2013 @12:56AM (#44443517) Homepage
    The original paper is available here [iacr.org].

    Cryptology ePrint Archive: Report 2013/451

    Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits

    Sanjam Garg and Craig Gentry and Shai Halevi and Mariana Raykova and Amit Sahai and Brent Waters

    Abstract: In this work, we study indistinguishability obfuscation and functional encryption for general circuits:

    Indistinguishability obfuscation requires that given any two equivalent circuits C_0 and C_1 of similar size, the obfuscations of C_0 and C_1 should be computationally indistinguishable.

    In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually.

    We give constructions for indistinguishability obfuscation and functional encryption that supports all polynomial-size circuits. We accomplish this goal in three steps:

    - We describe a candidate construction for indistinguishability obfuscation for NC1 circuits. The security of this construction is based on a new algebraic hardness assumption. The candidate and assumption use a simplified variant of multilinear maps, which we call Multilinear Jigsaw Puzzles.

    - We show how to use indistinguishability obfuscation for NC1 together with Fully Homomorphic Encryption (with decryption in NC1) to achieve indistinguishability obfuscation for all circuits.

    - Finally, we show how to use indistinguishability obfuscation for circuits, public-key encryption, and non-interactive zero knowledge to achieve functional encryption for all circuits. The functional encryption scheme we construct also enjoys succinct ciphertexts, which enables several other applications.

    Category / Keywords: public-key cryptography / Obfuscation, Functional Encryption, Multilinear Maps

    Date: received 20 Jul 2013, last revised 21 Jul 2013

    Contact author: amitsahai at gmail com

    Available format(s): PDF [iacr.org] | BibTeX Citation [iacr.org]

  • I just scanned the original paper, reserving its detailed lecture for another moment. But it is one of those things that make me think: "Damn ! Wish I had thought of this myself..."
  • encryption used for DVDs was believed to be unbreakable, until it was broken. How many companies have released encryption schemes, for DRM or otherwise and it gets cracked within hours of actual release. more than once Microsoft encryption has been cracked before its official release.

    Though I haven't studied this particular one, my general impression is that it was not designed by cryptography experts and then vetted the way well known cryptographic algorithms are vetted by other experts.
    • by Meneth (872868)
      Aye, it's easy to construct an encryption scheme that you yourself can't break.
    • Actually the encryption itself on these schemes was not broken. Rather, player emulators became good enough that the industry could not revoke hacked players fast enough to keep up.

  • It's called Forth [wikipedia.org]

  • by Animats (122034) on Thursday August 01, 2013 @02:22AM (#44443915) Homepage

    This is fascinating, but hard to understand. It's not clear how broad a result this is.

    This seems to apply to "circuits", which are loop-free arrangements of AND, OR and NOT gates. These take in some bit string with a fixed number of bits and emit another bit string with a fixed (but possibly different) number of bits. Many computations can be encoded into this form, but the "circuits" are typically very deep, since this is sort of like loop unwinding. Although this seems kind of limited, you could, for example, make a reasonably sized "circuit" to compute DES, which is a static pattern of Boolean operations.

    "Obfuscation" here means taking in some "circuit" and producing a bit for bit equivalent "circuit" from which little information can be extracted. The obfuscated circuit may be (will be?) more complex than the original, because additional logic has been inserted in a somewhat random way.

    The contribution in this paper seems to be that this might be useful for "functional encryption". This is where you have some values, such as A and B, which are encrypted and are to be protected, but the result of some function f(A,B) is not protected. The idea is to have an implementation of f which combines the decryption and the desired function, an implementation which cannot be reverse engineered to return decrypted versions of A or B. While this has been suggested as a possible feature for databases, it's hard to apply to useful functions, and so far, it's mostly a research idea there. It's been suggested for some complex access control schemes involving mutual mistrust, but that too is a research topic.

    Anyway, this doesn't mean you're going to be able to run your big executable program through some kind of magic obfuscator that will prevent someone from figuring out how it works. Not yet, anyway.

  • by Badge 17 (613974) on Thursday August 01, 2013 @02:27AM (#44443929)

    I can't really comment on the slashdot summary, but take a look at the actual abstract: http://eprint.iacr.org/2013/451 [iacr.org]

    "In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually."

    In other words, it seems that their technique allows you to encrypt some secret data, then (provably) only release the result of some arbitrary function of that data. It sounds like this means you could (in principle) release health data in encrypted form, then allow researchers to study some ensemble properties of it by giving out the appropriate keys. This aspect of it certainly seems pretty cool.

  • But UCLA doesn't get to claim the credit. MIT was first to present homomorphic encryption: http://web.mit.edu/newsoffice/2013/algorithm-solves-homomorphic-encryption-problem-0610.html [mit.edu]

    Roughly speaking, homomorphism is a map which preservers the properties pertinent to a category. Now think of data as acting on code instead of code acting on data. Since "acting" is a mapping, acting in a homomorphic way would produce program results which are equivalent but without the decrypting step.

  • by Opportunist (166417) on Thursday August 01, 2013 @03:20AM (#44444075)

    Remember, kids, everything that can be applied for good can be applied for ill. And code that is impossible to decipher and analyze is the holy grail of malware.

  • by Kazoo the Clown (644526) on Thursday August 01, 2013 @04:09AM (#44444287)
    It sounds like what they are trying to do here is to refactor the algorythm such that f(x)=y produces the same answer but it's not practical to modify because the new function is mathematically scrambled-- the effect of each component of the code is so obscured that it's not so easy to tell how it contributes to the results. It's like using a neural net to implement your solution, the contribution of each neural node to the overall result can be significantly obscure. Won't stop someone from stealing the algorythm as is, but hacking it to produce an alternate result set will be out of the question, as long as they can't just build a truth table of inputs to outputs...
  • RTFA is shallow on details, but it sounds very much like the "phantom" self-mutating MS-DOS viruses from the later 90. (The ones which forced anti-virus vendors to effectively implement a debugger/a interpreter for the code to detect the malicious code not by the look but by the actual work it does.)

    Otherwise, the main problems with such intrusive obfuscations is that code is never 100% bug free and bugs sometimes trigger undefined/unspecified behaviors. It might work before obfuscation - but would break

  • Normally obfuscation is bad in cryptography - it means that the system is theoretically broken, but that the way to break it is quite well hidden.

    This refers to cryptographically secure obfuscation. This is an entirely new field, and hasn't been possible till now. This paper doesn't prove how to do it, but proves that it is possible for a certain subset of operations.

    Basicly it boils down to the fact it is possible to make a computer program that, for a given set of inputs a) generates a set of outputs b)

10 to the 12th power microphones = 1 Megaphone

Working...