Catch up on stories from the past week (and beyond) at the Slashdot story archive

 



Forgot your password?
typodupeerror
×
Google Privacy Security IT

Google Has Demonstrated a Successful Practical Attack Against SHA-1 (googleblog.com) 143

Reader Artem Tashkinov writes: Ten years after of SHA-1 was first introduced, Google has announced the first practical technique for generating an SHA-1 collision. It required two years of research between the CWI Institute in Amsterdam and Google. As a proof of the attack, Google has released two PDF files that have identical SHA-1 hashes but different content. The amount of computations required to carry out the attack is staggering: nine quintillion (9,223,372,036,854,775,808) SHA1 computations in total which took 6,500 years of CPU computation to complete the attack first phase and 110 years of GPU computation to complete the second phase.

Google says that people should migrate to newer hashing algorithms like SHA-256 and SHA-3, however it's worth noting that there are currently no ways of finding a collision for both MD5 and SHA-1 hashes simultaneously which means that we still can use old proven hardware accelerated hash functions to be on the safe side.

This discussion has been archived. No new comments can be posted.

Google Has Demonstrated a Successful Practical Attack Against SHA-1

Comments Filter:
  • by JoshuaZ ( 1134087 ) on Thursday February 23, 2017 @01:15PM (#53918833) Homepage
    If one looks at the history of what happened the last time a major hash was broken, there was a large gap between when MD5 has its first collisions and when it became practical to detect collisions. There was about a little under a decade between when the first collisions were found and when it became easy to find collisions. The general expectation is that hash systems will fail gracefully in a similar way so we have a large amount of warning to switch over. Unfortunately, we've also seen that in practice people don't adopt new hash algorithms nearly as fast as they should. The second to last Yahoo security breach was so bad in part because the passwords were hashed with a completely unsalted MD5 https://nakedsecurity.sophos.com/2016/12/15/yahoo-breach-ive-closed-my-account-because-it-uses-md5-to-hash-my-password/ [sophos.com]. The lack of salting would have been by itself a problem even when MD5 was still considered insecure. That in 2015, a decade after MD5 was broken for almost all purposes, Yahoo was still using it, is appalling. Unfortunately, they likely aren't the only one. And I fully expect that if Slashdot is around in a decade we'll read about someone who has foolishly stored passwords using SHA-1.
    • by swillden ( 191260 ) <shawn-ds@willden.org> on Thursday February 23, 2017 @04:00PM (#53920099) Journal

      The second to last Yahoo security breach was so bad in part because the passwords were hashed with a completely unsalted MD5 https://nakedsecurity.sophos.com/2016/12/15/yahoo-breach-ive-closed-my-account-because-it-uses-md5-to-hash-my-password/ [sophos.com]. The lack of salting would have been by itself a problem even when MD5 was still considered secure.

      Actually, even with salting, no standard cryptographic hash function is appropriate for password databases. You can squeak by if you iterate the hash function enough times, but even that is pretty weak, since it means that an attacker with lots of GPUs -- or, even worse, special-purpose hardware -- can perform hashes so much faster than you can that the key stretching you obtain is minimal.

      The state of the art in password hashing is algorithms like Argon2 [password-hashing.net], with parameters that are tuned to require significant amounts of not just CPU time, but RAM and threads. Argon2, tuned to require, say, 10ms of time on four cores and 256 MiB of RAM, is going to significantly strengthen passwords. The RAM requirement means a GPU with 4 GiB of RAM can only test 16 passwords in parallel, making GPU-based cracking essentially useless, since what GPUs provide is huge parallelism. Custom ASICs would do better, but would still run into bottlenecks on the speed of the RAM. Making really fast cracking hardware would require either huge amounts of RAM, or large amounts of extremely fast RAM. Either way, big $$$.

      Even better, if at all possible you should use a hash that is keyed as well as salted. Doing that requires having some place to store the key that won't be compromised by the same sorts of attacks that compromise your password database. In most cases that's hard to do. Argon2 will accept a key so you can get both sorts of protection, though if you can be really, really certain that no attacker can ever get the key, then you can use a standard cryptographic hash function in a keyed mode, e.g. HMAC-SHA256, though I'd still recommend using a purpose-designed password hash (e.g. Argon2) in case your key is compromised.

      • The problem with that is on the other practical end: if you massively increase the resources needed will also increase the server side resources; it won't be as bad as it will be on the cracking end, but server resources are expensive. There's a point beyond which you aren't going to get people to agree to do it and a certain point where that insistence really does become reasonable. This is similar to why we don't use much longer keys for public key encryption and use really large primes for DH key exchang
        • The problem with that is on the other practical end: if you massively increase the resources needed will also increase the server side resources; it won't be as bad as it will be on the cracking end, but server resources are expensive.

          It won't be as bad as on the cracking end, that's the whole point. The reason for doing password hashing is to exploit the asymmetric level of effort between hashing and brute force search. To make that work, though, you do need to invest as much as you can afford in the server, to move the bar for the attacker as high as possible -- hopefully out of reach of all but the most serious. If you can't afford very much, fine, but realize that you're also not setting the bar very high.

          But this is exactly why go

          • But this is exactly why good password hashing algorithms are moving to RAM consumption as the primary barrier. It's pretty trivial for a server with many GiB of RAM to allocate 256 MiB to hashing a password, for a few milliseconds, but it gets very costly, very fast, for the attacker. And if you can't afford 256 MiB, how about 64?

            Using memory dependent hashes works better if one is a small server since one will rarely have a lot of people sending in their passwords at the same time, so the RAM space you need isn't that large. If you are a large organization then this doesn't work as well because you then need room to be able to do many such calculations functionally simultaneously.

            Nope. The leverage factor in the password hashing case is linear, since the entropy of passwords is constant (on average). The leverage factor for cryptographic keys is exponential. The reason we don't use much longer keys for public key encryption, etc., is because there's no point in doing so, not because we can't afford it. The key sizes we use are already invulnerable to any practical attack in the near future. For data that must be secret for a long time, we do use larger key sizes, as a hedge against the unknown.

            I agree that there's a linear v. exponential difference there(although for many of these it is more like linear and subexponential due to algorithms like

            • Using memory dependent hashes works better if one is a small server since one will rarely have a lot of people sending in their passwords at the same time, so the RAM space you need isn't that large. If you are a large organization then this doesn't work as well because you then need room to be able to do many such calculations functionally simultaneously.

              Meh. If you are a large organization, you can afford more.

              Anyway, the point is that you should turn it up as much as you can afford.

              I agree that there's a linear v. exponential difference there(although for many of these it is more like linear and subexponential due to algorithms like the number field sieve),

              Yes, NFS is subexponential, but not very "sub". And anyway, RSA is old, broken crypto which should be migrated away from.

              but the rest of your comment is essentially wrong. We keep keys just long enough that we consider it to be highly unlikely that they are going to be vulnerable, but not much more than that.

              I hate to resort to appeal to authority, but the actual analysis required to prove it is way more effort than I have time for this morning. Take a look at keylength.com, it has a host of authoritative references.

              In fact, it would be a lot safer if we increased key sizes more than we do, but there are infrastructural problems with that. See e.g. discussion at http://crypto.stackexchange.com/questions/19655/what-is-the-history-of-recommended-rsa-key-sizes [stackexchange.com]

              Heh. In my previous reply I actually typed a

              • If you are a large organization, you can afford more.

                Yes, but the point is the way it scales; If you are tiny you can reasonably assume that the almost no occasions will occur when you need to do multiple hashes in a small amount of time. If you are larger then you end up with a lot of extra RAM that you aren't going to use regularly but will need to use during peak log-in times. I agree that you can probably afford more, but getting corporations to do so is difficult; at the end of the day, everyone cares about their bottom lines.

                RSA is old, broken crypto which should be migrated away from.

                This suggests that you have

    • There's a straightforward reason why lots of web apps continued to use MD5 *long* after it was deprecated: MySQL had a function for md5() almost from the start, but didn't have an inline function for SHA() until 4.0.2... and 4.x didn't become the default version in long-term stable server-oriented distros until 2006 or later. Getting a new release to run on YOUR development computer? Easy. Convincing an enterprise sysadmin to let you have it on a production server before it has become the 'stable' default *

      • by flink ( 18449 )

        There's a straightforward reason why lots of web apps continued to use MD5 *long* after it was deprecated: MySQL had a function for md5() almost from the start, but didn't have an inline function for SHA() until 4.0.2...

        What sane person does their hashing inside their RDBMS? You hash it in code before you store it.

        • by Altrag ( 195300 )

          The sane person who isn't given several days to investigate and implement their own algorithms (not to mention months of testing if you want to be sure you didn't fuck it up) when there's a 3-character function call sitting there waiting to be used.

          Especially if they're not crypto experts and don't even know that doing the hash in the rdbms is bad, never mind why its bad. Which, for better or worse, is a large portion of the programming professionals, especially when you're including outsourcing companies

  • The thing is that there is not actually a lot you can do with an SHA1 hash collision. Sure, you may be able to impersonate a site by use of a fake certificate. But these are around anyways because of CAs with shoddy security and governments that do not understand the value of security and just coerce CAs in giving them out. So an SHA1 collision is actually a bit of overkill for that and likely the most expensive option by a large margin. So what else is left? I do not see anything.

    Sure, if this was somethin

    • by EndlessNameless ( 673105 ) on Thursday February 23, 2017 @02:24PM (#53919331)

      Not a lot you can do?

      Anything that requires signatures is vulnerable to forgery if the signer's certificate specifies SHA1.

      An attacker could forge:

      1. Software signatures - to slip malware into a software vendor's distribution channels.

      2. SSL certificates - to MITM web connections to phish, steal data, or distribute malware.

      3. Personal digital signatures - to fabricate documents, including emails, transaction, orders, etc that are normally trusted implicitly due to the signature

      4. Subordinate CA certificates - to create trusted certificates which permit all of the above

      The problem lies with #4. The real risk is not a one-off duplicate of John Doe's smart card. The real danger is the CAs signed with SHA1 who are still trusted by browsers, applications, and OSes around the world. If an attacker counterfeits one of their certificates, he can issue arbitrary certificates for any web site, any software publishers, or any user.

      The only solution is to discontinue the use of SHA1 internally and to revoke trust for all CAs that still use SHA1. Better crypto has existed for a long time---the standard for SHA2 was finalized in 2001, well over a decade ago.

      • by gweihir ( 88907 )

        Indeed. Not a lot you can do even when you ignore the high effort needed and that it is a 2-sided collision. I do not dispute that you should not use SHA1 when you want security, but the actual attacks possible at this time are pretty much irrelevant. Your list just confirms that. It looks impressive (well, sort of), but when you take into account the effort of each attack and the possible gain, they become meaningless, because higher gains at lower effort are around plenty.

        • Would it be possible to tamper a famous git repo (e.g. Linux) by writing a malicious commit with the same hash?

        • by Altrag ( 195300 )

          TFS says a total of ~6600 computer years.

          Get a botnet of ~80k computers and you have it in around a month. That's hardly "meaningless" when we're talking something as globally damaging as a root CA certificate, especially when its coming at a time that we're seeing botnets hitting millions of nodes.

      • Not a lot you can do?

        Anything that requires signatures is vulnerable to forgery if the signer's certificate specifies SHA1.

        An attacker could forge:

        1. Software signatures - to slip malware into a software vendor's distribution channels.

        That requires a second pre-image attack, not just a collision attack. (What gweihir called "two-sided" rather than "one-sided"... though that is not standard terminology).

        2. SSL certificates - to MITM web connections to phish, steal data, or distribute malware.

        Also requires a second pre-image attack.

        3. Personal digital signatures - to fabricate documents, including emails, transaction, orders, etc that are normally trusted implicitly due to the signature

        This one can be done with a collision attack. You generate two different documents which hash to the same value, but have different contents. The PDF format, unfortunately, make it pretty easy to generate documents which look sensible and have this property. It's not possible with more transparent fo

        • This can only be done with a collision attack if the CA is really, really stupid. Proper CAs should include chain-length restrictions in their certificates.

          Please correct me if I'm wrong, but it appears that most CAs are really, really stupid. Here's a list of the CAs included in Firefox: https://mozillacaprogram.secur... [force.com] . I split the PEMs into a pile of files, and checked them:

          $ for pem in * ; do openssl x509 -text -in $pem | grep pathlen ; done
          CA:TRUE, pathlen:4
          CA:TRUE, pathlen:1
          CA:TRUE, pathlen:1
          CA:TRUE, pathlen:7

          • So out of 172 root CAs only 14 include any path length restrictions, and even the ones who do still allow some chaining.

            O_o

            We're doomed.

            I don't think the SHApocalypse will be tomorrow. This was an identical-prefix attack instead of a chosen-prefix which constrains the attacker considerably, and the computation required is much higher even to generate simple collisions. However, (again, please correct me if I'm missing something) it does seem plausible that that further weaknesses will be found which provide just enough leverage to forge a signature with one of those 172 CAs, and we may eventually see a rogue sha1WithRSAEncryption CA issued.

            I concur, completely.

    • by Anonymous Coward

      You seem to miss the point entirely. This is the expected course of degradation for a hashing algorithm. First it becomes theoretically possible, then it becomes demonstrable at extreme cost (meaning only a few organizations could pull it off). Then it becomes expensive but possible, then your watch can do it. From the amount of time it takes to hit these waypoints you can determine when it will be useless. Because changing the course of large organizations is like trying to turn around an aircraft carrier,

      • by gweihir ( 88907 )

        Actually, that _is_ my point. This is not any big news, it is a small step in an expected progression. But my second point is that the value of certificates (what this mostly applies to) is generally vastly overrated.

    • But these are around anyways because of CAs with shoddy security and governments that do not understand the value of security and just coerce CAs in giving them out.

      There's a big difference between having a systemic flaw and working on social engineering, and the presence of one doesn't invalidate strengthening against the other.

      • by gweihir ( 88907 )

        That is just my point. There is a big difference between a high-effort attack that is hard to do and a simpler one that has been done mass-scale. The second is a real risk, the first one is pretty irrelevant. Incidentally, the defects of the CA system are systematic, and they cannot be fixed by merely moving to a non-broken hash function.

        • No you're missing the point. Social engineering and a trust model is not a systemic failure in the sense that the attack is replicatable with any kind of certainty. A broken hash function however is.

          You break a hash function, it's broken.
          You trick a guy into giving you a certificate to break the chain of trust, doesn't mean you can do it again when that cert becomes revoked.

          • by gweihir ( 88907 )

            You really have no clue about IT Security Risk Management. A broken trust model most certainly is a systematic failure and it is far, far worse than a defect implementation detail like an insecure hash function.

  • by Anon E. Muss ( 808473 ) on Thursday February 23, 2017 @01:27PM (#53918929)

    ... however it's worth noting that there are currently no ways of finding a collision for both MD5 and SHA-1 hashes simultaneously

    Any crypto geeks want to weigh in on the truth of this statement? I've often wondered about this. Wouldn't using two hash algorithms be easier and more effective over the long term than getting the whole world to upgrade to the Latest And Greatest Hash every ~10 years?

    • Perhaps I was completely wrong [sans.edu] - skip to the Mysid's comment. My sincere apologies then. But this explanation just doesn't work/compute in my head - even today finding MD5 collisions is extremely computationally expensive, yet the person says SHA1 + MD5 is only slightly more computationally expensive.

      Let's put it in layman's terms: let's say your cluster made of a thousand GPUs finds MD5 collisions for given data every second. Now finding an SHA1 collision in Google's case required 9,223,372,036,854,775,8

    • Taking the MD5 and the SHA1 of something isn't significantly more secure than just taking the SHA1 of said something. This was demonstrated in 2004 here: http://link.springer.com/chapt... [springer.com] This was then further elaborated and improved upon here: http://eprint.iacr.org/2008/07... [iacr.org] So, don't concatenate hashes kids. It doesn't do what you think it does. Using a proper hash from the start is the only safe way to do things. Even if nobody has figured out how to do it yet the math conclusively shows that breakin
      • by namgge ( 777284 ) on Thursday February 23, 2017 @02:39PM (#53919445)

        I think the idea was that one finds an MD5 collision for document A by adding a block of data B to the end of it creating a new documents C. The SHA hash of document C, SHA(C) will not, in general, match SHA(A).

        FInding B such that both MD5(C)==MD5(A) and SHA(C) ==SAH(A) is still unfeasible.

      • by tlhIngan ( 30335 )

        Taking the MD5 and the SHA1 of something isn't significantly more secure than just taking the SHA1 of said something. This was demonstrated in 2004 here: http://link.springer.com/chapt [springer.com]... This was then further elaborated and improved upon here: http://eprint.iacr.org/2008/07 [iacr.org]... So, don't concatenate hashes kids. It doesn't do what you think it does. Using a proper hash from the start is the only safe way to do things. Even if nobody has figured out how to do it yet the math conclusively shows that breaking

        • That's for concatenated hashes. As in, you hash the two hashes to form one number, usually by XOR'ing the numbers together. Which can be shown to increase the solution space considerably.

          But that reduced the difficulty of breaking SHA1, since you now just need a collision for the length of a shorter MD5 hash.

    • by skids ( 119237 )

      It really doesn't matter for the most important use case, because X.509 does not have a way to use more than one hash in a certificate.

      And there's a patent for that (US7793097, US20080270788), so unless the owners decide to be benevolent, there's a roadblock to actually implementing it.

    • by kiwix ( 1810960 )
      Finding collisions in MD5 and SHA-1 simultaneously is only marginally harder. Using Joux's multicollisions, you can finding a collision in both by running the SHA-1 attack 64 times: this defines 2^64 messages colliding for SHA-1, and you can locate a collision by evaluating all those hashes (wich costs less than finding the SHA-1 collisions with the current attack).
    • ... however it's worth noting that there are currently no ways of finding a collision for both MD5 and SHA-1 hashes simultaneously

      Any crypto geeks want to weigh in on the truth of this statement? I've often wondered about this. Wouldn't using two hash algorithms be easier and more effective over the long term than getting the whole world to upgrade to the Latest And Greatest Hash every ~10 years?

      MD5 + SHA1 is a "new hash algorithm". Think about what you have to do to shift to a new algorithm... all of the message formats that have to be updated, all of the stored values that have to be recomputed, all of the cross-system compatibility dances you have to do to ensure that you can upgrade both sides (or all sides; there are often more than two) in order to update without having to make some error-prone attempt to cut over simultaneously.

      The challenge of changing hash algorithms has nothing to do wi

    • by AHuxley ( 892839 )
      Argentina tried a few ideas like that in the 1980's knowing the skills the UK's GCHQ had https://en.wikipedia.org/wiki/... [wikipedia.org]. Argentina needed a lot of fast, real time communications using equipment that was in place.
      The GCHQ has most of the worlds mil grade export crypto decryption computer ready thanks to most brands and firms been happy to help the UK gov. Real time decryption was easy and any efforts to link two layers of junk export grade crypto was detected and useless.
      The only aspect that slowed th
  • Apollo program demonstrated practical transportation to the moon.
  • I know this isn't the primary point of the announcement, but does anyone know where the authors get 10 years from, as included in this statement: "Today, 10 years after of SHA-1 was first introduced..."? Best I can tell, SHA-1 was formally defined in 1995 (FIPS PUB 180-1), and I'm pretty certain it was in common widespread use long before 2007. Are they referring to the first time it was introduced into one of their own products or something? or I'm I missing something obvious?

  • by rastos1 ( 601318 ) on Thursday February 23, 2017 @02:33PM (#53919397)

    I occasionally use a signed .jar in the company intranet. Reading TFA, I wondered what hash is used to sign that? It seems that jarsigner is not willing to divulge, so I had to write a little piece of code aaand ... yup, it's SHA1!

    How common is this? Is loads of software now susceptible to attack by replacing a original code by a malware with the same SHA1?

    • by viperidaenz ( 2515578 ) on Thursday February 23, 2017 @02:56PM (#53919599)

      Perhaps you could have just read the documentation?

      Supported Algorithms
      By default, the jarsigner command signs a JAR file using one of the following algorithms:

      Digital Signature Algorithm (DSA) with the SHA1 digest algorithm

      RSA algorithm with the SHA256 digest algorithm

      Elliptic Curve (EC) cryptography algorithm with the SHA256 with Elliptic Curve Digital Signature Algorithm (ECDSA).

      If the signer's public and private keys are DSA keys, then jarsigner signs the JAR file with the SHA1withDSA algorithm. If the signer's keys are RSA keys, then jarsigner attempts to sign the JAR file with the SHA256withRSA algorithm. If the signer's keys are EC keys, then jarsigner signs the JAR file with the SHA256withECDSA algorithm.

      These default signature algorithms can be overridden using the -sigalg option

      http://docs.oracle.com/javase/7/docs/technotes/tools/windows/jarsigner.html

  • Crafted data (Score:5, Informative)

    by OneHundredAndTen ( 1523865 ) on Thursday February 23, 2017 @02:39PM (#53919451)
    The data was crafted in order to simplify attaining their goal. It would be far more damning if they could put together a document that results in the same SHA-1 hash as that of an externally specified document.
  • The Internet Archive to the rescue:
    http://web.archive.org/web/*/https://marc-stevens.nl/research/papers/C13-S.pdf

    The paper describes the tech behind the code (linked below) capable of *detecting* the probability of a file + sha-1 hash being either the result of a forgery, or "easier to forge" (from what I understand of the attack against SHA1 done by Google, et all.):

    https://github.com/cr-marcstevens/sha1collisiondetection

    Now, to try that code against some files as well as the two PDFs google released... :-

  • The web site where it is published use SHA-256

  • Fancy numbers, but they're incomplete. What was the total cost for this 2 year exercise if a ordinary human with a soul sat down and paid for every watt - how much did this PDF experiment cost? The maths has been clear about this for a very long time, google is drinking cool-aid if they think this means anything more than what has already been discussed or discovered.

    Let me put it another way, in recent history the A+B=C mathematical formula was proven by a reclusive Japanese mathematician. Have we stopped

  • You may notice they never do these stunts on ASCII text files.

    The attack always requires to be able to store a large number of random data on the document.
    There are 62 different bytes in the provided PDFs : part of these are the edited message, part of them are the 'attack'. That's 496 bits.

    So it may be applied to any kind of document allowing to store about 496 bits of (invisible) data on it (and to be on the safe side, I'm assuming even half of that could be enough).
    Not all successful attacks will requir

  • Since SHA-2 ciphers where introduced in TLS 1.2, that suggest we should remove TLS 1.0 and TLS 1.1. suppoort soon.

Life is a game. Money is how we keep score. -- Ted Turner

Working...