Google's Releases New AI Photo Upscaling Tech (petapixel.com) 97

Michael Zhang writes via PetaPixel: In a post titled "High Fidelity Image Generation Using Diffusion Models" published on the Google AI Blog (and spotted by DPR), Google researchers in the company's Brain Team share new breakthroughs they've made in image super-resolution. [...] The first approach is called SR3, or Super-Resolution via Repeated Refinement. Here's the technical explanation: "SR3 is a super-resolution diffusion model that takes as input a low-resolution image, and builds a corresponding high resolution image from pure noise," Google writes. "The model is trained on an image corruption process in which noise is progressively added to a high-resolution image until only pure noise remains." "It then learns to reverse this process, beginning from pure noise and progressively removing noise to reach a target distribution through the guidance of the input low-resolution image." SR3 has been found to work well on upscaling portraits and natural images. When used to do 8x upscaling on faces, it has a "confusion rate" of nearly 50% while existing methods only go up to 34%, suggesting that the results are indeed photo-realistic.
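Here is roughly what that recipe looks like in code. This is a minimal numpy sketch of the general diffusion idea, not Google's SR3 implementation: denoise_model stands in for the trained neural network, and the one-step noising/denoising formulas are deliberately simplified.

import numpy as np

def forward_noising(hi_res, betas, rng):
    # Training-time corruption: keep mixing Gaussian noise into the
    # high-resolution image until it is (approximately) pure noise.
    x = hi_res.copy()
    for beta in betas:
        x = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
    return x

def super_resolve(low_res, denoise_model, betas, out_shape, rng):
    # Inference: start from pure noise and progressively remove noise,
    # with every step guided by the low-resolution input.
    x = rng.standard_normal(out_shape)
    for t in reversed(range(len(betas))):
        eps_hat = denoise_model(x, t, low_res)  # network's noise estimate
        x = (x - np.sqrt(betas[t]) * eps_hat) / np.sqrt(1 - betas[t])
    return x

In the real SR3, the denoiser is a large neural network and the schedule runs for many steps; the point here is only the shape of the loop.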

Once Google saw how effective SR3 was in upscaling photos, the company went a step further with a second approach called CDM, a class-conditional diffusion model. "CDM is a class-conditional diffusion model trained on ImageNet data to generate high-resolution natural images," Google writes. "Since ImageNet is a difficult, high-entropy dataset, we built CDM as a cascade of multiple diffusion models. This cascade approach involves chaining together multiple generative models over several spatial resolutions: one diffusion model that generates data at a low resolution, followed by a sequence of SR3 super-resolution diffusion models that gradually increase the resolution of the generated image to the highest resolution." "With SR3 and CDM, we have pushed the performance of diffusion models to state-of-the-art on super-resolution and class-conditional ImageNet generation benchmarks," Google researchers write. "We are excited to further test the limits of diffusion models for a wide variety of generative modeling problems."
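The cascade itself is straightforward to express. Again a hedged sketch rather than Google's code: base_generator and sr_stages are assumed to be already-trained models of the kinds described above.

def cascade_generate(class_label, base_generator, sr_stages):
    # Stage 1: a class-conditional diffusion model samples a small image.
    image = base_generator(class_label)      # e.g. a 32x32 sample
    # Later stages: SR3-style models raise the resolution step by step,
    # e.g. 32x32 -> 64x64 -> 256x256.
    for sr_model in sr_stages:
        image = sr_model(image)
    return image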

  • old porn (Score:4, Informative)

    by stud9920 ( 236753 ) on Monday August 30, 2021 @07:35PM (#61746927)

    Can't wait to upscale my old porn collection

  • by lsllll ( 830002 ) on Monday August 30, 2021 @08:05PM (#61746981)
    The enhanced pictures in the article are pretty impressive. So we get to make fun of any movie where the guy says "blow that portion up" as long as the movie was made before 2022?
    • Here's the technical explanation: "SR3 is a super-resolution diffusion model that takes as input a low-resolution image, and builds a corresponding high resolution image from pure noise," Google writes.

      Interesting in that noise is usually something one tries to avoid in communications. Seems nervous systems may use noise to their advantage.

      • Re:Impressive! (Score:5, Interesting)

        by Baron_Yam ( 643147 ) on Monday August 30, 2021 @09:06PM (#61747129)

        They do. There's a method for improving balance in seniors that involves electronic shoe liners that add random stimulation to their soles.

        It works... presumably because the random misfires of their own nerves are mixed in with the random noise introduced by the devices, and their brain filters them out together, leaving higher quality nerve input for balancing. I say 'presumably' because I either have long forgotten the details or never read them in the first place.

      • Re:Impressive! (Score:5, Informative)

        by AmiMoJo ( 196126 ) on Tuesday August 31, 2021 @04:36AM (#61747853) Homepage Journal

        Adding noise can actually improve measurements in some cases.

        For example, say you are measuring a voltage and your analogue-to-digital converter (ADC) has a resolution of only 1 volt. The actual voltage is 7.4V, but your ADC always reads 7V because it cannot resolve anything finer than whole volts.

        Now add in 0.3V of random noise so that the voltage sits randomly between 7.1V and 7.7V. Read the value 10 times and your ADC will randomly measure 7V or 8V, depending on which way the noise made it round. Average your results and you will get a value closer to 7.4V than you would have got from one noise-free reading (see the sketch below).

        Google seems to have done something similar but presumably not relying on this mathematical relationship. They take the original image, upscale it using nearest neighbour, add a massive amount of noise and then use AI to do noise reduction.
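        To put numbers on that ADC example, here's a toy numpy simulation (an illustrative sketch, not real ADC code; round() stands in for a 1V converter). With the ±0.3V dither described above, the average lands near 7.33V, already far closer to the true 7.4V than the bare reading of 7V; a full ±0.5V of dither would make the average unbiased.

        import numpy as np

        rng = np.random.default_rng(0)
        true_voltage = 7.4                      # actual input, in volts
        bare = round(true_voltage)              # undithered 1 V ADC: always reads 7

        noisy = true_voltage + rng.uniform(-0.3, 0.3, size=10_000)
        dithered = np.round(noisy)              # each dithered reading is 7 or 8
        print(bare, dithered.mean())            # 7 vs. roughly 7.33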

        • They take the original image, upscale it using nearest neighbour, add a massive amount of noise and then use AI to do noise reduction.

          That's not what TFS says. It says that noise is added in incremental steps to the hi-res image, and the network learns to reverse that process conditional on the low-res image.

      • by noodler ( 724788 )

        It starts with noise because the process is inherently statistical.
        The algorithm needs something to work on and noise is a good statistical starting point. Then the algorithm filters the picture out of that noise.

      • Interesting in that noise is usually something one tries to avoid in communications.

        Only audible noise to your ears. Most if not all modern communications rely on artificially adding noise to improve communication, a process known as "dithering".

    • by ceoyoyo ( 59147 )

      Yes. This is a very common problem with generative models like these. You'll notice their benchmark is the "confusion rate": basically, they test how well people can figure out which is the real high-res picture and which is the simulated one.

      So they're evaluating systems that can create good looking images. They're not evaluating whether the system produces *accurate* images. David Caruso might say "enhance" and get a very nice-looking picture of the wrong person in that taillight reflection.

      In the blog they mention that it's good for medical imaging. Yeah.

      • In the blog they mention that it's good for medical imaging. Yeah.

        Good point. One might hope they don't leave it to the initial noise to decide whether to upscale the image to look like a tumor or not!

    • by jaa101 ( 627731 )

      While the "before" images are low res, they all look like they've been digitally scaled down from high quality originals, some of them professional studio portraits. I'd like to see what happens when they try with real-world, low-res originals rather than these synthetic ones. I suspect just a little noise in the input, as most low-res originals would naturally have, will substantially degrade the output quality.

      • by Junta ( 36770 )

        If the model was trained on those sorts of artefacts, then it's probably pretty good at basically deleting them from the source as it invents details. Generally these things both make up missing detail and drop undesired artefacts at the same time. Since such artefacts appeared in the material it analysed before, the upscales would delete some apparent details in the source based on telltale signs (e.g. a suspiciously JPEG-block-sized sharp change in color isn't in the

    • In the movie trope, they are looking for something specific in the enhanced image (like an actual license plate number), rather than just something plausible (like, say, the texture of hair on top of a pixelated head). The latter is doable, the former is not.
      • by Junta ( 36770 )

        I tend to agree, though one way it might help is by being less distracting. Sometimes we aren't good at ignoring blockiness or blurriness, or at picturing every possible font and letter to see whether a sample of unreadable text could match.

        But there is the danger of it inventing details that would complicate identification since it won't match up and we don't *know* what details are real and what are invented.

    • by nagora ( 177841 )

      The enhanced pictures in the article are pretty impressive.

      Yes. Adverts rarely mention flaws.

    • by noodler ( 724788 )

      Impressive is not the right word. We all know that you can enhance photos with these kinds of tools.
      What is missing from their article is ground truth. They don't show the actual original high-res photos, so we don't know whether the reconstructed pictures look anything like the real people. How well did the algorithm manage to guess the right pixels?
      The fact that they show no high-res originals tells me that this algorithm just makes shit up and the result does not really match the original faces very well.

      • by Junta ( 36770 )

        The paper did include them. The answer is that while the upscales are plausible pictures, they of course can't match details to the original. The details are gone. So at the top of the paper, you have a kid's face, and the upscale... well, it has one pretty clear tell, because it kind of lost it around the collar and basically rendered static instead of a guess. But putting that aside, you might think it's a kid with somewhat weird teeth rather than a totally made-up picture. Comparing the two faces side by side the tee

        • by noodler ( 724788 )

          This is to make low quality material more pleasant, not some declaration that details can be recovered that are irreversibly lost.

          LOL, I'm pretty sure law enforcement is already forming lines at Google headquarters carrying signs that read: "Shut up and take the money!"

  • by marcle ( 1575627 ) on Monday August 30, 2021 @08:05PM (#61746983)

    Does anybody edit this website?

    • Does anybody edit this website?

      Yeah, that phrasing sent me down several branching red herrings.
      My theory is that it was originally entered as something like "Google's New AI Photo Upscaling Tech Released", then re-ordered into active voice for more click appeal, but the editor neglected to remove the possessive.

  • Ah, good: something better than this [benvista.com], and free as well.

  • by bustinbrains ( 6800166 ) on Monday August 30, 2021 @08:15PM (#61747013)

    Law enforcement is drooling already. They can finally enhance all those grainy, blurry, unrecognizable videos but still manage to arrest the wrong person.

    • Re:ENHANCE! (Score:5, Insightful)

      by Morpeth ( 577066 ) on Monday August 30, 2021 @08:24PM (#61747031)

      That's a great point, because it would be VERY easy for an image to be "enhanced" in a way that gets the wrong person identified. If I were a defense attorney, I'd be sure from now on (if they don't do it already) to ask to see the raw, unedited images used as evidence -- not the modified 'improved' versions they'd want to show a jury or judge.

      • by AmiMoJo ( 196126 )

        I've seen that done for parking tickets, of all things. In the UK parking tickets are a civil matter and some of the parking companies have been known to submit photoshopped images into evidence. Sometimes they get caught out because they accidentally submit the original as well, other times the victim has their own photos that don't match up.

        The usual modification is to make any signage more visible, or occasionally even add in signage that wasn't there.

    • How do you think rare artists' pictures are forged and resold as originals? Thanks to 3D plotters/printers you can get both the paint and the brushstroke length almost perfect. The different wavelengths, the patina: all very good indeed. If crooks can fake old masters, police can fake and misrepresent enhanced photos, even deepfakes if they have deep pockets. The real story is whether enhanced pictures will be used as evidence, or just to bias cops into looking only for some enhanced suspect. The good news is past presidents wil
    • Re:ENHANCE! (Score:4, Interesting)

      by Zaiff Urgulbunger ( 591514 ) on Tuesday August 31, 2021 @06:32AM (#61747959)
      It is very worrying!

      In TFA, there's an example of an up-scaled tram. There's a blue... something, perhaps a warning notice, near the front of the tram. But imagine this were the tram's registration number. It is not visible in the original, and I don't believe there are enough pixels to even take a guess. However, I suspect the "AI magic" would use detail from other similar images it has been exposed to... thus, it _could_ perhaps show an incorrect tram registration number.

      *We* know this information isn't correct, but it could be difficult for a lay-person to defend themselves when presented with an apparent photograph showing what appears to be absolute, incontrovertible truth!

      Obviously, right now, an "expert" would be required to enhance a photo, and thus they should know what may or may not work. But imagine a future where this tech is embedded into cheap consumer cameras. At that point, I worry that people would believe what they see in a photo that they believe they took.

      Imagine a phone-camera where you have (effective) infinite pinch/zoom!

      Still cool though! ;-)
  • by giampy ( 592646 ) on Monday August 30, 2021 @08:53PM (#61747093) Homepage

    https://scifiinterfaces.com/20... [scifiinterfaces.com]

  • by PeeAitchPee ( 712652 ) on Monday August 30, 2021 @08:56PM (#61747105)
    It's just another Photoshop filter until we see apples-to-apples comparisons of what Google's Upscale AI "thinks" the sharp source looks like versus the actual original (and I'm sure that's how they trained it). I'm sure it's cool and will yield realistic-looking "upscaled" images -- but whether or not they are true to the original is a completely different story. You cannot get a whole fish back out of a blender -- it's always a one-way trip. Sounds like this processing gives you a fish, but not necessarily *your* fish.
    • A very similar fish will do for most applications of these algorithms.
    • The actual paper does show quite a few very good examples.
      https://arxiv.org/pdf/2104.076... [arxiv.org]

      • by noodler ( 724788 )

        And quite a few bad ones that show significant differences from the reference image.
        None of these flaws are mentioned by the fucking article.

    • Maybe you just can't get the fish back with today's technology. Who knows what tomorrow will bring? After all, we now know how to unboil an egg [washingtonpost.com]...

    • but whether or not they are true to the original is a completely different story.

      Not really. The overwhelming majority of images people use are not for preserving an official record. Whether some tiny detail is incorrect in a scene because it's made up is completely irrelevant.

      You won't be using an image made with this technology in court to identify a suspect (I hope), but the other 99.99% of people on the planet are happy with any fish.

    • It's purely a puff piece saying "Look what real-looking images we can create!" instead of comparing against reality. This is not science.
  • But from a research point of view this is awesome news. It's a very clever approach that deserves to be celebrated, like everything else in AI.
  • by Baron_Yam ( 643147 ) on Monday August 30, 2021 @09:12PM (#61747133)

    There are details in those 'restored' images that are far, far finer than single pixels from the original.

    Look at the forehead on that black woman with the tight braids and tell me honestly that you believe those acne scars were restored accurately by the AI without some form of 'cheating'.

    There is no way to extract that detail, save having trained your AI on the original image.

    • by ceoyoyo ( 59147 )

      Yes. It's generating realistic looking images. The picture of Geoff Hinton is the one from the ACM award page, it's easy to find the original. The reconstruction has given him some freckles he doesn't have, and done some weird stuff to his eyebrow and earlobe.

      It's really unfortunate that "super resolution" caught on as the name for this process, which is really interpolation. Actual super-resolution imaging uses multiple images with sub-pixel-shifted fields of view to genuinely reconstruct sub-pixel detail.
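      For contrast, here's a toy shift-and-add sketch of that multi-image kind of super-resolution (hypothetical illustration code; it assumes the sub-pixel shifts are already known and skips registration and deblurring):

      import numpy as np

      def shift_and_add(frames, shifts, scale):
          # frames: list of (H, W) low-res images; shifts: per-frame sub-pixel
          # offsets in low-res pixel units; scale: integer upsampling factor.
          h, w = frames[0].shape
          acc = np.zeros((h * scale, w * scale))
          hits = np.zeros_like(acc)
          for frame, (dy, dx) in zip(frames, shifts):
              oy = int(round(dy * scale))   # snap each shift to the fine grid
              ox = int(round(dx * scale))
              for y in range(h):
                  for x in range(w):
                      yy, xx = y * scale + oy, x * scale + ox
                      if 0 <= yy < h * scale and 0 <= xx < w * scale:
                          acc[yy, xx] += frame[y, x]
                          hits[yy, xx] += 1
          return acc / np.maximum(hits, 1)  # average; unseen cells stay zero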

    • by taustin ( 171655 ) on Monday August 30, 2021 @11:38PM (#61747465) Homepage Journal

      Hence the description "Image generation", not image "enhancement."

    • It's not cheating. The catch is that those acne scars may not have actually existed on the original.
      The model is trained to infer what might be there. That doesn't mean it's right. The result, though, is sure to look quite good.
      • by noodler ( 724788 )

        Good but not accurate.
        I just can't wait for this to be used by US law enforcement to 'identify' blurry blobs in the background. It'll be the Darwin awards on a continental scale.

        • Good but not accurate.

          Correct.

          I just can't wait for this to be used by US law enforcement to 'identify' blurry blobs in the background. It'll be the Darwin awards on a continental scale.

          I don't understand what you're afraid of.
          Something like this could never be submitted as evidence in a court.
          However, like a sketch made by a police sketch artist, it could be used to target suspicion. There's nothing new about that.

          • by noodler ( 724788 )

            I don't understand what you're afraid of.

            I'm afraid of this being abused in the future. The algorithm invents information that wasn't there, and it does so in a way that makes the picture indistinguishable from a real photo. This tech is just begging to be abused to 'prove' things that are not real. US police already have a history of abusing tech to prosecute innocent people, because the computer knows all.

            I'm also afraid that this kind of thinking/behavior is spreading across the world.

            Something like this could never be submitted as evidence in a court.

            Far worse things have been submitted as evidence, like plain and simple facial recognition outcomes (computer says it so it must be true kind of reasoning) despite laws and regulations supposedly forbidding this.

            • by narcc ( 412956 )

              From the actual paper [arxiv.org]:

              SR3 should not be used for any real world super-resolution tasks

              Take some comfort in that.

            • Far worse things have been submitted as evidence, like plain and simple facial recognition outcomes (computer says it so it must be true kind of reasoning) despite laws and regulations supposedly forbidding this.

              Successfully submitted? I'll google that. If so, that's horrible.

    • Have you seen Nvidia's DLSS? It's pretty impressive what can be done.
  • Sorry, I don't care for "photo upscaling".
    • Google does. They want to offer unlimited (or effectively unlimited) photo storage for Android users with minimal actual storage used. So they'll compress and shrink photos to death, then "rebuild" them on demand. Just a theory.

  • Can't wait for idiot cops to start thinking this is a flawless silver bullet and use it with facial recognition.

    • by taustin ( 171655 )

      It's been a staple of cop shows on TV for years.

      I've always wanted to see a realistic depiction of that:

      Detective: "Can you enhance that so we can run it through facial rec?"

      Tech: "Sure can, boss. Who do you want it to look like?"

  • Diffusion methods? They're already in the Git master of Darktable 3.7 [github.io]. It doesn't do upscaling (as yet), but can be used for a variety of other things.

  • How does Google usually push products like this to the public? Beta sign-up? A one-year trial period with certain user groups? I can't wait to redo my fiancée's old family photos.
  • All of those times they blurred the witness face on TV... Informers, Insiders, Criminals, Police, Government officials.

    And now, they can all be identified.

    This can't be a good thing, but at least we'll know who really knew about the aliens at Area 51.

    • by narcc ( 412956 )

      Yes... and we'll find they all look suspiciously like the stock photo models from the training data...


  • While this might improve the visual quality of images it has not seen before, it is unlikely that the added detail will be what was actually there. For some applications, that doesn't matter, for others, it does. Police are not known for their discerning technical acumen.
