AI Technology

The Dumb Reason Your Fancy Computer Vision App Isn't Working: Exif Orientation (medium.com) 64

Adam Geitgey: Exif metadata is not a native part of the JPEG file format. It was an afterthought taken from the TIFF file format and tacked onto the JPEG file format much later. This maintained backwards compatibility with old image viewers, but it meant that some programs never bothered to parse Exif data. Most Python libraries for working with image data, like numpy, scipy, TensorFlow, Keras, etc., think of themselves as scientific tools for serious people who work with generic arrays of data. They don't concern themselves with consumer-level problems like automatic image rotation -- even though basically every image in the world captured with a modern camera needs it. This means that when you load an image with almost any Python library, you get the original, unrotated image data. And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data. You might think this problem is limited to Python scripts written by beginners and students, but that's not the case! Even Google's flagship Vision API demo doesn't handle Exif orientation correctly. And while Google Vision still manages to detect some of the animals in the sideways image, it detects them with a non-specific "Animal" label. This is because it is a lot harder for a model to detect a sideways goose than an upright goose.
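A minimal sketch of the fix, assuming the Pillow library (the summary doesn't prescribe one): apply the orientation tag to the pixels before the array ever reaches a detector.

    from PIL import Image, ImageOps
    import numpy as np

    # Open the JPEG and apply its Exif orientation tag (if any) to the pixel
    # data, so the array matches what a phone's gallery app would display.
    image = Image.open("photo.jpg")
    image = ImageOps.exif_transpose(image)

    pixels = np.array(image)              # upright pixels, safe to feed a model
    # detections = model.predict(pixels)  # hypothetical detector call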
This discussion has been archived. No new comments can be posted.

  • The human brain (the real intelligence) doesn't need the orientation tag, which means this AI is still not "intelligent" at all if it can't recognize that something isn't properly rotated and rotate it. Then again, the human brain is perfectly capable of recognizing objects even if we see the world upside down.
    • We ~can~ do it, but there's an increased cognitive load, so we have to work harder. For some people, reading text upside-down is difficult enough that they often just reorient their head to the text. We also need context to ~know~ that the image is upside-down, and that context is sometimes missing. That's why this video has such an effective reveal: https://www.youtube.com/watch?... [youtube.com]

      • by Viol8 ( 599362 )

        There's no extra cognitive load recognising a dog from a different orientation. Now you may say that's because we've been trained to see dogs from all different viewpoints - but show me a dog I haven't seen before and I'll still recognise it if it's turned 90 degrees, and that's after seeing ONE picture, not the thousands it takes to train an ANN.

        • There's more load than you think, which is what makes these images so much fun:
          https://www.boredpanda.com/dog... [boredpanda.com]

          Also, keep in mind, you have like a billion times more processing power than these neural networks.

          Keep in mind some animals, such as koala bears, have difficulty recognizing things out of context (they won't recognize food on a plate instead of on a branch). These things require a lot more processing power than you imply they do.

        • There's no extra cognitive load recognising a dog from a different orientation.

          Yes there is. It is slower, and you miss details.

          Here is an image [pinimg.com] both rotated and not rotated. Most people don't process it correctly.

          that's after seeing ONE picture, not the thousands it takes to train an ANN.

          Nonsense. You have seen millions of retina-imprints of dogs.

          A proper comparison is with a newborn baby, who has really never seen a dog before.

          The ANN doesn't deal well with rotated images because that was not part of its training set. Include rotated images in the training set, and it will recognize them.
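          A rough sketch of that kind of augmentation, assuming TensorFlow 2.x Keras preprocessing layers (the comment doesn't name a framework): randomly rotate each training image before the model sees it.

              import tensorflow as tf

              # Random rotation of up to a quarter turn either way, plus horizontal
              # flips, applied on the fly to every training batch.
              augment = tf.keras.Sequential([
                  tf.keras.layers.RandomRotation(0.25),    # factor is a fraction of a full turn
                  tf.keras.layers.RandomFlip("horizontal"),
              ])

              # train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))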

          • Babies only need to see a dog once or twice to recognise Dog. Or teddy etc. They don't need to see it from every conceivable angle first. All parents know this.

            • by shmlco ( 594907 )

              From the second they're born and their eyes are open, babies are being fed images of their environment. They begin to register shapes. There are shapes that stay in place as baby is moved around and there are other shapes that move around and move in front of other shapes and shapes that turn and rotate such that baby can see all sides.

              By the time a baby is six months old they've probably seen at least half a billion "training" images of their environment. (Assuming a rather slow 60 fps data rate.) A five-

            • by Agripa ( 139780 )

              Babies only need to see a dog once or twice to recognise Dog. Or teddy etc. They don't need to see it from every conceivable angle first. All parents know this.

              Babies have many general and specific processing tasks designed in as "instinct". These processing tasks may require training, but the implementation lowers this requirement considerably.

      • by dgatwood ( 11270 )

        That's why this video has such an effective reveal:...

        No, not really. The main reason it had such an effective reveal is that humans recognize faces primarily by the eyes, which were drawn last, and were partially or completely cut off below the bottom of the screen for most of the time up until the reveal. If it had been drawn with the eyes first, and if the entire image had been visible the whole time, you would have known that it was a face long before the reveal.

      • by jbengt ( 874751 )
        In high school print shop (yes, I'm that old) we had to be able to read the composed type, which is a mirror image of the printed text. As long as you rotated it so each line read left-to-right, it was surprisingly easy to do - no training required, even though the letters were upside down and mirrored.
        • even though the letters were upside down and mirrored

          Upside down and mirrored is the same as rotated 180 degrees ;)

    • by PPH ( 736903 )

      Is it a Dodge Viper or Daffy Duck [gregspradlin.com]?

      • I kept looking for Darkwing Duck and got moderately perplexed until I realized you said Daffy. Whoops.

    • You'd think that, but helicopter troops have to train extensively to fight disorientation in the case of a crash. They are trained to stay in place until they have their bearings and to stay anchored so they don't get lost on the way out of the helicopter. You may think it's hard to get lost in a helicopter, but apparently it's enough of a hazard to pay a heap of money to train people how to avoid it. Humans aren't always successful at identifying objects in situations of extreme disorientation.
      • You may think it's hard to get lost in a helicopter

        Difficult, yes. But there's a reason for this training. When a helicopter lands hard, there is a possibility the little stones in your ears can get knocked out of place. These help control your balance and orientation.

        If these stones aren't where they're supposed to be, something as simple as walking upright can be difficult. It would be similar to leaning over, putting your forehead on the top of a baseball bat, spinning around several time
    • All the current neural nets are just very sophisticated statistical analysers; there's no thinking going on whatsoever.

  • The software is doing exactly what you tell it to do. Maybe hire somebody competent to curate your training data.

    • by jrumney ( 197329 )

      Personally, I think any image recognition software that is unable to recognize rotated images is worthless. On the other hand, knowing that I can walk around with my head tilted to avoid pervasive facial recognition is useful information in modern society.

  • This is one of my biggest frustrations when viewing photos taken on an iPhone on a Windows 10 machine. Pics are 90 or 180 degrees off. Annoying as hell.
    • Directly from the phone over USB? I don't have that issue with stuff I pull down from iCloud.
    • That's funny, because the opposite problem happens here. The Windows Photos (Metro) application is one of the only ones that supports Exif rotation, so people view the file on their Win10 machine, where it looks correct, and then get all upset when some other piece of software is always "rotating" it.

      I just had to explain Exif data to someone this week and they really did not understand it at all.

      I have always thought of it as a stupid Mac problem, but with more and more pictures being taken on phones, and I assume all phones now ha

  • by scorp1us ( 235526 ) on Friday October 11, 2019 @01:32PM (#59296924) Journal

    The first time I wrote a an AI system to collect pictures from mobile phones (2014), the *first thing* I did after decrypting it was to apply the EXIF orientation and remove the tag, so there could be no possibility that anywhere down the line it could be displayed wrong. In Node.JS, I used jpeg-autorotate.

    This is a no-brainer. But in theory with enough samples, the AI would learn the various rotations of your hotdog in your hotdog/not-hotdog classifier.
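    For reference, the rough Python equivalent of that ingestion step (a sketch assuming Pillow; the poster used jpeg-autorotate under Node.JS): apply the orientation, strip the tag, and re-save so nothing downstream can display it wrong.

        from PIL import Image, ImageOps

        # Bake the Exif orientation into the pixels, then drop the tag so the
        # file can only be interpreted one way from here on.
        image = Image.open("upload.jpg")
        upright = ImageOps.exif_transpose(image)

        exif = upright.getexif()
        exif.pop(0x0112, None)              # 0x0112 is the Orientation tag
        upright.save("upload_fixed.jpg", exif=exif)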

    • the AI would learn the various rotations of your hotdog in your hotdog/not-hotdog classifier

      Sure, if you want to do it the hard way.

      Or you could use fractally connected convolution layers that don't care about orientation or size...

    • So defeat those police cameras by tilting your head as you walk past...

    • by Viol8 ( 599362 )

      "I wrote a an AI system"

      "In Node.JS"

      Nothing to see here, move along please...

  • by JoeDuncan ( 874519 ) on Friday October 11, 2019 @01:33PM (#59296926)

    And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data.

    And guess what happens if you're not an idiot, and use fractally connected convolution layers?

    You get scale and rotation invariance because you're not an idiot.

  • Why does my web browser do this on Slashdot?
     
    "but thatâ(TM)s not the case"
     
    It always seems to replace an apostrophe with gibberish on Slashdot.

    • It is not 'your' browser; it is the poster's browser that wants to use UTF-8 everywhere, and that poster proves they aren't a nerd by not knowing it will be a problem on Slashdot. The fact that the editor(s) didn't care to fix it in the summary is quite telling of the current state of affairs here.
    • Troll post? It's because Slashdot doesn't support Unicode (and it shouldn't, honestly), and morons with Apple devices try to post their angled apostrophes and curly quotation marks instead of the superior '.

      • Not a troll post. I just figured it was something on my end because why wouldn't the poster/editors fix it if they saw it too?
         
        At least now I know the problem is caused by the idiots making such posts and not me.

    • by ledow ( 319597 )

      Because the Slash code hasn't been updated in decades.

      SoylentNews, based on modern Slash code, no problem at all.

      This site, whatever setting, or keyboard, or language, or whatever it is, no matter what browser or computer... same crap.

      Don't even try to do things like UK pound signs: £

      Whoever owns Slashdot nowadays doesn't care any more than any of the previous owners.

  • by SuricouRaven ( 1897204 ) on Friday October 11, 2019 @01:35PM (#59296934)

    Rotating an image before encoding isn't exactly hard. It's even possible to rotate an encoded JPEG image in-memory losslessly - the maths works out nicely. The rotation field is a tool of lazy programmers.

    • by mccrew ( 62494 ) on Friday October 11, 2019 @01:44PM (#59296978)

      To software people, nothing is particularly hard. Just code, we say.

      Well, until you start adding limitations. Limited CPU, limited storage, required performance levels for "burst" sequences, and low cost to stay competitive in the marketplace. Noting the proper orientation and shifting the burden to the display device - which may have more CPU and RAM - IS a reasonable solution.

      • If your device includes a JPEG encoder, it has more than enough resources for rotation - an operation that requires both less memory and less processing time than encoding a JPEG. You can do it in four lines of C, if you do it before encoding.
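        As an illustration (sketched in Python rather than C, and assuming the whole frame is already in memory as an array; the reply below explains why camera firmware often can't afford that): rotate the pixels, then encode, and no orientation tag is needed.

            import numpy as np
            from PIL import Image

            # Rotate the raw pixel array into the upright position before encoding,
            # instead of writing an orientation tag for viewers to honour later.
            frame = np.asarray(Image.open("raw_frame.png"))        # stand-in for sensor output
            upright = np.ascontiguousarray(np.rot90(frame, k=-1))  # 90 degrees clockwise
            Image.fromarray(upright).save("encoded.jpg", quality=90)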

        • The data from image sensors is read in scanline order. The data arrives at the image processor in this fixed order, whether you hold the camera horizontally or vertically. Every computationally expensive step the processor needs to do to compress the image data into a JPG can be performed on the streaming data with a buffer that holds just a few scanlines. JPEG is a very old standard. Compression was (and still is) performed by an image processor with limited amounts of on-die RAM, because that stuff is exp
  • by the_skywise ( 189793 ) on Friday October 11, 2019 @01:36PM (#59296936)
    Ok really... "My python image library doesn't handle image rotation properly so I don't know what to doooooooo"
    I swear programming is getting worse and worse...
    • by whoda ( 569082 )

      Exactly. The story behind "anybody can #Learntocode" is teaching them how to copy and merge someone else's code. The majority of them don't actually know how to code.

  • by Improv ( 2467 ) <pgunn01@gmail.com> on Friday October 11, 2019 @01:44PM (#59296980) Homepage Journal

    You're already likely going to be making a bunch of other changes to your input images during ingestion. Contrast correction. Cropping. So on. If this comes up in your line of research and your library isn't invariant to rotation, deal with it at the same time. This is a silly thing to worry about, particularly because it's so trivial to work with.
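    A sketch of what "deal with it at the same time" can look like, assuming a Pillow-based ingestion function (the name and sizes here are made up for illustration):

        from PIL import Image, ImageOps
        import numpy as np

        def load_for_training(path, size=(224, 224)):
            # Hypothetical ingestion step: the orientation fix is just one line
            # alongside the contrast correction and cropping you do anyway.
            image = Image.open(path)
            image = ImageOps.exif_transpose(image)               # honour the orientation tag
            image = ImageOps.autocontrast(image.convert("RGB"))  # contrast correction
            image = ImageOps.fit(image, size)                    # crop/resize to the model input
            return np.asarray(image, dtype=np.float32) / 255.0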

    • Re: (Score:2, Interesting)

      by HiThere ( 15173 )

      It's not really all that simple. And it doesn't always happen at 90 degree increments. And frequently it's not 0 degrees WRT the z axis.

      OK, the *exact* problem that the article is about is (relatively) simple. But it's misstating the real problem.

      • by dissy ( 172727 )

        It's not really all that simple. And it doesn't always happen at 90 degree increments. And frequently it's not 0 degrees WRT the z axis.
        OK, the *exact* problem that the article is about is (relatively) simple. But it's misstating the real problem.

        The problem the article describes is that all the data needed to make those corrections is provided but being ignored.
        The solution would be to not ignore that data and actually use it, which sounds like a very simple solution.

        What is the "real problem" you refer to that is different from that?
        Is missing or incorrect EXIF data a thing?

        Wouldn't it still be massively better to apply the EXIF data when present so at least the vast majority of images are processed correctly?
        Missing/bad EXIF data would be a probl

  • And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data.

    Seems to me the detection fails because the detector is a joke. Let me guess - it was an AI neural net trained by carefully hand feeding it?

  • Bad data? (Score:4, Insightful)

    by thegarbz ( 1787294 ) on Friday October 11, 2019 @02:27PM (#59297208)

    How is training a model to detect a face regardless of orientation feeding it "bad data"? If you are building models that only recognise perfect upright faces... THAT is bad data and will produce a bad model.

    • It's a model that will likely do better in the far more common case of upright faces. Do you want to take the performance hit on the 99.9% case so you can do better on the 0.1%?

      Obviously that depends on your problem. It's not a bad model, just an engineering choice.

    • by jrumney ( 197329 )

      How useful is an algorithm that can detect faces that are rotated exactly 90, 180 and 270 degrees, but not 28 degrees? Recognition despite rotation needs to be built into the recognition algorithm, not the training data.

  • by fahrbot-bot ( 874524 ) on Friday October 11, 2019 @03:00PM (#59297374)
    My Exif data self-identifies as "Top, Left Side"
  • ... to avoid facial recognition.

  • With or without Exif, computer vision has still got a long way to go to mimic even the visual capabilities of a poodle.
  • If you're just taking a load of input data and pumping it into your training system without review, you really deserve all the bad results you get, like when your self-driving car swerves to run over cats or tweenagers. If you're training without vetting your input data, you deserve to go bankrupt or to prison.

    Software's going to eat the world, then soon after we'll all die of software errors because a critical mass of developers were idiots.

  • Well, depending on how you count, you could say there are more than three kinds of JPEG metadata. But some kinds are hard properties like lens data. I'm talking about the soft kinds, which are largely editable.

    Some people aren't aware of the non-Exif metadata, and some people use "Exif" as a synonym for "metadata", as if IPTC and XMP were part of the Exif stuff.

    Anyway, there's Exif, IPTC, and XMP. There's not much overlap between Exif and IPTC, but just about everything in those two has a counterpart in XMP. Ideally they s
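    For reference, a sketch of pulling the three kinds out of one file with Pillow (an assumed library; exact XMP access varies by Pillow version):

        from PIL import Image, IptcImagePlugin

        image = Image.open("photo.jpg")

        exif = image.getexif()                        # Exif: TIFF-style numeric tags
        iptc = IptcImagePlugin.getiptcinfo(image)     # IPTC records, or None if absent
        xmp = image.info.get("xmp")                   # raw XMP packet bytes, if present

        print(dict(exif))
        print(iptc)
        print(xmp[:60] if xmp else None)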
