AI Technology

The Dumb Reason Your Fancy Computer Vision App Isn't Working: Exif Orientation (medium.com) 64

Adam Geitgey: Exif metadata is not a native part of the JPEG file format. It was an afterthought taken from the TIFF file format and tacked onto the JPEG file format much later. This maintained backwards compatibility with old image viewers, but it meant that some programs never bothered to parse Exif data. Most Python libraries for working with image data like numpy, scipy, TensorFlow, Keras, etc., think of themselves as scientific tools for serious people who work with generic arrays of data. They don't concern themselves with consumer-level problems like automatic image rotation -- even though basically every image in the world captured with a modern camera needs it. This means that when you load an image with almost any Python library, you get the original, unrotated image data. And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data. You might think this problem is limited to Python scripts written by beginners and students, but that's not the case! Even Google's flagship Vision API demo doesn't handle Exif orientation correctly. And while Google Vision still manages to detect some of the animals in the sideways image, it detects them with a non-specific "Animal" label. This is because it is a lot harder for a model to detect a sideways goose than an upright goose.
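For anyone who wants the practical fix the article is pointing at, here is a minimal Python sketch using Pillow (the file name and the final detector step are placeholders, not anything from the article): apply the Exif Orientation tag with ImageOps.exif_transpose before handing the pixels to a model.

    from PIL import Image, ImageOps
    import numpy as np

    # Open the photo and bake the Exif Orientation tag into the pixels.
    # Without exif_transpose() the array below would hold the raw,
    # possibly sideways or upside-down sensor data.
    image = Image.open("photo.jpg")          # placeholder path
    image = ImageOps.exif_transpose(image)   # rotate/flip per Exif, drop the tag
    pixels = np.asarray(image)               # now safe to hand to a detector

ImageOps.exif_transpose ships with recent Pillow releases; with older versions you would have to read the Orientation tag (0x0112) yourself and transpose accordingly.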
This discussion has been archived. No new comments can be posted.

  • The human brain (the real intelligence) doesn't need the orientation tag, which means this AI still isn't "intelligent" at all if it fails to recognize that something isn't properly rotated and rotate it. Then again, the human brain is perfectly capable of recognizing objects even when we see the world upside down.
    • We ~can~ do it, but there's an increased cognitive load, so that means we have to work harder. For some people reading text upside-down is difficult enough that their solution is often to reorient their head to the text. We also need context to ~know~ that the image is upside-down, which is sometimes missing. That's why this video has such an effective reveal: https://www.youtube.com/watch?... [youtube.com]

      • by Viol8 ( 599362 )

        There's no extra cognitive load recognising a dog from a different orientation. Now you may say that's because we've been trained to see dogs from all different viewpoints - but show me a dog I haven't seen before and I'll still recognise it if it's turned 90 degrees, and that's after seeing ONE picture, not the thousands it takes to train an ANN.

        • There's more load than you think, which is what makes these images so much fun:
          https://www.boredpanda.com/dog... [boredpanda.com]

          Also, keep in mind, you have like a billion times more processing power than these neural networks.

          Keep in mind some animals, such as koala bears, have difficulty recognizing things out of context (they won't recognize food on a plate instead of on a branch). These things require a lot more processing power than you imply they do.

        • There's no extra cognitive load recognising a dog from a different orientation.

          Yes there is. It is slower, and you miss details.

          Here is an image [pinimg.com] both rotated and not rotated. Most people don't process it correctly.

          that's after seeing ONE picture, not the thousands it takes to train an ANN.

          Nonsense. You have seen millions of retina-imprints of dogs.

          A proper comparison is with a newborn baby, who has really never seen a dog before.

          The ANN doesn't deal well with rotated images because that was not part of its training set. Include rotated images in the training set, and it will recognize them.

          • Babies only need to see a dog once or twice to recognise Dog. Or teddy etc. They don't need to see it from every conceivable angle first. All parents know this.

            • by shmlco ( 594907 )

              From the second they're born and their eyes are open, babies are being fed images of their environment. They begin to register shapes. There are shapes that stay in place as baby is moved around, there are other shapes that move around and move in front of other shapes, and there are shapes that turn and rotate such that baby can see all sides.

              By the time a baby is six months old they've probably seen at least half a billion "training" images of their environment. (Assuming a rather slow 60 fps data rate.) A five-

            • by Agripa ( 139780 )

              Babies only need to see a dog once or twice to recognise Dog. Or teddy etc. They don't need to see it from every conceivable angle first. All parents know this.

              Babies have many general and specific processing tasks designed in as "instinct". These processing tasks may require training, but the implementation lowers this requirement considerably.

      • by dgatwood ( 11270 )

        That's why this video has such an effective reveal:...

        No, not really. The main reason it had such an effective reveal is that humans recognize faces primarily by the eyes, which were drawn last, and were partially or completely cut off below the bottom of the screen for most of the time up until the reveal. If it had been drawn with the eyes first, and if the entire image had been visible the whole time, you would have known that it was a face long before the reveal.

      • by jbengt ( 874751 )
        In high school print shop (yes, I'm that old) we had to be able to read the composed type, which is a mirror image of the printed text. As long as you rotated it so each line read left-to-right, it was surprisingly easy to do - no training required even though the letters were upside down and mirrored.
        • even though the letters were upside down and mirrored

          Upside down and mirrored is the same as rotated 180 degrees ;)

    • by PPH ( 736903 )

      Is it a Dodge Viper or Daffy Duck [gregspradlin.com]?

      • I kept looking for Darkwing Duck and got moderately perplexed until I realized you said Daffy. Whoops.

    • You'd think that, but helicopter troops have to train extensively to fight disorientation in the case of a crash. They are trained to stay in place until they have their bearings and to stay anchored so they don't get lost on the way out of the helicopter. You may think it's hard to get lost in a helicopter but apparently it's enough of a hazard to pay a heap of money to train people how to avoid it. Humans aren't always successful at identifying objects in situations of extreme disorientation.
      • You may think it's hard to get lost in a helicopter

        Difficult, yes. But there's a reason for this training. When a helicopter lands hard, there is a possibility the little stones in your ears can get knocked out of place. These help control your balance and orientation.

        If these stones aren't where they're supposed to be, something as simple as walking upright can be difficult. It would be similar to leaning over, putting your forehead on the top of a baseball bat, spinning around several time
    • All the current neural nets are just very sophisticated statistical analysers, there's no thinking going on whatsoever.

  • The software is doing exactly what you tell it to do. Maybe hire somebody competent to curate your training data.

    • by jrumney ( 197329 )

      Personally, I think any image recognition software that is unable to recognize rotated images is worthless. On the other hand, knowing that I can walk around with my head tilted to avoid pervasive facial recognition is useful information in modern society.

  • This is one of my biggest frustrations when viewing photos taken on an iPhone on a Windows 10 machine. Pics are 90 or 180 degrees off. Annoying as hell.
    • Directly from the phone over USB? I don't have that issue with stuff I pull down from iCloud.
    • That's funny, because the opposite problem happens here. The Windows Photos (Metro) application is one of the only ones that supports Exif rotation, so people view the file on their Win10 machine, it looks correct, and then they get all upset when some other software is always "rotating" it.

      I just had to explain exif data to someone this week and they really did not understand it at all.

      I have always thought it a stupid Mac problem, but with more and more pictures being taken on phones, and I assume all phones now ha

  • by scorp1us ( 235526 ) on Friday October 11, 2019 @02:32PM (#59296924) Journal

    The first time I wrote a an AI system to collect pictures from mobile phones (2014), the *first thing* I did after decrypting it was to apply the EXIF orientation and remove the tag, so there could be no possibility that anywhere down the line it could be displayed wrong. In Node.JS, I used jpeg-autorotate.

    This is a no-brainer. But in theory with enough samples, the AI would learn the various rotations of your hotdog in your hotdog/not-hotdog classifier.

    • the AI would learn the various rotations of your hotdog in your hotdog/not-hotdog classifier

      Sure, if you want to do it the hard way.

      Or you could use fractally connected convolution layers that don't care about orientation or size...

    • So defeat those police cameras by tilting your head as you walk past...

    • by Viol8 ( 599362 )

      "I wrote a an AI system"

      "In Node.JS"

      Nothing to see here, move along please...
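    A rough Python equivalent of the ingest step described at the top of this thread (apply the Exif orientation, then make sure nothing downstream can misread it) might look like the sketch below. The paths and function name are made up, and unlike the Node.js jpeg-autorotate module mentioned above, Pillow re-encodes the JPEG rather than rotating it losslessly.

        from PIL import Image, ImageOps

        def normalize_orientation(src_path, dst_path):
            """Bake the Exif rotation into the pixels and save an upright copy."""
            with Image.open(src_path) as img:
                upright = ImageOps.exif_transpose(img)  # applies and clears Orientation
                upright.save(dst_path, quality=95)      # re-encoded; no tag left to misread

        normalize_orientation("upload_1234.jpg", "normalized/upload_1234.jpg")  # placeholder paths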

  • by JoeDuncan ( 874519 ) on Friday October 11, 2019 @02:33PM (#59296926)

    And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data.

    And guess what happens if you're not an idiot, and use fractally connected convolution layers?

    You get scale and rotation invariance because you're not an idiot.

  • Why does my web browser do this on Slashdot?
     
    "but thatâ(TM)s not the case"
     
    It always seems to replace an apostrophe with gibberish on Slashdot.

    • It is not 'your' browser, it is the poster's browser that wants to use UTF-8 everywhere, and that poster proves they aren't a nerd by not knowing it will be a problem on Slashdot. The fact that the editor(s) didn't care to fix it in the summary is quite telling of the current state of affairs here.
    • Troll post? It's because Slashdot doesn't support Unicode (and it shouldn't, honestly), and morons with Apple devices try to post their angled apostrophes and curly quotation marks instead of the superior '.

      • Not a troll post. I just figured it was something on my end because why wouldn't the poster/editors fix it if they saw it too?
         
        At least now I know the problem is caused by the idiots making such posts and not me.

    • by ledow ( 319597 )

      Because the Slash code hasn't been updated in decades.

      SoylentNews, based on modern Slash code, no problem at all.

      This site, whatever setting, or keyboard, or language, or whatever it is, no matter what browser or computer... same crap.

      Don't even try to do things like UK pound signs: £

      Whoever owns Slashdot nowadays doesn't care any more than any of the previous owners.

  • by SuricouRaven ( 1897204 ) on Friday October 11, 2019 @02:35PM (#59296934)

    Rotating an image before encoding isn't exactly hard. It's even possible to rotate an encoded JPEG image in-memory losslessly - the maths works out nicely. The rotation field is a tool of lazy programmers.

    • by mccrew ( 62494 ) on Friday October 11, 2019 @02:44PM (#59296978)

      To software people, nothing is particularly hard. Just code, we say.

      Well, until you start adding limitations. Limited CPU, limited storage, required performance levels for "burst" sequences, and low cost to stay competitive in the marketplace. Noting the proper orientation and shifting the burden to the display device - which may have more CPU and RAM - IS a reasonable solution.

      • If your device includes a JPEG encoder, it has more than enough resources for rotation - an operation that requires both less memory and less processing time than encoding a JPEG. You can do it in four lines of C, if you do it before encoding.

        • The data from image sensors is read in scanline order. The data arrives at the image processor in this fixed order, whether you hold the camera horizontally or vertically. Every computationally expensive step the processor needs to do to compress the image data into a JPG can be performed on the streaming data with a buffer that holds just a few scanlines. JPEG is a very old standard. Compression was (and still is) performed by an image processor with limited amounts of on-die RAM, because that stuff is exp
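    For what it's worth, the "rotate before encoding" step really is trivial once you have decoded pixels in memory; a toy numpy sketch (the array is just a stand-in for sensor output, and it sidesteps the streaming-scanline constraint the comment above raises):

        import numpy as np

        # Stand-in for one decoded frame from the sensor (rows x cols x RGB).
        frame = np.zeros((3000, 4000, 3), dtype=np.uint8)

        # Rotate 90 degrees before JPEG encoding; use k=3 for the other direction.
        upright = np.rot90(frame, k=1)

    Whether a camera's image pipeline can afford to buffer a whole frame to do this is exactly the trade-off being argued above.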
  • by the_skywise ( 189793 ) on Friday October 11, 2019 @02:36PM (#59296936)
    Ok really... "My python image library doesn't handle image rotation properly so I don't know what to doooooooo"
    I swear programming is getting worse and worse...
    • by whoda ( 569082 )

      Exactly. The story behind "anybody can #Learntocode" is teaching them how to copy and merge someone else's code. The majority of them don't actually know how to code.

  • by Improv ( 2467 ) <pgunn01@gmail.com> on Friday October 11, 2019 @02:44PM (#59296980) Homepage Journal

    You're already likely going to be making a bunch of other changes to your input images during ingestion. Contrast correction. Cropping. So on. If this comes up in your line of research and your library isn't invariant to rotation, deal with it at the same time. This is a silly thing to worry about, particularly because it's so trivial to work with.

    • Re: (Score:2, Interesting)

      by HiThere ( 15173 )

      It's not really all that simple. And it doesn't always happen at 90 degree increments. And frequently it's not 0 degrees WRT the z axis.

      OK, the *exact* problem that the article is about is (relatively) simple. But it's misstating the real problem.

      • by dissy ( 172727 )

        It's not really all that simple. And it doesn't always happen at 90 degree increments. And frequently it's not 0 degrees WRT the z axis.
        OK, the *exact* problem that the article is about is (relatively) simple. But it's misstating the real problem.

        The problem the article describes is that all the data needed to make those corrections is provided but being ignored.
        The solution would be to not ignore that data and actually use it, which sounds like a very simple solution.

        What is the "real problem" you refer to that is different from that?
        Is missing or incorrect EXIF data a thing?

        Wouldn't it still be massively better to apply the EXIF data when present so at least the vast majority of images are processed correctly?
        Missing/bad EXIF data would be a probl

  • And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data.

    Seems to me the detection fails because the detector is a joke. Let me guess - it was an AI neural net trained by carefully hand feeding it?

  • Bad data? (Score:4, Insightful)

    by thegarbz ( 1787294 ) on Friday October 11, 2019 @03:27PM (#59297208)

    How is training a model to detect a face regardless of orientation feeding it "bad data"? If you are building models that only recognise perfect upright faces... THAT is bad data and will produce a bad model.

    • It's a model that will likely be better in by far the most common case of upright faces. Do you want to take the performance hit for the 99.9% case so you can do better on the 0.1%?

      Obviously that depends on your problem. It's not a bad model, just an engineering choice.

    • by jrumney ( 197329 )

      How useful is an algorithm that can detect faces that are rotated exactly 90, 180 and 270 degrees, but not 28 degrees? Recognition despite rotation needs to be built into the recognition algorithm, not the training data.
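    If you do want to attack this from the training-data side, as debated in this thread, the usual approach is random rotation augmentation rather than hand-rotating images. A sketch with TensorFlow/Keras (TF 2.x assumed; the batch is random stand-in data, not a real dataset):

        import tensorflow as tf

        # Augmentation pipeline: every training batch gets a random tilt and flip,
        # so the model also sees faces that are not perfectly upright.
        augment = tf.keras.Sequential([
            tf.keras.layers.RandomRotation(factor=0.25),  # up to +/- 90 degrees
            tf.keras.layers.RandomFlip("horizontal"),
        ])

        images = tf.random.uniform((8, 224, 224, 3))   # stand-in batch of 8 RGB images
        augmented = augment(images, training=True)     # randomized only in training mode

    Because the rotation angle is continuous, this covers the 28-degree case as well as the 90/180/270 ones.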

  • by fahrbot-bot ( 874524 ) on Friday October 11, 2019 @04:00PM (#59297374)
    My Exif data self-identifies as "Top, Left Side"
  • ... to avoid facial recognition.

  • With or without Exif, computer vision has still got a long way to go to mimic even the visual capabilities of a poodle.
  • If you're just taking a load of input data and pumping it into your training system without review, you really deserve all the bad results you get, like when your self-driving car swerves to run over cats or tweenagers. If you're training without vetting your input data, you deserve to go bankrupt or to prison.

    Software's going to eat the world, then soon after we'll all die of software errors because a critical mass of developers were idiots.

  • Well, depending on how you count, you could say there are more than three kinds of JPEG metadata. But some kinds are hard properties like lens data. I'm talking about the soft kinds, which are largely editable.

    Some people aren't aware of the non-Exif metadata and some people use "Exif" as a synonym for "metadata" as if IPTC and XMP are part of the Exif stuff.

    Anyway, there's Exif, IPTC, and XMP. There's not much overlap between Exif and IPTC but about everything in those two has a counterpart in XMP. Ideally they s
