The Dumb Reason Your Fancy Computer Vision App Isn't Working: Exif Orientation (medium.com) 64
Adam Geitgey: Exif metadata is not a native part of the Jpeg file format. It was an afterthought taken from the TIFF file format and tacked onto the Jpeg file format much later. This maintained backwards compatibility with old image viewers, but it meant that some programs never bothered to parse Exif data. Most Python libraries for working with image data like numpy, scipy, TensorFlow, Keras, etc, think of themselves as scientific tools for serious people who work with generic arrays of data. They don't concern themselves with consumer-level problems like automatic image rotation -- even though basically every image in the world captured with a modern camera needs it. This means that when you load an image with almost any Python library, you get the original, unrotated image data. And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data. You might think this problem is limited to Python scripts written by beginners and students, but that's not the case! Even Google's flagship Vision API demo doesn't handle Exif orientation correctly. And while Google Vision still manages to detect some of the animals in the sideways image, it detects them with a non-specific "Animal" label. This is because it is a lot harder for a model to detect a sideways goose than an upright goose.
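To see how bolted-on Exif really is, here is a pure-Python sketch that digs the Orientation tag (0x0112) out of a JPEG's APP1 segment by hand. It is deliberately simplified (one IFD0 scan, no thumbnail IFDs, no out-of-line values), a teaching sketch rather than production EXIF parsing:

```python
import struct

def exif_orientation(jpeg_bytes):
    """Best-effort read of the EXIF Orientation tag (0x0112).

    Returns the orientation value (1-8), or None if absent.
    Simplified: scans IFD0 only, assumes the value is inline.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":              # SOI marker
        return None
    i = 2
    while i + 4 <= len(jpeg_bytes):
        marker, size = struct.unpack(">HH", jpeg_bytes[i:i + 4])
        if marker == 0xFFE1 and jpeg_bytes[i + 4:i + 10] == b"Exif\x00\x00":
            tiff = jpeg_bytes[i + 10:i + 2 + size]  # embedded TIFF structure
            endian = "<" if tiff[:2] == b"II" else ">"
            ifd0 = struct.unpack(endian + "I", tiff[4:8])[0]
            count = struct.unpack(endian + "H", tiff[ifd0:ifd0 + 2])[0]
            for n in range(count):                  # 12-byte IFD entries
                entry = tiff[ifd0 + 2 + 12 * n: ifd0 + 14 + 12 * n]
                tag = struct.unpack(endian + "H", entry[:2])[0]
                if tag == 0x0112:                   # Orientation
                    return struct.unpack(endian + "H", entry[8:10])[0]
            return None
        if marker == 0xFFDA:                        # start of scan: EXIF must precede it
            return None
        if (marker & 0xFF00) != 0xFF00:             # not a valid marker: bail out
            return None
        i += 2 + size
    return None
```

The point the summary makes is visible right in the code: the orientation lives in a TIFF structure grafted into one optional JPEG segment, which is exactly why software that only decodes pixels never sees it.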
Subject (Score:1)
Re: (Score:1)
We ~can~ do it, but there's an increased cognitive load, so that means we have to work harder. For some people reading text upside-down is difficult enough that their solution is often to reorient their head to the text. We also need context to ~know~ that the image is upside-down, which is sometimes missing. That's why this video has such an effective reveal: https://www.youtube.com/watch?... [youtube.com]
Re: (Score:3)
There's no extra cognitive load recognising a dog from a different orientation. Now you may say that's because we've been trained to see dogs from all different viewpoints - but show me a dog I haven't seen before and I'll still recognise it if it's turned 90 degrees, and that's after seeing ONE picture, not the thousands it takes to train an ANN.
Re: (Score:1)
There's more load than you think, which is what makes these images so much fun:
https://www.boredpanda.com/dog... [boredpanda.com]
Also, keep in mind, you have like a billion times more processing power than these neural networks.
Keep in mind some animals, such as koala bears, have difficulty recognizing things out of context (they won't recognize food on a plate instead of on a branch). These things require a lot more processing power than you imply they do.
Re: (Score:2)
There's no extra cognitive load recognising a dog from a different orientation.
Yes there is. It is slower, and you miss details.
Here is an image [pinimg.com] both rotated and not rotated. Most people don't process it correctly.
thats after seeing ONE picture, not the thousands it takes to train an ANN.
Nonsense. You have seen millions of retina-imprints of dogs.
A proper comparison is with a newborn baby, who has really never seen a dog before.
The ANN doesn't deal well with rotated images because that was not part of its training set. Include rotated images in the training set, and it will recognize them.
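A toy illustration of that augmentation idea, in pure Python with a nested list standing in for an image (a sketch of the principle, not a training pipeline):

```python
def rot90cw(grid):
    """Rotate a 2-D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def rotation_augment(grid):
    """Yield the four 90-degree rotations of an image.

    Feeding all four to training is the cheap way to teach a
    network that orientation doesn't change the label.
    """
    for _ in range(4):
        yield grid
        grid = rot90cw(grid)
```

Real pipelines do this with arbitrary-angle rotations (plus flips, crops, and noise), but the 90-degree case is enough to show why a network never shown rotated examples has no reason to handle them.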
Re: Subject (Score:3)
Babies only need to see a dog once or twice to recognise Dog. Or teddy etc. They don't need to see it from every conceivable angle first. All parents know this.
Re: (Score:3)
From the second they're born and their eyes are open, babies are being fed images of their environment. They begin to register shapes. There are shapes that stay in place as baby is moved around and there are other shapes that move around and move in front of other shapes and shapes that turn and rotate such that baby can see all sides.
By the time a baby is six months old they've probably seen at least half a billion "training" images of their environment (assuming a rather slow 60 fps data rate).
Re: (Score:2)
Babies only need to see a dog once or twice to recognise Dog. Or teddy etc. They don't need to see it from every conceivable angle first. All parents know this.
Babies have many general and specific processing tasks designed in as "instinct". These processing tasks may require training, but the implementation lowers this requirement considerably.
Re: (Score:3)
No, not really. The main reason it had such an effective reveal is that humans recognize faces primarily by the eyes, which were drawn last, and were partially or completely cut off below the bottom of the screen for most of the time up until the reveal. If it had been drawn with the eyes first, and if the entire image had been visible the whole time, you would have known that it was a face long before the reveal.
Re: (Score:2)
Re: (Score:1)
even though the letters were upside down and mirrored
Upside down and mirrored is the same as rotated 180 degrees ;)
Re: (Score:3)
Is it a Dodge Viper or Daffy Duck [gregspradlin.com]?
Re: (Score:2)
I kept looking for Darkwing Duck and got moderately perplexed until I realized you said Daffy. Whoops.
Re: (Score:1)
Re: (Score:2)
Difficult, yes. But there's a reason for this training. When a helicopter lands hard, there is a possibility the little stones in your ears can get knocked out of place. These help control your balance and orientation.
If these stones aren't where they're supposed to be, something as simple as walking upright can be difficult. It would be similar to leaning over, putting your forehead on the top of a baseball bat, and spinning around several times.
Thats because its not AI (Score:3)
All the current neural nets are just very sophisticated statistical analysers; there's no thinking going on whatsoever.
That doesn't seem like a problem with the software (Score:3)
The software is doing exactly what you tell it to do. Maybe hire somebody competent to curate your training data.
Re: (Score:2)
Personally, I think any image recognition software that is unable to recognize rotated images is worthless. On the other hand, knowing that I can walk around with my head tilted to avoid pervasive facial recognition is useful information in modern society.
iPhone pics on Windoze (Score:2)
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
That's funny, because the opposite problem happens here. The Windows Photos (Metro) application is one of the only ones to support Exif rotation, so people view the file on their Win10 machine, it looks correct, and then they get all upset when some other software is always "rotating" it.
I just had to explain exif data to someone this week and they really did not understand it at all.
I have always thought it a stupid Mac problem, but with more and more pictures being taken on phones, and I assume all phones now ha
This is pretty basic stuff (Score:3)
The first time I wrote a an AI system to collect pictures from mobile phones (2014), the *first thing* I did after decrypting it was to apply the EXIF orientation and remove the tag, so there could be no possibility that anywhere down the line it could be displayed wrong. In Node.JS, I used jpeg-autorotate.
This is a no-brainer. But in theory with enough samples, the AI would learn the various rotations of your hotdog in your hotdog/not-hotdog classifier.
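The normalize-and-strip step that comment describes can be sketched in pure Python. The orientation-to-transform table below is the standard EXIF mapping for values 1-8, with nested lists standing in for pixel data; a real implementation would operate on the decoded image and then write it back out without the Orientation tag:

```python
# EXIF Orientation values 1-8 and the transforms that make the stored
# pixels upright. Ops apply left to right: "R" = rotate 90 degrees
# clockwise, "H" = flip horizontally.
CORRECTIONS = {
    1: [],                  # already upright
    2: ["H"],               # mirrored
    3: ["R", "R"],          # upside down
    4: ["R", "R", "H"],     # flipped vertically
    5: ["R", "H"],          # transposed
    6: ["R"],               # needs a clockwise quarter turn
    7: ["H", "R"],          # transverse
    8: ["R", "R", "R"],     # needs a counter-clockwise quarter turn
}

def rot90cw(grid):
    return [list(row) for row in zip(*grid[::-1])]

def flip_h(grid):
    return [row[::-1] for row in grid]

def normalize(grid, orientation):
    """Return the upright pixel grid for a given EXIF orientation."""
    for op in CORRECTIONS[orientation]:
        grid = rot90cw(grid) if op == "R" else flip_h(grid)
    return grid
```

Doing this once at ingestion, then dropping the tag, means nothing downstream can ever disagree about which way is up.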
Re: (Score:2)
the AI would learn the various rotations of your hotdog in your hotdog/not-hotdog classifier
Sure, if you want to do it the hard way.
Or you could use fractally connected convolution layers that don't care about orientation or size...
Re: (Score:2)
So defeat those police cameras by tilting your head as you walk past...
Re: (Score:2)
"I wrote a an AI system"
"In Node.JS"
Nothing to see here, move along please...
This is what convolution layers are for... (Score:5, Insightful)
And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data.
And guess what happens if you're not an idiot, and use fractally connected convolution layers?
You get scale and rotation invariance because you're not an idiot.
Off-topic question (Score:1)
Why does my web browser do this on Slashdot?
"but thatâ(TM)s not the case"
It always seems to replace an apostrophe with gibberish on Slashdot.
Re: (Score:2)
Re: (Score:2)
Troll post? It's because Slashdot doesn't support Unicode (and it shouldn't, honestly), and morons with Apple devices try to post their angled apostrophes and curly quotation marks instead of the superior '.
Re: (Score:1)
Not a troll post. I just figured it was something on my end because why wouldn't the poster/editors fix it if they saw it too?
At least now I know the problem is caused by the idiots making such posts and not me.
Re: (Score:1)
Yeah, except it's not things I type. I would actually pay money to have someone come take away any Apple devices that were left at my house.
Re: (Score:2)
Because the Slash code hasn't been updated in decades.
SoylentNews, based on modern Slash code, no problem at all.
This site, whatever setting, or keyboard, or language, or whatever it is, no matter what browser or computer... same crap.
Don't even try to do things like UK pound signs: £
Whoever owns Slashdot nowadays doesn't care any more than any of the previous owners.
The rotation was always pointless. (Score:3)
Rotating an image before encoding isn't exactly hard. It's even possible to rotate an encoded JPEG image in-memory losslessly - the maths works out nicely. The rotation field is a tool of lazy programmers.
Re:The rotation was always pointless. (Score:5, Insightful)
To software people, nothing is particularly hard. Just code, we say.
Well, until you start adding limitations. Limited CPU, limited storage, required performance levels for "burst" sequences, and low cost to stay competitive in the marketplace. Noting the proper orientation and shifting the burden to display device - which may have more CPU and RAM - IS a reasonable solution.
Re: (Score:2)
If your device includes a JPEG encoder, it has more than enough resources for rotation - an operation that requires both less memory and less processing time than encoding a JPEG. You can do it in four lines of C, if you do it before encoding.
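The rotation that comment alludes to really is just a few lines of index arithmetic. Here is a sketch in Python over a flat row-major pixel buffer; the same index math is what a C implementation would do before handing pixels to the encoder:

```python
def rotate90cw_flat(src, w, h):
    """Rotate a w x h row-major pixel buffer 90 degrees clockwise.

    The output is h x w: source pixel (x, y) lands at column
    (h - 1 - y) of row x in the destination.
    """
    dst = [0] * (w * h)
    for y in range(h):
        for x in range(w):
            dst[x * h + (h - 1 - y)] = src[y * w + x]
    return dst
```

One pass over the pixels, no extra working memory beyond the destination buffer: cheap next to the DCT and entropy coding the device is already doing to produce the JPEG.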
Re: (Score:2)
Facepalm (Score:3)
I swear programming is getting worse and worse...
Re: (Score:3)
Exactly. The story behind "anybody can #Learntocode" is teaching them how to copy and merge someone else's code. The majority of them don't actually know how to code.
If you're doing this, handle it in ingestion code (Score:4, Interesting)
You're already likely going to be making a bunch of other changes to your input images during ingestion. Contrast correction. Cropping. So on. If this comes up in your line of research and your library isn't invariant to rotation, deal with it at the same time. This is a silly thing to worry about, particularly because it's so trivial to work with.
Re: (Score:2, Interesting)
It's not really all that simple. And it doesn't always happen at 90 degree increments. And frequently it's not 0 degrees WRT the z-axis.
OK, the *exact* problem that the article is about is (relatively) simple. But it's misstating the real problem.
Re: (Score:2)
It's not really all that simple. And it doesn't always happen at 90 degree increments. And frequently it's not 0 degrees WRT the z-axis.
OK, the *exact* problem that the article is about is (relatively) simple. But it's misstating the real problem.
The problem the article describes is that all the data needed to make those corrections is provided but being ignored.
The solution would be to not ignore that data and actually use it, which sounds like a very simple solution.
What is the "real problem" you refer to that is different from that?
Is missing or incorrect EXIF data a thing?
Wouldn't it still be massively better to apply the EXIF data when present so at least the vast majority of images are processed correctly?
Missing/bad EXIF data would be a problem either way.
jhead -autorot *.jpg (Score:1)
Pathetic (Score:2)
And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data.
Seems to me the detection fails because the detector is a joke. Let me guess - it was an AI neural net trained by carefully hand feeding it?
Bad data? (Score:4, Insightful)
How is training a model to detect a face regardless of orientation feeding it "bad data"? If you are building models that only recognise perfect upright faces... THAT is bad data and will produce a bad model.
Re: (Score:3)
It's a model that will likely be better in by far the common case of upright faces. Do you want to take the performance hit for the 99.9% case so you can do better on the 0.1%?
Obviously that depends on your problem. It's not a bad model, just an engineering choice.
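If you do want to cover the rotated minority without retraining, one engineering option is to search orientations at inference time. A hedged sketch; `detector` is a hypothetical callable returning a confidence score for an image:

```python
def rot90cw(grid):
    """Rotate a 2-D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def detect_any_rotation(grid, detector):
    """Run a detector on all four 90-degree rotations, keep the best score.

    Rotation-robustness bolted on at inference time instead of baked
    into the model: 4x the compute, zero retraining.
    """
    best = float("-inf")
    for _ in range(4):
        best = max(best, detector(grid))
        grid = rot90cw(grid)
    return best
```

That trade (4x inference cost versus touching the model or the training set) is exactly the kind of engineering choice the comment above is talking about.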
Re: (Score:2)
How useful is an algorithm that can detect faces that are rotated exactly 90, 180 and 270 degrees, but not 28 degrees? Recognition despite rotation needs to be built into the recognition algorithm, not the training data.
Exif Orientation (Score:3)
I'm going to be looking landscape ... (Score:2)
... to avoid facial recognition.
BS (Score:2)
YOU MUST REVIEW YOUR TRAINING DATA. (Score:2)
If you're just taking a load of input data and pumping it into your training system without review, you really deserve all the bad results you get, like when your self-driving car swerves to run over cats or tweenagers. If you're training without vetting your input data, you deserve to go bankrupt or to prison.
Software's going to eat the world, then soon after we'll all die of software errors because a critical mass of developers were idiots.
Exif is only one of three types of jpeg metadata (Score:2)
Well, depending on how you count, you could say there's more than three kinds of jpeg metadata. But some kinds are hard properties like lens data. I'm talking about the soft kinds which are largely editable.
Some people aren't aware of the non-Exif metadata and some people use "Exif" as a synonym for "metadata" as if IPTC and XMP are part of the Exif stuff.
Anyway, there's Exif, IPTC, and XMP. There's not much overlap between Exif and IPTC, but about everything in those two has a counterpart in XMP. Ideally they should be kept in sync.