An Advance In Image Recognition Software

An Advance In Image Recognition Software 81

Posted by kdawson on Saturday May 24, 2008 @08:45PM from the needle-in-a-haystack dept.

Roland Piquepaille alerts us to work by US and Israeli researchers who have developed software that can identify the subject of an image characterized using only 256 to 1024 bits of data. The researchers said this "could lead to great advances in the automated identification of online images and, ultimately, provide a basis for computers to see like humans do." As an example, they've picked up about 13 million images from the Web and stored them in a searchable database of just 600 MB, making it possible to search for similar pictures through millions of images in less than a second on a typical PC. The lead researcher, MIT's Antonio Torralba, will be presenting the research next month at a conference on Computer Vision and Pattern Recognition.

An Advance In Image Recognition Software

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 81 Comments Log In/Create an Account

Comments Filter:

There goes the neighborhood (Score:4, Funny)

by HeavensBlade23 ( 946140 ) writes: on Saturday May 24, 2008 @08:48PM (#23532724)

This will be used to break CAPTCHA-type schemes even worse than they already are.

- Re: (Score:2)
  
  by somersault ( 912633 ) writes:
  
  I had thought it would be more along the lines of categorising pictures into 'PORN' and 'NOT PORN'. Or possibly even 'PORN', 'GAY PORN', 'SHEEP PORN' and 'NOT PORN' if it's really advanced.
- Re:There goes the neighborhood (Score:5, Insightful)
  
  by Jeremiah Cornelius ( 137 ) * writes: on Saturday May 24, 2008 @08:57PM (#23532792) Homepage Journal
  
  This will be used to identify YOU, citizen.
  
  - Re:There goes the neighborhood (Score:5, Funny)
    
    by dbIII ( 701233 ) writes: on Saturday May 24, 2008 @09:41PM (#23533048)
    
    Looks like I'll have to stop attaching my low bit facial image as a signature.
    
    - Re: (Score:1)
      
      by tsjaikdus ( 940791 ) writes:
      
      Looks like I'll have to stop attaching my low bit facial image as a signature.
      No, it doesn't work with strange and unusual images.
  - Re: (Score:1)
    
    by slamden ( 104718 ) writes:
    
    There was a very interesting article [rollingstone.com] in Rolling Stone recently regarding China's use of such technology in their ermerging hyper-surveillance society.
- Re:There goes the neighborhood (Score:5, Informative)
  
  by elnico ( 1290430 ) writes: on Saturday May 24, 2008 @09:22PM (#23532926)
  
  Incorrect. First of all, in a CAPTCHA, you're trying to very rigorously inspect a single image. This advance seems to be more about taking quick glances at lots of images. Furthermore, in the article, they talk about recognizing flowers and cars. The fact is, computers already have no problem recognizing letters and numbers in images. We got that down a long time ago. The difficult things about reading a CAPTCHA image are removing distortion and splitting the whole image into the component characters. If you read the article, you'd see that this research has nothing to do with that.
  
  - Re: (Score:1)
    
    by wertigon ( 1204486 ) writes:
    
    So in other words it just killed the Kitten... Auth?
  - Re: (Score:2)
    
    by EMeta ( 860558 ) writes:
    
    Yes, but as you say, letters and numbers are getting trivial, so we will need captchas outside of that. "Click the pictures with people out of the following set of 20 pictures" could have been a useful method.
- - - Re: (Score:1)
      
      by iminplaya ( 723125 ) writes:
      
      Works better if you have good smoke
      - Re: (Score:1, Offtopic)
        
        by Jeremiah Cornelius ( 137 ) * writes:
        
        Jump down, turn around,
        Pick a bale of hay.
tests (Score:1, Funny)

by nawcom ( 941663 ) writes:

thank god, now i can get some assistance when i'm taking one of those "real tits or fake tits" online quizes. fapfapafap.
- Re: (Score:1)
  
  by damaki ( 997243 ) writes:
  
  I have never had the hang on this online stuff. I'm too touchy.
Like every other "advance" in image recognition... (Score:3, Insightful)

by ZxCv ( 6138 ) writes: on Saturday May 24, 2008 @08:55PM (#23532786) Homepage

...I'll believe it when I see it.

Until then, it's snake oil, as far as I'm concerned.

- Re:Like every other "advance" in image recognition (Score:5, Funny)
  
  by Jeremiah Cornelius ( 137 ) * writes: on Saturday May 24, 2008 @09:03PM (#23532822) Homepage Journal
  
  If I read you correctly - and I think I do... You mean to say that snake oil is somehow... invisible?
  
  No wonder those snakes are not only so quiet, but I never even see 'em coming!
  
  Geez. We don't stand a chance.
  
  - Re: (Score:1)
    
    by BlueshiftVFX ( 1158033 ) writes:
    
    No, I think the parent is making a joke, something about recognizing the advance in recognition, when the advances become recognizably advanced or apparent... I reckon.
- Re:Like every other "advance" in image recognition (Score:5, Funny)
  
  by Yvan256 ( 722131 ) writes: on Saturday May 24, 2008 @09:41PM (#23533046) Homepage Journal
  
  Or rather... I'll believe it when it sees me!
  
- Re:Like every other "advance" in image recognition (Score:3, Informative)
  
  by Spikeman56 ( 543509 ) writes:
  
  The actual paper is at http://people.csail.mit.edu/torralba/publications/nipsRecognitionBySceneAlignment.pdf [mit.edu]
  
  From what I can tell, it's basically, "blur the image down to only a few hundred pixels and then you have less data to comb through!"
  - Re: (Score:1)
    
    by clichescreenname ( 1220316 ) writes:
    
    The PDF linked to is named "nipsrecognitionbysciencealignment.pdf"
    
    So... the are using it to identify nipples?
- Re:Like every other "advance" in image recognition (Score:3, Interesting)
  
  by Concerned Onlooker ( 473481 ) writes:
  
  Yep, I second that. The article is really short on details. Not surprising since they're presenting it at a conference next month. We don't even know what kind of features they are extracting from the images. Are they using wavelets? Texture descriptors? Color information? Shape recognition? It sounds like a combination of true content based image recognition with keyword input association if I read the article correctly.
  
  If they are claiming to have a general image recognition algorithm that'd be
  - Re:Like every other "advance" in image recognition (Score:5, Informative)
    
    by Hays ( 409837 ) writes: on Saturday May 24, 2008 @11:27PM (#23533361)
    
    Read the papers then
    http://people.csail.mit.edu/torralba/tinyimages/ [mit.edu]
    
    - Re: (Score:1)
      
      by clichescreenname ( 1220316 ) writes:
      
      Read the papers then
      http://people.csail.mit.edu/torralba/tinyimages/ [mit.edu]
      Anyone actually look through any of these? I've noticed some... imperfections. More instance, under "greenweed" you can find a picture of a dog on a map of the US. There's also a few pictures that are actually green weeds, but not enough to counter out these two exceptions.
- Another Roland Piquepaille story on Slashdot (Score:2)
  
  by Futurepower(R) ( 558542 ) writes:
  
  Another Roland Piquepaille story on Slashdot. He is paid to get stories on the internet. Does he pay Slashdot?
- Re: (Score:1)
  
  by linhares ( 1241614 ) writes:
  
  ...I'll believe it when I see it.
  Exactly. This is a routine rite of passage for all Pattern Recognition researchers: when they need to justify more funding, they claim to be "just around the corner", hence we have this type of hyperbole. The paper is ridiculously simplistic, and does not deserve /. attention. Before we have this tech, we will have to solve Bongard problems [scribd.com]. Machines can't even distinguish the content between two sets of B&W (binary) images with simple figures. Solve Bongard problems, and you'll be just around t
Oh thats really simple to do... (Score:2)

by 3seas ( 184403 ) writes:

.... the answer is always "a picture"
Of course it helps if you read the papers... (Score:4, Informative)

by Steve Mitchell ( 3457 ) writes: <steve@NOSPAM.componica.com> on Saturday May 24, 2008 @09:08PM (#23532852) Homepage

I hate reading press releases of reading papers with real explanations of what's going on.

I just finished reading "Small Codes and Large Image Databases for Recognition" written by the guy. All he did was implemented Geoff Hinton's idea of databasing images with a binarized coefficients produced by Restricted Boltzmann Machines.

Hinton himself gave a talk on it for Google here:
http://www.youtube.com/watch?v=AyzOUbkUf3M [youtube.com]

Actually I'm wondering, is he plagiarizing Hinton?

- Re: (Score:3, Insightful)
  
  by samkass ( 174571 ) writes:
  
  I think plagiarizing is a strong word to throw around. And particular implementations of general approaches can often be very interesting when one considers what tradeoffs are made transferring pure theory to practical applications. If this sort of thing were attempted in the 90's, they'd probably arbitrarily pick a few hundred features by hand and KL-transform it down to the most significant dimensions and hash those into one of these codes. Since I've been out of "the biz" for awhile, it's pretty inter
- Re:Of course it helps if you read the papers... (Score:4, Insightful)
  
  by Hays ( 409837 ) writes: on Saturday May 24, 2008 @11:17PM (#23533305)
  
  Jeff Hinton worked with them, you really think they're plagiarizing him? That claim doesn't even make sense, this is a novel research domain. A big part of science is taking people's ideas, reproducing them, and applying them to novel domains. That's how it's SUPPOSED to work.
  
  This research involves the use of one of the largest image databases seen in computer vision. It shows that you can do extremely rapid scene matching for databases of this scale. No, that's not obvious no matter what you think. This image data is fairly high dimensional.
  
  This research says something about the space of likely scenes and it might be a key enabling technology to a lot of the heavily data driven computer vision and computer graphics approaches popping up lately.
  
  - Re: (Score:2)
    
    by Steve Mitchell ( 3457 ) writes:
    
    My mistake...I just saw the video where Hinton was suggesting to Google, tongue-in-cheek, to use his RBM bottleneck trick as a method of databasing and then to see this guy's paper mentioning the very same thing a few weeks later.
    
    It was a skim to see what the hell the article was really about, didn't know these two were connected. I jumped the gun 'cause I got burned by a plagiarizer in the past, sorry.
- Re: (Score:1)
  
  by atamagabakkaomae ( 1241604 ) writes:
  
  Agree with parent, before reading the actual paper the press release is kinda useless.
  
  Also the example shown in the article does not really make sense for me. I mean of course if we look at a blurred and rotated object in a series of images it is hard to discern. Also we might think the object is not the same in the different images, although it actually is. But the question is, does that mean that the algorithm can reliably determine the identity of an object even if a human viewer cannot?
  
  Actually I doubt
Oh really? (Score:2)

by Whuffo ( 1043790 ) writes:

They're going to distinguish an individual based on images with 256 to 1024 bits of data?
I guess nobody there thought to do the math before making these claims. This story probably shouldn't have made it to the front page; it's less than useful.
- Re: (Score:3, Insightful)
  
  by elnico ( 1290430 ) writes:
  
  "They're going to distinguish an individual based on images with 256 to 1024 bits of data?"
  
  No one said they were going to identify individual people with this. The main gist of this research seems to be efficiency (in both space and time, if I read it correctly). For instance, if one wanted to identify every face in a picture of a crowd, they could apply this algorithm to a low-res version of the image to quickly find the locations of every "face," and then use a more advanced face recognition algorithm t
- Re: (Score:1)
  
  by hilather ( 1079603 ) writes:
  
  2^1024 is 1.797693134862315907729305190789e+308. I think you could do a lot with a number that big.
not so much an advance (Score:2)

by nguy ( 1207026 ) writes:

This is not so much an "advance", and more a demonstration that some image recognition problems can be solved with fairly simple, well known methods and a lot of data.
I can see it now (Score:1)

by LuxMaker ( 996734 ) writes:

all the Obama halloween masks setting off false positives from sea to shining sea.
- Re: (Score:2)
  
  by rts008 ( 812749 ) writes:
  
  I only have a Tricky Dick maskhttp://en.wikipedia.org/wiki/Richard_Nixon [wikipedia.org], you insensitive clod!
  
  Get off my lawn, you young whippersnapper!
But can it tell the difference between.... (Score:2)

by 3seas ( 184403 ) writes:

a fake ufo picture and a real one?

How will spammers make use of this? Well just make that viagra pill be reflected in a coke bottle.

anyone for a random bit generator to see what random results gets labeled?

Wonder what fractals might produce?

moral of the question: we can always break what we make.
Search Jenna Jameson? (Score:5, Funny)

by Bayoudegradeable ( 1003768 ) writes: on Saturday May 24, 2008 @09:23PM (#23532936)

Oh my, the soon to be most searched "name" on the web is... Jenna Jameson! Wait a minute, I think I misunderstood "facial" recognition...

i'm a little concerned about the licensing. (Score:1)

by notgm ( 1069012 ) writes:

if they grabbed 13 million images from the net, there's a good chance that many of them are copyrighted. if they are using those copyrighted images in their (presumably FOR SALE) software, wouldn't that require some serious licensing fees, even if it's an internal-you-never-see-the-pictures usage, since it's a part of their algorithm, or what-have-you?

for the record, i say this as a concerned/curious artist who isn't looking for a payout.
- Re: (Score:1)
  
  by edschurr ( 999028 ) writes:
  
  It's not for sale, but rather "publically available". Anyhow, I'd bet it isn't a copyright infringement. Images are reduced to under 1 KB, making them less like an image and more like a statistic.
- Link to the source code (Score:2)
  
  by mangu ( 126918 ) writes:
  
  their (presumably FOR SALE) software
  
  As I understood it, it's not for sale, you can get it at his MIT website [mit.edu]
- Re: (Score:2)
  
  by moderatorrater ( 1095745 ) writes:
  
  Kind of like how the brain uses heuristics and previous knowledge to recognize images instead of actually solving the problem (whatever that means)?
- Re: (Score:2)
  
  by Hays ( 409837 ) writes:
  
  Yeah, what a trivial test environment, their "All pictures on Flickr" database. That's so narrow.
  - Since 8% of pix on flickr... (Score:1)
    
    by A New Normalcy ( 1190543 ) writes:
    
    ...are of Phuket.
- Re: (Score:2)
  
  by Concerned Onlooker ( 473481 ) writes:
  
  Well, no. As Hays pointed out they have a really huge database of images, plus by limiting severely how many bits it takes to represent an image they've made it possible to do the search in real time, all in RAM, with a modest investment in computing power. In addition, it's not simply pixel matching. The images are arranged according to a semantic tree of English nouns so, if I'm interpreting it correctly, you will end up with images next to each other on the map that are also semantically similar. Thi
Scary. (Score:2)

by drolli ( 522659 ) writes:

Given this works under real world conditions, it would make it possible that every shop gets the list of faces of criminals or other people which "you dont want" in your shop, recognize them in real time using little investment only and throw them out. Or the possibility to track a person on traffic surveillance cameras. That is pretty freaky. Politicians, please make laws which restrict such databases. I don't like the idea of beiing escorted out of the shopping mall because of my credit rating. Or that so
- Re: (Score:1)
  
  by psued0ch ( 1200431 ) writes:
  
  We are already past that stage, chronologically and comparatively.
  - Re: (Score:2)
    
    by Peaker ( 72084 ) writes:
    
    Really? Which people you know have mysteriously disappeared lately?
Very cool stuff... (Score:2, Interesting)

by Pedrito ( 94783 ) writes:

This actually solves a problem I've been stumped on for a while. I need a way to search for similar images such that images that are similar have a searchable value with an inherent "nearness" quality.

That is, there are a number of image similarity algorithms, but the computed values of two similar images are not necessarily mathematically near to each other. This algorithm produces values that are, which can make searching for similar images among very many images, quite fast.
- Re:Very cool stuff... (Score:4, Informative)
  
  by Hays ( 409837 ) writes: on Saturday May 24, 2008 @11:26PM (#23533353)
  
  What you're asking for is ill-defined, but much sought after.
  
  A reasonable descriptor which produces distances that seem somewhat correlated with human perception would indeed be Antonio Torralba and Aude Oliva's gist descriptor.
  
  http://people.csail.mit.edu/torralba/code/spatialenvelope/ [mit.edu]
  
  It's become quite popular in computer vision and computer graphics for scene matching.
  
Hmmm.... (Score:1)

by thetoadwarrior ( 1268702 ) writes:

I hope they didn't use any of my pictures because I sure as hell didn't give them permission to use my images for something like that and I clearly state that my own work is my property on my website for my use only.

While I don't care about most uses about my images (go a head and PS a penis in my mouth) but I would fight it if I found out it was used for this.
Think it through (Score:2)

by Woundweavr ( 37873 ) writes:

Even if this was perfectly efficient, I'm pretty sure there's more than 256 * 1024 things you could have an image of out there. The amount of information this analysis could give just can't be very useful.

Thats not meant to disparage the work - image recognition is important and difficult. This particular 'advance' just isn't that 'advancing'
- Re: (Score:1)
  
  by Jere_Jones ( 1095681 ) writes:
  
  You might have misread the numbers. They said 256 'bits'. 256 bits can distinguish roughly 1.1 * 10^77 states. That's a LOT. 1024 bits can distinguish roughly 1.8 * 10^308 states. I don't think there are that many atoms in the universe. Jere
  - Re: (Score:1)
    
    by Rockoon ( 1252108 ) writes:
    
    Your first number, 10^77, is approximately the number of hydrogen atoms estimated to be in the universe.
    
    Your second number is essentialy the number of hydrogen atoms in 10^231 universes, all similar to our own.
    
    In english, 2^1024 is approximately equal to the number of hydrogen atoms in a googol googol universes. Imagine if you will, a replica universe for every hydrogen atom in our universe. Now imagine a replica universe for each of the hydrogen atoms in those replica universes.
    
    We are still about 10^
    - Re: (Score:2)
      
      by fyngyrz ( 762201 ) * writes:
      
      It still isn't big enough to do even a half decent job.
      
      Imagine that a bit has a coherent meaning, such as "image contains a kitten." If the bit is zero, there is no kitten in the image; if it is one, there is at least one kitten in the image.
      Now imagine that system extended for all types of animals. Going to go past a requirement for 1024 bits pretty fast, isn't it?
      Now imagine that a bit represents "image contains a keyboard" in the same fashion.
      Now imagine a bit for every type of macro and mic
      - Re: (Score:1)
        
        by Rockoon ( 1252108 ) writes:
        
        Imagine that a bit has a coherent meaning, such as "image contains a kitten." If the bit is zero, there is no kitten in the image; if it is one, there is at least one kitten in the image.
        Why would I imagine that?
        
        The most trivial example that busts your theory is the game "20 questions." Each answer is yes/no, a single bit of data mapping a 20 bit value to 1 million arbitrary things. There is no bit assigned to "this is a cat", each bit is taken in context with all of the other bits.
        
        A more complex example is Arithmetic Compression, which assigns fixed-length input codes to variable-length output codes. The shortest output code length is much smaller than a single bit (any symbol with
        
        Re: (Score:2)
        
        by fyngyrz ( 762201 ) * writes:
        
        You should imagine it because you are conflating the number of bits required to count things with the number of bits required to discriminate among things. The two are entirely disjoint.
        
        Re: (Score:1)
        
        by Rockoon ( 1252108 ) writes:
        
        Minimal Perfect Hashing says differently, and I didnt "conflate" anything.
        
        I simply expressed 2^1024 in understandable terms, keeping on point with the person I replied to who was trying to compare his numbers with the number of atoms in the universe.
        
        Also, you apparently didn't read the article because you attribute qualities to the technique which the article specifically denies. It does not say that it will descriminate against individual people (such as "one bit that says 'image contains pesson Rockoo
I forsee nefarious law enforcement uses (Score:2)

by LM741N ( 258038 ) writes:

But then again, rotating an image 90 degrees might defeat the whole system. Scientists are so busy being sophisticated and racing to write journal articles that they often miss the obvious.
- Re: (Score:3, Insightful)
  
  by ceoyoyo ( 59147 ) writes:
  
  Any decent object recognition algorithm supports at least affine transformations, which include rotation.
  
  Some of those scientists are actually pretty smrt.
  - Re: (Score:2)
    
    by 4D6963 ( 933028 ) writes:
    
    Any decent object recognition algorithm supports at least affine transformations, which include rotation.
    Which always made me wonder, how do they go about doing that? Do they perform cross-correlation for every variant of each support affine transform? Or is it something completely different?
    - Re:I forsee nefarious law enforcement uses (Score:5, Informative)
      
      by ceoyoyo ( 59147 ) writes: on Sunday May 25, 2008 @03:58PM (#23537837)
      
      There are all kinds of ways, but two simple ones come to mind. If you convert to a polar coordinate system the power spectrum is conveniently orientation independent. You can use the same trick with a shift: the power spectrum of a Cartesian coordinate system is shift independent.
      
      Another way is to somehow identify the orientation. An simple way to do that is to find the axis along which there's maximum variation and rotate until those axes match in both images.
      
      Pixel by pixel co-registration basically does look at a similarity measure for a lot of variations on the affine transform. You generally don't have to look at them all though: you use an iterative algorithm with a clever optimization strategy so your transform gets better and better instead of searching through the parameter space randomly.
      
Typical PC (Score:2)

by Nom du Keyboard ( 633989 ) writes:

search for similar pictures through millions of images in less than a second on a typical PC.

Of course that typical PC is a dual quad-core machine running at 3GHz with 8GB of memory, GPU X3 running offloading co-processing software, and 1TB of hard drive space.
Mpff (Score:1)

by ArAgost ( 853804 ) writes:

This technology has been around for centuries! 256 bits to describe the content of an image? It's called title. 1024 bits to describe the content of an image? It's called a caption.
Opensource project to identify similar images (Score:2)

by chrysalis ( 50680 ) writes:

Check out libpuzzle : http://libpuzzle.pureftpd.org/project/libpuzzle [pureftpd.org]

It's also designed to quickly find similar images, even out of millions of images. The documentation describes a possible indexation technique (as suggested in the original paper):
http://download.pureftpd.org/pub/pure-ftpd/misc/libpuzzle/doc/README [pureftpd.org]

Images are stored as 544-bits signatures by default.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

There goes the neighborhood (Score:4, Funny)

Re: (Score:2)

Re:There goes the neighborhood (Score:5, Insightful)

Re:There goes the neighborhood (Score:5, Funny)

Re: (Score:1)

Re: (Score:1)

Re:There goes the neighborhood (Score:5, Informative)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:1, Offtopic)

tests (Score:1, Funny)

Re: (Score:1)

Like every other "advance" in image recognition... (Score:3, Insightful)

Re:Like every other "advance" in image recognition (Score:5, Funny)

Re: (Score:1)

Re:Like every other "advance" in image recognition (Score:5, Funny)

Re:Like every other "advance" in image recognition (Score:3, Informative)

Re: (Score:1)

Re:Like every other "advance" in image recognition (Score:3, Interesting)

Re:Like every other "advance" in image recognition (Score:5, Informative)

Re: (Score:1)

Another Roland Piquepaille story on Slashdot (Score:2)

Re: (Score:1)

Oh thats really simple to do... (Score:2)

Of course it helps if you read the papers... (Score:4, Informative)

Re: (Score:3, Insightful)

Re:Of course it helps if you read the papers... (Score:4, Insightful)

Re: (Score:2)

Re: (Score:1)

Oh really? (Score:2)

Re: (Score:3, Insightful)

Re: (Score:1)

not so much an advance (Score:2)

I can see it now (Score:1)

Re: (Score:2)

But can it tell the difference between.... (Score:2)

Search Jenna Jameson? (Score:5, Funny)

i'm a little concerned about the licensing. (Score:1)

Re: (Score:1)

Link to the source code (Score:2)

Re: (Score:2)

Re: (Score:2)

Since 8% of pix on flickr... (Score:1)

Re: (Score:2)

Scary. (Score:2)

Re: (Score:1)

Re: (Score:2)

Very cool stuff... (Score:2, Interesting)

Re:Very cool stuff... (Score:4, Informative)

Hmmm.... (Score:1)

Think it through (Score:2)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

I forsee nefarious law enforcement uses (Score:2)

Re: (Score:3, Insightful)

Re: (Score:2)

Re:I forsee nefarious law enforcement uses (Score:5, Informative)

Typical PC (Score:2)

Mpff (Score:1)

Opensource project to identify similar images (Score:2)

Related Links Top of the: day, week, month.

Slashdot Top Deals