
 



Software

An Advance In Image Recognition Software

Roland Piquepaille alerts us to work by US and Israeli researchers who have developed software that can identify the subject of an image characterized using only 256 to 1024 bits of data. The researchers said this "could lead to great advances in the automated identification of online images and, ultimately, provide a basis for computers to see like humans do." As an example, they've picked up about 13 million images from the Web and stored them in a searchable database of just 600 MB, making it possible to search for similar pictures through millions of images in less than a second on a typical PC. The lead researcher, MIT's Antonio Torralba, will be presenting the research next month at a conference on Computer Vision and Pattern Recognition.
This discussion has been archived. No new comments can be posted.


  • by HeavensBlade23 ( 946140 ) on Saturday May 24, 2008 @08:48PM (#23532724)
    This will be used to break CAPTCHA-type schemes even worse than they already are.
    • I had thought it would be more along the lines of categorising pictures into 'PORN' and 'NOT PORN'. Or possibly even 'PORN', 'GAY PORN', 'SHEEP PORN' and 'NOT PORN' if it's really advanced.
    • by Jeremiah Cornelius ( 137 ) * on Saturday May 24, 2008 @08:57PM (#23532792) Homepage Journal
      This will be used to identify YOU, citizen.
    • by elnico ( 1290430 ) on Saturday May 24, 2008 @09:22PM (#23532926)
      Incorrect. First of all, in a CAPTCHA, you're trying to very rigorously inspect a single image. This advance seems to be more about taking quick glances at lots of images. Furthermore, in the article, they talk about recognizing flowers and cars. The fact is, computers already have no problem recognizing letters and numbers in images. We got that down a long time ago. The difficult things about reading a CAPTCHA image are removing distortion and splitting the whole image into the component characters. If you read the article, you'd see that this research has nothing to do with that.
      • So in other words it just killed the Kitten... Auth?
      • by EMeta ( 860558 )
        Yes, but as you say, letters and numbers are getting trivial, so we will need captchas outside of that. "Click the pictures with people out of the following set of 20 pictures" could have been a useful method.
  • tests (Score:1, Funny)

    by nawcom ( 941663 )
    thank god, now i can get some assistance when i'm taking one of those "real tits or fake tits" online quizzes. fapfapafap.
  • by ZxCv ( 6138 ) on Saturday May 24, 2008 @08:55PM (#23532786) Homepage
    ...I'll believe it when I see it.

    Until then, it's snake oil, as far as I'm concerned.
  • .... the answer is always "a picture"
  • I hate reading press releases instead of reading papers with real explanations of what's going on.

    I just finished reading "Small Codes and Large Image Databases for Recognition" written by the guy. All he did was implement Geoff Hinton's idea of databasing images with binarized coefficients produced by Restricted Boltzmann Machines.

    Hinton himself gave a talk on it for Google here:
    http://www.youtube.com/watch?v=AyzOUbkUf3M [youtube.com]

    Actually I'm wondering, is he plagiarizing Hinton?
    • Re: (Score:3, Insightful)

      by samkass ( 174571 )
      I think plagiarizing is a strong word to throw around. And particular implementations of general approaches can often be very interesting when one considers what tradeoffs are made transferring pure theory to practical applications. If this sort of thing were attempted in the 90's, they'd probably arbitrarily pick a few hundred features by hand and KL-transform it down to the most significant dimensions and hash those into one of these codes. Since I've been out of "the biz" for awhile, it's pretty inter
    • by Hays ( 409837 ) on Saturday May 24, 2008 @11:17PM (#23533305)
      Geoff Hinton worked with them; you really think they're plagiarizing him? That claim doesn't even make sense. This is a novel research domain. A big part of science is taking people's ideas, reproducing them, and applying them to novel domains. That's how it's SUPPOSED to work.

      This research involves the use of one of the largest image databases seen in computer vision. It shows that you can do extremely rapid scene matching for databases of this scale. No, that's not obvious no matter what you think. This image data is fairly high dimensional.

      This research says something about the space of likely scenes and it might be a key enabling technology to a lot of the heavily data driven computer vision and computer graphics approaches popping up lately.

      • My mistake... I had just seen the video where Hinton suggested to Google, tongue-in-cheek, that they use his RBM bottleneck trick as a method of databasing, and then saw this guy's paper mentioning the very same thing a few weeks later.

        It was a skim to see what the hell the article was really about, didn't know these two were connected. I jumped the gun 'cause I got burned by a plagiarizer in the past, sorry.
    • Agree with parent: without reading the actual paper, the press release is kinda useless.

      Also, the example shown in the article does not really make sense to me. I mean, of course if we look at a blurred and rotated object in a series of images it is hard to discern. Also, we might think the object is not the same in the different images, although it actually is. But the question is, does that mean that the algorithm can reliably determine the identity of an object even if a human viewer cannot?

      Actually I doubt
  • They're going to distinguish an individual based on images with 256 to 1024 bits of data?

    I guess nobody there thought to do the math before making these claims. This story probably shouldn't have made it to the front page; it's less than useful.

    • Re: (Score:3, Insightful)

      by elnico ( 1290430 )
      "They're going to distinguish an individual based on images with 256 to 1024 bits of data?"

      No one said they were going to identify individual people with this. The main gist of this research seems to be efficiency (in both space and time, if I read it correctly). For instance, if one wanted to identify every face in a picture of a crowd, they could apply this algorithm to a low-res version of the image to quickly find the locations of every "face," and then use a more advanced face recognition algorithm t
    • 2^1024 is 1.797693134862315907729305190789e+308. I think you could do a lot with a number that big.
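As a rough check on the numbers quoted in this thread, the sketch below counts how many distinct values a 256-bit or 1024-bit code can take, and how many bits per image the summary's "13 million images in 600 MB" works out to. The function names are hypothetical, and treating 600 MB as 600 × 10^6 bytes is an assumption.

```python
def code_states(bits: int) -> int:
    """Number of distinct values a code of `bits` bits can take."""
    return 2 ** bits


def bits_per_image(total_bytes: int, n_images: int) -> float:
    """Average storage per image, in bits."""
    return total_bytes * 8 / n_images


# Python integers are arbitrary precision, so 2**1024 is computed exactly.
# (Converting it to float would overflow, so we count decimal digits instead.)
print(len(str(code_states(256))))    # 78 digits, i.e. about 1.16e77
print(len(str(code_states(1024))))   # 309 digits, i.e. about 1.80e308

# Assumption: 600 MB means 600 * 10**6 bytes.
print(round(bits_per_image(600_000_000, 13_000_000), 1))  # 369.2
```

About 369 bits per image sits inside the 256 to 1024 bit range given in the summary, so the database size and the code length are at least consistent with each other.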
  • This is not so much an "advance", and more a demonstration that some image recognition problems can be solved with fairly simple, well known methods and a lot of data.
  • all the Obama halloween masks setting off false positives from sea to shining sea.
  • a fake ufo picture and a real one?

    How will spammers make use of this? Well just make that viagra pill be reflected in a coke bottle.

    anyone for a random bit generator to see what random results gets labeled?

    Wonder what fractals might produce?

    moral of the question: we can always break what we make.

     
  • by Bayoudegradeable ( 1003768 ) on Saturday May 24, 2008 @09:23PM (#23532936)
    Oh my, the soon to be most searched "name" on the web is... Jenna Jameson! Wait a minute, I think I misunderstood "facial" recognition...
  • if they grabbed 13 million images from the net, there's a good chance that many of them are copyrighted. if they are using those copyrighted images in their (presumably FOR SALE) software, wouldn't that require some serious licensing fees, even if it's an internal-you-never-see-the-pictures usage, since it's a part of their algorithm, or what-have-you?

    for the record, i say this as a concerned/curious artist who isn't looking for a payout.

  • If this works under real-world conditions, every shop could get a list of faces of criminals or other people "you don't want" in your shop, recognize them in real time with little investment, and throw them out. It would also make it possible to track a person on traffic surveillance cameras. That is pretty freaky. Politicians, please make laws which restrict such databases. I don't like the idea of being escorted out of the shopping mall because of my credit rating. Or that so
  • Very cool stuff... (Score:2, Interesting)

    by Pedrito ( 94783 )
    This actually solves a problem I've been stumped on for a while. I need a way to search for similar images such that images that are similar have a searchable value with an inherent "nearness" quality.

    That is, there are a number of image similarity algorithms, but the computed values of two similar images are not necessarily mathematically near to each other. This algorithm produces values that are, which can make searching for similar images among very many images quite fast.
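The "nearness" property described above can be sketched with short binary codes compared by Hamming distance (the number of differing bits). This is only an illustration: the 16-bit codes, the image names, and the choice of Hamming distance with a plain linear scan are all assumptions, not the researchers' implementation.

```python
from typing import Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def nearest(query: int, database: Dict[str, int], k: int = 2) -> List[Tuple[str, int]]:
    """Return the k entries whose codes are closest to `query`."""
    scored = [(name, hamming(query, code)) for name, code in database.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical toy database: made-up 16-bit codes standing in for
# the 256 to 1024 bit codes mentioned in the summary.
database = {
    "beach_1": 0b1111000011110000,
    "beach_2": 0b1111000011110001,
    "city_1": 0b0000111100001111,
    "forest_1": 0b1010101010101010,
}

query = 0b1111000011110011
print(nearest(query, database))  # [('beach_2', 1), ('beach_1', 2)]
```

Each comparison is one XOR and a bit count, which is why a scan over millions of short codes can be quick.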
  • I hope they didn't use any of my pictures because I sure as hell didn't give them permission to use my images for something like that and I clearly state that my own work is my property on my website for my use only.

    I don't care about most uses of my images (go ahead and PS a penis in my mouth), but I would fight it if I found out they were used for this.
  • Even if this was perfectly efficient, I'm pretty sure there's more than 256 * 1024 things you could have an image of out there. The amount of information this analysis could give just can't be very useful.

    That's not meant to disparage the work - image recognition is important and difficult. This particular 'advance' just isn't that 'advancing'.
    • You might have misread the numbers. They said 256 'bits'. 256 bits can distinguish roughly 1.1 * 10^77 states. That's a LOT. 1024 bits can distinguish roughly 1.8 * 10^308 states. I don't think there are that many atoms in the universe. Jere
      • Your first number, 10^77, is approximately the number of hydrogen atoms estimated to be in the universe.

        Your second number is essentially the number of hydrogen atoms in 10^231 universes, all similar to our own.

        In English, 2^1024 is approximately equal to the number of hydrogen atoms in a googol googol universes. Imagine if you will, a replica universe for every hydrogen atom in our universe. Now imagine a replica universe for each of the hydrogen atoms in those replica universes.

        We are still about 10^
        • by fyngyrz ( 762201 ) *

          It still isn't big enough to do even a half decent job.

          Imagine that a bit has a coherent meaning, such as "image contains a kitten." If the bit is zero, there is no kitten in the image; if it is one, there is at least one kitten in the image.

          Now imagine that system extended for all types of animals. Going to go past a requirement for 1024 bits pretty fast, isn't it?

          Now imagine that a bit represents "image contains a keyboard" in the same fashion.

          Now imagine a bit for every type of macro and mic

          • Imagine that a bit has a coherent meaning, such as "image contains a kitten." If the bit is zero, there is no kitten in the image; if it is one, there is at least one kitten in the image.

            Why would I imagine that?

            The most trivial example that busts your theory is the game "20 questions." Each answer is yes/no, a single bit of data mapping a 20 bit value to 1 million arbitrary things. There is no bit assigned to "this is a cat", each bit is taken in context with all of the other bits.

            A more complex example is Arithmetic Compression, which assigns fixed-length input codes to variable-length output codes. The shortest output code length is much smaller than a single bit (any symbol with
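The "20 questions" argument above can be sketched as a bisection: twenty yes/no answers pick out one item among 2^20 (about a million), and what each answer means depends on the answers before it. The function names are hypothetical, and numbering the "things" 0 to 2^20 - 1 is an assumption made for illustration.

```python
from typing import List


def ask_questions(secret: int, n_bits: int = 20) -> List[int]:
    """Find `secret` in [0, 2**n_bits) using n_bits yes/no answers."""
    low, high = 0, 2 ** n_bits
    answers = []
    for _ in range(n_bits):
        mid = (low + high) // 2
        # Question: "Is it in the upper half of what is still possible?"
        # The same question means something different every round.
        answer = secret >= mid
        answers.append(int(answer))
        if answer:
            low = mid
        else:
            high = mid
    return answers


def decode(answers: List[int]) -> int:
    """Rebuild the item number from the sequence of answers."""
    value = 0
    for bit in answers:
        value = (value << 1) | bit
    return value


answers = ask_questions(123456)
print(len(answers), decode(answers))  # 20 123456
```

No single answer says "this is a cat"; only the full 20-bit pattern identifies the item.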

            • by fyngyrz ( 762201 ) *

              You should imagine it because you are conflating the number of bits required to count things with the number of bits required to discriminate among things. The two are entirely disjoint.

              • Minimal Perfect Hashing says differently, and I didn't "conflate" anything.

                I simply expressed 2^1024 in understandable terms, keeping on point with the person I replied to who was trying to compare his numbers with the number of atoms in the universe.

                Also, you apparently didn't read the article because you attribute qualities to the technique which the article specifically denies. It does not say that it will discriminate between individual people (such as "one bit that says 'image contains person Rockoo
  • But then again, rotating an image 90 degrees might defeat the whole system. Scientists are so busy being sophisticated and racing to write journal articles that they often miss the obvious.
    • Re: (Score:3, Insightful)

      by ceoyoyo ( 59147 )
      Any decent object recognition algorithm supports at least affine transformations, which include rotation.

      Some of those scientists are actually pretty smrt.
      • by 4D6963 ( 933028 )

        Any decent object recognition algorithm supports at least affine transformations, which include rotation.
        Which always made me wonder, how do they go about doing that? Do they perform cross-correlation for every variant of each supported affine transform? Or is it something completely different?
        • by ceoyoyo ( 59147 ) on Sunday May 25, 2008 @03:58PM (#23537837)
          There are all kinds of ways, but two simple ones come to mind. If you convert to a polar coordinate system the power spectrum is conveniently orientation independent. You can use the same trick with a shift: the power spectrum of a Cartesian coordinate system is shift independent.

          Another way is to somehow identify the orientation. A simple way to do that is to find the axis along which there's maximum variation and rotate until those axes match in both images.

          Pixel by pixel co-registration basically does look at a similarity measure for a lot of variations on the affine transform. You generally don't have to look at them all though: you use an iterative algorithm with a clever optimization strategy so your transform gets better and better instead of searching through the parameter space randomly.
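The shift-independence of the power spectrum mentioned above can be checked on a tiny one-dimensional signal. This is a minimal sketch under stated assumptions: it uses a naive pure-Python DFT and a circular (wrap-around) shift, for which the property holds exactly; real images shift without wrapping, so there it holds only approximately. The rotation case (resampling to polar coordinates first) is not shown, and the function names are hypothetical.

```python
import cmath
import math
from typing import List


def power_spectrum(signal: List[float]) -> List[float]:
    """Squared magnitude of each DFT coefficient (naive O(n^2) DFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def circular_shift(signal: List[float], offset: int) -> List[float]:
    """Shift right by `offset` samples, wrapping around the end."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset]


signal = [0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 3.0, 1.0]
shifted = circular_shift(signal, 3)

# A shift only changes the phase of each coefficient, not its magnitude,
# so the two power spectra agree up to floating-point error.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(power_spectrum(signal), power_spectrum(shifted))
)
print(same)  # True
```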
  • search for similar pictures through millions of images in less than a second on a typical PC.

    Of course that typical PC is a dual quad-core machine running at 3GHz with 8GB of memory, GPU X3 running offloading co-processing software, and 1TB of hard drive space.

  • This technology has been around for centuries! 256 bits to describe the content of an image? It's called a title. 1024 bits to describe the content of an image? It's called a caption.
  • Check out libpuzzle : http://libpuzzle.pureftpd.org/project/libpuzzle [pureftpd.org]

    It's also designed to quickly find similar images, even out of millions of images. The documentation describes a possible indexation technique (as suggested in the original paper):
    http://download.pureftpd.org/pub/pure-ftpd/misc/libpuzzle/doc/README [pureftpd.org]

    Images are stored as 544-bit signatures by default.

