
Google Using ReCAPTCHA To Decode Street Addresses

Posted by timothy
from the you-are-the-crowd-being-sourced dept.
smolloy writes "Apparently some users of reCAPTCHA have recently begun seeing photographs appear in their CAPTCHA puzzles: photos that look very much like zoomed-in house numbers taken from Google Street View. It appears that Google has decided to put the reCAPTCHA system to work cleaning up Google Street View images, and 'according to a Google spokesperson, the system isn't limited to street addresses, but also involves street names and even traffic signs.' A large collection of these has appeared on the Blackhatworld website."

  • by Anonymous Coward on Thursday March 29, 2012 @05:13PM (#39515495)

    This is an incredibly fascinating and great use of the technology.

  • Wow, that site is so terrible looking that it makes GeoCities and MySpace look decent. The only thing it's missing is cosmic cursors.

  • by mykos (1627575) on Thursday March 29, 2012 @05:18PM (#39515569)
    What happens to the other part? Does google keep recycling it until it has multiples of the same answer? Can we all agree on a word for the addresses just to have some fun with google?
    • by melikamp (631205)
      Unless I know that I am contributing to a libre project, I always use "fuck" in reCAPTCHA.
    • Can we all agree on a word for the addresses just to have some fun with google?

      Actually, words instead of numbers could be an issue already. My parents' house does not have a number anywhere. The house has a visible name instead, and that's what is used in letters addressed to them (including government letters): house-name, street-name, etc. Some houses on their street have numbers, but most just have names, and the house names are nothing to do with the names of the occupants. BTW that particular first world country does not have any postal codes, either.

      • by ewieling (90662)
        Somewhat off-topic, I admit.

        In the USA, when e911 service is introduced into an area, each street is named and each house numbered in the maps e911 uses; I assume these are official postal addresses as well. It is a good idea to have your house number visible so emergency services can find you.
      • by operagost (62405)
        Can we pick on Ireland mercilessly for naming instead of numbering their houses-- like the USA gets picked on for still using measures based on some guy's foot?
    • If they're using this as a way to identify the street numbers, then I would assume that they're randomly matching the numbers with different words and seeing if they can get several matches to the same numbers. I would guess that they're also comparing the results to attempts at automated OCR. It would be difficult to bomb.

    • by chrismcb (983081)
      Sure. But can you convince enough of the population to answer incorrectly?
  • They're using us to identify our own home and business addresses. Does anyone else feel a little violated by this?

    Could just be me being paranoid, but this sounds like something out of a science fiction book. Whoever had the idea to do this, I have to admit, was really using their head though.
    • by nine-times (778537) <nine.times@gmail.com> on Thursday March 29, 2012 @05:21PM (#39515613) Homepage
      I don't find it worrying. The existence of a street address is properly public knowledge. It's not an invasion of privacy until they link the address with who lives there.
    • by geekoid (135745)

      You mean your public information is public? shocking.

      "Could just be me being paranoid, but this sounds like something out of a science fiction book."
      Warp drive, teleporters, FTL, light sabers and robots are all in science fiction. Why do you think things in science fiction are bad?

      • by NIN1385 (760712)
        I never stated that I found public information being public shocking. Man, I sense some hostility. I don't recall saying that science fiction was bad either.

        I just raised a question; I even stated that it "Could just be me being paranoid". When did /. become so damn hostile? Holy shit.
  • by Gen-GNU (36980) on Thursday March 29, 2012 @05:28PM (#39515723)

    I have read the quote from Google about what they are doing several times, and I don't see what everyone else sees. It appears to me that they are using the already known street names and numbers as possible ReCAPTCHA images. What they are NOT doing is using the results given by people to define what the image says. The point of the experiment is to determine whether these images are sufficient to separate people from web-bots. I imagine that they will look at the number of 'wrong' answers from both sides of the test, and see if bots are able to parse the street view images significantly more often than the standard test images.

    So... can anyone point to something in the Google quote to show me where I went wrong? From TFA, here is the quote:

    We’re currently running an experiment in which characters from Street View images are appearing in CAPTCHAs. We often extract data such as street names and traffic signs from Street View imagery to improve Google Maps with useful information like business addresses and locations. Based on the data and results of these reCaptcha tests, we’ll determine if using imagery might also be an effective way to further refine our tools for fighting machine and bot-related abuse online.

    • What they are NOT doing is using the results given by people to define what the image says.

      Um, no, that's exactly what ReCaptcha is for! The standard ReCaptcha images are all from old books that were scanned in (and presumably had trouble being OCRed with high confidence), and Google used ReCaptcha to "read" the words.

      For heaven's sake, ReCaptcha's MOTTO is: "reCAPTCHA: Stop Spam, Read Books"

      I read how it works. Multiple users are shown the same image, and once a few people have identified a given image as the same word, it's treated as the "correct" answer, and then later users have to match t

      • by martin-boundary (547041) on Thursday March 29, 2012 @07:45PM (#39517123)
        Yeah, the problem with that is that it can't work when most of the humans are robots. The robots will make guesses using standard algorithms, and their guesses will be pretty consistent with the other robots' guesses (which are quite probably the same robot in another instance). Then Google thinks the robot guess is correct, because it's overwhelmingly the most consistent answer. And humans who give the correct answer get marked wrong, because they're a minority.

        It's quite noticeable if you use a site which relies heavily on recaptchas. For example, when you get a word containing the old English long S [wikipedia.org], which looks like a modern lowercase F, you're much better off claiming it's an F instead of giving the correct answer.
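
The consensus scheme described in this sub-thread, and the way a bot majority can skew it, can be sketched in a few lines of Python. This is an illustration only: the function name `consensus`, the `min_votes` and `min_share` thresholds, and the sample answers are all assumptions made for the example, not details of how reCAPTCHA actually works.

```python
from collections import Counter


def consensus(answers, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if there is no clear winner.

    Hypothetical rule: accept the most common answer once at least
    `min_votes` answers are in and it holds at least `min_share` of them.
    """
    if len(answers) < min_votes:
        return None
    # Ignore case and stray whitespace when comparing answers.
    normalized = [a.strip().lower() for a in answers]
    top, count = Counter(normalized).most_common(1)[0]
    return top if count / len(normalized) >= min_share else None


# Mostly human answers for a house-number image: the majority reading wins.
print(consensus(["1234", "1234", "1234", "1284"]))

# Bot-heavy answers for a word printed with a long s: identical bots
# outvote the humans, so the wrong reading becomes the "correct" one.
print(consensus(["feveral"] * 7 + ["several"] * 3))
```

With these made-up inputs the second call settles on the bots' reading, which is the failure mode described above: agreement measures consistency, not correctness.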

  • I don't see how anyone can be pissy they're doing this.
    They already list the number of the house on maps.
    • by Anonymous Coward

      My understanding of ReCAPTCHA is that it's meant to help transcribe books for libraries. Google has distorted that by using it to improve its own databases. I personally don't have ReCAPTCHA on my website, but if I did I would be completely pissed off. Google is a for-profit company and can pay to do user studies to see how well people can read images. I'm willing to donate my time/reading ability to random libraries, not Google.

      • by icebike (68054) * on Thursday March 29, 2012 @06:04PM (#39516169)

        Oh, climb down off that ledge before you get hurt.

        reCAPTCHA is for whatever you want to use it for. It's simply a technique for crowdsourcing guesses.

        In my estimation, Google Maps with Street View is one of the great accomplishments of our time, easily worth every penny Google monetizes out of it.

        • by Inda (580031)
          It's worth every penny I pay for it and more.

          The picture of my house has a funny join: two photos that span my house. The join is in the middle of my burglar siren and it makes it stand out. Brilliant.
  • I'm glad something is being done. I can't recall how many times I've looked up a street address to find Google Maps reporting it as being 4 or 5 blocks away (on average) from where it actually is.

  • Back when reCaptcha showed two words that you could find in the dictionary, black on white, I had no problem with it; it seemed like a good idea and you might be contributing to digitizing a book or something.
    But now you just get randomly generated characters with a zigzag going through the middle and blobs that invert it, and it's hard to tell if a given letter is an 'i' or an 'r' or a 't'.
    So I don't even bother looking at the real word and just solve the generated one.

  • So many addresses have been fuzzy that it could only be a strange design choice.

  • ... to help clean up Google streetview images...

    I thought text in Streetview was blurred out by design in the same way that faces were-- automatically and for security reasons (read: so Google doesn't get sued by crazy OMG I'M ON TEH INTERNET people).

    I'd actually prefer if they un-blurred all street numbers and signs. It's fine to rely on Maps' street number location when you're in a huge city, and the difference between 123 fake street and 125 fake street is ten feet or so. But last time I planned a ro
