Stories
Slash Boxes
Comments
typodupeerror delete not in

+-   OCR limits force Ancestry.com to turn to China-> on Sunday October 05 2008, @07:57AM Ian Lamont

Submitted by Ian Lamont on Sunday October 05 2008, @07:57AM
software
Ian Lamont writes "Even though some CAPTCHA schemes have been cracked in the past year, a far more difficult challenge lies in using software to recognize handwritten text. Optical code recognition has been used for years to convert printed documents into text data, but the enormous variation in handwriting styles has thwarted large-scale OCR imports of handwritten public documents and historical records. Ancestry.com took a surprising approach to digitizing and converting all publicly released U.S. census records from 1790 to 1930: It contracted the job to Chinese firms whose staff manually transcribed the names and other information. The Chinese staff are specially trained to read the cursive and other handwriting styles from digitized paper records and microfilm. The task is ongoing with other handwritten records, at a cost of approximately $10 million per year, the company's CEO says."
Link to Original Source
submission

This discussion was created for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More
Loading... please wait.
In the next world, you're on your own.