Google Docs' OCR Quality Tested
orenh writes "Google has released a Google Docs application for Android, which includes the ability to create documents by OCR-ing photos. I tested the application's OCR quality and found that it's mediocre under the best conditions and poor under real-world conditions. However, I believe that this poor performance is caused in part by an intentional decision by Google."
/b/ (Score:1, Interesting)
Re: (Score:1)
You used a racial slur and referenced 4chan and /b/. Most mods don't take the time to understand what is being said; they only look for keywords. Michael Kristopeit is right, Slashdot has stagnated... LOL, I'd better post this as A. Coward.
Re: (Score:2, Insightful)
Re: (Score:2)
And I mod according to what they say the guidelines are.
But then again, I cruise here on raw and uncut simply because I want to see it all, and many good posts are hidden if you use filtering.
Guess I need to become a karma whore.
Re: (Score:2)
Trump? Is that you?
Re: (Score:2)
Google's OCR (Score:2)
Re: (Score:2)
It's also far from new. Didn't they get that from some long-dead open source project?
Re: (Score:2)
I've played around with Google's OCR framework (tesseract) and it is far from perfect. So, this isn't really a surprise.
It's also far from new. Didn't they get that from some long-dead open source project?
Answered in the order you mentioned each:
Yes, far from new (project started 26 years ago).
No, not long dead. Just "recently" (roughly 6 years ago, give or take) open sourced and ported/compiled for Linux, OS/2 (and other platforms I am sure).
Yes (open source project), and I think it was called... Tesseract. Kinda like the poster you responded to mentioned. ;-)
To save you the work, it was an HP/UNLV project, started in 1985, that was open-sourced in 2005. It is still available on SourceForge [sourceforge.net].
Re: (Score:2)
Yeah, I was looking for an Android OCR library, and that one was the only one that came up. Although there are a few other Linux options, none of those seemed to be right on the money either. This article reinforces the already published reports on open source OCR software: basically, it's not performing all that well. I wish it was.
Re: (Score:2)
This article is strengthening the already published reports on open source OCR software: basically, it's not performing all that well. I wish it was.
Now that it's getting some exposure, I'd say it'll be performing a lot better soon.
Nothing like being in the public eye for attracting clever people's attention.
Re: (Score:3)
I guess it'll be a little while before we'll see an app I'd wondered about. I thought it would be useful to be able to take snapshots of things like news reports (streamed on the web, El Gato Eye-TV domestic or satellite t.v., YouTube etc.) and do OCR on them, AND get an English translation of it. With the events so far this year, support for Japanese and Arabic languages would have been a good start.
Re: (Score:2)
Definitely. Weirdly, I was wondering this afternoon if Goggles can already do OCR and translation on full pages of text. I have a French book that I'd love to read, but I have basically no French!
Re: (Score:2)
With the top-notch translators that are around today, you may be able to get the gist of the book. But the chance that the translation of the book will be a joy to read is about zero, zip, nada, nothing. You'd better buy a good translation or, if that's not available, try and learn French (with the book itself as source material maybe).
Re: (Score:2)
there was a paper about combining a (crappy) machine translation with low-skilled workers, who natively understand the target language, to patch up the glaring flaws. the idea is that _most_ of the errors made by the machine don't require understanding of the source language to detect. of course you lose out on anything 'deep' or artistic in the source language, and i would be hesitant to trust it for scientific papers or legal documents, but it's an interesting idea.
Re: (Score:2)
there was a paper about combining a (crappy) machine translation with low-skilled workers, who natively understand the target language, to patch up the glaring flaws.
I'm working on a project where the translations are handled like that. We send all texts to an external company, and a few hours later, they send back the translation. This seems to work relatively well.
The next phase involves immediate translation without human intervention. I'm curious as to how that will work out.
Re: (Score:2)
there was a paper about combining a (crappy) machine translation with low-skilled workers
Even better, Distributed Proofreaders [wikipedia.org] is Project Gutenberg's version of just that. They've probably passed 20,000 books OCR'd and proofed by now.
Re: (Score:2)
There are no translations available or I'd buy them. It's a book about Parkour by David Belle.. I'm just interested in basic history and his opinions rather than flowery language or whatever. If there is much discussion of technique it might be really hard to understand though - I auto-translated a French tutorial on rolling before, and it would just read as gibberish to someone who didn't already have a good idea of the technique.
Re: (Score:2)
Or you could learn French if much of the literature in your field of interest is in that language. It isn't that hard if you're not interested in fluency. You also have gained a valuable skill.
Hey, it's more useful than either Elvish or Klingon.
Re: (Score:2)
Here you go [amazon.co.uk]
It appears that there is a Facebook group where people are putting up translations of small parts of it now.
Better to scan to PDF (Score:4, Interesting)
There are a number of scanner apps in the market that do a much better job in the first step of this process, which is taking the picture. They then concentrate their efforts on producing a clean usable PDF of the document. I tested one of these and found that the PDF rendered by it was much better than the PDF produced by Google. [android.com]
Everything is crisp and readable.
If the first step fails, it's no wonder the second step, the OCR, fails.
Re: (Score:2)
And how do you plan on searching, indexing, or otherwise having a computer operate on the contents of that document?
Re: (Score:3)
Just how many of such documents do you expect to have to index taken with a cellphone? Seriously, this is a toy. Don't go all corporate archives on me here.
Re: (Score:2)
Even then, I have yet to work for a company that has a searchable PDF archive. Even when I worked for Fairfax (media company here in AU that publishes national & local newspapers), the PDF archive that came straight out of the publishing app wasn't searchable. Hell, it only had 3 months of the paper on servers, the rest were on archive DVDs.
The whole idea of searchable PDFs died a long time ago, which is why businesses use purpose-built products.
Also, the OP stated that it was the original PDF that was gen
Re: (Score:2)
I work on this (Score:2)
My job entails working with our office's document management system to manually enter metadata.
In part, I essentially end up parsing the data which users entered in various formats.
However, since the original form is entered electronically to begin with, I figure this could be a lot more automated. (The people in my office definitely have a clue; however, fat chance moving this up through the bureaucracy.)
Re: (Score:2)
Re: (Score:2)
Just how many of such documents do you expect to have to index taken with a cellphone? Seriously, this is a toy. Don't go all corporate archives on me here.
Well, that's the whole point to OCR. If you're just scanning, then you're just scanning. OCR'ing lets you do all kinds of text processing, analysis, format shifting etc. A scan is... just a picture of a document. Makes me think of microfiche.
Re: (Score:3)
True, but again, this is a cell phone app. You don't expect document management system level capabilities, especially not in release 1.0.
If you want that level of quality you bring something more than a cell phone to the task. Maybe a flatbed or something.
My point here is this: I've had much better luck going direct to PDF on the phone than via Google Docs.
Try this test if you have a Google Docs account (even a free one):
Upload some PDF, even one created using something on your phone like CamScanner [appbrain.com].
Th
Re: (Score:1)
I don't touch Google anything, except for email. I'd much rather use -real- solutions, with my nice flatbed etc :)
It is odd that your phone does that better than Google...
Re: (Score:2)
This is a Google product. They like to release early and do public betas lasting years, so expect rapid improvements.
There seems little point in reviewing a new Google product until it has matured somewhat because the first version is always half done sort-of-works quality code. The first version of Android typed everything entered into the phone into a hidden root shell for crying out loud. About the only area they seem to hold off in is the front page of their search engine.
No good free solutions (Score:3)
The end of the article is pretty telling. Basically any professional OCR software from the mid-1990s and normal consumer-grade commercial software from today is light-years ahead of open source solutions. That is kind of sad, but the problem is that there really isn't a huge market for OCR in the way that there is for web browsers and other more successful projects, and that is coupled with the inherent difficulty of doing good OCR.
Comment removed (Score:4, Interesting)
Re: (Score:3, Insightful)
I don't think that spammers have any amazing tech, they just have different requirements. They can still send spam with a 1% success rate whereas with OCR you'd want a 99% success rate.
99% success rate is crappy ... (Score:4, Insightful)
I don't think that spammers have any amazing tech, they just have different requirements. They can still send spam with a 1% success rate whereas with OCR you'd want a 99% success rate.
I once worked on an OCR project. The client specified a 99% success rate and we strained to restrain our grins. 99% is about one error every one or two lines of text. We got 99.6% in our first implementation before we even began to work on accuracy. Admittedly we had excellent image quality. This was a custom solution that had its own optics.
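To put rough numbers on the "one error every one or two lines" remark, here is a minimal sketch. The 80 characters per line is an assumption for illustration, not a figure from the post, and `lines_per_error` is a hypothetical helper name.

```python
def lines_per_error(char_accuracy, chars_per_line=80):
    """Average number of text lines between character errors.

    chars_per_line=80 is an assumed typical line length, not a measured one.
    """
    errors_per_line = (1.0 - char_accuracy) * chars_per_line
    return 1.0 / errors_per_line


for acc in (0.99, 0.996):
    print(f"{acc:.1%} accuracy: one error every {lines_per_error(acc):.2f} lines")
```

Under that assumption, 99% works out to an error roughly every 1.25 lines, while 99.6% stretches that to about one error every three lines.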
Re:99% success rate is crappy ... (Score:4, Interesting)
A 99% success rate could also mean 99 pages with zero errors out of 100 pages attempted. With 250 words per page that would represent a mandated success rate of 99.995%.
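One way to reproduce that kind of figure is sketched below, assuming word errors are independent (an assumption of this sketch, not something the post states): a page is error-free only if every one of its words is correct, so the per-word accuracy is the page success rate raised to the power 1/words. `per_word_accuracy` is a hypothetical helper name.

```python
def per_word_accuracy(page_success_rate, words_per_page=250):
    """Per-word accuracy needed so that a page is error-free with the
    given probability, assuming independent word errors."""
    return page_success_rate ** (1.0 / words_per_page)


needed = per_word_accuracy(0.99, 250)
print(f"required per-word accuracy: {needed:.4%}")
```

With these inputs the result lands at roughly 99.996%, in the same ballpark as the figure quoted above.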
Re: (Score:2)
QUICK A LAWYER, LET'S GET HIM!
As an aside. Stupid slashdot filter is telling me using caps is like yelling. Well I AM yelling.
Re: (Score:2)
Heh, it's always fun to reinterpret requirements to make them easier to implement :)
A 99% success rate could also mean 99 pages with zero errors out of 100 pages attempted. With 250 words per page that would represent a mandated success rate of 99.995%.
Thankfully the client specified 99% with respect to character recognition not correct pages. If they were specifying pages we would have been straining to suppress pissing our pants rather than suppressing grins. :-)
Re: (Score:2)
Of course, we also specified that based on clean images scanned at 300 DPI, and they give us crap images scanned at 200 DPI with fold lines, highlighter and pen scribble and apparently their mailing machine sprays some kind of serial number on ev
Re: (Score:2)
obligatory XKCD [xkcd.com] (alt text is relevant)
Re: (Score:2)
Google's approach to accuracy appears to be somewhat novel. Most OCR software uses spelling correction and grammar rules to improve accuracy but Google use data derived from the contents of pages they index. They use it for translation too which, when it works, gives their output a more natural quality compared to previous efforts. I find that Chinese to English works particularly well.
Doubling OCR accuracy is exponentially harder. Unlike a human that can easily pick up on what type of document it is (lette
Re: (Score:2)
... Google use data derived from the contents of pages they index ...
Interesting. I guess that adapts for common usage deviating from proper spelling and grammar.
... pick up on what type of document it is (letter, technical manual, novel, newspaper article) and make informed mental corrections ...
Machines will do this to a degree, for example favoring lowercase L when the surrounding characters are alphabetic and favoring the digit one when the surrounding characters are numeric. But yeah, context rules: the preceding heuristic works well enough in prose but often fails in source code.
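A toy version of that neighbor-based rule might look like the sketch below. It is an illustration only, not how any particular OCR engine does it, and `fix_l_vs_one` is a hypothetical name.

```python
def fix_l_vs_one(text):
    """Resolve 'l' vs '1' from the immediate neighbors of each character.

    Alphabetic neighbors vote for lowercase L, numeric neighbors vote for
    the digit one; ties leave the character unchanged.
    """
    out = []
    for i, ch in enumerate(text):
        if ch not in "l1":
            out.append(ch)
            continue
        left = text[i - 1] if i > 0 else ""
        right = text[i + 1] if i + 1 < len(text) else ""
        alpha = sum(c.isalpha() for c in (left, right))
        digit = sum(c.isdigit() for c in (left, right))
        if alpha > digit:
            out.append("l")
        elif digit > alpha:
            out.append("1")
        else:
            out.append(ch)
    return "".join(out)


print(fix_l_vs_one("he1lo"))  # prose-like input
print(fix_l_vs_one("l984"))   # numeric input
print(fix_l_vs_one("x1"))     # identifier-like input, where the rule misfires
```

The last call shows the source-code problem: a variable name such as `x1` has an alphabetic neighbor, so the rule wrongly turns the digit into a letter.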
Re: (Score:1)
Don't tell him this. It's funnier to let him keep PH3AR1NG TEH 3L33T HAXORZ.
No good solutions anywhere (Score:2)
Re: (Score:2)
Nexus S has no flash? (Score:1)
Re: (Score:2)
Google Docs will use the flash or not, based on user settings, so, yeah, he just missed that.
But in my tests with a Nexus One (not a Nexus S), using the flash at the range needed to see the picture just puts a white blob in the center of the shot and is actually worse than using bright room lights.
Re: (Score:3)
Re: (Score:1)
Re: (Score:2)
What article? The link seems to be pointing to a 403 Error page. At least to me.
Maybe it was just the OCR'ed output of a scan of "Loser roar"
( Ok, I couldn't come up with anything better)
OCR-B character recognition (question) (Score:2)
I'm in the market for a good way of recognizing OCR-B based characters on an Android device (mostly uppercase characters and digits). I know the location (on a flat 2D plane in a 3D space) of the characters, but they do not form sentences or even words. Does anyone have a good algorithm to do this kind of low-level character recognition? A library would be even better of course, especially if it is open source. I'm personally thinking of comparing bitmaps or vectors.
As a hint to other devs, many commercial
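The "comparing bitmaps" idea can be sketched as nearest-template matching by Hamming distance. The 3x5 glyphs below are made-up toy shapes, not real OCR-B outlines, and `classify` is a hypothetical name; a real system would first have to deskew and normalize each character cell.

```python
# Toy 3x5 templates ('#' = ink, '.' = background). Hypothetical shapes,
# not taken from the OCR-B standard.
TEMPLATES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
}


def hamming(a, b):
    """Number of differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(bitmap):
    """Return the template character whose bitmap differs in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], bitmap))


# A '1' with one pixel missing from its base still matches the '1' template.
noisy_one = (".#.", "##.", ".#.", ".#.", "##.")
print(classify(noisy_one))
```

Because the font and character positions are known in advance, this kind of fixed-template comparison is a plausible starting point before reaching for a general OCR engine.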
Re: (Score:2)
What about OpenCV?
http://blog.damiles.com/?p=292 [damiles.com]
Re: (Score:2)
Looks promising, many thanks! License plates are not that far off from the intended purpose.
Um... (Score:5, Insightful)
And, seriously, how effective of OCR'ing are you really imagining you're going to get off of a camera phone pic, anyway?
Re: (Score:3)
Re: (Score:2)
> And, seriously, how effective of OCR'ing are you really imagining
> you're going to get off of a camera phone pic, anyway?
Camera phones are getting quite good. An iPhone 4 takes 5MP images and there are many others out now that are as good or better.
Specifically, the images are 2592x1936 pixels which equates to 225 dpi at 8.5" x 11". That's plenty to OCR a typical page--say, 8.5x11 with clean 12-point type. I've carefully taken photos of documents with my phone and printed them and they're indistingu
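The resolution arithmetic can be checked with a short sketch. It assumes the page exactly fills the frame, which a hand-held shot will not quite manage, and `effective_dpi` is a hypothetical helper name.

```python
def effective_dpi(px_w, px_h, page_w_in, page_h_in):
    """Effective scan resolution when a page exactly fills the frame.

    The long side of the image is matched to the long side of the page;
    the tighter of the two axes limits the usable resolution.
    """
    long_px, short_px = max(px_w, px_h), min(px_w, px_h)
    long_in, short_in = max(page_w_in, page_h_in), min(page_w_in, page_h_in)
    return min(long_px / long_in, short_px / short_in)


dpi = effective_dpi(2592, 1936, 8.5, 11)
print(f"about {dpi:.0f} dpi")
```

For a 2592x1936 image of a letter-size page this comes out near 228 dpi, close to the figure quoted above; any margin left around the page lowers it.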
What a dumb fuck (Score:1)
I suppose this retard thinks he's clever.
Bad Kitty!
Verily, you may not link directly to images. Link to their containing web page instead.
You tried to access: /blog/
From: http://hurvitz.org/ [hurvitz.org]
I have spoken!
They Why (Score:3, Informative)
Re: (Score:2)
Does that mean it couldn't be a viable candidate for some Summer of Code work then?
More like Masters/PhD Thesis than Summer of Code (Score:4, Interesting)
Does that mean it couldn't be a viable candidate for some Summer of Code work then?
More like a bunch of master's/PhD theses to get started.
OCR is an area of AI research under the topic of Computer Vision. It is yet another area that seems simple in concept but turns out to be incredibly difficult in practice.
Re: (Score:3)
only problem is sometimes GA make a solution that makes no sense and should not work but somehow does http://www.damninteresting.com/on-the-origin-of-circuits [damninteresting.com]
Re: (Score:2)
seems to me that OCR would be an area that would be easy to build a framework for genetic algorithms, using a huge collection of solved OCR pages to evaluate. with each generation being tested on a random subset of pages so they do not learn to cheat instead of learn to solve.
Sounds like a great thesis project. :-)
Re: (Score:2)
Genetic algorithms are an optimisation algorithm, but what do you want to optimise exactly? What are your individuals here?
The idea of using a large collection of solved problems to check and improve the accuracy of the method looks more like a neural network to me. Indeed, this seems to be a common method for OCR. For example: http://www.codeproject.com/KB/dotnet/simple_ocr.aspx [codeproject.com]
Re: (Score:2)
While neural networks are a good solution, genetic algorithms can still be used in conjunction with them.
One possible training method for neural networks happens to be genetic algorithms, with the genes being the link strengths and the fitness function being, say, the percentage of correct results. (If you reach a sufficiently high level, you might want to change to minimizing uncertainty, with fitness dropping exponentially if the correct percentage drops too low.)
In the alternative genetic algorithms can be u
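A minimal sketch of the "genes are link strengths, fitness is percentage correct" idea follows. Everything in it is a toy assumption: the 4-pixel samples, the single-neuron "network", and the population settings are invented for illustration. It scores every individual on the full sample set, whereas the earlier suggestion of a random subset per generation would replace `SAMPLES` inside `fitness` with a fresh sample each round.

```python
import random

# Hypothetical toy data: 4-pixel "images" with a 0/1 class label.
SAMPLES = [
    ((1, 1, 0, 0), 1), ((1, 0, 0, 0), 1), ((0, 1, 0, 0), 1),
    ((0, 0, 1, 1), 0), ((0, 0, 0, 1), 0), ((0, 0, 1, 0), 0),
]


def predict(weights, pixels):
    # Single neuron: weighted sum plus bias (the last gene), thresholded at zero.
    total = sum(w * p for w, p in zip(weights, pixels)) + weights[-1]
    return 1 if total > 0 else 0


def fitness(weights):
    # Fraction of samples classified correctly.
    hits = sum(predict(weights, px) == label for px, label in SAMPLES)
    return hits / len(SAMPLES)


def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    n_genes = len(SAMPLES[0][0]) + 1
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]
        children = [pop[0][:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)      # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_genes)
            child[i] += rng.gauss(0, 0.3)        # mutate one link strength
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best_weights, best_history = evolve()
print(f"best fitness: first generation {best_history[0]:.2f}, final {best_history[-1]:.2f}")
```

Because the best individual is always copied into the next generation, the best fitness never decreases from one generation to the next in this setup.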
Re: (Score:2)
Sure, and with 9 women you can make a baby in a month.
10 experts and 5 years would be more feasible. 5 experts and 10 years even more so.
Scaling is hard.
See also http://en.wikipedia.org/wiki/The_Mythical_Man-Month [wikipedia.org]
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
Re: (Score:3)
Re: (Score:2)
403 (Score:2)
Did anyone else mirror this? I'm just getting a 403.
Re: (Score:1)
Way to go (Score:1)
Forbidden
You don't have permission to access /blog/2011/04/ocr-quality-of-google-docs on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
Nice link, asshole.
Oh WTG (Score:1)
Self-promote to /. and host on a box that can't handle the limited traffic of a 25-comment popularity story?
GOOD WORK SON
Use in combination with CamScanner (Score:1)
Crippleware? (Score:2)
Google prides itself on having supposedly the best quality apps and features, which is why they take years to leave Beta. Why would they intentionally release a crippled version of their app? That will be the worst thing since Google Books with the missing pages.
In case I'm not the only one. (Score:1)
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artif
Obligatory (Score:1)
I've just tried it... (Score:3)
I think the quality is tolerable. I photographed a document lying on my desk, without doing anything special to make it smooth or adjust lighting. This is a good simulation of a real-world situation where you can photograph a piece of text. There were errors in the transcription but it was readable, and with a very little editing would have been perfect. What surprised me was that apparently the whole image was uploaded from my phone to Google Docs, and then downloaded again, which is a little bit inefficient; I think that the OCR process runs server side.
I see this as very useful. This afternoon I'm going in to the local planning office to look at some planning applications; I won't be able to take them away, and I doubt I'll be allowed to use a photocopier, but I will have my phone. That's a real world application. I can think of hundreds more.
Typical Google (Score:2)