Fedora Users — Re: ocr + fedora core and a big book..

Hi,

> I have been asked to capture a book of stats into a database. It was
> suggested that I use OCR; any suggestions on how I may go about this,
> and is it worth the effort? The text in question is printed text ...

Grab a copy of gocr, then compile and install it (it's not in Fedora
Extras, which is odd). When you scan, use as high a resolution as
possible (in my experience 300 dpi is the minimum) and scan in greyscale.

Use either gimp or xsane to grab the scan, then tell gocr to do its
business.
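
If you'd rather script that step than run gocr by hand for every page,
something like the Python sketch below would do. It assumes the gocr
binary is on your PATH and that each page was saved as a greyscale PNM/PGM
file (a format gocr reads directly). It only uses the -i (input file) and
-o (output file) options; without -o, gocr writes the text to stdout. The
function names are my own, purely for illustration.

```python
import subprocess
from typing import List, Optional


def gocr_command(image_path: str, output_path: Optional[str] = None) -> List[str]:
    """Build the gocr argument list for one scanned page (hypothetical helper)."""
    cmd = ["gocr", "-i", image_path]
    if output_path is not None:
        # -o writes the recognised text to a file instead of stdout
        cmd += ["-o", output_path]
    return cmd


def run_gocr(image_path: str) -> str:
    """Run gocr on one scan and return the recognised text from stdout."""
    result = subprocess.run(
        gocr_command(image_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError if gocr fails
    )
    return result.stdout
```

For example, text = run_gocr("page001.pnm") gives you one page's text to
save and proof-read.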

OCR is not an exact science, so you will really need to sit down and go
through the scanned text to check that the numbers were recognised
correctly (errors are very easy to spot: you may get @ instead of 0, l
for 1 and the like).
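
To speed up the proof-reading, a rough Python sketch like this one can
flag tokens that mix digits with characters OCR commonly confuses with
them. The @/0 and l/1 pairs come from above; the other look-alikes (O, o,
I, S, B) are my assumption, so tune the set to the mistakes you actually
see. It only points you at likely errors; you still have to check each one
against the book.

```python
import re
from typing import List, Tuple

# Look-alike characters: @ and l are from the post; O, o, I, S, B are assumed extras.
SUSPECT = re.compile(r"[@OoIlSB]")
DIGIT = re.compile(r"\d")


def suspicious_tokens(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, token) pairs for tokens that mix digits with look-alikes."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Only flag tokens that look numeric but contain a suspect character.
            if DIGIT.search(token) and SUSPECT.search(token):
                hits.append((lineno, token))
    return hits
```

Words with no digits (e.g. "Total") are left alone, so the list stays
short enough to go through by hand.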
Save the generated file. You may then need to either write a script that
turns the space-separated fields into comma-separated ones, or load the
file into emacs, search and replace " " with "," and then save it.
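
If you go the script route, here is a minimal Python sketch. It assumes
fields are separated by one or more spaces or tabs and that no single
field contains a space (a two-word label would be split in two). Using the
csv module rather than a plain replace means a field that already contains
a comma (such as "1,234") gets quoted instead of silently becoming two
columns.

```python
import csv
import io
from typing import TextIO


def spaces_to_csv(src: TextIO, dst: TextIO) -> None:
    """Rewrite whitespace-delimited OCR output as comma-separated values."""
    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        fields = line.split()  # splits on runs of spaces/tabs and drops the newline
        if fields:  # skip blank lines left over from the scan
            writer.writerow(fields)


def convert_text(text: str) -> str:
    """Convenience wrapper: convert a whole string and return the CSV text."""
    out = io.StringIO()
    spaces_to_csv(io.StringIO(text), out)
    return out.getvalue()
```

From a small script you could call spaces_to_csv(sys.stdin, sys.stdout),
or pass files opened with open("book.txt") and open("book.csv", "w",
newline="").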

Getting it into the database depends on which database you use. MySQL
is pretty easy.
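
As a runnable stand-in, here is a Python sketch that loads the CSV into
SQLite through the standard-library sqlite3 module. For MySQL the idea is
the same with a MySQL driver (or MySQL's own LOAD DATA INFILE statement).
The table and column names are whatever you pass in (hypothetical here),
and the columns are left untyped, so for real stats you would want to
declare INTEGER or REAL types.

```python
import csv
import io
import sqlite3
from typing import List


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so arbitrary table/column names are safe to embed."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn: sqlite3.Connection, table: str, columns: List[str], csv_text: str) -> int:
    """Create `table` if needed, insert every CSV row, and return the row count."""
    cols = ", ".join(quote_ident(c) for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({cols})")
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]  # drop blank lines
    for row in rows:
        # Catch OCR lines that gained or lost a field before they reach the table.
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}: {row}")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)
```

The field-count check is there because a misread space is the most likely
way for a line to end up with the wrong number of columns.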

TTFN

Paul
-- 
"Logic, my dear Zoe, is merely the ability to be wrong with authority" -
Dr Who