Re: ocr + fedora core and a big book..

2006/1/16, Gregory Machin <gregory.machin@xxxxxxxxx>:
I agree with you, but the boss wants OCR. I think I will leave him to figure it out; I have too much coding to do .. lol ...

thanks for the input .. have a great day ..


On 1/13/06, Bill Rugolsky Jr. <brugolsky@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On Fri, Jan 13, 2006 at 10:47:02AM +0000, Paul F. Johnson wrote:
> Grab a copy of gocr, compile and install (it's not in FE which is odd).
> When you scan, ensure it's at as high a resolution as possible (minimum
> in my experience of 300 dpi) and grey scaled.
>
> Use either gimp or xsane to grab the scan and tell gocr to do its
> business.
>
> OCR is not an exact science and you will really need to sit down and go
> through the scanned text to ensure that the numbers scanned are correct
> (very easy to spot, you may have @ instead of 0, l for 1 and the like).
> Save the file generated. You may then need to either write a script to
> delimit using " " as the target or feed it into emacs and then search
> and replace " " for "," - save.
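
The clean-up script suggested above could look roughly like the following. This is only a sketch under stated assumptions: the confusion table holds just the two look-alikes named in the post ("@" for "0", "l" for "1"), the function names `fix_field` and `line_to_csv` are made up for illustration, and a field is only "repaired" when it already contains a digit and becomes a plain number after substitution. Runs of whitespace are treated as one delimiter, which is a choice, not something the original advice specifies.

```python
import csv
import io
import re

# Hypothetical confusion table, taken from the examples in the post
# ("@" read for "0", "l" read for "1"). Extend it for your own scans.
CONFUSIONS = str.maketrans({"@": "0", "l": "1"})

# A plain integer or decimal number, optionally negative.
NUMBER = re.compile(r"-?\d+(\.\d+)?")


def fix_field(field):
    """Repair OCR look-alikes, but only in fields that look numeric."""
    candidate = field.translate(CONFUSIONS)
    has_digit = any(ch.isdigit() for ch in field)
    if has_digit and NUMBER.fullmatch(candidate):
        return candidate
    # Ordinary words (or fields we are unsure about) are left untouched.
    return field


def line_to_csv(line):
    """Turn one whitespace-delimited OCR line into one CSV line."""
    buf = io.StringIO()
    # csv.writer quotes any field that itself contains a comma.
    writer = csv.writer(buf, lineterminator="")
    writer.writerow([fix_field(f) for f in line.split()])
    return buf.getvalue()


if __name__ == "__main__":
    for raw in ["widget  1@l   3.5@", "hello l 20"]:
        print(line_to_csv(raw))
```

The output still needs proofreading by hand; the script only removes the mechanical part of the job.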

Sadly, in my (limited) experience, none of the free software solutions
such as Gocr or Clara OCR is really up to the task.  The leading
proprietary packages are vastly superior.  Some of them have free 30-day
evaluations.

With a proper setup for lots of automated training, Clara might be able
to do the job.  Especially if you do some image morphology (using, e.g.,
GIMP) to clean up the scans.  But you'll have to do some serious work.

A tried and true technique that avoids using proprietary software
is to simply pay multiple people to type the whole thing, and then
reconcile the differences (or use majority voting). :-)
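
For what it's worth, the reconciliation step can be sketched in a few lines. This assumes each typist delivers the same number of lines with the same line breaks, which real transcripts may not; the function name `reconcile` and the sample data are hypothetical. A line is accepted when a strict majority of typists agree on it, and anything else is left as `None` and reported for manual review.

```python
from collections import Counter


def reconcile(transcripts):
    """Merge several transcriptions line by line using majority voting.

    transcripts: a list of lists of strings, one inner list per typist.
    Returns (merged, disputed): merged holds the winning line or None,
    disputed holds the 1-based numbers of lines without a strict majority.
    """
    merged = []
    disputed = []
    # zip() stops at the shortest transcript, so extra lines are ignored.
    for lineno, versions in enumerate(zip(*transcripts), start=1):
        winner, votes = Counter(versions).most_common(1)[0]
        if votes * 2 > len(versions):
            merged.append(winner)
        else:
            merged.append(None)
            disputed.append(lineno)
    return merged, disputed


if __name__ == "__main__":
    typist_a = ["12 30", "7 41"]
    typist_b = ["12 30", "7 47"]
    typist_c = ["12 80", "7 44"]
    merged, disputed = reconcile([typist_a, typist_b, typist_c])
    print(merged)
    print("check by hand:", disputed)
```

With three typists, two agreeing versions settle a line; with only two typists, any disagreement is flagged rather than decided.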

Regards,

        Bill Rugolsky



--
Gregory Machin
greg@xxxxxxxxxxxxxx
gregory.machin@xxxxxxxxx
www.linuxpro.co.za
www.exponent.co.za
Web Hosting Solutions
Scalable Linux Solutions
www.iberry.info (support and admin)

+27 72 524 8096

--
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
 
 
That's another reason to get the best available solution packaged into Extras... if it's being widely used, it's probably being improved at a faster rate.
 
regards,
Rudolf Kastl
 

 

