On Fri, Jan 13, 2006 at 10:47:02AM +0000, Paul F. Johnson wrote:
> Grab a copy of gocr, compile and install (it's not in FE which is odd).
> When you scan, ensure it's at as high a resolution as possible (minimum
> in my experience of 300 dpi) and grey scaled.
>
> Use either gimp or xsane to grab the scan and tell gocr to do it's
> business.
>
> OCR is not an exact science and you will really need to sit down and go
> through the scanned text to ensure that the numbers scanned are correct
> (very easy to spot, you may have @ instead of 0, l for 1 and the such).
> Save the file generated. You may then need to either write a script to
> delimit using " " as the target or feed it into emacs and then search
> and replace " " for "," - save.

Sadly, in my (limited) experience, none of the free software solutions
such as Gocr or Clara OCR is really up to the task. The leading
proprietary packages are vastly superior. Some of them have free
30-day evaluations.

With a proper setup for lots of automated training, Clara might be able
to do the job. Especially if you do some image morphology (using, e.g.,
GIMP) to clean up the scans. But you'll have to do some serious work.

A tried and true technique that avoids using proprietary software is to
simply pay multiple people to type the whole thing, and then reconcile
the differences (or use majority voting). :-)

Regards,

Bill Rugolsky
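
For anyone who wants to try the majority-voting route, here is a rough
sketch in Python of how the reconciliation step might look. It is only an
illustration under some assumptions that are not from the thread: each
typist's copy is a list of lines, every copy has the same number of lines,
and the function name reconcile() and the sample data are made up. Lines
where no version wins a strict majority are reported so a person can check
them against the original page.

```python
from collections import Counter


def reconcile(transcripts):
    """Majority-vote line by line across several typed copies.

    Returns (merged_lines, disputed), where disputed holds the 1-based
    numbers of lines on which no version won a strict majority.
    """
    if not transcripts:
        raise ValueError("need at least one transcript")
    length = len(transcripts[0])
    if any(len(t) != length for t in transcripts):
        raise ValueError("transcripts must have the same number of lines")

    merged, disputed = [], []
    for lineno, versions in enumerate(zip(*transcripts), start=1):
        # Collapse runs of whitespace so spacing differences between
        # typists are not counted as disagreements.
        counts = Counter(" ".join(v.split()) for v in versions)
        winner, votes = counts.most_common(1)[0]
        if votes * 2 <= len(versions):
            # No strict majority: keep the first most common version,
            # but flag the line for manual review.
            disputed.append(lineno)
        merged.append(winner)
    return merged, disputed


# Hypothetical sample: three typists, two lines of numbers each.
typists = [
    ["12.5 30 101", "7 0 42"],
    ["12.5 30 101", "7 O 42"],   # letter O typed instead of zero
    ["12.5 30 1O1", "7 0 42"],   # letter O typed instead of zero
]
merged, disputed = reconcile(typists)
print(merged)
print(disputed)
```

Voting per line rather than per character keeps the sketch short, at the
cost of flagging a whole line when typists disagree on a single digit. With
only two typists there can be no majority on a disagreement, so every
differing line ends up in the disputed list.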