On 06/13/10 09:29, Joel Rees wrote:
> On Mon, Jun 7, 2010 at 8:25 AM, Jim <mickeyboa@xxxxxxxxxxxxx> wrote:
>> On 06/06/2010 05:19 PM, Frank Cox wrote:
>>> On Sun, 2010-06-06 at 22:01 +0100, mike cloaked wrote:
>>>
>>>> I have a scanned pdf of a very old document which was typewritten
>>>> about half a century ago. The scanned copy is noisy and the letters
>>>> are far from clear. The text can be made out (mostly) by eye, but it
>>>> is 19 pages long and I would like to OCR it to get a digitised text to
>>>> save the eye strain and lots of typing.
>>>>
>>> You can't make a silk purse out of a sow's ear.
>>>
>>> If you are having difficulty reading the scan yourself, then you're
>>> probably out of luck getting the computer to OCR it for you.
>>>
>>> Your best bet is to retype it. It's only 19 pages so it shouldn't take
>>> too long to type it again. You'll spend far more time fiddling around
>>> (unsuccessfully) with OCR stuff than it will take to retype it anyway.
>>>
>> Scanning a Text doc is not going to Save properly in Xsane/Linux, even
>> if you use "gocr"
>> Scanning and "Saving Text" is broken.
>>
>> As far as how a text looks on your terminal after scanning, It always
>> looks bad. You have to "Save As" to get good finish product, and again
>> "Save As" Text is broken in Xsane. only Images turn out after "Saving"
>
> Can you use the "copy/paste" (Select the text and Edit->Copy) pipe?
>
> (I suppose I should grab the current ocr downloads and give them a
> try. I have to say, it seems like about four years ago, all the open
> source ocr projects just stopped moving.)

I'm using *tesseract* to extract text from TIFF files (scanned with
xsane and saved as .tif), and I get good results:

yum install tesseract

--
Joachim Backes <joachim.backes@xxxxxxxxxxxxxx>

http://www.rhrk.uni-kl.de/~backes
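
For anyone wanting to try the tesseract route on a multi-page scan, here
is a rough sketch of how the per-page run might look. It assumes each
page was saved from xsane as its own .tif file in one directory; the
function name ocr_dir and the TESSERACT override variable are made up
for this illustration and are not part of tesseract itself. The only
tesseract usage relied on is the basic form "tesseract IMAGE OUTPUTBASE
-l LANG", which writes the recognised text to OUTPUTBASE.txt.

```shell
# Sketch only: run tesseract over every .tif in a directory.
# ocr_dir and the TESSERACT variable are hypothetical helper names.
ocr_dir() {
    dir=$1
    lang=${2:-eng}                 # OCR language, English if not given
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue  # skip the literal pattern if nothing matched
        base=${img%.tif}           # output base name: same path without .tif
        # tesseract appends ".txt" to the output base itself
        ${TESSERACT:-tesseract} "$img" "$base" -l "$lang"
    done
}
```

Calling "ocr_dir ./scans" (with ./scans standing in for wherever the
pages were saved) should leave one .txt file next to each .tif. Setting
TESSERACT=echo first prints the commands instead of running them, which
is a cheap way to check the file names before a long run. How good the
text is will still depend on how clean the scan is.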
--
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines