A question on OCR for a bad old document


I have a scanned PDF of a very old document that was typewritten
about half a century ago. The scan is noisy and the letters are far
from clear. The text can (mostly) be made out by eye, but it is 19
pages long and I would like to OCR it to get digitised text and save
the eye strain and a lot of typing.

I have tried various routes, including converting the PDF to JPG, TIF
and other formats after fiddling with it in GIMP to turn it (not very
well) from greyscale to monochrome as an indexed image before running
OCR on it. I have tried GOCR, Ocrad and gscan2pdf, but all give pretty
awful results with a very low success rate.
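
For reference, the greyscale-to-monochrome step described above can
also be done with an automatically chosen threshold instead of by
hand. The sketch below is not what was done in GIMP: it uses Otsu's
method, a standard thresholding technique swapped in purely for
illustration, on a flat list of 0-255 grey values. The names
otsu_threshold and binarise are hypothetical, and loading the pixels
from a real page image is deliberately left out. Whether this helps
on a noisy typewritten scan is an assumption, not a tested result.

```python
def otsu_threshold(pixels):
    """Pick the grey level (0-255) that best splits dark from light.

    Pixels with value <= the returned level are treated as ink,
    everything above it as paper. Uses Otsu's between-class variance.
    """
    # Histogram of grey levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n_dark = 0     # pixels at or below the candidate level
    sum_dark = 0   # sum of the grey values of those pixels
    for t in range(256):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        if n_dark == 0:
            continue            # no dark class yet
        n_light = total - n_dark
        if n_light == 0:
            break               # no light class left
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        # Between-class variance (up to a constant factor).
        var_between = n_dark * n_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarise(pixels, threshold):
    """Map each grey value to pure black (0) or pure white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Toy data: three dark "ink" pixels and three light "paper" pixels.
sample = [30, 40, 50, 200, 210, 220]
t = otsu_threshold(sample)
print(t, binarise(sample, t))   # prints: 50 [0, 0, 0, 255, 255, 255]
```

On a real page the list would hold one value per pixel of the
greyscale scan, and the binarised result would be written back out
as a monochrome image before being handed to the OCR program.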

Does anyone have any guidance, or a URL to point me to, that may help
with turning that scanned old document into a sensible plain-text
file within Fedora?

Thanks in advance for any tips.

-- 
mike c
-- 
users mailing list

