Fedora Users — Re: A question on OCR for bad old document?

On Mon, Jun 7, 2010 at 6:28 AM, mike cloaked <mike.cloaked@xxxxxxxxx> wrote:
> On Sun, Jun 6, 2010 at 10:19 PM, Frank Cox <theatre@xxxxxxxxxxx> wrote:
>>
>> You can't make a silk purse out of a sow's ear.
>>
>
> Hah - well true but I had hoped after seeing the wonderful computing
> facilities on CSI TV programmes (only joking!)
>
>> If you are having difficulty reading the scan yourself, then you're
>> probably out of luck getting the computer to OCR it for you.
>>
>> Your best bet is to retype it.  It's only 19 pages so it shouldn't take
>
> I was hoping you would not say that!

The folks that bring you PAF, GEDCOM, and familysearch.org have
volunteer projects where they get interested people to read through
scans of really old census documents and the like, and extract names,
birthdates, etc. Sure, that's all handwritten. (No typewriters back
then.) But the concept is the same. The human brain is still the best
OCR engine, but the computer can help, via scanning and image
enhancement such as contrast adjustment, sharpening, and despeckling.
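
To make "image enhancement" a bit more concrete, here is a rough sketch
of two common steps for a faded scan: stretching the contrast, then
picking a black/white cutoff with Otsu's method. These are my choice of
illustration, not something anyone in this thread has tried. The
function names are made up, and the page is assumed to be already
loaded as rows of 0-255 grayscale values.

```python
# Hypothetical helpers, standard library only. "pixels" is assumed to be
# a list of rows, each row a list of grayscale ints in the range 0..255.

def stretch_contrast(pixels, low_pct=2, high_pct=98):
    """Rescale so the low_pct..high_pct percentile range spans 0..255."""
    flat = sorted(v for row in pixels for v in row)
    lo = flat[(len(flat) - 1) * low_pct // 100]
    hi = flat[(len(flat) - 1) * high_pct // 100]
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in pixels]
    return [
        [min(255, max(0, (v - lo) * 255 // (hi - lo))) for v in row]
        for row in pixels
    ]


def otsu_threshold(pixels):
    """Return the cutoff that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate cutoff
    sum_bg = 0    # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, threshold):
    """Map values at or below the threshold to black (0), the rest to white (255)."""
    return [[0 if v <= threshold else 255 for v in row] for row in pixels]
```

Typical use would be stretch_contrast first, then otsu_threshold on the
result, then binarize. Whether that helps on a really bad scan is
another matter; a person may still have to read the result.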

(Same thing with translation. It's in that class of hard problems where
the known mechanical techniques tend to blow out their backtracking
stacks before they can reach a solution.)

If you haven't already, you might want to get on the GIMP users list
and ask what people who use GIMP regularly suggest for enhancing the
scans to make them more readable.

Joel Rees