Fedora Users — Re: Convert PDF to Text?

On Mon, Apr 23, 2007 at 06:38:01AM +0100, Keith G. Robertson-Turner wrote:
> Verily I say unto thee, that Akemi Yagi spake thusly:
> > On Sun, 22 Apr 2007 01:33:32 +0100, Keith G. Robertson-Turner wrote:
> > 
> >> All it produces is "empty" html files, that is - they are proper html
> >> (head, body, etc.) but the actual content is not there.
> >>
> >> IOW it looks like it can only work if the content of the PDF really is
> >> text, and not a scanned image of text.
> > 
> > This might be of help:
> > 
> > http://www.groklaw.net/article.php?story=20061210115516438
> 
> Thanks for the link. Looks good.
> 

I should point out that the OCR output will almost certainly need a fair
amount of cleanup. Tesseract is pretty good, but it is far from perfect.
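
To give a rough idea of what that cleanup can involve, here is a minimal
Python sketch. It is not part of tesseract or of the Groklaw procedure; the
function name clean_ocr_text and the particular steps (rejoining words
hyphenated across line breaks, trimming stray whitespace, collapsing blank
lines) are assumptions chosen for illustration, and real OCR output usually
needs manual proofreading on top of this.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Apply a few generic cleanup steps to OCR output (illustrative only)."""
    # Normalize Windows line endings to plain newlines.
    text = raw.replace("\r\n", "\n")
    # Rejoin words split by a hyphen at a line break: "docu-\nment" -> "document".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop trailing spaces and tabs at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Collapse three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    sample = "This is a docu-\nment  with   odd spacing. \n\n\n\nNext para.\r\n"
    print(clean_ocr_text(sample))
```

Note that the hyphen rule will also join genuine compounds that happen to
break across lines ("well-\nknown" becomes "wellknown"), so it is worth
checking the result by eye rather than trusting it blindly.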

-- 
-------------------------------------------------------------------------------
 .----    Fred Smith   /              
( /__  ,__.   __   __ /  __   : /     
 /    /  /   /__) /  /  /__) .+'           Home: fredex@xxxxxxxxxxxxxxxxxxxxxx 
/    /  (__ (___ (__(_ (___ / :__                                 781-438-5471 
-------------------------------- Jude 1:24,25 ---------------------------------
