On Mon, Apr 23, 2007 at 06:38:01AM +0100, Keith G. Robertson-Turner wrote:
> Verily I say unto thee, that Akemi Yagi spake thusly:
> > On Sun, 22 Apr 2007 01:33:32 +0100, Keith G. Robertson-Turner wrote:
> >
> >> All it produces is "empty" html files, that is - they are proper html
> >> (head, body, etc.) but the actual content is not there.
> >>
> >> IOW it looks like it can only work if the content of the PDF really is
> >> text, and not a scanned image of text.
> >
> > This might be of help:
> >
> > http://www.groklaw.net/article.php?story=20061210115516438
>
> Thanks for the link. Looks good.
>

I must point out that the scanned result will certainly need a fair amount
of cleanup. While tesseract is pretty good, it is far from perfect.

-- 
Fred Smith
Home: fredex@xxxxxxxxxxxxxxxxxxxxxx
781-438-5471
Jude 1:24,25