Re: Convert PDF to Text?

Verily I say unto thee, that bdk@xxxxxx spake thusly:
> I think pdftohtml is part of
> 
> poppler-utils

Got it, thanks.

However, now there's another problem - it doesn't really work.

All it produces is "empty" HTML files: they are well-formed HTML
(head, body, etc.), but the actual content is missing.

IOW it looks like it can only work if the content of the PDF really is
text, and not a scanned image of text.

Selecting and copying the text definitely works in Evince; I just wish
there were a way to automate it with a batch script, rather than having
to copy and paste the text out of 2000 documents.
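
One possible way to batch this, as a rough sketch rather than a tested
solution: if Evince can select the text, the PDFs presumably carry a
hidden text layer behind the scanned image, in which case pdftotext
(also part of poppler-utils) might be able to pull it out. Whether it
actually succeeds on these particular files is an assumption. The
function names and directory layout below are hypothetical; the only
external command used is "pdftotext INPUT -", which writes the
extracted text to stdout.

```python
import subprocess
from pathlib import Path


def extract_text(pdf_path):
    """Return whatever text pdftotext can extract, or "" on failure.

    Assumes the pdftotext binary (poppler-utils) is on PATH.
    """
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],  # "-" sends the text to stdout
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )
    return result.stdout if result.returncode == 0 else ""


def convert_all(src_dir, out_dir, extract=extract_text):
    """Convert every *.pdf in src_dir to a .txt file in out_dir.

    Returns the names of PDFs that yielded no text at all. Those are
    likely pure scanned images with no text layer and would need real OCR.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    no_text = []
    for pdf in sorted(src.glob("*.pdf")):
        text = extract(pdf)
        # strip() also removes the form feeds pdftotext puts between pages
        if text.strip():
            (out / (pdf.stem + ".txt")).write_text(text, encoding="utf-8")
        else:
            no_text.append(pdf.name)
    return no_text
```

Calling convert_all("pdfs", "txt") would then process a whole directory
in one go and return the list of files that came out empty, so only
those need further attention.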

Here's the original PDF file:

http://antitrust.slated.org/www.iowaconsumercase.org/011607/0000/PX00111.pdf

And here's a video of Evince "OCRing" the text from the image:

http://media.slated.org/albums/userpics/Evince_podit.mp4 (H264 MP4)

Download the PDF and try it yourself.

It's bizarre. Surely there's a way to automate this?

TIA.

-- 
K.
http://slated.org

.----
| I found [Vista] to be a dangerously unstable operating system,
| which has caused me to lose data ... unfortunately this product
| is unfit for any user. - [H]ardOCP, <http://tinyurl.com/3bpfs2>
`----

Fedora Core release 5 (Bordeaux) on sky, running kernel 2.6.20-1.2312.fc5
 01:31:48 up 4 days, 23:03,  3 users,  load average: 0.57, 0.52, 0.54

