Fedora Users — Re: Convert PDF to Text?

On 21/04/07, Keith G. Robertson-Turner
<fedora-gmane.00003@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

I have some PDF documents that are photocopied text documents (embedded
image, rather than text glyphs). When I open these with Evince, I am
able to copy and paste the actual text. At first I thought this was some
kind of OCR process, but then I realised it's actually the document
itself, which has the original text embedded in it (OCRed and embedded
during the original scan).

Is there any command I can use to extract the text from these PDF
documents in a batch? I have a couple of thousand documents that need
converting.


Have you looked at pdftk? "If PDF is electronic paper, then pdftk is
an electronic staple-remover, hole-punch, binder, secret-decoder-ring,
and X-Ray-glasses. Pdftk is a command-line tool for doing everyday
things with PDF documents."

http://www.accesspdf.com/pdftk/
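
For the batch part of the question, here is a rough sketch rather than a tested recipe. As far as I know, pdftk is aimed more at page-level work (merging, splitting, decrypting) than at dumping text, so this sketch assumes a different tool that the thread does not mention: the pdftotext command from poppler-utils (or xpdf), which would need to be installed and on the PATH. It should only help where the PDF already carries an embedded text layer, as described above. The function names build_command and convert_all are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, txt_path):
    """Build the pdftotext invocation for one file.

    Assumption: pdftotext (poppler-utils or xpdf) is installed.
    -layout asks it to keep the physical layout of the page.
    """
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def convert_all(src_dir):
    """Convert every *.pdf under src_dir; return the files that failed."""
    failed = []
    for pdf_path in sorted(Path(src_dir).rglob("*.pdf")):
        # Write foo.txt next to foo.pdf.
        txt_path = pdf_path.with_suffix(".txt")
        result = subprocess.run(build_command(pdf_path, txt_path))
        if result.returncode != 0:
            failed.append(pdf_path)
    return failed


if __name__ == "__main__":
    # Directory to scan: first argument, or the current directory.
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path in convert_all(root):
        print(f"failed: {path}", file=sys.stderr)
```

Saved under a name of your choosing, it would be run with the top-level directory as its only argument. Each PDF gets a .txt file beside it, and any file pdftotext rejects is listed on stderr so a couple of thousand documents can be checked afterwards without rerunning the lot.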