Re: Convert PDF to Text?

On 21/04/07, Keith G. Robertson-Turner
<fedora-gmane.00003@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
I have some PDF documents that are photocopied text documents (embedded
image, rather than text glyphs). When I open these with Evince, I am
able to copy and paste the actual text. At first I thought this was some
kind of OCR process, but then I realised it's actually the document
itself, which has the original text embedded in it (OCRed and embedded
during the original scan).

Is there any command I can use to extract the text from these PDF
documents in a batch? I have a couple of thousand documents that need
converting.

Have you looked at pdftk? "If PDF is electronic paper, then pdftk is
an electronic staple-remover, hole-punch, binder, secret-decoder-ring,
and X-Ray-glasses. Pdftk is a command-line tool for doing everyday
things with PDF documents."

http://www.accesspdf.com/pdftk/
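
For the batch part of the question, here is a minimal sketch in Python of looping over a directory and running a command-line converter on each file. It assumes the pdftotext utility (from xpdf/poppler-utils) is installed and that the PDFs really do carry an embedded text layer; neither pdftotext nor the helper names below (text_path_for, build_command, convert_all) come from this thread, so treat them as illustrative only.

```python
import subprocess
from pathlib import Path


def text_path_for(pdf_path, out_dir):
    """Return the .txt path in out_dir that corresponds to pdf_path."""
    return Path(out_dir) / (Path(pdf_path).stem + ".txt")


def build_command(pdf_path, txt_path, tool="pdftotext"):
    """Build the converter command line.

    Assumption: the tool accepts `-layout INPUT OUTPUT`, as pdftotext does.
    """
    return [tool, "-layout", str(pdf_path), str(txt_path)]


def convert_all(src_dir, out_dir, run=subprocess.run):
    """Convert every *.pdf in src_dir; return the names of files that failed.

    `run` is injectable so the loop can be exercised without the real tool.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        result = run(build_command(pdf, text_path_for(pdf, out_dir)))
        if result.returncode != 0:
            # Keep going; a couple of thousand files should not stop on one bad scan.
            failed.append(pdf.name)
    return failed
```

Calling convert_all("scans", "text") would write text/NAME.txt for each scans/NAME.pdf and return a list of the files the converter rejected. If the converter is not installed, subprocess.run raises FileNotFoundError on the first file.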
