Fedora Users — Convert PDF to Text?

To: fedora-list@xxxxxxxxxx

Subject: Convert PDF to Text?

From: "Keith G. Robertson-Turner" <fedora-gmane.00003@xxxxxxxxxxxxxxxxxxxxxxx>

Date: Sat, 21 Apr 2007 22:31:51 +0100

Organization: Slated.org

Reply-to: For users of Fedora <fedora-list@xxxxxxxxxx>

User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.0.10) Gecko/20070302 Fedora/1.5.0.10-1.fc6 pango-text Thunderbird/1.5.0.10 Mnenhy/0.7.4.666

I have some PDF documents that are photocopied text documents (each page
is an embedded image rather than text glyphs). When I open these with
Evince, I am able to copy and paste the actual text. At first I thought
this was some kind of OCR process, but then I realised that the document
itself has the original text embedded in it (OCRed and embedded during
the original scan).

Is there any command I can use to extract the text from these PDF
documents in a batch? I have a couple of thousand documents that need
converting.

Just curious: since Evince can obviously do it (manually), the necessary
library components, at least, must already be installed (FC6).
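
For illustration, here is a minimal sketch of what such a batch job might
look like. It assumes the pdftotext command from the poppler-utils package
is installed and on the PATH (Evince renders PDFs through the poppler
library, but the command-line tool may be packaged separately). The
function names and the "scans" and "text" directory names are hypothetical,
and pdftotext will only produce useful output if the PDFs really do carry
an embedded text layer.

```python
import subprocess
from pathlib import Path


def text_path_for(pdf_path: Path, out_dir: Path) -> Path:
    """Map a PDF path such as scans/a.pdf to out_dir/a.txt."""
    return out_dir / (pdf_path.stem + ".txt")


def find_pdfs(src_dir: Path) -> list[Path]:
    """Return the PDF files directly inside src_dir, sorted by path."""
    return sorted(
        p for p in src_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def convert_all(src_dir: Path, out_dir: Path) -> list[Path]:
    """Run pdftotext on every PDF in src_dir; return the PDFs that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in find_pdfs(src_dir):
        # Assumption: pdftotext (poppler-utils) is installed.
        # -layout asks it to keep the physical layout of the page text.
        result = subprocess.run(
            ["pdftotext", "-layout", str(pdf), str(text_path_for(pdf, out_dir))]
        )
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

Calling convert_all(Path("scans"), Path("text")) would then write one .txt
file per PDF and return the list of files pdftotext could not process, so
a few bad documents among a couple of thousand do not stop the whole run.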

TIA.

-- 
K.
http://slated.org

.----
| I found [Vista] to be a dangerously unstable operating system,
| which has caused me to lose data ... unfortunately this product
| is unfit for any user. - [H]ardOCP, <http://tinyurl.com/3bpfs2>
`----

Fedora Core release 5 (Bordeaux) on sky, running kernel 2.6.20-1.2312.fc5
 22:29:39 up 4 days, 20:01,  3 users,  load average: 0.53, 0.48, 0.48