Convert PDF to Text?

I have some PDF documents that are photocopied text documents (embedded
images rather than text glyphs). When I open these with Evince, I am
able to copy and paste the actual text. At first I thought this was some
kind of OCR process in Evince, but then I realised the text comes from
the document itself, which has a text layer embedded in it (OCRed and
embedded during the original scan).

Is there any command I can use to extract the text from these PDF
documents in a batch? I have a couple of thousand documents that need
converting.

Just curious: since Evince can obviously do it (manually), the
necessary library components (at least) must already be installed (FC6).
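
One possible approach, offered as a sketch rather than a confirmed
answer: Evince renders PDFs through the poppler library, and poppler
ships a command-line tool called pdftotext (usually packaged as
poppler-utils) that writes a PDF's embedded text layer to a file. The
Python wrapper below assumes pdftotext is on the PATH; the function
names (text_path_for, build_command, convert_all) are made up for
illustration.

```python
import subprocess
from pathlib import Path


def text_path_for(pdf_path):
    """Return the .txt path that sits next to the given PDF."""
    return Path(pdf_path).with_suffix(".txt")


def build_command(pdf_path):
    """Build the pdftotext call for one PDF: input file, then output file."""
    pdf_path = Path(pdf_path)
    return ["pdftotext", str(pdf_path), str(text_path_for(pdf_path))]


def convert_all(src_dir, runner=subprocess.run):
    """Run pdftotext on every *.pdf under src_dir (recursively).

    Returns the list of PDFs for which the command exited non-zero.
    Assumption: pdftotext is installed; `runner` is only a hook so the
    loop can be exercised without it.
    """
    failed = []
    # Note: this pattern matches lowercase ".pdf" names only.
    for pdf in sorted(Path(src_dir).rglob("*.pdf")):
        result = runner(build_command(pdf))
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

Calling convert_all("/path/to/scans") would leave a .txt file beside
each PDF and return the ones that failed, which seems useful with a
couple of thousand files. Documents without an embedded text layer
would presumably produce empty or near-empty output, since no OCR
happens here.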

TIA.

-- 
K.
http://slated.org

.----
| I found [Vista] to be a dangerously unstable operating system,
| which has caused me to lose data ... unfortunately this product
| is unfit for any user. - [H]ardOCP, <http://tinyurl.com/3bpfs2>
`----

Fedora Core release 5 (Bordeaux) on sky, running kernel 2.6.20-1.2312.fc5
 22:29:39 up 4 days, 20:01,  3 users,  load average: 0.53, 0.48, 0.48

