On Sunday 2007-04-22 00:31:51 Keith G. Robertson-Turner wrote:
> I have some PDF documents that are photocopied text documents (embedded
> image, rather than text glyphs). When I open these with Evince, I am
> able to copy and paste the actual text. At first I thought this was some
> kind of OCR process, but then I realised it's actually the document
> itself, which has the original text embedded in it (OCRed and embedded
> during the original scan).
>
> Is there any command I can use to extract the text from these PDF
> documents in a batch? I have a couple of thousand documents that need
> converting.
>
> Just curious, since if Evince can obviously do it (manually) then the
> necessary library components (at least) must be installed (FC6).
>

KWord from KOffice can read and edit .pdf files (quite well), so you
should be able to open each document and save it as plain text. I guess
that with DCOP you could write a script to do this for multiple files.

-- 
Regards,
Doncho
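
For the batch part, here is a rough sketch of another route, not something I have tested against your files. Evince renders PDFs through the poppler library, and poppler ships a command-line tool called pdftotext (usage: pdftotext [options] PDF-file [text-file]). The sketch assumes pdftotext is installed and on your PATH, and that you want each .txt file written next to its .pdf. The helper names (text_path_for, build_command, batch_extract) are made up for illustration.

```python
import subprocess
from pathlib import Path


def text_path_for(pdf_path):
    """Return the output path: same directory and base name, .txt suffix."""
    return pdf_path.with_suffix(".txt")


def build_command(pdf_path, txt_path):
    """Build the argument list for one conversion.

    Assumption: the `pdftotext` binary from poppler is on PATH.
    """
    return ["pdftotext", str(pdf_path), str(txt_path)]


def batch_extract(src_dir, runner=subprocess.run):
    """Run pdftotext on every *.pdf under src_dir (recursively).

    Returns the list of PDF paths whose conversion exited non-zero, so a
    few bad files among thousands do not stop the whole run.
    `runner` is injectable only to make dry runs and testing easy.
    """
    failed = []
    for pdf_path in sorted(Path(src_dir).rglob("*.pdf")):
        result = runner(build_command(pdf_path, text_path_for(pdf_path)))
        if result.returncode != 0:
            failed.append(pdf_path)
    return failed
```

To use it, call batch_extract("/path/to/scans") (a placeholder directory) and print whatever it returns; those are the files to look at by hand. Whether the output is usable depends on the quality of the OCR text layer the scanner embedded.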