Quoting Paul Smith <phhs80@xxxxxxxxx>:

> I have got a pdf file, whose text I would like to copy to a word
> processor. However, it seems to be protected, as when I copy and paste
> a piece of text from there into a word processor, I only see garbage.
> Is there some way of getting clean text from the pdf file?

The PDF format has many ways to display text. To be able to extract
text, you need a file that stores the text as strings and uses font
information to render them in the viewer. You may instead be looking
at images that were rasterized long ago.

You should provide the output of the "pdffonts" command, preferably
for a minimal document (a big document could combine sections that use
fonts with sections that are images). For example, the simplest case
is a document that uses the PostScript Type 1 fonts provided by the
viewer:

$ pdffonts /usr/share/doc/cups-1.1.20/ssr.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
Times-Roman                          Type 1       no  no  no       4  0
Helvetica                            Type 1       no  no  no       7  0
Helvetica-Bold                       Type 1       no  no  no       8  0
Times-Bold                           Type 1       no  no  no       5  0
Courier                              Type 1       no  no  no       3  0
Symbol                               Type 1       no  no  no       9  0
Times-Italic                         Type 1       no  no  no       6  0

-- 
George N. White III
Head of St. Margarets Bay, Nova Scotia
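
P.S. If you want to check a pdffonts listing mechanically, here is a rough Python sketch. It assumes the column layout shown in the message above (name, type, emb, sub, uni, object ID); other pdffonts versions may print extra columns, in which case the parsing would need adjusting. The function names parse_pdffonts and uses_only_viewer_fonts are made up for illustration and are not part of any tool.

```python
# Sketch only: parse the text printed by pdffonts, assuming the columns
# name / type / emb / sub / uni / object ID as in the listing above.

SAMPLE = """\
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
Times-Roman                          Type 1       no  no  no       4  0
Helvetica                            Type 1       no  no  no       7  0
Helvetica-Bold                       Type 1       no  no  no       8  0
Times-Bold                           Type 1       no  no  no       5  0
Courier                              Type 1       no  no  no       3  0
Symbol                               Type 1       no  no  no       9  0
Times-Italic                         Type 1       no  no  no       6  0
"""


def parse_pdffonts(output):
    """Return one dict per font row found in pdffonts output."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # A font row has at least: name, type, emb, sub, uni, and the two
        # numbers of the object ID. The header and the dashed separator
        # do not end in two numbers, so they are skipped here.
        if len(parts) < 7:
            continue
        if not (parts[-1].isdigit() and parts[-2].isdigit()):
            continue
        emb, sub, uni = parts[-5:-2]
        rows.append({
            "name": parts[0],                  # assumes no spaces in the name
            "type": " ".join(parts[1:-5]),     # e.g. "Type 1"
            "emb": emb,
            "sub": sub,
            "uni": uni,
            "object_id": " ".join(parts[-2:]),
        })
    return rows


def uses_only_viewer_fonts(rows):
    """True if at least one font is listed and none of them is embedded."""
    return bool(rows) and all(row["emb"] == "no" for row in rows)


if __name__ == "__main__":
    fonts = parse_pdffonts(SAMPLE)
    for font in fonts:
        print(font["name"], "|", font["type"], "| embedded:", font["emb"])
    print("only viewer-supplied fonts:", uses_only_viewer_fonts(fonts))
```

An empty list from parse_pdffonts would mean that no fonts were reported at all, which is consistent with (but does not prove) a document made of scanned images.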