Fedora Users — Re: Copying text from a protected pdf file

Quoting Paul Smith <phhs80@xxxxxxxxx>:

> I have got a pdf file, whose text I would like to copy to a word
> processor. However, it seems to be protected, as when I copy and paste
> a piece of text from there into a word processor, I only see garbage.
> Is there some way of getting clean text from the pdf file?

The PDF format has many ways to display text.  To be able to extract text,
you need a file that stores the text as strings and uses font information
to render them in the viewer.  You may instead be seeing images that were
rasterized long ago.  Please post the output of the "pdffonts" command,
preferably for a minimal document (a big document could combine sections
that use fonts with sections that are only images).

For example, the simplest case is a document that uses the standard
PostScript Type 1 fonts supplied by the viewer rather than embedded in the
file (the "emb" column is "no" for every font):

$ pdffonts /usr/share/doc/cups-1.1.20/ssr.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
Times-Roman                          Type 1       no  no  no       4  0
Helvetica                            Type 1       no  no  no       7  0
Helvetica-Bold                       Type 1       no  no  no       8  0
Times-Bold                           Type 1       no  no  no       5  0
Courier                              Type 1       no  no  no       3  0
Symbol                               Type 1       no  no  no       9  0
Times-Italic                         Type 1       no  no  no       6  0
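
If you would rather check the table with a script than by eye, here is a
rough Python sketch.  It cuts each row into columns using the row of dashes,
so it should cope with the layout shown above; very long font names that
overflow their column would confuse it.  The helper names (run_pdffonts,
parse_pdffonts, likely_garbled) are my own invention, and the "embedded font
with no Unicode map" test is only a rule of thumb I am assuming, not
something pdffonts reports itself.

```python
import re
import subprocess

# Two rows copied from the pdffonts listing above, used as sample input.
SAMPLE = """\
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
Times-Roman                          Type 1       no  no  no       4  0
Helvetica                            Type 1       no  no  no       7  0
"""


def run_pdffonts(path):
    """Return what `pdffonts PATH` prints (pdffonts must be installed)."""
    result = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_pdffonts(output):
    """Turn the pdffonts table into a list of dicts keyed by column header."""
    lines = output.splitlines()
    # The row of dashes shows where each column starts and ends.
    sep = next(
        (i for i, line in enumerate(lines) if line.startswith("---")), None
    )
    if sep is None or sep == 0:
        return []  # no table found
    spans = [m.span() for m in re.finditer(r"-+", lines[sep])]

    def cells(line):
        values = []
        for n, (start, end) in enumerate(spans):
            # Let the last column run to the end of the line.
            stop = None if n == len(spans) - 1 else end
            values.append(line[start:stop].strip())
        return values

    header = cells(lines[sep - 1])
    return [
        dict(zip(header, cells(line)))
        for line in lines[sep + 1:]
        if line.strip()
    ]


def likely_garbled(fonts):
    """Assumed rule of thumb: embedded fonts without a Unicode map
    (emb = yes, uni = no) are the usual suspects for garbage on paste."""
    return [
        f["name"] for f in fonts
        if f.get("emb") == "yes" and f.get("uni") == "no"
    ]


fonts = parse_pdffonts(SAMPLE)
for font in fonts:
    print(font["name"], font["type"], font["emb"], font["uni"])
print("suspect fonts:", likely_garbled(fonts))
```

For a real file you would feed it parse_pdffonts(run_pdffonts("your.pdf")),
with "your.pdf" standing in for the document in question.  If the result is
an empty list, pdffonts found no fonts at all, which would fit the
rasterized-image case mentioned earlier.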


-- 
George N. White III
Head of St. Margarets Bay, Nova Scotia