Re: Copying text from a protected pdf file

Quoting Paul Smith <phhs80@xxxxxxxxx>:

> I have got a pdf file, whose text I would like to copy to a word
> processor. However, it seems to be protected, as when I copy and paste
> a piece of text from there into a word processor, I only see garbage.
> Is there some way of getting clean text from the pdf file?

The PDF format has several ways to put text on a page.  Text can only be
extracted if the file stores it as strings and uses font information to render
them in the viewer.  You may instead be looking at images that were rasterized
long ago.  Please post the output of the "pdffonts" command, preferably for a
minimal document (a big document could combine sections that use fonts with
sections that are only images).

For example, the simplest case is a document that uses the PostScript Type 1
fonts provided by the viewer:

$ pdffonts /usr/share/doc/cups-1.1.20/ssr.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
Times-Roman                          Type 1       no  no  no       4  0
Helvetica                            Type 1       no  no  no       7  0
Helvetica-Bold                       Type 1       no  no  no       8  0
Times-Bold                           Type 1       no  no  no       5  0
Courier                              Type 1       no  no  no       3  0
Symbol                               Type 1       no  no  no       9  0
Times-Italic                         Type 1       no  no  no       6  0
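
If you would rather check a long "pdffonts" listing by script than by eye,
here is a minimal Python sketch.  The helper names parse_pdffonts and
likely_garbled are made up for illustration, and the rule it applies (an
embedded font with "no" in the uni column is a likely source of garbage on
copy and paste) is only a rule of thumb I am assuming, not something pdffonts
reports itself.  It reads the columns from the right because the type column
can contain spaces.

```python
# Sketch only: parse_pdffonts and likely_garbled are hypothetical helpers,
# and "embedded but no ToUnicode map" is an assumed heuristic, not a rule.

SAMPLE = """\
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
Times-Roman                          Type 1       no  no  no       4  0
Helvetica                            Type 1       no  no  no       7  0
Helvetica-Bold                       Type 1       no  no  no       8  0
Times-Bold                           Type 1       no  no  no       5  0
Courier                              Type 1       no  no  no       3  0
Symbol                               Type 1       no  no  no       9  0
Times-Italic                         Type 1       no  no  no       6  0
"""


def parse_pdffonts(output):
    """Turn the table printed by pdffonts into a list of dicts."""
    fonts = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # blank line or the dashed separator row
        # Count from the right: the last two fields are the object ID,
        # and the three before them are the emb, sub and uni flags.
        emb, sub, uni = fields[-5], fields[-4], fields[-3]
        if not {emb, sub, uni} <= {"yes", "no"}:
            continue  # header row, or a line that is not a font entry
        fonts.append({
            "name": fields[0],
            "emb": emb == "yes",
            "sub": sub == "yes",
            "uni": uni == "yes",
        })
    return fonts


def likely_garbled(fonts):
    """Names of embedded fonts that carry no ToUnicode map (heuristic)."""
    return [f["name"] for f in fonts if f["emb"] and not f["uni"]]


if __name__ == "__main__":
    fonts = parse_pdffonts(SAMPLE)
    print(len(fonts), "fonts; suspect:", likely_garbled(fonts))
```

For the listing above nothing is flagged, since none of the seven fonts is
embedded.  To try it on your own file, replace SAMPLE with the text that
"pdffonts yourfile.pdf" prints.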


-- 
George N. White III
Head of St. Margarets Bay, Nova Scotia

