Fedora Users — Re: Copying text from a protected pdf file

To: For users of Fedora Core releases <fedora-list@xxxxxxxxxx>

Subject: Re: Copying text from a protected pdf file

From: Paul Smith <phhs80@xxxxxxxxx>

Date: Wed, 21 Sep 2005 10:35:00 +0100

On 9/17/05, Paul Smith <phhs80@xxxxxxxxx> wrote:
> > > I have got a pdf file, whose text I would like to copy to a word
> > > processor. However, it seems to be protected, as when I copy and paste
> > > a piece of text from there into a word processor, I only see garbage.
> > > Is there some way of getting clean text from the pdf file?
> >
> > The PDF format has many ways to display text.  To be able to extract text
> > you need a file that stores strings and uses font information to render them
> > in the viewer.  You may be seeing images that were rasterized long ago.
> > You should provide the output of the "pdffonts" command, preferably for a
> > minimal document (a big document could combine sections that use fonts with
> > images).
> >
> > For example, the simplest case is a document that uses the PostScript Type 1
> > fonts provided by the viewer:
> >
> > $ pdffonts /usr/share/doc/cups-1.1.20/ssr.pdf
> > name                                 type         emb sub uni object ID
> > ------------------------------------ ------------ --- --- --- ---------
> > Times-Roman                          Type 1       no  no  no       4  0
> > Helvetica                            Type 1       no  no  no       7  0
> > Helvetica-Bold                       Type 1       no  no  no       8  0
> > Times-Bold                           Type 1       no  no  no       5  0
> > Courier                              Type 1       no  no  no       3  0
> > Symbol                               Type 1       no  no  no       9  0
> > Times-Italic                         Type 1       no  no  no       6  0
> 
> Thanks, George. In my case,
> 
> $ pdffonts myfile.pdf
> name                                 type         emb sub uni object ID
> ------------------------------------ ------------ --- --- --- ---------
> DTUUBE+TTBC19E318t00                 TrueType     yes yes no      13  0
> URMVBE+TTBC18C910t00                 TrueType     yes yes no      16  0
> TOYVBE+Symbol                        Type 1C      yes yes no      19  0
> Helvetica                            Type 1C      yes no  no      22  0
> CLLUBE+TTBC1802E0t00                 TrueType     yes yes no      34  0
> Helvetica-Bold                       Type 1C      yes no  no      43  0
> Helvetica-Oblique                    Type 1C      yes no  no      58  0
> $

Is it possible to find the missing fonts to install them? 
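
For reference, the pdffonts man page describes the "emb" column as whether the font is embedded in the file, "sub" as whether it is a subset, and "uni" as whether the PDF carries an explicit ToUnicode map. The Python sketch below reads a listing in the format shown in this thread and flags fonts that are embedded subsets without a ToUnicode map. Treating that combination as a likely cause of garbage on copy and paste is only a heuristic assumed here, not something pdffonts reports. The names parse_pdffonts and suspect_fonts are hypothetical helpers, and the parser assumes font names contain no spaces and that there is no extra "encoding" column as printed by newer pdffonts versions.

```python
# Paul's listing from this thread, pasted verbatim as sample input.
SAMPLE = """\
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
DTUUBE+TTBC19E318t00                 TrueType     yes yes no      13  0
URMVBE+TTBC18C910t00                 TrueType     yes yes no      16  0
TOYVBE+Symbol                        Type 1C      yes yes no      19  0
Helvetica                            Type 1C      yes no  no      22  0
CLLUBE+TTBC1802E0t00                 TrueType     yes yes no      34  0
Helvetica-Bold                       Type 1C      yes no  no      43  0
Helvetica-Oblique                    Type 1C      yes no  no      58  0
"""


def parse_pdffonts(text):
    """Parse the six-column pdffonts listing format shown in this thread."""
    fonts = []
    for line in text.splitlines():
        # Drop mail quote markers ("> ") in case the listing comes from a reply.
        parts = line.lstrip("> ").split()
        # Data rows end in: emb sub uni object-number generation.
        # The header, the dashed separator and shell prompts fail this check.
        if len(parts) < 7 or any(p not in ("yes", "no") for p in parts[-5:-2]):
            continue
        fonts.append({
            "name": parts[0],
            # Read from the right: the type column may contain a space ("Type 1C").
            "type": " ".join(parts[1:-5]),
            "emb": parts[-5] == "yes",
            "sub": parts[-4] == "yes",
            "uni": parts[-3] == "yes",
            "object": (int(parts[-2]), int(parts[-1])),
        })
    return fonts


def suspect_fonts(fonts):
    """Heuristic (an assumption): embedded subset fonts with no ToUnicode map."""
    return [f["name"] for f in fonts if f["emb"] and f["sub"] and not f["uni"]]


if __name__ == "__main__":
    for name in suspect_fonts(parse_pdffonts(SAMPLE)):
        print("no ToUnicode map, embedded subset:", name)
```

Run on the listing above, this flags the four subset fonts (the three TrueType ones and TOYVBE+Symbol). Every font in the listing shows "emb yes", so by the man page's definition none of them is missing from the file.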

Paul