Re: ps to pdf and then to text editor

On 4/9/06, George N. White III <aa056@xxxxxxxxxxxxxx> wrote:
> > I print a web page containing text to a file, file.ps. Then I run ps2pdf
> > and get file.pdf. However, I cannot copy the text from file.pdf into
> > a text editor. Is there a way to get a PDF file with copyable text?
>
> Does this work with a really trivial web page?
>
> What does "pdffonts file.pdf" show?
>
> If the PDF file stores the text as strings, you stand a better chance of
> being able to cut and paste from a PDF viewer to the editor, but you may
> run into encoding issues, in which case the pasted text is gibberish.
>
> I get:
>
> $ cat t.html
> abc
>
> Printing to PS from Firefox, converting to PDF, loading in Adobe Reader,
> and cutting and pasting gives "^Y^Z^[", so the encoding is the problem.
> Xpdf would not let me copy the text.  The t.html.ps file has:
>
> 8 dict begin
> /FontName /Nimbus_Roman_No9_L.Regular.0.0.Set0 def
> /FontType 1 def
> /FontMatrix [ 0.001 0 0 0.001 0 0 ]readonly def
> /PaintType 0 def
> /FontBBox [-168 -281 1031 1098]readonly def
> /Encoding [
> /.notdef
> /uni0066/uni0069/uni006C/uni0065/uni003A/uni002F/uni0068/uni006F
> /uni006D/uni0067/uni0077/uni0074/uni0057/uni0073/uni002E/uni0031
> /uni0020/uni0030/uni0034/uni0039/uni0032/uni0036/uni0041/uni004D
> /uni0061/uni0062/uni0063/
>
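> The glyphs for 'a', 'b', 'c' sit in slots 25, 26, 27 of this array, i.e.
> the control codes ^Y, ^Z, ^[.  This is the 'abc' --> '^Y^Z^[' encoding.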
>
> $ pdffonts t.html.pdf
> name                         type         emb sub uni object ID
> ---------------------------- ------------ --- --- --- ---------
> YNAHAD+Nimbus_Roman_No9_L.Regular.0.0.Set0
>                               Type 1C      yes yes no  9 0
>
> If the pdf file uses images, you need to use an OCR tool to get the text.
> I have seen cases where printing docs to PS on Win32 results in the
> text being rasterized in the driver so the PS file has images.  This may
> happen with screen fonts and/or certain effects (transparency, text
> outlines filled with colored patterns).
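
To make the 'abc' --> '^Y^Z^[' mapping in George's example concrete, here
is a small Python sketch. It only uses the part of the /Encoding array
that was quoted (the array is cut off after uni0063, so nothing past slot
27 is known). The function names are made up for illustration, and the
reading of "uniXXXX" as "the glyph for Unicode code point XXXX" is the
usual glyph-naming convention, not something the PS file states itself.

```python
# Glyph names copied from the quoted /Encoding array. The post truncates
# the array after uni0063, so only slots 0..27 are known here.
ENCODING = [
    ".notdef",
    "uni0066", "uni0069", "uni006C", "uni0065",
    "uni003A", "uni002F", "uni0068", "uni006F",
    "uni006D", "uni0067", "uni0077", "uni0074",
    "uni0057", "uni0073", "uni002E", "uni0031",
    "uni0020", "uni0030", "uni0034", "uni0039",
    "uni0032", "uni0036", "uni0041", "uni004D",
    "uni0061", "uni0062", "uni0063",
]


def glyph_to_char(name):
    """Map a 'uniXXXX' glyph name to its character, or None otherwise."""
    if name.startswith("uni") and len(name) == 7:
        return chr(int(name[3:], 16))
    return None


def caret(code):
    """Caret notation for a control code, e.g. 25 -> '^Y'."""
    return "^" + chr(code ^ 0x40)


def encode(text):
    """Character codes the font would use for `text` (slot numbers)."""
    slot_of = {glyph_to_char(n): i for i, n in enumerate(ENCODING)}
    return [slot_of[ch] for ch in text]


def decode(codes):
    """What a viewer could recover if it mapped codes back via glyph names."""
    return "".join(glyph_to_char(ENCODING[c]) for c in codes)


codes = encode("abc")
print(codes)                              # slot numbers for a, b, c
print("".join(caret(c) for c in codes))   # what gets pasted
print(decode(codes))                      # what we wanted to get
```

The slot numbers are just the order in which characters were first used
on the page, which is why the pasted text comes out as control characters
unless the viewer has some way to map the codes back to Unicode.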

Thanks, George and Mike. After pstill, I get

$ pdffonts file.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
Nimbus_Roman_No9_L.Regular.0.0.Set0  Type 1       yes no  no      33  0
Verdana.Bold.0.0.Set0                Type 1       yes no  no      37  0
Verdana.Regular.0.0.Set0             Type 1       yes no  no      41  0
Lucida_Sans.Regular.0.0.Set0         Type 1       yes no  no      45  0
Arial.Regular.0.0.Set0               Type 1       yes no  no      49  0
Arial.Bold.0.0.Set0                  Type 1       yes no  no      53  0
Verdana.Italic.0.0.Set0              Type 1       yes no  no      57  0
[1]-  Done                    acroread anselmo.pdf
[2]+  Done                    kwrite
$

After ps2pdf, I get

$ pdffonts file.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
EOZSTF+Verdana.Regular.0.0.Set0      Type 1C      yes yes no      13  0
MQEXGW+Arial.Regular.0.0.Set0        Type 1C      yes yes no      19  0
DMCZLT+Lucida_Sans.Regular.0.0.Set0  Type 1C      yes yes no      17  0
YTBXNU+Nimbus_Roman_No9_L.Regular.0.0.Set0 Type 1C      yes yes no       8  0
GBGOAU+Verdana.Bold.0.0.Set0         Type 1C      yes yes no      10  0
GMTXSU+Arial.Bold.0.0.Set0           Type 1C      yes yes no      23  0
AJKQFS+Verdana.Italic.0.0.Set0       Type 1C      yes yes no      26  0
$
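
In both listings the "uni" column is "no" for every font; as far as I
understand the pdffonts documentation, that column says whether the font
carries a ToUnicode map. A rough Python sketch for picking such fonts out
of a saved pdffonts report follows. It assumes the column layout shown
above with name and flags on the same line (a wrapped entry, as in
George's output, would need the two lines joined first), and the function
name is invented for this example.

```python
# Two rows copied from the ps2pdf listing above, plus the header lines.
SAMPLE = """\
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
EOZSTF+Verdana.Regular.0.0.Set0      Type 1C      yes yes no      13  0
GBGOAU+Verdana.Bold.0.0.Set0         Type 1C      yes yes no      10  0
"""


def fonts_without_unicode_map(report):
    """Return the names of fonts whose 'uni' column is 'no'."""
    missing = []
    # Skip the two header lines, then read each row from the right:
    # the last five fields are emb, sub, uni, object number, generation.
    for line in report.splitlines()[2:]:
        fields = line.split()
        if len(fields) < 6:
            continue  # blank line or something that is not a font row
        emb, sub, uni = fields[-5:-2]
        if uni == "no":
            missing.append(fields[0])
    return missing


for name in fonts_without_unicode_map(SAMPLE):
    print(name)
```

Reading the flags from the right-hand end avoids having to guess where
the type column ends, since type names such as "Type 1C" contain a space.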

Paul

