Re: Convert PDF to Text?

Verily I say unto thee, that linuxmaillists@xxxxxxxxxxx spake thusly:
> On Monday 23 April 2007, Keith G. Robertson-Turner wrote:
>> Verily I say unto thee, that linuxmaillists@xxxxxxxxxxx spake
>> thusly:
>>> On Sunday 22 April 2007, linuxmaillists@xxxxxxxxxxx wrote:

>>>> Kword won't even open the file because it does not recognize or
>>>> support the format.

>>> I will cover my own post. It has to be imported and then it works
>>> very nicely. Very cool!

>> I can't even import PDF.

> Did you go to
> 
> File > Open
> 
> or
> 
> File > Import

Both. No joy. It doesn't even see the filename, unless I rename it from
*.pdf to *.odt.

>> Are there some additional dependencies I should install?
>> 
>> Using KWord 1.6.2 on FC6.

> Group        : Applications/Productivity
> Source       : koffice-1.6.2-3.fc6.1.src.rpm
> Build Time   : Sun Mar  4 19:56:25 2007
> Install Time : Sun Apr 22 14:33:32 2007
> 
> I installed everything except the devel package and the language
> packages.
> 
> It only gets text and completely ignores images.
> 
> The Evince package works even better. I only know how to start it
> from the shell. I don't have an icon for it that I can find. I'm
> running KDE.

This is primarily a Gnome system, but AFAIK I have most of the KDE
support libs installed, since I use things like Amarok, K3B, etc.

I have a local YAM mirror of several repos, which (amongst other things)
makes searching for packages a lot easier. I've searched for everything
I can find with "*[Pp][Dd][Ff]*" in the name, but AFAICT I already have
everything available. Then again, I had no idea about Poppler (whose
name does not contain the PDF acronym at all) until someone pointed it
out to me.

I suppose I could try:

# List every package in the mirror whose description mentions PDF
files=$(find /var/yam -type f -name "*.rpm")
for File in $files
   do
      rpm -qp --qf "%{description}\n" "$File" | grep -q "[Pp][Dd][Ff]"
      if [ $? = 0 ]
         then
            echo "$File" >>PDF-stuff.txt
      fi
   done

Unless there's an easier way.
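
A slightly tidier variant of the loop above, as a rough sketch only. It
reads file names line by line, so paths with spaces survive, and it
does the case-insensitive match in awk. The function names
list_descriptions and pdf_matches are made up for illustration, and
/var/yam is simply the mirror path assumed above.

```shell
#!/bin/sh
# Sketch: find packages under a directory whose description mentions PDF.
# list_descriptions and pdf_matches are hypothetical helper names.

# Emit one "file<TAB>description" line per *.rpm file under directory $1.
# Newlines and tabs inside the description are flattened to spaces.
list_descriptions() {
    find "$1" -type f -name '*.rpm' | while IFS= read -r f; do
        desc=$(rpm -qp --qf '%{description}' "$f" 2>/dev/null | tr '\n\t' '  ')
        printf '%s\t%s\n' "$f" "$desc"
    done
}

# Read "file<TAB>description" lines on stdin and print the file name of
# every line whose description contains "pdf" in any letter case.
pdf_matches() {
    awk -F '\t' 'tolower($2) ~ /pdf/ { print $1 }'
}

# Usage (assumed mirror path):
#   list_descriptions /var/yam | pdf_matches >PDF-stuff.txt
```

If yum already knows about the repos, "yum search pdf" may be easier
still, since it looks at package summaries and descriptions as well as
names.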

This is frustrating. Presumably Evince and KWord are both using some PDF
library that enables parsing out the text from these documents. If the
capability is already in the system, there must be a way to access it
from a script, even if it's only a wrapper to a binary. I can see myself
downloading the sources just to discover what it's doing here.
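
One candidate for that binary is pdftotext, which ships with the Poppler
command-line tools (and with xpdf). A minimal wrapper might look like
the sketch below. It assumes pdftotext is on the PATH; the package name
poppler-utils in the error message is my guess for this system, and
pdf_to_text is a made-up function name.

```shell
#!/bin/sh
# Sketch: convert each PDF given as an argument to a .txt file beside it.
# Assumes pdftotext (Poppler or xpdf) is installed; pdf_to_text is a
# hypothetical helper name.

pdf_to_text() {
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found; try installing poppler-utils" >&2
        return 127
    fi
    for pdf in "$@"; do
        # foo.pdf -> foo.txt; -layout tries to preserve the page layout
        pdftotext -layout "$pdf" "${pdf%.pdf}.txt" || return 1
    done
}

# Usage:
#   pdf_to_text report.pdf      (writes report.txt)
```

Like the KWord import, this only extracts text; images are ignored.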

-- 
K.
http://slated.org

.----
| I found [Vista] to be a dangerously unstable operating system,
| which has caused me to lose data ... unfortunately this product
| is unfit for any user. - [H]ardOCP, <http://tinyurl.com/3bpfs2>
`----

Fedora Core release 5 (Bordeaux) on sky, running kernel 2.6.20-1.2312.fc5
 22:33:21 up 6 days, 20:05,  2 users,  load average: 0.36, 0.29, 0.25

