On 3/10/08, François Patte <francois.patte@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Good evening,
>
> I am trying to convert a pdf file into html using pdftohtml provided by f8.
>
> I get an html file with "nice" characters like: ’ instead of an apostrophe,
> or Ã© instead of é...
>
> so I think that there is some encoding problem.
>
> Using man pdftohtml, I got this info:
>   -enc <string>
>       output text encoding name
>
> but I cannot work out the syntax needed to get correct utf8 output, because
>
> Error: Couldn't find unicodeMap file for the 'utf8' encoding
>
> is the only answer I get if I try:
>
> pdftohtml -enc utf8 myfile.pdf
>
> I tried utf-8, latin1, latin-1, ISO_8859-1, .... without any success.
>
> If somebody knows... many thanks in advance.

I don't, but

man pdftohtml ->
    Pdftohtml was developed by Gueorgui Ovtcharov and Rainer Dorsch. It is
    based and benefits a lot from Derek Noonburg's xpdf package.

man xpdf ->
    -enc encoding-name
        Sets the encoding to use for text output. The encoding-name must
        be defined with the unicodeMap command (see xpdfrc(5)). This
        defaults to "Latin1" (which is a built-in encoding). [config
        file: textEncoding]

man xpdfrc ->
    unicodeMap encoding-name map-file
    [...]
    The Latin1, ASCII7, Symbol, ZapfDingbats, UTF-8, and UCS-2 encodings
    are predefined.

I'm afraid you'll have to read the elided part if you need an encoding
other than these six.

Hope this helps,
Andras
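
To make the man page excerpts above concrete, here is a minimal Python sketch, not a tested fix. It does two things: it reproduces the garbled characters from the question by reading UTF-8 bytes as Windows-1252, and it builds a pdftohtml command line using one of the six predefined names spelled exactly as xpdfrc(5) lists them. The helper names (mojibake, pdftohtml_command) are hypothetical, and the idea that the name must match exactly, including case, is an assumption based on 'utf8' and 'utf-8' being rejected while "UTF-8" is listed as predefined.

```python
# The six encoding names that xpdfrc(5) lists as predefined,
# spelled exactly as in the man page excerpt.
PREDEFINED_ENCODINGS = ("Latin1", "ASCII7", "Symbol", "ZapfDingbats", "UTF-8", "UCS-2")


def mojibake(text):
    """Return what the UTF-8 bytes of `text` look like when misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def pdftohtml_command(pdf_path, encoding="UTF-8"):
    """Build an argv list for pdftohtml with the -enc option.

    Assumption: the encoding name is matched exactly (case-sensitively),
    so anything not spelled like a predefined name is rejected here.
    """
    if encoding not in PREDEFINED_ENCODINGS:
        raise ValueError(
            "%r is not a predefined name; expected one of %s"
            % (encoding, ", ".join(PREDEFINED_ENCODINGS))
        )
    return ["pdftohtml", "-enc", encoding, pdf_path]


if __name__ == "__main__":
    # Prints the command line that could then be run in a shell.
    print(" ".join(pdftohtml_command("myfile.pdf")))
```

If the assumption holds, the printed command (pdftohtml -enc UTF-8 myfile.pdf) is the spelling to try. The mojibake helper also hints that the garbled output may already be UTF-8 that is merely being displayed as a single-byte encoding, in which case the viewer's charset setting is worth checking too.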