Re: LZO patch comparision

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2007-06-07 at 19:40 +0530, Nitin Gupta wrote:
> On 6/7/07, Richard Purdie <[email protected]> wrote:
> > In the following,
> > - is Nitin's patch
> > + is my version.

> > +       for (;;) {
> > -               DINDEX1(dindex, ip);
> > +               dindex = DMS((0x21 * DX3(ip,5,5,6)) >> 5, 0);
> >                 m_pos = dict[dindex];
> >
> > Probably makes sense to expand the define since its single use.
> 
> DINDEX{1,2} definition varies with LZO flavours (lzo1x, lzo1y etc.),
> so its better to leave this macro as such.

Ok, I've no real preference.

> > -                               op[-2] |= (unsigned char)t;
> > +                               op[-2] |= t;
> > The unsigned char casts are all unneeded.
> > I'll skip future cases of these issues but there are more.
> 
> 't' is size_t, so shouldn't we do this casting while doing OR b/w
> unsigned char and size_t ? I'm not sure here.

op is an unsigned char. The assignment implies the cast and it will make
no difference to the output code (I've checked).

> >
> > @@ -203,61 +165,60 @@
> >                         break;
> >         }
> >
> > -       *out_len = (size_t)(op - out);
> > -       return (size_t)(in_end - ii);
> > +       *out_len = op - out;
> > +       return in_end - ii;
> >
> > unneeded casts.
> 
> You pointed out _lots_ of such unnecessary casts. I wonder why author
> did these in first place? He aims "unlimited" backward compatibility
> but I'm not too sure if removing these castings can break code on any
> of archs that Linux supports.

The author supports any compiler on any arch with a wide range of
compiler flags. If we assume gcc and the standard kernel coding
standards/flags, the casts are not needed. All the casts I've suggested
removing are implied by the assignments made.

FWIW, my bytecode comparisons also confirm there are no bytecode changes
on the arches I made the tests on.

> > -       if (!workmem)
> > -               return -EINVAL;
> >
> > Could be confused with LZO's own error codes. Do we need this?
> 
> This is basic (good-to-have) sanity check. s/-EINVAL/LZO_E_ERROR

It should never happen and I'm not sure its worth guarding against but
LZO_E_ERROR is better.

> > +#define HAVE_IP_OR(x, ip_end, ip) ((ip_end - ip) < (size_t)(x))
> > +#define HAVE_OP_OR(x, op_end, op) ((op_end - op) < (size_t)(x))
> > +#define HAVE_LB_OR(m_pos, out, op) (m_pos < out || m_pos >= op)
> >
> > Your NEED_* defines affect flow control contra to CodingStyle, hence my
> > choice of these instead and the associated code changes which I'll skip
> > over.
> 
> 1. Why did you remove (size_t) cast in LHS? It's now like comparison
> between signed and unsigned int.

Where does the int come from? op_end, op and friends are all pointers
and will hence be unsigned...

> 2. Naming is strange. I suggest:
> HAVE_IP_OR -> CHECK_IP
> HAVE_OP_OR -> CHECK_OP
> HAVE_LB_OR -> CHECK_LB

ok.

> > -               /* copy literals */
> > -               NEED_OP(t + 3);
> > -               NEED_IP(t + 4);
> > -#ifndef UNALIGNED_OK
> > -               if (((ip | op ) & 3) == 0) {
> > -#endif
> > +               if (HAVE_OP_OR(t + 3, op_end, op))
> > +                       goto output_overrun;
> > +               if (HAVE_IP_OR(t + 4, ip_end, ip))
> > +                       goto input_overrun;
> > +
> > +#if defined(LZO_UNALIGNED_OK_4)
> >                         COPY4(op, ip);
> >                         op += 4;
> >                         ip += 4;
> >
> 
> There is no need to have separate LZO_UNALIGNED_OK_{2,4}. If unaligned
> access is allowed and is faster than byte-by-byte access, set
> UNALIGNED_OK otherwise not.

There was no real added complication in leaving them separate. Some
archs support LZO_UNALIGNED_OK_4 but not LZO_UNALIGNED_OK_2 (AVR32
according to the kernel headers). 

Anyhow, this is pending a look at what get/put_unaligned does to the
bytecode...

> > We have a fairly major difference in the code here. This is where you've
> > assumed LZO_ALIGNED_OK_4 was set and it wasn't set in any in LZO in any
> > of my tests. Assuming LZO_ALIGNED_OK_4 should be safe for the kernel and
> > the code path you've added probably performs better but why was it
> > disabled in minilzo?
> >
> > -#ifdef UNALIGNED_OK
> > +#if defined(LZO_UNALIGNED_OK_4)
> >                         if (t >= 2 * 4 - (3 - 1) && (op - m_pos) >= 4) {
> > -#else
> > -                       if (t >= 2 * 4 - (3 - 1) &&
> > -                                       (((op | m_pos) &  3) == 0)) {
> > -#endif
> >
> > Same again here.
> 
> Here, you are avoiding optimization in case op, m_pos happen to be
> (4-byte) aligned.

Yes, for the reason I described - minilzo didn't enable that
optimisation / codepath - why?

> > diff -uwr 1/lzodefs.h 2/lzodefs.h
> > --- 1/lzodefs.h 2007-06-07 09:33:34.000000000 +0100
> > +++ 2/lzodefs.h 2007-06-06 17:40:56.000000000 +0100
> >
> > -#ifndef __LZO1X_INT_H
> > -#define __LZO1X_INT_H
> >
> > Pointless since only the LZO code will include this and it will do it
> > once.
> 
> No header file should ever be without these.

Plenty of kernel header files are...

> > +/* Which machines to allow unaligned accesses on */
> > +#if defined(CONFIG_X86_32) || defined(CONFIG_X86_64)
> > +#define LZO_UNALIGNED_OK_2
> > +#define LZO_UNALIGNED_OK_4
> >  #endif
> 
> > Probably preferable to do this here using CONFIG options rather than in
> > the Makefile.
> 
> I think, CONFIG_* option is not present for all archs (ARM/PPC?). The
> arch is simply picked up as a compilation arg (ARCH=x) or using `uname
> -m`. So I think its better to leave this in Makefile as simple list.

Ok, this will need some further thought. Its probably not worth
discussing until the implications of get/put_unaligned have been looked
at. A lot of effort has gone into those header files and we should
probably try and use them if its not going to affect performance too
much.

> > I still need to do some further tests to ascertain whether
> > we can use get/put_unaligned without affecting performance instead.
> 
> I think, get/put_unaligned should not be used on archs where these are
> slower than simple byte-by-byte access. I suspect this is the case
> where dirty work behind unaligned access is done in s/w (though I have
> not done any benchmarks myself).
>
> > So in summary apart from style issues, the code is the same apart from
> > the alignment access issues.
> 
> Yes. I will look into these alignment issues again.

Likewise.

Cheers,

Richard


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux