Re: [RFC] LZO de/compression support - take 6

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Monday 28 May 2007 11:47:55 Nitin Gupta wrote:
> On 5/28/07, Adrian Bunk <[email protected]> wrote:
> > On Mon, May 28, 2007 at 08:10:31PM +0530, Nitin Gupta wrote:
> > > Hi,
> > >
> > > Attached is tester code used for testing.
> > > (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> > > version for 'TinyLZO')
> > >
> > > Cheers,
> > > Nitin
> > >
> > > On 5/28/07, Nitin Gupta <[email protected]> wrote:
> > >> (Using tester program from Daniel)
> > >>
> > >> Following compares this kernel port ('take 6') vs original miniLZO
> > >> code:
> > >>
> > >> 'TinyLZO' refers to this kernel port.
> > >>
> > >> 10000 run averages:
> > >> 'Tiny LZO':
> > >>        Combined: 61.2223 usec
> > >>        Compression: 41.8412 usec
> > >>        Decompression: 19.3811 usec
> > >> 'miniLZO':
> > >>        Combined: 66.0444 usec
> > >>        Compression: 46.6323 usec
> > >>        Decompression: 19.4121 usec
> > >>
> > >> Result:
> > >> Overall: TinyLZO is 7.3% faster
> > >> Compressor: TinyLZO is 10.2% faster
> > >> Decompressor: TinyLZO is 0.15% faster
> >
> > So your the compressor of your version runs 10.2% faster than the
> > original version.
> >
> > That's a huge difference.
> >
> > Why exactly is it that much faster?
> >
> > cu
> > Adrian
>
> I am not sure on how to account for this _big_ perf. gain but from
> benchmarks I see that whenever I remove unncessary casting from tight
> loops I get perf. gain of 1-2%. For e.g. open coding
> LZO_CHECK_MPOS_NON_DET macro with all unnecessary casting removed,
> gave perf. gain of ~2%. Similarly, I found many other places where
> such casting was unnecessary.

This is my guess as well. Though performance will likely drop when I make the 
noinline macro mean something. (This may be offset by figuring out a way to 
make likely() and unlikely() also have a meaningful effect in userspace).

This benchmark should be run on BE machines, but I'm still trying to figure 
out a way to open-code a *fast* and *reliable* version of cpu_to_le16.

Suggestions for the above bits will always be appreciated.
 
> These changes have been tested on x86, amd64, ppc. Testing of 'take 6'
> version is yet to be done - this will confirm that I didn't
> reintroduce some error.

I don't see that 'take 6' has had too much changed from 'take 4', although I 
haven't do a full comparison.

DRH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux