On Monday 28 May 2007 16:18:40 Daniel Hazelton wrote: > On Monday 28 May 2007 13:11:15 Adrian Bunk wrote: > > On Mon, May 28, 2007 at 09:33:32PM +0530, Nitin Gupta wrote: > > > On 5/28/07, Adrian Bunk <[email protected]> wrote: > > >... > > > > > >> - then ensure that it works correctly on all architectures and > > > > > > Already tested on x86, amd64, ppc (by Bret). I do not have machines > > > from other archs available. Bret tested 'take 3' version but no > > > changes were introduced in further revisions that could affect > > > correctness - but still it will be good to have this version tested > > > too. Only with inclusion in -mm and testing by much wider user base > > > can make it to mainline (I suppose nobody uses -mm for production use > > > anyway). > > > > > >> document why your version is that much faster than the original > > >> version and why you know your optimizations have no side effects > > With likely(), unlikely() and noinline *not* defined as NOP's performance > drops: > > 10000 run averages: > 'Tiny LZO': > Combined: 84.9292 usec > Compression: 42.4646 usec > Decompression: 42.4646 usec > 'miniLZO': > Combined: 61.3548 usec > Compression: 43.5648 usec > Decompression: 17.79 usec > > However, I'm worried that my testbed code - likely the Perl script that > actually loops the test code and collects its output - is somehow faulting, > as the way that the Compression and Decompression code have the exact same > value. > > I'm going to toss some debugging output in the script and see if I can spot > the problem. > Okay, checked my code and it was all a problem with cpu_to_le16 getting defined fully instead of just being a NOP. With that problem corrected the performance returns: 10000 run averages: 'Tiny LZO': Combined: 60.1028 usec Compression: 42.0652 usec Decompression: 18.0376 usec 'miniLZO': Combined: 61.0932 usec Compression: 43.4382 usec Decompression: 17.655 usec Combined average shows 'Tiny' to be 1.6% faster Compression in 'Tiny' is 3.2% faster Decompression in 'Tiny' is 2.2% slower All in all, the trade-off in this code, with the overall performance of the 'Tiny' code being faster than the stock miniLZO code I'd like to say that I'm certain that the decompressor could probably be sped up more, although I don't know of a place in the kernel where less than half a microsecond would make a massive impact. (Only place I can think of where this might have a negative is in SLAB/SLOB/SLUB, and I don't think that a low-level memory manager like those is a place for a compression algorithm anyway) Later today or tommorrow I'll start putting together another part to this "test-bed" for stress-testing the algorithm. Currently I only have a few ideas for tests: (for benchmarking) 1) Random input to compressor 2) large input data 3) real-world data (mmap()'d chunk of /dev/mem as input) (for stability checking) 4) deliberate corruption of compressed data 5) early finish (realloc() compressed data buffer to less than the full size of the data) 6) late start (change pointer to start of compressed data so the decompressor doesn't start at the beginning) When I have a better idea of how the LZO algorithm works I might try creating an input data-set for the decompressor that would deliberately overflow the output buffer. This is supposed to be caught by bounds-checking in the '_safe' version of the decompressor (the version used in 'Tiny' and used in the benchmarking of miniLZO), however it might be possible to trick the decompressor. (If I do manage this and it crashes both 'Tiny' and 'miniLZO' I *will* report it to Herr Oberhumer as well as see if I can come up with a quick patch for the problem). As I said before, any suggestions for how to improve the benchmark code are welcome, as are suggestions for tests I can add to it for stressing the new 'TinyLZO' implementation. Latest version of the test-bed attached. Significant changes include: likely(), unlikely() and noinline are no longer NOP's when the marked line in helpers.h is commented out, cpu_to_le16 functions as expected and is no longer a NOP. (Yes, that means this code is now fully capable of working correctly on a BE system). DRH
- Prev by Date: Re: [RFC][PATCH -mm 3/3] PM: Disable _request_firmware before hibernation/suspend
- Next by Date: Re: stuff ready to be deleted?
- Previous by thread: Re: [RFC] LZO de/compression support - take 6
- Next by thread: Re: [RFC] LZO de/compression support - take 6