Hi,
On Tue, 30 Aug 2005, Knut Petersen wrote:
> > Probably you can make it even faster by avoiding the multiplication, like
> >
> > unsigned int offset = 0;
> > for (i = 0; i < image.height; i++) {
> > dst[offset] = src[i];
> > offset += pitch;
> > }
> >
>
> More than two decades ago I learned to avoid mul and imul. Use shifts, add and
> lea instead,
> that was the credo those days. The name of the game was CP/M 80/86, a86, d86
> and ddt ;-)
>
> But let´s get serious again.
>
> Your proposed change of the patch results in a 21 ms performance decrease on
> my system.
> Yes, I do know that this is hard to believe. I tested a similar variation
> before, and the results
> were even worse.
Could you try the patch below, for a few extra cycles you might want to
make it an inline function.
bye, Roman
Index: linux-2.6/drivers/video/fbmem.c
===================================================================
--- linux-2.6.orig/drivers/video/fbmem.c 2005-08-30 01:55:18.000000000 +0200
+++ linux-2.6/drivers/video/fbmem.c 2005-08-30 21:10:25.705462837 +0200
@@ -82,11 +82,11 @@ void fb_pad_aligned_buffer(u8 *dst, u32
{
int i, j;
+ d_pitch -= s_pitch;
for (i = height; i--; ) {
/* s_pitch is a few bytes at the most, memcpy is suboptimal */
for (j = 0; j < s_pitch; j++)
- dst[j] = src[j];
- src += s_pitch;
+ *dst++ = *src++;
dst += d_pitch;
}
}
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|