I am forwarding this message for a co-worker; his email to "fedora-list" keeps getting bounced. I have worked with him on this issue and can answer questions or describe the problem well enough for anyone who is kind enough to reply.
What he tried to send follows below:
-------------------------------------------------------------------------------------------------------------------------
I have run into an issue with memory bandwidth using the Fedora Core 2 kernels and I need help. I don't know what is wrong, but something killed performance of my custom driver when I ported it from RedHat 7.3 to Fedora Core 2. I believe I have narrowed it down to the kernel.
My driver requires a large amount of contiguous physical memory for DMA from a PCI device. I use the 'mem=YYY' command line parameter to reserve the top of physical RAM for my driver. Then I allow mapping via mmap() calls to user space. The user space app then uses this pointer to save the data to disk.
Normally the user space app writes to disk using the mmap()'d pointer as the source. With the new kernels these writes are taking way too long (around 20 MB/s). Even when the write goes to /dev/shm, the speed is limited to around 20 MB/s. A memcpy from the mmap()'d memory seems to have no such slowdown.
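One way to put a number on the memcpy() observation is to time a plain copy out of the mapped region. The following is only a minimal sketch: it assumes `src` would be the pointer returned by mmap(), and the helper names `now_seconds` and `timed_memcpy` are made up for illustration and are not part of the driver or the test program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Wall-clock time in seconds, taken from gettimeofday(). */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Copy len bytes from src to dst and return the elapsed wall-clock
 * time in seconds. In the scenario described here, src would be the
 * mmap()'d pointer and dst an ordinary malloc()'d buffer (assumption).
 */
static double timed_memcpy(void *dst, const void *src, size_t len)
{
    double t1;

    assert(dst != NULL && src != NULL);
    t1 = now_seconds();
    memcpy(dst, src, len);
    return now_seconds() - t1;
}
```

Dividing the byte count by the returned time gives a copy rate that can be set against the write() rate for the same region. The destination buffer should be read afterwards (for example with a checksum) so the compiler cannot discard the copy.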
This driver has been in use for some time on a RedHat 7.3 (2.4) kernel with no issues. To narrow the problem down, I removed all code that talks to the HW and created a driver that only maps host memory. The pattern I use is shown below. It is almost identical to the code in the kernel mem driver (...drivers/char/mem.c).
    dev_mmap(...)
    {
        ...
        u32 remap_addr = num_physpages*PAGE_SIZE;   // Top of memory
        ...
        vma->vm_flags |= VM_IO;
        vma->vm_flags |= VM_RESERVED;

        status = remap_page_range( vma, vma->vm_start, remap_addr,
                                   vma->vm_end - vma->vm_start,
                                   vma->vm_page_prot );
        if( status )
            return -EAGAIN;
        ...
    }
I created a test program that opens the device, calls mmap() to get a pointer, then saves 32 MB to /dev/shm and times it with the wall clock, as follows:
    dev_fd = open("/dev/mydevice",O_RDWR,0);
    shm_fd = open("/dev/shm/foo.dat",O_RDWR|O_TRUNC|O_CREAT,0666);
    void *devptr = mmap(0,0x2000000,PROT_READ,MAP_SHARED,dev_fd,0);
    msync(devptr,num_bytes,MS_SYNC|MS_INVALIDATE);
    double t1 = /* time in seconds using gettimeofday() */
    int n = write(shm_fd,devptr,0x2000000);
    double t2 = /* time in seconds using gettimeofday() */
    /* check for errors */
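For anyone who wants to reproduce the measurement, the timing step can be sketched as below. This is not the actual test program: the helper names `now_seconds`, `mb_per_second` and `timed_write` are hypothetical, and the loop on short writes is an addition that the outline above does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wall-clock time in seconds, taken from gettimeofday(). */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Convert a byte count and an elapsed time to MB/s (1 MB = 2^20 bytes). */
static double mb_per_second(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}

/*
 * Write len bytes from src to fd, retrying after short writes.
 * Returns the number of bytes written, or -1 on error. On success the
 * elapsed wall-clock time is stored in *seconds; on error it is left
 * untouched.
 */
static ssize_t timed_write(int fd, const void *src, size_t len, double *seconds)
{
    const char *p = src;
    size_t done = 0;
    double t1 = now_seconds();

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0)
            return -1;          /* caller can inspect errno */
        if (n == 0)
            break;              /* no progress; stop rather than spin */
        done += (size_t)n;
    }
    *seconds = now_seconds() - t1;
    return (ssize_t)done;
}
```

In the test described above, the call would be roughly `timed_write(shm_fd, devptr, 0x2000000, &secs)` followed by `mb_per_second(0x2000000, secs)`. For scale, 32 MB at about 20 MB/s is roughly 1.6 s per write, while 700 MB/s would be under 50 ms.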
I have tried this on several platforms and kernels and the results vary, but the common denominator seems to be:
Fedora kernel + 32-bit Intel = poor performance (see below)
Processor  Kernel               Chipset      Arch    Results
Opteron    2.6.5-1.358smp       AMD          64-bit  Pass
Opteron    2.6.7-1.492smp       AMD          32-bit  Fail
Xeon       2.6.7-1.492smp       Intel E7505  32-bit  Fail
Xeon       2.6.6-1.435.2.3      Intel E7505  32-bit  Fail
Xeon       2.6.6-1.435.2.3smp   Intel E7505  32-bit  Fail
Xeon       2.4.18-24smp         Intel E7505  32-bit  Pass
Xeon       2.4.18-24smp         Intel E7501  32-bit  Pass
P4         2.6.7 (kernel.org)   Via ???      32-bit  Pass
Notes:
* The Fails are always around 20 MB/s.
* When it passes, the performance depends on the chipset (e.g. 700+ MB/s).
* The E7505 is hosted in an HP xw8000.
* The E7501 is hosted on an Intel SE7501WV2 motherboard.
* The P4 is my home PC, which is a VIA chipset - don't ask me which.
Any help is appreciated.
Thanks,
John Fusco
-------------------------------------------------------------------------------------------------------------------------
Does anyone have any ideas where to begin with this one? And is there some other list that this question should be passed to?
Thanks,
Jim Foris