Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

On Wed, 13 Dec 2006, Chris Wedgwood wrote:

> > Any ideas why iommu=disabled in the bios does not solve the issue?
> 
> The kernel will still use the IOMMU if the BIOS doesn't set it up if
> it can, check your dmesg for IOMMU strings, there might be something
> printed to this effect.

FWIW: As far as I understand the Linux kernel code (I am no kernel 
developer, so please correct me if I am wrong), the PCI DMA mapping code 
is abstracted by struct dma_mapping_ops. There are currently four 
possible implementations for x86_64 (see linux-2.6/arch/x86_64/kernel/):

1. pci-nommu.c : no IOMMU at all (e.g. because you have < 4 GB memory)
   Kernel boot message: "PCI-DMA: Disabling IOMMU."

2. pci-gart.c : (AMD) Hardware-IOMMU.
   Kernel boot message: "PCI-DMA: using GART IOMMU" (this message
   first appeared in 2.6.16)

3. pci-swiotlb.c : Software-IOMMU (used e.g. if there is no hw iommu)
   Kernel boot message: "PCI-DMA: Using software bounce buffering 
   for IO (SWIOTLB)"

4. pci-calgary.c : Calgary HW-IOMMU from IBM; used in pSeries servers. 
   This HW-IOMMU supports DMA address mapping with memory protection,
   etc.
   Kernel boot message: "PCI-DMA: Using Calgary IOMMU" (since 2.6.18!)

What all this means is that you can use "dmesg|grep ^PCI-DMA:" to see 
which implementation your kernel is currently using.
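
The same check can be scripted. The following is only a minimal sketch, 
assuming the four boot messages quoted above are matched as line prefixes 
(case-insensitively, since the quoted messages mix "using" and "Using"); 
the names PCI_DMA_MESSAGES and classify_pci_dma are made up for this 
illustration and do not exist in the kernel:

```python
import re

# Boot message prefixes as quoted in this mail, each paired with the
# source file named next to it. This table is illustrative, not exhaustive.
PCI_DMA_MESSAGES = [
    ("PCI-DMA: Disabling IOMMU", "pci-nommu.c (no IOMMU)"),
    ("PCI-DMA: using GART IOMMU", "pci-gart.c (AMD GART hardware IOMMU)"),
    ("PCI-DMA: Using software bounce buffering", "pci-swiotlb.c (software IOMMU)"),
    ("PCI-DMA: Using Calgary IOMMU", "pci-calgary.c (IBM Calgary hardware IOMMU)"),
]


def classify_pci_dma(dmesg_text):
    """Return the implementations whose boot message appears in dmesg_text."""
    found = []
    for line in dmesg_text.splitlines():
        # Drop an optional printk timestamp such as "[    0.000000] ".
        line = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line.strip())
        for prefix, implementation in PCI_DMA_MESSAGES:
            if line.lower().startswith(prefix.lower()):
                if implementation not in found:
                    found.append(implementation)
    return found
```

The input would be the text printed by dmesg, e.g. saved to a file first; 
an empty result means none of the four messages was found.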

As far as our problem machines are concerned, the "PCI-DMA: using GART 
IOMMU" case is broken (data corruption). But both "PCI-DMA: Disabling 
IOMMU" (triggered with mem=2g) and "PCI-DMA: Using software bounce 
buffering for IO (SWIOTLB)" (triggered with iommu=soft) are stable.
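
When comparing reports from several machines, it helps to record which of 
these boot options each kernel was actually started with. A small sketch, 
assuming the command line is available as one string (on a running system 
it can be read from /proc/cmdline); the function name boot_options is 
hypothetical:

```python
def boot_options(cmdline, keys=("iommu", "mem")):
    """Extract selected key=value options from a kernel command line."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Keep only tokens of the form key=value for the requested keys.
        if sep and key in keys:
            options[key] = value
    return options
```

This only reports what was passed at boot. It does not predict which DMA 
implementation the kernel ends up selecting; the "PCI-DMA:" line in dmesg 
remains the authority for that.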

BTW: It would be really great if this area of the kernel got more and 
better documentation. The information in 
linux-2.6/Documentation/x86_64/boot_options.txt is very terse. I had to 
read the code to get a *rough* idea of what all the "iommu=" options 
actually do and how they interact.
 
> > 1) And does this now mean that there's an error in the hardware
> > (chipset or CPU/memcontroller)?
> 
> My guess is it's a kernel bug, I don't know for certain.  Perhaps we
> should start making a more comprehensive list of affected kernels &
> CPUs?

BTW: Did someone already open an official bug at 
http://bugzilla.kernel.org ?

Best regards,
Karsten

-- 
__________________________________________creating IT solutions
Dipl.-Inf. Karsten Weiss               science + computing ag
phone:    +49 7071 9457 452            Hagellocher Weg 73
teamline: +49 7071 9457 681            72070 Tuebingen, Germany
email:    [email protected] www.science-computing.de
