Sergey Vlasov wrote:
http://marc.theaimsgroup.com/?t=116291314500001&r=1&w=2
> So you have 4GB RAM, and most likely some memory is remapped above the
> 4GB address boundary.
Hm, I don't know. I'm running an amd64 kernel and I've always thought
there is no such boundary.
But yes, I have 4 GB.
> Could you show the full dmesg output after boot?
Yes, I will, but you'll have to wait until Monday or Tuesday. I'm
currently visiting my parents and have no access to my main PC :(
> Other things you can try:
> - Boot with mem=3072M (or some larger value which is still less than
>   the amount of RAM below the 4GB boundary - the exact value could be
>   found from the dmesg output) and check whether you can reproduce the
>   corruption in this configuration.
I'll do that as soon as I'm at home.
> - Look in the BIOS setup for memory remapping options (Google indicates
>   that it may be called "Hammer Configuration/Memory Hole Mapping" on
>   this board). Maybe you need to try different values (AFAIR there
>   were some complaints about instabilities with software remapping;
>   cannot find the exact page now).
I think I have set these options correctly.
As far as I can remember:
Memhole Mapping was set to Hardware.
The IOMMU is enabled and the IOMMU memory size was set to 64MB (I found
this out because for all values below 64MB, e.g. 32MB, the Linux
kernel complained).
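
A rough sketch of how the mem= value suggested above could be derived
from a saved dmesg output. It assumes the older "BIOS-e820: start - end
(type)" line format with an exclusive end address; the function names
are made up for illustration, and the result is only a conservative
starting point, not a verified setting for this board.

```python
import re

FOUR_GIB = 1 << 32
MIB = 1 << 20

# Assumed line format (older kernels), end address exclusive:
#   BIOS-e820: 0000000000100000 - 00000000c0000000 (usable)
E820_LINE = re.compile(
    r"BIOS-e820:\s*([0-9a-fA-F]+)\s*-\s*([0-9a-fA-F]+)\s*\((\w+)\)"
)


def usable_below_4g(dmesg_text):
    """Return the number of usable bytes that lie below the 4GB boundary."""
    total = 0
    for start_hex, end_hex, kind in E820_LINE.findall(dmesg_text):
        if kind != "usable":
            continue
        start = int(start_hex, 16)
        # Clip ranges that cross the 4GB boundary.
        end = min(int(end_hex, 16), FOUR_GIB)
        if end > start:
            total += end - start
    return total


def suggested_mem_option(dmesg_text):
    """Format the usable amount below 4GB, rounded down to MB, as mem=NNNM."""
    return "mem=%dM" % (usable_below_4g(dmesg_text) // MIB)
```

With the output of "dmesg > dmesg.txt" saved right after boot,
suggested_mem_option(open("dmesg.txt").read()) gives a value to try on
the kernel command line.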
Some other things that I remember now from my exhaustive testing:
- The error also occurred directly after a reboot (so the file cache
was empty) when running a script that went through all my test files and
verified them against their sha512 sums.
- I once did the following: right after diff found a difference, I
pressed Ctrl-C and copied the files to another location.
In this case the files were probably read from the cache, thus the error
was really stored on disk.
I used vbindiff (a hex differ) and saw that, within the differing range,
not all bytes were different: some were OK, then some were different
again, and so on.
So, at least in that case, it was not one whole range that was totally
wrong, but only some of the bytes.
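
To record such a pattern instead of reading it off a hex viewer, the
differing byte ranges of two files could be listed with something like
the following. This is only a sketch: differing_runs is a made-up helper
name, and it compares two byte strings held fully in memory, so large
files would have to be read in chunks.

```python
def differing_runs(a, b):
    """Return (start, end) offset pairs, end exclusive, where a and b differ."""
    runs = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            runs.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        runs.append((start, common))
    if len(a) != len(b):
        # Bytes present in only one of the inputs are reported as a final run.
        runs.append((common, max(len(a), len(b))))
    return runs
```

Many short runs separated by matching bytes would correspond to the
pattern described above, while a single long run would point to a whole
block being wrong.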
- Another thing, though perhaps this was only by chance:
When I did sha512 sums or diffs, the errors were always found in the
files I copied, not in the original files. Of course diff could not
tell me that (because it doesn't know which files are the originals),
but sha512sum could.
This is very strange because:
My first big tests were:
1) The original files, created by Exact Audio Copy under Windows on my
PATA disk in a FAT32 partition
- compared with -
a) copies of those files at another place on the FAT32 partition
b) copies of those files on an ext3 partition on one of the SATA disks.
=> Here one could imagine that the failure happens during the copying
(which is impossible or unlikely, because then the differences should
always be in the same file(s)).
2) I copied the files from FAT32 to ext3 again, and then copied the
whole lot from ext3 to another location on the ext3 partition.
The error happens here, too.
And I think it's very strange that even for test 2 the differences
seemed to be always in the most recent copy.
Perhaps this was only chance.
- I tried the whole thing under Windows (installed the GNU diff tools
there).
Copied the files and started the diff.
By the time I had to abort (because my parents came to get me...) no
differences had been found.
Anyway, I'm not sure this says much:
First of all, the diff is very, very slow in Windows (many times
slower than in Linux), and I do have all the DMA/Busmaster/etc. drivers
installed in Windows.
Because of this I was not able to complete even one whole diff over
all the files, so it is still possible that errors would have occurred.
The Windows Task Manager (if I interpret its data right) told me that
diff has read about 20GB of data, which means it would have diffed
about 10GB of the files (so only one third).
Another thing that I wonder about:
The Task Manager shows me somewhere something like System Cache: 2.1 GB
(roughly).
As the EAC project was the first time I have used Windows since
Windows 95 or so, I'm not sure what that means and whether it is the
same as the Linux file cache.
If so:
Linux seems to use all "free" memory for caching files, but Windows
would use only about half of my memory.
Perhaps that could be the reason if the error does not occur under
Windows (btw: I'm going to run several complete tests in Windows Monday
or Tuesday when I'm back in Munich). Just imagine there were a
hardware error in that unused 2GB.
I don't know enough about the internal Linux memory management to tell
if that may be a reason for the error. I could imagine that Linux reads
data first into the unused (cache) sections of main memory and copies it
from there to the virtual memory (which is actually physical memory too)
of the diff process.
Thus if there were an error in my memory (although memtest86+ has not,
so far, reported one) it could be possible that Windows never uses
those memory regions, and thus the diff under Windows wouldn't get
any corrupted data either.
You understand what I mean?
btw: Can someone tell me if it's possible to instruct Windows to use the
whole memory as file cache? (And if so, how ;-) )
My further tests (as I currently intend to do them) are:
- Several copy/diffs under Windows
- An even longer memtest86+ run
- Using some Knoppix or so, to see if the error is related to my
  distribution, my custom kernel or something like that.
- The kernel options Sergey suggested I try
- Everything else some of you would suggest :)
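
For the repeated copy-and-verify runs, a small script along these lines
could record the SHA-512 sums of the originals once and then check each
copy against them, which (unlike diff) shows which side changed. It is a
minimal sketch using only the Python standard library; the function
names and the idea of keeping the sums in a dictionary are illustrative
assumptions, not the script actually used in the tests above.

```python
import hashlib
import os


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root to its SHA-512 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha512_of(full)
    return manifest


def verify_copy(manifest, copy_root):
    """Return the relative paths whose copy is missing or has another digest."""
    bad = []
    for rel, digest in sorted(manifest.items()):
        path = os.path.join(copy_root, rel)
        if not os.path.isfile(path) or sha512_of(path) != digest:
            bad.append(rel)
    return bad
```

Building the manifest once from the originals and calling verify_copy()
on each new copy keeps the reference sums fixed between runs. Note that
a check run right after copying may read the data back from the file
cache rather than from disk.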
Thanks and best wishes,
Chris.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/