Copying netdev.
Benjamin Herrenschmidt wrote:
> Hi !
>
> I've been chasing with Segher a data corruption problem lately.
> Basically transferring huge amount of data (several Gb) and I get
> corrupted data at the rx side. I cannot tell for sure wether what I've
> been observing here is the same problem that segher's been seing on is
> blades, he will confirm or not. He also seemed to imply that reverting
> to an older kernel on the -receiver- side fixed it, which makes me
> wonder, since it's looks really like a sending side problem (see
> explanation below), if some change in, for exmaple, window scaling,
> might hide or trigger it.
Please send me lspci and tg3 probing output so that I know what
tg3 hardware you're using. I also want to look at the tcpdump or
ethereal on the mirrored port that shows the packet being corrupted.
>
> Now, first, I've been playing with ssh from /dev/zero on one machine
> to /dev/zero on the other. That allowed me to run enough
> tests all over
> the place to have some idea of where the problem comes from since ssh
> will shoke at decryption when hitting the corruption.
>
> The base setup where it happens often is 2 Quad G5's connected to a
> gigabit switch. Both were running some versions of 2.6.18-rc4 and -rc5
> (some random git actually, but see below as I've reproduced
> the problem
> with today's git snapshot which includes the TG3 tx race fix among
> others).
>
> I have reproduced with various machines as the receiver. A sungem in a
> Dual G5 and a virtual ethernet in a Power5 partition (so the
> packets go
> to an e1000 then routed through an AIX IO server to a virtual
> ethernet :) are good examples of "variety" :) I haven't tested with
> non-PowerPC machines so far. I've also never been able to
> reproduce with
> TSO disabled on the emitting TG3's
>
> Then, I've hacked tridge "socklib" test program (a simple TCP server
> that pushes a known buffer and a simple TCP receiver that
> connects to it
> and reads the data). I've added comparison of the data with what they
> are supposed to be on the receiving end. The interesting thing is that
> is much faster than ssh or whatever else I tried. ssh or rsync between
> those 2 Quad G5s give me about 35Mb/sec while I get to
> 107Mb/sec average
> with the small test program.
>
> The fun thing is, I've not been able to reproduce at all that
> way. When
> the link is pretty much saturated, the problem doesn't occur !
>
> As soon as I introduce a small delay (some crap waiting loop) in the
> sender to slow down the throughput to about 80Mb/sec, then the problem
> starts occuring every now and then (I don't have precise frquency data
> but I get a corruption every couple of gigabytes I'd say).
>
> As for my previous tests, disabling TSO on the sending side
> "fixes" it.
>
> Below is a dump of what the corruption look like. I've trimmed the
> beginning and end of the dumped packet (the receiver does 8k
> reads). The
> 0x5a are the expected data, the rest is corruption. They look like
> kernel pointers, but that isn't always the case (often though but that
> might not be relevant). The size and position within the buffer of the
> corrupted data is variable (doesn't seem to be specifically a page or
> anything nice and round like that).
>
> I've configured the switch to send all the traffic between the two
> machines to a 3rd box and then recorded it with tcpdump (the
> "spy" uses
> an e1000) and I can see the corrupted data in the recorded
> traces (the exact same pattern as detected by the receiver).
> So it seems
> very likely at this point that the corruption happens on the sending
> side. The TCP checksums are correct I assume. I don't see any error
> count on the receiving tg3 nor suspicous message in dmesg indicating
> they aren't.
>
> That's all the data I have at this point. I can't guarantee 100% that
> it's a TSO bug (it might be a bug that TSO renders visible
> due to timing
> effects) but it looks like it since I've not reproduced yet with TSO
> disabled. I'll do an overnight test to confirm that though...
> sometimes
> the bug can take it's time to show up ... I've seen it wait
> 20Gb before
> it kicked in. Also the fact that fully loading the machine never
> produced it is strange.... smells like a race.
>
> Cheers,
> Ben.
>
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 00 00 00 00 00 00 00 00
> 2f 63 70 75 73 00 7f 7e c0 00 00 00 01 cb 70 82
> 00 00 00 04 bf 1d db 4d c0 00 00 00 01 cb 92 00
> c0 00 00 01 7b fe 6d 98 c0 00 00 00 01 cb 70 91
> 00 00 00 04 df 5d fe fd c0 00 00 00 01 cb 92 10
> c0 00 00 01 7b fe 6d b8 c0 00 00 00 01 cb 71 0e
> 00 00 00 04 fe e2 fb cf c0 00 00 00 01 cb 92 20
> c0 00 00 01 7b fe 6d d8 c0 00 00 00 01 cb 71 1f
> 00 00 00 04 73 69 ed ff c0 00 00 00 01 cb 92 30
> c0 00 00 01 7b fe 6d f8 c0 00 00 00 01 cb 70 04
> 00 00 00 04 b9 fe cf ff c0 00 00 00 01 cb 92 40
> c0 00 00 01 7b fe 6e 18 c0 00 00 00 00 3f 7b c8
> 00 00 00 05 ff df b9 bc c0 00 00 01 7b fe 6e 38
> 00 00 00 00 00 00 00 00 63 70 75 73 00 8d f1 ce
> c0 00 00 01 7b fe 73 60 c0 00 00 00 01 cb 92 64
> ff 89 d6 80 ff 89 d6 80 00 00 00 01 00 00 00 00
> c0 00 00 01 7b fe 68 00 c0 00 00 01 7b fe 6e c8
> c0 00 00 01 7b fe 6e e0 00 00 00 00 00 00 00 00
> c0 00 00 01 7b fe 6c e8 c0 00 00 01 7b fe 73 70
> c0 00 00 01 7b fe 75 48 c0 00 00 01 7b fe 73 70
> c0 00 00 01 7b fe 73 70 c0 00 00 01 7b fa 9e 80
> 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 2f 63 70 75 73 2f 50 6f
> 77 65 72 50 43 2c 47 35 40 30 00 7e 5e 6f 4d ef
> c0 00 00 00 01 cb 70 6c 00 00 00 04 7f ee b7 fe
> c0 00 00 00 01 cb 92 64 c0 00 00 01 7b fe 6f 00
> c0 00 00 00 01 cb 71 35 00 00 00 04 bb cc e9 67
> c0 00 00 00 01 cb 92 74 c0 00 00 01 7b fe 6f 20
> c0 00 00 00 01 cb 71 39 00 00 00 04 2f fc eb b9
> c0 00 00 00 01 cb 92 84 c0 00 00 01 7b fe 6f 40
> c0 00 00 00 01 cb 71 3e 00 00 00 04 e7 5f be de
> c0 00 00 00 01 cb 92 94 c0 00 00 01 7b fe 6f 60
> c0 00 00 00 01 cb 71 49 00 00 00 04 e6 73 e7 a7
> c0 00 00 00 01 cb 92 a4 c0 00 00 01 7b fe 6f 80
> c0 00 00 00 01 cb 71 55 00 00 00 08 1b fb 77 f9
> c0 00 00 00 01 cb 92 b4 c0 00 00 01 7b fe 6f a0
> c0 00 00 00 01 cb 70 9d 00 00 00 04 b6 db 59 ef
> c0 00 00 00 01 cb 92 c8 c0 00 00 01 7b fe 6f c0
> c0 00 00 00 01 cb 71 5b 00 00 00 04 69 6f fc da
> c0 00 00 00 01 cb 92 d8 c0 00 00 01 7b fe 6f e0
> c0 00 00 00 01 cb 71 69 00 00 00 04 d6 7b 66 de
> c0 00 00 00 01 cb 92 e8 c0 00 00 01 7b fe 70 00
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]