On Sat, 22 Dec 2007 23:30:56 PST, Andrew Morton said:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/

I've bisected it down this far:

kvm-ist-kaput.patch
GOOD
git-lblnet.patch
git-lblnet-fixup.patch
git-leds.patch
git-libata-all.patch
git-libata-all-fix-pata_winbond-borkage.patch
git-libata-all-wtf.patch
BAD

and somehow, I doubt the leds or libata trees horked up networking. ;)

Symptoms - semi-sporadic failures in making network connections. The test
case that tripped it up was the 'make test' from Tcl 8.5 - several of the
test cases will create a listening socket, and then try to connect to it.
Under 2.6.24-rc5-mm1, it works just fine, but I'm seeing hangs under -rc6-mm1.

Doing a 'netstat -n -a -A inet -p' while it's hung shows me this:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:34118         0.0.0.0:*               LISTEN      2236/tcltest
tcp        0      1 127.0.0.1:59460         127.0.0.1:34118         SYN_SENT    2236/tcltest

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:47842         0.0.0.0:*               LISTEN      2352/tcltest
tcp        0      1 127.0.0.1:46510         127.0.0.1:47842         SYN_SENT    2352/tcltest

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:47842         0.0.0.0:*               LISTEN      2352/tcltest
tcp        0      1 127.0.0.1:46510         127.0.0.1:47842         SYN_SENT    2352/tcltest

Pretty consistent failure mode - a socket is in 'listen', and the connection
gets hung in 'SYN_SENT'. There are 3 outputs listed - the first one is from one
run of the test case, the second 2 are some 20 seconds apart on the same run.
It's pretty obvious that if you can't complete a 3-packet handshake to
loopback in 20 seconds, something is hosed. However, it's apparently some
sort of race/timing issue, as many *other* test cases in the Tcl test tree
do in fact work OK.

I already checked, it's not a slam-dunk to just 'patch -R' as there are 3 or 4
conflicts where later patches need massaging/reverting as well.

It's a problem with both 'classic RCU' and 'preempt RCU' (that was my *first*
guess as to the cause).

Any clues/hints/advice/patches?
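For anyone who wants to poke at this without building Tcl, here is a minimal sketch of the pattern described above: open a listening socket on loopback, then connect to it from the same process and see whether the handshake completes within a timeout. This is not the Tcl test itself, and I have not confirmed that it trips the same hang; the function names, attempt count and timeout are all made up for illustration.

```python
import socket


def loopback_connect_ok(timeout=5.0):
    """Return True if a TCP connect to a fresh loopback listener completes.

    Hypothetical helper: mimics "create a listening socket, then connect
    to it" from a single process. No accept() is needed, since the kernel
    completes the handshake for a listening socket with a backlog.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick one
        listener.listen(1)
        port = listener.getsockname()[1]

        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.settimeout(timeout)
        try:
            client.connect(("127.0.0.1", port))
            return True
        except socket.timeout:
            # The connect never got past SYN_SENT within the timeout.
            return False
        finally:
            client.close()
    finally:
        listener.close()


def count_failures(attempts=100, timeout=5.0):
    """Repeat the check, since the failure is described as semi-sporadic."""
    return sum(1 for _ in range(attempts) if not loopback_connect_ok(timeout))


if __name__ == "__main__":
    attempts = 100
    failures = count_failures(attempts)
    print(f"{failures} of {attempts} loopback connects timed out")
```

On a kernel without the problem, every connect should complete almost immediately. Any timeouts would at least point at the same listen/SYN_SENT pattern seen in the netstat output.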