Re: Problem with 2.6 kernel and lots of I/O

But the problem doesn't occur at the "local" end; it's at the "receiving" end (which may be the same machine, but mostly it isn't, since I tend to reboot the secondary node more).

The problem occurs on the node running `nbd-server' in userspace, which doesn't necessarily have "nbd" support.

"nbd1" is a remote nbd device to the secondary server, which then becomes highly unusable. I'm not sure what you're attempting to convey to me, as the server that is running raid1_resync (reading from nbd0, which cooresponds with a local nbd-client binding) is perfectly usable in the example I gave, but the remote node is not...


	Roy Keene
	Planning Systems Inc.

On Mon, 20 Jun 2005, Kyle Moffett wrote:

On Jun 20, 2005, at 18:19:19, Roy Keene wrote:
On Mon, 6 Jun 2005, Kyle Moffett wrote:
IIRC, because of the way the loopback delivers packets from the
same context as they are sent, it is possible (and quite easy)
to either deadlock or peg the CPU and make everything hang and
be unusable.  DRBD likewise used to have problems with testing
over the loopback until they added a special configuration
option to be extra careful and yield CPU.
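
Very roughly, that "yield the CPU" workaround amounts to giving
the scheduler a turn between work units, as in the sketch below;
do_one_unit() is only a stand-in for servicing one replication
request, not anything taken from DRBD itself:

	#include <sched.h>
	#include <unistd.h>

	/* pretend to service one replication request */
	static int do_one_unit(int remaining)
	{
		usleep(1000);
		return remaining - 1;
	}

	int main(void)
	{
		int remaining = 100;

		while (remaining > 0) {
			remaining = do_one_unit(remaining);
			sched_yield();	/* let the peer on the loopback run */
		}
		return 0;
	}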

Actually, the problem I have isn't specific to using it over
the local device.  Quite often I have the problem where the
secondary node goes down and comes back up after some time and
needs to be resynced. This is done on the master (raid1_resync) by
hot-removing /dev/nbd1 and then hot-adding it back.
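
The hot-remove/hot-add itself would typically be something along
these lines with mdadm (the array name /dev/md0 is only an example):

	mdadm /dev/md0 --fail /dev/nbd1 --remove /dev/nbd1
	mdadm /dev/md0 --add /dev/nbd1    # this kicks off the resync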

No, see, when you hot-add /dev/nbd1, the kernel md resync thread
begins processing the resync.  The resync operation on two nbds
involves:
 1) Send data request packet from nbd0
 2) Wait for response
 3) Send data packet to nbd1
 4) Wait for response
 5) Repeat until done
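
In userspace terms that lockstep is roughly the loop below; the
real resync code lives in the kernel md thread, and the device
paths and chunk size here are only placeholders:

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define CHUNK (64 * 1024)

	int main(void)
	{
		int src = open("/dev/nbd0", O_RDONLY);   /* in-sync mirror */
		int dst = open("/dev/nbd1", O_WRONLY);   /* stale mirror */
		char *buf = malloc(CHUNK);
		ssize_t n;

		if (src < 0 || dst < 0 || !buf) {
			perror("setup");
			return 1;
		}

		/* 1-2) issue a read request and block until the data arrives */
		while ((n = read(src, buf, CHUNK)) > 0) {
			/* 3-4) issue the write and block until it is acknowledged */
			if (write(dst, buf, n) != n) {
				perror("write");
				return 1;
			}
		}	/* 5) repeat until the whole device has been walked */

		free(buf);
		close(src);
		close(dst);
		return 0;
	}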

On a normal net device, the "Send data request packet" causes the
system to drop the packet on the wire and go away to do other stuff
for a while, whereas on the loopback, it can schedule immediately
to the process receiving the packet, which is the kernel itself.
The kernel then processes the packet and returns the result, over
the loopback.  It then sends the response to the other server over
a real net connection.  During most of this time, the kernel is
taking big locks and turning interrupts off and on and such, causing
massive hangs until resync finishes.  Since you mentioned bad write
performance with your RAID controller, I suspect its driver may also
turn off interrupts, take excessive locks, or do other madness,
further worsening system responsiveness.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
