Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD


 



On Thu, 2006-08-17 at 22:42 +0400, Evgeniy Polyakov wrote:
> On Thu, Aug 17, 2006 at 11:01:52AM -0700, Daniel Phillips ([email protected]) wrote:
> > *** The system is not OOM, it is in reclaim, a transient condition ***
> 
> It does not matter what we call the condition when not every user can get
> memory. And actually no one can know in advance how long it will last.

Well, we have a direct influence here: we're working on code-paths that
are called from reclaim. If we stall there, the box is dead.

> > Which Peter already told you!  Please, let's get our facts straight,
> > then we can look intelligently at whether your work is appropriate to
> > solve the block IO network starvation problem that Peter and I are
> > working on.
> 
> I gave openssh as an example of a situation where the system does not
> know in advance which sockets must be marked as critical.
> OpenSSH works with network and unix sockets in parallel, so you would
> need to hack the openssh code to allow it to use the reserve when there
> is not enough memory.

OpenSSH or any other user-space program will never have the problem we are
trying to solve, nor could it be fixed the way we are solving it: user-space
programs can be stalled indefinitely. We are concerned with kernel services
and the continued well-being of the kernel, not user-space (though that of
course indirectly benefits user-space as well).

>  Actually all sockets must be able to get data, since the kernel cannot
> differentiate between telnetd and a process which will receive an ack for
> a swapped page or other significant information.

Oh, but it does: the kernel itself controls those sockets. NBD, iSCSI and
AoE are all kernel services, not user-space. And the core of our work is
precisely to provide this information to the kernel; to distinguish these
few critical sockets.
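
To make that concrete, the driver side could look roughly like the sketch
below. This is only an illustration of the idea, not the actual patches;
SOCK_CRITICAL and __GFP_EMERGENCY are made-up placeholder names.

/*
 * Rough sketch only: an in-kernel block driver (NBD-like) marks the
 * transport socket it owns as "critical", so the network stack can tell
 * it apart from ordinary user-space sockets.  SOCK_CRITICAL and
 * __GFP_EMERGENCY are placeholder names, not an existing API.
 */
#include <net/sock.h>

static void nbd_mark_socket_critical(struct socket *sock)
{
        struct sock *sk = sock->sk;

        lock_sock(sk);
        sock_set_flag(sk, SOCK_CRITICAL);      /* placeholder flag bit */
        sk->sk_allocation |= __GFP_EMERGENCY;  /* placeholder GFP flag */
        release_sock(sk);
}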

> So the network must behave separately from the main allocator during that
> period of time. But since the reserve may not be filled, or may not have
> enough space, it must be preallocated far in advance and should be quite
> big. But then why should the network use it at all, when being separated
> from the main allocator already solves the problem?

You still need to guarantee data-paths to these services, and you need
to make absolutely sure that your last bit of memory is used to service
these critical sockets, not some random blocked user-space process.
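
Very roughly, and with made-up names, the delivery side of that policy
could be a check like the sketch below; when it fails, the skb is simply
freed instead of being queued to the socket.

/*
 * Sketch of the delivery policy, not the actual patch: once an skb was
 * allocated from the emergency reserve, it is only worth keeping if it
 * is destined for one of the critical kernel sockets; for any other
 * socket it is dropped so the reserve is not burned on a blocked
 * user-space process.  skb_emergency() and SOCK_CRITICAL are
 * placeholder names.
 */
#include <net/sock.h>

static inline int sk_would_accept_skb(struct sock *sk, struct sk_buff *skb)
{
        if (!skb_emergency(skb))              /* normal memory: deliver */
                return 1;

        return sock_flag(sk, SOCK_CRITICAL);  /* reserve: critical only */
}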

You cannot pre-allocate enough memory _ever_ to satisfy the total
capacity of the network stack. You _can_ allot a certain amount of
memory to the network stack (avoiding DoS), and drop packets once you
exceed that. But still, you need to make sure these few critical
_kernel_ services get their data.
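
The accounting for that bounded allotment can stay very simple; something
along the lines of the sketch below, where all names and the budget value
are made up for illustration.

/*
 * Sketch of the bounded allotment: reserve usage is charged against a
 * fixed budget, and anything over budget fails so the caller drops the
 * packet instead of digging deeper into memory.
 */
#include <asm/atomic.h>

static atomic_t emergency_pages_used = ATOMIC_INIT(0);
static int emergency_pages_max = 256;          /* arbitrary budget */

static int emergency_charge(int pages)
{
        if (atomic_add_return(pages, &emergency_pages_used) >
            emergency_pages_max) {
                atomic_sub(pages, &emergency_pages_used);
                return 0;                      /* over budget: drop */
        }
        return 1;
}

static void emergency_uncharge(int pages)
{
        atomic_sub(pages, &emergency_pages_used);
}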

> I do not argue that your approach is bad or does not solve the problem,
> I'm just trying to show that further evolution of that idea eventually
> ends up in a separate allocator (just as most robust systems separate
> their operations), which can improve things in a lot of other ways too.

Not a separate allocator per se, but a separate socket group: these sockets
are serviced by the kernel, they will never refuse to process data, and it
is critical for the continued well-being of your kernel that they get their
data.

Also, I do not think people would like it if, say, 100M of their 1G system
just disappeared, never to be used again for e.g. page-cache during periods
of low network traffic.



