Re: [RFC 0/8] Cpuset aware writeback

On Tue, 16 Jan 2007, Andrew Morton wrote:

> > On Mon, 15 Jan 2007 21:47:43 -0800 (PST) Christoph Lameter <[email protected]> wrote:
> >
> > Currently cpusets are not able to do proper writeback since
> > dirty ratio calculations and writeback are all done for the system
> > as a whole.
> 
> We _do_ do proper writeback.  But it's less efficient than it might be, and
> there's an NFS problem.

Well, yes, we do write back during LRU scans when a potentially high 
percentage of the memory in a cpuset is dirty.

> > This may result in a large percentage of a cpuset
> > to become dirty without writeout being triggered. Under NFS
> > this can lead to OOM conditions.
> 
> OK, a big question: is this patchset a performance improvement or a
> correctness fix?  Given the above, and the lack of benchmark results I'm
> assuming it's for correctness.

It is a correctness fix, both for the NFS OOM and for getting proper 
cpuset writeout.

> - Why does NFS go oom?  Because it allocates potentially-unbounded
>   numbers of requests in the writeback path?
> 
>   It was able to go oom on non-numa machines before dirty-page-tracking
>   went in.  So a general problem has now become specific to some NUMA
>   setups.


Right. The issue is that large portions of memory become dirty or go 
under writeback, because the dirty limits are never checked per cpuset 
and so no writeout is triggered. NFS then attempts writeout during LRU 
scans but is unable to allocate the memory it needs for its requests.
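
To make this concrete, here is a rough sketch of the check that is 
missing today. The helper cpuset_page_state() is made up for 
illustration; this is not the actual patchset:

	/*
	 * Hypothetical sketch, not the real patch: apply the same
	 * vm_dirty_ratio rule the global path uses, but against the
	 * memory of a single cpuset. cpuset_page_state() is invented.
	 */
	static int cpuset_dirty_exceeded(struct cpuset *cs)
	{
		unsigned long dirty, writeback, total;

		cpuset_page_state(cs, &dirty, &writeback, &total);
		return (dirty + writeback) * 100 > total * vm_dirty_ratio;
	}

Without a check along these lines nothing triggers writeout until LRU 
reclaim stumbles over the dirty pages, and by then NFS may no longer 
be able to allocate the requests needed to clean them.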
 
>   So an obvious, equivalent and vastly simpler "fix" would be to teach
>   the NFS client to go off-cpuset when trying to allocate these requests.

Yes, we could fix these allocations by allowing processes to allocate 
from nodes outside their cpuset. But then the containment function of 
cpusets is no longer there.
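
For illustration, the suggestion amounts to something like the 
following. __GFP_OFFCPUSET is a made-up flag standing in for whatever 
mechanism would let the allocation escape the cpuset:

	/*
	 * Hypothetical, not a real flag: let NFS request allocations
	 * fall back to nodes outside the dirtier's cpuset so that
	 * writeback can always make progress under memory pressure.
	 */
	req = kmem_cache_alloc(nfs_wdata_cachep,
			       GFP_NOFS | __GFP_OFFCPUSET);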

> (But is it really bad? What actual problems will it cause once NFS is fixed?)

NFS is okay as far as I can tell. Dirty throttling works fine in 
non-cpuset environments because we throttle once 40% of memory is 
dirty or under writeback.
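
That global throttle boils down to roughly the following (a simplified 
sketch of the balance_dirty_pages() logic, not verbatim kernel code):

	/* simplified sketch of the existing global throttle */
	unsigned long dirty     = global_page_state(NR_FILE_DIRTY);
	unsigned long writeback = global_page_state(NR_WRITEBACK);
	unsigned long thresh    = total_pages * vm_dirty_ratio / 100;

	if (dirty + writeback > thresh) {
		/* start writeout and make the dirtying task wait */
		...
	}

On a small cpuset the numbers on the left stay tiny relative to 
total_pages, so the threshold is never reached even when the cpuset 
itself is nearly 100% dirty.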

> I don't understand why the proposed patches are cpuset-aware at all.  This
> is a per-zone problem, and a per-zone fix would seem to be appropriate, and
> more general.  For example, i386 machines can presumably get into trouble
> if all of ZONE_DMA or ZONE_NORMAL get dirty.  A good implementation would
> address that problem as well.  So I think it should all be per-zone?

No. A zone can be completely dirty as long as we are allowed to allocate 
from other zones.

> Do we really need those per-inode cpumasks?  When page reclaim encounters a
> dirty page on the zone LRU, we automatically know that page->mapping->host
> has at least one dirty page in this zone, yes?  We could immediately ask

Yes, but by the time we enter reclaim most of the pages of a zone may 
already be dirty or under writeback, so we fail. Also, when we enter 
reclaim we may not have the proper process / cpuset context. There is 
no point in throttling kswapd; we need to throttle the process that is 
dirtying memory.
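
The throttle point sits in the write path itself, not in reclaim. 
Roughly:

	/* called from the buffered-write loop on behalf of the task
	 * doing the writes; kswapd never goes through this path */
	balance_dirty_pages_ratelimited(mapping);

so it is the dirtier that blocks when the limits are exceeded, which 
is exactly the process we want to slow down.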

> But all of this is, I think, unneeded if NFS is fixed.  It's hopefully a
> performance optimisation to permit writeout in a less seeky fashion. 
> Unless there's some other problem with excessively dirty zones.

The patchset improves performance because the filesystem can do 
sequential writeout instead of seeky page-order writeout from the LRU. 
So yes, in some ways this is a performance improvement, but only 
because the patchset makes dirty throttling for cpusets work the same 
way it does on a non-NUMA system.