On Wed, Oct 03, 2007 at 12:41:19PM +1000, David Chinner wrote:
> On Wed, Oct 03, 2007 at 09:34:39AM +0800, Fengguang Wu wrote:
> > On Wed, Oct 03, 2007 at 07:47:45AM +1000, David Chinner wrote:
> > > On Tue, Oct 02, 2007 at 04:41:48PM +0800, Fengguang Wu wrote:
> > > > wbc.pages_skipped = 0;
> > > > @@ -560,8 +561,9 @@ static void background_writeout(unsigned
> > > > min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
> > > > if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
> > > > /* Wrote less than expected */
> > > > - congestion_wait(WRITE, HZ/10);
> > > > - if (!wbc.encountered_congestion)
> > > > + if (wbc.encountered_congestion || wbc.more_io)
> > > > + congestion_wait(WRITE, HZ/10);
> > > > + else
> > > > break;
> > > > }
> > >
> > > Why do you call congestion_wait() if there is more I/O to issue? If
> > > we have a fast filesystem, this might cause the device queues to
> > > fill, then drain on congestion_wait(), then fill again, etc. i.e. we
> > > will have trouble keeping the queues full, right?
> >
> > You mean slow writers and fast RAID? That would be exactly the case
> > these patches try to improve.
>
> I mean any writers and a fast block device (raid or otherwise).
>
> > This patchset makes kupdate/background writeback more responsible,
> > so that if (avg-write-speed < device-capabilities), the dirty data are
> > synced timely, and we don't have to go for balance_dirty_pages().
>
> Sure, but I'm asking about the effect of the patches on the
> (avg-write-speed == device-capabilities) case. I agree that
> they are necessary for timely syncing of data but I'm trying
> to understand what effect they have on the normal write case
> (i.e. keeping the disk at full write throughput).
OK, I guess this is the focus of all your questions: why should we
sleep in congestion_wait() and possibly hurt the write throughput?
I'll try to summarize it:
- congestion_wait() is necessary
  Besides device congestion, there may be other blockades we have to
  wait on, e.g. temporary page locks, NFS/journal issues (I guess).
- congestion_wait() is called only when necessary
  congestion_wait() will only be called when we have seen blockades:
          if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
                  if (wbc.encountered_congestion || wbc.more_io)
                          congestion_wait(WRITE, HZ/10);
                  else
                          break;
          }
  So in the normal case, we may well write 128MB of data without any waiting.
- congestion_wait() won't hurt write throughput
  When the device is not congested, congestion_wait() is woken up on
  each write completion. Note that MAX_WRITEBACK_PAGES=1024 (i.e. 4MB
  with 4KB pages) and /sys/block/sda/queue/max_sectors_kb=512 (for me),
  which means we get the chance to submit another 4MB for every 512KB
  completed, i.e. we can submit write IOs 8 times faster than the
  device can retire them. congestion_wait() is a magical timer :-)
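
To make the retry logic concrete, here is a minimal sketch of the
patched background_writeout() loop, paraphrased from the diff quoted
above (the writeback_control fields are the real ones; the surrounding
loop is simplified for illustration, not the verbatim kernel source):

        for (;;) {
                struct writeback_control wbc = {
                        .sync_mode     = WB_SYNC_NONE,
                        .nr_to_write   = MAX_WRITEBACK_PAGES, /* 1024 pages = 4MB */
                        .pages_skipped = 0,
                };

                writeback_inodes(&wbc);  /* submit up to 4MB of dirty pages */

                if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
                        /* Wrote less than expected: we hit some blockade. */
                        if (wbc.encountered_congestion || wbc.more_io)
                                /* Back off; HZ/10 is only an upper bound,
                                 * a write completion wakes us earlier. */
                                congestion_wait(WRITE, HZ/10);
                        else
                                break;  /* no more work, no congestion: done */
                }
        }

So the HZ/10 is a worst-case bound, not a guaranteed sleep; on a fast,
uncongested device the completion wakeups drive the loop at the
device's own pace.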
> > So for your question of queue depth, the answer is: the queue length
> > will not build up in the first place.
>
> Which queue are you talking about here? The block device queue?
Yes, the elevator's queues.
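
For reference, the depth limit of that queue is visible from userspace.
A minimal sketch (assuming a device named sda; the sysfs file is
standard, the little helper itself is just for illustration):

        #include <stdio.h>

        int main(void)
        {
                /* nr_requests caps how many requests the elevator queue
                 * may hold before submitters are made to wait. */
                FILE *f = fopen("/sys/block/sda/queue/nr_requests", "r");
                unsigned int nr;

                if (f && fscanf(f, "%u", &nr) == 1)
                        printf("request queue depth limit: %u\n", nr);
                if (f)
                        fclose(f);
                return 0;
        }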