Re: Reducing the bdi proporion calculation period to speed up disk write

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2007-12-11 at 11:11 +0100, Peter Zijlstra wrote:
> On Tue, 2007-12-11 at 14:25 +0800, zhejiang wrote:
> > The patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f implemented bdi per
> > device dirty threshold. It works well.
> > However, the period for proportion calculation may be too large.
> > For 8G memory, the calc_period_shift() will return 19 as the shift.
> > 
> > When we switch writing operation between different disks, there may be
> > potential performance issue.
> > 
> > For example, we first write to disk A, then write to disk B.
> > The proportion for disk B will increase slowly because the denominator
> > is too large (It's 2^18 + (global_count & counter_mask)).
> > The disk B will get small dirty page quota for a long time,
> > it will get blocked frequently though the total dirty page is under the
> > dirty page limit.
> > 
> > Peter provided a patch to avoid this issue, this patch allow violation
> > of bdi limits if there is a lot of room on the system.
> > It looks like:
> > 
> > +if (nr_reclaimable + nr_writeback < (background_thresh +
> > dirty_thresh) / 2)
> > +                     break; 
> > 
> > This patch really help to avoid congestion, but if the dirty pages
> > exceed about 3/4 of the dirty_thresh, congestion still happens if we
> > write to another disk. 
> > 
> > I think that we can reduce the period to speed up the proportion
> > adjustment. 
> > 
> > diff -Nur a/page-writeback.c b/page-writeback.c
> > --- a/page-writeback.c  2007-12-11 13:46:30.000000000 +0800
> > +++ b/page-writeback.c  2007-12-11 13:47:11.000000000 +0800
> > @@ -128,10 +128,7 @@
> >   */
> >  static int calc_period_shift(void)
> >  {
> > -       unsigned long dirty_total;
> > -
> > -       dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
> > 100;
> > -       return 2 + ilog2(dirty_total - 1);
> > +       return 12;
> >  }
> 
> Its a heuristic, it might need some tuning, but a static value is wrong.
> I think its generally true that the larger the machine memory size, the
> faster the storage subsystem. And the more likely it has more disks.
> 
> One of the reasons this value isn't static is that with your fixed 12 it
> becomes very hard to balance over more than 4096 active devices. Of
> course, it takes quite a special set-up to get into that situation.
I strongly agree with you that a static value is not a good idea.

> 
> As it is, it now takes about 2 * dirty limit to switch over, you could
> start by making that just a single, or maybe even half a, dirty limit.
We will do more testing to choose a better formular based on dirty_ratio
and total memory.

> 
> 
> Also, I'm not quite convinced your benchmark is all that useful. Do you
> really think it matches an actual frequently occurring usage pattern?
We used iozone to test 1.2GB sequential write/rewrite. It's hard to match
exactly an actual usage pattern, but I have an example. Administrator
might backup big files to other free disks periodically although he/she might
not need it fast.

-yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux