Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

On Tue, 2007-09-04 at 23:58 -0700, Christoph Lameter wrote:
> On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> 
> > On Tue, 2007-09-04 at 20:59 -0700, Christoph Lameter wrote:
> > > On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> > > 
> > > > 8) kmalloc-4096 order is 1 which means one slab consists of 2 objects. So a
> > > 
> > > You can change that by booting with slub_max_order=0. Then we can also use 
> > > the per cpu queues to get these order 0 objects which may speed up the 
> > > allocations because we do not have to take zone locks on slab allocation.
> > > 
> > > Note also that Andrew's tree has a page allocator pass through for SLUB 
> > > for 4k kmallocs bypassing slab completely. That may also address the 
> > > issue.
> > > 
> > > If you want SLUB to handle more objects in the 4k kmalloc cache 
> > > without going to the page allocator then you can boot f.e. with
> > > 
> > > slub_max_order=3 slub_min_objects=8
> > I tried this approach. The testing result showed 2.6.23-rc4 is about
> > 2.5% better than 2.6.22. It really resovles the issue.
> > 
> > However, the approach treats the slabs in the same policy. Could we
> > implement a per-slab specific approach like direct b)?
> 
> I am not sure what you mean by same policy. Same configuration for all 
> slabs?
Yes.

> 
> > > Try the ways to address the issue that I mentioned above.
> > I really appreciate your kind comments!
> 
> Would it be possible to try the two other approaches that I suggested? I 
> think both of those may also solve the issue. Try booting with
> slab_max_order=0
1) I tried slab_max_order=0 and the regression becomes 12.5%. It's still not good.

2) I apllied patch slub-direct-pass-through-of-page-size-or-higher-kmalloc.patch
to kernel 2.6.23-rc4. The new testing result is much better, only 1% less than
2.6.22.

So the best solution is booting kernel with "slub_max_order=3 slub_min_objects=8".

>  and see what effect it has. The queues of the page 
> allocator can be much larger than what slab has for 4k pages. There is 
> really not much of a point in using a slab allocator for page sized 
> allocations.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?
  - From: Christoph Lameter <clameter@sgi.com>

References:
- tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?
  - From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
- Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?
  - From: Christoph Lameter <clameter@sgi.com>
- Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?
  - From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
- Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?
  - From: Christoph Lameter <clameter@sgi.com>

Prev by Date: Re: huge improvement with per-device dirty throttling
Next by Date: some XFS fixes for 2.6.23
Previous by thread: Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?
Next by thread: Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]