Re: More information on scsi_cmd_cache leak... (bisect)

Jens Axboe wrote:
On Fri, Jan 27 2006, Jens Axboe wrote:

On Fri, Jan 27 2006, Neil Brown wrote:

On Friday January 27, [email protected] wrote:

Greetings,
Just a quick recap - there are at least 4 reports of 2.6.15 users experiencing severe slab leaks with scsi_cmd_cache. It seems that a few of us have a board (Asus P5GDC-V Deluxe) in common. We seem to have raid in common. After dealing with this leak for a while, I decided to do some dancing around with git bisect. I've landed on a possible point of regression:

commit: a9701a30470856408d08657eb1bd7ae29a146190
[PATCH] md: support BIO_RW_BARRIER for md/raid1

I spent about an hour and a half reading through the patch, trying to see if I could make sense of what might be wrong. The result (after I dug into the code to make a change I foolishly thought made sense) was a hung kernel. This is important because when I rebooted into the kernel that had been giving me trouble, it started an md resync and I'm now watching (at least during this resync) the slab usage for scsi_cmd_cache stay sane:

turbotaz ~ # cat /proc/slabinfo | grep scsi_cmd_cache
scsi_cmd_cache 30 30 384 10 1 : tunables 54 27 8 : slabdata 3 3 0


This suggests that the problem happens when a BIO_RW_BARRIER write is
sent to the device.  With this patch, md flags all superblock writes
as BIO_RW_BARRIER.  However, md is unlikely to update the superblock
often during a resync.
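
For context, a minimal sketch of what such a barrier superblock write looks
like at the bio level on a 2.6.15-era kernel. The function and callback names
below are made up for illustration; the real code lives in drivers/md/md.c:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>

/* Illustrative end_io callback, using the 2.6.15-era signature. */
static int sb_write_end_io(struct bio *bio, unsigned int bytes_done, int error)
{
        if (bio->bi_size)
                return 1;               /* not complete yet */
        complete((struct completion *)bio->bi_private);
        bio_put(bio);
        return 0;
}

/* Hypothetical helper: write one superblock page as a barrier request. */
static void sb_barrier_write_sketch(struct block_device *bdev, sector_t sector,
                                    struct page *page, int size)
{
        struct bio *bio = bio_alloc(GFP_NOIO, 1);
        DECLARE_COMPLETION(done);

        bio->bi_bdev = bdev;
        bio->bi_sector = sector;
        bio_add_page(bio, page, size, 0);
        bio->bi_end_io = sb_write_end_io;
        bio->bi_private = &done;

        /* The BIO_RW_BARRIER bit is what the bisected commit started setting
         * on superblock writes. */
        submit_bio((1 << BIO_RW) | (1 << BIO_RW_BARRIER), bio);
        wait_for_completion(&done);
}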

There is a (rough) count of the number of superblock writes in the
"Events" counter which "mdadm -D" will display.
You could try collecting the 'Events' counter together with the
'active_objs' count from /proc/slabinfo and graphing the pairs - see if
they are linear.
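
One way to collect those pairs: a hypothetical userspace sampler (not from this
thread) that prints the active_objs figure for scsi_cmd_cache; the Events count
would be read alongside it with "mdadm -D" and the two columns graphed:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char line[512];
        unsigned long active;

        for (;;) {
                FILE *f = fopen("/proc/slabinfo", "r");
                if (!f)
                        return 1;
                while (fgets(line, sizeof(line), f)) {
                        /* line format: "scsi_cmd_cache <active_objs> <num_objs> ..." */
                        if (sscanf(line, "scsi_cmd_cache %lu", &active) == 1)
                                printf("active_objs=%lu\n", active);
                }
                fclose(f);
                fflush(stdout);
                sleep(10);
        }
        return 0;
}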

I believe a BIO_RW_BARRIER is likely to send some sort of 'flush'
command to the device, and the driver for your particular device may
well be losing a scsi_cmd_cache allocation when doing that, but I leave
that to someone who knows more about that code.

I already checked up on that since I suspected barriers initially. The
path there for SCSI is sd.c:sd_issue_flush(), which looks pretty
straightforward. In the end it goes through the block layer and gets back
to the SCSI layer as a regular REQ_BLOCK_PC request.
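
Roughly, that path amounts to issuing a SYNCHRONIZE CACHE command via
scsi_execute_req(), which builds a REQ_BLOCK_PC request, pushes it through the
block layer and hands it back to the SCSI midlayer. A sketch, not the actual
sd.c code; the helper name and timeout/retry values are illustrative:

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <scsi/scsi.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_eh.h>

static int sync_cache_sketch(struct scsi_device *sdp)
{
        unsigned char cmd[10] = { SYNCHRONIZE_CACHE, 0 };
        struct scsi_sense_hdr sshdr;
        int res;

        /* DMA_NONE: no data transfer, just the cache flush command itself. */
        res = scsi_execute_req(sdp, cmd, DMA_NONE, NULL, 0, &sshdr,
                               30 * HZ, 3);
        return res ? -EIO : 0;
}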


Sorry, that was for the ->issue_flush() that md also does, though it did
that before the barrier addition as well. Most of the barrier handling is
done in the block layer, but it could show leaks in SCSI of course. FWIW,
I tested barriers with and without md on SCSI here a few days ago and
didn't see any leaks at all.


It does not have anything to do with this in scsi_io_completion, does it?

        if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
                return;

In that case the scsi_cmnd does not get freed. Does it come back around later and get released from a different path?
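
To spell the suspicion out, here is a paraphrase of that area (not the real
scsi_io_completion body, which also handles sense data and retries): the early
return skips the normal release, so unless the barrier completion path frees
the command later, it stays allocated in scsi_cmd_cache.

#include <linux/blkdev.h>
#include <scsi/scsi_cmnd.h>

static void io_completion_sketch(struct scsi_cmnd *cmd, unsigned int good_bytes)
{
        struct request *req = cmd->request;
        request_queue_t *q = req->q;

        if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
                return;         /* cmd is not put back into scsi_cmd_cache here */

        /* ... normal path continues: scsi_end_request()/scsi_next_command(),
         * which eventually releases cmd back to the slab cache. */
}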
