Apparently the latest kernel still uses big kernel lock in the opening/
closing of block device. And on a moderate sized numa machine, we've see
huge lock contention with lock_kernel()/unlock_kernel() function coming
from fs/block_dev.c:do_open() and block_dev.c/blkdev_put().
This was found accidentally by a slightly non-optimal application environ-
ment: a multi-process application runs on 128P numa machine; it forks out
400 processes and each process tries to open ~3000 block devices. Because
of per process limit of file descriptor, this "smart ass" application went
into a mode where it dynamically open and close file descriptors on demand.
I know we can get around it by increasing "open file" limit. But it darn
on me that current code won't allow concurrent opening of different block
devices either.
I've studied do_open() and blkdev_put(), and concluded that it is already
SMP safe because they are protected by per-device bdev->bd_mutex. What's
left that I can tell is the call to each block device type via disk->fops
->open(). I would like to propose rid the BKL in the generic block device
open/close path and put the burden on each device specific code to use BKL
if necessary. Most of the modern block device drivers shouldn't need them
because in fops->open() they already uses one of the three variants:
a) spin lock, b) mutex, or c) per device structure.
Out of the 63 hits I see in 2.6.17-rc4 with block_device_operations->open(),
floppy driver seems to be the only one that mucks around with a global
Variable without any protection. We can add a spin lock there.
Comments?
Signed-off-by: Ken Chen <[email protected]>
diff -Nurp linux-2.6.16/fs/block_dev.c linux-2.6.16.ken/fs/block_dev.c
--- linux-2.6.16/fs/block_dev.c 2006-05-31 17:45:37.000000000 -0700
+++ linux-2.6.16.ken/fs/block_dev.c 2006-05-31 17:44:48.000000000 -0700
@@ -860,10 +860,8 @@ static int do_open(struct block_device *
int part;
file->f_mapping = bdev->bd_inode->i_mapping;
- lock_kernel();
disk = get_gendisk(bdev->bd_dev, &part);
if (!disk) {
- unlock_kernel();
bdput(bdev);
return ret;
}
@@ -935,7 +933,6 @@ static int do_open(struct block_device *
}
bdev->bd_openers++;
mutex_unlock(&bdev->bd_mutex);
- unlock_kernel();
return 0;
out_first:
@@ -948,7 +945,6 @@ out_first:
module_put(owner);
out:
mutex_unlock(&bdev->bd_mutex);
- unlock_kernel();
if (ret)
bdput(bdev);
return ret;
@@ -1010,7 +1006,6 @@ int blkdev_put(struct block_device *bdev
struct gendisk *disk = bdev->bd_disk;
mutex_lock(&bdev->bd_mutex);
- lock_kernel();
if (!--bdev->bd_openers) {
sync_blockdev(bdev);
kill_bdev(bdev);
@@ -1040,7 +1035,6 @@ int blkdev_put(struct block_device *bdev
}
bdev->bd_contains = NULL;
}
- unlock_kernel();
mutex_unlock(&bdev->bd_mutex);
bdput(bdev);
return ret;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]