Re: [PATCH] Fix a race condition between ->i_mapping and iput()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Andrew Morton <[email protected]> writes:

>> This patch takes ref-count of bdev's inode for fs's inode before
>> setting a ->i_mapping, and the clear_inode() of fs's inode does iput().
>> So, if fs's inode is still living, bdev's inode shouldn't be freed.
>> 
>
> OK, so to rephrase:
>
> Whenever /dev/sda's inode->i_mapping points at a kernel-internal blockdev
> inode's i_data, we will hold a ref on that blockdev inode, to pin its
> i_data.  That sounds sane.

Yes, thanks.

This approach may have one issue. If user is using a on-memory fs as
/dev (i.e. udev), the inode is never freed, so blockdev's inode is
also never freed.

And if user is creating the many block special file, blockdev's inode
is also pinned additionally.

Probably we can call a igrab() for ->i_mapping instead of this approach,
however it's "run-time overhead vs memory space" trade-off.

>> diff -puN fs/block_dev.c~i_mapping-race-fix-2 fs/block_dev.c
>> --- linux-2.6/fs/block_dev.c~i_mapping-race-fix-2	2006-03-11 16:19:07.000000000 +0900
>> +++ linux-2.6-hirofumi/fs/block_dev.c	2006-03-11 17:52:11.000000000 +0900
>> @@ -418,21 +418,31 @@ EXPORT_SYMBOL(bdput);
>>  static struct block_device *bd_acquire(struct inode *inode)
>>  {
>>  	struct block_device *bdev;
>> +
>>  	spin_lock(&bdev_lock);
>>  	bdev = inode->i_bdev;
>> -	if (bdev && igrab(bdev->bd_inode)) {
>> +	if (bdev) {
>> +		atomic_inc(&bdev->bd_inode->i_count);
>
> No longer checking (I_FREEING|I_WILL_FREE).  Why was this change included?

With this patch, inode->i_bdev is freed in clear_inode() after the
inode became the I_FREEING.

The caller is holding a inode's ref, and if the inode->i_bdev != NULL,
->i_bdev->bd_inode never have the I_FREEING/I_WILL_FREE.  So we don't
need to check I_FREEING/I_WILL_FREE anymore.

>> -		if (inode->i_bdev)
>> -			__bd_forget(inode);
>> -		inode->i_bdev = bdev;
>> -		inode->i_mapping = bdev->bd_inode->i_mapping;
>> -		list_add(&inode->i_devices, &bdev->bd_inodes);
>> +		if (!inode->i_bdev) {
>> +			/*
>> +			 * We take an additional bd_inode->i_count for inode,
>> +			 * and it's released in clear_inode() of inode.
>> +			 * So, we can access it via ->i_mapping always
>> +			 * without igrab().
>> +			 */
>> +			atomic_inc(&bdev->bd_inode->i_count);
>> +			inode->i_bdev = bdev;
>> +			inode->i_mapping = bdev->bd_inode->i_mapping;
>> +			list_add(&inode->i_devices, &bdev->bd_inodes);
>> +		}
>
> And why this change?  The removal of the __bd_forget() and the changed
> behaviour if (inode->i_bdev != NULL)?

With the above change, we don't need to care the case of igrab()
returns NULL anymore. Since inode->i_bdev->bd_inode never has the
I_FREEING, we don't need to call __bd_forget() for that case.

I think we need to address an only following simple race here.

              cpu0                                    cpu1
-----------------------------------+-----------------------------------------
bdev = bdget(inode->i_rdev);       
if (bdev) {                        
				   	bdev = bdget(inode->i_rdev);
				   	if (bdev) {
						spin_lock(&bdev_lock);
						if (!inode->i_bdev) {
							[...]
							inode->i_bdev = bdev;
	spin_lock(&bdev_lock);

Thanks.
-- 
OGAWA Hirofumi <[email protected]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux