Re: [EXPERIMENT,RFC] FAT: Add "flush" option for hotplug devices

OGAWA Hirofumi <[email protected]> wrote:
>
> Hi,
> 
> This adds new "flush" option on experiment for hotplug devices.
> 
> Current implementation of "flush" option does,
> 
> 	- synchronizing data pages at ->release() (last close(2))
> 	- if user's work seems to be done (fs is not active), all
> 	  metadata syncs by pdflush()

Seems like a sensible thing to do.

> This option would provide kind of sane progress, and dirty buffers is
> flushed more frequently (if fs is not active).

Your implementation doesn't really do this.  bdi_write_congested() only
returns true if the device is super-busy.  To determine whether it's "not
active" we'd need to peek at the queue's disk_stat accounting, or at the
queue's outstanding read/write requests.  We covered this a couple of weeks
ago in the context of Con's swap prefetch work.

>  This option doesn't
> provide any robustness (robustness is provided by other options), but
> probably the option is proper for hotplug devices.

Well...  It does a full fsync_super() - that's pretty robust.

> +EXPORT_SYMBOL(filemap_write_and_wait);

_GPL please.

> +EXPORT_SYMBOL(fsync_super);

Ditto

> +/*
> + * Copyright (C) 2005, OGAWA Hirofumi
> + * Released under GPL v2.
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/blkdev.h>
> +#include <linux/writeback.h>
> +#include <linux/msdos_fs.h>
> +
> +int fat_sync_fdata(struct inode *inode, struct file *filp)
> +{
> +	int err = 0;
> +
> +	if (filp->f_mode & FMODE_WRITE) {
> +#if 1
> +		current->flags |= PF_SYNCWRITE;
> +		err = filemap_write_and_wait(inode->i_mapping);
> +		current->flags &= ~PF_SYNCWRITE;
> +#else
> +		down(&inode->i_sem);
> +#if 1
> +		err = generic_osync_inode(inode, inode->i_mapping, OSYNC_DATA);
> +#else
> +		err = filp->f_op->fsync(filp, filp->f_dentry, 1);
> +#endif
> +		up(&inode->i_sem);
> +#endif
> +	}
> +	return err;
> +}

Can't we just split up do_fsync() a bit and use that?

> +static void fat_pdflush_handler(unsigned long arg)
> +{
> +	struct super_block *sb = (struct super_block *)arg;
> +	fsync_super(sb);
> +}

It would be nice if /proc/sys/vm/dirty_writeback_centisecs was a per-fs
thing.   That's non-trivial.

> +static void fat_flush_timer(unsigned long data)
> +{
> +	struct super_block *sb = (struct super_block *)data;
> +	struct msdos_sb_info *sbi = MSDOS_SB(sb);
> +	struct backing_dev_info *bdi = blk_get_backing_dev_info(sb->s_bdev);
> +	unsigned long last_flush_jiff;
> +
> +	if (bdi_write_congested(bdi)) {

As indicated above, this won't be very effective.

> +		mod_timer(&sbi->flush_timer, jiffies + (HZ / 10));
> +		return;
> +	}
> +
> +	last_flush_jiff = sbi->last_flush_jiff;
> +
> +	if (!time_after_eq(jiffies, last_flush_jiff + (HZ / 2))) {
> +		mod_timer(&sbi->flush_timer, last_flush_jiff + (HZ / 2));
> +		return;
> +	}

What's the above doing?

> +	if (pdflush_operation(fat_pdflush_handler, (unsigned long)sb) < 0)
> +		mod_timer(&sbi->flush_timer, jiffies + HZ);
> +}
> +
> +void __fat_mark_flush(struct super_block *sb)
> +{
> +	struct msdos_sb_info *sbi = MSDOS_SB(sb);
> +
> +	sbi->last_flush_jiff = jiffies;
> +	/*
> +	 * make sure by smb_wmb() that dirty buffers before here is
> +	 * processed at the timer routine.
> +	 */
> +	smp_wmb();
> +
> +	if (!timer_pending(&sbi->flush_timer))
> +		mod_timer(&sbi->flush_timer, jiffies + HZ);
> +}
> +EXPORT_SYMBOL(__fat_mark_flush);

_GPL?

> +void fat_flush_stop(struct super_block *sb)
> +{
> +	del_timer_sync(&MSDOS_SB(sb)->flush_timer);
> +}

whoops, the pdflush_operation could still be in progress.

To avoid umount races I think the pdflush callback is going to need to take
sb_lock, increment s_count, take ->s_umount, test ->s_root.  Like, for
example, __sync_inodes.

> +void fat_flush_init(struct super_block *sb)
> +{
> +	struct msdos_sb_info *sbi = MSDOS_SB(sb);
> +	init_timer(&sbi->flush_timer);
> +	sbi->flush_timer.data = (unsigned long)sb;
> +	sbi->flush_timer.function = fat_flush_timer;

-mm has setup_timer()

> +
> +static inline void fat_mark_flush(struct super_block *sb)
> +{
> +	if (MSDOS_SB(sb)->options.flush)
> +		__fat_mark_flush(sb);
> +}

It'd be nice to make this a more generic thing, so other filesystems can
use it without copying lots of code.

> +		case Opt_flush:

MS_FLUSH?   I added MS_DIRSYNC a few years ago - it wasn't too complex.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [EXPERIMENT,RFC] FAT: Add "flush" option for hotplug devices
  - From: OGAWA Hirofumi <[email protected]>
- running linux TCP/IP stack on a real slow machine
  - From: Kallol Biswas <[email protected]>

References:
- [EXPERIMENT,RFC] FAT: Add "flush" option for hotplug devices
  - From: OGAWA Hirofumi <[email protected]>

Prev by Date: Re: 2.6.13-rc2-almost-rc3, ext3 and memory corruption
Next by Date: Re: Ktimer / -rt9 (+custom) monotonic_clock going backwards.
Previous by thread: [EXPERIMENT,RFC] FAT: Add "flush" option for hotplug devices
Next by thread: running linux TCP/IP stack on a real slow machine
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]