On Jul 4, 2006, at 2:47 AM, Jes Sorensen wrote:
> Milton Miller wrote:
>> On Mon Jul 03 2006 - 11:37:43 EST, Jes Sorensen wrote:
>> But that is totally racy! Another cpu could set its bit between the
>> assignment to mask and the call to cpus_clear. Which means we end up
>> with cpus holding a bh in their lru but no idea which ones.
> The idea I wrote the code under was that we are really only concerned
> to make sure all bh's related to the device causing the invalidate are
> hit. It doesn't matter that there are bh's in the lru from other
> devices.
And that is fine as far as it goes. The problem is that an unrelated
device might be hit by this operation. For example, hal is running
on cpu 0, so the fsck gets run on cpu 1 and hits this race. It
finishes, and now hal is back to sleep, and cpu power-saving balancing
says run the mount on cpu 0. The file system tries to change the block
size, which calls kill_bdev, which calls invalidate_bdev, but we forgot
that cpu 1 had a bh, so the bh doesn't get freed, and the page is left
in the page cache with the wrong size buffer. POOOF, there goes the
filesystem.

Oh, and there isn't any safety check there, is there?
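
To make the window concrete, here is roughly the pattern I believe the
patch uses (untested sketch against the 2.6.17-era cpumask API;
bh_lru_cpumask and the helper names are my stand-ins, not necessarily
the patch's, and preemption handling is elided):

/* cpus that may hold bhs in their per-cpu lrus (stand-in name) */
static cpumask_t bh_lru_cpumask;

static void invalidate_bh_lrus_sketch(void)
{
	cpumask_t mask = bh_lru_cpumask;	/* snapshot who to IPI */
	int cpu;

	/* <-- window: another cpu can install a bh and set its bit here */

	cpus_clear(bh_lru_cpumask);	/* ...and this wipes that bit out */

	for_each_cpu_mask(cpu, mask) {
		/* IPI 'cpu' to run the invalidate_bh_lru() callback.
		 * The late cpu above is not in 'mask', so it is never
		 * hit, and its bit is gone, so later invalidates miss
		 * it as well. */
	}
}

/* install side, on the caching cpu (the if (!test_bit) set_bit() idea) */
static void note_this_cpu_caches_bhs(void)
{
	if (!cpu_isset(smp_processor_id(), bh_lru_cpumask))
		cpu_set(smp_processor_id(), bh_lru_cpumask);
}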
>> Unfortunately clearing the bit in the callback means we pass the cpu
>> mask around twice (once to clear, and later to set as we start
>> caching bhs again).
> Doing it in the callback also means each CPU has to go back and hit
> the mask remotely, causing a lot of cache line ping-pong effects,
> especially as they'll be doing it at roughly the same time. This is
> why I explicitly did the if (!test_bit) set_bit() optimization.
>> Although that is probably not much worse than scanning other cpus'
>> per-cpu data for NULL (and I would probably just scan 8 pointers
>> rather than add another per-cpu "something is cached" flag).
> 8 pointers * NR_CPUS - that can be a lot of cachelines you have to
> pull in from remote :(
8 pointers in one array, hmm... 8*8 = 64 consecutive bytes, that is
half a cache line on my arch. (And on 32-bit archs, 8*4 = 32 bytes,
typically 1 full cache line.) Add an alignment constraint if you like.
I was comparing to fetching a per-cpu variable saying we had any
entries; I'd say the difference is the time to do 8 loads once you
have the line vs the chance that the other cpu is actively storing and
stealing it back before you finish. I'd be really surprised if it were
stolen twice.
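
Something like this is what I have in mind (untested sketch, written as
if it sat in fs/buffer.c next to the existing
DEFINE_PER_CPU(struct bh_lru, bh_lrus); the helper name is made up):

/* Does this cpu have anything in its bh lru at all?  Called from the
 * invalidating cpu, so the reads are remote, but the BH_LRU_SIZE (8)
 * pointers are consecutive and land in one small chunk of memory. */
static int cpu_has_cached_bhs(int cpu)
{
	struct bh_lru *b = &per_cpu(bh_lrus, cpu);
	int i;

	for (i = 0; i < BH_LRU_SIZE; i++)
		if (b->bhs[i])
			return 1;
	return 0;
}

The invalidate path would then only IPI the cpus for which this returns
1, and there is no extra per-cpu flag to maintain on the fast path.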
>> I don't like the idea of invalidate_bdev (say due to openers going
>> to zero) running against one device causing a buffer to be left
>> artificially busy on another device, causing a page to be left
>> around.
>> If you want to cut down on the cache line passing, then putting
>> the cpu mask in the bdev (for "I have cached a bh on this bdev
>> sometime") might be useful. You could even do a bdev-specific
>> lru kill, but then we get into the next topic.

> That could be an interesting way out.
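
To spell out what I meant (hand-waving sketch; bd_lru_cpus is a made-up
new cpumask_t field in struct block_device, and the hook would go
wherever the bh lands in the per-cpu lru):

/* remember, per bdev, which cpus have ever cached one of its bhs */
static void note_bh_cached(struct buffer_head *bh)
{
	struct block_device *bdev = bh->b_bdev;
	int cpu = get_cpu();

	/* test first so the common case doesn't dirty the shared line */
	if (!cpu_isset(cpu, bdev->bd_lru_cpus))
		cpu_set(cpu, bdev->bd_lru_cpus);
	put_cpu();
}

invalidate_bdev() on that device then only needs to reach the cpus in
bdev->bd_lru_cpus rather than every cpu that ever cached anything.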
Another approach is to age the cache. Only clear the bit during the IPI
call if you had nothing there on the last N calls (N being some
combination of absolute time and number of invalidate calls).
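
As a sketch (untested; bh_lru_cpumask, the idle counter and the
threshold are all invented for illustration, and the loop body is just
the existing invalidate_bh_lru() logic from fs/buffer.c):

#define BH_LRU_IDLE_THRESHOLD	4	/* hypothetical N */
static DEFINE_PER_CPU(unsigned int, bh_lru_idle);

static void invalidate_bh_lru(void *arg)
{
	struct bh_lru *b = &get_cpu_var(bh_lrus);
	int i, had_any = 0;

	for (i = 0; i < BH_LRU_SIZE; i++) {
		if (b->bhs[i]) {
			brelse(b->bhs[i]);
			b->bhs[i] = NULL;
			had_any = 1;
		}
	}
	if (had_any)
		__get_cpu_var(bh_lru_idle) = 0;
	else if (++__get_cpu_var(bh_lru_idle) >= BH_LRU_IDLE_THRESHOLD)
		cpu_clear(smp_processor_id(), bh_lru_cpumask);
	put_cpu_var(bh_lrus);
}

The mask only gets written when a cpu has genuinely been empty for a
while, so a cpu that happens to be empty across a single invalidate
doesn't bounce the line straight back.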
milton