Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.

On Sat, Aug 26, 2006 at 11:46:18PM -0700, Andrew Morton wrote:
> On Sun, 27 Aug 2006 11:41:55 +0530
> Dipankar Sarma <[email protected]> wrote:
> 
> > Now coming to the read-side of lock_cpu_hotplug() - cpu hotplug
> > is a very special asynchronous event. You cannot protect against
> > it using your own subsystem lock because you don't control
> > access to cpu_online_map.
> 
> Yes you do.  Please, read _cpu_up(), _cpu_down() and the example in
> workqueue_cpu_callback().  It's really very simple.

What are you talking about here ? That is the write side. You are
*not* supposed to do lock_cpu_hotplug() in cpu callbacks paths AFAICT. 
If someone does it (like cpufreq did), it is just *wrong*.

> > With multiple low-level subsystems
> > needing it, it also becomes difficult to work out the lock
> > hierarchies.
> 
> That'll matter if we do crappy code.  Let's not do that?

I am talking about readsides here - you read cpu_online_map and
block then reuse the map and make some calls to another subsystem
that may again do a similar read-side cpu_hotplug lock. I suspect
that it hard to get rid of all possible dependencies.

> > > I rather doubt that anyone will be hitting the races in practice.  I'd
> > > recommend simply removing all the lock_cpu_hotplug() calls for 2.6.18.
> > 
> > I don't think that is a good idea.
> 
> The code's already racy and I don't think anyone has reported a
> cpufreq-vs-hotplug race.

Do cpu hotplugs in a one cpu and change frequencies in another -
I think Gautham has a script to reproduce this. Besides
lockdep apparently complains about it -

http://marc.theaimsgroup.com/?l=linux-kernel&m=115359728428432&w=2

> > The right thing to do would be to
> > do an audit and clean up the bad lock_cpu_hotplug() calls.
> 
> No, that won't fix it.  For example, take a look at all the *callers* of
> cpufreq_update_policy().  AFAICT they're all buggy.  Fiddling with the
> existing lock_cpu_hotplug() sites won't fix that.  (Possibly this
> particular problem can be fixed by checking that the relevant CPU is still
> online after the appropriate locking has been taken - dunno).
> 
> It needs to be ripped out and some understanding, thought and design should
> be applied to the problem.

Really, the hotplug locking rules are fairly simple-

1. If you are in cpu hotplug callback path, don't take any lock.

2. If you are in a non-hotplug path reading cpu_online_map and you don't
   block, you just disable preemption and you are safe from hotplug.

3. If you are in a non-hotplug path and you use cpu_online_map and
   you *really* need to block, you use lock_cpu_hotplug() or 
   cpu_hotplug_disable whatever it is called.

Is this too difficult for people to follow ?

> 
> > People
> > seem to have just got lazy with lock_cpu_hotplug().
> 
> That's because lock_cpu_hotplug() purports to be some magical thing which
> makes all your troubles go away.

No it doesn't. Perhaps we should just document the rules better
and put some static checks for people to get it right.

Thanks
Dipankar
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
  - From: Andrew Morton <[email protected]>

References:
- [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
  - From: Gautham R Shenoy <[email protected]>
- Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
  - From: Andrew Morton <[email protected]>
- Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
  - From: Dave Jones <[email protected]>
- Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
  - From: Linus Torvalds <[email protected]>
- Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
  - From: Andrew Morton <[email protected]>
- Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
  - From: Dipankar Sarma <[email protected]>
- Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
  - From: Andrew Morton <[email protected]>

Prev by Date: dmfe and tulip modules list duplicate PCI IDs
Next by Date: Re: wrt: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Previous by thread: Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
Next by thread: Re: [RFC][PATCH 0/4] Redesign cpu_hotplug locking.
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]