On Sat, Feb 03, 2007 at 07:38:50PM +0300, Oleg Nesterov wrote: > On 01/31, Paul E. McKenney wrote: > > > > QRCU as currently written (http://lkml.org/lkml/2006/11/29/330) doesn't > > do what you want, as it acquires the lock unconditionally. I am proposing > > that synchronize_qrcu() change to something like the following: > > > > void synchronize_qrcu(struct qrcu_struct *qp) > > { > > int idx; > > > > smp_mb(); > > > > if (atomic_read(qp->ctr[0]) + atomic_read(qp->ctr[1]) <= 1) { > > smp_rmb(); > > if (atomic_read(qp->ctr[0]) + > > atomic_read(qp->ctr[1]) <= 1) > > goto out; > > } > > > > mutex_lock(&qp->mutex); > > idx = qp->completed & 0x1; > > atomic_inc(qp->ctr + (idx ^ 0x1)); > > /* Reduce the likelihood that qrcu_read_lock() will loop */ > > smp_mb__after_atomic_inc(); > > I almost forgot. Currently this smp_mb__after_atomic_inc() is not strictly > needed, and the comment is correct. However, it becomes mandatory with your > optimization. Without this barrier, it is possible that both checks above > mutex_lock() will see the result of atomic_dec(), but not the atomic_inc(). > > So, may I ask you to also update this comment? > > /* > * Reduce the likelihood that qrcu_read_lock() will loop > * AND > * make sure the second re-check above will see the result > * of atomic_inc() if it sees the result of atomic_dec() > */ > > Something like this, I hope you will make it better. Good catch!!! I will make sure that this is reflected. Also, validation is proceeding nicely, if extremely hoggishly. The validation program and output thus far attached in case someone is interested. The three-readers/three-updaters case has worked its way up to 16% of the memory on a 48GB machine. ;-) If it overflows, I will see if I can get someone to convert it to VHDL and run hardware validation tools on it. This turned out to be necessary for the -rt implementation of RCU... > And another note: this all assumes that STORE-MB-LOAD works "correctly", yes? > We have other code which relies on that, should not be a problem. We have been working with Doug Lea of SUNY Oswego, Sebatian Burckhardt of University of Pennsylvania, and Vijay Saraswat of IBM Research towards a "universal memory model" that accommodates all machines. Currently, it does in fact handle store-mb-load the way we all want, thankfully! Actually, the other guys are doing most of the formalism, my main role has been programming a very stupid checker based on their formalisms and yelling at them when something bad happens. See the cccc directory in the x10 project on SourceForge if you want more info, but be warned that the checker's UI really sucks. It's input and output formats are abominations that could only have been dreamed up by someone who started out on punchcards and early-70s BASIC. Not pretty, but it does a good job of letting people know what I think the formalism is saying! There are some people working on a Prolog program called jmmsolve that as a much nicer input format, but I need to prototype memory barriers before they will incorporate them (currently, each statement acts as if it had smp_mb() before and after it). Also, their output is as yet incomprehensible to me. Thanx, Paul > (Alan Stern cc'ed). > > Oleg. >
Attachment:
qrcuval.2007.02.03a.tgz
Description: qrcuval.2007.02.03a.tgz
- Follow-Ups:
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: Alan Stern <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- References:
- [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: Peter Zijlstra <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: Christoph Hellwig <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: Ingo Molnar <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: "Paul E. McKenney" <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: Oleg Nesterov <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: Peter Zijlstra <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: "Paul E. McKenney" <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: Peter Zijlstra <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: "Paul E. McKenney" <[email protected]>
- Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- From: Oleg Nesterov <[email protected]>
- [PATCH 3/7] barrier: a scalable synchonisation barrier
- Prev by Date: Re: A CodingStyle suggestion
- Next by Date: Re: 2.6.20-rc6-mm3 BUG in drm_ioctl with Rage 128 card
- Previous by thread: Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- Next by thread: Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
- Index(es):