Re: [PATCH linux-2.6 01/04] brsem: implement big reader semaphore

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




 Hello, Nick.

Nick Piggin wrote:
Tejun Heo wrote:


As I've said in the other reply, I'll first rip off three stage init thing for cpucontrol. That added pretty much complexity to it. And with the weird naming scheme, please tell me how to fix it. I have no problem renaming things.


OK, my criticism of your naming was not constructive so I apologise
for that. I willll help you with some of those minor issues if we
establish that your overall design is a a goer.


 Thanks. :-)


What would be wrong with an array of NR_CPUS rwsems? The only
tiny trick you would have to do AFAIKS is have up_read remember
what rwsem down_read took, but that could be returned from
down_read as a token.



Using array of rwsems means that every reader-side ops performs (unnecessary) real down and up operations on the semaphore which involve atomic memory op and probably memory barrier. These are pretty expensive operations.

What brsem tries to do is implementing rwsem semantics while performing only normal (as opposed to atomic/barrier) intstructions during reader-side operations. That's why all the call_on_all_cpus stuff is needed while write-locking. Do you think avoiding atomic/barrier stuff would be an overkill?


Yes I think so. I think the main problem on modern CPUs is
not atomic operations as such, but cacheline bouncing.

Without any numbers, I'd guess that your down_read is more
expensive than mine simply due to touching more cachelines
and having more branches.

 Other than local_bh_disable/enable(), all brsem read ops do are

 1. accessing sem->idx
 2. calculate per-cpu rcnt address from sem->idx
 3. do one branch on the value of per-cpu rcnt
 4. inc/dec per-cpu rcnt

So, it does access one more cachline and, yeap, there is one branch for bh enabling and several more inside local_bh_enable. I'll try to get some benchmark numbers for comparison.

I'm thinking about adding down_read(&xxx->s_umount) to write(2) and compare normal rwsem and brsem performance by repeitively writing short data into a file on a UP machine. Do you have better ideas?


The other thing is simply that you really want your
synchronization primitives to be as simple and verifiable
as possible. For example rwsems even recently have had subtle
memory ordering and other implemntation corner cases, and
they're much less complex than this brsem.

Nick


 Thanks.

--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux