Re: remove __read_mostly?

On Sun, Jun 25, 2006 at 11:57:36AM -0700, Andrew Morton wrote:
> 
> I'm thinking we should remove __read_mostly.
> 
> Because if we use this everywhere where it's supposed to be used, we end up
> with .bss and .data 100% populated with write-often variables, packed
> closely together.  The cachelines will really flying around.
> 
> IOW: __read_mostly optimises read-mostly variables and pessimises
> write-often variables.
> 
> We want something which optimises both read-mostly and write-often storage.
>  We do that by marking the write-often variables with
> __cacheline_aligned_in_smp.

We already mark write often structures with __cacheline_aligned_in_smp.

The idea behind __read_mostly is to separate variables like cpu maps,
bootcpuinfo etc which are written to very very rarely -- during 
initialization/hot-plugging, but read quite often something like ~100 % read
ratio.  They might not be sharing the cacheline with a 
variable which is not as frequently written to, to mark 
__cacheline_aligned_in_smp, but still, these often read variables would be 
invalidated in cache every time there is a write on these other variables.  
Considering that there are quite a few structures which are read often,
(like cpu maps, cpu info, node to cpu maps etc) it made sense to place them 
in a separate section.

When we mark  variables __read_mostly, it doesn't mean all that is left in
.bss and .data is write mostly.  Surely the rest of .data and .bss which
do not qualify for ~100% read would not be ~100% write/RMW.  We would have 
variables which experience varying ratio of reads and writes.

So as I see it the current scheme of thing is 
1. If a variable is written quite often -- mark __cacheline_aligned_in_smp.
2. If a variable is read quite often .. something like 99:1 read -- mark them
   __read_mostly

> 
> OK?
> 
> [Problem is, I don't think any of the make-foo-__read_mostly patches
> actually identified _which_ write-often variables were affecting `foo', so
> we'll be back to square one.]

Most of the variables we marked as __read_mostly were ones we were seeing
inter-node transfers for read, when we shouldn't be seeing any -- because
they are just written to once during bootup.  And it did not make sense to 
mark the other writers on the inter-node cacheline __cacheline_aligned_in_smp, 
since it would mean a bloat of 128-4096 bytes depending on the arch, and
those variables were not written often enough to warrant padding. A few not
so-often-written variables on the same cacheline would mean these read
mostly variables are always being invalidated.

> 
> [Actually, we should do
> 
> 	#define __write_often __cacheline_aligned_in_smp
> 
> and use __write_often
> 
> a) for documentation and
> 
> b) so the optimisation can be centrally turned off, for space
>    optimisation or for performance validation.]
> 

Sure, we can change users of __cacheline_aligned_in_smp to __write_often and
change the section to .data.write_mostly from .data.cacheline_aligned for 
readability sake.  But, IMHO we do need the __read_mostly section for variables
which experience near 100% read access.  I cannot see any negative aspects of
having a separate read mostly section as there is no padding or bloating
involved.  Granted, we might not see performance difference with some builds
due to the compiler/linker placement / code changes, but this might
resurface later with a newer build/code changes.

Thanks,
Kiran
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: remove __read_mostly?
  - From: Paul Jackson <[email protected]>

References:
- remove __read_mostly?
  - From: Andrew Morton <[email protected]>

Prev by Date: Re: linux-2.6.17.1: undefined reference to `online_page'
Next by Date: Re: linux-2.6.17.1: undefined reference to `online_page'
Previous by thread: remove __read_mostly?
Next by thread: Re: remove __read_mostly?
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]