Re: Major regression on hackbench with SLUB (more numbers)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Dec 22, 2007 at 10:01:37AM -0800, Linus Torvalds wrote:
> Well, I do have to admit that I'm not a huge fan of /proc/slabinfo, and 
> the fact that there hasn't been a lot of complaints about it going away 
> does seem to imply that it wasn't a very important ABI.
> 
> I'm the first to stand up for backwards compatibility, but I also try to 
> be realistic, and so far nobody has really complained about the fact that 
> /proc/slabinfo went away on any grounds but on the general principle of 
> "it was a user-visible ABI".

That's probably because the people who are most likely to be using
/proc/slabinfo tend to people people using kernels in production
environments, and they probably haven't started playing with SLUB yet.
So the fact we haven't heard the yelling doesn't necessarily mean that
people won't notice.

I know of a number of debugging scripts that periodically poll
/proc/slabinfo to find out what is using memory and to track trends
and predict potential problems with memory usage and balance.  Indeed,
I've left several such scripts behind at various customer
installations after debugging memory usage problems, and some system
administrators very much like being able to collect that information
easily.  So I can say for sure there at there are scripts out there
that depend on /proc/slabinfo, but they generally tend to be at sites
where people won't be running bleeding edge kernels.

> That said, I'm not a *huge* fan of /sys/slab/ either. I can get the info I 
> as a developer tend to want from there, but it's really rather less 
> convenient than I'd like. It is really quite hard, for example, to get any 
> kind of "who actually uses the most *memory*" information out of there. 

I have a general problem with things in /sys/slab, and that's just
because they are *ugly*.  So yes, you can write ugly shell scripts
like this to get out information:

> You have to use something like this:
> 
> 	for i in *
> 	do
> 		order=$(cat "$i/order")
> 		slabs=$(cat "$i/slabs")
> 		object_size=$(cat "$i/object_size")
> 		objects=$(cat "$i/objects")
> 		truesize=$(( $slabs<<($order+12) ))
> 		size=$(( $object_size*$objects ))
> 		percent=$(( $truesize/100 ))
> 		if [ $percent -gt 0 ]; then
> 			percent=$(( $size / $percent ))
> 		fi
> 		mb=$(( $size >> 10 ))
> 		printf "%10d MB %3d %s\n" $mb $percent $i
> 	done | sort -n | tail

But sometimes when trying to eyeball what is going on, it's a lot
nicer just to use "cat /proc/slabinfo".  

Another problem with using /sys/slab is that it is downright *ugly*.
Why is it for example, that /sys/slab/dentry is a symlink to
../slab/:a-0000160?  Because of this, Linus's script reports dentry twice:

     10942 MB  86 buffer_head
     10942 MB  86 journal_head
     10943 MB  86 :a-0000056
     21923 MB  86 dentry
     21924 MB  86 :a-0000160
     78437 MB  94 ext3_inode_cache

Once as "dentry" and once as ":a-0000160".  A similar situation
happens with journal_head and ":a-0000056".  I'm sure this could be
filtered out with the right shell or perl script, but now we're
getting even farther from /proc/slabinfo.  

Further, if you look at Documentation/sysfs-rules.txt (does anyone
ever read it or bother to update it?), /sys/slab isn't documented so
it falls in the category of "everything else is a kernel-level
implementation detail which is not guaranteed to be stable".  Nice....

Also, looking at sysfs-rules.txt, it talks an awful lot about reading
symlinks, and implying that actually trying to *dereference* the
symlinks might not actually work.  So I'm pretty sure linus's "cd
/sys/slab; for i in *" violates the Documentation/sysfs-rules.txt
guidelines even if /sys/slab were documented.

In general, /sys has some real issues that need some careful scrutiny.
Documentation/sysfs-rules.txt is *not* enough.  For example, various
official websites are telling people that the way to enable or disable
power-saving is:

	echo 5 > /sys/bus/pci/drivers/iwl4965/*/power_level

(Reference: http://www.lesswatts.org/tips/wireless.php)

Aside from the fact that "iwconfig wlan0 power on" is easier to type
and simpler to undrestand, I'm pretty sure that the above violates
sysfs-rules.txt.  From looking at sysfs-rules.txt, the fact that you
have to read symlinks as the only valid and guaranteed way for things
to work means that you have to write a perl script to safely
manipulate things in /sys.  People are ignoring sysfs-rules.txt, and
giving advice in websites contrary to sysfs-rules.txt because it's too
hard to follow.

> I worry about the lack of atomicity in getting the statistics.

And that's another problem with trying to use /sys when trying to get
statistics in bulk...  Very often the tables in /proc/* were not only
more convenient, but allowed for atomic access.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux