On Sun, 19 Mar 2006, Linus Torvalds wrote:
>
> Now, the simple version will iterate 8 times over the loop, while the more
> complex version will iterate just as many times as there are CPU's in the
> actual system. So there's a trade-off. The "load the CPU mask first and
> then shift it down by one every time" thing (that required different
> syntax) would have been able to exit early. This one isn't. So the syntax
> change would still help a lot (and would avoid the "btl").
Hah. My "abuse every gcc feature and do really ugly macros" dark side has
been vetted, and it came up with the appended work of art.
Doing a "break" inside a conditional by using the gcc statement
expressions is sublime.
And it works. It's a couple of instructions longer than the shortest
version (but still about half the size of the horror we have now), but the
instructions are simpler (just a shift rather than a "btl"), and it now
knows to break out early if there are no CPU's left to check.
It has the added advantage that because it uses simpler core instructions,
gcc can actually optimize it better - in "nr_context_switches()" gcc
happily hoisted the "mask->bits[0]" load to outside the loop, for example.
With NR_CPUS==2, gcc even apparently unrolls the loop, to the point where
it isn't even a loop at all. Good boy (at least for that case - I sure
hope it won't do that for some of the larger loops ;).
Of course, I shouldn't say "works", since it is still totally untested. It
_looks_ good, and that statement expression usage is just _so_ ugly it's
cute.
Linus
---
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 60e56c6..17a965c 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -313,11 +313,20 @@ static inline void __cpus_remap(cpumask_
bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
}
-#if NR_CPUS > 1
+#define check_for_each_cpu(cpu, mask) \
+ ({ unsigned long __bits = (mask).bits[0] >> (cpu); if (!__bits) break; __bits & 1; })
+
+#if NR_CPUS > 32
#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = first_cpu(mask); \
(cpu) < NR_CPUS; \
(cpu) = next_cpu((cpu), (mask)))
+#elif NR_CPUS > 1
+#define for_each_cpu_mask(cpu, mask) \
+ for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
+ if (!check_for_each_cpu(cpu, mask)) \
+ continue; \
+ else
#else /* NR_CPUS == 1 */
#define for_each_cpu_mask(cpu, mask) for ((cpu) = 0; (cpu) < 1; (cpu)++)
#endif /* NR_CPUS */
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]