Fedora Users — Re: Enquiry,,,

On Sun, 2007-02-04 at 17:08 -0600, Jonathan Berry wrote:
> On 2/4/07, Les <hlhowell@xxxxxxxxxxx> wrote:
> > On Fri, 2007-02-02 at 17:44 -0800, Evan Klitzke wrote:
> > > On Fri, 2007-02-02 at 14:57 -0500, Dmitriy Kropivnitskiy wrote:
> > > > James Wilkinson wrote:
> > > > > It's usually a bit faster.
> > > > Just to avoid the confusion, are you saying that 64-bit capable
> > > > processors are faster than 32-bit only, or that an application compiled for
> > > > 64-bit architecture is faster than the same application compiled for
> > > > 32-bit architecture on the same hardware?  The reply to your post tells
> > > > me that people think you mean the former, where I was talking about the
> > > > latter. I will not dispute the claim that 64-bit CPUs are faster than
> > > > 32-bit, because I don't think they make 32-bit only CPUs anymore (at least
> > > > in the x86 architecture). So any 32-bit CPU will be just plain outdated
> > > > and therefore slower than any modern 64-bit (and 32-bit capable) CPU. As
> > > > for the applications, I believe the difference should be negligible
> > > > unless the application is trying to use a lot of RAM. I think I have
> > > > seen some benchmarks confirming this, but at the moment I cannot seem to
> > > > find them.
> > >
> > > IIRC 64-bit architectures have more registers. This should make code
> > > compiled for a 64-bit processor a little bit faster than code compiled
> > > for a 32-bit processor, even if the application doesn't actually make
> > > use of quantities larger than 32 bits. I'm not sure how much a
> > > difference this actually makes in real world benchmarks, but it's
> > > something to think about.
> > >
> > > -- Evan Klitzke
> > >
> > Hi, Evan, and others,
> >         The extra registers are valuable if the code takes advantage of them.
> > This depends upon a lot of variables.  For instance a native C compiler
> > only uses about three registers.  That is because the code is optimized
> > for the C machine's 24 instructions.  However, if the compiler breaks
> 
> C machine?  What are you talking about?  Last I heard, C was a
> language, not an architecture.
C is a language based on the C VM, which is a processor with 24 total
instructions.  You can port C to any system basically by just writing an
interpreter for the C instruction set and using the assembly of the base
C compiler to port the entire language, editors, and so forth.  Of
course the basic operating system interface (the libraries) would have
to be created, but if you were lucky and the machine capabilities
were memory mapped (one should be so lucky all the time), and the base
drivers were originally written in C, then the whole shooting match can
be done with just a core of about 1K.  Except for very small OS's,
though, this is never true.  Still, C is both a language and a VM.  It is
really amazing when you think about the whole of C existing on just 24
instructions.  K&R were a couple of bright folks and had a real gift.
Maybe that is why you see their names in so many places.
> 
> > out the C code to C assembly, then calls a cross assembler, followed by
> > an optimizing assembler, the results will occur one way, or if the C
> 
> C assembly?  Again, I am quite confused here.
That is OK, you would have to be an old timer to remember this stuff.

> > code is compiled directly into native code as some compilers do, then
> > the optimizations will occur in a different order.  Moreover, most
> > compilers can optimize for space or speed, yielding different machine
> > instructions for the same code with the same compiler.  Also the code
> 
> Of course.  You can always optimize differently.  And some compilers
> will be better than others.  A good compiler, though, should take full
> advantage of the target platform.

Taking full advantage of the target platform depends on the design of
the application. And in the case of 32 bit vs 64 bit also on something
called framing.  If you have 32 bit data interspersed with character
data, then the character data may or may not be padded (again depending
on the compiler default or author choices) to fit the 32/64 bit word.
This will entail some speed impact.

> > author can instruct the compiler to utilize register variables, and in
> > that case, most optimizing compilers (but not all) will utilize all
> > available registers to ensure that can occur.  This will produce the
> > most noticeable effects in tight loops, or loops within loops such as are
> > used for array processing programs or graphics processing.
> 
> Sure.  Using "register" is a strong suggestion to the compiler, but it
> is free to ignore you.  Especially if you overallocate the number of
> registers that you have.

Of course, and since the 64 bit processors generally have more
registers, this implies a speed differential as well, but code for a 32
bit machine would not take advantage of the additional register space.

> >         So there are those things going on.  However if a 64 bit processor
> > running 64 bit memory is running 32 bit code, generally it will only
> > access memory about 2/3 as often, and since memory access is a "slow"
> 
> Why?  Because of 64-bit pointers?  Still, caching makes RAM access a
> very murky subject to just talk about.
Not really.  Caching has only to do with pagination from main memory.
The CPU accesses cache only.  Since the code for a 32 bit processor
would "know" only about 32 bit pagination, the break points in the code
would be different.  This may mean that a cache hit would occur on a
different schedule, severely impacting execution time.  Conversely, such
code running on a 64 bit processor with a larger cache size might never
see a cache miss and so execute faster than the equivalent 64 bit code.

> > operation relative to processor speed, the system will seem faster.
> > Also 64 bit processors generically have twice as much high speed cache,
> > thus reducing the cache misses, another gain.  Finally, the segmentation
> 
> This is irrelevant as we are talking about running a 64-bit or 32-bit
> OS on the same chip.  That chip will have the same amount of cache no
> matter what.
> 
> > of double precision floating point numbers and long pointers required
> > for 32 bit operation doesn't have to occur for 64 bit operation, another
> > (although slight) speed gain.  Finally, the pipelines on modern 64 bit
> > processors have a few more capabilities in terms of "look ahead" and
> > "prefetch" for jumps and calls than is available on their 32 bit family
> > members.
> 
> If you are going to go into pipeline design, then we are going to have
> to talk about a particular chip, as this can be wildly different
> between chips (especially Intel versus AMD).  Again, this really is
> not relevant since we are talking about the same chip in two modes.
> 
Intel particularly has multiple pipeline possibilities depending on the
way the code is authored.  It is one of the hallmarks of a good real
time programmer to know the aspects of the target machine.  And this
depends heavily on the processor model, clock, cache and registers.
However, generally speaking, the 32 bit code won't know about the 64 bit
machine's capabilities in this area.  Therefore the 32 bit code will
generally be less effective and slower because of that issue.
> > So there are many things that affect the relative speed of the
> > processors, Registers, instruction sets, optimizations, look ahead,
> > branch prefetch, cache and memory depth just to name the most obvious.
> 
> We weren't comparing the relative speed of processors.  We were
> talking about running a 64-bit processor in 32-bit or 64-bit mode.  Or
> at least that is what I was talking about.  It sounds like you were
> maybe talking about 64-bit versus 32-bit only CPUs.  Yes, comparing
> different processors is very difficult, and complex.  It can really
> only be done with benchmarks, and even then you have to be careful
> (special compilers or optimizations, etc).  You can dig deeply into
> the details (or at least as much as you can find) but this stuff gets
> very complicated very quickly.  Not to mention that usually the CPU is
> not the limiting factor on speed.  Anyway...
But when you are discussing what the compiler does for the different
versions, this can include pipeline differences as well.  This is
because the code is not just translated straight through, line by line,
but is restructured according to optimization algorithms built into the
compiler, linker, and loader.  Therefore the compiled code will differ
in many different dimensions.

For references I refer you to The C Programming Language, by Brian
Kernighan and Dennis Ritchie.  Published by Prentice-Hall.  Copyright 1978.

Regards,
Les H