Fedora Users — Re: OT: Requesting C advice

On Fri, 2007-06-01 at 19:02 -0400, Matthew Saltzman wrote:
> On Fri, 1 Jun 2007, Les wrote:
> 
> > On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
> >>
> >>> I know why their programs failed.  I also know that C uses a pushdown
> >>                                                       ^some particular
> >>                                                        implementations of
> >>> stack for variables in subroutines.  You can check it out with a very
> >>> simple program using pointers:
> >>>
> >>>    #include <sttlib.h>
> >>>
> >>>    int i,j,k;
> >>>
> >>>    main()
> >>>    {
> >>>        int mi,mj,mk;
> >>>        int *x;
> >>>        mi=4;mj=5;mk=6;
> >>>        x=&mk;
> >>>        printf ("%d  %d  %d\n",*x++,*X++;*X++);
> >>>        x=&i;
> >>>        printf ("%d  %d  %d\n",*x++,*x++,*x++);
> >>>        i-1;j=2;k=3;
> >>>        printf ("%d  %d  %d\n",*x++,*x++,*x++);
> >>>  )
> >>>
> >>> Just an exercise, you understand.  Compile and run this with several C
> >>> packages, or if the package you choose supports it, have it compile K&R,
> >>> and try it.
> >>
> >> Of course, several constructs here are undefined, so there is no such
> >> thing as "correct" or "incorrect" behavior.
> >>
> >> After correcting obvious typos and adding #include <stdio.h> so it would
> >> compile, I got (using gcc-4.1.1-51.fc6 with no options):
> >>
> >>      $ ./a.out
> >>      5  4  6
> >>      0  0  0
> >>      0  0  0
> >
> > OOPS, forgot to reset the X pointer between the last two print
> > statements.  This bit of code is intended to show that globals are on a
> > heap and locals are on a stack.
> 
> Fixed that.  Now I get:
> 
> $ ./a.out
> 5  4  6
> 0  0  0
> 0  2  1
> 
> But I confess, I don't see how this code proves your point.  It does 
> demonstrate that globals are initialized by default, though.
> 
Actually, it doesn't, and this is the problem.  Many people assume that
because they obtained 0 one time, the value was set in memory by some
behind-the-scenes action of the compiler.  In fact the memory could have
been set by any of a number of actions.  Some memory chips start with all
data zeroed (at the output; at the physical layer the construction is
designed to minimize current drain and transitions, but that is another
topic entirely).  In that case, if power had been off, all memory not
explicitly set would be zero by default.  Another situation is when a
memory checker runs and leaves memory in a zero state (most do by
design).  Thus if the compiler doesn't initialize memory, and the memory
where the code is placed has not been used in a prior run, the variable
space will be zero.  But if the program is deleted, the memory filled
with a nonzero pattern, and the code reloaded and compiled, the result
may be much different and can cause the program to crash.  When the
program is saved to disk as an executable, the memory pattern that is
saved is the last state of the code, whatever that was, and depending on
how the code development system saves the code, the variables may or may
not be set to zero at save time.  At load time, the memory will be
initialized according to the data in the executable file.

    So, while the compiler may initialize the variables, there are other
issues that can affect the true state at run time, and therefore the
default state should not be relied on.  After all, you create a variable
to store information, don't you?  Why would you not initialize it?
Anyway, while this has been a good discussion, I hope that you have begun
to realize that all is not just in the compiler, but in the
implementation, in the memory of the system, and in the methods of
implementing and running code.  And by the way, Matthew, this is in no
way criticizing you.  I have heard of you before, and will probably hear
great things from you in the future.

    Good luck, and good fortune.
> >
> >>
> >> Was that what you were expecting?
> >>
> >>
> >>>
> >>> I cannot vouch for every compiler, only Microsoft, Sun, and Instant C
> >>> off the top of my head.  I have used a few other packages as well.  But
> >>> any really good programmer NEVER relies on system initialization.  It is
> >>> destined to fail you at bad times.
> >>
> >> How much effort are you willing to expend to defend against potentially
> >> buggy compilers (as opposed to undefined or implementation-defined
> >> behaviors)?  The Intel fdiv bug would seem to prove that you should NEVER
> >> rely on arithmetic instructions to provide the correct answer.  There's an
> >> economic tradeoff between protecting yourself from all conceivable errors
> >> and actually getting work done.
> >>
> >
> > There is a difference between implementation differences and hardware
> > errors, which was the microsoft error.  They had
> > a bug in their silicon compiler that caused that IIRC.
> 
I misspoke here, and said Microsoft, when I meant Intel.
> I could just as easily reference some other obscure compiler bug or 
> implementation-defined behavior and make the same point.  The thing about 
> a standard is that there are clear requirements about what is 
> implementation-defined and what is not.  Static initialization in ISO C is 
> not one of those implementation-defined things.
> 
> I will concede that explicit initializations--even to default 
> values--might be a useful self-documentation tool.
> 
> >
> >>>                                     One case is as has been pointed out
> >>> here, that NULL is sometimes 0, sometimes 0x80000000, and sometimes
> >>> 0xffffffff.  Even NULL as a char may be 0xFF 0xFF00 or 0x8000 depending
> >>> on the implementation.  But strings always end in a character NULL or
> >>> 0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers.
> >>> They may do otherwise on some others.  It can byte (;-) you if you are
> >>> not careful.
> >>
> >> In your source code, NULL is *always* written 0 (or sometimes (void *) 0
> >> to indicate that it's intended to stand for a null pointer value, not a
> >> NUL character value).  The string terminator character is *always* written
> >> '\0'.  The machine's representation of that value is immaterial.  If you
> >> type-pun to try to look at the actual machine's representation, your
> >> program's behavior is undefined and you deserve what you get.  It's the
> >> compiler's responsibility to ensure that things work as expected, no
> >> matter what the machine's representation is.  (For example, '\0' == 0 must
> >> return 1.)
> >>
> >
> > '\0' is an escape forcing the 0, so of course this will be equal.
> 
> OK.  But the main point is that it doesn't matter what bit pattern 
> represents a null pointer.  Your source code will always use the value 0 
> to represent it.  For example,
> 
>  	int *p;
>  	/* ...code that sets p... */
>  	if ( p == 0 ) /* *not*  if ( p == 0x80000000 ) or
>  				if ( p == 0xffffffff ) */
>  	{ /* ...handle null pointer value... */ }
> 
Actually this is one of the problem areas.  0 is an explicit constant, and
is actually zero.  Only if using C++ and equality is overloaded for
pointers will this work.  Otherwise the actual contents of p will be
compared to 0, and that will fail on some systems.  Some compilers may
deal with it as you expect, but I have not used one that did.
> >
> >>>
> >>>    And since that is so, how are those variables initialized?  And to
> >>> what value?  What is a pointer set to when it is initialized?  Hint, on
> >>> Cyber the supposed default for assigned pointers used to be the address
> >>> of the pointer.  Again, system dependencies may get you.
> >>
> >> Pre-ANSI/ISO compilers might have initialized static memory to
> >> all-bits-zero even when that was not the correct representation of the
> >> default for the type being initialized.  ANSI/ISO compilers are not
> >> allowed to do that.  The required default initializations are well
> >> defined.  (This is the sort of thing that motivates the creation of
> >> standards in the first place.)
> >>
> >>>
> >>>    And those systems that used the first location to store the return
> >>> address are not re-entrant, without other supporting code in the
> >>> background.  I think I used one of those once as well.
> >>
> >> There's no requirement for re-entrancy in K&R or ANSI/ISO.  In fact
> >> several standard library routines are known to not be re-entrant.
> >>
> >
> > This is true, but knowing that the base code is not reentrant due to
> > design constraints or due to hardware constraints makes the difference
> > on modern multithreaded systems, where the same executable memory can be
> > used for the program (if the hardware allows that).
> 
> Sure, you need to know that you can compile re-entrant code if you need 
> it.
> 
> >
> >>>
> >>>    PS.  A stack doesn't necessarily mean a processor call and return
> >>> stack.  It is any mechanism of memory address where the data is applied
> >>> to the current location, then the pointer incremented (or decremented
> >>> depending on the architecture).
> >>
> >> But usually in the context of discussions about compiler architectures,
> >> call stacks are exactly what is meant.
> >>
> >
> > I am not sure that is true, because in some implementations, the data
> > heap and stack are in the same segment of memory, while the runtime
> > stack for the processor is somewhere else.  For high security systems
> > running  this should be a requirement.  It prevents obvious means of
> > inserting malicious code through variable initialization, and then stack
> > manipulation.  I say should be, because it has been tossed around from
> > time to time, but I am unsure if it has ever been formalized.
> >
> > One system I worked on looked like this:
> >    init jump
> >    heap
> >    variable stack (push down)
> >    program entrance
> >    program
> >    local libraries
> >    relocation table
> >    symbol table (if not removed)
> >    machine stack
> >
> >    Unfortunately I no longer remember which system that was.  Just the
> > fact that some standard libraries at that time would not run on it
> > because they did manipulate the stack.
> >
> > Regards,
> > Les H
> >
> 
I have said all that I know.  I hope it helps you all in the future.  C
is wonderful, compact, close to the machine, and a good language,
capable of expressing many many complex concepts.  I am sure there are
other languages out there, and I have used a few, but I love C.

Regards,
Les H