Fedora Users — Re: OT: Requesting C advice

On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:

On Thu, 31 May 2007, Les wrote:

> On Thu, 2007-05-31 at 18:44 -0500, Mike McCarty wrote:
>> Les Mikesell wrote:
>>> Mike McCarty wrote:
>>>
>>
>> [Les wrote about "older" compilers]
>>
>>>> Older? ANSI C has existed since 1989. I guess one could characterize 19 years
>>>> as "older". :-)
>>>
>>>
>>> C was old before ANSI came along.  Maybe we could revive the discussion
>>> of why "abcd"[2] must evaluate to 'c'.
>>
>> I wasn't explicit enough, I guess. I wouldn't characterize
>> 19 years as "older", but rather as "antique" or "ancient"
>> in this context. I was objecting to using too weak a word,
>> not too strong a word :-)
>>
> I'm sorry, I don't think of myself as antique or ancient.  You assume
> that all C compilers are ANSI compliant.  They are not.  K&R was around
> a long time before the standards committee got involved.  And although
> the standard may have been generated in 1989, Microsoft didn't implement
> the ANSI standard for several years after that, and then failed some of
> the standard tests.  SUN's compiler was not ANSI compliant until about
> 1994 or so.  There are many compilers out there, and most are probably
> not fully compliant either.  AND I do know that the variables are not
> initialized in many compilers.  I have fixed code for many, many people.

If static variables are not initialized, those compilers are broken.  K&R 
would tell you so even for pre-ANSI/ISO compilers.

> I know why their programs failed.  I also know that C uses a pushdown
                                                      ^some particular
                                                       implementations of
> stack for variables in subroutines.  You can check it out with a very
> simple program using pointers:
>
>    #include <stdio.h>
>
>    int i,j,k;              /* globals: static storage duration */
>
>    int main(void)
>    {
>        int mi,mj,mk;       /* locals: automatic storage duration */
>        int *x;
>        mi=4;mj=5;mk=6;
>        x=&mk;
>        printf ("%d  %d  %d\n",*x++,*x++,*x++);
>        x=&i;
>        printf ("%d  %d  %d\n",*x++,*x++,*x++);
>        i=1;j=2;k=3;
>        printf ("%d  %d  %d\n",*x++,*x++,*x++);
>        return 0;
>    }
>
> Just an exercise, you understand.  Compile and run this with several C
> packages, or, if the package you choose supports it, have it compile as K&R
> and try it.

Of course, several constructs here are undefined, so there is no such 
thing as "correct" or "incorrect" behavior.

After correcting obvious typos and adding #include <stdio.h> so it would 
compile, I got (using gcc-4.1.1-51.fc6 with no options):

     $ ./a.out
     5  4  6
     0  0  0
     0  0  0

OOPS, I forgot to reset the x pointer between the last two print statements. This bit of code is intended to show that globals are on a heap and locals are on a stack.

Was that what you were expecting?


>
> I cannot vouch for every compiler, only Microsoft, Sun, and Instant C
> off the top of my head.  I have used a few other packages as well.  But
> any really good programmer NEVER relies on system initialization.  It is
> destined to fail you at bad times.

How much effort are you willing to expend to defend against potentially 
buggy compilers (as opposed to undefined or implementation-defined 
behaviors)?  The Intel fdiv bug would seem to prove that you should NEVER 
rely on arithmetic instructions to provide the correct answer.  There's an 
economic tradeoff between protecting yourself from all conceivable errors 
and actually getting work done.

There is a difference between implementation differences and hardware errors, which is what the FDIV bug was. IIRC, Intel had
a bug in their silicon compiler that caused it.

>                                     One case is as has been pointed out
> here, that NULL is sometimes 0, sometimes 0x80000000, and sometimes
> 0xffffffff.  Even NULL as a char may be 0xFF, 0xFF00, or 0x8000 depending
> on the implementation.  But strings always end in a character NULL or
> 0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers.
> They may do otherwise on some others.  It can byte (;-) you if you are
> not careful.

In your source code, NULL is *always* written 0 (or sometimes (void *) 0 
to indicate that it's intended to stand for a null pointer value, not a 
NUL character value).  The string terminator character is *always* written 
'\0'.  The machine's representation of that value is immaterial.  If you 
type-pun to try to look at the actual machine's representation, your 
program's behavior is undefined and you deserve what you get.  It's the 
compiler's responsibility to ensure that things work as expected, no 
matter what the machine's representation is.  (For example, '\0' == 0 must 
return 1.)

'\0' is an octal escape for the value 0, so of course the two compare equal.

>
>    And since that is so, how are those variables initialized?  And to
> what value?  What is a pointer set to when it is initialized?  Hint: on
> Cyber the supposed default for assigned pointers used to be the address
> of the pointer.  Again, system dependencies may get you.

Pre-ANSI/ISO compilers might have initialized static memory to 
all-bits-zero even when that was not the correct representation of the 
default for the type being initialized.  ANSI/ISO compilers are not 
allowed to do that.  The required default initializations are well 
defined.  (This is the sort of thing that motivates the creation of 
standards in the first place.)

>
>    And those systems that used the first location to store the return
> address are not re-entrant, without other supporting code in the
> background.  I think I used one of those once as well.

There's no requirement for re-entrancy in K&R or ANSI/ISO.  In fact 
several standard library routines are known to not be re-entrant.

This is true, but whether code is non-re-entrant because of design constraints or because of hardware constraints makes a difference on modern multithreaded systems, where the same executable memory may be used by several threads of the program (if the hardware allows that).

>
>    PS.  A stack doesn't necessarily mean a processor call and return
> stack.  It is any memory-addressing mechanism where the data is stored at
> the current location, then the pointer incremented (or decremented
> depending on the architecture).

But usually in the context of discussions about compiler architectures, 
call stacks are exactly what is meant.

I am not sure that is true, because in some implementations the data heap and the variable stack are in the same segment of memory, while the processor's runtime stack is somewhere else. For high-security systems this should be a requirement: it closes an obvious route for inserting malicious code, namely planting it through variable initialization and then manipulating the stack. I say "should be" because the idea has been tossed around from time to time, but I am not sure it has ever been formalized.

One system I worked on looked like this:
    init jump
    heap
    variable stack (push down)
    program entrance
    program
    local libraries
    relocation table
    symbol table (if not removed)
    machine stack

Unfortunately, I no longer remember which system that was. I remember only that some standard libraries of the time would not run on it, because they manipulated the stack.

Regards,
Les H