Re: OT: Requesting C advice

Sorry, readers, this is getting rather long and still pretty OT. Les, if you want to continue, perhaps we should take it offline?

Comments interspersed throughout.

On Sun, 3 Jun 2007, Les wrote:

On Fri, 2007-06-01 at 19:02 -0400, Matthew Saltzman wrote:
On Fri, 1 Jun 2007, Les wrote:

On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:

I know why their programs failed.  I also know that C uses a pushdown
                                                      ^some particular
                                                       implementations of
stack for variables in subroutines.  You can check it out with a very
simple program using pointers:

   #include <stdio.h>    /* needed for printf() */
   #include <stdlib.h>   /* original read <sttlib.h> */

   int i,j,k;

   int main(void)
   {
       int mi,mj,mk;
       int *x;
       mi=4;mj=5;mk=6;
       x=&mk;
       printf ("%d  %d  %d\n",*x++,*x++,*x++);
       x=&i;
       printf ("%d  %d  %d\n",*x++,*x++,*x++);
       i=1;j=2;k=3;     /* original had "i-1" */
       printf ("%d  %d  %d\n",*x++,*x++,*x++);
       return 0;
   }

Just an exercise, you understand.  Compile and run this with several C
packages, or, if the package you choose supports it, have it compile K&R C
and try it.

Of course, several constructs here are undefined, so there is no such
thing as "correct" or "incorrect" behavior.

After correcting obvious typos and adding #include <stdio.h> so it would
compile, I got (using gcc-4.1.1-51.fc6 with no options):

     $ ./a.out
     5  4  6
     0  0  0
     0  0  0

OOPS, forgot to reset the X pointer between the last two print
statements.  This bit of code is intended to show that globals are on a
heap and locals are on a stack.

Fixed that.  Now I get:

$ ./a.out
5  4  6
0  0  0
0  2  1

But I confess, I don't see how this code proves your point.  It does
demonstrate that globals are initialized by default, though.

Actually, it doesn't.  And this is the problem.  Many people assume that

Note I said "demonstrate", not "prove". For a math teacher, there's an important distinction 8^).

because they obtained 0 one time, that the value was set in memory by
some behind the scenes action of the compiler.  In fact the memory could
have been set by any of a number of actions.  Some memory chips start
with all data zero'ed (at the output, at the physical layer the
construction is designed to minimize current drain and transitions, but
that is another topic entirely.)  In that case, if power had been off
all memory not explicitly set would be zero by default.  Another
situation is when a memory checker runs, and leaves memory in a zero
state (most do by design).  Thus if the compiler doesn't initialize
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
memory, and the memory where the code is placed has not been used in a
  ^^^^^^

But this is the key: In the absence of an explicit initializer, an ISO-compliant compiler *must* generate code to properly initialize static memory (not automatic or dynamic memory) just as if the default initializer had been provided explicitly.

Proper initialization means that floats and doubles must be initialized to 0.0 and pointers must be initialized to the null pointer value, even if those bit patterns differ from all-bits-zero. (calloc() must initialize its memory to all-bits-zero.)

If you don't believe me, how about the Usenet News comp.lang.c FAQ?
See http://c-faq.com/decl/index.html for a general discussion of allocation and initialization, but pay particular attention to http://c-faq.com/decl/initval.html:
-----------------------------------
comp.lang.c FAQ list  Question 1.30

Q: What am I allowed to assume about the initial values of variables and arrays which are not explicitly initialized? If global variables start out as ``zero'', is that good enough for null pointers and floating-point zeroes?

A: Uninitialized variables with static duration (that is, those declared outside of functions, and those declared with the storage class static), are guaranteed to start out as zero, just as if the programmer had typed ``= 0'' or ``= {0}''. Therefore, such variables are implicitly initialized to the null pointer (of the correct type; see also section 5) if they are pointers, and to 0.0 if they are floating-point. [1]

Variables with automatic duration (i.e. local variables without the static storage class) start out containing garbage, unless they are explicitly initialized. (Nothing useful can be predicted about the garbage.) If they do have initializers, they are initialized each time the function is called (or, for variables local to inner blocks, each time the block is entered at the top[2] ).

These rules do apply to arrays and structures (termed aggregates); arrays and structures are considered ``variables'' as far as initialization is concerned. When an automatic array or structure has a partial initializer, the remainder is initialized to 0, just as for statics. [3] See also question 1.31.

Finally, dynamically-allocated memory obtained with malloc and realloc is likely to contain garbage, and must be initialized by the calling program, as appropriate. Memory obtained with calloc is all-bits-0, but this is not necessarily useful for pointer or floating-point values (see question 7.31, and section 5).

References: K&R1 Sec. 4.9 pp. 82-4
K&R2 Sec. 4.9 pp. 85-86
ISO Sec. 6.5.7, Sec. 7.10.3.1, Sec. 7.10.5.3
H&S Sec. 4.2.8 pp. 72-3, Sec. 4.6 pp. 92-3, Sec. 4.6.2 pp. 94-5, Sec. 4.6.3 p. 96, Sec. 16.1 p. 386

[1] This requirement means that compilers and linkers on machines which use nonzero internal representations for null pointers or floating-point zeroes cannot necessarily make use of uninitialized, 0-filled memory, but must emit explicit initializers for these values (rather as if the programmer had).

[2] Initializers are not effective if you jump into the middle of a block, either with a goto or a switch. Initializers are therefore never effective on variables declared in the main block of a switch statement.

[3] Early printings of K&R2 incorrectly stated that partially-initialized automatic aggregates were filled out with garbage.
-----------------------------------

prior run, the variable space will be zero.  But if the program is
deleted, and the memory filled with a nonzero pattern, and the code
reloaded and compiled, the result may be much different, and can cause
the program to crash.  When the program is saved to disk as an
executable, the memory pattern that is saved is the last state of the
code, whatever that was, and depending on how the code development
system saves the code, the variables may or may not be set to zero at
save time.  At load time, the memory will be initialized according to
the data in the executable file.

   So, while the compiler may initialize the variables, there are other
issues that can impact the true state at run time, and therefore default
state should not be relied on as the condition.

Yes, I know all this. I've been programming since the 1960s and writing C since the 1980s.

                                                 After all, you create a
variable to store information, don't you?  Why would you not initialize
it?

As I said, even if you are guaranteed that initialization will take place, it can't hurt and might help readability to do it explicitly anyway.

     Anyway, while this has been a good discussion, I hope that you have
begun to realize that all is not just in the compiler, but in the
implementation, in the memory of the system, and in the methods of
implementing and running code.

Sure.  But in this case, the compiler's guarantee trumps all that.

                                And by the way, Matthew, this is in no
way criticizing you.  I have heard of you before, and will probably hear
great things from you in the future.

   Good luck, and good fortune.

And the same to you, sir.



Was that what you were expecting?

But you still haven't answered this question, nor explained how your code demonstrates the difference between "the heap" and "the stack".




I cannot vouch for every compiler, only Microsoft, Sun, and Instant C
off the top of my head.  I have used a few other packages as well.  But
any really good programmer NEVER relies on system initialization.  It is
destined to fail you at bad times.

How much effort are you willing to expend to defend against potentially
buggy compilers (as opposed to undefined or implementation-defined
behaviors)?  The Intel fdiv bug would seem to prove that you should NEVER
rely on arithmetic instructions to provide the correct answer.  There's an
economic tradeoff between protecting yourself from all conceivable errors
and actually getting work done.


There is a difference between implementation differences and hardware
errors, which was the microsoft error.  They had
a bug in their silicon compiler that caused that IIRC.

I misspoke here, and said Microsoft, when I meant Intel.
I could just as easily reference some other obscure compiler bug or
implementation-defined behavior and make the same point.  The thing about
a standard is that there are clear requirements about what is
implementation-defined and what is not.  Static initialization in ISO C is
not one of those implementation-defined things.

I will concede that explicit initializations--even to default
values--might be a useful self-documentation tool.


                                    One case is as has been pointed out
here, that NULL is sometimes 0, sometimes 0x80000000, and sometimes
0xffffffff.  Even NULL as a char may be 0xFF, 0xFF00, or 0x8000 depending
on the implementation.  But strings always end in a character NULL or
0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers.
They may do otherwise on some others.  It can byte (;-) you if you are
not careful.

In your source code, NULL is *always* written 0 (or sometimes (void *) 0
to indicate that it's intended to stand for a null pointer value, not a
NUL character value).  The string terminator character is *always* written
'\0'.  The machine's representation of that value is immaterial.  If you
type-pun to try to look at the actual machine's representation, your
program's behavior is undefined and you deserve what you get.  It's the
compiler's responsibility to ensure that things work as expected, no
matter what the machine's representation is.  (For example, '\0' == 0 must
return 1.)


'\0' is an escape forcing the 0, so of course this will be equal.

OK.  But the main point is that it doesn't matter what bit pattern
represents a null pointer.  Your source code will always use the value 0
to represent it.  For example,

 	int *p;
 	/* ...code that sets p... */
 	if ( p == 0 ) /* *not*  if ( p == 0x80000000 ) or
 				if ( p == 0xffffffff ) */
 	{ /* ...handle null pointer value... */ }

Actually this is one of the problem areas.  0 is an explicit constant, and is
actually zero.  Only if using C++ and equality is overloaded for pointers
will this work.  Otherwise the actual contents of p will be used to
compare to 0 and that will fail in some systems.  Some compilers may
deal with it as you expect, but I have not used one that did.

No, I may have been mistaken about ints and chars, but in a pointer context, 0 means a null pointer, whatever bit pattern represents it, and an ISO-compliant compiler *must* do the right thing. Again, the comp.lang.c FAQ covers null pointers in great detail (http://c-faq.com/null/index.html), but in particular, there's this (http://c-faq.com/null/machnon0.html):

----------------------------------
comp.lang.c FAQ list  Question 5.5

Q: How should NULL be defined on a machine which uses a nonzero bit pattern as the internal representation of a null pointer?

A: The same as on any other machine: as 0 (or some version of 0; see question 5.4).

Whenever a programmer requests a null pointer, either by writing ``0'' or ``NULL'', it is the compiler's responsibility to generate whatever bit pattern the machine uses for that null pointer. (Again, the compiler can tell that an unadorned 0 requests a null pointer when the 0 is in a pointer context; see question 5.2.) Therefore, #defining NULL as 0 on a machine for which internal null pointers are nonzero is as valid as on any other: the compiler must always be able to generate the machine's correct null pointers in response to unadorned 0's seen in pointer contexts. A constant 0 is a null pointer constant; NULL is just a convenient name for it (see also question 5.13).

(Section 4.1.5 of the C Standard states that NULL ``expands to an implementation-defined null pointer constant,'' which means that the implementation gets to choose which form of 0 to use and whether to use a void * cast; see questions 5.6 and 5.7. ``Implementation-defined'' here does not mean that NULL might be #defined to match some implementation-specific nonzero internal null pointer value.)

See also questions 5.2, 5.10 and 5.17.

References: ISO Sec. 7.1.6
Rationale Sec. 4.1.5
----------------------------------





   And since that is so, how are those variables initialized? and to
what value?  What is a pointer set to when it is initialized?  Hint, on
Cyber the supposed default for assigned pointers used to be the address
of the pointer.  Again, system dependencies may get you.

Pre-ANSI/ISO compilers might have initialized static memory to
all-bits-zero even when that was not the correct representation of the
default for the type being initialized.  ANSI/ISO compilers are not
allowed to do that.  The required default initializations are well
defined.  (This is the sort of thing that motivates the creation of
standards in the first place.)


   And those systems that used the first location to store the return
address are not re-entrant, without other supporting code in the
background.  I think I used one of those once as well.

There's no requirement for re-entrancy in K&R or ANSI/ISO.  In fact
several standard library routines are known to not be re-entrant.


This is true, but knowing that the base code is not reentrant due to
design constraints or due to hardware constraints makes the difference
on modern multithreaded systems, where the same executable memory can be
used for the program (if the hardware allows that).

Sure, you need to know that you can compile re-entrant code if you need
it.



   PS.  A stack doesn't necessarily mean a processor call and return
stack.  It is any mechanism of memory address where the data is applied
to the current location, then the pointer incremented (or decremented
depending on the architecture).

But usually in the context of discussions about compiler architectures,
call stacks are exactly what is meant.


I am not sure that is true, because in some implementations, the data
heap and stack are in the same segment of memory, while the runtime
stack for the processor is somewhere else.  For high-security systems,
this should be a requirement.  It prevents obvious means of
inserting malicious code through variable initialization, and then stack
manipulation.  I say should be, because it has been tossed around from
time to time, but I am unsure if it has ever been formalized.

One system I worked on looked like this:
   init jump
   heap
   variable stack (push down)
   program entrance
   program
   local libraries
   relocation table
   symbol table (if not removed)
   machine stack

   Unfortunately I no longer remember which system that was.  Just the
fact that some standard libraries at that time would not run on it
because they did manipulate the stack.

Regards,
Les H


I have said all that I know.  I hope it helps you all in the future.  C
is wonderful, compact, close to the machine, and a good language,
capable of expressing many, many complex concepts.  I am sure there are
other languages out there, and I have used a few, but I love C.

Hear, hear! But like any complex language, it is not without its subtleties.


Regards,
Les H




--
		Matthew Saltzman

Clemson University Math Sciences
mjs AT clemson DOT edu
http://www.math.clemson.edu/~mjs

