Why, if the c++ standard says that the syntax is incorrect, does g++ allow it? - c++

I just read a comment that said something along the lines of:
"You should never use void main() you should always use int main()."
Now I know the reasons for using int main() (so that you can check for success on return and whatnot) but I didn't know that using void main() was illegal. I did some investigating and the only reason I could find not to use void main() is because the "standard says so".
My question is:
Why, if the C++ standard says that main must return a value, does g++ allow programmers to use void main() as valid syntax? Shouldn't it return an error / warning because it goes against what the standard says?

That only means that a particular version of your compiler may allow it, but later versions (which are likely to be more standard conformant) may not. So better to write standard-conformant code from the beginning!

According to the standard, main is indeed required to return int. But many compilers allow a return type of void since in pre-standard C++ it was allowed, and for a long time much code was written with a return type of void.
It is also worth mentioning that C++ explicitly allows omission of the return statement in main:
int main() {
}
will return 0. But that is only allowed for main.

You can force the compiler to be standards compliant by using the following build flags:
-ansi -pedantic -Wall
If you are not writing cross-platform code then -std=c99 (for C code) might be a better choice. Not all compilers support that.

The GNU Project has a decent summary of their philosophy:
In most cases, following published standards is convenient for users—it means that their programs or scripts will work more portably. ...
But we do not follow either of these specifications rigidly, and there are specific points on which we decided not to follow them, so as to make the GNU system better for users.
For instance, Standard C says that nearly all extensions to C are prohibited. How silly! GCC implements many extensions, some of which were later adopted as part of the standard. If you want these constructs to give an error message as "required" by the standard, you must specify --pedantic, which was implemented only so that we can say "GCC is a 100% implementation of the standard," not because there is any reason to actually use it.
POSIX.2 specifies that df and du must output sizes by default in units of 512 bytes. What users want is units of 1k, so that is what we do by default. If you want the ridiculous behavior "required" by POSIX, you must set the environment variable POSIXLY_CORRECT (which was originally going to be named POSIX_ME_HARDER). ...
In particular, don’t reject a new feature, or remove an old one, merely because a standard says it is "forbidden" or "deprecated."
Sometimes, GCC has removed extensions when they caused confusion like this one. I believe this extension existed to allow old code with an incorrect main declaration to compile, not necessarily to encourage people writing void main(). Similar to the extension that allowed pre-POSIX function declarations. Besides, while int main(int argc, const char** argv) is the C-approved declaration for main, the C++ standard also sanctions int main(), and POSIX sanctions int main(int argc, const char** argv, const char** envp). There may well be other declarations that I haven't run into yet.

Related

What is the origin of void main?

Often times I see the infamous void main() around the forums and almost immediately a comment following the question telling the user to never use void main() (which I am in complete agreement with). But where is the origin of void main()?
Why am I still seeing newer people pick up the bad habit of having main return nothing when the proper way is to return an int?
I understand WHY this method is wrong as explained in this question and multitudes of others, but I don't know how this method of declaring main came about or even why it is still taught to some students.
Even Bjarne Stroustrup has written void main, in C++, so it's indeed a common anti-meme, and an old one, predating Java and other contemporary languages that support void main. Of course Bjarne has also written that void main has never been part of either C or C++. However, for this latter statement (in his FAQ), at least as of C99 it looks as if Bjarne is wrong, because the N869 draft of the C99 standard says in its §5.1.2.2.3/1 that
“If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0. If the return type is not compatible with int, the termination status returned to the host environment is unspecified.”
And earlier, in its §5.1.2.2.1/1 it states about the signature of main,
“ or in some other implementation-defined manner.”
A return type “not compatible with int” could, for example, be void.
So, while this is not a complete answer (I doubt that historical sources about this are available on the net), at least it goes some way towards correcting the assumptions of the question. It is not the case that void main is a complete abomination in C and C++. But in C++ it's invalid: it's a C thing that's not supported in a hosted C++ implementation.
I have been a victim of this problem, so I think I can tell you why this happens. During our C lectures the faculty have to start with a sample program (probably "Hello World"), and for that they have to use the main() function.
But since they don't want to confuse students, and don't want to get into the complexity of teaching return types and return statements at the very start of their C programming lessons, they use (and also ask us to use) void main() and tell us to assume this as the default type till we study functions and return types in detail.
Hence this leads us to develop the wrong habit of using void main() from the very first lecture of our C programming course.
Hope that explains why many programmers, especially the newer ones, pick up this bad practice.
Cheers,
Mayank
Personally I think it's the following: K&R C didn't require you to specify a return type and implicitly assumed it to be int, and at the same time the examples in K&R didn't use a return value.
For example the first code in K&R first edition is the following:
#include <stdio.h>
main()
{
printf("Hello World\n");
}
So it's no wonder that people reading this later (after a void type was added to the language as an extension by some compilers) assumed that main actually had a void return type. I would've done the same thing.
Actually K&R does say later:
In the interests of simplicity, we have omitted return statements from
our main functions up to this point, but we will include them
hereafter, as a reminder that programs should return status to their
environment.
So that's just another example of what happens when you write incorrect code and include a disclaimer later under the assumption that people will read everything before doing stupid things ;)
As one author amongst a number of others, Herbert Schildt wrote some popular but not necessarily high quality books which espoused the idea.
One egregious example is his The Annotated C Standard. He quotes the ISO/IEC 9899:1990 standard on left-hand pages and provides annotations on the right-hand pages. When he quotes section 5.1.2.2.1 Program Startup, it says:
The function called at program startup is named main. The implementation declares no prototype for this function. It can be defined with no parameters:
int main(void) { /* ... */ }
or with two parameters (…):
int main(int argc, char *argv[]) { /* ... */ }
This doesn't include the 'or in some other implementation-defined manner' clause that was added to C99.
Then, in the annotations, he says:
Interestingly, there is no prototype for main() declared by the compiler. You are therefore free to declare main() as required by your program. For example, here are three common methods of declaring main():
void main(void) /* no return value, no parameters */
int main(void) /* return a value, no parameters */
/* return a value and include command-line parameters */
int main(int argc, char *argv[])
The first variation is not allowed by the C90 standard, despite what he says, but innocent readers might be confused.
Note that section 5.1.2.2.3 Program termination says:
A return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument. If the main function executes a return that specifies no value, the termination status returned to the hosted environment is undefined.
Since you'd find that exit takes an int argument, it is clear from this that the return type of main should be int.
The commentary says:
In most implementations, the return value from main(), if there is one, is returned to the operating system. Remember, if you don't explicitly return a value from main() then the value passed to the operating system is, technically, undefined. Though most compilers will automatically return 0 when no other return value is specified (even when main() is declared as void), you should not rely on this fact because it is not guaranteed by the standard.
Some of this commentary is so much bovine excrement, a view I am not alone in holding. The only merit in the book is that it includes almost all of the C90 standard (there's one page missing from the description of fprintf; the same page got printed twice) for far less than the cost of the standard. It's been argued that the difference in price represents the loss of value from the commentary. See Lysator generally for some information on C, and Clive Feather's review of The Annotated C Standard.
Another of his books is C: The Complete Reference, which made it to at least the 4th Edition. The 3rd Edition used void main() extensively; this may have been cleaned up by the 4th Edition, but it's sad it took that many editions to get such a fundamental issue correct.
Embedded programs that run on bare metal, that is without an operating system, never return. On power up, the reset vector jumps indirectly (there is some memory initialization that happens first) to main and inside of main, there is an infinite while (1){} loop. Semantically, a return value for main doesn't make sense.
Possible reasons:
Java programmers used to writing public static void main(...).
A missing return statement could lead some to assume main doesn't return anything, although it implicitly returns 0.
In C you were able to write main() with no return type, and it would be int by default. Maybe some assume a missing return type is equivalent to a void.
Bad books / teachers?
From a C++ point-of-view, 3 sources of confusion exist:
Fundamentalist PC/desktop programmers who fanatically and blindly preach int main() without actually knowing the complete picture in the standard themselves. C and C++ have completely different rules for how main() should be declared in freestanding systems (when programming bare metal embedded systems or operating systems).
The C language, which historically has had different rules compared with C++. And in C, the rules for main() have changed over time.
Legacy compilers and coding standards from the dark ages, including programming teachers stuck in the 1980s.
I'll address each source of confusion in this answer.
The PC/desktop programmers are problematic since they assume that hosted systems are the only systems existing and therefore spread incorrect/incomplete propaganda about the correct form of main(), dogmatically stating that you must use int main(), incorrectly citing the standard while doing so, if at all.
Both the C and C++ standards have always listed two kinds of systems: freestanding and hosted.
In freestanding implementations, void main (void) has always been allowed in C. In C++, freestanding implementations are slightly different: a freestanding implementation may not name the entry function main() or it has to follow the stated forms that return int.
Not even Bjarne Stroustrup manages to cite the standards or explain this correctly/completely, so no wonder that the average programmer is confused! (He is citing the hosted environment sub-chapter and fails to cite all relevant parts of it).
This is all discussed in detail with references to the standard(s) here, Bjarne and others please read.
Regarding void main (void) in hosted systems, this originates way back, from somewhere in the dark ages before the ISO C standard, where everything was allowed.
I would suspect that the major culprit behind it is the Borland Turbo C compiler, which was already the market leader when ISO C was released in 1990. This compiler allowed void main (void).
And it should be noted that void main (void) was implicitly forbidden in C90 for hosted implementations; no implementation-defined forms were allowed. So Turbo C was never a strictly conforming implementation. Yet it is still used in schools (particularly in India), teaching every student incorrect programming standards from scratch.
Since C99, void main (void) and other forms became allowed in C, because of a strange sentence which was added: "or in some other implementation-defined manner". This is also discussed in the linked answer above, with references to the C99 rationale and other parts of the C standard that are assuming that a hosted system main() may not return int.
Therefore in C, void main (void) is (arguably) currently an allowed form for hosted implementations, given that the compiler documents what it does. But note that since this is implementation-defined behavior, it is the compiler that determines whether this form is allowed or not, not the programmer!
In C++, void main (void) is not an allowed form.

How CRT calls main , having different parameter

We can write main function in several ways,
int main()
int main(int argc,char *argv[])
int main(int argc,char *argv[],char * environment)
How does the run-time CRT function know which main should be called? Please note that I am not asking about whether Unicode is supported or not.
The accepted answer is incorrect, there's no special code in the CRT to recognize the kind of main() declaration.
It works because of the cdecl calling convention, which specifies that arguments are pushed on the stack from right to left and that the caller cleans up the stack after the call. So the CRT simply passes all arguments to main() and pops them again when main() returns. The only thing you need to do is specify the arguments in the right order in your main() function declaration. The argc parameter has to be first; it is the one on the top of the stack. argv has to be second, etcetera. Omitting an argument makes no difference, as long as you omit all the ones that follow as well.
This is also why the printf() function can work: it has a variable number of arguments, with one argument in a known position, the first one.
In general, the compiler/linker would need to recognise the particular form of main that you are using and then include code to adapt that from the system startup function to your C or C++ main function.
It is true that specific compilers on specific platforms could get away without doing this, using the methods that Hans describes in his answer. However, not all platforms use the stack to pass parameters, and it is possible to write conforming C and C++ implementations which have incompatible parameter lists. In such cases, the compiler/linker would need to determine which form of main to call.
Hmmm. It seems that perhaps the currently accepted answer, which indicates that the previously accepted answer is incorrect, is itself incorrect. The tags on this question indicate it applies to C++ as well as C, so I’ll stick to the C++ spec, not C99. Regardless of all other explanations or arguments, the primary answer to this question is that “main() is treated special in an implementation-defined way.” I believe that David's answer is technically more correct than Hans', but I'll explain it in more detail....
The main() function is a funny one, treated by the compiler & linker with behavior that matches no other function. Hans is correct that there is no special code in the CRT to recognize different signatures of main(), but his assertion that it “works because of the cdecl calling convention” applies only to specific platform(s), notably Visual Studio. The real reason that there’s no special code in the CRT to recognize different signatures of main() is that there’s no need to. And though it’s sort of splitting hairs, it’s the linker whose job it is to tie the startup code into main() at link time, it’s not the CRT’s job at startup time.
Much of how the main() function is treated is implementation-defined, as per the C++ spec (see Section 3.6, “Start and termination”). It’s likely that most implementations’ compilers treat main() implicitly with something akin to extern “C” linkage, leaving main() in a non-decorated state so that regardless of its function prototype, its linker symbol is the same. Alternatively, the linker for an implementation could be smart enough to scan through the symbol table looking for any whose decorated name resolves to some form of “[int|void] main(...)” (note that void as a return type is itself an implementation-specific thing, as the spec itself says that the return type of main() must be ‘int’). Once such a function is found in the available symbols, the linker could simply use that where the startup code refers to “main()”, so the exact symbol name doesn’t necessarily have to match anything in particular; it could even be wmain() or other, as long as either the linker knows what variations to look for, or the compiler endows all of the variations with the same symbol name.
Also key to note is that the spec says that main() may not be overloaded, so the linker shouldn’t have to “pick” between multiple user implementations of various forms of main(). If it finds more than one, that’s a duplicate symbol error (or other similar error) even if the argument lists don’t match. And though all implementations “shall” allow both
int main() { /* ... */ }
and
int main(int argc, char* argv[]) { /* ... */ }
they are also permitted to allow other argument lists, including the version you show that includes an environment string array pointer, and any other variation that makes sense in any given implementation.
As Hans indicates, the Visual Studio compiler’s cdecl calling convention (and calling conventions of many other compilers) provide a framework wherein a caller can set up the calling environment (i.e. the stack, or ABI-defined registers, or some combination of the two) in such a way that a variable number of arguments can be passed, and when the callee returns, the caller is responsible for cleanup (popping the used argument space off the stack, or in the case of registers, nothing needs done for cleanup). This setup lends itself neatly to the startup code passing more parameters than might be needed, and the user’s main() implementation is free to use or not use any of these arguments, as is the case with many platforms’ treatment of the various forms of main() you list in your question. However, this is not the only way a compiler+linker could accomplish this goal: Instead, the linker could choose between various versions of the startup code based on the definition of your main(). Doing so would allow a wide variety of main() argument lists that would otherwise be impossible with the cdecl caller-cleanup model. And since all of that is implementation-defined, it’s legal per the C++ spec, as long as the compiler+linker supports at least the two combinations shown above (int main() and int main(int, char**)).
The C 99 Standard (5.1.2.2.1 Program startup) says that an implementation enforces no prototype for the main() function, and that a program can define it as either of:
1) int main(void);
2) int main(int argc, char *argv[]);
or in a manner semantically equivalent to 2), e.g.
2') int main(int argc, char **argv);
or in other implementation defined ways. It does not mandate that the prototype:
3) int main(int argc, char *argv[],char * envp[]);
will have the intended behaviour - although that prototype must compile, because any prototype must compile. 3) is supported by GCC and Microsoft C among other compilers. (N.B. The questioner's 3rd prototype has char *envp rather than char *envp[], whether by accident or because he/she has some other compiler).
Both GCC and Microsoft C will compile main() with any prototype whatsoever, as they ought to. They parse the prototype that you actually specify and generate assembly language to consume the arguments, if any, in the correct manner. Thus for example they will each generate the expected behaviour for the program:
#include <stdio.h>
void main(double d, char c)
{
printf("%lf\n",d);
putchar(c);
}
if you could find a way of passing a double and a char directly to the program, not via an array of strings.
These observations can be verified by enabling the assembly language listings for experimental programs.
The question of how the compiler's standard CRT permits us to invoke the generated implementation of main() is distinct from the question of how main() may be defined to the compiler.
For both GCC and MS C, main() may be defined any way we like. In each case however the implementation's standard CRT, AFAIK, supports passing arguments to main() only as per 3). So 1) - 2') will also have the expected behavior by ignoring excess arguments, and we have no other options short of providing a non-standard runtime of our own.
Hans Passant's answer seems incidentally misleading in suggesting that argc tells the function how many subsequent arguments to consume in the same manner as the first argument to printf(). If argc is present at all, it only denotes the number of elements in the array passed as the second argument argv. It does not indicate how many arguments are passed to main(). Both GCC and MS C figure out what arguments are expected by parsing the prototype that you write - essentially what a compiler does with any function except those, like printf(), that are defined to take a variable number of arguments.
main() does not take a variable number of arguments. It takes the arguments you specify in your definition, and the standard CRTs of the usual compilers assume them to be (int, char *[], char *[]).
First, the main function is treated specifically in GCC (e.g. the main_identifier_node in file gcc/c-family/c-common.c of the source tree of GCC 4.7)
And the C11 and C++11 standards have specific wording and specification about it.
Then, the C calling ABI conventions are usually so that extra arguments don't harm much.
So you can think of it as if both the language specification and the compiler have specific things regarding "overloading" of main.
I even think that main might not be an ordinary function. I believe that some words in the standard -which I don't have right now- might be e.g. understood as forbidding taking its address or recursing on main.
In practice, main is called by some assembly code compiled into crt*.o files linked by gcc. Use gcc -v to understand more about what is happening.

Why is it bad to type void main() in C++ [duplicate]

Possible Duplicate:
Difference between void main and int main?
Why is
void main() {
//return void
}
bad?
The other day I typed this and someone pointed out to me that it is wrong to do so. I was so confused. I have been writing like this for a while now, I know it isn't C++ standard, but the compiler doesn't give out any warnings. Why is this wrong?
Just because the compiler you use does not error out on it doesn't mean other compilers won't. You know it's not standard, after all...
It is wrong exactly because it is not standard. One compiler might accept this, another might complain, and the pedantic believers will burn your ass on the stake anyways.
Because every program should indicate to other programs whether or not it completed successfully, or if there was some sort of error, and you can't do that if your main doesn't return anything.
Plus, the standard says that main should return an int.
It's wrong because the standard (at least C++03) states that main should return an int (for hosted environments, that is - freestanding environments like embedded systems can pretty well do whatever they want). From 3.6.1 Main function, paragraph 2:
An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a return type of type int, but otherwise its type is implementation-defined.
All implementations shall allow both of the following definitions of main: int main() { /* ... */ } and int main(int argc, char* argv[]) { /* ... */ }.
If you value portability at all (and you should), you should write code that conforms with the standard as much as practicable.
Undefined behaviour like:
x = x++ + --x;
may work (for whatever definition of "work" you have) under some circumstances as well, that doesn't make it a good idea :-)
It's nonstandard.
i.e. you're not writing "C++" (as it was conceived) when you write this. It might look like C++, but you're not following the rules, so you're not actually writing C++.
Also its result is undefined in most cases.
Unlike in other languages like Java or C#, where "bad" behavior causes errors, C++ allows anything to happen when an erroneous construct is used. So you can't depend on the compiler doing the "correct" thing, because it may do so one time, but not another.
In general, you want to avoid undefined behavior, so you shouldn't do this.

Why isn't main defined `main(std::vector<std::string> args)`?

This question is only half tongue-in-cheek. I sometimes dream of a world without naked arrays or c strings.
If you're using c++, shouldn't the preferred definition of main be something like:
int main(std::vector<std::string> args)
?
There are already multiple definitions of main to choose from, why isn't there a version that is in the spirit of C++?
Because C++ was designed to be (almost) backwards compatible with C code.
There are cases where C code will break in a C++ compiler, but they're fairly rare, and there's generally a good reason for why this breakage is required.
But changing the signature of main, while convenient for us, isn't necessary. For someone porting code from C, it'd just be another thing you had to change, for no particular gain.
Another reason is that std::vector is a library, not a part of the core language. And so, you'd have to #include <vector> in every C++ program.
And of course, in its early years, C++ didn't have a vector. So when the vector was added to the language, sure, they could have changed the signature of main, but then they'd break not just C code, but also every existing C++ program.
Is it worth it?
There's another reason besides compatibility with C. In C++, the standard library is meant to be entirely optional. There's nothing about the C++ language itself that forces you to use things from the standard library like std::string and std::vector, and that is entirely by design. In fact, it is by design that you should be able to use some parts of the standard library without having to use others (although this has led to some generally annoying things like std::ifstream and std::ofstream operating on const char* C-style strings rather than on std::string objects).
The theory is that you are supposed to be able to take the C++ language and use whatever library of objects, containers, etc, that you want with it, be it the standard library or some proprietary library (e.g. Qt, MFC), or something that you created yourself. Defining main to accept an argument composed of types defined in the standard library defeats this design goal.
Because it will force you to include <vector> and <string>.
A concern that keeps coming back to my mind is that once you allow complex types, you end up with the risk of exceptions being thrown in the type's constructor. And, as the language is currently designed, there's absolutely no way for such an exception to be caught. If it were decided that such exceptions should be caught, then that would require considerably more work, both for the committee and compiler writers, making it all somewhat more troublesome than simply saying "allow std::vector<std::string>".
There might be other issues as well. The whole "incompatible with runtimes" seems like something of a red herring to me, given that you can provide basically the same functionality now with macros. But something like this is rather more involved.
Like @jalf, I sometimes find myself writing
int main(int argc, char** argv) {
    std::vector<std::string> args(argv, argv + argc);
    // ... work with args from here on
}
But yes, like everyone said, main has to be C-compatible. I see it as an interface to the OS runtime, which (at least in the systems I use) is written in C.
Although some development environments encourage replacements such as wmain or _tmain. You could write your own compiler/IDE, which would encourage the use of int vmain(const std::vector<std::string>& args).
Because C++ was in existence long before the C++ standard was, and built heavily on C. And, like the original ANSI C standard, codifying existing practice was an important part of it.
There's no point in changing something that works, especially if it will break a whole lot of existing code.
Even ISO C, which has been through quite a few iterations, still takes backwards compatibility very seriously.
Basically, to remain compatible with C. If we were to give up that, main() would be moved into a class.
The multiple definitions of main() aren't really multiple definitions. There are three:
int main(void) (C99)
int main(int argc, char *argv[]) (C99)
int main(int argc, char *argv[], char *envp[]) (POSIX, I think)
But in POSIX, you only really get the third. The fact that you can call a function with extra arguments is down to the C calling convention.
You can't have extern "C" int main(std::vector<std::string> argv) unless the memory layout happens to be magically compatible in a portable way. The runtime will call main() with the wrong arguments and fail. There's no easy way around this.
Instead, provided main() wasn't extern "C", the runtime could try the various supported symbols in order until it found one. I imagine main() is extern "C" by default, and that you can't overload extern "C" functions.
For more fun, void main(void).
I'll try to explain in the best possible sentence.
C++ was designed to be backward compatible with C and std::vector was included in a library that only got included in C++.
Also, C++ and C programs were designed to run in shells or command lines (Windows, Linux, Mac), and the OS passes arguments to a program as an array of strings. How would an OS really translate vectors?
That's the main reason I can think of; feel free to criticize it.

Undefined/Unspecified/Implementation-defined behaviour warnings?

Can't a compiler warn (even better if it throws errors) when it notices a statement with undefined/unspecified/implementation-defined behaviour?
Probably, to flag a statement as an error, the standard would have to say so, but the compiler can at least warn the coder. Are there any technical difficulties in implementing such an option? Or is it simply impossible?
The reason I ask is that in statements like a[i] = ++i; surely the compiler knows that the code is referencing a variable and modifying it in the same statement, before a sequence point is reached.
It all boils down to
Quality of Implementation: the more accurate and useful the warnings are, the better it is. A compiler that always printed: "This program may or may not invoke undefined behavior" for every program, and then compiled it, is pretty useless, but is standards-compliant. Thankfully, no one writes compilers such as these :-).
Ease of determination: a compiler may not be easily able to determine undefined behavior, unspecified behavior, or implementation-defined behavior. Let's say you have a call stack that's 5 levels deep, with a const char * argument being passed from the top level to the last function in the chain, and the last function calls printf() with that const char * as the first argument. Do you want the compiler to check that const char * to make sure it is correct? (Assuming that the first function uses a literal string for that value.) How about when the const char * is read from a file, but you know that the file will always contain valid format specifiers for the values being printed?
Success rate: A compiler may be able to detect many constructs that may or may not be undefined, unspecified, etc.; but with a very low "success rate". In that case, the user doesn't want to see a lot of "may be undefined" messages—too many spurious warning messages may hide real warning messages, or prompt a user to compile at "low-warning" setting. That is bad.
For your particular example, gcc gives a warning about "may be undefined". It even warns for printf() format mismatch.
But if your hope is for a compiler that issues a diagnostic for all undefined/unspecified cases, it is not clear if that should/can work.
Let's say you have the following:
#include <stdio.h>
void add_to(int *a, int *b)
{
    *a = ++*b;
}
int main(void)
{
    int i = 42;
    add_to(&i, &i); /* bad */
    printf("%d\n", i);
    return 0;
}
Should the compiler warn you about the *a = ++*b; line?
As gf says in the comments, a compiler cannot check across translation units for undefined behavior. A classic example is declaring a variable as a pointer in one file and defining it as an array in another; see comp.lang.c FAQ 6.1.
Different compilers trap different conditions; most compilers have warning level options. GCC specifically has many, but -Wall -Werror will switch on most of the useful ones and turn them into errors. Use /W4 /WX for similar protection in VC++.
In GCC you could use -ansi -pedantic, but pedantic is what it says: it will flag many irrelevant issues and make it hard to use much third-party code.
Either way, because compilers catch different errors, or produce different messages for the same error, it is therefore useful to use multiple compilers, not necessarily for deployment, but as a poor-man's static analysis. Another approach for C code is to attempt to compile it as C++; the stronger type checking of C++ generally results in better C code; but be sure that if you want C compilation to work, don't use the C++ compilation exclusively; you are likely to introduce C++ specific features. Again this need not be deployed as C++, but just used as an additional check.
Finally, compilers are generally built with a balance of performance and error checking; to check exhaustively would take time that many developers would not accept. For this reason static analysers exist, for C there is the traditional lint, and the open-source splint. C++ is more complex to statically analyse, and tools are often very expensive. One of the best I have used is QAC++ from Programming Research. I am not aware of any free or open source C++ analysers of any repute.
gcc does warn in that situation (at least with -Wall):
#include <stdio.h>
int main(int argc, char *argv[])
{
    int a[5];
    int i = 0;
    a[i] = ++i;
    printf("%d\n", a[0]);
    return 0;
}
Gives:
$ make
gcc -Wall main.c -o app
main.c: In function ‘main’:
main.c:8: warning: operation on ‘i’ may be undefined
Edit:
A quick read of the man page shows that -Wsequence-point will do it, if you don't want -Wall for some reason.
Contrarily, compilers are not required to make any sort of diagnosis for undefined behavior:
§1.4.1:
The set of diagnosable rules consists of all syntactic and semantic rules in this International Standard except for those rules containing an explicit notation that “no diagnostic is required” or which are described as resulting in “undefined behavior.”
Emphasis mine. While I agree it may be nice, compilers have enough trouble trying to be standards compliant, let alone teaching the programmer how to program.
GCC warns as much as it can when you do something outside the norms of the language while still being syntactically correct, but beyond a certain point the programmer simply has to be informed enough.
You can call GCC with the -Wall flag to see more of that.
If your compiler won't warn of this, you can try a Linter.
Splint is free, but only checks C http://www.splint.org/
Gimpel Lint supports C++ but costs US $389 - maybe your company can be persuaded to buy a copy? http://www.gimpel.com/