Why does the following code compile?
class Demo
{
public:
Demo() : a(this->a){}
int& a;
};
int main()
{
Demo d;
}
In this case, a is a reference to an integer. However, when I construct a Demo, I initialize the reference a with itself, i.e. with a reference that has not yet been bound to anything. Why does this compile?
This still compiles even if, instead of int, I use a reference to a class with a private default constructor. Why is this allowed?
Why does this compile?
Because it is syntactically valid.
C++ is not a safe programming language. There are several features that make it easy to do the right thing, but preventing someone from doing the wrong thing is not a priority. If you are determined to do something foolish, nothing will stop you. As long as you follow the syntax, you can try to do whatever you want, no matter how ludicrous the semantics. Keep that in mind: compiling is about syntax, not semantics.*
That being said, the people who write compilers are not without pity. They know the common mistakes (probably from personal experience), and they recognize that your compiler is in a good position to spot certain kinds of semantic mistakes. Hence, most compilers will emit warnings when you do certain things (not all things) that do not make sense. That is why you should always enable compiler warnings.
Warnings do not catch all logical errors, but for the ones they do catch (such as warning: 'Demo::a' is initialized with itself and warning: '*this.Demo::a' is used uninitialized), you've saved yourself a ton of debugging time.
* OK, there are some semantics involved in compiling, such as giving a meaning to identifiers. When I say compiling is not about semantics, I am referring to a higher level of semantics, such as the intended behavior.
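As an illustration only (this rewrite is mine, not part of the question), here is one way the class could be written so the reference member is bound to a real object; the parameter name target is invented:
class Demo
{
public:
    explicit Demo(int& target) : a(target) {}   // bind the reference to an object the caller owns
    int& a;
};

int main()
{
    int value = 42;
    Demo d(value);   // d.a now refers to value, a fully initialized int
}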
Why does this compile?
Because there is no rule that would make the program ill-formed.
Why is this allowed?
To be clear, the program is well-formed, so it compiles. But the behaviour of the program is undefined, so from that perspective, the premise of your question is flawed. This isn't allowed.
It isn't possible to prove all cases where an indeterminate value is used, and it isn't easy to specify which of the easy cases should be detected by the compiler, and which would be considered to be too difficult. As such, the standard doesn't attempt to specify it, and leaves it up to the compiler to warn when it is able to detect it. For what it's worth, GCC is able to detect it in this case for example.
C++ allows you to pass a reference to uninitialized data because you might want to use the called function as the initializer.
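A minimal sketch of that pattern (my own example, not from the answer): the caller declares the storage, and the callee writes to it before anything reads it.
// The callee initializes storage that the caller has only declared.
void init(int& out)
{
    out = 42;    // write before any read, so no indeterminate value is ever used
}

int main()
{
    int x;       // deliberately left uninitialized
    init(x);     // passing a reference to it is fine: the callee acts as the initializer
    return x == 42 ? 0 : 1;
}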
Related
Looking at What's the point of g++ -Wreorder, I fully understand what -Wreorder is useful for. But it doesn't seem unreasonable that the compiler would be able to detect whether such a reordering is harmless:
struct Harmless {
Harmless() : b(1), a(2) {}
int a;
int b;
};
or broken:
struct Broken {
Broken() : b(1), a(b + 1) {}
int a;
int b;
};
My question is then: why doesn't GCC detect (and warn about) the actual use of an undefined member in an initializer instead of this blanket warning on the ordering of initializers?
As far as I understand, -Wuninitialized only applies to automatic variables, and indeed it does not detect the error above.
EDIT:
A stab at formalizing the behavior I want:
Given initializer list : a1(expr1), a2(expr2), a3(expr3) ... an(exprn), I want a warning if (and only if) the execution of any of the initializers, in the order they will be executed, would reference an uninitialized value. I.e. in the same manner as -Wuninitialized warns about use of uninitialized automatic variables.
Some additional background: I work in a mostly Windows-based company, where basically everybody but me uses Visual Studio. VS does not have this warning, so nobody cares about having the correct order (and they have no way of knowing when they screw up the ordering except manual inspection), which leaves me with endless warnings that I have to fix every time someone breaks something. I would like to be informed only about the cases that are really problematic and ignore the benign ones. So my question is maybe better phrased as: is it technically feasible to implement a warning/error like this? My gut feeling says it is, but the fact that it isn't already implemented makes me doubt it.
My speculation is that it's for the same reason we have -Wold-style-cast: safety erring on the side of being too conservative. All it takes is a moment's inattention to transform Harmless into CarelessMistake. Maybe this developer's in a hurry or has an older version of GCC or sees that it's "just a warning" and presses on.
This is basically true of many warnings. They are often spurious and require a little bit of restructuring to compile cleanly, but on some occasions they represent real problems. Every good programmer will prefer working through some false positives if that means they get fewer false negatives.
I would be surprised if there's a valid direct answer to the question. There's no technical reason I see that it couldn't be done. It's just . . . why bother trying to figure out if something questionable is actually okay? Programming is the human's job.
As a personal reason, I think initializing variables in the order you declare them often makes sense.
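For what it's worth, a minimal sketch (mine, not from the question) of how Broken can be repaired so that both the -Wreorder warning and the uninitialized read disappear:
struct Fixed {
    Fixed() : a(2), b(a + 1) {}   // initializers follow the declaration order: a first, then b
    int a;
    int b;
};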
The following code invokes undefined behaviour.
int& foo()
{
int bar = 1234;
return bar;
}
g++ issues a warning:
warning: reference to local variable ‘bar’ returned [-Wreturn-local-addr]
clang++ too:
warning: reference to stack memory associated with local variable 'bar' returned [-Wreturn-stack-address]
Why is this not a compile error (ignoring -Werror)?
Is there a case where returning a ref to a local var is valid?
EDIT: As pointed out, the spec mandates this be compilable. So, why does the spec not prohibit such code?
I would say that requiring this to make the program ill-formed (that is, make this a compilation error) would complicate the standard considerably for little benefit. You'd have to exactly spell out in the standard when such cases shall be diagnosed, and all compilers would have to implement them.
If you specify too little, it will not be too useful. And compilers probably already check for this to emit warnings, and real programmers compile with -Wall_you_can_give_me -Werror anyway.
If you specify too much, it will be difficult (or impossible) for compilers to implement the standard.
Consider this class (for which you only have the header and a library):
class Foo
{
int x;
public:
int& getInteger();
};
And this code:
int& bar()
{
Foo f;
return f.getInteger();
}
Now, should the standard be written to make this ill-formed or not? Probably not; what if Foo is implemented like this:
#include "Foo.h"
int global;
int& Foo::getInteger()
{
return global;
}
At the same time, it could be implemented like this:
#include "Foo.h"
int& Foo::getInteger()
{
return x;
}
Which of course would give you a dangling reference.
My point is that the compiler cannot really know whether returning a reference is OK or not, except for a few trivial cases (returning a reference to a function-scope automatic variable or parameter of non-reference type). I don't think it's worth it to complicate the standard for that. Especially as most compilers already warn about this as a quality-of-implementation matter.
Also, because you may want to get the current stack pointer (whatever that means on your particular implementation).
This function:
void* get_stack_pointer (void) { int x; return &x; }
is much more portable than this one:
void* get_stack_pointer (void) { register void* sp asm ("%esp"); return sp; }
(AFAIK, it is not undefined behavior if you don't dereference the resulting pointer.)
As to why you may want to get the stack pointer: well, there are cases where you have a valid reason to get it: for instance the conservative Boehm garbage collector needs to scan the stack (so wants the stack pointer and the stack bottom).
And if you returned a C++ reference whose address you then only take with the unary & operator, getting such an address is IIUC legal (it is IMHO the only licit operation you can perform on it).
Another reason to get the stack pointer would be to get a non-NULL pointer address (which you could e.g. hash) different of any heap, local or static data. However, you could use (void*)1 or (void*)-1 for that purpose.
So the compiler is right in only warning against this.
I guess that a C++ compiler should accept
#include <iostream>
int& get_sp_ref(void) { int x; return x; }
void show_sp(void) { std::cout << (&(get_sp_ref())) << std::endl; }
For the same reason C allows you to return a pointer to a memory block that's been freed.
It's valid according to the language specification. It's a horribly bad idea (and is nowhere close to being guaranteed to work) but it's still valid inasmuch as it's not forbidden.
If you're asking why the standard allows this, it's probably because, when references were introduced, that's the way they worked. Each iteration of the standard has certain guidelines to follow (such as minimising the possibility of "breaking changes", those that render existing well-formed programs invalid) and the standard is an agreement between user and implementer, with undoubtedly more implementers than users sitting on the committees :-)
It may be worth pushing that idea through as a potential change and seeing what ISO say but I suspect it would be considered one of those "breaking changes" and therefore very suspect.
To expand on the earlier answers, the ISO C++ standard does not capture the distinction between warnings and errors to begin with; it simply uses the term 'diagnostic' when referring to what a compiler must emit upon seeing an ill-formed program. Quoting N3337, 1.4, paragraphs 1 and 2:
The set of diagnosable rules consists of all syntactic and semantic rules in this International Standard except for those rules containing an explicit notation that “no diagnostic is required” or which are described as resulting in “undefined behavior.”
Although this International Standard states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs. Such requirements have the following meaning:
If a program contains no violations of the rules in this International Standard, a conforming implementation shall, within its resource limits, accept and correctly execute that program.
If a program contains a violation of any diagnosable rule or an occurrence of a construct described in this Standard as “conditionally-supported” when the implementation does not support that construct, a conforming implementation shall issue at least one diagnostic message.
If a program contains a violation of a rule for which no diagnostic is required, this International Standard places no requirement on implementations with respect to that program.
Something not mentioned by other answers yet is that this code is OK if the function is never called.
The compiler isn't required to diagnose whether a function might ever be called or not. For example you might set up a program which looks for counterexamples to Fermat's Last Theorem, and calls this function if it finds one. It would be a mistake for the compiler to reject such a program.
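A hedged sketch of that point, reusing foo from the question: the program below is well-formed and exhibits no undefined behaviour at run time, simply because foo is never called.
int& foo()
{
    int bar = 1234;
    return bar;    // dangerous only if somebody calls foo() and then uses the result
}

int main()
{
    return 0;      // foo() is never invoked, so nothing bad can happen at run time
}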
Returning a reference to a local variable is a bad idea; however, some people may write code that relies on a similar pattern, so the compiler should only warn about it rather than reject structurally valid code as erroneous.
Angew already posted a sample where the returned reference actually refers to a global. However, there is another (IMHO better) example.
Object& GetSmth()
{
Object* obj = new Object();
return *obj;
}
In this case the reference refers to a heap-allocated object, so it remains valid after the function returns, and the caller is responsible for deallocating the memory after use.
IMPORTANT NOTE: I don't encourage or recommend this coding style, because it is bad: it is usually hard to understand what is going on, and it leads to problems like memory leaks or crashes. It is just a sample that shows why this particular situation cannot be treated as an error.
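For completeness, a self-contained usage sketch of that pattern (again, not a style to imitate; Object here is just a placeholder type I made up):
struct Object {};

Object& GetSmth()
{
    Object* obj = new Object();
    return *obj;               // refers to a heap object, so it outlives the function
}

int main()
{
    Object& obj = GetSmth();
    // ... use obj ...
    delete &obj;               // the caller owns the heap object and must free it
}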
Is it possible in either gcc/g++ or MS C++ to set a flag which only allows defined behaviour? So that something like the code below gives me a warning, or preferably an error:
func(a++, a, ++a)
Undefined and unspecified behavior is designated so in the standard specifically because it could cause undue burden on the implementation to diagnose all examples of it (or it would be impossible to determine).
It's expected that the programmer take care to avoid those areas that are undefined.
For your stated example it should be fairly obvious to a programmer to just not write that code in the first place.
That being said, g++ -Wall will catch some bad code, such as a missing return in a non-void function, to give one example.
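A small illustration of that kind of diagnosis (my own example): with g++ -Wall, the function below triggers -Wreturn-type.
int answer(bool ready)
{
    if (ready)
        return 42;
    // warning: control reaches end of non-void function [-Wreturn-type]
}

int main()
{
    return answer(true);
}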
EDIT: #sehe also points out -Wsequence-point, which will catch this precise code construct. Note that there is no sequence point between the evaluation of the individual arguments (and the order in which they are evaluated is unspecified), which is exactly what makes the call undefined.
GNU C++ has the following
-Wsequence-point
Warn about code that may have undefined semantics because of violations of sequence point rules in the C and C++ standards.
This will correctly flag the invocation you showed
You may also want to look at these related options:
-Wstrict-overflow
-fstrict-aliasing
-fstrict-overflow
HTH
No. For example, consider the following:
int badfunc(int &a, int &b) {
return func(a++, b++);
}
This has undefined behavior if a and b have the same referent. In general the compiler cannot know what arguments will be passed to a function, so it can't reliably catch this case of undefined behavior. Therefore it can't catch all undefined behavior.
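For instance, here is a hedged sketch of a call site that triggers the problem; func here is a stand-in I made up, since the real one isn't shown.
int func(int x, int y) { return x + y; }   // stand-in for the real func

int badfunc(int& a, int& b)
{
    return func(a++, b++);
}

int main()
{
    int n = 0;
    // a and b both refer to n: the two increments have no sequencing guarantee
    // relative to each other, which is undefined behaviour before C++17.
    return badfunc(n, n);
}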
Compiler warnings serve to identify some instances of undefined behavior, but never all.
In theory you could write a C++ implementation that does vast numbers of checks at runtime to ensure that undefined behavior is always identified and dealt with in ways defined by that implementation. It still wouldn't tell you at compile time (see: halting problem), and in practice you'd probably be better off with C#, which was designed to make the necessary runtime checks reasonably efficient...
Even if you built that magical checking C++ implementation, it still might not tell you what you really want to know, which is whether your code is correct. Sometimes (hang on to your seats), it is implementation-defined whether or not behavior is undefined. For a simple example, tolower((char)-1); has defined behavior[*] if the char type is unsigned, but undefined behavior if the char type is signed.
So, unless your magical checking implementation makes all the same implementation choices as the "real" implementation that you want your code to run on, it won't tell you whether the code has defined behavior for the set of implementation choices made in the "real" implementation, only whether it has defined behavior for the implementation choices made in the magical checking implementation.
To know that your code is correct and portable, you need to know (for starters) that it produces no undefined behavior for any set of implementation choices. And, for that matter, for any input, not just the inputs used in your tests. You might think that this is a big deficiency in C++ compared to languages with no undefined behavior. Certainly it is inconvenient at times, and affects how you go about sandboxing programs for security. In practice, though, for you to consider your code correct you don't just need it to have defined behavior, you need the behavior to match the specification document. That's a much bigger problem, and in practice it isn't very much harder to write a bug in (say) Java or Python than it is in C++. I've written countless bugs in all three, and knowing that in Java or Python the behavior was defined but wrong didn't help me all that much.
[*] Well, the result is still implementation-defined, it depends on the execution character set, but the implementation has to return the correct result. If char is signed it's allowed to crash.
This gave me a good laugh. Sorry about that, didn't mean any offense; it's a good question.
There is no compiler on the planet that only allows 100% defined behavior. It's the undefined nature of things that makes it so hard. There are a lot of cases taken up in the standard, but they're often too vague to efficiently implement in a compiler.
I know Clang developers showed some interest in adding that functionality, but they haven't started as far as I know.
The only thing you can do now and in the near/far future is cranking up the warning level and strictness of your compiler. Sadly, even in recent versions, MSVC is a pain in that regard. On warning level 4 and up, it spits some stupid warnings that have nothing to do with code correctness, and you often have to jump through hoops to get them to go away.
GCC is better at that in my personal experience. I personally use these options, ensuring the strictest checks that I currently know of:
-std=c++0x -pedantic -Wextra -Weffc++ -Wmissing-include-dirs -Wstrict-aliasing
I of course ensure zero warnings; if you want to enforce even that, just add -Werror to the line above and every warning becomes a hard error. It's mostly the std and pedantic options that enforce Standard behavior; Wextra catches some off-chance semi-errors.
And of course, compile your code with different compilers if possible (and make sure they are correctly diagnosing the problem by asking here, where people know what the Standard says/means).
While I agree with Mark's answer, I just thought I should let you know...
#include <stdio.h>
int func(int a, int b, int c)
{
return a + b + c;
}
int main()
{
int a=0;
printf("%d\n", func(a++, a, ++a)); /* line 11 */
return 0;
}
When compiling the code above with gcc -Wall, I get the following warnings:
test.c:11: warning: operation on ‘a’ may be undefined
test.c:11: warning: operation on ‘a’ may be undefined
because of a++ and ++a, I suppose. So to some degree, it's been implemented. But obviously we can't expect all undefined behavior to be recognized by the compiler.
See, what I don't get is, why should programs like the following be legal?
int main()
{
static const int i = 0;
i < i > i;
}
I mean, surely, nobody actually has any current programs that have expressions with no side effects in them, since that would be very pointless, and it would make parsing & compiling the language much easier. So why not just disallow them? What benefit does the language actually gain from allowing this kind of syntax?
Another example being like this:
int main() {
static const int i = 0;
int x = (i);
}
What is the actual benefit of such statements?
And things like the most vexing parse. Does anybody, ever, declare functions in the middle of other functions? I mean, we got rid of things like implicit function declaration, and things like that. Why not just get rid of them for C++0x?
Probably because banning them would make the specification more complex, which would make compilers more complex.
it would make parsing & compiling the language much easier
I don't see how. Why is it easier to parse and compile i < i > i if you're required to issue a diagnostic, than it is to parse it if you're allowed to do anything you damn well please provided that the emitted code has no side-effects?
The Java compiler forbids unreachable code (as opposed to code with no effect), which is a mixed blessing for the programmer, and requires a little bit of extra work from the compiler than what a C++ compiler is actually required to do (basic block dependency analysis). Should C++ forbid unreachable code? Probably not. Even though C++ compilers certainly do enough optimization to identify unreachable basic blocks, in some cases they may do too much. Should if (foo) { ...} be an illegal unreachable block if foo is a false compile-time constant? What if it's not a compile-time constant, but the optimizer has figured out how to calculate the value, should it be legal and the compiler has to realise that the reason it's removing it is implementation-specific, so as not to give an error? More special cases.
nobody actually has any current programs that have expressions with no side effects in them
Loads. For example, when NDEBUG is defined, assert expands to a void expression with no effect. So that's yet more special cases needed in the compiler to permit some useless expressions, but not others.
The rationale, I believe, is that if it expanded to nothing then (a) compilers would end up throwing warnings for things like if (foo) assert(bar);, and (b) code like this would be legal in release but not in debug, which is just confusing:
assert(foo) // oops, forgot the semi-colon
foo.bar();
things like the most vexing parse
That's why it's called "vexing". It's a backward-compatibility issue really. If C++ now changed the meaning of those vexing parses, the meaning of existing code would change. Not much existing code, as you point out, but the C++ committee takes a fairly strong line on backward compatibility. If you want a language that changes every five minutes, use Perl ;-)
Anyway, it's too late now. Even if we had some great insight that the C++0x committee had missed, why some feature should be removed or incompatibly changed, they aren't going to break anything in the FCD unless the FCD is definitively in error.
Note that for all of your suggestions, any compiler could issue a warning for them (actually, I don't understand what your problem is with the second example, but certainly for useless expressions and for vexing parses in function bodies). If you're right that nobody does it deliberately, the warnings would cause no harm. If you're wrong that nobody does it deliberately, your stated case for removing them is incorrect. Warnings in popular compilers could pave the way for removing a feature, especially since the standard is authored largely by compiler-writers. The fact that we don't always get warnings for these things suggests to me that there's more to it than you think.
It's convenient sometimes to put useless statements into a program and compile it just to make sure they're legal - e.g. that the types involved can be resolved/matched etc.
Especially in generated code (macros as well as more elaborate external mechanisms, templates where Policies or types may introduce meaningless expansions in some no-op cases), having fewer special uncompilable cases to avoid keeps things simpler.
There may be some temporarily commented-out code that removes the meaningful usage of a variable, but it could be a pain to have to similarly identify and comment out all the variables that aren't used elsewhere.
While in your examples you show the variables being int immediately above the pointless usage, in practice the types may be much more complicated (e.g. with an operator<()), and whether the operations have side effects may even be unknown to the compiler (e.g. out-of-line functions), so any benefit is limited to the simpler cases.
C++ needs a good reason to break backward compatibility (and compatibility with the C it retains).
Why should doing nothing be treated as a special case? Furthermore, whilst the above cases are easy to spot, one could imagine far more complicated programs where it's not so easy to identify that there are no side effects.
As an iteration of the C++ standard, C++0x has to be backward compatible. Nobody can assert that the statements you wrote do not exist in some piece of critical software written/owned by, say, NASA or DoD.
Anyway, regarding your very first example, the parser cannot determine on its own that i is a static constant and that i < i > i is a useless expression -- e.g. if i were a template, i < i > i would be an "invalid variable declaration", not a "useless computation", and still not a parse error.
Maybe the operator was overloaded to have side effects, like cout << i; this is the reason why such statements cannot be removed now. On the other hand, C# forbids expressions other than assignments and method calls from being used as statements, and I believe this is a good thing as it makes the code clearer and more semantically correct. However, C# had the opportunity to forbid this from the very beginning, which C++ did not.
Expressions with no side effects can turn up more often than you think in templated and macro code. If you've ever declared std::vector<int>, you've instantiated template code with no side effects. std::vector must destroy all its elements when releasing itself, in case you stored a class type for T. This requires, at some point, a statement similar to ptr->~T(); to invoke the destructor. int has no destructor though, so the call has no side effects and will be removed entirely by the optimizer. It's also likely to sit inside a loop; then the entire loop has no side effects, so the entire loop is removed by the optimizer.
So if you disallowed expressions with no side effects, std::vector<int> wouldn't work, for one.
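A hedged sketch of the kind of cleanup code meant here (simplified and invented for illustration; it is not the actual implementation of any standard library):
template <typename T>
void destroy_range(T* first, T* last)
{
    for (; first != last; ++first)
        first->~T();   // for T = int this pseudo-destructor call is an expression with no effect
}

int main()
{
    int data[3] = {1, 2, 3};
    destroy_range(data, data + 3);   // for int the whole loop has no effect and is optimized away
}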
Another common case is assert(a == b). In release builds you want these asserts to disappear - but you can't simply define assert to expand to nothing: then if (x) assert(a == b); would leave an empty if body (which many compilers warn about), and any use of assert inside a larger expression would stop compiling. In this case assert(x) can instead be redefined as ((void)0), an expression that has no side effects. Now the surrounding code works correctly in release builds too - it just does nothing.
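A sketch of how such a macro is typically structured (simplified; MY_ASSERT is my own name, not the standard assert):
#include <cstdlib>

#ifdef NDEBUG
#define MY_ASSERT(cond) ((void)0)                          // expression with no side effects
#else
#define MY_ASSERT(cond) ((cond) ? (void)0 : std::abort())
#endif

int main()
{
    int x = 1;
    if (x)
        MY_ASSERT(x == 1);   // behaves the same whether NDEBUG is defined or not
    return 0;
}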
These are just two common cases. There are many more you probably don't know about. So, while expressions with no side effects seem redundant, they're actually functionally important. An optimizer will remove them entirely so there's no performance impact, too.
This is a C++ disaster; check out this code sample:
#include <iostream>
void func(const int* shouldnotChange)
{
int* canChange = (int*) shouldnotChange;
*canChange += 2;
return;
}
int main() {
int i = 5;
func(&i);
std::cout << i;
return 0;
}
The output was 7!
So, how can we be sure of the behavior of C++ functions, if a function is able to change a supposed-to-be-constant parameter!?
EDIT: I am not asking how I can make sure that my own code works as expected; rather, I am wondering how to trust that someone else's function (for instance some function in some DLL library) isn't going to change a parameter or exhibit some behavior it shouldn't...
Based on your edit, your question is "how can I trust 3rd party code not to be stupid?"
The short answer is "you can't." If you don't have access to the source, or don't have time to inspect it, you can only trust the author to have written sane code. In your example, the author of the function declaration specifically claims that the code will not change the contents of the pointer by using the const keyword. You can either trust that claim, or not. There are ways of testing this, as suggested by others, but if you need to test large amounts of code, it will be very labour intensive. Perhaps more so than reading the code.
If you are working on a team and you have a team member writing stuff like this, then you can talk to them about it and explain why it is bad.
By writing sane code.
If you write code you can't trust, then obviously your code won't be trustworthy.
Similar stupid tricks are possible in pretty much any language. In C#, you can modify the code at runtime through reflection. You can inspect and change private class members. How do you protect against that? You don't, you just have to write code that behaves as you expect.
Apart from that, write a unit test verifying that the function does not change its parameter.
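A hedged sketch of such a test (using a plain assert rather than any particular test framework, and reusing func from the question):
#include <cassert>

// func as defined in the question: it casts away const and modifies the value
void func(const int* shouldnotChange)
{
    int* canChange = (int*) shouldnotChange;
    *canChange += 2;
}

int main()
{
    int value = 5;
    func(&value);
    assert(value == 5);   // fires, exposing the misbehaving function
    return 0;
}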
The general rule in C++ is that the language is designed to protect you from Murphy, not Machiavelli. In other words, it's meant to keep a maintenance programmer from accidentally changing a variable marked as const, not to keep someone from deliberately changing it, which can be done in many ways.
A C-style cast means all bets are off. It's sort of like telling the compiler "Trust me, I know this looks bad, but I need to do this, so don't tell me I'm wrong." Also, if the object being pointed to had itself been declared const, casting away const-ness and then modifying it would be undefined behaviour: the compiler/runtime could do anything, including crashing your program. (In your example i is a non-const int, which is why the write happens to "work".)
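For reference, a sketch of the same cast written with const_cast, which at least makes the intent explicit (this is my rewrite of the question's function, not something from the answer):
void func(const int* shouldnotChange)
{
    int* canChange = const_cast<int*>(shouldnotChange);   // explicit, greppable, and still dangerous
    *canChange += 2;   // undefined behaviour if the pointed-to object was originally declared const
}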
The only thing I can suggest is to allocate the object that shouldnotChange points to from a memory page that is marked as read-only. This will force the OS/CPU to raise an error if the application attempts to write to that memory. I don't really recommend this as a general method of validating functions; it's just an idea you may find useful.
The simplest way to enforce this would be to just not pass a pointer:
void func(int shouldnotChange);
Now a copy will be made of the argument. The function can change the value all it likes, but the original value will not be modified.
If you can't change the function's interface then you could make a copy of the value before calling the function:
int i = 5;
int copy = i;
func(&copy);
Don't use C style casts in C++.
We have four named cast operators in C++ (listed here in rough order of danger; static_cast appears twice because it has both a safe and a dangerous use):
static_cast<> Safe (when used to convert between numeric types).
dynamic_cast<> Safe (but throws exceptions / returns NULL on failure).
const_cast<> Dangerous (when removing const).
static_cast<> Very dangerous (when used to cast between pointer types - not a good idea!).
reinterpret_cast<> Very dangerous. Use this only if you understand the consequences.
You can always tell the compiler that you know better than it does and the compiler will accept you at face value (the reason being that you don't want the compiler getting in the way when you actually do know better).
Power over the compiler is a two-edged sword. If you know what you are doing it is a powerful tool that will help, but if you get things wrong it will blow up in your face.
Unfortunately, the compiler has reasons for most things, so if you override its default behavior then you had better know what you are doing. Casting is one of those things. A lot of the time it is fine. But if you start casting away const(ness) then you had better know what you are doing.
(int*) is the casting syntax from C. C++ supports it fully, but it is not recommended.
In C++ the equivalent cast should've been written like this:
int* canChange = static_cast<int*>(shouldnotChange);
And indeed, if you wrote that, the compiler would NOT have allowed such a cast.
What you're doing is writing C code and expecting the C++ compiler to catch your mistake, which is sort of unfair if you think about it.