I was reading the Wikipedia article about Undefined behaviour.
in C the use of any automatic variable before it has been initialized yields undefined behavior
However, this answer says, it is ok for character types. Is Wikipedia wrong here?
In C (I have no idea about C++), the type unsigned char is the only type that guarantees all possible bit representations have a specific value. There is no trap representation, no invalid values, no padding bits, no nothing (as can be in other types).
However it is a bad idea to make a program that relies on an unknown bit pattern for some reason.
Undefined behavior does not mean illegal or your program will crash here.
If you use an uninitialized variable (which happens if you allocate a primitive variable and don't assign a value to it, character types being one special kind of primitive types), the value is simply not determined. It can be anything. You might not bother the fact that it can be anything, because for example you could assign a value later, maybe only in some case.
However, when this becomes serious is when you read the value of the variable and make further decisions depending on this uninitialized value, for example in a condition:
int x;
if (x > 0) {
...
} else {
...
}
This will bring you here.
What the answer you linked says is that the following is perfectly fine:
int x;
if (someCase) {
x = ...
} else {
...
}
// later:
if (someCase) {
// read x
}
Related
I just stumbled upon a behavior which surprised me:
When writing:
int x = x+1;
in a C/C++-program (or even more complex expression involving the newly created variable x) my gcc/g++ compiles without errors. In the above case X is 1 afterwards. Note that there is no variable x in scope by a previous declaration.
So I'd like to know whether this is correct behaviour (and even might be useful in some situation) or just a parser pecularity with my gcc version or gcc in general.
BTW: The following does not work:
int x++;
With the expression:
int x = x + 1;
the variable x comes into existence at the = sign, which is why you can use it on the right hand side. By "comes into existence", I mean the variable exists but has yet to be assigned a value by the initialiser part.
However, unless you're initialising a variable with static storage duration (e.g., outside of a function), it's undefined behaviour since the x that comes into existence has an arbitrary value.
C++03 has this to say:
The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any) ...
Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value.
That second case there is pretty much what you have in your question.
It's not, it's undefined behavior.
You're using an uninitialized variable - x. You get 1 out of pure luck, anything could happen.
FYI, in MSVS I get a warning:
Warning 1 warning C4700: uninitialized local variable 'i' used
Also, at run-time, I get an exception, so it's definitely not safe.
int x = x + 1;
is basically
int x;
x = x + 1;
You have just been lucky to have 0 in x.
int x++;
however is not possible in C++ at a parser level! The previous could be parsed but was semantically wrong. The second one can't even be parsed.
In the first case you simply use the value already at the place in memory where the variable is. In your case this seems to be zero, but it can be anything. Using such a construct is a recipe for disaster and hard to find bugs in the future.
For the second case, it's simply a syntax error. You can not mix an expression with a variable declaration like that.
The variable is defined from the "=" on, so it is valid and when it is globally defined, it is initialized as zero, so in that case it is defined behavior, in others the variable was unintialized as as such still is unitialized (but increased with 1).
Remark that it still is not very sane or useful code.
3.3.1 Point of declaration 1 The point of declaration for a name is immediately after its complete declarator (clause 8) and before its
initializer (if any), except as noted below. [ Example: int x = 12; {
int x = x; } Here the second x is initialized with its own
(indeterminate) value. —end example ]
The above states so and should have indeterminate value, You are lucky with 1.
Your code has two possiblities:
If x is a local variable, you have undefined behavior, since you use the value of an object before its lifetime begins.
If x has static or thread-local lifetime, it is pre-initialized to zero, and your static initialization will reliably set it to 1. This is well-defined.
You may also wish to read my answer that covers related cases, including variables of other types, and variables which are written to before their initialization is completed
This is undefined behaviour and the compiler should at least to issue a warning. Try to compile using g++ -ansi .... The second example is just a syntax error.
I've checked myself, I wrote a program like this
int main() {
int i;
cout << i;
return 0;
}
I ran the program a few times and the result was same all the time, zero.
I've tried it in C and the result was the same.
But my textbook says
If you don’t initialize an variable that’s defined
inside a function, the variable value remain undefined.That means the element takes on
whatever value previously resided at that location in memory.
How's this possible when the program always assign a free memory location to a variable? How could it be something other than zero(I assume that default free memory value is zero)?
How's this possible when the program always assign a free memory
location to a variable? How could it be something rather than zero?
Let's take a look at an example practical implementation.
Let's say it utilizes stack to keep local variables.
void
foo(void)
{
int foo_var = 42;
}
void
bar(void)
{
int bar_var;
printf("%d\n", bar_var);
}
int
main(void)
{
bar();
foo();
bar();
}
Totally broken code above illustrates the point. After we call foo, certain location on the stack where foo_var was placed is set to 42. When we call bar, bar_var occupies that exact location. And indeed, executing the code results in printing 0 and 42, showing that bar_var value cannot be relied upon unless initialized.
Now it should be clear that local variable initialisation is required. But could main be an exception? Is there anything which could play with the stack and in result give us a non-zero value?
Yes. main is not the first function executed in your program. In fact there is tons of work required to set everything up. Any of this work could have used the stack and leave some non-zeros on it. Not only you can't expect the same value on different operating systems, it may very well suddenly change on the very system you are using right now. Interested parties can google for "dynamic linker".
Finally, the C language standard does not even have the term stack. Having a "place" for local variables is left to the compiler. It could even get random crap from whatever happened to be in a given register. It really can be totally anything. In fact, if an undefined behaviour is triggered, the compiler has the freedom to do whatever it feels like.
If you don’t initialize an variable that’s defined inside a function, the variable value remain undefined.
This bit is true.
That means the element takes on whatever value previously resided at that location in memory.
This bit is not.
Sometimes in practice this will occur, and you should realise that getting zero or not getting zero perfectly fits this theory, for any given run of your program.
In theory your compiler could actually assign a random initial value to that integer if it wanted, so trying to rationalise about this is entirely pointless. But let's continue as if we assumed that "the element takes on whatever value previously resided at that location in memory"…
How could it be something rather than zero(I assume that default free memory value is zero)?
Well, this is what happens when you assume. :)
This code invokes Undefined Behavior (UB), since the variable is used uninitialized.
The compiler should emit a warning, when a warning flag is used, like -Wall for example:
warning: 'i' is used uninitialized in this function [-Wuninitialized]
cout << i;
^
It just happens, that at this run, on your system, it had the value of 0. That means that the garbage value the variable was assigned to, happened to be 0, because the memory leftovers there suggested so.
However, notice that, kernel zeroes appear relatively often. That means that it is quite common that I can get zero as an output of my system, but it is not guaranteed and should not be taken into a promise.
Static variables and global variables are initialized to zero:
Global:
int a; //a is initialized as 0
void myfunc(){
static int x; // x is also initialized as 0
printf("%d", x);}
Where as non-static variables or auto variables i.e. the local variables are indeterminate (indeterminate usually means it can do anything. It can be zero, it can be the value that was in there, it can crash the program). Reading them prior to assigning a value results in undefined behavior.
void myfunc2(){
int x; // value of x is assigned by compiler it can even be 0
printf("%d", x);}
It mostly depends on compiler but in general most cases the value is pre assumed as 0 by the compliers
I have stumbled upon the following code structure and I'm wondering whether this is intentional or just poor understanding of casting mechanisms:
struct AbstractBase{
virtual void doThis(){
//Basic implementation here.
};
virtual void doThat()=0;
};
struct DerivedA: public AbstractBase{
virtual void doThis(){
//Other implementation here.
};
virtual void doThat(){
// some stuff here.
};
};
// More derived classes with similar structure....
// Dubious stuff happening here:
void strangeStuff(AbstractBase* pAbstract, int switcher){
AbstractBase* a = NULL;
switch(switcher){
case TYPE_DERIVED_A:
// why would someone use the abstract base pointer here???
a = dynamic_cast<DerivedA*>(pAbstract);
a->doThis();
a->doThat();
break;
// similar case statement with other derived classes...
}
}
// "main"
DerivedA* pDerivedA = new DerivedA;
strangeStuff( pDerivedA, TYPE_DERIVED_A );
My guess is, that this dynamic_cast statement is just the result of poor understanding and very bad programming style in general (the whole way the code works, just feels painful to me) and that it doesn't cause any change in behaviour for this specific use case.
However, since I'm not an expert on casting, I'd like to know whether there are any subtle side-effects that I'm not aware of.
Blockquote [C++11: 5.2.7/9]: The value of a failed cast to pointer type is the null pointer value of the required result type.
The dynamic_cast can return NULL if the type was wrong, making the following lines crash. Hence, this can be either 1. an attempt to make (logical) errors more explicit, or 2. some sort of in-code documentation.
So while it doesn't look like the best design, it is not exactly true that the cast has no effect whatsoever.
My guess would be that the coder screwed up.
A second guess would be that you skipped a check for a being null in your simplification.
A third, and highly unlikely possibility, is that the coder was exploiting undefined behavior to optimize.
With this code:
a = dynamic_cast<DerivedA*>(pAbstract);
a->doThis();
if a is not of type DerivedA* (or more derived), the a->doThis() is undefined behavior. And if it is of type DerivedA*, then the dynamic_cast does absolutely nothing (guaranteed).
A compiler can, in theory, optimize out any other possibility away, even if you did not change the type of a, and remain conforming behavior. Even if someone later checks if a is null, the execution of undefined behavior on the very next line means that the compiler is free not to set a to null on the dynamic_cast line.
I would doubt that a given compiler would do this, but I could be wrong.
There are compilers that detect certain paths cause undefined behavior (in the future), eliminate such possibilities from happening backwards in execution to the point where the undefined behavior would have been set in motion, and then "know" that the code in question cannot be in the state that would trigger undefined behavior. It can then use this knowledge to optimize the code in question.
Here is an example:
std::string foo( unsigned int x ) {
std::string r;
if (x == (unsigned)-1)) {
r = "hello ";
}
int y = x;
std::stringstream ss;
ss << y;
r += ss.str();
return r;
}
The compiler can see the y=x line above. If x would overflow an int, then the conversion y=x is undefined behavior. It happens regardless of the result of the first branch.
In short, if the first branch runs, undefined behavior would result. And undefined behavior can do anything, including time travel -- it can go back in time and prevent that branch from being taken.
So the
if (x == (unsigned)-1)) {
r = "hello ";
}
branch can be eliminated by the optimizer, legally in C++.
While the above is just a toy case, gcc does optimizations very much like this. There is a flag to tell it not do.
(unsigned -1 is defined behavior, but overflowing an int is not, in C++. In practice, this is because there are platforms in which signed int overflow causes problems, and C++ doesn't want to impose extra costs on them to make a conforming implementation.)
dynamic_cast will confirm that the dynamic type does match the type indicated by the switcher variable, making the code slightly less dangerous. However, it will give a null pointer in the case of a mismatch, and the code neglects to check for that.
But it seems more likely that the author didn't really understand the use of virtual functions (for uniform treatment of polymorphic types) and RTTI (for the rarer cases where you need to distinguish between types), and attempted to invent their own form of manual, error-prone type identification.
I want to parse UTF-8 in C++. When parsing a new character, I don't know in advance if it is an ASCII byte or the leader of a multibyte character, and also I don't know if my input string is sufficiently long to contain the remaining characters.
For simplicity, I'd like to name the four next bytes a, b, c and d, and because I am in C++, I want to do it using references.
Is it valid to define those references at the beginning of a function as long as I don't access them before I know that access is safe? Example:
void parse_utf8_character(const string s) {
for (size_t i = 0; i < s.size();) {
const char &a = s[i];
const char &b = s[i + 1];
const char &c = s[i + 2];
const char &d = s[i + 3];
if (is_ascii(a)) {
i += 1;
do_something_only_with(a);
} else if (is_twobyte_leader(a)) {
i += 2;
if (is_safe_to_access_b()) {
do_something_only_with(a, b);
}
}
...
}
}
The above example shows what I want to do semantically. It doesn't illustrate why I want to do this, but obviously real code will be more involved, so defining b,c,d only when I know that access is safe and I need them would be too verbose.
There are three takes on this:
Formally
well, who knows. I could find out for you by using quite some time on it, but then, so could you. Or any reader. And it's not like that's very practically useful.
EDIT: OK, looking it up, since you don't seem happy about me mentioning the formal without looking it up for you. Formally you're out of luck:
N3280 (C++11) §5.7/5 “If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.”
Two situations where this can produce undesired behavior: (1) computing an address beyond the end of a segment, and (2) computing an address beyond an array that the compiler knows the size of, with debug checks enabled.
Technically
you're probably OK as long as you avoid any lvalue-to-rvalue conversion, because if the references are implemented as pointers, then it's as safe as pointers, and if the compiler chooses to implement them as aliases, well, that's also ok.
Economically
relying needlessly on a subtlety wastes your time, and then also the time of others dealing with the code. So, not a good idea. Instead, declare the names when it's guaranteed that what they refer to, exists.
Before going into the legality of references to unaccessible memory, you have another problem in your code. Your call to s[i+x] might call string::operator[] with a parameter bigger then s.size(). The C++11 standard says about string::operator[] ([string.access], §21.4.5):
Requires: pos <= size().
Returns: *(begin()+pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
This means that calling s[x] for x > s.size() is undefined behaviour, so the implementation could very well terminate your program, e.g. by means of an assertion, for that.
Since string is now guaranteed to be continous, you could go around that problem using &s[i]+x to get an address. In praxis this will probably work.
However, strictly speaking doing this is still illegal unfortunately. The reason for this is that the standard allows pointer arithmetic only as long as the pointer stays inside the same array, or one past the end of the array. The relevant part of the (C++11) standard is in [expr.add], §5.7.5:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Therefore generating references or pointers to invalid memory locations might work on most implementations, but it is technically undefined behaviour, even if you never dereference the pointer/use the reference. Relying on UB is almost never a good idea , because even if it works for all targeted systems, there are no guarantees about it continuing to work in the future.
In principle, the idea of taking a reference for a possibly illegal memory address is itself perfectly legal. The reference is only a pointer under the hood, and pointer arithmetic is legal until dereferencing occurs.
EDIT: This claim is a practical one, not one covered by the published standard. There are many corners of the published standard which are formally undefined behaviour, but don't produce any kind of unexpected behaviour in practice.
Take for example to possibility of computing a pointer to the second item after the end of an array (as #DanielTrebbien suggests). The standard says overflow may result in undefined behaviour. In practice, the overflow would only occur if the upper end of the array is just short of the space addressable by a pointer. Not a likely scenario. Even when if it does happen, nothing bad would happen on most architectures. What is violated are certain guarantees about pointer differences, which don't apply here.
#JoSo If you were working with a character array, you can avoid some of the uncertainty about reference semantics by replacing the const-references with const-pointers in your code. That way you can be certain no compiler will alias the values.
I just stumbled upon a behavior which surprised me:
When writing:
int x = x+1;
in a C/C++-program (or even more complex expression involving the newly created variable x) my gcc/g++ compiles without errors. In the above case X is 1 afterwards. Note that there is no variable x in scope by a previous declaration.
So I'd like to know whether this is correct behaviour (and even might be useful in some situation) or just a parser pecularity with my gcc version or gcc in general.
BTW: The following does not work:
int x++;
With the expression:
int x = x + 1;
the variable x comes into existence at the = sign, which is why you can use it on the right hand side. By "comes into existence", I mean the variable exists but has yet to be assigned a value by the initialiser part.
However, unless you're initialising a variable with static storage duration (e.g., outside of a function), it's undefined behaviour since the x that comes into existence has an arbitrary value.
C++03 has this to say:
The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any) ...
Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value.
That second case there is pretty much what you have in your question.
It's not, it's undefined behavior.
You're using an uninitialized variable - x. You get 1 out of pure luck, anything could happen.
FYI, in MSVS I get a warning:
Warning 1 warning C4700: uninitialized local variable 'i' used
Also, at run-time, I get an exception, so it's definitely not safe.
int x = x + 1;
is basically
int x;
x = x + 1;
You have just been lucky to have 0 in x.
int x++;
however is not possible in C++ at a parser level! The previous could be parsed but was semantically wrong. The second one can't even be parsed.
In the first case you simply use the value already at the place in memory where the variable is. In your case this seems to be zero, but it can be anything. Using such a construct is a recipe for disaster and hard to find bugs in the future.
For the second case, it's simply a syntax error. You can not mix an expression with a variable declaration like that.
The variable is defined from the "=" on, so it is valid and when it is globally defined, it is initialized as zero, so in that case it is defined behavior, in others the variable was unintialized as as such still is unitialized (but increased with 1).
Remark that it still is not very sane or useful code.
3.3.1 Point of declaration 1 The point of declaration for a name is immediately after its complete declarator (clause 8) and before its
initializer (if any), except as noted below. [ Example: int x = 12; {
int x = x; } Here the second x is initialized with its own
(indeterminate) value. —end example ]
The above states so and should have indeterminate value, You are lucky with 1.
Your code has two possiblities:
If x is a local variable, you have undefined behavior, since you use the value of an object before its lifetime begins.
If x has static or thread-local lifetime, it is pre-initialized to zero, and your static initialization will reliably set it to 1. This is well-defined.
You may also wish to read my answer that covers related cases, including variables of other types, and variables which are written to before their initialization is completed
This is undefined behaviour and the compiler should at least to issue a warning. Try to compile using g++ -ansi .... The second example is just a syntax error.