Why doesn't Valgrind detect usage of uninitialized variable? - c++

As I understand it, Valgrind should report errors when code uses uninitialized variables. In the toy example below, printer is uninitialized, but the program "happily" prints the message anyway.
#include <iostream>

class Printer {
public:
    void print() {
        std::cout << "I PRINT" << std::endl;
    }
};

int main() {
    Printer* printer;
    printer->print();
}
When I test this program with Valgrind, it doesn't report any errors.
Is this expected behavior? And if so, why?

The variable is actually never used.
The method call is inlined¹, so the variable is not passed as an argument.
The method itself doesn't use this in any way, so the variable is not used at all.
The above is independent of whether optimizations are turned on or off.
As a matter of fact, in optimized code the variable will never exist at all - not even as a memory allocation.
Question about a similar case: Extern variable only in header unexpectedly working, why?
¹ All methods defined inside the class body are implicitly inline.
Is it Undefined Behavior?
Yes, it is. Calling the method requires this to point at an actual, initialized instance of the object to be well-formed. As Nir Friedman points out, the compiler is free to assume that and optimize on that basis (and IIRC this kind of optimization can happen even with -O0!).
I'd personally expect the specific code in question to work under any practical conditions (as the pointer value is really irrelevant), but I would never rely on that. You should fix your code right now.
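A minimal sketch of one possible fix: give print() a real, initialized object to be called on, and drop the pointer entirely:

#include <iostream>

class Printer {
public:
    void print() {
        std::cout << "I PRINT" << std::endl;
    }
};

int main() {
    Printer printer;  // a real, initialized object with automatic storage
    printer.print();  // well-defined: no pointer, no UB
}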
Detection
To detect usage of uninitialized variables in Clang/GCC, use the option -Wuninitialized (or simply use -Wall, which includes this flag).
-Wuninitialized should mostly cover use of stack-allocated memory, though some uses of stack-allocated arrays may still slip through. Some compilers support adding extra runtime checks for uninitialized reads via -fsanitize=... options, like -fsanitize=memory in Clang (thanks, chtz). These checks should cover the edge cases as well as the use of heap-allocated memory.
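As a hedged illustration of these flags (the flag names are as documented for GCC/Clang; exact diagnostics vary by compiler and version, and -Wuninitialized may need optimization enabled to fire on some GCC versions):

// warn.cpp -- compile with: g++ -Wall -O1 warn.cpp
//          or, for a runtime check: clang++ -fsanitize=memory -g warn.cpp
#include <cstdio>

int main() {
    int x;                   // never initialized
    std::printf("%d\n", x);  // -Wuninitialized warns here; MSan reports the read at runtime
}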

The main() function has undefined behaviour, since printer is uninitialised and the statement printer->print() both accesses the value of printer and dereferences it (via -> and the call of the member function).
Practically, however, a compiler is permitted to handle undefined behaviour by simply assuming it is not present. The compiler can then, if it chooses, follow a chain of logic:
When it sees a statement like printer->print(), it is allowed to reason that printer has a value that can be accessed and dereferenced without introducing undefined behaviour.
Based on this reasoning, it is then permitted to assume that printer must have been initialised (by some means invisible to the compiler) to point at a valid object.
Based on this assumption, it can reason that the statement printer->print() will result in a call of Printer::print().
Since the compiler can see the definition of Printer::print(), it can simply inline it and execute the statement std::cout << "I PRINT" << std::endl;.
Since it doesn't need to access printer at all to produce that output, it can optimise out any reference to the variable named printer in main().
If a compiler follows the above sequence of logic, the program will simply print I PRINT and exit, without accessing any memory in a way that might trigger a report from Valgrind.
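In other words, after that chain of reasoning the compiler may treat main() as if it had been written like this (a sketch of the effective result, not actual compiler output):

#include <iostream>

int main() {
    std::cout << "I PRINT" << std::endl;  // printer has been optimised away entirely
}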
If you think the above sounds far-fetched, then you are mistaken. LLVM/Clang is one compiler that notionally follows a chain of logic very similar to what I have described. For more information, have a look at the LLVM Project Blog series on undefined behaviour (first article, second article, and third article).

Related

Dereferencing a nullptr class object [duplicate]

Will the program:
#include <stdio.h>

struct foo
{
    void blah() { printf("blah\n"); }
    int i;
};

int main(int, char**)
{
    ((foo*)NULL)->blah();
}
Ever crash, or do anything other than output blah, on any compiler you are aware of? Will any function crash, when called via a NULL pointer, if it doesn't access any members (including the vtable)?
There have been other questions on this topic, for instance Accessing class members on a NULL pointer and Is it legal/well-defined C++ to call a non-static method that doesn't access members through a null pointer?, and it is always pointed out that this results in undefined behavior. But is this undefined in the real world, or only in the standard's world? Does any extant compiler not behave as expected? Can you think of any plausible reason why any future compiler wouldn't behave as expected?
What if the function does modify members, but the NULL ptr is guarded against? For instance:
void foo::blah()
{
    foo* pThis = this ? this : new foo();
    pThis->i++;
}
Edit:
For the record, the reason I wanted this was to make the interface to my linked list class as easy and concise as possible. I wanted to initialize the list to NULL and have idiomatic usage look like:
pList = pList->Insert(elt);
pList = pList->Remove(elt);
...
Where all the operators return the new head element. Somehow I didn't realize that using a container class would make things even easier, with no downside.
Can you think of any plausible reason why any future compiler wouldn't behave as expected?
A helpful compiler might add code to access the real object under the hood in debug builds in the hope of helping you catch this issue in your code early in the development cycle.
What if the function does modify members, but the NULL ptr is guarded against. For instance,
void foo::blah()
{
    foo* pThis = this ? this : new foo();
    pThis->i++;
}
Since it is undefined behavior to call that function with a null pointer, the compiler can assume that the test will always pass and optimize that function to:
void foo::blah()
{
    this->i++;
}
Note that this is correct, since if this is not null, it behaves as-if the original code was executed, and if this was null, it would be undefined behavior and the compiler does not need to provide any particular behavior at all.
Undefined behavior means you can't rely on what will happen. However it's sometimes useful to know what's happening under the covers while you're debugging so that you're not surprised when the impossible happens.
Most compilers will code this as a simple function with a hidden this parameter, and if the this parameter is never referenced the code will work as expected.
Checking for this == NULL might not work, depending on how aggressively your compiler optimizes. Since a well-formed program couldn't possibly have this == NULL, the compiler is free to pretend that it will never happen and optimize away the if statement entirely. I know, though, that Microsoft's C++ will not make this optimization, because their GetSafeHWND function relies on it working as expected.
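A hedged sketch of the kind of guard an aggressive optimizer may delete (the Widget class here is hypothetical; recent compilers also warn outright that 'this' cannot be null):

struct Widget {
    int n = 0;
    void poke() {
        if (this == nullptr)  // a conforming compiler may assume 'this' is never
            return;           // null and remove this guard entirely
        n++;
    }
};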
Trying to guard against this == NULL won't give you any genuinely desirable effect. Dereferencing a NULL pointer is, AFAIK, undefined, and it works differently for different compilers. Even if it happens to work in one scenario (a non-virtual member function that never touches a member), it doesn't work in others, such as calls through virtual functions. Those scenarios are understandable, since the instance doesn't have a vtable to consult when deciding which virtual function to call. But I'm not sure the same can be said for the first.
The other thing to consider is that any invalid pointer can produce the same kind of error you'd want to guard against, not just NULL. For example, a call through a Test* t that points at invalid memory can successfully print 'Foo' and then hit a runtime error while trying to access a member a, because the memory location pointed to by t is invalid. The same behavior is seen when Test* t is NULL.
So, in general, avoid such behaviors and design in your code. It's not predictable and it would cause undesirable effect if someone comes after you and changes your code thinking it should behave as it did previously.

What are the dangers of uninitialised variables?

In a program I am writing I currently have several uninitialised variables in my .h files, all of which are initialised at run-time. However, Visual Studio warns me every time I do this to "Always initialize a member variable", despite how seemingly pointless it feels to do so. I am well aware that attempting to use a variable while it is uninitialised will lead to undefined behaviour, but as far as I know this can be avoided by simply not doing so. Am I overlooking something?
Thanks.
These variables could contain any value if you don't initialize them, and reading them in an uninitialized state is undefined behavior (except if they are zero-initialized).
And if you forget to initialize one of them, and reading from it by accident happens to produce the value you expect on your current system configuration (due to undefined behavior), then your program might behave unpredictably or unexpectedly after a system update, on a different system, or when you make changes to your code.
And these kinds of errors are hard to debug. So even if you set them at runtime, it is suggested to initialize them to known values, so that you have a controlled environment with predictable behavior.
There are a few exceptions, e.g. if you set the variable right after you declare it but can't set it directly, such as when its value is written by a streaming operator.
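A minimal sketch of that exception, assuming the value is read immediately after the declaration:

#include <iostream>

int main() {
    int value;           // deliberately left uninitialized...
    std::cin >> value;   // ...because the streaming operator writes it on the very next line
    std::cout << value << '\n';
}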
You have not included the source, so we have to guess about why it happens. I can see several possible reasons, with different solutions (besides just zero-initializing everything):
You don't initialize at the start of the constructor, but combine member initialization with other code that calls functions on the not-yet-fully-initialized object. That's a mess - you never know when some function will call another function that uses a non-initialized member. If you really need this, don't pass in the entire object - pass only the parts you need (which might require more refactoring).
You do the initialization in an Init function. Use delegating constructors instead (a C++11 feature that lets one constructor call another), as sketched after this list.
You don't initialize some members in the constructor, but only later. If you really don't want to initialize them eagerly, consider keeping the data behind a pointer (or std::unique_ptr) and creating it when needed; or don't keep it in the object at all.
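A sketch of the delegating-constructor replacement for an Init function (the Config class here is hypothetical):

#include <string>
#include <utility>

class Config {
public:
    // The "real" constructor performs all member initialization.
    Config(std::string name, int size) : m_name(std::move(name)), m_size(size) {}

    // C++11 delegating constructor: calls the constructor above
    // instead of relying on a separate Init() member function.
    Config() : Config("default", 0) {}

private:
    std::string m_name;
    int m_size;
};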
It's a safety measure to not allow uninitialized variables, which is a good thing, but if you are sure of what you are doing and you make sure your variables are always initialized before use, you can turn this off: right-click on your project in Solution Explorer -> Properties -> C/C++ -> SDL checks, and set it to NO. It is set to YES by default.
Note that these compile-time checks do more than just check for uninitialized variables, so before you turn this off I advise reading https://learn.microsoft.com/en-us/cpp/build/reference/sdl-enable-additional-security-checks?view=vs-2019
You can also disable a specific warning in your code using a warning pragma.
Personally I keep these on because IMO in the tradeoff safety/annoyance I prefer safety, but I reckon that someone else can have a different opinion.
There are two parts to this question: first, is reading uninitialized variables dangerous; and second, is defining variables uninitialized dangerous even if I make sure I never access them while uninitialized?
What are the dangers of accessing uninitialized variables?
With very few exceptions, accessing an uninitialized variable makes the whole program have Undefined Behavior. There is a common misconception (which unfortunately is taught) that uninitialized variables have "garbage values" and so reading an uninitialized variable will result in reading some value. This is completely false. Undefined Behavior means the program can have any behavior: it can crash, it can behave as the variable has some value, it can pretend the variable doesn't even exist or all sorts of weird behaviors.
For instance:
void foo();
void bar();

void test(bool cond)
{
    int a; // uninitialized
    if (cond)
    {
        a = 24;
    }
    if (a == 24)
    {
        foo();
    }
    else
    {
        bar();
    }
}
What is the result of calling the above function with true? What about with false?
test(true) will clearly call foo().
What about test(false)? If you answer "Well, it depends on what garbage value is in variable a: if it is 24 it will call foo, else it will call bar", then you are completely wrong.
If you call test(false), the program accesses an uninitialized variable and has Undefined Behavior. It is an illegal path, and so compilers are free to assume cond is never false (because otherwise the program would be illegal). And surprise, surprise: both gcc and clang with optimizations enabled actually do this, and generate this assembly for the function:
test(bool):
        jmp foo()
So don't do this! Never access an uninitialized variable! It is undefined behavior, and it's much, much worse than "the variable has some garbage value". Furthermore, the program could work as you expect on your system, yet behave in unexpected ways on other systems or with other compiler flags.
What are the dangers of defining uninitialized variables if I make sure I always initialize them later, before accessing them?
Well, the program is correct in this respect, but the source code is prone to errors. You have to mentally burden yourself with always checking whether you actually initialized the variable somewhere. And if you did forget to initialize a variable, finding the bug will be difficult when you have a lot of variables in your code that are defined uninitialized.
By contrast, if you always initialize your variables, you and the programmers after you have a much, much easier job and peace of mind.
It's just a very very good practice.

Why does my C++ program crash when I forget the return statement, rather than just returning garbage?

I've started using CLang recently to compile embedded C++ ARM programs.
Prior to this I used GCC and C, almost exclusively for embedded work.
I've noticed that when I have a method that returns a value, and I forget the return statement, the program core dumps. There is no error printed other than "msleep error -1" from one of my device drivers. This is on FreeBSD.
I would expect that forgetting the return statement would just result in garbage being returned from the function, not a core dump.
EDIT: I'm returning a bool, not a pointer or object or anything complicated. The program crashes even when the return value doesn't matter.
What is going on?
E.G.:
bool MyClass::DummyFunc() {
    // <do some stuff and forget the return value>
}
Elsewhere:
if (pMyObj->DummyFunc()) {
    printf("Hey, it's true!\n");
} else {
    printf("Darn, it's false!\n");
}
That code should not crash, regardless of the return value.
From second-hand sources (because I don't want to pay for the C++ standard), it apparently says:
Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function.
This is C++, when you do something undefined you should expect it to crash your computer and eat your laundry, unless the implementation says otherwise.
Forgetting the return statement results in control flow reaching the end of the function. The C and C++ standards both describe this case. Note that main is an exception to this case and is described separately.
Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function. (C++11 Draft, p. 136)
My experience with clang has been that the compiler will treat instances of "undefined behavior" as "will not happen" and optimize those paths away. That is legal, since the behavior is undefined. Often clang will instead emit an illegal instruction along the omitted path, so that the code crashes if the "impossible" happens, which is likely your case here. In fact, the compiler could then determine that calling DummyFunc() results in undefined behavior and therefore cannot happen, and start optimizing away the calling bodies as well.
gcc is far "friendlier" and tries to generate something nice, such as returning 0.
Note, both compilers are correct and are producing valid code according to the standard.
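A hedged demonstration of the difference (hypothetical file name and flags; actual codegen depends on compiler version):

// missing_return.cpp -- try: clang++ -O2 missing_return.cpp  vs.  g++ -O2 missing_return.cpp
#include <cstdio>

bool dummy() {
    std::puts("doing some stuff");
    // no return statement: flowing off the end of a value-returning
    // function is undefined behaviour; clang has been known to emit ud2 here
}

int main() {
    if (dummy())
        std::puts("true");
    else
        std::puts("false");
}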
I would expect that forgetting the return statement would just result in garbage being returned from the function, not a core dump.
What is going on?
Your expectation is wrong, is what's going on.
The compiler — an incredibly complicated piece of machinery — is free to assume that your function returns a value, because you promised that it did.
Then you broke that promise.
This is why we should not make assumptions about implementation details, particularly when writing programs whose behaviour is undefined. You cannot make a direct comparison between C++ code and "some data goes into this register, and there is this call stack that does precisely this thing".
There are near-infinite ways that this can break, because in the process of converting your C++ code to a well-optimised program readable by your computer, the compiler makes a series of extremely complicated optimisations, and any one of them can be snapped in half by your violation of the standard's rules. Attempting to determine exactly what happened on any particular run of your program would require the following knowledge:
CPU make, model and version
operating system make, model and version
compiler make, model and version
all compiler flags
10 years' experience with the compiler's source code
time machine to inspect the state of every bit of memory in your computer at the time of your program's execution.
It's really just not worth it.
Fortunately, in certain cases, we can still reason about what is likely to happen without betraying the abstraction: for example, consider what happens when you promised you'd return a std::string but didn't, and that non-existent std::string then goes out of scope. The destructor is invoked on random nonsense, and that's not going to go well.
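A minimal sketch of that std::string scenario (the function is hypothetical; the exact failure mode varies, but destroying a "returned" string that was never constructed commonly crashes):

#include <string>

std::string make_label(bool ok) {
    if (ok)
        return "ok";
    // flowing off the end here: the caller still destroys the "returned"
    // std::string, invoking the destructor on an object that was never constructed
}

int main() {
    std::string s = make_label(false);  // undefined behaviour: s's destructor runs on garbage
}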

Undefined behaviour with const_cast

I was hoping that someone could clarify exactly what is meant by undefined behaviour in C++. Given the following class definition:
class Foo
{
public:
    explicit Foo(int Value) : m_Int(Value) { }
    void SetValue(int Value) { m_Int = Value; }
private:
    Foo(const Foo& rhs);
    const Foo& operator=(const Foo& rhs);
private:
    int m_Int;
};
If I've understood correctly, the two const_casts below (to a reference and to a pointer) will remove the const-ness of the original object of type Foo, but any attempt to modify this object through either the pointer or the reference will result in undefined behaviour.
int main()
{
    const Foo MyConstFoo(0);
    Foo& rFoo = const_cast<Foo&>(MyConstFoo);
    Foo* pFoo = const_cast<Foo*>(&MyConstFoo);
    //MyConstFoo.SetValue(1); // Error as MyConstFoo is const
    rFoo.SetValue(2);         // Undefined behaviour
    pFoo->SetValue(3);        // Undefined behaviour
    return 0;
}
What is puzzling me is why this appears to work and will modify the original const object, yet doesn't even prompt me with a warning to notify me that this behaviour is undefined. I know that const_casts are, broadly speaking, frowned upon, but I can imagine a case where lack of awareness that a C-style cast can result in a const_cast could occur without being noticed, for example:
Foo& rAnotherFoo = (Foo&)MyConstFoo;
Foo* pAnotherFoo = (Foo*)&MyConstFoo;
rAnotherFoo.SetValue(4);
pAnotherFoo->SetValue(5);
In what circumstances might this behaviour cause a fatal runtime error? Is there some compiler setting that I can set to warn me of this (potentially) dangerous behaviour?
NB: I use MSVC2008.
I was hoping that someone could clarify exactly what is meant by undefined behaviour in C++.
Technically, "Undefined Behaviour" means that the language defines no semantics for doing such a thing.
In practice, this usually means "don't do it; it can break when your compiler performs optimisations, or for other reasons".
What is puzzling me is why this appears to work and will modify the original const object but doesn't even prompt me with a warning to notify me that this behaviour is undefined.
In this specific example, attempting to modify any non-mutable object may "appear to work", or it may overwrite memory that doesn't belong to the program or that belongs to [part of] some other object, because the non-mutable object might have been optimised away at compile-time, or it may exist in some read-only data segment in memory.
The factors that may lead to these things happening are simply too complex to list. Consider the case of dereferencing an uninitialised pointer (also UB): the "object" you're then working with will have some arbitrary memory address that depends on whatever value happened to be in memory at the pointer's location; that "value" is potentially dependent on previous program invocations, previous work in the same program, storage of user-provided input etc. It's simply not feasible to try to rationalise the possible outcomes of invoking Undefined Behaviour so, again, we usually don't bother and instead just say "don't do it".
What is puzzling me is why this appears to work and will modify the original const object but doesn't even prompt me with a warning to notify me that this behaviour is undefined.
As a further complication, compilers are not required to diagnose (emit warnings/errors for) Undefined Behaviour, because code that invokes Undefined Behaviour is not the same as code that is ill-formed (i.e. explicitly illegal). In many cases it's not even tractable for the compiler to detect UB, so this is an area where it is the programmer's responsibility to write the code properly.
The type system — including the existence and semantics of the const keyword — presents basic protection against writing code that will break; a C++ programmer should always remain aware that subverting this system — e.g. by hacking away constness — is done at your own risk, and is generally A Bad Idea.™
I can imagine a case where lack of awareness that C-style cast can result in a const_cast being made could occur without being noticed.
Absolutely. With warning levels set high enough, a sane compiler may choose to warn you about this, but it doesn't have to and it may not. In general, this is a good reason why C-style casts are frowned upon, but they are still supported for backwards compatibility with C. It's just one of those unfortunate things.
Undefined behaviour depends on the way the object was born; you can see Stephan explaining it at around 00:10:00, but essentially, follow the code below:
void f(int const& arg)
{
    int& danger(const_cast<int&>(arg));
    danger = 23; // When is this UB?
}
Now there are two cases for calling f:
int K(1);
f(K);  // OK

const int AK(1);
f(AK); // triggers undefined behaviour
To sum up: K was born non-const, so the cast is OK when calling f, whereas AK was born const, so... UB it is.
Undefined behaviour literally means just that: behaviour which is not defined by the language standard. It typically occurs in situations where the code is doing something wrong, but the error can't be detected by the compiler. The only way to catch the error would be to introduce a run-time test - which would hurt performance. So instead, the language specification tells you that you mustn't do certain things and, if you do, then anything could happen.
In the case of writing to a constant object, using const_cast to subvert the compile-time checks, there are three likely scenarios:
it is treated just like a non-constant object, and writing to it modifies it;
it is placed in write-protected memory, and writing to it causes a protection fault;
it is replaced (during optimisation) by constant values embedded in the compiled code, so after writing to it, it will still have its initial value.
In your test, you ended up in the first scenario - the object was (almost certainly) created on the stack, which is not write protected. You may find that you get the second scenario if the object is static, and the third if you enable more optimisation.
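A minimal sketch of the third scenario (whether the compiler actually folds the constant is, of course, not guaranteed):

#include <iostream>

int main() {
    const int x = 10;
    *const_cast<int*>(&x) = 42;  // undefined behaviour: x was born const
    std::cout << x << '\n';      // may still print 10 if the compiler replaced
                                 // the read of x with the literal 10
}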
In general, the compiler can't diagnose this error - there is no way to tell (except in very simple examples like yours) whether the target of a reference or pointer is constant or not. It's up to you to make sure that you only use const_cast when you can guarantee that it's safe - either when the object isn't constant, or when you're not actually going to modify it anyway.
What is puzzling me is why this appears to work
That is what undefined behavior means.
It can do anything including appear to work.
If you increase your optimization level to its top value it will probably stop working.
but doesn't even prompt me with a warning to notify me that this behaviour is undefined.
At the point where it does the modification, the object is not const. In the general case the compiler cannot tell that the object was originally const, therefore it is not possible to warn you. Even if it could, each statement is evaluated on its own, without reference to the others (when it comes to this kind of warning generation).
Secondly, by using a cast you are telling the compiler "I know what I am doing; override all your safety features and just do it".
For example, the following works just fine (or will seem to, in the nasal-demon sense):
float aFloat;
int& anIntRef = (int&)aFloat;  // I know what I am doing: ignore that this makes no sense
int* anIntPtr = (int*)&aFloat;
anIntRef = 12;
*anIntPtr = 13;
I know that const_casts are, broadly speaking, frowned upon
That is the wrong way to look at them. They are a way of documenting in the code that you are doing something strange that needs to be validated by smart people (since the compiler will obey the cast without question). The reason you need a smart person to validate it is that it can lead to undefined behavior; the good thing is that you have now explicitly documented this in your code (and people will definitely look closely at what you have done).
but I can imagine a case where lack of awareness that C-style cast can result in a const_cast being made could occur without being noticed, for example:
In C++ there is no need to use a C-style cast.
In the worst case a C-style cast can be replaced by reinterpret_cast<>, but when porting code you want to check whether you could have used static_cast<>. The point of the C++ casts is to make them stand out, so you can see them and, at a glance, spot the difference between the dangerous casts and the benign ones.
A classic example would be trying to modify a const string literal, which may exist in a protected data segment.
Compilers may place const data in read-only parts of memory for optimization reasons, and attempting to modify this data will result in UB.
Static and const data are often stored in a different part of your program than local variables. For const variables, these areas are often mapped read-only to enforce the constness of the variables. Attempting to write to read-only memory results in "undefined behavior" because the reaction depends on your operating system. "Undefined behavior" means that the language doesn't specify how this case is to be handled.
If you want a more detailed explanation of process memory layout, most write-ups are based on UNIX, but similar mechanisms are used on all OSes.

Is there a practical benefit to casting a NULL pointer to an object and calling one of its member functions?

Ok, so I know that technically this is undefined behavior, but nonetheless, I've seen this more than once in production code. And please correct me if I'm wrong, but I've also heard that some people use this "feature" as a somewhat legitimate substitute for a lacking aspect of the current C++ standard, namely, the inability to obtain the address (well, offset really) of a member. For example, this is out of a popular implementation of a PCRE (Perl-compatible Regular Expression) library:
#ifndef offsetof
#define offsetof(p_type,field) ((size_t)&(((p_type *)0)->field))
#endif
One can debate whether the exploitation of such a language subtlety in a case like this is valid or not, or even necessary, but I've also seen it used like this:
struct Result
{
    void stat()
    {
        if (this) {
            // do something...
        } else {
            // do something else...
        }
    }
};

// ...somewhere else in the code...
((Result*)0)->stat();
This works just fine! It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right? So the question remains: Is there a practical use case, where one would benefit from using such a construct? I'm especially concerned about the second case, since the first case is more of a workaround for a language limitation. Or is it?
PS. Sorry about the C-style casts, unfortunately people still prefer to type less if they can.
The first case is not calling anything. It's taking an address. That is a defined, permitted operation. It yields the offset in bytes from the start of the object to the specified field. This is a very, very common practice, since offsets like this are very commonly needed. Not all objects can be created on the stack, after all.
The second case is reasonably silly. The sensible thing would be to declare that method static.
I don't see any benefit of ((Result*)0)->stat(); - it is an ugly hack which will likely break sooner than later. The proper C++ approach would be using a static method Result::stat() .
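A minimal sketch of that static-method alternative:

struct Result {
    static void stat() {
        // do something that needs no instance state...
    }
};

int main() {
    Result::stat();  // no object, no null pointer, no undefined behaviour
}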
offsetof() on the other hand is legal, as the offsetof() macro never actually calls a method or accesses a member, but only performs address calculations.
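Note also that standard C++ already provides offsetof in <cstddef> for standard-layout types, so there is usually no need to hand-roll the null-pointer macro. A small sketch (the Packet struct is hypothetical):

#include <cstddef>
#include <cstdio>

struct Packet {
    int header;
    int payload;
};

int main() {
    std::printf("payload offset: %zu\n", offsetof(Packet, payload));
}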
Everybody else has done a good job of reiterating that the behavior is undefined. But lets pretend it wasn't, and that p->member is allowed to behave in a consistent manner under certain circumstances even if p isn't a valid pointer.
Your second construct would still serve almost no purpose. From a design perspective, you've probably done something wrong if a single function can do its job both with and without accessing members, and if it can then splitting the static portion of the code into a separate, static function would be much more reasonable than expecting your users to create a null pointer to operate on.
From a safety perspective, you've only protected against a small portion of the ways an invalid this pointer can be created. There are uninitialized pointers, for starters:
Result* p;
p->stat(); // Oops, 'this' is some random value
There are pointers that have been initialized but are still invalid:
Result* p = new Result;
delete p;
p->stat(); // 'this' points to "safe" memory, but the data doesn't belong to you
And even if you always initialize your pointers, and absolutely never accidentally reuse freed memory:
struct Struct {
    int i;
    Result r;
};

int main() {
    ((Struct*)0)->r.stat(); // 'this' is likely sizeof(int), not 0
}
So really, even if it weren't undefined behavior, it is worthless behavior.
Although libraries targeting specific C++ implementations may do this, that doesn't mean it's "legitimate" generally.
This works just fine! It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right?
No, because although it might work fine on some C++ implementations, it is perfectly okay for it to not work on any conforming C++ implementation.
Dereferencing a null-pointer is undefined behavior and anything can happen if you do it. Don't do it if you want a program that works.
Just because it doesn't immediately crash in one specific test case doesn't mean that it won't get you into all kinds of trouble.
Undefined behaviour is undefined behaviour. Do these tricks "work" for your particular compiler? Well, possibly. Will they work for the next iteration of it, or for another compiler? Possibly not. You pays your money and you takes your choice. I can only say that in nearly 25 years of C++ programming I've never felt the need to do any of these things.
Regarding the statement:
It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right?
The code is not legitimate. There is no guarantee that the compiler and/or runtime will actually call the method when the pointer is NULL. The check inside the method is of no help, because you can't assume the method will even end up being called with a NULL this pointer.