how to detect references to members of temporary objects

My colleague recently compiled our program on Windows, and discovered a bug of this sort:
std::string a = "hello ";
std::string b = "world";
const char *p = (a+b).c_str();
printf("%s\n", p);
which for some reason did not crash in our Linux executables.
None of our compilers gives any kind of warning, so we are now worried that this error might exist elsewhere in the code.
Although we can grep for c_str() occurrences and do a visual inspection, there is a possibility that one might have also done the following:
struct I {
int num;
I() { num=0; }
};
struct X {
I *m;
X() { m = new I; }
~X() { delete m; }
I get() { return *m; } // version 1, or
I& get() { return *m; } // version 2
};
and accessed it like:
I& a = X().get(); // will get a reference to a temporary, or a valid copy?
cout << a.num;
instead of :
cout << X().get().num;
which is safe (isn't it?)
Question: Is there a way I can catch such errors (perhaps using the compiler, or even an assertion)?
I need to be sure that if the author of struct X changes get() between version 1 and version 2, the program will warn about the error.

Simple answer: In general you cannot catch those errors, and the reason is that there are similar constructs that might be perfectly fine, so the compiler would have to know the semantics of all the functions to be able to warn you.
In simpler cases, like obtaining the address of a temporary, many compilers already warn you, but in the general case, it is quite difficult if not impossible for the compiler to know.
For an example similar to the .c_str() case, consider:
std::vector< const char * > v;
v.push_back( "Hi" );
const char* p = *v.begin();
The call to begin returns a temporary, similar to the expression (a+b), and you are calling a member function of that temporary (operator*) that returns a const char*, which is quite similar to your original case (from the point of view of the types involved). The difference is that in this case the pointee is still valid after the call, while in yours (.c_str()) it isn't. That difference is part of the semantics of the operation, not of the syntax, so the compiler cannot check it for you. The same goes for the .get() example: the compiler does not know whether the returned reference refers to an object that will still be valid after the expression.
All these fall under the category of Undefined Behavior.
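For classes you control, one partial mitigation is to ref-qualify the accessor so that calling it on a temporary does not compile. The following is a minimal sketch under that assumption, not a general solution: the deleted rvalue overload, the can_call_get detection helper and the concat_via_named function are all hypothetical additions, and the struct X here is an illustrative variant of the one in the question. It does nothing for library types such as std::string, where the only fix is to give the temporary a name.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct I { int num = 0; };

// Illustrative variant of X from the question (not the original code):
// get() may only be called on lvalues.
struct X {
    I* m;
    X() : m(new I) {}
    ~X() { delete m; }
    X(const X&) = delete;
    X& operator=(const X&) = delete;

    I& get() & { return *m; }   // the X outlives the full expression
    I& get() && = delete;       // X().get() is rejected at compile time
};

// Hypothetical detection helper: is `declval<T>().get()` well-formed?
template <typename T, typename = void>
struct can_call_get : std::false_type {};
template <typename T>
struct can_call_get<T, decltype(void(std::declval<T>().get()))> : std::true_type {};

static_assert(can_call_get<X&>::value, "calling get() on an lvalue is allowed");
static_assert(!can_call_get<X>::value, "calling get() on a temporary is rejected");

// The c_str() case: name the temporary so the buffer outlives the pointer.
inline std::string concat_via_named(const std::string& a, const std::string& b) {
    const std::string joined = a + b;   // named object keeps the buffer alive
    const char* p = joined.c_str();     // valid for as long as `joined` lives
    return std::string(p);              // copied while the pointer is still valid
}
```

With this variant, switching get() from returning I to returning I& still compiles for named objects, but any caller that writes X().get() gets a hard error instead of a dangling reference.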

Check out this question's solution, I think it does something similar to what you're looking for:
C++ catching dangling reference
There are runtime based solutions which instrument the code to check
invalid pointer accesses. I've only used mudflap so far (which is
integrated in GCC since version 4.0). mudflap tries to track each
pointer (and reference) in the code and checks each access if the
pointer/reference actually points to an alive object of its base type.
Here is an example: {...}

Related

Is there a way to reinterpret_cast to a virtual derived* and calling overriden from parent?

#include <iostream>
template<typename T>
struct printer {
virtual const T* get(size_t& sz) const = 0;
void print() {
size_t sz;
const T* _t = get(sz); //tries to access 'this'
for (size_t t = 0; t < sz; t++)
std::cout << _t[t] << std::endl;
}
};
template<typename T, size_t sz>
struct mask_t : public printer<T> {
T data[sz];
virtual const T* get(size_t& _sz) const override {
_sz = sz;
return data;
}
};
int main(int argc, char** argv) {
char* buffer = new char[1024];
{
using mask_f = mask_t<float, 12>;
using mask_i = mask_t<int, 12>;
mask_f* mf = reinterpret_cast<mask_f*>(buffer + 42);
mask_i* mi = reinterpret_cast<mask_i*>(buffer + 42);
//'this' is uninitialized
mf->print();
mi->print();
mask_f _mf = *mf;
mask_i _mi = *mi;
//ok
_mf.print();
_mi.print();
}
delete[] buffer;
return 0;
}
print() tries to access this when invoking get(). Is that because of a vfptr lookup? In other words, is this impossible to do?
Edit : I know I can create a new mask_t with either new or as I have done here by dereferencing the pointer. Then mask_t::this is defined.
The reason I don't want to create the instance is for performance issues [and that's not visible in my example I admit]. If you want to answer please address the only question in this post.

This is not valid code regardless of the types. In C++, you can't just cast a random pointer to an object and pretend one exists.
And yes, in C++20, they do allow you to do that under certain circumstances. But even there, those circumstances do not include operations on types with virtual member functions (as they are insufficiently trivial).
Just use placement-new to construct the object. That's how you're supposed to create objects in storage.
print() tries to access this when invoking get() is it because of a vfptr lookup ?
Does it matter? That's an implementation aspect of how the undefined behavior causes a crash.
This: mask_f* mf = reinterpret_cast<mask_f*>(buffer + 42); causes undefined behavior. As does this: mask_f _mf = *mf;. Both of these access an object which does not exist. Therefore, they both exhibit undefined behavior.
That a particular compiler (version) might make one of these appear to do what you want and might make the other crash is a matter of detail and implementation. Both of these pieces of code are equally nonsensical, and neither can be relied upon to do what you want.
I could explain why the assembly the compiler generated allowed you to get away with UB in one case and not in the other. But that ignores the fact that, either way, you're relying on UB.
I am reading memory of a remote process.
That just isn't a reasonable thing to do in C++. Not for types with virtual members, at any rate. The traditional method is to serialize the data of that type into memory, then de-serialize it in the receiving process back into a new object of that type.
Now yes, if you're willing to write platform-specific hackery, there are ways to pass virtual types between processes like this. They require extracting vtable pointers (from valid objects) and writing them into the object data received from the remote process, thus effectively "fixing" the object in-situ.
But these are platform-specific hackery; if you want something portable, you have to work with serialization.

"I know I can create a new mask_t with either new or as I have done here by dereferencing the pointer. Then mask_t::this is defined."
That is wrong. An object comes into existence when its constructor runs. new, including placement new, is a way to cause the constructor to run. Other ways are simply declaring a local or global variable. But "dereferencing the pointer", as you assume, does not magically cause a constructor to run and an object to be created. And without an object, this cannot point to the current object.
Instead, you get Undefined Behavior. Anything can happen. Your hard disk could be erased.
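The placement-new route recommended in the answers could look roughly like the sketch below. It is a simplified stand-in for the question's types, not the asker's code: the sum() member, the virtual destructor, the zero-initialized data array and the construct_and_sum function are assumptions added so the example has something checkable to return.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

template <typename T>
struct printer {
    virtual const T* get(std::size_t& sz) const = 0;
    virtual ~printer() = default;

    // Stand-in for print(): goes through the virtual call, so it needs a real object.
    T sum() const {
        std::size_t sz = 0;
        const T* p = get(sz);
        T total{};
        for (std::size_t i = 0; i < sz; ++i) total += p[i];
        return total;
    }
};

template <typename T, std::size_t N>
struct mask_t : printer<T> {
    T data[N] = {};   // zero-initialized for the sake of the example
    const T* get(std::size_t& sz) const override { sz = N; return data; }
};

inline float construct_and_sum() {
    using mask_f = mask_t<float, 12>;

    // Raw storage with the size and alignment the object needs.
    alignas(mask_f) unsigned char buffer[sizeof(mask_f)];

    mask_f* mf = new (buffer) mask_f;   // constructor runs, vtable pointer is set
    mf->data[0] = 1.5f;
    mf->data[1] = 2.5f;

    const float total = mf->sum();      // virtual dispatch is now well-defined
    mf->~mask_f();                      // end the lifetime explicitly
    return total;
}
```

Unlike the reinterpret_cast in the question, the storage here is suitably aligned and an object actually exists in it before any member function is called.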

Getting an error, or at least a warning, when using a variable that has been std::move'ed elsewhere

This:
void foo(int &&r) {
std::cout << r << std::endl;
}
int main() {
int i = 2;
foo(std::move(i));
i = 3; //no warning. any way to get some warnings here?
return 0;
}
Is there no way to tell the compiler to give me an error (or warning) if I accidentally use the variable after I have moved it? I think this would be very convenient. A lot of times I find myself moving variables elsewhere like that, but then I manually have to be VERY CAREFUL that I don't use them afterwards. Now this hasn't caused any problems yet, but who knows down the line... better be safe!
Maybe there is some preprocessor trickery (or a widely available compiler extension) that exists to do this?
More realistic example:
struct HugeStorage {
std::vector<double> m_vec;
HugeStorage(std::vector<double> vec) : m_vec(std::move(vec)) { }
};
struct SmallStorage {
std::vector<double> m_vec;
SmallStorage(std::vector<double> vec) : m_vec(std::move(vec)) { }
};
std::vector<double> vec_from_data_source() {
return std::vector<double>(); //only example!!
}
int main() {
std::vector<double> vec = vec_from_data_source();
if (vec.size() > 10000)
{
HugeStorage storage(std::move(vec));
//do some things, but I gotta be careful I don't do anything to vec
}
else
{
SmallStorage storage(std::move(vec));
//do some things, but I gotta be careful I don't do anything to vec
}
return 0;
}

Is there no way to tell the compiler to give me an error (or warning) if I accidentally use the variable after I have moved it?
The answer is "no, there is no way" (to the best of my knowledge at least, no currently available compiler offers such an option, and for a good reason - see below).
Even if that was possible at all, why would you expect a warning, or even worse an error, to be given in this case? First of all, moving from an integer is not any different than copying it.
Secondly, for most types, assigning to a moved-from object of that type is a perfectly legal operation; this is always true of fundamental types like int, and it is definitely true of std::vector, although it might not be true of other types.
In general, whether or not assigning to a moved-from object is legal depends on the particular post-conditions of the move operation for that type and on the preconditions of the assignment operator (the assignment operators for types of the Standard Library have no preconditions on the left-hand side argument). This is something a compiler cannot check in the general case.
Therefore, if you were to:
1. Move from an object for which the move assignment or move constructor places the moved-from object in an unspecified state (that's the case for std::vector), and then
2. Invoke any function with preconditions on the state of that object (and that's not the case for the assignment to an std::vector),
then that would certainly be bad. On the other hand, the compiler does not have a way to perform a semantic analysis of your program and find out whether this is the case:
A x, y;
...
if (complicatedCondition())
{
y = move(x);
}
foo(x); // Did I move from x? And if so, is it safe to call foo()?
Moreover, don't forget that the philosophy of C++ is to give you power and (most often) design guidelines, but it "lets you shoot yourself in the foot" if you are really trying to do that.
There are dangerous, even meaningless things that you can do in C++ (will your compiler give you a warning, or an error, if you try to delete the same pointer twice?), but the language itself won't prevent you from doing them, under the assumption that you really, really know what you are doing.

//do some things, but I gotta be careful I don't do anything to vec
Clarification: You need to be careful that you don't do anything to vec that requires a precondition. You can do anything with vec that does not require any preconditions. For example you can assign vec a new value. You can call vec.clear(). You can call vec.size(). But do not call vec.pop_back() because that member function has a precondition.
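The distinction between operations with and without preconditions can be sketched as follows. The function name reuse_after_move and the concrete values are made up for illustration; the point is only which calls are safe on a moved-from std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {size of the destination, final size of the reused source}.
inline std::pair<std::size_t, std::size_t> reuse_after_move() {
    std::vector<double> vec{1.0, 2.0, 3.0};
    std::vector<double> sink(std::move(vec));   // vec is now valid but unspecified

    // No preconditions: always fine on a moved-from vector.
    vec.clear();
    vec = {4.0, 5.0};

    // pop_back() requires a non-empty vector, so establish that first.
    if (!vec.empty()) {
        vec.pop_back();
    }

    return {sink.size(), vec.size()};
}
```

After the assignment, vec is in a known state again, so every later call is as safe as it would be on any other vector.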

C++: References as return values

I noticed I don't get any compiler errors when I accidentally forget to return from a function that is supposed to return a reference. I wrote some small tests to see what actually happens and I got more confused than anything.
struct Foo
{
int x;
Foo() {
x = 3;
}
};
Foo* foo = new Foo;
Foo& test(bool flag) {
if (flag)
return *foo;
}
If test() doesn't (explicitly) return a value, I will still get something returned. However the Foo object that is returned is not initialized using the default constructor — that's because x is different from 3 in the non-explicitly returned value.
What is actually happening when you don't return a reference? If this is a feature, is it safe to use it as a means to return dummy objects in case errors occur, as opposed to returning a null pointer. (See example below.)
class FooFactory
{
// Return reference...
Foo& createFooRef() {
Foo* foo = new Foo;
bool success = foo->load();
if (success)
return *foo;
// Implicit (and safe?) return value on failure?
}
// ... as opposed to returning a pointer.
Foo* createFooPtr() {
Foo* foo = new Foo;
bool success = foo->load();
if (success)
return foo;
else
return 0;
}
// Yes, I am aware of the memory leaks,
// but that's not the point of the example.
};

Most compilers will give you a warning about this, but you may have to crank up the warning level of the compiler to see it.
No, this is not safe. It is bad. It may lead to stack corruption by just returning whatever happens to be on the stack at the time. As you've already seen, it does not use a constructor for you. If you want a default constructed object, you have to do that yourself (but be careful about returning a reference to a temporary object. That's also bad).
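A version of the factory in which every path returns something well-defined might look like the sketch below. The Foo type here is a stand-in (its constructor argument and load() result are invented so the failure path can be exercised), and create_foo is a hypothetical replacement for createFooRef/createFooPtr. With GCC and Clang, the missing-return warning can also be promoted to an error with -Werror=return-type.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the question's Foo; load() simply reports a preset result.
struct Foo {
    int x = 3;
    bool will_load;
    explicit Foo(bool ok) : will_load(ok) {}
    bool load() const { return will_load; }
};

// Every control path returns: an owning pointer on success, null on failure.
inline std::unique_ptr<Foo> create_foo(bool ok) {
    std::unique_ptr<Foo> foo(new Foo(ok));
    if (foo->load()) {
        return foo;
    }
    return nullptr;   // explicit failure value; the Foo is freed automatically
}
```

Returning an owning pointer also removes the leak that the question's comment sets aside, and the caller has to check for null before forming a reference.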

The usual way to lower references in compilers is to pointers. For a reference-returning function, it will mean you get an arbitrary address represented, whatever was in the register or stack slot used for the return value.
Formally in the language, the effects are undefined.

This is undefined behaviour, and infinite bad things may happen, or indeed, may not happen, or might happen sometimes, or might simultaneously happen and not happen if it doesn't like you, or send engineers from Microsoft to your house to beat you over the head with a baseball bat.

The described behaviour is not limited to functions that return references. The following code will also compile:
int func1( int i )
{
if( i )
return 3; // C4715 warning, nothing returned if i == 0
}
I'm not sure why they generate just a warning and not an error (there might be an option in the settings to turn it into an error), but you will get undefined behaviour if you call such a function and control reaches the end without a return.

References are typically just syntactic sugar for pointers, so the return is going to grab a pointer's worth of bytes from the stack for the return value. If you aren't giving it that it will just grab garbage.
I had to use the function and then add -Wall to get g++ to complain:
g++ -Wall foo.cc
foo.cc: In member function 'Foo& FooFactory::createFooRef()':
foo.cc:19: warning: control reaches end of non-void function

Have you tried compiling with optimisations turned on (/O1 or greater) and with warnings treated as errors? That might fail. I remember something along those lines happening in GCC 4.1: in debug mode you could forget to return the reference and a reference would still come back; as soon as you turned any optimisations on, the code would still compile, but the reference would no longer be returned. When coding in a text editor (as I was in those days) it was a total pain and a huge surprise to me.

Checking for a null reference?

Let's say you have something like this:
int& refint;
int* foo =0;
refint = *foo;
How could you verify if the reference is NULL to avoid a crash?

You can't late-initialize a reference like that. It has to be initialized when it's declared.
On Visual C++ I get
error C2530: 'refint' : references must be initialized
with your code.
If you 'fix' the code, the crash (strictly, undefined behaviour) happens at reference usage time in VC++ v10.
int* foo = 0;
int& refint(*foo);
int i(refint); // access violation here
The way to make this safe is to check the pointer at reference initialization or assignment time.
int* foo =0;
if (foo)
{
int& refint(*foo);
int i(refint);
}
though that still does not guarantee foo points to usable memory, nor that it remains so while the reference is in scope.

You don't, by the time you have a "null" reference you already have undefined behaviour. You should always check whether a pointer is null before trying to form a reference by dereferencing the pointer.
(Your code is illegal; you can't create an uninitialized reference and try and bind it by assigning it; you can only bind it during initialization.)

In general, you can't.
Whoever "creates a null reference" (or tries to, I should say) has already invoked undefined behavior, so the code might (or might not) crash before you get a chance to check anything.
Whoever created the reference should have done:
int *foo = 0;
if (foo) {
int &refint = *foo;
... use refint for something ...
}
Normally it's considered the caller's problem if they've written *foo when foo is null, and it's not one function's responsibility to check for that kind of error in the code of other functions. But you could litter things like assert(&refint); through your code. They might help catch errors made by your callers, since after all for any function you write there's a reasonable chance the caller is yourself.
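The "check the pointer before forming the reference" advice can be packaged in a small helper, sketched below. checked_deref is a hypothetical name, and throwing std::invalid_argument is just one possible policy. Note that a check such as assert(&refint) runs after the undefined behaviour has already happened, and compilers are allowed to assume that the address of a reference is never null, so a check on the pointer itself is the more dependable place for it.

```cpp
#include <cassert>
#include <stdexcept>

// Validates the pointer first, then binds the reference.
template <typename T>
T& checked_deref(T* p) {
    if (p == nullptr) {
        throw std::invalid_argument("checked_deref: null pointer");
    }
    return *p;   // only reached for a non-null pointer
}
```

A caller would write int& refint = checked_deref(foo); so the null case surfaces as an exception at the point where the reference is bound.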

All the answers above are correct, but if for some reason you want to do this, I thought at least one person should provide an answer. I am currently trying to track down a bad reference in some source code, and it would be useful to see if someone has deleted this reference and set it to null at some point. Hopefully this won't generate too many downvotes.
#include <iostream>
int main()
{
int* foo = nullptr;
int& refint = *foo;
if(&refint == nullptr)
std::cout << "Null" << std::endl;
else
std::cout << "Value " << refint << std::endl;
}
Output:
Null

To make the above code compile, you will have to switch the order:
int* foo =0;
int& refint = *foo; // on actual PCs, this code will crash here
(There may be older processor or runtime architectures where this worked.)

Having said all of the above, if you do want to have a null reference, use boost::optional<>; it works like a charm.

You don't need to, references cannot be null.
Read the manual.

Using a function with reference as a function with pointers?

Today I stumbled over a piece of code that looked horrifying to me. The pieces were scattered across different files; I have tried to write the gist of it in a simple test case below. The code base is scanned with FlexeLint on a daily basis, but this construct has been lying in the code since 2004.
The thing is that a function implemented with reference parameter passing is called as if it were a function with pointer parameter passing... due to a function cast. The construct has worked since 2004 on Irix, and now, when porting, it actually does work on Linux/gcc too.
My question now: is this a construct one can trust? I can understand if compiler writers implement reference passing as if it were a pointer, but is it reliable? Are there hidden risks?
Should I change fref(..) to use pointers and risk breaking anything in the process?
What do you think?
Edit
In the actual code both fptr(..) and fref(..) use the same struct - changed code below to reflect this better.
#include <iostream>
#include <string.h>
using namespace std;
// ----------------------------------------
// This will be passed as a reference in fref(..)
struct string_struct {
char str[256];
};
// ----------------------------------------
// Using pointer here!
void fptr(string_struct *str)
{
cout << "fptr: " << str->str << endl;
}
// ----------------------------------------
// Using reference here!
void fref(string_struct &str)
{
cout << "fref: " << str.str << endl;
}
// ----------------------------------------
// Cast to f(void*) and call with pointer
void ftest(void (*fin)())
{
string_struct str;
void (*fcall)(void*) = (void(*)(void*))fin;
strcpy(str.str, "Hello!");
fcall(&str);
}
// ----------------------------------------
// Let's go for a test
int main() {
ftest((void (*)())fptr); // test with fptr that's using pointer
ftest((void (*)())fref); // test with fref that's using reference
return 0;
}

What do you think?
Clean it up. That's undefined behavior and thus a bomb which might blow up anytime. A new platform or compiler version (or moon phase, for that matter) could trip it.
Of course, I don't know what the real code looks like, but from your simplified version it seems that the easiest way would be to give string_struct an implicit constructor taking a const char*, templatize ftest() on the function pointer argument, and remove all the casts involved.
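The cast-free approach suggested above might be sketched as follows. This is one possible shape, not the real code base: fptr and fref are changed to return strings instead of printing so the result can be checked, and the two call overloads are hypothetical adapters that pick the right calling form for each signature.

```cpp
#include <cassert>
#include <cstring>
#include <string>

struct string_struct {
    char str[256];
};

// Same roles as in the question, but returning the text instead of printing it.
inline std::string fptr(string_struct* s) { return std::string("fptr: ") + s->str; }
inline std::string fref(string_struct& s) { return std::string("fref: ") + s.str; }

// Hypothetical adapters: overload resolution selects the matching signature.
inline std::string call(std::string (*f)(string_struct*), string_struct& s) { return f(&s); }
inline std::string call(std::string (*f)(string_struct&), string_struct& s) { return f(s); }

// ftest is templated on the callable, so no function-pointer cast is needed.
template <typename F>
std::string ftest(F fin) {
    string_struct s;
    std::strcpy(s.str, "Hello!");
    return call(fin, s);
}
```

Because each function is called through its own type, a later signature change becomes a compile error rather than silent undefined behaviour.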

It's obviously a horrible technique, and formally it's undefined behaviour and a serious error to call a function through an incompatible type, but it should "work" in practice on a normal system.
At the machine level, a reference and a pointer have exactly the same representation; they are both just the address of something. I would fully expect that fptr and fref compile to exactly the same thing, instruction for instruction, on any computer you could get your hands on. A reference in this context can simply be thought of as syntactic sugar; a pointer that is auto-dereferenced for you. At the machine level they are exactly the same. Obviously there might be some obscure and/or defunct platforms where that might not be the case, but generally speaking that's true 99% of the time.
Furthermore, on most common platforms, all object pointers have the same representation, as do all function pointers. What you've done really isn't all that different from calling a function expecting an int through a type taking a long, on a platform where those types have the same width. It's formally illegal, and all but guaranteed to work.
It can even be inferred from the definition of malloc that all object pointers have the same representation; I can malloc a huge chunk of memory, and stick any (C-style) object I like there. Since malloc only returned one value, but that memory can be reused for any object type I like, it's hard to see how different object pointers could reasonably use different representations, unless the compiler was maintaining a big set of value-representation mappings for every possible type.
void *p = malloc(100000);
foo *f = (foo*)p; *f = some_foo;
bar *b = (bar*)p; *b = some_bar;
baz *z = (baz*)p; *z = some_baz;
quux *q = (quux*)p; *q = some_quux;
(The ugly casts are necessary in C++.) The above is required to work. So while I don't think it is formally required that f, b, z and q all hold the same bit pattern afterwards (that is, that comparing the pointers' representations with memcmp would yield 0), it's hard to imagine a sane implementation where they would differ.
That being said, don't do this!

It works by pure chance.
fptr expects a const char * while fref expects a string_struct &.
A pointer to string_struct has the same value as a pointer to its first character, since the struct only contains a 256-byte char array and does not have any virtual members.
In C++, call by reference (e.g. string_struct &) is implemented by passing a hidden pointer to the referenced object, so on the call stack it will be the same as if a true pointer had been passed.
But if the structure string_struct changes, everything will break so the code is not considered safe at all. Also it is dependent on compiler implementation.

Let's just agree that this is very ugly and you're going to change that code.
With the cast you promise that you make sure the types match and they clearly don't.
At least get rid of the C-style cast.
