I have recently encountered a behavior in C++ regarding function pointers that I can't fully understand. I asked Google for help as well as some of my more experienced colleagues, but even they couldn't help.
The following code showcases this mysterious behavior:
#include <iostream>
class MyClass{
private:
int i;
public:
MyClass(): i(0) {}
MyClass(int i): i(i) {}
void PrintText() const { std::cout << "some text " << std::endl;}
};
typedef void (*MyFunction) (void*);
void func(MyClass& mc){
mc.PrintText();
}
int main(){
void* v_mc = new MyClass;
MyFunction f = (MyFunction) func; //It works!
f(v_mc); //It works correctly!!!
return 0;
}
So, first I define a simple class that will be used later (in particular, its member function PrintText). Then I define MyFunction as a name for the type void (*)(void*): a pointer to a function that takes one void* parameter and doesn't return a value.
After that, I define function func() that accepts a reference to MyClass object and calls its method PrintText.
And finally, the magic happens in the main function. I dynamically allocate a new MyClass object and store the returned pointer in a void*. Then I cast the pointer to func() to a MyFunction pointer; I didn't expect this to compile at all, but it does.
Lastly, I call through this new pointer with a void* argument, even though the underlying function (func()) accepts a reference to a MyClass object. And everything works correctly!
I tried compiling this code with both Visual Studio 2010 (Windows) and XCode 5 (OSX) and it works in the same manner - no warnings are reported whatsoever. I imagine the reason why this works is that C++ references are actually implemented as pointers behind the scenes but this is not an explanation.
I hope someone can explain this behavior.
The formal explanation is simple: undefined behaviour is undefined. When you call a function through a pointer to a different function type, it's undefined behaviour and the program can legally do anything (crash, appear to work, order pizza online ... anything goes).
You can try reasoning about why the behaviour you're experiencing happens. It's probably a combination of one or more of these factors:
Your compiler internally implements references as pointers.
On your platform, all pointers have the same size and binary representation.
Since PrintText() doesn't access *this at all, the compiler can effectively ignore the value of mc altogether and just call the PrintText() function inside func.
However, you must remember that while you're currently experiencing the behaviour you've described on your current platform, compiler version and under this phase of the moon, this could change at any time for no apparent reason whatsoever (such as a change in surrounding code triggering different optimisations). Remember that undefined behaviour is simply undefined.
As to why you can cast &func to MyFunction - the standard explicitly allows that (with a reinterpret_cast, to which the C-style cast translates in this context). You can legally cast a pointer to function to any other pointer to function type. However, pretty much the only thing you can legally do with it is move it around or cast it back to the original type. As I said above, if you call through a function pointer of the wrong type, it's undefined behaviour.
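To make that last point concrete, here is a minimal sketch of the one thing you can safely do with a converted function pointer: store it, convert it back to its real type, and only then call it. The names Text, RealFunction and call_via_round_trip are made up for illustration, and PrintText is replaced by a function that returns the text so the result can be checked; the rest follows the question.

```cpp
#include <cassert>
#include <string>

class MyClass {
public:
    // Hypothetical stand-in for PrintText(): returns the text instead of printing it.
    std::string Text() const { return "some text"; }
};

using MyFunction = void (*)(void*);              // the generic pointer type from the question
using RealFunction = std::string (*)(MyClass&);  // the actual type of func below

std::string func(MyClass& mc) { return mc.Text(); }

std::string call_via_round_trip(MyClass& mc) {
    // Converting between function pointer types is allowed...
    MyFunction stored = reinterpret_cast<MyFunction>(&func);
    // ...but calling through `stored` would be undefined behaviour,
    // so convert back to the original type first.
    RealFunction restored = reinterpret_cast<RealFunction>(stored);
    assert(restored == &func);  // the round trip yields the original pointer
    return restored(mc);        // well-defined: the call uses the function's real type
}
```

Calling call_via_round_trip on a MyClass object returns "some text", and this time the standard actually backs it.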
I hope someone can explain this behavior.
The behaviour is undefined.
MyFunction f = (MyFunction) func; //It works!
It "works" because you use a C-style cast, which I think has the same effect as reinterpret_cast in this case. If you had used static_cast, or simply not cast at all, the compiler would have rejected the code and pointed out your mistake. When you call the wrongly interpreted function pointer, you get undefined behaviour.
It's only by chance that it works. Compilers are not guaranteed to make it work. Behind the scenes, your compiler is treating the reference as a pointer, so your alternative function signature just happens to work.
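If the goal is a generic void (*)(void*) callback, one way to get it without relying on chance is a small adapter whose signature really matches the pointer type. This is only a sketch: Counter, bump, bump_adapter and run_callback_twice are hypothetical names, not anything from the question.

```cpp
#include <cassert>

struct Counter {
    int calls = 0;
};

using Callback = void (*)(void*);

// The "real" function, taking a reference like func() in the question.
void bump(Counter& c) { ++c.calls; }

// Adapter with exactly the signature the callback type promises.
// The caller must pass a pointer that really points to a Counter.
void bump_adapter(void* p) {
    assert(p != nullptr);
    bump(*static_cast<Counter*>(p));
}

int run_callback_twice() {
    Counter c;
    Callback f = bump_adapter;  // no cast needed: the types match
    f(&c);                      // Counter* converts to void* implicitly
    f(&c);
    return c.calls;
}
```

Here no function is ever called through a pointer of the wrong type, so the result (two calls counted) does not depend on how the compiler implements references.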
I'm sorry, but it isn't clear to me why you call this strange behavior. I don't see undefined behavior that depends on the moon cycle here; this is the way function pointers are used in C.
By adding some debug output you can see that the pointer to the object remains the same in all the calls.
void PrintText() const { std::cout << "some text " << this << std::endl;}
^^^^
void func(MyClass& mc){
std::cout << (void *)&mc << std::endl;
^^^
void *v_mc = new MyClass;
std::cout << (void *)v_mc << std::endl;
^^^^
Related
Suppose the C++ below. Before calling of a->method1() it has an
assert (a) to check if a is sane.
The call a->method2() has no such assertion; instead method2 itself
checks for a valid this by means of assert (this).
Is that viable code according to the C++ specification?
Even if it's covered by the standard, it's not good style of course, and
it's error prone if the code ever changes, e.g. if the method is
refactored to a virtual method. I am just curious about what the
standard has to say, and whether the g++ code works by design or just by
accident.
The code below works as expected with g++, i.e. the assertion in
method2 triggers as intended, because just to call method2 no
this pointer is needed.
#include <iostream>
#include <cassert>
struct A
{
int a;
A (int a) : a(a) {}
void method1 ()
{
std::cout << a << std::endl;
}
void method2 ()
{
assert (this);
std::cout << a << std::endl;
}
};
void func1 (A *a)
{
assert (a);
a->method1();
}
void func2 (A *a)
{
a->method2();
}
int main ()
{
func1 (new A (1));
func2 (new A (2));
func2 (nullptr);
}
Output
1
2
Assertion failed: this, file main.cpp, line 16
Even if it's [permitted] by the standard
It isn't.
it's not good style of course
Nope.
and it's error prone if the code ever changes, e.g. if the method is refactored to a virtual method.
I concede that a virtual member function is more likely to cause a "crash" here, but you already have undefined behaviour and that's not just a theoretical concern: you can expect things like the assertion or conditions to be elided, or other weird things to happen.
This pattern is a big no-no.
I am just curious about what the standard has to say
It says:
[expr.ref/2] [..] For the second option (arrow) the first expression shall be a prvalue having pointer type. The expression E1->E2 is converted to the equivalent form (*(E1)).E2 [..]
[expr.unary.op/1] The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. [..]
Notice that it doesn't explicitly say "the object must exist", but by saying that the expression refers to the object, it implicitly tells us that there must be an object. This sort of "gap" falls directly into the definition of undefined behaviour, by design.
whether the g++ code works by design or just by accident.
The last one.
Answering your question up front: "C++: Is "assert (this)" a viable pattern?" - No.
assert(this); is pointless. The C++ standard guarantees that the this pointer is never nullptr in valid programs.
If your program has undefined behaviour then all bets are, of course, off and this might be nullptr. But an assert is not the correct fix in that case, fixing the UB is.
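As a rough sketch of what "fixing the UB" can look like: test the pointer before the member call, where the test still means something. The helper names value_or and value_checked are hypothetical, and A is trimmed down from the question.

```cpp
#include <cassert>

struct A {
    int a;
    explicit A(int a) : a(a) {}
    // No assert(this): in a valid program this is never null here.
    int value() const { return a; }
};

// Null is an expected input: handle it before touching the object.
int value_or(const A* p, int fallback) {
    return p != nullptr ? p->value() : fallback;
}

// Null is a programming error: assert on the pointer itself, not on this.
int value_checked(const A* p) {
    assert(p != nullptr);
    return p->value();
}
```

Either way the check happens before p->value() is evaluated, so the compiler has no undefined behaviour to reason from.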
this cannot be nullptr (otherwise there is already undefined behavior).
In your case
a->method2(); // with a == nullptr
invokes undefined behavior, so checking afterward is useless.
A better signature to express that the argument must not be null is to take a reference:
void func3(A& a)
{
a.method1();
}
int main ()
{
A a1(1); // no new, so no (missing) delete :-)
A a2(2);
func1(&a1);
func2(&a2);
func2(nullptr); // :/ still undefined behavior inside func2
func3(a1);
}
As far as I know, it has long been a good rule that a pointer-like argument to a function should be a pointer if the argument can sensibly be null, and a reference if the argument should never be null.
Based on that "rule", I naively expected that doing something like
someMethodTakingAnIntReference(*aNullPointer) would fail when trying to make the call, but to my surprise the following code is running just fine which kinda makes "the rule" less usable. A developer can still read meaning from the argument type being reference, but the compiler doesn't help and the location of the runtime error does not either.
Am I misunderstanding the point of this rule, or is this undefined behavior, or...?
int test(int& something1, int& something2)
{
return something2;
}
int main()
{
int* i1 = nullptr;
int* i2 = new int{ 7 };
//this compiles and runs fine returning 7.
//I expected the *i1 to cause an error here where test is called
return test(*i1, *i2);
}
While the above works, the following obviously does not; but the same would be true if the references were just pointers, meaning that the rule and the compiler are not really helping.
int test(int& something1, int& something2)
{
return something1+something2;
}
int main()
{
int* i1 = nullptr;
int* i2 = new int{ 7 };
//this compiles, but does not run correctly because something1 is actually read.
//I expected the *i1 to cause an error here where test is called
return test(*i1, *i2);
}
Writing test(*i1, *i2) causes undefined behaviour; specifically the part *i1. This is covered in the C++ Standard by [expr.unary.op]/1:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
This defines the behaviour of *X only for the case where X points to an object or function. Since i1 does not point to an object or function, the standard does not define the behaviour of *i1, therefore it is undefined behaviour. (This is sometimes known as "undefined by omission", and this same practice handles many other uses of lvalues that don't designate objects).
As described in the linked page, undefined behaviour does not necessitate any sort of diagnostic message. The runtime behaviour could literally be anything. The compiler could, but is not required to, generate a compilation warning or error. In general, it's up to the programmer to comply with the rules of the language. The compiler helps out to some extent but it cannot cover all cases.
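Since the compiler will not check this for you, the check has to live at the point where a pointer is turned into a reference. Below is a minimal sketch of that idea; deref_checked, deref_asserted and test_from_pointers are hypothetical helper names, and throwing std::invalid_argument is just one possible policy.

```cpp
#include <cassert>
#include <stdexcept>

int test(int& something1, int& something2) {
    return something1 + something2;
}

// Turn a pointer into a reference, refusing null at run time.
int& deref_checked(int* p) {
    if (p == nullptr) {
        throw std::invalid_argument("null pointer where an object is required");
    }
    return *p;
}

// Debug-only variant: the check disappears when NDEBUG is defined.
int& deref_asserted(int* p) {
    assert(p != nullptr);
    return *p;
}

int test_from_pointers(int* p1, int* p2) {
    // *p is only evaluated after the null check has passed.
    return test(deref_checked(p1), deref_checked(p2));
}
```

With this in place, the error shows up at the call site (as an exception) instead of somewhere inside test, which is closer to what the question expected from the "rule".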
You're better off thinking of references as little more than a handy notation for pointers.
They are still pointers, and the runtime error occurs when you use (dereference) a null pointer, not when you pass it to a function.
(An added advantage of references is that they can not be changed to reference something else, once initialized.)
Consider following program:
#include <iostream>
void f(void* a)
{
std::cout<<"(void*)fun is called\n";
std::cout<<*(int*)a<<'\n';
}
int main()
{
int a=9;
void* b=(int*)&a;
f(b);
return 0;
}
If I change the function call statement like this:
f(&b);
It still compiles fine & crashes at runtime. Why? What is the reason? Should I not get the compile time error? Because the correct way to call the function is f(b). right? Also, why is it allowed to pass NULL to a function whose parameter is of type void*?
Please correct me if I am missing something or understanding something incorrectly.
It still compiles fine & crashes at runtime. Why? What is the reason?
Because void* is a technique for removing all type-safety and type checking.
Should I not get the compile time error?
By using void* instead of the correct pointer type int*, you are expressly telling the compiler not to tell you if you are using a type incorrectly or in an undefined way.
Because the correct way to call the function is f(b). right?
That's where your function declaration and contents disagree.
std::cout<<"(void*)fun is called\n";
std::cout<<*(int*)a<<'\n';
The contents above imply that a pointer to int should be passed:
void f(void* a)
This declaration implies some pointer should be passed, and no other restrictions are made.
void* can hold a pointer to any object type, and void** is no exception.
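The conversion rules behind this can be checked at compile time. The sketch below uses std::is_convertible from <type_traits>; the constant names are made up for illustration.

```cpp
#include <type_traits>

// Any object pointer converts implicitly to void*, and void** is an object pointer too.
constexpr bool kIntPtrToVoidPtr     = std::is_convertible<int*, void*>::value;
constexpr bool kVoidPtrPtrToVoidPtr = std::is_convertible<void**, void*>::value;  // why f(&b) compiles

// With a typed parameter the same mistake would be rejected at compile time.
constexpr bool kVoidPtrPtrToIntPtr  = std::is_convertible<void**, int*>::value;
constexpr bool kVoidPtrToIntPtr     = std::is_convertible<void*, int*>::value;    // needs an explicit cast in C++

static_assert(kIntPtrToVoidPtr && kVoidPtrPtrToVoidPtr, "void* accepts both");
static_assert(!kVoidPtrPtrToIntPtr && !kVoidPtrToIntPtr, "int* accepts neither implicitly");
```

So if f were declared as void f(int* a), the call f(&b) with void* b would fail to compile instead of misbehaving at run time.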
Ok.
As requested.
Do not use void pointers unless you cannot think of any other way around it.
And then go to bed and think again.
Void pointers enable the programmer to forget about types. This means that the compiler can give up on simple checks. This also, in my mind, means that the programmer has lost the plot.
Downvote me if you wish.
Using types has the luxury that the compiler can check things for you, e.g. how things are related and how to treat an object.
But when using a void pointer you are very much on your own. Good luck.
You won't get a compile-time error because f(&b) calls f and passes the address of b as the argument, which is implicitly converted to void*. You get a runtime error because the function then treats that address (a pointer to a pointer) as if it were a pointer to an int.
But yes, as others have stated, doing this is very bad.
First
You can have a void* point to void**. Your code is one of the many examples showing how dangerous void* pointers can be.
Second
For type conversions you should use:
void* b = static_cast<int*>(&a);
instead of the C-style conversion you are using:
void*b = (int*)&a;
If I change the function call statement like this:
f(&b);
It still compiles fine & crashes at runtime. Why?
Your function f(void*) will accept a pointer to any type, without complaint. Any pointer quietly converts to a void pointer. A pointer to a pointer is still a pointer. So your second case does indeed compile just fine. And then it crashes. Maybe.
In the first case, you converted from a pointer to int to a pointer to void back to a pointer to int. Those round trip conversions through void* (and also through char*) must work. In the second case, you converted from a void** to a void* to an int*. Now you're invoking undefined behavior. Anything goes. On my computer, my compiler, your code runs just fine. It prints garbage. I was quite sure that your code wouldn't erase my hard drive, but it could. Anything goes with undefined behavior. Don't invoke undefined behavior.
The reason for supporting void* is historic. There is a lot of old C and C++ code that use void pointers. The only reason to write new C++ code that uses void pointers is if you need to interact with one of those old functions that use void pointers.
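To illustrate the round-trip rule from this answer, here is a small sketch. The function names read_int, read_int_typed and round_trip_demo are hypothetical; the point is only that converting back to the original pointer type is fine, while the typed version needs no such discipline from the caller.

```cpp
#include <cassert>

// Type-erased reader in the style of the question:
// the caller must pass a pointer that really points to an int.
int read_int(const void* p) {
    assert(p != nullptr);
    return *static_cast<const int*>(p);
}

// Typed alternative: passing anything but a pointer to int fails to compile.
int read_int_typed(const int* p) {
    assert(p != nullptr);
    return *p;
}

int round_trip_demo() {
    int a = 9;
    const void* erased = &a;  // int* -> void*: implicit conversion
    return read_int(erased);  // void* -> int* inside: back to the original type, well-defined
}
```

Passing the address of a void* variable to read_int would still compile, which is exactly the hole described above; read_int_typed closes it.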
I found myself here because I am working on a homework assignment that requests the same functionality. After combining each comment, this is what I came up with.
#include <iostream>
// A function that accepts a void pointer
void f(void* a)
{
std::cout<<"(void*)fun is called\n";
std::cout<< "value for a: " << *(int*)a << '\n';
}
int main() {
int a = 9;
void* c = static_cast<int*>(&a);
int b = 3;
f(&b);
f(c);
}
What happens when you dereference a pointer and pass the result by reference to a function?
Here is a simple example
int& returnSame( int &example ) { return example; }
int main()
{
int inum = 3;
int *pinum = & inum;
std::cout << "inum: " << returnSame(*pinum) << std::endl;
return 0;
}
Is there a temporary object produced?
Dereferencing the pointer doesn't create a copy; it creates an lvalue that refers to the pointer's target. This can be bound to the lvalue reference argument, and so the function receives a reference to the object that the pointer points to, and returns a reference to the same. This behaviour is well-defined, and no temporary object is involved.
If it took the argument by value, then that would create a local copy, and returning a reference to that would be bad, giving undefined behaviour if it were accessed.
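A quick way to convince yourself that no copy or temporary is involved is to compare addresses. This is a small sketch built on the question's returnSame; refers_to_original is a hypothetical helper name.

```cpp
#include <cassert>

int& returnSame(int& example) { return example; }

bool refers_to_original() {
    int inum = 3;
    int* pinum = &inum;
    int& result = returnSame(*pinum);  // *pinum is an lvalue naming inum itself
    result = 10;                       // writes through the reference...
    assert(inum == 10);                // ...and inum sees the change
    return &result == &inum;           // same address: no copy, no temporary
}
```

refers_to_original() returns true, because the reference that comes back is bound to inum itself.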
The Answer To Your Question As Written
No, this behavior is defined. Dereferencing a pointer or using a reference never constructs an object; construction only happens when the programmer asks for it, as in the following snippet, where the new expression creates a new int object (for a class type it would run the default constructor; a plain int is left uninitialized).
int* variable = new int;
As for what is really happening, as written, returnSame(*pinum) is the same variable as inum. If you feel like verifying this yourself, you could use the following snippet:
returnSame(*pinum) = 10;
std::cout << "inum: " << inum << std::endl;
Further Analysis
I'll start by correcting the code you provided, which doesn't look like it was compiled before posting. After edits, the one remaining error is on the first line:
int& returnSame( int &example ) { return example; } // semi instead of colon
Pointers and References
Pointers and references are typically implemented the same way by compilers; they differ in their use, not so much in their implementation. Pointer types and reference types store, as their value, the location of something else. Pointer dereferencing (using the * or -> operators) instructs the compiler to produce code to follow the pointer and perform the operation on the location it refers to rather than the value itself. No new data is allocated when you dereference a pointer (no constructors are called).
Using references works in much the same way, except the compiler automatically assumes that you want the value at the location rather than the location itself. As a matter of fact, you cannot manipulate the location held by a reference the way you can with a pointer: once initialized, a reference cannot be reseated (changed) without relying on undefined behavior. You can still obtain the address it refers to by applying the & operator to it. On many compilers it is even possible to end up with a NULL reference, but only by dereferencing a null pointer, which is itself undefined behavior; I don't recommend using them.
Snippet analysis
int *pinum = & inum;
Creates a pointer pointing to an existing variable, inum. The value of the pointer is the memory address that inum is stored in. Creating and using pointers will NOT call a constructor for a pointed-to object implicitly, EVER. This task is left to the programmer.
*pinum
Dereferencing a pointer effectively produces a regular variable. This variable may conceptually occupy the same space that another named variable uses, or it may not. In this case, *pinum and inum are the same variable. When I say "produces", it's important to note that no constructors are called. This is why you MUST initialize pointers before using them: Pointer dereferencing will NEVER allocate storage.
returnSame(*pinum)
This function takes a reference and returns the same reference. It's helpful to realize that this function could be written with pointers as well, and behave exactly the same way. References do not perform any initialization either, in that they do not call constructors. However, it is illegal to have an uninitialized reference, so running into uninitialized memory through them is not as common a mistake as with pointers. Your function could be rewritten to use pointers in the following way:
int* returnSamePointer( int *example ) { return example; }
In this case, you would not need to dereference the pointer before passing it, but you would need to dereference the function's return value before printing it:
std::cout << "inum: " << *(returnSamePointer(pinum)) << std::endl;
NULL References
Creating a NULL reference is dangerous: it requires dereferencing a null pointer, which is already undefined behavior, and any later use of the reference will typically cause a segmentation fault. On many compilers the check below appears to work, but the optimizer is entitled to assume that a reference never refers to null and may remove it. Again, I highly recommend not using these ever.
int& nullRef = *((int *) NULL); // undefined behavior: dereferences a null pointer
bool isRefNull = (&nullRef == NULL); // often true in practice, but not guaranteed
Summary
Pointer and Reference types are two different ways to accomplish the same thing
Most of the gotchas that apply to one apply to the other
Neither pointers nor references will call constructors or destructors for referenced values implicitly under any circumstances
Declaring a reference to a dereferenced pointer is perfectly legal, as long as the pointer is initialized properly
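The claim about constructors in this summary can be checked with a small counting type. This is only a sketch; Tracked and the two helper functions are hypothetical names introduced for the experiment.

```cpp
#include <cassert>

struct Tracked {
    static int constructions;
    Tracked() { ++constructions; }
    Tracked(const Tracked&) { ++constructions; }
};
int Tracked::constructions = 0;

void by_reference(Tracked&) {}
void by_value(Tracked) {}

int constructions_after_reference_use() {
    Tracked::constructions = 0;
    Tracked t;            // the only construction in this function
    Tracked* p = &t;
    Tracked& r = *p;      // binding a reference constructs nothing
    assert(&r == &t);     // r is just another name for t
    by_reference(*p);     // dereference and pass by reference: still nothing
    by_reference(r);
    return Tracked::constructions;
}

int constructions_after_by_value() {
    Tracked::constructions = 0;
    Tracked t;            // first construction
    by_value(t);          // pass by value: the copy constructor runs
    return Tracked::constructions;
}
```

The first function reports a single construction and the second reports two, the extra one being the copy made for the by-value parameter.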
A compiler doesn't "call" anything. It just generates code. Dereferencing a pointer would at the most basic level correspond to some sort of load instruction, but in the present code the compiler can easily optimize this away and just print the value directly, or perhaps shortcut directly to loading inum.
Concerning your "temporary object": Dereferencing a pointer always gives an lvalue.
Perhaps there's a more interesting question hidden in your question, though: How does the compiler implement passing function arguments as references?
Consider the following example:
class Base {
public:
int data_;
};
class Derived : public Base {
public:
void fun() { ::std::cout << "Hi, I'm " << this << ::std::endl; }
};
int main() {
Base base;
Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!
derived->fun();
return 0;
}
The function call is obviously undefined behavior according to the C++ standard. But on all available machines and compilers (VC2005/2008, gcc on RH Linux and SunOS) it works as expected (prints "Hi!"). Does anyone know of a configuration on which this code works incorrectly? Or maybe a more complicated example with the same idea (note that Derived shouldn't carry any additional data anyway)?
Update:
From standard 5.2.9/8:
An rvalue of type “pointer to cv1 B”, where B is a class type, can be
converted to an rvalue of type “pointer to cv2 D”, where D is a
class derived (clause 10) from B, if a valid standard conversion from
“pointer to D” to “pointer to B” exists (4.10), cv2 is the same
cvqualification as, or greater cvqualification than, cv1, and B is not
a virtual base class of D. The null pointer value (4.10) is converted
to the null pointer value of the destination type. If the rvalue of
type “pointer to cv1 B” points to a B that is actually a subobject of
an object of type D, the resulting pointer points to the enclosing
object of type D. Otherwise, the result of the cast is undefined.
And one more 9.3.1 (thanks #Agent_L):
If a nonstatic member function of a class X is called for an object
that is not of type X, or of a type derived from X, the behavior is
undefined.
Thanks,
Mike.
The function fun() doesn't actually do anything that depends on what the this pointer refers to, and as it isn't a virtual function, nothing special is needed to look up the function. Basically, it's called like any normal (non-member) function, with a bad this pointer. It just doesn't crash, which is perfectly valid undefined behavior (if that's not a contradiction).
The comments to the code are incorrect.
Derived *derived = static_cast<Derived*>(&base);
derived->fun(); // Undefined behavior!
Corrected version:
Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!
derived->fun(); // Uses result of undefined behavior
The undefined behavior starts with the static_cast. Any subsequent use of this ill-begotten pointer is also undefined behavior. Undefined behavior is a get out of jail free card for compiler vendors. Almost any response by the compiler is compliant with the standard.
There's nothing to stop the compiler from rejecting your cast. A nice compiler might well issue a fatal compilation error for that static_cast. The violation is easy to see in this case. In general it is not easy to see, so most compilers don't bother checking.
Most compilers instead take the easiest way out. In this case, the easy way out is to simply pretend that that pointer to an instance of class Base is a pointer to an instance of class Derived. Since your function Derived::fun() is rather benign, the easy way out in this case yields a rather benign result.
Just because you are getting a nice benign result does not mean everything is cool. It is still undefined behavior. The best bet is to never rely on undefined behavior.
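For contrast, here is a sketch of downcasts that stay within the rules. It assumes Base can be given a virtual destructor so that dynamic_cast is able to check the dynamic type, which is a change from the question's Base; the return value 42 and the helper names are hypothetical, chosen only to make the result observable.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;  // assumption: Base is polymorphic so dynamic_cast can check
    int data_ = 0;
};

struct Derived : Base {
    int fun() const { return 42; }  // hypothetical return value for the demonstration
};

// Checked downcast: returns -1 when b is not actually a Derived.
int call_fun_if_derived(Base& b) {
    if (Derived* d = dynamic_cast<Derived*>(&b)) {
        return d->fun();
    }
    return -1;
}

// static_cast is fine when the Base really is a subobject of a Derived.
int call_fun_known_derived() {
    Derived object;
    Base* as_base = &object;
    Derived* back = static_cast<Derived*>(as_base);
    assert(back == &object);
    return back->fun();
}
```

With a plain Base object, call_fun_if_derived reports -1 instead of calling fun() on something that is not a Derived.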
Run the same code an infinite number of times on the same machine, and maybe you will see it working incorrectly and unexpectedly if you're lucky.
The thing to understand is that undefined behavior (UB) does not mean that it will definitely not run as expected; it might run as expected 1 time, 2 times, 10 times, even an infinite number of times. UB simply means it is not guaranteed to run as expected.
You have to understand what your code is doing, then you can see it's doing nothing wrong.
"this" is a hidden pointer, generated for you by the compiler.
class Base
{
public:
int data_;
};
class Derived : public Base
{
};
void fun(Derived* pThis)
{
::std::cout << "Hi, I'm " << pThis << ::std::endl;
}
//because you're JUST getting numerical value of a pointer, it can be same as:
void fun(void* pThis)
{
::std::cout << "Hi, I'm " << pThis << ::std::endl;
}
//but hey, even this is still same:
void fun(unsigned int pThis)
{
::std::cout << "Hi, I'm " << pThis << ::std::endl;
}
Now it's obvious: this function cannot fail. You can even pass NULL, or some other, completely unrelated class.
The behaviour is undefined, but there is nothing that can go wrong here.
//Edit: ok, according to Standard, the situations are not equal. ((Derived*)NULL)->fun(); is explicitly declared UB. However, this behaviour is usually defined in compiler docs about calling conventions.
I should have written "For all compilers that I know, nothing can go wrong."
For example, the compiler may optimize the code out.
Consider a slightly different program:
if(some_very_complex_condition)
{
// here is your original snippet:
Base base;
Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!
derived->fun();
}
The compiler can
(1) detect the undefined behaviour
(2) assume that the program shouldn't exhibit undefined behavior
Therefore (the compiler decides that) _some_very_complex_condition_ should be always false. Assuming this, the compiler may eliminate the whole code as not reachable.
[edit] A real-world example of how the compiler may eliminate code which "serves" a UB case:
Why does integer overflow on x86 with GCC cause an infinite loop?
The practical reason why this code often works is that anything which breaks this tends to be optimized out in release/optimized-for-performance builds. However, any compiler setting that focuses on finding errors (such as debug builds) is more likely to trip on this.
In those cases, your assumption ("note, that Derived shouldn't carry any additional data anyway") doesn't hold. It definitely should, to facilitate debugging.
A slightly more complicated example is even trickier:
class Base {
public:
int data_;
virtual void bar() { std::cout << "Base\n"; }
};
class Derived : public Base {
public:
void fun() { ::std::cout << "Hi, I'm " << this << ::std::endl; }
virtual void bar() { std::cout << "Derived\n"; }
};
int main() {
Base base;
Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!
derived->fun();
derived->bar();
}
Now a reasonable compiler may decide to skip the vtable and statically call Base::bar() since that's the object you're calling bar() on. Or it may decide that derived must point to a real Derived since you called fun on it, skip the vtable, and call Derived::bar(). As you see, both optimizations are quite reasonable given the circumstances.
And in this we see why Undefined Behavior can be so surprising: compilers can make incorrect assumptions following code with UB, even if the statement itself is compiled right.