Is "assert (this)" a viable pattern? - c++

Consider the C++ code below. Before calling a->method1() it has an
assert (a) to check whether a is sane.
The call a->method2() has no such assertion; instead, method2 itself
checks for a valid this by means of assert (this).
Is that viable code with respect to the C++ specification?
Even if it's covered by the standard, it's not good style of course, and
it's error-prone if the code ever changes, e.g. if the method is
refactored to a virtual method. I am just curious about what the
standard has to say, and whether the g++ code works by design or just by
accident.
The code below works as expected with g++, i.e. the assertion in
method2 triggers as intended, because no this pointer is needed
just to call method2.
#include <iostream>
#include <cassert>

struct A
{
    int a;

    A (int a) : a(a) {}

    void method1 ()
    {
        std::cout << a << std::endl;
    }

    void method2 ()
    {
        assert (this);
        std::cout << a << std::endl;
    }
};
void func1 (A *a)
{
    assert (a);
    a->method1();
}

void func2 (A *a)
{
    a->method2();
}

int main ()
{
    func1 (new A (1));
    func2 (new A (2));
    func2 (nullptr);
}
Output
1
2
Assertion failed: this, file main.cpp, line 16

Even if it's [permitted] by the standard
It isn't.
it's not good style of course
Nope.
and it's error prone if the code ever changes, e.g. if the method is refactored to a virtual method.
I concede that a virtual member function is more likely to cause a "crash" here, but you already have undefined behaviour and that's not just a theoretical concern: you can expect things like the assertion or conditions to be elided, or other weird things to happen.
This pattern is a big no-no.
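To see why, consider a minimal sketch (mine, not from the original post): because a valid program can never have a null this, an optimizer is entitled to fold the check to "always true" and remove it, so the assert can silently vanish in exactly the builds where you would want it.

#include <cassert>
#include <cstdlib>

struct B
{
    void check ()
    {
        assert (this);          // the compiler may compile this away entirely
        if (this == nullptr)    // likewise, this branch may be removed at -O2
            std::abort();
    }
};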
I am just curious about what the standard has to say
It says:
[expr.ref/2] [..] For the second option (arrow) the first expression shall be a prvalue having pointer type. The expression E1->E2 is converted to the equivalent form (*(E1)).E2 [..]
[expr.unary.op/1] The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. [..]
Notice that it doesn't explicitly say "the object must exist", but by saying that the expression refers to the object, it implicitly tells us that there must be an object. This sort of "gap" falls directly into the definition of undefined behaviour, by design.
whether the g++ code works by design or just by accident.
The latter.

Answering your question up front: "C++: Is "assert (this)" a viable pattern?" - No.
assert(this); is pointless. The C++ standard guarantees that the this pointer is never nullptr in valid programs.
If your program has undefined behaviour then all bets are, of course, off and this might be nullptr. But an assert is not the correct fix in that case, fixing the UB is.
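The usual shape of that fix, as a minimal sketch (names are illustrative, not from the question): validate the pointer at the call site, before any member call, while the behaviour is still defined.

#include <cassert>

struct A { void method2() {} };

void call_checked (A *a)
{
    assert (a != nullptr);  // check here, before the UB would occur
    a->method2();
}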

this cannot be nullptr (else there is already undefined behavior).
in your case
a->method2(); // with a == nullptr
invokes undefined behavior, so checking afterward is useless.
A better signature to express a non-null pointer is a reference:
void func3(A& a)
{
    a.method1();
}

int main ()
{
    A a1(1); // no new, so no (missing) delete :-)
    A a2(2);

    func1(&a1);
    func2(&a2);
    func2(nullptr); // :/
    func3(a1);
}

Related

Calling non-static member function outside of object's lifetime in C++17

Does the following program have undefined behavior in C++17 and later?
struct A {
    void f(int) { /* Assume there is no access to *this here */ }
};

int main() {
    auto a = new A;
    a->f((a->~A(), 0));
}
C++17 guarantees that a->f is evaluated to the member function of the A object before the call's argument is evaluated. Therefore the indirection from -> is well-defined. But before the function call is entered, the argument is evaluated and ends the lifetime of the A object (see however the edits below). Does the call still have undefined behavior? Is it possible to call a member function of an object outside its lifetime in this way?
The value category of a->f is prvalue by [expr.ref]/6.3.2 and [basic.life]/7 does only disallow non-static member function calls on glvalues referring to the after-lifetime object. Does this imply the call is valid? (Edit: As discussed in the comments I am likely misunderstanding [basic.life]/7 and it probably does apply here.)
Does the answer change if I replace the destructor call a->~A() with delete a or new(a) A (with #include<new>)?
Some elaborating edits and clarifications on my question:
If I were to separate the member function call and the destructor/delete/placement-new into two statements, I think the answers are clear:
a->~A(); a->f(0): UB, because of a non-static member call on a outside its lifetime (see edit below, though)
delete a; a->f(0): same as above
new(a) A; a->f(0): well-defined, call on the new object
However in all these cases a->f is sequenced after the first respective statement, while this order is reversed in my initial example. My question is whether this reversal does allow for the answers to change?
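For concreteness, the three separated variants side by side (a compressed sketch; the placement-new line would additionally need #include <new>):

auto a = new A;

a->~A();      a->f(0);   // (1) member call on an object outside its lifetime
// delete a;  a->f(0);   // (2) as above, and the storage is released too
// new (a) A; a->f(0);   // (3) well-defined: f is called on the new object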
For standards before C++17, I initially thought that all three cases cause undefined behavior, simply because the evaluation of a->f depends on the value of a, but is unsequenced relative to the evaluation of the argument, which causes a side-effect on a. However, this is undefined behavior only if there is an actual side-effect on a scalar value, e.g. writing to a scalar object. No scalar object is written to, because A is trivial, and therefore I would also be interested in exactly which constraint is violated for standards before C++17 as well. In particular, the case with placement-new seems unclear to me now.
I just realized that the wording about the lifetime of objects changed between C++17 and the current draft. In n4659 (C++17 draft) [basic.life]/1 says:
The lifetime of an object o of type T ends when:
if T is a class
type with a non-trivial destructor (15.4), the destructor call starts
[...]
while the current draft says:
The lifetime of an object o of type T ends when:
[...]
if T is a class type, the destructor call starts, or
[...]
Therefore, I suppose my example does have well-defined behavior in C++17, but not in the current (C++20) draft, because the destructor call is trivial and the lifetime of the A object isn't ended. I would appreciate clarification on that as well. My original question still stands, even for C++17, for the case of replacing the destructor call with a delete or placement-new expression.
If f accesses *this in its body, then there may be undefined behavior for the cases of the destructor call and the delete expression; however, in this question I want to focus on whether the call in itself is valid or not.
Note however that the variation of my question with placement-new would potentially not have an issue with member access in f, depending on whether the call itself is undefined behavior or not. But in that case there might be a follow-up question especially for the case of placement-new because it is unclear to me, whether this in the function would then always automatically refer to the new object or whether it might need to potentially be std::laundered (depending on what members A has).
While A does have a trivial destructor, the more interesting case is probably where it has some side effect about which the compiler may want to make assumptions for optimization purposes. (I don't know whether any compiler uses something like this.) Therefore, I welcome answers for the case where A has a non-trivial destructor as well, especially if the answer differs between the two cases.
Also, from a practical perspective, a trivial destructor call probably does not affect the generated code and (unlikely?) optimizations based on undefined behavior assumptions aside, all code examples will most likely generate code that runs as expected on most compilers. I am more interested in the theoretical, rather than this practical perspective.
This question intends to get a better understanding of the details of the language. I do not encourage anyone to write code like that.
It’s true that trivial destructors do nothing at all, not even end the lifetime of the object, prior to (the plans for) C++20. So the question is, er, trivial unless we suppose a non-trivial destructor or something stronger like delete.
In that case, C++17’s ordering doesn’t help: the call (not the class member access) uses a pointer to the object (to initialize this), in violation of the rules for out-of-lifetime pointers.
Side note: if just one order were undefined, so would be the “unspecified order” prior to C++17: if any of the possibilities for unspecified behavior are undefined behavior, the behavior is undefined. (How would you tell the well-defined option was chosen? The undefined one could emulate it and then release the nasal demons.)
The postfix expression a->f is sequenced before the evaluation of any arguments (which are indeterminately sequenced relative to one another). (See [expr.call])
The evaluation of the arguments is sequenced before the body of the function (even inline functions, see [intro.execution])
The implication, then, is that calling the function itself is not undefined behavior. However, accessing any member variables or calling other member functions within it would be UB per [basic.life].
So the conclusion is that this specific instance is safe per the wording, but a dangerous technique in general.
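A sketch of where that boundary lies (my illustration of this answer's reading, not normative wording):

struct A {
    int m = 0;

    void f(int)
    {
        // ++m;  // would be UB: *this is outside its lifetime here
    }            // merely entering and leaving the body is the part claimed safe
};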
You seem to assume that a->f(0) has these steps (in that order for the most recent C++ standard, in some logical order for previous versions):
evaluating *a
evaluating a->f (a so-called bound member function)
evaluating 0
calling the bound member function a->f on the argument list (0)
But a->f doesn't have either a value or a type. It's essentially a non-thing, a meaningless syntax element needed only because the grammar decomposes member access and function call, even in a member function call, which by definition combines member access and function call.
So asking when a->f is "evaluated" is a meaningless question: there is no such thing as a distinct evaluation step for the value-less, type-less expression a->f.
So any reasoning based on such discussions of the order of evaluation of a non-entity is also null and void.
EDIT:
Actually, it's worse than what I wrote: the expression a->f has a phony "type":
E1.E2 is “function of parameter-type-list cv returning T”.
"function of parameter-type-list cv" isn't even something that would be a valid declarator outside a class: one cannot have f() const as a declarator as in a global declaration:
int ::f() const; // meaningless
And inside a class f() const doesn't mean "function of parameter-type-list=() with cv=const”, it means member-function (of parameter-type-list=() with cv=const). There is no proper declarator for proper "function of parameter-type-list cv". It can only exist inside a class; there is no type "function of parameter-type-list cv returning T" that can be declared or that real computable expressions can have.
In addition to what others said: as written, the program calls a->~A() but never delete a, so it has a memory leak, which by itself is technically not undefined behavior.
However, if you then called delete a; to prevent the leak, that would have been undefined behavior, because delete would call a->~A() a second time [Section 12.4/14].
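As an aside, a minimal sketch (mine, not from this answer) of how storage can be released after an explicit destructor call without running ~A() twice:

#include <new>

void destroy_and_free (A *a)
{
    a->~A();                // explicit destructor call ends the object
    ::operator delete(a);   // frees the raw storage; does NOT call ~A() again
}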
a->~A()
Otherwise, in reality this is as others suggested: the compiler generates machine code along the lines of A* a = malloc(sizeof(A)); a->A(); a->~A(); a->f(0);.
Since there are no member variables or virtual functions, all three member functions are empty ({return;}) and do nothing. The pointer a even still points to valid memory.
It will run, but a debugger may complain of a memory leak.
However, using any non-static member variables inside f() could have been undefined behavior, because you would be accessing them after they are (implicitly) destroyed by the compiler-generated ~A(). That would likely result in a runtime error if it was something like std::string or std::vector.
delete a
If you replaced a->~A() with an expression that invoked delete a; instead, then I believe this would have been undefined behavior because the pointer a is no longer valid at that point.
Despite that, the code should still run without errors because the function f() is empty. If it accessed any member variables, it might have crashed or led to random results because the memory for a has been deallocated.
new(a) A
auto a = new A; new(a) A; is itself undefined behavior because you are calling A() a second time on the same memory.
In that case calling f() by itself would be valid because a exists, but constructing a twice is UB.
It will run fine if A does not contain any objects whose constructors allocate memory and such. Otherwise it could lead to memory leaks, etc., but f() would access the "second" copy of them just fine.
I'm not a language lawyer, but I took your code snippet and modified it slightly. I wouldn't use this in production code, but it seems to produce valid, defined results...
#include <iostream>
#include <exception>
#include <cstdlib>

struct A {
    int x{5};
    void f(int) {}
    int g() { std::cout << x << '\n'; return x; }
};

int main() {
    try {
        auto a = new A;
        a->f((a->~A(), a->g()));
    } catch(const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
I'm running Visual Studio 2017 CE with the compiler language flag set to /std:c++latest, my IDE's version is 15.9.16, and I get the following console output and program exit status:
console output
5
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
So this does seem to be defined in the case of Visual Studio; I'm not sure how other compilers will treat this. The destructor is being invoked, but the variable a still lives in dynamic heap memory.
Let's try another slight modification:
#include <iostream>
#include <exception>
#include <cstdlib>

struct A {
    int x{5};
    void f(int) {}
    int g(int y) { x += y; std::cout << x << '\n'; return x; }
};

int main() {
    try {
        auto a = new A;
        a->f((a->~A(), a->g(3)));
    } catch(const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
console output
8
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
This time let's not change the class any more, but let's make a call on a's member afterwards...
int main() {
    try {
        auto a = new A;
        a->f((a->~A(), a->g(3)));
        a->g(2);
    } catch( const std::exception& e ) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
console output
8
10
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
Here it appears that a->x is maintaining its value after a->~A() is called, since new was called on A and delete has not yet been called.
Even more: if I remove the new and use a pointer to a stack object instead of dynamically allocated heap memory:
int main() {
    try {
        A b;
        A* a = &b;
        a->f((a->~A(), a->g(3)));
        a->g(2);
    } catch( const std::exception& e ) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
I'm still getting:
console output
8
10
IDE exit status output
When I change my compiler's language flag from /std:c++latest to /std:c++17 I get exactly the same results.
What I'm seeing from Visual Studio appears to be well defined, without producing any UB, within the context of what I've shown. However, from a language perspective, where the standard is concerned, I wouldn't rely on this type of code either. The above also doesn't consider a class with internal pointers, with both automatic (stack) and dynamic (heap) storage, where the constructor calls new on those internal objects and the destructor calls delete on them.
There are also a bunch of other factors besides the compiler's language setting, such as optimizations, calling conventions, and various other compiler flags. It is hard to say, and I don't have an available copy of the full latest draft standard to investigate this any deeper. Maybe this can help you, others who are able to answer your question more thoroughly, and other readers to visualize this type of behavior in action.

Make argument a reference and not a pointer, if null is not a valid value

It has, as far as I know, been a good rule that a pointer-like argument to a function should be a pointer if the argument can sensibly be null, and a reference if it should never be null.
Based on that "rule", I naively expected that doing something like
someMethodTakingAnIntReference(*aNullPointer) would fail when making the call, but to my surprise the following code runs just fine, which kind of makes "the rule" less useful. A developer can still read meaning into the argument type being a reference, but the compiler doesn't help, and the location of the runtime error doesn't either.
Am I misunderstanding the point of this rule, or is this undefined behavior, or...?
int test(int& something1, int& something2)
{
    return something2;
}

int main()
{
    int* i1 = nullptr;
    int* i2 = new int{ 7 };

    //this compiles and runs fine returning 7.
    //I expected the *i1 to cause an error here where test is called
    return test(*i1, *i2);
}
While the above works, obviously the following does not; but the same would be true if the references were just pointers, meaning that neither the rule nor the compiler is really helping.
int test(int& something1, int& something2)
{
    return something1 + something2;
}

int main()
{
    int* i1 = nullptr;
    int* i2 = new int{ 7 };

    //this compiles but fails at runtime inside test, where the null is used,
    //not here at the call site where I expected the error
    return test(*i1, *i2);
}
Writing test(*i1, *i2) causes undefined behaviour; specifically the part *i1. This is covered in the C++ Standard by [expr.unary.op]/1:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
This defines the behaviour of *X only for the case where X points to an object or function. Since i1 does not point to an object or function, the standard does not define the behaviour of *i1, therefore it is undefined behaviour. (This is sometimes known as "undefined by omission", and this same practice handles many other uses of lvalues that don't designate objects).
As described in the linked page, undefined behaviour does not necessitate any sort of diagnostic message. The runtime behaviour could literally be anything. The compiler could, but is not required to, generate a compilation warning or error. In general, it's up to the programmer to comply with the rules of the language. The compiler helps out to some extent but it cannot cover all cases.
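If callers might legitimately pass null, one option (a sketch of mine, not from this answer) is to keep the pointer signature and check it explicitly; the reference version documents the non-null precondition but cannot enforce it at run time:

#include <cassert>

int test_checked(int* something1, int* something2)
{
    assert(something1 != nullptr && something2 != nullptr); // catches the test(*i1, *i2) misuse early
    return *something2;
}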
You're better off thinking of references as little more than a handy notation for pointers.
They are still pointers, and the runtime error occurs when you use (dereference) a null pointer, not when you pass it to a function.
(An added advantage of references is that they can not be changed to reference something else, once initialized.)

Strange behavior in casting of function pointers in C++

I have recently encountered a behavior in C++ regarding function pointers, that I can't fully understand. I asked Google for help as well as some of my more experienced colleagues, but even they couldn't help.
The following code showcases this mystique behavior:
#include <iostream>

class MyClass {
private:
    int i;
public:
    MyClass() : i(0) {}
    MyClass(int i) : i(i) {}
    void PrintText() const { std::cout << "some text " << std::endl; }
};

typedef void (*MyFunction) (void*);

void func(MyClass& mc) {
    mc.PrintText();
}

int main() {
    void* v_mc = new MyClass;
    MyFunction f = (MyFunction) func; //It works!
    f(v_mc); //It works correctly!!!
    return 0;
}
So, first I define a simple class that will be used later (in particular, its member function PrintText). Then, I define the name MyFunction for the type void (*)(void*) - a pointer to a function that takes one void* parameter and doesn't return a value.
After that, I define the function func() that accepts a reference to a MyClass object and calls its method PrintText.
And finally, the magic happens in the main function. I dynamically allocate memory for a new MyClass object, casting the returned pointer to void*. Then, I cast the pointer to the func() function to a MyFunction pointer - I didn't expect this to compile at all, but it does.
And finally, I call through this new pointer with a void* argument, even though the underlying function (func()) accepts a reference to a MyClass object. And everything works correctly!
I tried compiling this code with both Visual Studio 2010 (Windows) and XCode 5 (OSX) and it works in the same manner - no warnings are reported whatsoever. I imagine the reason why this works is that C++ references are actually implemented as pointers behind the scenes but this is not an explanation.
I hope someone can explain this behavior.
The formal explanation is simple: undefined behaviour is undefined. When you call a function through a pointer to a different function type, it's undefined behaviour and the program can legally do anything (crash, appear to work, order pizza online ... anything goes).
You can try reasoning about why the behaviour you're experiencing happens. It's probably a combination of one or more of these factors:
Your compiler internally implements references as pointers.
On your platform, all pointers have the same size and binary representation.
Since PrintText() doesn't access *this at all, the compiler can effectively ignore the value of mc altogether and just call the PrintText() function inside func.
However, you must remember that while you're currently experiencing the behaviour you've described on your current platform, compiler version and under this phase of the moon, this could change at any time for no apparent reason whatsoever (such as a change in surrounding code triggering different optimisations). Remember that undefined behaviour is simply undefined.
As to why you can cast &func to MyFunction - the standard explicitly allows that (with a reinterpret_cast, to which the C-style cast translates in this context). You can legally cast a pointer to function to any other pointer to function type. However, pretty much the only thing you can legally do with it is move it around or cast it back to the original type. As I said above, if you call through a function pointer of the wrong type, it's undefined behaviour.
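A short sketch of the one well-defined use mentioned above: casting the pointer back to its original type before calling through it (MyClass, func and MyFunction are the question's names; demo is my illustrative addition):

void demo(MyClass& mc)
{
    MyFunction f = reinterpret_cast<MyFunction>(&func);  // legal: the conversion itself is fine
    auto g = reinterpret_cast<void (*)(MyClass&)>(f);    // legal: cast back to the original type
    g(mc);      // well-defined: g has the function's real type again
    // f(&mc);  // undefined behaviour: calling through the wrong function type
}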
I hope someone can explain this behavior.
The behaviour is undefined.
MyFunction f = (MyFunction) func; //It works!
It "works" because you use c-style cast which has the same effect as reinterpret_cast in this case I think. If you had used static_cast or simply not cast at all, the compiler would have warned of your mistake and failed. When you call the wrongly interpreted function pointer, you get undefined behaviour.
It's only by chance that it works. Compilers are not guaranteed to make it work. Behind the scenes, your compiler is treating the reference as a pointer, so your alternative function signature just happens to work.
I'm sorry, but it isn't clear to me why you call this strange behavior; I don't see undefined behavior that depends on the moon cycle here, this is the way function pointers are used in C.
Adding some debug output, you may see that the pointer to the object remains the same in all the calls.
void PrintText() const { std::cout << "some text " << this << std::endl; } // <- print this

void func(MyClass& mc) {
    std::cout << (void *)&mc << std::endl; // <- print &mc
    ...
}

void* v_mc = new MyClass;
std::cout << (void *)v_mc << std::endl; // <- print v_mc

What is the point of noreturn?

[dcl.attr.noreturn] provides the following example:
[[ noreturn ]] void f() {
    throw "error"; // OK
}
but I do not understand what the point of [[noreturn]] is, since the return type of the function is already void.
So, what is the point of the noreturn attribute? How is it supposed to be used?
The [[noreturn]] attribute is supposed to be used for functions that don't return to the caller. That doesn't mean void functions (which do return to the caller - they just don't return a value), but functions where the control flow will not return to the calling function after the function finishes (e.g. functions that exit the application, loop forever or throw exceptions as in your example).
This can be used by compilers to make some optimizations and generate better warnings. For example if f has the [[noreturn]] attribute, the compiler could warn you about g() being dead code when you write f(); g();. Similarly the compiler will know not to warn you about missing return statements after calls to f().
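A small sketch of both effects (my example; fail and parse_positive are illustrative names):

#include <stdexcept>

[[noreturn]] void fail(const char* msg)
{
    throw std::runtime_error(msg);
}

int parse_positive(int v)
{
    if (v < 0)
        fail("negative value"); // control never comes back here
    return v;                   // and no "missing return" warning for the v < 0 path
}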
noreturn doesn't tell the compiler that the function doesn't return any value. It tells the compiler that control flow will not return to the caller. This allows the compiler to make a variety of optimizations -- it need not save and restore any volatile state around the call, it can dead-code eliminate any code that would otherwise follow the call, etc.
It means that the function will not complete. The control flow will never hit the statement after the call to f():
void g() {
    f();
    // unreachable:
    std::cout << "No! That's impossible" << std::endl;
}
The information can be used by the compiler/optimizer in different ways. The compiler can add a warning that the code above is unreachable, and it can modify the actual code of g() in different ways for example to support continuations.
Previous answers correctly explained what noreturn is, but not why it exists. I don't think the "optimization" comments describe the main purpose: functions which do not return are rare and usually do not need to be optimized. Rather, I think the main raison d'être of noreturn is to avoid false-positive warnings. For example, consider this code:
#include <cstdlib>

int f(bool b) {
    if (b) {
        return 7;
    } else {
        abort();
    }
}
Had abort() not been marked "noreturn", the compiler might have warned about this code having a path where f does not return an integer as expected. But because abort() is marked noreturn, the compiler knows the code is correct.
Type-theoretically speaking, void is what other languages call unit or top. Its logical equivalent is True. Any value can be legitimately cast to void (every type is a subtype of void). Think about it as the "universe" set; there are no operations common to all the values in the world, so there are no valid operations on a value of type void. Put another way, telling you that something belongs to the universe set gives you no information whatsoever - you know it already. So the following is sound:
(void)5;
(void)foo(17); // whatever foo(17) does
But the assignment below is not:
void raise();

void f(int y) {
    int x = y != 0 ? 100/y : raise(); // raise() returns void, so what should x be?
    cout << x << endl;
}
[[noreturn]], on the other hand, is sometimes called empty, Nothing, Bottom or Bot and is the logical equivalent of False. It has no values at all, and an expression of this type can be cast to (i.e. is a subtype of) any type. This is the empty set. Note that if someone tells you "the value of the expression foo() belongs to the empty set", it is highly informative - it tells you that this expression will never complete its normal execution; it will abort, throw or hang. It is the exact opposite of void.
So the following does not make sense (pseudo-C++, since noreturn is not a first-class C++ type)
void foo();
(noreturn)5; // obviously a lie; the expression 5 does "return"
(noreturn)foo(); // foo() returns void, and therefore returns
But the assignment below is perfectly legitimate, since throw is understood by the compiler to not return:
void f(int y) {
    int x = y != 0 ? 100/y : throw exception();
    cout << x << endl;
}
In a perfect world, you could use noreturn as the return value for the function raise() above:
noreturn raise() { throw exception(); }
...
int x = y!=0 ? 100/y : raise();
Sadly C++ does not allow it, probably for practical reasons. Instead it gives you the ability to use [[ noreturn ]] attribute which helps guiding compiler optimizations and warnings.
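In real C++, the hypothetical raise() above would therefore be spelled with the attribute and a void return type; a sketch:

#include <exception>

[[noreturn]] void raise() { throw std::exception(); }

int f(int y)
{
    // raise() never returns; the trailing 0 exists only to give the
    // conditional expression a consistent type.
    int x = y != 0 ? 100 / y : (raise(), 0);
    return x;
}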

How can C++ compiler optimize incorrect hierarchical downcast to cause real undefined behavior

Consider the following example:
#include <iostream>

class Base {
public:
    int data_;
};

class Derived : public Base {
public:
    void fun() { ::std::cout << "Hi, I'm " << this << ::std::endl; }
};

int main() {
    Base base;
    Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!
    derived->fun();
    return 0;
}
The function call is obviously undefined behavior according to the C++ standard. But on all available machines and compilers (VC2005/2008, gcc on RH Linux and SunOS) it works as expected (prints the greeting). Does anyone know of a configuration on which this code works incorrectly? Or maybe a more complicated example with the same idea (note that Derived shouldn't carry any additional data anyway)?
Update:
From standard 5.2.9/8:
An rvalue of type “pointer to cv1 B”, where B is a class type, can be
converted to an rvalue of type “pointer to cv2 D”, where D is a
class derived (clause 10) from B, if a valid standard conversion from
“pointer to D” to “pointer to B” exists (4.10), cv2 is the same
cv-qualification as, or greater cv-qualification than, cv1, and B is not
a virtual base class of D. The null pointer value (4.10) is converted
to the null pointer value of the destination type. If the rvalue of
type “pointer to cv1 B” points to a B that is actually a subobject of
an object of type D, the resulting pointer points to the enclosing
object of type D. Otherwise, the result of the cast is undefined.
And one more, 9.3.1 (thanks @Agent_L):
If a nonstatic member function of a class X is called for an object
that is not of type X, or of a type derived from X, the behavior is
undefined.
Thanks,
Mike.
The function fun() doesn't actually do anything for which the value of the this pointer matters, and as it isn't a virtual function, there's nothing special needed to look up the function. Basically, it's called like any normal (non-member) function, with a bad this pointer. It just doesn't crash, which is perfectly valid undefined behavior (if that's not a contradiction).
The comments to the code are incorrect.
Derived *derived = static_cast<Derived*>(&base);
derived->fun(); // Undefined behavior!
Corrected version:
Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!
derived->fun(); // Uses result of undefined behavior
The undefined behavior starts with the static_cast. Any subsequent use of this ill-begotten pointer is also undefined behavior. Undefined behavior is a get out of jail free card for compiler vendors. Almost any response by the compiler is compliant with the standard.
There's nothing to stop the compiler from rejecting your cast. A nice compiler might well issue a fatal compilation error for that static_cast. The violation is easy to see in this case. In general it is not easy to see, so most compilers don't bother checking.
Most compilers instead take the easiest way out. In this case, the easy way out is to simply pretend that that pointer to an instance of class Base is a pointer to an instance of class Derived. Since your function Derived::fun() is rather benign, the easy way out in this case yields a rather benign result.
Just because you are getting a nice benign result does not mean everything is cool. It is still undefined behavior. The best bet is to never rely on undefined behavior.
Run the same code an infinite number of times on the same machine, and maybe you will see it working incorrectly and unexpectedly, if you're lucky.
The thing to understand is that undefined behavior (UB) does not mean that it will definitely not run as expected; it might run as expected 1 time, 2 times, 10 times, even an infinite number of times. UB simply means it is just not guaranteed to run as expected.
You have to understand what your code is doing, then you can see it's doing nothing wrong.
"this" is a hidden pointer, generated for you by the compiler.
class Base
{
public:
    int data_;
};

class Derived : public Base
{
};

void fun(Derived* pThis)
{
    ::std::cout << "Hi, I'm " << pThis << ::std::endl;
}

//because you're JUST getting the numerical value of a pointer, it can be the same as:
void fun(void* pThis)
{
    ::std::cout << "Hi, I'm " << pThis << ::std::endl;
}

//but hey, even this is still the same:
void fun(unsigned int pThis)
{
    ::std::cout << "Hi, I'm " << pThis << ::std::endl;
}
Now it's obvious: this function cannot fail. You can even pass NULL, or some other, completely unrelated class.
The behaviour is undefined, but there is nothing that can go wrong here.
//Edit: OK, according to the Standard, the situations are not equal: ((Derived*)NULL)->fun(); is explicitly declared UB. However, this behaviour is usually defined in compiler docs about calling conventions.
I should have written "For all compilers that I know, nothing can go wrong."
For example, the compiler may optimize the code out.
Consider a slightly different program:
if (some_very_complex_condition)
{
    // here is your original snippet:
    Base base;
    Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!
    derived->fun();
}
The compiler can
(1) detect the undefined behaviour, and
(2) assume that the program shouldn't exhibit undefined behaviour.
Therefore (the compiler decides that) some_very_complex_condition must always be false. Assuming this, the compiler may eliminate the whole block as unreachable.
[edit] A real world example how the compiler may eliminate code which "serves" UB case:
Why does integer overflow on x86 with GCC cause an infinite loop?
The practical reason why this code often works is that anything which breaks this tends to be optimized out in release/optimized-for-performance builds. However, any compiler setting that focuses on finding errors (such as debug builds) is more likely to trip on this.
In those cases, your assumption ("note, that Derived shouldn't carry any additional data anyway") doesn't hold. It definitely should, to facilitate debugging.
A slightly more complicated example is even trickier:
class Base {
public:
    int data_;
    virtual void bar() { std::cout << "Base\n"; }
};

class Derived : public Base {
public:
    void fun() { ::std::cout << "Hi, I'm " << this << ::std::endl; }
    virtual void bar() { std::cout << "Derived\n"; }
};

int main() {
    Base base;
    Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!
    derived->fun();
    derived->bar();
}
Now a reasonable compiler may decide to skip the vtable and statically call Base::bar() since that's the object you're calling bar() on. Or it may decide that derived must point to a real Derived since you called fun on it, skip the vtable, and call Derived::bar(). As you see, both optimizations are quite reasonable given the circumstances.
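Expressed as code (a sketch of the two rewrites the optimizer could legitimately pick; neither is something the compiler literally prints):

// Either devirtualization is permissible once the UB has happened:
void variant1(Derived* d) { d->Base::bar(); }    // static call: "&base is really a Base"
void variant2(Derived* d) { d->Derived::bar(); } // static call: "fun() implies a real Derived"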
And in this we see why undefined behavior can be so surprising: compilers can make incorrect assumptions following code with UB, even when the statement itself compiles fine.