Is calling a function pointer to generated code undefined behavior? - c++

Is there a way, in purely standard C++, to somehow call a function generated within a buffer?
Let's assume:
We've used implementation-specific (and, obviously, architecture-specific) knowledge to generate the byte sequence implementing some function in an array of char.
This implementation correctly matches the implementation's calling convention, and other relevant parts of the ABI, so that it will actually function correctly if called as a C++ function.
The buffer is correctly marked as executable.
I.e., all the architecture / ABI hurdles have been overcome, all we need to do is actually call this block of code.
The question is: is there a standards-compliant way to create a function pointer to this and call said function pointer without hitting undefined behavior?
As I understand it, it is implementation-defined whether we can cast a pointer to an object type to a function pointer type. I also believe that it is implementation-defined whether the resulting pointer will point to the same logical address as the original pointer. The obstacle is the call: from the perspective of the implementation, all I've done is cast a pointer to an array of chars into a function pointer and then call it. If the target were a pointer to an object type, I know this would violate the strict aliasing rules and would be undefined behavior. But is it undefined behavior, or merely implementation-defined, what will occur if I attempt to call this function pointer?

As I understand your question, you ask about code such as:
char buffer[200] = "some valid operations";
void (*func)() = reinterpret_cast<void(*)()>(buffer);
func();
I see in the standard in C++ expr.call 7.6.1.3 Function call:
A function call is a postfix expression followed by parentheses containing a possibly empty, comma-separated list of initializer-clauses which constitute the arguments to the function.
[...]
The postfix expression shall have function type or function pointer type.
For a call to a non-member function or to a static member function, the postfix expression shall either be an lvalue that refers to a function [...], or have function pointer type.
The func in the above code is an lvalue that refers to a char array and not to a function. The following points in the standard describe other cases of function call expression, such as a virtual function call or a call to a destructor of some object. These points also do not apply here. To summarize, the standard does not define what will happen when the function call expression is applied to an lvalue that refers to an object that is a char array. Because the behavior is not defined, it is undefined behavior.
is there a standards-compliant way to create a function pointer to this
Just cast it, via reinterpret_cast. The resulting value is implementation-defined.
and call said function pointer without hitting undefined behavior?
No.
Is calling a function pointer to generated code undefined behavior?
Yes.
Is there a way, in purely standard C++, to somehow call a function generated within a buffer?
No.
But is it undefined behavior or merely implementation-defined what will occur if I attempt to call this function pointer?
Undefined behavior.
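For contrast, here is the case [expr.reinterpret.cast] does cover: a pointer to a real function converted to an object pointer type and back. Since C++11 this conversion is conditionally-supported, and where it is supported the round trip yields the original pointer value. A minimal sketch, where the names answer, Fn and round_trip are made up for illustration:

```cpp
// Sketch only: round-trips a pointer to a *real* function through void*.
// The conversion is conditionally-supported; this assumes an implementation
// that supports it (mainstream POSIX and Windows toolchains do).
int answer() { return 42; }

using Fn = int (*)();

Fn round_trip(Fn f) {
    void* erased = reinterpret_cast<void*>(f);  // function pointer -> object pointer
    return reinterpret_cast<Fn>(erased);        // and back to the original type
}
```

Calling round_trip(&answer)() is fine because the pointer really does point to a function. The char buffer case differs precisely in that no function exists there as far as the abstract machine is concerned.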

It is implementation-defined what the conversion means, so the implementation can choose to interpret it as a pointer to some function “defined by the implementation” (as if its standard library contained all possible functions). I’m not sure that implementations actually document support for this technique (as required for implementation-defined behavior), but it’s only reasonable to interpret that as a bug in their documentation.

Related

Is it undefined behavior to compare function pointers reinterpret_cast to void(*)()?

For example, can I make a std::set<void(*)()> and use reinterpret_cast to put arbitrary function pointers into it? I obviously wouldn’t be able to call them, but if I wanted to remember what functions I’ve seen, would seen.count(reinterpret_cast<void(*)()>(fn)) be defined behavior? For now I’m assuming the functions I care about aren’t overloaded.
This is fine. According to [expr.reinterpret.cast],
A function pointer can be explicitly converted to a function pointer of a different type.
It also says that converting a function pointer to a different function pointer type and back will yield the original pointer value.
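A minimal sketch of the "remember what I've seen" idea, assuming C++11. The names AnyFn, as_key, SeenFunctions, add and log_line are hypothetical and only serve the illustration. The converted pointers are compared and stored, never called:

```cpp
#include <cstddef>
#include <set>

using AnyFn = void (*)();

// Two unrelated functions with different signatures, used as sample keys.
int  add(int a, int b) { return a + b; }
void log_line(const char*) {}

// Erase the signature so differently typed functions share one key type.
// The result must not be called through AnyFn, only compared or cast back.
template <typename F>
AnyFn as_key(F* fn) { return reinterpret_cast<AnyFn>(fn); }

class SeenFunctions {
public:
    // Returns true if fn was not seen before.
    template <typename F>
    bool remember(F* fn) { return seen_.insert(as_key(fn)).second; }

    template <typename F>
    bool contains(F* fn) const { return seen_.count(as_key(fn)) != 0; }

    std::size_t size() const { return seen_.size(); }

private:
    std::set<AnyFn> seen_;  // ordered by std::less<AnyFn>
};
```

std::set orders its keys with std::less, which is specified to give a total order over pointers, so the container itself does not rely on the built-in < for function pointers.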

Why is calling non virtual member function on deleted pointer an undefined behavior?

As the title says:
Why is calling non virtual member function on deleted pointer an undefined behavior?
Note the Question does not ask if it is an Undefined Behavior, it asks Why it is undefined behavior.
Consider the following program:
#include <iostream>
class Myclass
{
    // int i;
public:
    void doSomething()
    {
        std::cout << "Inside doSomething";
        // i = 10;
    }
};
int main()
{
    Myclass *ptr = new Myclass;
    delete ptr;
    ptr->doSomething(); // call through a deleted pointer: the case being asked about
    return 0;
}
In the above code, the compiler does not actually dereference this while calling the member function doSomething(). Note that the function is not a virtual function, and compilers convert the member function call to a usual function call by passing this as the first parameter to the function (as I understand it, this is implementation-defined). They can do so because the compiler can determine exactly which function to call at compile time. So practically, calling the member function through a deleted pointer does not dereference this. The this pointer is dereferenced only if a member is accessed inside the function body (i.e. by uncommenting the code in the above example that accesses i).
If no member is accessed within the function, there is no reason why the above code should actually invoke undefined behavior.
So why does the standard mandate that calling the non virtual member function through a deleted pointer is undefined behavior, when in fact it could reliably say that dereferencing this should be the statement which causes undefined behavior? Is it merely for the sake of simplicity for users of the language that the standard generalizes it, or is there some deeper semantic involved in this mandate?
My feeling is that, since it is implementation-defined how compilers invoke the member function, that may be the reason the standard cannot enforce the actual point where UB occurs.
Can someone confirm?
Because the number of cases in which it might be reliable is so small, and doing it is still an ineffably stupid idea. There's no benefit to defining the behaviour.
So why does the standard mandate that calling the non virtual member function through deleted pointer is an undefined behavior, when in fact it can reliably say that dereferencing the this should be the statement which should cause undefined behavior?
[expr.ref] paragraph 2 says that a member function call such as ptr->doSomething() is equivalent to (*ptr).doSomething(), so calling a non-static member function is a dereference. If the pointer is invalid, that's undefined behaviour.
Whether the generated code actually needs to dereference the pointer for specific cases is not relevant; the abstract machine that the compiler models does do a dereference in principle.
Complicating the language to define exactly which cases would be allowed as long as they don't access any members would have almost zero benefit. In the case where you can't see the function definition you have no idea if calling it would be safe, because you can't know if the function uses this or not.
Just don't do it, there's no good reason to, and it's a Good Thing that the language forbids it.
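If the aim is simply to avoid ever calling through a dead pointer, one option is to let a smart pointer own the object, so the call can only happen while the object is alive. This is a sketch under assumptions, not part of the question's code: it needs C++14 for std::make_unique, callWhileAlive is a made-up helper, and doSomething returns its message instead of printing it so the result can be checked.

```cpp
#include <memory>
#include <string>

class Myclass {
public:
    // Returns the message instead of printing, purely for illustration.
    std::string doSomething() const { return "Inside doSomething"; }
};

// The object is alive for the whole call; it is destroyed automatically
// when `ptr` goes out of scope, after the return value has been built.
std::string callWhileAlive() {
    auto ptr = std::make_unique<Myclass>();
    return ptr->doSomething();
}
```

There is no point in this function at which a deleted pointer is still reachable, so the question of what a call through it would do never arises.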
In the C++ language (according to C++03), the very attempt to use the value of an invalid pointer already causes undefined behavior. There's no need to dereference it for the UB to happen. Just reading the pointer value is enough. The concept of "invalid value" that causes UB when you merely attempt to read that value actually extends to almost all scalar types, not just to pointers.
After delete the pointer is generally invalid in that specific sense, i.e. reading a pointer that supposedly points to something that has just been "deleted" leads to undefined behavior.
int *p = new int();
delete p;
int *p1 = p; // <- undefined behavior
Calling a member function through an invalid pointer is just a specific case of the above. The pointer is used as an argument for the implicit parameter this. Passing a pointer as a non-reference argument is an act of reading it, which is why the behavior is undefined in your example.
So, your question really boils down to why reading invalid pointer values causes undefined behavior.
Well, there could be many platform-specific reasons for that. For example, on some platforms the act of reading a pointer might lead to the pointer value being loaded into some dedicated address-specific register. If the pointer is invalid, the hardware/OS might detect it immediately and trigger a program fault. In fact, this is how our popular x86 platform works with regard to segment registers. The only reason we don't hear much about it is that the popular OSes stick to flat memory model that simply does not actively use segment registers.
C++11 actually states that dereferencing invalid pointer values causes undefined behavior, while all other uses of an invalid pointer value cause implementation-defined behavior. It also notes that the implementation-defined behavior in the case of "copying an invalid pointer" might lead to "a system-generated runtime fault". So it might actually be possible to carefully maneuver one's way through the labyrinth of the C++11 specification and successfully arrive at the conclusion that calling a non-virtual method through an invalid pointer should result in the implementation-defined behavior mentioned above. But in any case the possibility of "a system-generated runtime fault" will always be there.
Dereferencing of this in this case is effectively an implementation detail. I'm not saying that the this pointer is not defined by the standard, because it is, but from a semantically abstracted standpoint what is the purpose of allowing the use of objects that have been destroyed, just because there is a corner case in which in practice it will be "safe"? None. So it's not. No object exists, so you may not call a function on it.

Pointer arguments with compile-time ampersand checks

Can I get the compiler to check that my function which expects a pointer argument has been called with &someValidVariable rather than NULL, some variable, or some literal address?
I'd like to use pointer arguments over reference arguments because those ampersands make, IMO, code easier to understand, but I'm too lazy to do non-NULL checks.
Can I get the best of both worlds?
For nullptr you can add an overload that takes a dummy std::nullptr_t as argument.
But other than that it's not really possible, except for arrays, where you can do e.g.
template<std::size_t N>
void your_function(int (&array)[N]) { ... }
instead of letting it decay to a pointer.
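A minimal sketch of both ideas, assuming C++11. The function bodies are placeholders, and the array version is given a separate hypothetical name, your_array_function, so that it does not compete in overload resolution with the pointer version. Marking the std::nullptr_t overload as deleted makes a call with a literal nullptr a compile-time error:

```cpp
#include <cstddef>

// A literal nullptr selects this overload, and using a deleted function
// is ill-formed, so your_function(nullptr) does not compile.
void your_function(std::nullptr_t) = delete;

// Ordinary pointer version; the body is a placeholder for illustration.
int your_function(int* p) { return *p; }

// Array version: binds to a real array, and N is deduced from its type.
template <std::size_t N>
std::size_t your_array_function(int (&)[N]) { return N; }
```

A call such as your_function(&x) still works as usual. A null pointer that arrives through a variable of type int* is not caught, since its value is only known at run time.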
In C++11, you can declare an overload of the function that accepts a std::nullptr_t, and not define that function. This will typically cause a linker error (as distinct from a compiler error). (Although that won't stop the caller from doing something like your_function((YourVariable *)nullptr) - which will call your function with a NULL pointer).
Other than that (in any version of C++) it is not possible, apart from some special cases like passing a reference to an array (such a function will not be passed NULL or (in C++11) nullptr). The reason is that a basic property of pointers is that they are passed by value, and the compiler permits that if the type (or permitted type conversions) is valid. Once the value is passed, the only way to check is at run-time within the function. The only exception is passing the value of an uninitialised pointer which, in itself, causes undefined behaviour (so anything can happen, and all bets are off).
But, really, the real solution to your problem is to pass references. One of the purposes is providing a guarantee that they reference an actual object (dereferencing a NULL to create a reference gives undefined behaviour, as does using a dangling reference to an object that has been destroyed). So you really need to work to better understand what references are and how to use them properly, rather than trying to avoid using them.

Is dereferencing a pointer that's equal to nullptr undefined behavior by the standard?

A blog author has brought up a discussion about null pointer dereferencing:
http://www.viva64.com/en/b/0306/
I've put some counter arguments here:
http://bit.ly/1L98GL4
His main line of reasoning quoting the standard is this:
The '&podhd->line6' expression is undefined behavior in the C language
when 'podhd' is a null pointer.
The C99 standard says the following about the '&' address-of operator
(6.5.3.2 "Address and indirection operators"):
The operand of the unary & operator shall be either a function
designator, the result of a [] or unary * operator, or an lvalue that
designates an object that is not a bit-field and is not declared with
the register storage-class specifier.
The expression 'podhd->line6' is clearly not a function designator,
the result of a [] or * operator. It is an lvalue expression. However,
when the 'podhd' pointer is NULL, the expression does not designate an
object since 6.3.2.3 "Pointers" says:
If a null pointer constant is converted to a pointer type, the
resulting pointer, called a null pointer, is guaranteed to compare
unequal to a pointer to any object or function.
When "an lvalue does not designate an object when it is evaluated, the
behavior is undefined" (C99 6.3.2.1 "Lvalues, arrays, and function
designators"):
An lvalue is an expression with an object type or an incomplete type
other than void; if an lvalue does not designate an object when it is
evaluated, the behavior is undefined.
So, the same idea in brief:
When -> was executed on the pointer, it evaluated to an lvalue where
no object exists, and as a result the behavior is undefined.
This question is purely language based, I'm not asking regarding whether a given system allows one to tamper with what lies at address 0 in any language.
As far as I can see, there's no restriction on dereferencing a pointer variable whose value is equal to nullptr, even though comparisons of a pointer against the nullptr (or (void *) 0) constant can vanish in optimizations in certain situations because of the stated paragraphs. That looks like another issue; it doesn't prevent dereferencing a pointer whose value is equal to nullptr. Notice that I've checked other SO questions and answers (I particularly like this set of quotations) as well as the standard quotes above, and I didn't stumble upon anything that clearly implies from the standard that if a pointer ptr compares equal to nullptr, dereferencing it would be undefined behavior.
At most what I get is that dereferencing the constant (or its cast to any pointer type) is what is UB, but nothing about a variable that's bit-equal to the value that comes from nullptr.
I'd like to clearly separate the nullptr constant from a pointer variable that holds a value equal to it. But an answer that addresses both cases is ideal.
I do realise that optimizations can kick in when there are comparisons against nullptr, etc., and may simply strip code based on that.
If the conclusion is that, if ptr is equal to the value of nullptr, dereferencing it is definitely UB, another question follows:
Do C and C++ standards imply that a special value in the address space must exist solely to represent the value of null pointers?
As you quote C, dereferencing a null pointer is clearly undefined behavior from this Standard quote (emphasis mine):
(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the
behavior of the unary * operator is undefined.102)"
102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."
Exact same quote in C99 and similar in C89 / C90.
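For the '&podhd->line6' pattern from the question, there are ways to get what the code usually wants without forming an lvalue through a null pointer. The sketch below is in C++ (the same offsetof macro exists in C's <stddef.h>). The struct Podhd and both helper functions are hypothetical stand-ins, not the code from the linked article:

```cpp
#include <cstddef>

// Hypothetical stand-in for the structure discussed in the question.
struct Podhd {
    int header;
    int line6;
};

// The member's offset, computed without evaluating any pointer at all.
std::size_t line6_offset() { return offsetof(Podhd, line6); }

// The member's address, formed only after the pointer is known to be non-null.
int* line6_or_null(Podhd* podhd) {
    return podhd ? &podhd->line6 : nullptr;
}
```

Both helpers stay within what the standard defines, regardless of how a particular platform represents or treats a null pointer.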
C++
dcl.ref/5.
There shall be no references to references, no arrays of references, and no pointers to references. The
declaration of a reference shall contain an initializer (8.5.3) except when the declaration contains an explicit
extern specifier (7.1.1), is a class member (9.2) declaration within a class definition, or is the declaration
of a parameter or a return type (8.3.5); see 3.1. A reference shall be initialized to refer to a valid object or
function. [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way
to create such a reference would be to bind it to the “object” obtained by indirection through a null pointer,
which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field.
— end note ]
The note is of interest, as it explicitly says dereferencing a null pointer is undefined.
I'm sure it says it somewhere else in a more relevant context, but this is good enough.
The answer to this that I see, as to what degree a NULL value may be dereferenced, is it is deliberately left platform-dependent in an unspecified manner, due to what is left implementation-defined in C11 6.3.2.3p5 and p6. This is mostly to support freestanding implementations used for developing boot code for a platform, as OP indicates in his rebuttal link, but has applications for a hosted implementation too.
Re:
(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"
102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."
This is phrased as it is, afaict, because each of the cases in the footnote may NOT be invalid for specific platforms a compiler is targeting. If there's a defect there, it's that "invalid value" should be italicized and qualified by "implementation-defined". For the alignment case, a platform may be able to access any type using any address and so has no alignment requirements, especially if address rollover is supported; and a platform may assume an object's lifetime only ends after the application has exited, allocating a new frame via malloc() for automatic variables on each function call.
For null pointers, at boot time a platform may have expectations that structures the processor uses have specific physical addresses, including at address 0, and get represented as object pointers in source code, or may require the function defining the boot process to use a base address of 0. If the standard didn't permit dereferences like '&podhd->line6', where a platform required podhd to have a base address of 0, then assembly language would be needed to access that structure. Similarly, a soft reboot function might need to dereference a 0 valued pointer as a void function invocation. A hosted implementation may consider 0 the base of an executable image, and map a NULL pointer in source code to the header of that image, after loading, as the struct required to be at logical address 0 for that instance of the C virtual machine.
What the standard calls pointers are more handles into the virtual address space of the virtual machine, where object handles have more requirements on what operations are permitted for them. How the compiler emits code that takes the requirements of these handles into account for a specific processor is left undefined. What is efficient for one processor may not be for another, after all.
The requirement on (void *)0 is more that the compiler emit code that guarantees expressions where the source uses (void *)0, explicitly or by referencing NULL, that the actual value stored will be one that says this can't point to any valid function definitions or objects by any mapping code. This does not have to be a 0! Similarly, for casts of (void *)0 to (obj_type) and (func_type), these are only required to get assigned values that evaluate as addresses the compiler guarantees are not being used then for objects or code. The difference with the latter is these are unused, not invalid, so are capable of being dereferenced in the defined manner.
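The practical consequence of "this does not have to be a 0" is that null-ness is a property you test by comparison, not by looking at the stored bits. A small sketch with two hypothetical helpers: is_null is the portable test, while all_bits_zero inspects the object representation and is shown only to make the distinction concrete:

```cpp
#include <cstring>

// Portable: compares against the null pointer value, whatever its bits are.
bool is_null(const void* p) { return p == nullptr; }

// Not a portable null test: checks the stored representation byte by byte.
// On common platforms a null pointer happens to be all zero bits, but the
// standard does not require that.
bool all_bits_zero(const void* p) {
    unsigned char bytes[sizeof p];
    std::memcpy(bytes, &p, sizeof p);
    for (unsigned char b : bytes) {
        if (b != 0) return false;
    }
    return true;
}
```

On an implementation where the null pointer is stored as something other than zero, only is_null would keep giving the right answer.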
The code that tests for pointer equality would then check if one operand is one of these values and that the other is one of the 3, not just the same bit pattern, because this scoreboards them with the RTTI of being a (null *) type, distinct from void, obj, and func pointer types to defined entities. The standard could be more explicit that it is a distinct type, if unnamed because compilers only use it internally, but I suppose this is considered obvious by "null pointer" being italicized. Effectively, imo, a '0' in these contexts is an additional keyword token of the compiler, due to the additional requirement of it identifying the (null *) type, but isn't characterized as such because this would complicate the definition of < identifiers >.
This stored value can be SIZE_MAX as easily as a 0, for a (void *)0, in emitted application code when implementations, for example, define the range 0 to SIZE_MAX-4*sizeof(void *) of virtual machine handles as what is valid for code and data. The NULL macro may even be defined as (void *)SIZE_MAX, and it would be up to the compiler to figure out from context this has the same semantics as 0. The casting code is responsible for noting it is the chosen value, in pointer <--> pointer casts, and supply what is appropriate as an object or function pointer. Casts from pointer <--> integer, implicit or explicit, have similar check and supply requirements; especially in unions where a (u)intptr_t field overlays a (type *) field. Portable code can guard against compilers not doing this properly with an explicit *(ptr==NULL?(type *)0:ptr) expression.
