I know this code would cause undefined behaviour by breaking the strict aliasing rule,
since we are pointing to the same memory location through an int and a float and dereferencing it, so the code could break once compiler optimizations take place:
int main(){
int a = 5;
float f = *reinterpret_cast<float*>(&a);
return (int) f;
}
But how about this snippet?
#include <cstdint> // intptr_t
int main(){
intptr_t p = 1234; // let's assume this is a valid address in memory.
float f = *reinterpret_cast<float*>(p);
return (int) f;
}
In the above, if we assume p is a valid memory address (one that will not cause a segfault), does it still have UB and break the strict aliasing rule? There is no other code pointing to that chunk of memory.
Edit
My second example can be written like this using bit_cast:
intptr_t p = 1234; // let's assume this is a valid address in memory.
float f = *std::bit_cast<float*>(p);
Yes, it will still break the strict aliasing rule, because you are dereferencing a pointer to float even though no float object ever lived at this address.
Luckily, in C++20 you can use std::bit_cast for this purpose, applied to the value itself (std::bit_cast<float>(a)); bit-casting the address to float* as in your edit still leaves you dereferencing a pointer to a float that doesn't exist. Pre-C++20 you can just cast :), since even though this is UB, no sane compiler would produce results different from the ones you expect, because this technique is ubiquitous.
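A minimal sketch of the well-defined alternatives for the first snippet, assuming int and float have the same size (the static_assert checks this). It uses std::bit_cast when the standard library provides it (detected through the __cpp_lib_bit_cast feature-test macro) and otherwise falls back to std::memcpy, the usual pre-C++20 idiom. The name int_bits_to_float is made up for illustration.

```cpp
#include <cassert>
#include <cstring>   // std::memcpy
#if __has_include(<bit>)
#include <bit>       // std::bit_cast, when the library provides it (C++20)
#endif

static_assert(sizeof(int) == sizeof(float), "assumes int and float have the same size");

// Hypothetical helper: reads the object representation of an int as a float
// without ever dereferencing a float* that points at an int.
float int_bits_to_float(int a) {
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<float>(a);   // C++20: copy the bits into a new float object
#else
    float f;
    std::memcpy(&f, &a, sizeof f);    // pre-C++20: memcpy is also well-defined
    return f;
#endif
}
```

On a platform with IEEE-754 binary32 floats, int_bits_to_float(0x3f800000) gives 1.0f, since that is the encoding of 1.0.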
The from_base function returns the memory address of a selected value in a program, computed from the program's base address. I want to retrieve this value and return it from a function; however, I am getting a warning that says "integer to pointer cast pessimizes optimization opportunities".
DWORD chat::client() {
return *reinterpret_cast<DWORD*>(core::from_base(offsets::chat::client));
}
I am also getting this warning when casting a function from the program:
auto og_print = reinterpret_cast<chat::fn_print_chat>(core::from_base(offsets::chat::print));
I don't understand why clang-tidy warns that an "integer to pointer cast pessimizes optimization opportunities" (check performance-no-int-to-ptr).
I looked it up, but I can't figure it out. The code works, and gets the correct value. I am just concerned about the warning.
If a program performs a computation like:
#include <stddef.h>
char x[10],y[10];
int test(ptrdiff_t i)
{
char *p = x+i;
*p = 1;
y[1] = 2;
return *p;
}
a compiler would be reasonably entitled to assume that because p was formed via pointer arithmetic on x, it could not possibly equal y+1, and thus the function would always return 1. If, however, the code had been written as:
#include <stddef.h>
#include <stdint.h>
char x[10],y[10];
int test(ptrdiff_t i)
{
char *p = (char*)((uintptr_t)x + i);
*p = 1;
y[1] = 2;
return *p;
}
then such an assumption would be far less reasonable, since unsigned arithmetic defines uintptr_t z = (uintptr_t)(y+1) - (uintptr_t)x as yielding a value such that (uintptr_t)x + z equals (uintptr_t)(y+1).
I find the apparent caution clang exhibits here a bit surprising, given that clang is prone to assume that, given some pointer char *p, it's not possible for (uintptr_t)p to equal (uintptr_t)(x+10) and yet for p to equal y. The Standard doesn't forbid such an assumption, but then again it also wouldn't forbid an assumption that code will never use the result of any integer-to-pointer conversion for any purpose other than comparisons with other pointers. Implementations that support type uintptr_t should of course offer stronger guarantees about round-tripped pointers than merely saying they may be compared for equality with the originals, but the Standard doesn't require such treatment.
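To connect this back to the warning: performance-no-int-to-ptr flags integer-to-pointer casts because, as described above, the optimizer can no longer tell which object the resulting pointer is derived from. A minimal sketch with two hypothetical helpers standing in for core::from_base: when the address genuinely starts life as an integer (a module base plus an offset), the cast can't be avoided, and clang-tidy's NOLINT(check-name) comment silences the check for that line; when a pointer is already available, doing the arithmetic on the pointer avoids the integer round trip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical: the address arrives as an integer, as with a module base + offset.
int read_int_from_integer_address(std::uintptr_t base, std::size_t offset) {
    // Integer -> pointer: the kind of cast performance-no-int-to-ptr reports.
    return *reinterpret_cast<int*>(base + offset);  // NOLINT(performance-no-int-to-ptr)
}

// Hypothetical: the base is kept as a pointer, so the arithmetic stays on pointers
// and the result is visibly derived from base.
int read_int_from_pointer(const unsigned char* base, std::size_t offset) {
    return *reinterpret_cast<const int*>(base + offset);
}
```

Both helpers read the same int; only the second lets the compiler see where the pointer came from.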
Suppose I want to dynamically allocate space for an int and write the maximum representable value into that memory. This code comes to mind:
auto rawMem = std::malloc(sizeof(int)); // rawMem's type is void*
*(reinterpret_cast<int*>(rawMem)) = INT_MAX; // INT_MAX from <limits.h>
Does this code violate C++'s rules about strict aliasing? Neither g++ nor clang++ complain with -Wall -pedantic.
If the code doesn't violate strict aliasing, why not? std::malloc returns void*, so while I don't know what the static and dynamic types of the memory returned by std::malloc are, there's no reason to think either is int. And we're not accessing the memory as a char or unsigned char.
I'd like to think the code is kosher, but if it is, I'd like to know why.
As long as I'm in the neighborhood, I'd also like to know the static and dynamic types of the memory returned by the memory allocation functions (std::malloc and std::operator new).
The strict aliasing rule allows the compiler to assume that the same memory location cannot be accessed through two or more pointers of different types.
Consider the following code:
int* pi = ...;
double* pd = ...;
const int i1 = *pi; // (1)
*pd = 123.456; // (2)
const int i2 = *pi; // (3)
Analysis of this code with the strict aliasing rule in mind suggests that i2 == i1, since the location pointed to by pi cannot be modified between (1) and (3). The compiler can therefore eliminate one of the variables i1 or i2 (provided that the program doesn't take the address of either of them). In general, the strict aliasing rule gives the compiler more freedom when optimizing code.
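The fragment above can be wrapped into a compilable function; this is a sketch, and the name reload_difference is made up for illustration. Called with an int and a double that are distinct objects, it returns 0, and under strict aliasing the optimizer may produce that result without reloading *pi after the store through pd.

```cpp
#include <cassert>

// Hypothetical wrapper around the (1)-(3) fragment.
int reload_difference(int* pi, double* pd) {
    const int i1 = *pi;   // (1)
    *pd = 123.456;        // (2) an int and a double are assumed not to overlap
    const int i2 = *pi;   // (3) the compiler may reuse i1 instead of reloading *pi
    return i2 - i1;       // passing pointers into the same storage would be UB
}
```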
In your example you obtain a memory location through malloc(). The compiler doesn't assume any type for that memory location (i.e. both the static and the dynamic type of that location is... ummm... untyped raw memory; however, due to the special status of the char[] type in the strict aliasing rule, we can also legally treat that location as an array of chars). The strict aliasing rule doesn't yet apply to the new memory location for the simple reason that there is no typed pointer that can be involved in the analysis. It is you who designate a type for that location by initializing it with an object of the desired type. In the case of a primitive type or a POD type, a reinterpret_cast followed by assignment (just like in your example) is a valid way to initialize that memory location, but for types with a non-trivial constructor you need to construct an object with placement new. From that moment on, the memory location stops being raw memory and is subject to the strict aliasing rule.
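A minimal sketch of both cases the answer describes: an int initialized by assignment through the converted pointer, and a std::string (which has a non-trivial constructor) built in malloc'd storage with placement new and destroyed explicitly. The helper names are hypothetical, and static_cast is used for the void* conversion, the more idiomatic spelling of the reinterpret_cast in the question.

```cpp
#include <cassert>
#include <climits>   // INT_MAX
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new
#include <string>

// Trivial type: assignment through the converted pointer, as in the question.
int* make_int_max() {
    void* raw = std::malloc(sizeof(int));
    if (!raw) return nullptr;
    int* p = static_cast<int*>(raw);
    *p = INT_MAX;
    return p;                               // caller releases it with std::free
}

// Non-trivial type: an object must be constructed in the raw storage.
std::string* make_string(const char* text) {
    void* raw = std::malloc(sizeof(std::string));
    if (!raw) return nullptr;
    try {
        return ::new (raw) std::string(text);   // placement new
    } catch (...) {
        std::free(raw);                         // don't leak if construction throws
        throw;
    }
}

void destroy_string(std::string* s) {
    s->~basic_string();   // run the destructor explicitly...
    std::free(s);         // ...then give the raw memory back to malloc
}
```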
Consider the following code:
int square(volatile int *p)
{
return *p * *p;
}
Now, the volatile keyword indicates that the value in a memory location can be altered in ways unknown to the compiler or have other unknown side effects (e.g. modification via a signal interrupt, hardware register, or memory-mapped I/O) even though nothing in the program code modifies the contents.
So what exactly happens when we declare a pointer as volatile?
Will the above mentioned code always work, or is it any different from this:
int square(volatile int *p)
{
int a = *p;
int b = *p;
return a*b;
}
Can we end up multiplying different numbers, as pointers are volatile?
Or is there better way to do so?
Can a pointer be volatile?
Absolutely; any type, excluding function types and reference types, may be volatile-qualified.
Note that a volatile pointer is declared T *volatile, not volatile T*, which instead declares a pointer-to-volatile.
A volatile pointer means that accesses to the pointer value itself (the address it holds, not the value it points to) may have side effects that are not visible to the compiler; therefore, optimizations permitted by the "as-if rule" may not be applied to those accesses.
int square(volatile int *p) { return *p * *p; }
The compiler cannot assume that reading *p fetches the same value, so caching its value in a variable is not allowed. As you say, the result may vary and not be the square of *p.
Concrete example: let's say you have two arrays of ints
int a1 [] = { 1, 2, 3, 4, 5 };
int a2 [] = { 5453, -231, -454123, 7565, -11111 };
and a pointer to one of them
int * /*volatile*/ p = a1;
with some operation on the pointed elements
for (std::size_t i = 0; i < sizeof(a1)/sizeof(a1[0]); ++i)
*(p + i) *= 2;
Here p has to be re-read on each iteration if you make it volatile because it might actually have been switched to point to a2 by some external event.
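A runnable version of this example, as a sketch: volatile placed after the * makes p itself volatile while the ints stay ordinary, and the function name double_pointed_elements is made up for illustration. Nothing here actually switches p to a2; the comments only mark where an external event would matter.

```cpp
#include <cassert>
#include <cstddef>

int a1[] = { 1, 2, 3, 4, 5 };
int a2[] = { 5453, -231, -454123, 7565, -11111 };

// The pointer itself is volatile: something outside the compiler's view
// (an interrupt handler, say) is assumed to be able to redirect it to a2.
int* volatile p = a1;

void double_pointed_elements() {
    for (std::size_t i = 0; i < sizeof(a1) / sizeof(a1[0]); ++i)
        *(p + i) *= 2;   // p is loaded from memory again on every iteration
}
```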
Yes, you can of course have a volatile pointer.
Volatile means nothing more and nothing less than that every access to the volatile object (of whatever type) is treated as a visible side effect and is therefore exempt from optimization (in particular, accesses may not be reordered, collapsed, or optimized out altogether). That's true for reading or writing a value, for calling member functions, and of course for dereferencing, too.
Note that when the previous paragraph says "reordering", a single thread of execution is assumed. Volatile is no substitute for atomic operations or mutexes/locks.
In simpler words, volatile roughly translates to "Don't optimize, just do exactly as I say".
In the context of a pointer, refer to the exemplary usage pattern given by Chris Lattner's well-known "What every programmer needs to know about Undefined Behavior" article (yes, that article is about C, not C++, but the same applies):
If you're using an LLVM-based compiler, you can dereference a "volatile" null pointer to get a crash if that's what you're looking for, since volatile loads and stores are generally not touched by the optimizer.
Yes. int * volatile.
In C++, qualifiers such as const and volatile apply to the token on their left: int * const is a constant pointer to integer, int const * is a pointer to constant integer, int const * const is a constant pointer to constant integer, etc. You can write the qualifier before the type only when it applies to the first token: const int x is equivalent to int const x.
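These placement rules can be checked at compile time with the standard type traits from <type_traits>; a small sketch:

```cpp
#include <type_traits>

// Qualifier to the right of '*': the pointer itself is qualified.
static_assert(std::is_const<int *const>::value, "const pointer to int");
static_assert(std::is_volatile<int *volatile>::value, "volatile pointer to int");

// Qualifier to the left of '*': the pointee is qualified, not the pointer.
static_assert(!std::is_const<int const *>::value, "the pointer is not const");
static_assert(std::is_const<std::remove_pointer<int const *>::type>::value, "the int is const");
static_assert(!std::is_volatile<volatile int *>::value, "pointer to volatile int");

// On the first token, leading and trailing placement name the same type.
static_assert(std::is_same<const int, int const>::value, "same type");
```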
The volatile keyword is a hint for the compiler (7.1.6.1/7):
[Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. Furthermore, for some implementations, volatile might indicate that special hardware instructions are required to access the object. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C. — end note]
What does it mean? Well, take a look at this code:
bool condition = false;
while(!condition)
{
...
}
By default, the compiler can easily optimize the check away (condition doesn't change, so there is no need to test it on every iteration). If, however, you declare condition as volatile, that optimization will not be made.
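One concrete source of such an external change is a signal handler. A sketch, assuming a hosted implementation with <csignal>: the flag has type volatile std::sig_atomic_t, the type C and C++ have long sanctioned for a flag written by a handler, and std::raise stands in for the external event so the loop ends deterministically. The function names are made up for illustration.

```cpp
#include <cassert>
#include <csignal>

// Written by the handler, read by the loop.
volatile std::sig_atomic_t stop_requested = 0;

extern "C" void on_sigint(int) { stop_requested = 1; }

int count_iterations_until_stop() {
    std::signal(SIGINT, on_sigint);
    int iterations = 0;
    while (!stop_requested) {                      // re-read on every pass: it is volatile
        ++iterations;
        if (iterations == 3) std::raise(SIGINT);   // stand-in for an external event
    }
    return iterations;
}
```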
So of course you can have a volatile pointer, and it is possible to write code that will crash because of it, but the fact that a variable is volatile doesn't mean that it is necessarily going to be changed by some external interference.
Yes. A pointer to volatile data (volatile int *) is appropriate when the variable it points to can change unexpectedly, even though how this might happen is not evident from the code.
An example is an object that can be modified by something that is external to the controlling thread and that the compiler should not optimize.
The most likely place to use the volatile specifier is in low-level code that deals directly with the hardware and where unexpected changes might occur.
You may end up multiplying different numbers, because *p is volatile and could change unexpectedly between the two reads. So you can try something like this:
int square(volatile int *p)
{
int a = *p;
return a*a;
}
int square(volatile int *p)
{
int a = *p;
int b = *p;
return a*b;
}
Since it is possible for the value of *p to change unexpectedly, it is possible for a and b to be different. Consequently, this code could return a number that is not a square! The correct way to code this is:
long square(volatile int *p)
{
long a = *p; // read the volatile object exactly once
return a * a;
}
I am reading a C++ book and have a problem with the static casting. Here is a function:
void fun(int*pi)
{
void *pv = pi;
int *pi2 = static_cast<int*>(pv); //explicit conversion back to int*
double *pd3 = static_cast<double*>(pv); //unsafe
}
The last statement:
double*pd3 = static_cast<double*>(pv);
is considered as unsafe. I don't get why it is considered unsafe.
Dereferencing the result of the cast reinterprets the bits of the pointed-to int, plus possibly the bits of some following memory (if there is any!), as a double value.
A double is (1) typically larger than an int, and (2) has some internal structure.
Point (1) means that any use of the dereferenced result pointer may access memory beyond the int that just isn't accessible.
Point (2) means that the arbitrary bit pattern may be invalid as a double bit pattern, and may cause a hardware exception, a.k.a. a "trap", when it's used. From a C++ point of view that's Undefined Behavior. From a practical programming point of view it's typically a "crash".
In contrast, accessing the bits of a double as an int is usually in-practice safe, even though it's formally UB, because (1) an int is typically smaller or equal in size to double, and (2) an int usually does not have any invalid bit patterns. However, depending on the compiler options the compiler may not be happy about doing that directly.
Above I forgot to mention alignment, as Loki Astari pointed out in a comment, and that's a reason (3) for unsafety. For example, with some given implementation an int may be allowed at any address that is a multiple of 4, while a double may be required to reside at an address that is a multiple of 8. Then the dereferenced pointer may access a double at an address that isn't a multiple of 8, causing a trap (more formally, UB where anything can happen).
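A sketch of a well-defined way to inspect a double's bits, which sidesteps all three problems (size, invalid bit patterns, alignment): copy the bytes with std::memcpy into an unsigned integer of the same size. The helper name double_bits is made up for illustration, the size assumption is checked with a static_assert, and the expected value below assumes IEEE-754 doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>   // std::memcpy

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");

// Copies the object representation instead of dereferencing a cast pointer,
// so there is no aliasing violation and no alignment requirement.
std::uint64_t double_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

On an IEEE-754 platform, double_bits(1.0) yields 0x3FF0000000000000.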
Because a double is not necessarily the same size as an int, dereferencing the double pointer may read past the int, and you might get a segmentation fault. They are not necessarily compatible types.
You can try casting the value pointed by pi2.
#include <iostream>
void fun(int*pi)
{
void *pv = pi;
int *pi2 = static_cast<int*>(pv);
double d = static_cast<double>(*pi2);
std::cout << d; // 42
}
int main()
{
int i = 42;
fun(&i);
}
I've read the essay Surviving the Release Version.
Under the "Aliasing bugs" clause it says:
You can get tighter code if you tell the compiler that it can assume no aliasing....
I've also read Aliasing (computing).
What exactly is a variable alias? I understand that using a pointer to a variable creates an alias, but how/why does aliasing hurt? In other words, why would telling the compiler that it can assume no aliasing get me "tighter code"?
Aliasing is when you have two different references to the same underlying memory. Consider this made up example:
int doit(int *n1, int *n2)
{
int x = 0;
if (*n1 == 1)
{
*n2 = 0;
x += *n1; // line of interest
}
return x;
}
int main()
{
int x = 1;
doit(&x, &x); // aliasing happening
}
If the compiler has to allow for aliasing, it needs to consider the possibility that n1 == n2. Therefore, when it needs to use the value of *n1 at "line of interest", it needs to allow for the possibility it was changed by the line *n2 = 0.
If the compiler can assume no aliasing, it can assume at "line of interest" that *n1 == 1 (because otherwise we would not be inside the if). The optimizer can then use this information to optimize the code (in this case, change "line of interest" from following the pointer and doing a general purpose addition to using a simple increment).
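The same function with the no-aliasing promise made explicit, as a sketch. Standard C++ has no restrict keyword; __restrict is a compiler extension accepted by GCC, Clang and MSVC. With it, the compiler may use the known value 1 at the line of interest, and a call like doit(&x, &x) from main above becomes undefined behavior.

```cpp
#include <cassert>

// __restrict (non-standard extension) promises that n1 and n2 never alias.
int doit(int* __restrict n1, int* __restrict n2)
{
    int x = 0;
    if (*n1 == 1)
    {
        *n2 = 0;
        x += *n1;   // line of interest: may be compiled as x += 1, without a reload
    }
    return x;
}
```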
Disallowing aliasing means that if you have a pointer char* b, you can assume that b is the only pointer in the program that points to that particular memory location, which means the only time that memory location is going to change is when the programmer uses b to change it. The generated assembly thus doesn't need to reload the memory pointed to by b into a register as long as the compiler knows nothing has used b to modify it. If aliasing is allowed, it's possible that another pointer char* c = b; was used elsewhere to mess with that memory.