Integer to pointer cast pessimism optimization opportunities

The from_base function returns the memory address at a given offset from the base of a program. I want to read the value stored there and return it from a function; however, I am getting a warning that says integer to pointer cast pessimism optimization opportunities.
DWORD chat::client() {
return *reinterpret_cast<DWORD*>(core::from_base(offsets::chat::client));
}
I am also getting this warning when casting a function from the program:
auto og_print = reinterpret_cast<chat::fn_print_chat>(core::from_base(offsets::chat::print));
I don't understand why I am getting a warning from clang-tidy about integer to pointer cast pessimism optimization opportunities
performance-no-int-to-ptr
I looked it up, but I can't figure it out. The code works, and gets the correct value. I am just concerned about the warning.

If a program performs a computation like:
char x[10],y[10];
int test(ptrdiff_t i)
{
char *p = x+i;
*p = 1;
y[1] = 2;
return *p;
}
a compiler would be reasonably entitled to assume that because p was formed via pointer arithmetic using x, it could not possibly equal y+1, and thus the function would always return 1. If, however, the code had been written as:
char x[10],y[10];
int test(ptrdiff_t i)
{
char *p = (char*)((uintptr_t)x + i);
*p = 1;
y[1] = 2;
return *p;
}
then such an assumption would be far less reasonable, since unsigned numerical semantics would define the behavior of uintptr_t z = (uintptr_t)(y+1)-(uintptr_t)x as yielding a value such that (uintptr_t)x + z would equal (uintptr_t)(y+1).
I find the apparent caution clang exhibits here a bit surprising, given that clang is prone to assume that, given some pointer char *p, it's not possible for (uintptr_t)p to equal (uintptr_t)(x+10) and yet for p to equal y. The Standard doesn't forbid such an assumption, but then again it also wouldn't forbid an assumption that code will never use the result of any integer-to-pointer conversion for any purpose other than comparisons with other pointers. Implementations that support type uintptr_t should of course offer stronger guarantees about round-tripped pointers than merely saying they may be compared for equality with the originals, but the Standard doesn't require such treatment.
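One way the warning is commonly avoided is to keep the base as a pointer and apply the offset with pointer arithmetic, so that no integer is ever converted to a pointer. The following is only a minimal sketch under stated assumptions: fake_module, from_base_ptr and read_u32 are hypothetical names that do not come from the question's code, and a local buffer stands in for the real module image.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the loaded module; a real program would get the base from the OS.
alignas(std::uint32_t) static unsigned char fake_module[64] = {};

// Hypothetical helper: the base stays a pointer and the offset is applied with
// pointer arithmetic, so the code contains no integer-to-pointer cast.
inline unsigned char* from_base_ptr(std::size_t offset) {
    return fake_module + offset;
}

// Reads a 32-bit value at the given offset; memcpy sidesteps aliasing and alignment concerns.
inline std::uint32_t read_u32(std::size_t offset) {
    std::uint32_t value = 0;
    std::memcpy(&value, from_base_ptr(offset), sizeof value);
    return value;
}
```

Whether this is practical depends on how the base address is obtained; if the operating system API only hands back an integer, one conversion at that boundary is hard to avoid.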

Related

Internal logic of operator [] when dealing with pointers

I've been studying C++ for a couple of months now and just recently decided to look more deeply into the logic of pointers and arrays. What I've been taught in uni is pretty basic: pointers contain the address of a variable. When an array is created, basically a pointer to its first element is created.
So I started experimenting a bit (and reached a conclusion which I need confirmation for). First of all I created
int arr[10];
int* ptr = &arr[5];
And as you would imagine
cout << ptr[3];
gave me the 8th element of the array. Next I tried
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
which to my great delight (not irony) returned the same addresses. Even though num wasn't an array.
The conclusion to which I got: array is not something special in C++. It's just a pointer to the first element (already typed that). More important: Can I think about every pointer in the manner of object of a class variable*. Is the operator [] just overloaded in the class int*? For example to be something along the lines of:
int operator[] (int index){
return *(arrayFirstaddress + index);
}
What was interesting to me in these experiments is that operator [] works for EVERY pointer. (So it's exactly like overloading an operator for all instances of the said class)
Of course, I can be as wrong as possible. I couldn't find much information on the web, since I didn't know how to word my question, so I decided to ask here.
It would be extremely helpful if you explained to me if I'm right/wrong/very wrong and why.

You find the definition of subscripting, i.e. an expression like ptr2[5] in the c++ standard, e.g. like in this online c++ draft standard:
5.2.1 Subscripting [expr.sub]
(1) ... The expression E1[E2] is identical (by definition) to
*((E1)+(E2))
So your "discovery" sounds correct, although your examples seem to have some bugs (e.g. ptr2[5] should not return an address but an int value, whereas ptr2+5 is an address and not an int value; I suppose you meant &ptr2[5]).
Further, your code is not a proof of this discovery, as it is based on undefined behaviour. It may yield something that supports your "discovery", but your discovery could still be invalid, and the code could also do the opposite (really!).
The reason why it is undefined behaviour is that even pointer arithmetic like ptr2+5 is undefined behaviour if the result is out of the range of the allocated memory block ptr2 points to (which is definitely the case in your example):
5.7 Additive operators
(6) ... Unless both pointers point to elements of the same array
object, or one past the last element of the array object, the behavior
is undefined.
Different compilers, different optimization settings, and even slight modifications anywhere in your program may let the compiler do other things here.
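The equivalence can be checked without leaving the bounds of an array. A small sketch (the function name is made up for illustration) in which every index stays inside arr, so the behaviour is defined:

```cpp
// Returns true when E1[E2], *((E1)+(E2)) and E2[E1] all name the same element.
// Every index used here stays inside arr, so nothing is out of range.
inline bool subscript_is_pointer_arithmetic() {
    int arr[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
    int* ptr = &arr[5];
    return ptr[3] == *(ptr + 3)   // the definition of subscripting
        && 3[ptr] == ptr[3]       // addition commutes, so this spelling compiles too
        && &ptr[3] == &arr[8];    // ptr + 3 points at arr[8]
}
```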

An array in C++ is a collection of objects. A pointer is a variable that can store the address of something. The two are not the same thing.
Unfortunately, your sample
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
exhibits undefined behaviour, both in the evaluation of ptr2[5] and ptr2 + 5. Pointer expressions are special - arithmetic involving pointers only has defined behaviour if the pointer being acted on (ptr2 in this case) and the result (ptr2 + 5) are within the same object. Or one past the end (although dereferencing a "one past the end" pointer - trying to access the value it points at - also gives undefined behaviour).
Semantically, *(ptr + n) and ptr[n] are equivalent (i.e. they have the same meaning) if ptr is a pointer and n is an integral value. So if evaluating ptr + n gives undefined behaviour, so does evaluating ptr[n]. Similarly, &ptr[n] and ptr + n are equivalent.
In expressions, depending on context, the name of an array is converted to a pointer, and that pointer is equal to the address of that array's first element. So, given
int x[5];
int *p;
// the following all have the same effect
p = x + 2;
p = &x[0] + 2;
p = &x[2];
That does not mean an array is a pointer though.
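The difference between the array and the pointer it converts to is visible in the type system. A short sketch using standard type traits; same_pointer is a name invented for this example:

```cpp
#include <type_traits>

int x[5] = {1, 2, 3, 4, 5};

// The array type carries its extent; the pointer obtained from it does not.
static_assert(std::is_array<decltype(x)>::value, "x is an array");
static_assert(!std::is_pointer<decltype(x)>::value, "x is not a pointer");
static_assert(sizeof(x) == 5 * sizeof(int), "sizeof sees all five elements");
static_assert(std::is_same<decltype(+x), int*>::value, "in an expression x converts to int*");

// The three spellings listed above produce the same pointer.
inline bool same_pointer() {
    int* a = x + 2;
    int* b = &x[0] + 2;
    int* c = &x[2];
    return a == b && b == c;
}
```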

Can a pointer be volatile?

Consider the following code:
int square(volatile int *p)
{
return *p * *p;
}
Now, the volatile keyword indicates that the value in a
memory location can be altered in ways unknown to the compiler or have
other unknown side effects (e.g. modification via a signal interrupt,
hardware register, or memory mapped I/O) even though nothing in the
program code modifies the contents.
So what exactly happens when we declare a pointer as volatile?
Will the above mentioned code always work, or is it any different from this:
int square(volatile int *p)
{
int a = *p;
int b = *p;
return a*b;
}
Can we end up multiplying different numbers, as pointers are volatile?
Or is there better way to do so?

Can a pointer be volatile?
Absolutely; any type, excluding function types and references, may be volatile-qualified.
Note that a volatile pointer is declared T *volatile, not volatile T*, which instead declares a pointer-to-volatile.
A volatile pointer means that the pointer value itself, that is the address it holds and not the value it points to, may have side effects that are not visible to the compiler when it's accessed; therefore, optimizations deriving from the "as-if rule" may not be applied to those accesses.
int square(volatile int *p) { return *p * *p; }
The compiler cannot assume that reading *p fetches the same value, so caching its value in a variable is not allowed. As you say, the result may vary and not be the square of *p.
Concrete example: let's say you have two arrays of ints
int a1 [] = { 1, 2, 3, 4, 5 };
int a2 [] = { 5453, -231, -454123, 7565, -11111 };
and a pointer to one of them
int * /*volatile*/ p = a1;
with some operation on the pointed elements
for (int i = 0; i < sizeof(a1)/sizeof(a1[0]); ++i)
*(p + i) *= 2;
here p has to be read each iteration if you make it volatile because, perhaps, it may actually point to a2 due to external events.

Yes, you can of course have a volatile pointer.
Volatile means no more and no less than that every access to the volatile object (of whatever type) is treated as a visible side effect, and is therefore exempted from optimization (in particular, this means that accesses may not be reordered, collapsed, or optimized out altogether). That's true for reading or writing a value, for calling member functions, and of course for dereferencing, too.
Note that when the previous paragraph says "reordering", a single thread of execution is assumed. Volatile is no substitute for atomic operations or mutexes/locks.
In more simple words, volatile generally translates to roughly "Don't optimize, just do exactly as I say".
In the context of a pointer, refer to the example usage pattern given in Chris Lattner's well-known "What Every C Programmer Should Know About Undefined Behavior" article (yes, that article is about C, not C++, but the same applies):
If you're using an LLVM-based compiler, you can dereference a "volatile" null pointer to get a crash if that's what you're looking for, since volatile loads and stores are generally not touched by the optimizer.

Yes. int * volatile.
In C++, a qualifier applies to whatever is immediately to its left: int * const is a constant pointer to an integer, int const * is a pointer to a constant integer, int const * const is a constant pointer to a constant integer, etc. A qualifier may be written before the type only when it applies to the first token: const int x is the same as int const x.
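The placement rule can be confirmed with standard type traits. In this sketch the alias names PtrToVolatile and VolatilePtr are made up for illustration:

```cpp
#include <type_traits>

using PtrToVolatile = volatile int*;  // the int is volatile, the pointer is not
using VolatilePtr   = int* volatile;  // the pointer is volatile, the int is not

static_assert(!std::is_volatile<PtrToVolatile>::value,
              "volatile int* is itself an ordinary pointer");
static_assert(std::is_volatile<std::remove_pointer<PtrToVolatile>::type>::value,
              "it points to a volatile int");
static_assert(std::is_volatile<VolatilePtr>::value,
              "int* volatile is a volatile pointer");
static_assert(!std::is_volatile<std::remove_pointer<VolatilePtr>::type>::value,
              "it points to a plain int");
```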

The volatile keyword is a hint for the compiler (7.1.6.1/7):
[ Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. Furthermore, for some implementations, volatile might indicate that special hardware instructions are required to access the object. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C. - end note ]
What does it mean? Well, take a look at this code:
bool condition = false;
while(!condition)
{
...
}
By default, the compiler will easily optimize the condition out (it doesn't change, so there is no need to check it at every iteration). If you, however, declare the condition as volatile, the optimization will not be made.
So of course you can have a volatile pointer, and it is possible to write code that will crash because of it, but the fact that a variable is volatile doesn't mean that it is necessarily going to be changed due to some external interference.

Yes, a pointer can be volatile if the variable that it points to can change unexpectedly even though how this might happen is not evident from the code.
An example is an object that can be modified by something that is external to the controlling thread and that the compiler should not optimize.
The most likely place to use the volatile specifier is in low-level code that deals directly with the hardware and where unexpected changes might occur.

You may end up multiplying different numbers because the value is volatile and could be changed unexpectedly. So, you can try something like this:
int square(volatile int *p)
{
int a = *p;
return a*a;
}

int square(volatile int *p)
{
int a = *p;
int b = *p;
return a*b;
}
Since it is possible for the value of *p to change unexpectedly, it is possible for a and b to be different. Consequently, this code could return a number that is not a square! The correct way to code this is:
long square(volatile int *p)
{
int a;
a = *p;
return a * a;
}

Casting pointers in C++?

Running this code returns what I assume to be the integer value of realPtr's address.
I'm still new to C++ and I was wondering - is it possible to convert from one data type pointer to another and the pointer variable whose value is assigned will still print out the right value (i.e. *integerPtr prints out 2)?
Code snippet
double real = 2.0;
double *realPtr = &real;
int *integerPtr;
integerPtr = ((int*)&realPtr);
cout << *integerPtr << endl;
Output
1606416424
Thanks!

There are 3 totally different types that you are dealing with here:
Pointers point to any location in memory. All pointers are basically the same kind of thing, but they are treated as different types depending on what you declare them to point to.
int and double are types that represent integer and real numeric values. An integer and a double representing the same numerical value (e.g. 2) will not have the same binary content, as stipulated by the relevant standards defining how integers and floating-point numbers are stored in memory.
Since all pointers are essentially the same kind of thing, you may cast one kind of pointer to any other kind. Let's say you have your own Foo class that has nothing to do with representing a numerical value. You may still do this:
int i = 2;
int* p_int = &i;
Foo* p_foo = (Foo*) p_int;
This is legal, but this will most probably lead to an error, unless the memory representation of a Foo object is akin to that of an int.
If you want to cast an int to a double, you must cast the data; casting the pointer won't do anything. That's why there are several casts with different names in C++, and it is considered good practice to use them, since by doing so you express explicitly what you want to be done. In your case, you may want one of two different things:
Reinterpret_cast
Casting a pointer to an int to a pointer to a double, which means essentially doing nothing except telling the compiler that it can safely assume that the data pointed to can be considered as a double. The compiler will assume so, and it is your responsibility to assure that it is the case. This type of cast is considered the most dangerous since as we all know, programmers can't be trusted. It is called a reinterpret_cast:
int i = 2;
int* p_int = &i;
Foo* p_foo = reinterpret_cast<Foo*>(p_int);
Same code as above, but we express the danger in the scary "reinterpret_cast". p_int and p_foo have the same value, we did nothing except expressing the fact that we now consider the address of our integer as an address to a foo.
Static_cast
If you want a real cast, you have to operate on the data, not on the pointer. Casting a type of data to another by whatever means the compilers know of is called static_cast. This is probably what you want to do here:
int i = 2;
int* p_int = &i;
double d = static_cast<double>(i);
double* p_double = &d; // p_int and p_double have different values since they point to different objects.
The compiler will look for a conversion function from int to double, and yell at you if it doesn't find any.
Of course, there is nothing wrong with doing the exact same thing by using only pointers, although it makes the code slightly less readable (you should be wary of using pointers at all, and do it for a good reason):
int i = 2;
int* p_i = &i;
double d = static_cast<double>(*p_i); // d holds the value obtained by converting the int pointed to by p_i
double* p_d = &d; // p_d is a pointer to that double

The internal representations of float/double and integer data types are different. On a 32-bit PC an int will typically take 4 bytes, while a double takes 8 bytes.
Also there is a serious bug in your code.
double real = 2.0;
double *realPtr = &real;
int *integerPtr;
integerPtr = ((int*)&realPtr);
cout << *integerPtr << endl;
In the above code, look at the lines that declare and assign integerPtr. There you declared a pointer to an integer called "integerPtr", but the value actually stored in it is the address of a pointer to a double; i.e. &realPtr is a double ** (a pointer to a pointer to a double). And then you are trying to print the value using *integerPtr.
So I changed your code as follows and it gives a value of 0.
double real = 2.0;
double *realPtr = &real;
int *integerPtr;
integerPtr = ((int*)realPtr);
std::cout << *integerPtr << std::endl;
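To see why an int read through the double's address does not give 2, the stored bits can be inspected without a pointer cast. This is a sketch that assumes a 64-bit IEEE 754 double (checked by the static_assert); bits_of is a name made up for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit IEEE 754 double");

// Copies the object representation of a double into an integer of the same size.
// memcpy is used instead of a pointer cast, so no aliasing rule is broken.
inline std::uint64_t bits_of(double value) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under these assumptions bits_of(2.0) is 0x4000000000000000. On a little-endian machine the four bytes at the lowest address are all zero, which is consistent with the 0 printed above.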

Is it legal to cast a pointer to array reference using static_cast in C++?

I have a pointer T * pValues that I would like to view as a T (&values)[N]
In this SO answer https://stackoverflow.com/a/2634994/239916, the proposed way of doing this is
T (&values)[N] = *static_cast<T(*)[N]>(static_cast<void*>(pValues));
The concern I have about this is. In his example, pValues is initialized in the following way
T theValues[N];
T * pValues = theValues;
My question is whether the cast construct is legal if pValues comes from any of the following constructs:
1:
T theValues[N + M]; // M > 0
T * pValues = theValues;
2:
T * pValues = new T[N + M]; // M >= 0

Short answer: You are right. The cast is safe only if pValues points to the first element of an array of type T[N], and both of the cases you mention (different size, dynamically allocated array) will most likely lead to undefined behavior.
The nice thing about static_cast is that some additional checks are made at compile time, so if it seems that you are doing something wrong, the compiler will complain about it (compared to the ugly C-style cast that allows you to do almost anything), e.g.:
struct A { int i; };
struct C { double d; };
int main() {
A a;
// C* c = (C*) &a; // possible to compile, but leads to undefined behavior
C* c = static_cast<C*>(&a);
}
will give you: invalid static_cast from type ‘A*’ to type ‘C*’
In this case you cast to void*, which from the point of view of compile-time checks is legal for almost anything, and vice versa: void* can be cast back to almost anything as well, which makes the use of static_cast pointless in the first place, since these checks no longer catch anything.
For the previous example:
C* c = static_cast<C*>(static_cast<void*>(&a));
is no better than:
C* c = (C*) &a;
and will most likely lead to incorrect usage of this pointer and undefined behavior with it.
In other words:
A arr[N];
A (&ref)[N] = *static_cast<A(*)[N]>(&arr);
is safe and just fine. But once you start abusing static_cast<void*> there are no guarantees at all about what will actually happen because even stuff like:
C *pC = new C;
A (&ref2)[N] = *static_cast<A(*)[N]>(static_cast<void*>(&pC));
becomes possible.

Since C++17 at least, the shown expression isn't safe, even if pValues is a pointer to the first element of the array and the array is of exactly matching type (including exact size), whether obtained from a variable declaration or a call to new. (If these criteria are not satisfied, it is UB regardless of the following.)
Arrays and their first element are not pointer-interconvertible and therefore reinterpret_cast (which is equivalent to two static_casts through void*) cannot cast the pointer value of one to a pointer value of the other.
Consequently static_cast<T(*)[N]>(static_cast<void*>(pValues)) will still point at the first element of the array, not the array object itself.
Dereferencing this pointer is then undefined behavior, because of the type/value mismatch.
This can be potentially remedied with std::launder, which may change the pointer value where reinterpret_cast can't. Specifically the following may be well-defined:
T (&values)[N] = *std::launder(static_cast<T(*)[N]>(static_cast<void*>(pValues)));
or equivalently
T (&values)[N] = *std::launder(reinterpret_cast<T(*)[N]>(pValues));
but only if the pointer that would be returned by std::launder cannot be used to access any bytes that weren't accessible through the original pValues pointer. This is satisfied if the array is a complete object, but e.g. not satisfied if the array is a subarray of a two-dimensional array.
For the exact reachability condition, see https://en.cppreference.com/w/cpp/utility/launder.
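The std::launder form can be wrapped in a small helper. This is a sketch only: as_array is a hypothetical name, it needs C++17, and it merely documents the precondition described above rather than checking it.

```cpp
#include <cstddef>
#include <new>  // std::launder (C++17)

// Hypothetical helper: views a pointer to the first element as a reference to the whole array.
// Precondition (not checked): first points at element 0 of a complete object of type T[N].
template <std::size_t N, typename T>
T (&as_array(T* first))[N] {
    return *std::launder(reinterpret_cast<T(*)[N]>(first));
}
```

For an int arr[4], the call as_array<4>(arr) yields an int (&)[4]. Passing a pointer into a larger array, or into one row of a two-dimensional array, violates the precondition.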

Type Casting in C++

I am reading a C++ book and have a problem with the static casting. Here is a function:
void fun(int*pi)
{
void *pv = pi;
int *pi2 = static_cast<int*>(pv); //explicit conversion back to int*
double *pd3 = static_cast<double*>(pv); //unsafe
}
The last statement:
double*pd3 = static_cast<double*>(pv);
is considered as unsafe. I don't get why it is considered unsafe.

The cast reinterprets the bits of the pointed to int, plus possibly bits of some following memory (if there is!), as a double value.
A double is (1) typically larger than an int, and (2) has some internal structure.
Point (1) means that any use of the dereferenced result pointer, may access memory that just isn't accessible, beyond the int.
Point (2) means that the arbitrary bitpattern, may be invalid as a double bitpattern, and may cause a hardware exception a.k.a. a "trap" when it's used. From a C++ point of view that's Undefined Behavior. From a practical programming point of view it's typically a "crash".
In contrast, accessing the bits of a double as an int is usually in-practice safe, even though it's formally UB, because (1) an int is typically smaller or equal in size to double, and (2) an int usually does not have any invalid bit patterns. However, depending on the compiler options the compiler may not be happy about doing that directly.
Above I forgot to mention alignment, as Loki Astari pointed out in a comment. And that's a reason (3) for unsafety. As an example, with some given implementation an int may be allowed to have an address that is a multiple of 4, while a double may be required to reside at an address that is a multiple of 8. Then the dereferenced pointer may access a double at an address that isn't a multiple of 8, causing a trap (more formally, UB where anything can happen).
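The alignment point can be made concrete with a small check. This sketch assumes the usual flat address model, in which converting a pointer to std::uintptr_t yields its numeric address (the mapping is implementation-defined); is_aligned_for is a name invented here.

```cpp
#include <cstdint>

// Reports whether an address satisfies the alignment requirement of T.
// Assumes the pointer-to-integer conversion yields the plain numeric address.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A buffer declared with alignas(double) passes the check at its start, while the address one byte further in does not on any implementation where alignof(double) is greater than 1.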

Because the size of a double is not necessarily the same as the size of an int, and if you try to use the resulting pointer, you might get a segmentation fault. They are not compatible types.
You can try casting the value pointed by pi2.
void fun(int*pi)
{
void *pv = pi;
int *pi2 = static_cast<int*>(pv);
double d = static_cast<double>(*pi2);
std::cout << d; // 42
}
int main()
{
int i = 42;
fun(&i);
}
