Simple function size: understanding pointer-pointer difference - C++

I'm testing some ways of calculating the size, in bytes, of a function (I'm familiar with opcodes on x86). The code is quite self-explanatory:
#include <windows.h>
#include <iostream>
using std::cout;

void exec(void* addr){
    // treat addr as a MessageBoxA-style function and call it
    int (WINAPI *msg)(HWND,LPCSTR,LPCSTR,UINT) = (int (WINAPI *)(HWND,LPCSTR,LPCSTR,UINT))addr;
    msg(0,"content","title",0);
}
void dump(){}
int main()
{
    cout << (char*)dump - (char*)exec; // this is 53
    return 0;
}
It is supposed to subtract the address of 'exec' from 'dump'. This works, but I noticed the values differ when using other types of pointers, like DWORD*:
exec and dump are unchanged; only the subtraction is done through DWORD* instead:

int main()
{
    cout << (DWORD*)dump - (DWORD*)exec; // this is 13
    return 0;
}
From my understanding, no matter the pointee type, a pointer is always the same size (large enough to hold any address), in my case 4 bytes (x86 system). The only thing that changes between pointer types is the type of data pointed to.
What is the explanation?

Pointer arithmetic in C/C++ is designed for accessing elements of an array. In fact, array indexing is merely a simpler syntax for pointer arithmetic. For example, if you have an array named array, array[1] is the same thing as *(array+1), regardless of the data type of the elements in array.
(I'm assuming here that no operator overloading is going on; that could change everything.)
If you have a char* or unsigned char*, the pointer points to a single byte, and incrementing the pointer advances it to the next byte.
In Windows, DWORD is a 32-bit value (four bytes), and DWORD* points to a 32-bit value. If you increment a DWORD*, the pointer is advanced by four bytes, just as array[1] gives you the second element of the array, which is four bytes (one DWORD) after the first element. Similarly, if you add 10 to a DWORD*, it advances 40 bytes, not 10 bytes.
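To make the unit scaling concrete, here is a minimal sketch (using std::uint32_t as a stand-in for Windows' DWORD, which is likewise a 32-bit unsigned type):

#include <cstdint>
#include <iostream>

int main()
{
    std::uint32_t arr[4] = {0, 1, 2, 3};
    std::uint32_t* d = arr;
    char* c = reinterpret_cast<char*>(arr);
    // Adding 1 to a pointer advances it by sizeof(pointee) bytes:
    std::cout << (reinterpret_cast<char*>(d + 1) - c) << '\n'; // 4: one uint32_t
    std::cout << ((c + 1) - c) << '\n';                        // 1: one char
    return 0;
}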
Either way, incrementing or adding to a pointer is only valid if the resulting pointer points into the same array as the original one, or one element past the end. Otherwise it is undefined behavior.
Pointer subtraction works just like addition. When you subtract one pointer from another, they must be the same type, and must be pointers into the same array or one past the end.
What you're doing is counting the number of elements between the two pointers, as if they were pointers into the same array (or one past the end). But when the two pointers don't point into the same array (or again, one past the end), the result is undefined behavior.
Here is a reference from Carnegie Mellon University about this:
ARR36-C. Do not subtract or compare two pointers that do not refer to the same array - SEI CERT C Coding Standard

Pointer subtraction tells you the number of elements between the two addresses, so using DWORD* the result will be in DWORD-sized units.

You have:
cout<<(char*)dump-(char*)exec;
where dump and exec are the names of functions. Each cast converts a function pointer to char*.
I'm not sure about the status of such a conversion in C++. I think it either has undefined behavior or is illegal (making your program ill-formed). When I compile with g++ 4.8.4 with options -pedantic -std=c++11, it complains:
warning: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Wpedantic]
(There's a similar diagnostic for C, which I believe is not strictly correct, but that's another story.)
There's no guarantee that there's any meaningful relationship between object pointers and function pointers.
Apparently your compiler lets you get away with the casts, and presumably the result is a char* representation of the address of the function. Subtracting two pointers yields the distance between the two addresses in units of the type the pointers point to. Subtracting two char* pointers yields a ptrdiff_t result that is the difference in bytes. Subtracting two DWORD* pointers yields the difference in units of sizeof (DWORD) (probably 4 bytes). That explains why you get different results. If two DWORD* pointers point to addresses that aren't a whole number of DWORDs apart in memory, the results are unpredictable, but in your example getting 13 (53 / 4, truncated) rather than 53 is plausible.
However, pointer subtraction is defined only when both pointer operands point to elements of the same array object, or just past the end of it. For any other operands, the behavior is undefined.
For an implementation that permits the casts, that uses the same representation for object pointers and for function pointers, and on which the value of a function pointer refers to a memory address in the same way that the value of an object pointer does, you can likely determine the size of a function by converting its address to char* and subtracting the result from the converted address of an adjacent function. But a compiler and/or linker is free to generate code for functions in any order it likes, including perhaps inserting code for other functions between two functions whose definitions are adjacent in your source code.
If you want to determine the size in bytes, use pointers to byte-sized types such as char. And be aware that the method you're using is not portable and is not guaranteed to work.
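For what it's worth, a minimal sketch of that byte-based version might look like this (still non-portable: the function-pointer-to-object-pointer casts are only conditionally supported, and the subtraction is undefined behavior):

#include <iostream>

void exec() { /* ... */ }
void dump() {}

int main()
{
    // reinterpret_cast makes the dubious conversion explicit;
    // nothing guarantees dump is laid out right after exec.
    const char* p = reinterpret_cast<const char*>(&exec);
    const char* q = reinterpret_cast<const char*>(&dump);
    std::cout << (q - p) << '\n'; // apparent size of exec in bytes, maybe
    return 0;
}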
If you really need the size of a function, see if you can get your linker to generate some kind of map showing the allocated sizes and locations of your functions. There's no portable way to do it from within C++.

Related

Pointers and data types

I have the following question.
Given that a pointer holds the value of a memory address, why is it permitted to add an integer value to a pointer variable but not a double value?
My thoughts: Is it because we assume that the pointer is an int as well, or maybe because adding a double would increase its length?
Thank you for your time.
You almost answered your question yourself: a pointer is a memory address. A memory address is an integer. You can add integers to integers and get integers as a result. Adding a float to an integer gives you a float, which cannot be used as a memory address.
For example, char *x = 0; is the address of a single byte. What would char *y = 0.5; mean? A byte that's somehow made up of the second half of the byte at address 0 and the first half of the byte at address 1? That might almost make sense, but what about char *x = 3.1415926; or any similar floating-point number?
My thoughts: Is it because we assume that the pointer is an int as well, or maybe because adding a double would increase its length?
If you look to documentation it says:
Certain addition, subtraction, increment, and decrement operators are defined for pointers to elements of arrays: such pointers satisfy the LegacyRandomAccessIterator requirements and allow the C++ library algorithms to work with raw arrays.
(emphasis is mine) and you should remember that:
*(ptr + 1)
is equal to:
ptr[1]
and array indexes are integers, so the language does not define operations on pointers with floating-point operands; that would not make any sense.
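A short sketch of that equivalence: the subscript and pointer-arithmetic forms are interchangeable, and only integral offsets are accepted:

#include <iostream>

int main()
{
    int a[3] = {10, 20, 30};
    int* ptr = a;
    std::cout << (ptr[1] == *(ptr + 1)) << '\n'; // 1: same element
    // ptr + 1.5;  // error: invalid operands ('int*' and 'double')
    return 0;
}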
You cannot add a double to an int* (pointer) via the conventions of C. A pointer holds the value of a memory address ["stores/points to the address of another variable"], and that address is stepped in units determined by the pointed-to type, in this case int (a 4-byte block of memory, if I recall). A double is a double-precision, 64-bit floating-point data type; it just can't be used that way at the most "hardware" of levels.

Does casting pointers to integers define a total order on pointers?

(related to my previous question)
In Qt, the QMap documentation says:
The key type of a QMap must provide operator<() specifying a total order.
However, in qmap.h, they seem to use something similar to std::less to compare pointers:
/*
QMap uses qMapLessThanKey() to compare keys. The default
implementation uses operator<(). For pointer types,
qMapLessThanKey() casts the pointers to integers before it
compares them, because operator<() is undefined on pointers
that come from different memory blocks. (In practice, this
is only a problem when running a program such as
BoundsChecker.)
*/
template <class Key> inline bool qMapLessThanKey(const Key &key1, const Key &key2)
{
return key1 < key2;
}
template <class Ptr> inline bool qMapLessThanKey(const Ptr *key1, const Ptr *key2)
{
Q_STATIC_ASSERT(sizeof(quintptr) == sizeof(const Ptr *));
return quintptr(key1) < quintptr(key2);
}
They just cast the pointers to quintptr (Qt's version of uintptr_t, that is, an unsigned integer type capable of storing a pointer) and compare the results.
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to a pointer to void, and the result will compare equal to the original pointer: uintptr_t
Do you think this implementation of qMapLessThanKey() on pointers is ok?
Of course, there is a total order on integral types. But I think this is not sufficient to conclude that this operation defines a total order on pointers.
I think that it is true only if p1 == p2 implies quintptr(p1) == quintptr(p2), which, AFAIK, is not specified.
As a counterexample of this condition, imagine a target using 40 bits for pointers; it could convert pointers to quintptr, setting the 40 lowest bits to the pointer address and leaving the 24 highest bits unchanged (random). This is sufficient to respect the convertibility between quintptr and pointers, but this does not define a total order for pointers.
What do you think?
The Standard guarantees that converting a pointer to an uintptr_t will yield a value of some unsigned type which, if cast to the original pointer type, will yield the original pointer. It also mandates that any pointer can be decomposed into a sequence of unsigned char values, and that using such a sequence of unsigned char values to construct a pointer will yield the original. Neither guarantee, however, would forbid an implementation from including padding bits within pointer types, nor would either guarantee require that the padding bits behave in any consistent fashion.
If code avoided storing pointers, and instead cast to uintptr_t every pointer returned from malloc, later casting those values back to pointers as required, then the resulting uintptr_t values would form a ranking. The ranking might not have any relationship to the order in which objects were created, nor to their arrangement in memory, but it would be a ranking. If any pointer gets converted to uintptr_t more than once, however, the resulting values might rank entirely independently.
I think that you can't assume that there is a total order on pointers. The guarantees given by the standard for pointer to int conversions are rather limited:
5.2.10/4: A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation-defined.
5.2.10/5: A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (...) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined.
From a practical point of view, most of the mainstream compilers will convert a pointer to an integer in a bitwise manner, and you'll have a total order.
The theoretical problem:
But this is not guaranteed. It might not work on past platforms (x86 real and protected mode), on exotic platforms (embedded systems?), and, who knows, on some future platforms.
Take the example of the segmented memory of the 8086: the real address is given by the combination of a segment (e.g. the DS register for the data segment, SS for the stack segment, ...) and an offset:
Segment:  XXXX YYYY YYYY YYYY 0000    (16 bits, shifted left by 4 bits)
Offset:   0000 ZZZZ ZZZZ ZZZZ ZZZZ    (16 bits, not shifted)
          ------------------------
Address:  AAAA AAAA AAAA AAAA AAAA    (20-bit address)
Now imagine that the compiler converts the pointer to int by simply doing the address math and putting the 20 bits in the integer: you're safe and have a total order.
But another equally valid approach would be to store the segment in the 16 upper bits and the offset in the 16 lower bits. In fact, this would significantly facilitate/accelerate the loading of pointer values into CPU registers.
This approach is compliant with standard C++ requirements, but each single address could be represented by 16 different pointers: your total order is lost!
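A small simulation of that hypothetical scheme shows the problem: two distinct segment:offset pairs can name the same physical byte, so comparing the packed integers does not order pointers consistently with pointer equality (the helper names here are made up for illustration):

#include <cstdint>
#include <iostream>

// Physical address on a hypothetical 8086-like machine: segment*16 + offset.
std::uint32_t physical(std::uint16_t seg, std::uint16_t off)
{
    return (static_cast<std::uint32_t>(seg) << 4) + off;
}

// The "cheap" conversion: segment in the upper 16 bits, offset in the lower 16.
std::uint32_t packed(std::uint16_t seg, std::uint16_t off)
{
    return (static_cast<std::uint32_t>(seg) << 16) | off;
}

int main()
{
    // 0001:0000 and 0000:0010 both name physical address 0x10...
    std::cout << (physical(0x0001, 0x0000) == physical(0x0000, 0x0010)) << '\n'; // 1
    // ...but their packed integer representations differ:
    std::cout << (packed(0x0001, 0x0000) == packed(0x0000, 0x0010)) << '\n';     // 0
    return 0;
}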
Are there alternatives for the order?
One could imagine using pointer arithmetic. There are strong constraints on pointer arithmetic for elements of the same array:
5.7/6: When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements.
And subscripts are ordered.
An array can have at most SIZE_MAX elements. So, naively, if sizeof(pointer) <= sizeof(size_t), one could assume that taking an arbitrary reference pointer and doing some pointer arithmetic should lead to a total order.
Unfortunately, here also, the standard is very prudent:
5.7/7: For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T is different from the cv-unqualified array element type, the behavior is undefined.
So pointer arithmetic won't do the trick for arbitrary pointers either. Again, the segmented memory model helps to understand: arrays could be at most 65535 bytes, so as to fit completely within one segment, but different arrays could use different segments, so pointer arithmetic wouldn't be reliable for a total order either.
Conclusion
There's a subtle note in the standard on the mapping between pointers and integral values:
It is intended to be unsurprising to those who know the addressing structure of the underlying machine.
This means it must be possible to determine a total order. But keep in mind that it'll be non-portable.

Is this type casting or some kind of pointer arithmetics?

I came across a line of code written in C++:
long *lbuf = (long*)spiReadBuffer;
And it turns out that "spiReadBuffer" is a byte array with 12 elements. But I am a little confused. I think I am familiar with defining pointers, and I can see that "lbuf" is a pointer to long. Also, I thought that for casting we could do something like this:
y = (int) x;
But what if I put a "*" after the "int", just like in my first example, where there is one after "long"?
I apologize if this is a really trivial question, but as I went through the type casting and pointers topics I did not come across my case and I did not really understand it.
I would appreciate it if you could guide me or introduce me to any relevant materials or resources.
This is called type punning. It tricks the compiler into reading the memory occupied by an object as if it was of another type.
In your case, the array spiReadBuffer decays to a pointer to its first element, then the pointer is cast and stored. When you dereference this pointer, you will access the beginning of the array as if it were a long.
The problem with this approach is that it triggers undefined behaviour (see strict aliasing). So even though it works in a lot of situations, it can also break without notice.
There are two ways (that I know of) to type-pun safely. The first one is standard-compliant: std::memcpy.
#include <cstring>

char spiReadBuffer[12];
long rbAsLong;
std::memcpy(&rbAsLong, &spiReadBuffer, sizeof rbAsLong);
// rbAsLong now holds the first sizeof(long) bytes of spiReadBuffer, reinterpreted as a long.
The second one involves an extension that is often provided by compilers (but you should check), that extends the behaviour of unions.
union {
char buf[12];
long asLong;
} spiReadBuffer;
The standard states that writing to a member of a union then reading from another member is undefined behaviour. These compiler extensions choose to define it as a safe reinterpretation.
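For illustration, usage of the union version might look like this (hedged: it relies on the compiler extension described above, not on standard C++):

#include <cstring>
#include <iostream>

union SpiReadBuffer {
    char buf[12];
    long asLong;
};

int main()
{
    SpiReadBuffer spiReadBuffer;
    std::memset(spiReadBuffer.buf, 0, sizeof spiReadBuffer.buf);
    spiReadBuffer.buf[0] = 1;
    // Writing buf and then reading asLong is the reinterpretation the
    // extension permits; standard C++ calls this undefined behavior.
    std::cout << spiReadBuffer.asLong << '\n'; // 1 on a little-endian machine
    return 0;
}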
In C/C++, an array name decays to a pointer to its first element in most expressions:
char spiReadBuffer[12];
char* pBuffer;
so in an expression like the cast above, the compiler treats both spiReadBuffer and pBuffer as pointers (even though the array itself is not a pointer).
The code snippet
long *lbuf = (long*)spiReadBuffer;
is an example of type casting, only it's for pointer types. A char* is converted to a long*. You could say this enables a kind of pointer arithmetic in larger units, because now you can read sizeof(long) bytes at a time from spiReadBuffer through the long* (instead of one byte at a time).
The second snippet you showed, y = (int) x;, is also a cast, but not one involving pointers.
Consider this snippet:
#include <cstdio>

char spiReadBuffer[] = {1,2,3,4,5,6,7,8};
long *lbuf = (long*)spiReadBuffer;
printf("%08lx\n", lbuf[0]); // %lx because lbuf[0] is a long
It will print 04030201 on a little-endian architecture (assuming a 4-byte long) or 01020304 on a big-endian one.
After the long *lbuf = (long*)spiReadBuffer; statement, lbuf points to the beginning of spiReadBuffer, and lbuf[0] (or *lbuf) lets you read the first sizeof(long) bytes of spiReadBuffer as a long.

Can std::uintptr_t be used to avoid undefined behavior of out-of-bounds pointer arithmetic?

Now we know that doing out-of-bounds-pointer-arithmetic has undefined behavior as described in this SO question.
My question is: can we work around this restriction by casting to std::uintptr_t for the arithmetic operations and then casting back to a pointer? Is that guaranteed to work?
For example:
char a[5];
auto u = reinterpret_cast<std::uintptr_t>(a) - 1;
auto p = reinterpret_cast<char*>(u + 1); // OK?
The real world usage is for optimizing offsetted memory access -- instead of p[n + offset], I want to do offset_p[n].
EDIT To make the question more explicit:
Given a base pointer p of a char array, if p + n is a valid pointer, will reinterpret_cast<char*>(reinterpret_cast<std::uintptr_t>(p) + n) be guaranteed to yield the same valid pointer?
No, uintptr_t cannot be meaningfully used to avoid undefined behavior when performing pointer arithmetic.
For one thing, at least in C there is no guarantee that uintptr_t even exists. The requirement is that any value of type void* may be converted to uintptr_t and back again, yielding the original value without loss of information. In principle, there might not be any unsigned integer type wide enough to hold all pointer values. (I presume the same applies to C++, since C++ inherits most of the C standard library and defines it by reference to the C standard.)
Even if uintptr_t does exist, there is no guarantee that a given arithmetic operation on a uintptr_t value does the same thing as the corresponding operation on a pointer value.
For example, I've worked on systems (Cray vector systems, T90 and SV1) on which byte pointers are implemented in software. A native address is a 64-bit address that refers to a 64-bit word; there is no hardware support for byte addressing. A char* or void* pointer consists of a word pointer with a 3-bit offset stored in the otherwise unused high-order bits. Conversion between integers and pointers simply copies the bits. So incrementing a char* would advance it to point to the next 8-bit byte in memory; incrementing a uintptr_t obtained by converting a char* would advance it to point to the next 64-bit word.
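A toy model of that layout (purely hypothetical, to illustrate the point): the 3-bit byte offset lives in the high bits, so "+1" on the integer representation and "+1" on the simulated char* land in different places:

#include <cstdint>
#include <iostream>

// Simulated Cray-style byte pointer: word address in the low 61 bits,
// byte-within-word offset in the top 3 bits.
std::uint64_t make_byte_ptr(std::uint64_t word, unsigned byte)
{
    return (static_cast<std::uint64_t>(byte) << 61) | word;
}

// What "char* + 1" has to do in software: bump the byte offset,
// carrying into the word address after 8 bytes.
std::uint64_t byte_ptr_inc(std::uint64_t p)
{
    unsigned byte = static_cast<unsigned>(p >> 61);
    std::uint64_t word = p & ((std::uint64_t{1} << 61) - 1);
    if (++byte == 8) { byte = 0; ++word; }
    return make_byte_ptr(word, byte);
}

int main()
{
    std::uint64_t p = make_byte_ptr(0x1000, 0);
    std::cout << std::hex
              << byte_ptr_inc(p) << '\n'  // next byte: the offset bits change
              << (p + 1) << '\n';         // next word: a different address entirely
    return 0;
}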
That's just one example. More generally, conversions between pointers and integers are implementation-defined, and the language standard makes no guarantee about the semantics of those conversions (other than, in some cases, converting back to a pointer).
So yes, you can convert a pointer value to uintptr_t (if that type exists) and perform arithmetic on it without risking undefined behavior -- but the result may or may not be meaningful.
It happens that, on most systems, the mapping between pointers and integers is simpler, and you probably can get away with that kind of game. But you're better off using pointer arithmetic directly, and just being very careful to avoid any invalid operations.
Yes, that is legal, but you must reinterpret_cast exactly the same uintptr_t value back to char*.
(Therefore, what you're intending to do is illegal; that is, converting a different value back to a pointer.)
5.2.10 Reinterpret cast
4. A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation-defined.
5. A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value;
(Note that there'd be no way, in general, for the compiler to know that you subtracted one and then added it back.)
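A minimal sketch of the asker's example under that reading: the intermediate subtraction is plain unsigned integer math, and what is converted back is exactly the original uintptr_t value:

#include <cstdint>
#include <iostream>

int main()
{
    char a[5];
    auto u = reinterpret_cast<std::uintptr_t>(a) - 1; // ordinary integer math
    auto p = reinterpret_cast<char*>(u + 1);          // u + 1 == the original value
    std::cout << (p == a) << '\n'; // 1: same value converted back, so guaranteed
    return 0;
}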

Any type of pointer can point to anything?

Is this statement correct? Can any "TYPE" of pointer point to any other type?
Because I believe so, but I still have doubts.
Why are pointers declared for definite types? E.g. int or char?
The one explanation I could get was: if an int-type pointer points to a char array, then when the pointer is incremented, it will jump from position 0 to position 2, skipping position 1 in between (because int size = 2).
And maybe because a pointer just holds the address of a value, not the value itself, i.e. the int or double.
Am I wrong? Was that statement correct?
Pointers may be interchangeable, but are not required to be.
In particular, on some platforms, certain types need to be aligned to certain byte-boundaries.
So while a char may be anywhere in memory, an int may need to be on a 4-byte boundary.
Another important potential difference is with function-pointers.
Pointers to functions may not be interchangeable with pointers to data-types on many platforms.
It bears repeating: This is platform-specific.
I believe Intel x86 architectures treat all pointers the same.
But you may well encounter other platforms where this is not true.
Every pointer is of some specific type. There's a special generic pointer type void* that can point to any object type, but you have to convert a void* to some specific pointer type before you can dereference it. (I'm ignoring function pointer types.)
You can convert a pointer value from one pointer type to another. In most cases, converting a pointer from foo* to bar* and back to foo* will yield the original value -- but that's not actually guaranteed in all cases.
You can cause a pointer of type foo* to point to an object of type bar, but (a) it's usually a bad idea, and (b) in some cases, it may not work (say, if the target types foo and bar have different sizes or alignment requirements).
You can get away with things like:
int n = 42;
char *p = (char*)&n;
which causes p to point to n -- but then *p doesn't give you the value of n, it gives you the value of the first byte of n as a char.
The differing behavior of pointer arithmetic is only part of the reason for having different pointer types. It's mostly about type safety. If you have a pointer of type int*, you can be reasonably sure (unless you've done something unsafe) that it actually points to an int object. And if you try to treat it as an object of a different type, the compiler will likely complain about it.
Basically, we have distinct pointer types for the same reasons we have other distinct types: so we can keep track of what kind of value is stored in each object, with help from the compiler.
(There have been languages that only have untyped generic pointers. In such a language, it's more difficult to avoid type errors, such as storing a value of one type and accidentally accessing it as if it were of another type.)
Any pointer can refer to any location in memory, so technically the statement is correct. With that said, you need to be careful when reinterpreting pointer types.
A pointer basically has two pieces of information: a memory location, and the type it expects to find there. The memory location could be anything. It could be the location where an object or value is stored; it could be in the middle of a string of text; or it could just be an arbitrary block of uninitialised memory.
The type information in a pointer is important though. The array and pointer arithmetic explanation in your question is correct -- if you try to iterate over data in memory using a pointer, then the type needs to be correct, otherwise you may not iterate correctly. This is because different types have different sizes, and may be aligned differently.
The type is also important in terms of how data is handled in your program. For example, if you have an int stored in memory, but you access it by dereferencing a float* pointer, then you'll probably get useless results (unless you've programmed it that way for a specific reason). This is because an int is stored in memory differently from the way a float is stored.
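To see that without invoking undefined behavior, you can copy the int's bytes into a float (a sketch assuming a 32-bit int and a 32-bit IEEE-754 float):

#include <cstring>
#include <iostream>

int main()
{
    int n = 1;
    float f;
    static_assert(sizeof n == sizeof f, "sketch assumes equal sizes");
    std::memcpy(&f, &n, sizeof f); // reinterpret the bit pattern safely
    std::cout << f << '\n';        // ~1.4e-45 (a denormal), not 1
    return 0;
}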
Can any "TYPE" of pointer can point to any other type?
Generally no. The types have to be related.
It is possible to use reinterpret_cast to cast a pointer from one type to another, but unless those pointers could also be converted legally using a static_cast, actually using the result of the reinterpret_cast is not valid. Hence you can't do Foo* foo = ...; Bar* bar = (Bar*)foo; unless Foo and Bar are actually related.
You can also use reinterpret_cast to cast from an object pointer to a void* and vice versa, and in that sense a void* can point to anything -- but that's not what you seem to be asking about.
Further you can reinterpret_cast from object pointer to integral value and vice versa, but again, not what you appear to be asking.
Finally, a special exception is made for char*. You can initialize a char* variable with the address of any other type, and perform pointer math on the resulting pointer. You still can't dereference through the pointer if the thing being pointed to isn't actually a char, but it can then be cast back to the actual type and used that way.
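A short sketch of that char* exception (inspecting an int's bytes, then casting back):

#include <cstddef>
#include <iostream>

int main()
{
    int n = 0x01020304;
    char* bytes = reinterpret_cast<char*>(&n); // char* may alias any object
    for (std::size_t i = 0; i < sizeof n; ++i)
        std::cout << (bytes[i] & 0xFF) << ' '; // 4 3 2 1 on little-endian
    int* back = reinterpret_cast<int*>(bytes); // cast back to the real type
    std::cout << '\n' << (*back == n) << '\n'; // 1
    return 0;
}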
Also keep in mind that every time you use reinterpret_cast in any context, you are dancing on the precipice of a cliff. Dereferencing a pointer to a Foo when the thing it actually points to is a Bar yields Undefined Behavior when the types are not related. You would do well to avoid these types of casts at all costs.
Some pointers are more equal than others...
First of all, not all pointers are necessarily the same thing. Function pointers can be something very different from data pointers, for instance.
Aside: Function pointers on PPC
On the PPC platform, this was quite obvious: A function pointer was actually two pointers under the hood, so there was simply no way to meaningfully cast a function pointer to a data pointer or back. I.e. the following would hold:
int* dataP;
int (*functionP)(int);
assert(sizeof(dataP) == 4);
assert(sizeof(functionP) == 8);
assert(sizeof(dataP) != sizeof(functionP));
//impossible:
//dataP = (int*)functionP; //would lose information
//functionP = (int (*)(int))dataP; //part of the resulting pointer would be garbage
Alignment
Furthermore, there are problems with alignment: depending on the platform, some data types may need to be aligned in memory. This is especially common with vector data types, but could apply to any type larger than a byte. For instance, if an int must be 4-byte aligned, the following code might crash:
char a[4];
int* alias = (int*)a;
//int foo = *alias; //may crash because alias is not aligned properly
This is not an issue if the pointer comes from a malloc() call, as that is guaranteed to return sufficiently aligned pointers for all types:
char* a = (char*)malloc(sizeof(int)); // cast needed in C++, not in C
int* alias = (int*)a;
*alias = 0; //perfectly legal, the pointer is aligned
Strict aliasing and type punning
Finally, there are strict aliasing rules: You must not access an object of one type through a pointer to another type. Type punning is forbidden:
assert(sizeof(float) == sizeof(uint32_t));
float foo = 42;
//uint32_t bits = *(uint32_t*)&foo; //type punning is illegal
If you absolutely must reinterpret a bit pattern as another type, you must use memcpy():
assert(sizeof(float) == sizeof(uint32_t));
float foo = 42;
uint32_t bits;
memcpy(&bits, &foo, sizeof(bits)); //bit pattern reinterpretation is legal when copying the data
To allow memcpy() and friends to actually be implementable, the C/C++ language standards provide for an exception for char types: You can cast any pointer to a char*, copy the char data over to another buffer, and then access that other buffer as some other type. The results are implementation defined, but the standards allow it. Use cases are mostly general data manipulation routines like I/O, etc.
TL;DR:
Pointers are much less interchangeable than you think. Don't reinterpret pointers in any other way than to/from char* (check alignment in the "from" case). And even that does not work for function pointers.