How to determine if a variable is a pointer? - c++

I was reading the Wolfenstein 3D code, and I encountered the ISPOINTER macro:
#define ISPOINTER(x) ((((uintptr_t)(x)) & ~0xffff) != 0)
I know we have std::is_pointer, but how does this macro work? I tried it and got strange behavior that I couldn't explain:
#include <cstdint>
#include <iostream>

#define ISPOINTER(x) ((((uintptr_t)(x)) & ~0xffff) != 0)

int main()
{
    int* ptr;
    int val;
    if (ISPOINTER(ptr)) {
        std::cout << "`ptr`: Is Pointer" << std::endl;
    }
    if (ISPOINTER(val)) {
        std::cout << "`val`: Is Pointer" << std::endl;
    }
}
I don't have any output, but if I add another pointer:
#include <cstdint>
#include <iostream>

#define ISPOINTER(x) ((((uintptr_t)(x)) & ~0xffff) != 0)

int main()
{
    int* ptr;
    int val;
    int* ptr2;
    if (ISPOINTER(ptr)) {
        std::cout << "`ptr`: Is Pointer" << std::endl;
    }
    if (ISPOINTER(val)) {
        std::cout << "`val`: Is Pointer" << std::endl;
    }
    if (ISPOINTER(ptr2)) {
        std::cout << "`ptr2`: Is Pointer" << std::endl;
    }
}
The output will be:
`ptr`: Is Pointer
What is ISPOINTER doing? Is this undefined behavior?

Let's do this in steps:
((uintptr_t)(x)) is simply a cast from whatever x is into a uintptr_t (an unsigned integer type capable of storing pointer values)
~0xffff is the bit-wise complement of 0xffff (which is 16 bits of all 1s). The result is a number that is all 1s except for the lowest 16 bits, which are all 0s.
((uintptr_t)(x)) & ~0xffff is a bit-wise AND of the pointer value with the above number. This will effectively zero-out the 16 lowest bits of whatever the pointer value is.
The full expression then just checks whether the result is non-zero. So the whole expression checks whether any bits other than the least-significant 16 are set, and if so it considers the value a pointer.
Since this came from Wolfenstein 3D, they probably made the assumption that all dynamically allocated memory lives in high memory addresses (higher than 2^16). So this is NOT a check if a type is a pointer or not using the type system like std::is_pointer does. This is an assumption based on the target architecture Wolfenstein 3D will likely run on.
Keep in mind that this is not a safe assumption: "normal" integer values above 2^16 would also be considered pointers, and the memory layout of your process can differ depending on a lot of factors (e.g. ASLR). Also note that both variables in your test are read uninitialized, which is itself undefined behavior and explains the inconsistent output you observed.
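For illustration, here is a minimal sketch contrasting the macro with the type-level check (the literal test values are just examples I picked):
#include <cstdint>
#include <iostream>
#include <type_traits>

#define ISPOINTER(x) ((((uintptr_t)(x)) & ~0xffff) != 0)

int main()
{
    int n = 42;
    int* p = &n; // a real address, almost certainly above 2^16

    std::cout << ISPOINTER(42) << '\n';         // 0: small value, "not a pointer"
    std::cout << ISPOINTER(0x12345678) << '\n'; // 1: large value, "a pointer"
    std::cout << ISPOINTER(p) << '\n';          // most likely 1, but not guaranteed

    // std::is_pointer, by contrast, inspects the type at compile time:
    std::cout << std::is_pointer<int*>::value << '\n'; // 1
    std::cout << std::is_pointer<int>::value << '\n';  // 0
}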

It might be similar to certain parameters in Windows, which can be an ordinal value or a pointer. An ordinal value will be <64K. On that operating system, any legal pointer will be >64K. (On older 16-bit versions of Windows, only 4K was reserved.)
This code may be doing the same thing. A "resource" may be a built-in or registered value referred to by a small number, or a pointer to an ad-hoc object. This macro is used to decide whether to use it as-is or look it up in the table instead.
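Windows expresses this convention with its IS_INTRESOURCE macro; a standalone sketch of the same check might look like this (the function name here is illustrative, not a Windows API):
#include <cstdint>

// Values below 64K are treated as ordinals (small registered numbers);
// anything else is assumed to be a real pointer.
inline bool is_ordinal(const void* handle)
{
    return (reinterpret_cast<std::uintptr_t>(handle) >> 16) == 0;
}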


Why is this pointer null

In Visual Studio, it seems that pointers to member variables are 32-bit signed integers behind the scenes (even in 64-bit mode), and the null pointer is -1 in that context. So if I have a class like:
#include <iostream>
#include <climits> // for INT_MAX

struct Foo
{
    char arr1[INT_MAX];
    char arr2[INT_MAX];
    char ch1;
    char ch2;
};

int main()
{
    auto p = &Foo::ch2;
    std::cout << (p ? "Not null" : "null") << '\n';
}
It compiles, and prints "null". So, am I causing some kind of undefined behavior, or was the compiler supposed to reject this code and this is a bug in the compiler?
Edit:
It appears that I can keep the "2 INT_MAX arrays plus 2 chars" pattern, and only in that case the compiler allows me to add as many members as I wish, with the second character always considered to be null. See demo. If I change the pattern slightly (like 1 or 3 chars instead of 2 at some point) it complains that the class is too large.
The size limit of an object is implementation defined, per Annex B of the standard [1]. Your struct is of an absurd size.
If the struct is:
struct Foo
{
    char arr1[INT_MAX];
    //char arr2[INT_MAX];
    char ch1;
    char ch2;
};
... the size of your struct in a relatively recent version of 64-bit MSVC appears to be around 2147483649 bytes. If you then add in arr2, suddenly sizeof will tell you that Foo is of size 1.
The C++ standard (Annex B) states that the compiler must document its limitations, which MSVC does [2]. It states that it follows the recommended limits. Annex B, Section 2.17 provides a recommended limit of 262144 for the size of an object. While it's clear that MSVC can handle much more than that, it only documents that it follows the minimum recommendation, so I'd assume you should take care once your object size exceeds it.
[1] http://eel.is/c++draft/implimits
[2] https://learn.microsoft.com/en-us/cpp/cpp/compiler-limits?view=vs-2019
It's clearly a collision between an optimization on pointer-to-member representation (use only 4 bytes of storage when no virtual bases are present) and the pigeonhole principle.
For a type X containing N subobjects of type char, there are N+1 possible valid pointer-to-members of type char X::*... one for each subobject, and one for null-pointer-to-member.
This works when there are at least N+1 distinct values in the pointer-to-member representation, which for a 4-byte representation implies that N+1 <= 2^32, and therefore the maximum object size is 2^32 - 1.
Unfortunately the compiler in question made the maximum object-type size (before it rejects the program) equal to 2^32, which is one too large and creates a pigeonhole problem: at least one pair of pointer-to-members must be indistinguishable. It's not necessary that the null pointer-to-member be one half of this pair, but as you've observed, in this implementation it is.
The expression &Foo::ch2 is of type char Foo::*, which is a pointer to member of class Foo. By the rules, a pointer to member converted to bool should evaluate to false ONLY if it is a null pointer, i.e. only if nullptr was assigned to it.
The fault here appears to be an implementation flaw. E.g., on gcc with -march=x86-64, any assigned pointer to member evaluates as non-null unless nullptr was assigned to it. Consider the following code:
#include <iostream>
#include <climits> // LLONG_MAX, ULLONG_MAX
#include <cstddef> // offsetof

struct foo
{
    char arr1[LLONG_MAX];
    char arr2[LLONG_MAX];
    char ch1;
    char ch2;
};

int main()
{
    char foo::* p1 = &foo::ch1;
    char foo::* p2 = &foo::ch2;
    std::cout << (p1 ? "Not null " : "null ") << '\n';
    std::cout << (p2 ? "Not null " : "null ") << '\n';
    std::cout << LLONG_MAX + LLONG_MAX << '\n'; // signed overflow: undefined behaviour
    std::cout << ULLONG_MAX << '\n';
    std::cout << offsetof(foo, ch1) << '\n';
}
Output:
Not null
null
-2
18446744073709551615
18446744073709551614
Likely it's related to the fact that the class size exceeds the platform's limits, so the offset of the member wraps around to 0 (the internal value of nullptr). The compiler doesn't detect it because it falls victim to signed integer overflow itself: using signed literals as array sizes means the "size" of the two arrays combined is computed as LLONG_MAX + LLONG_MAX = -2 (numerically, 2 * (2^63 - 1) wraps modulo 2^64 to 2^64 - 2, which as a signed 64-bit value is -2).
Essentially the size of the first two members is computed as negative, and the offset of ch1 is -2, represented as the unsigned value 18446744073709551614.
Since -2 is not 0, the pointer is not null. Another compiler might clamp the value to 0, producing a nullptr, or actually detect the problem, as clang does.
If the offset of ch1 is -2, is the offset of ch2 then -1? Let's add this:
std::cout << static_cast<long long>(offsetof(foo, ch1)) << '\n';
std::cout << static_cast<long long>(offsetof(foo, ch2)) << '\n';
Additional output:
-2
-1
And the offset of the first member is obviously 0, so if pointers to members are represented as offsets, another value is needed to represent nullptr. It's logical to assume that this particular compiler considers only -1 to be the null value, which may or may not be the case for other implementations.
When I test the code, Visual Studio reports that 'Foo': the class is too large.
When I add char arr3[INT_MAX], Visual Studio reports error C2089: 'Foo': 'struct' too large. Microsoft Docs explains it as: "The specified structure or union exceeds the 4GB limit."

Hexadecimal addition end result is always wrong? C++

When I want to calculate the address of a function, I do the following:
HMODULE base = GetModuleHandle(L"program.exe"); // module base address
// Adding an offset to the base
std::cout << (base + 0x8F0A0) << std::endl; // -> wrong!
I'm not sure why the result is wrong. I've tested it via online hex calculators and also used a debugger to check both values.
Could it be that base is treated as decimal and the other value as hex, producing a wrong result?
How can I get the result in hex?
As explained here, depending on whether STRICT is defined, HMODULE is essentially either a void* or a <unique type>*; the purpose of this is to make each handle type a distinct C++ type, so that mixing and matching them produces compiler errors. In the former case, pointer arithmetic won't compile. In the latter case it will compile, but you can't rely on the result, because pointer arithmetic takes the pointee type's size into account, and because pointer arithmetic is undefined if you leave the bounds of the object/array being pointed to.
You should treat this pointer as pointing to nothing in particular, and therefore not do pointer arithmetic. You have to reinterpret_cast it to an integral type that you're sure is large enough (std::uintptr_t) and then do arithmetic on that integral value.
In my local header, this unique type contains an int member, so adding 1 will actually move the pointer ahead by 4 bytes (you know, except for the undefined behaviour and all). It just so happens that 0x00DE0000 + 4 * 0x8F0A0 is your 0x0101C280 value.
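A minimal sketch of the integer-arithmetic approach described above (assuming <windows.h> and that program.exe is the running module):
#include <windows.h>
#include <cstdint>
#include <iostream>

int main()
{
    HMODULE base = GetModuleHandleW(L"program.exe");
    if (base != nullptr) {
        // Do the arithmetic on an integer, not on the handle itself.
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(base) + 0x8F0A0;
        std::cout << std::hex << addr << std::endl;
    }
}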
Your problem lies with the value GetModuleHandle(L"program.exe") returns: 00DE0000. You need to use C hexadecimal syntax, i.e. prepend "0x" to your hex number 00DE0000.
Hence, your base number should be cast to a numeric value: 0x00DE0000.
0x00DE0000 is equal to 00DE0000.
Try using std::to_string(int value) to convert it to a string, then convert your hex value (base) to C hexadecimal syntax (add "0x" at the beginning). To finish off, convert your base value back to a numeric value (e.g. with std::stoi) and perform the addition, printing with std::hex.
Try this code:
#include <iostream>

int main() {
    int hex1 = 0x8F0A0;
    int hex2 = 0x00DE0000; // using int values
    std::cout << std::hex << hex1 + hex2 << std::endl;
}
As Chris said, I had the same case and solved it like this:
int offset = 0x8F0A0;
std::uintptr_t base = reinterpret_cast<std::uintptr_t>(GetModuleHandle(L"program.exe"));
// The offset needed a further 4096 (0x1000) bytes of adjustment here.
std::cout << std::hex << (base + (offset + 4096)) << std::endl;

Why do std::hex and void* on an integer result in the same value

For example in C++:
int number = 10;
cout << std::hex << number << endl;
cout << (void *)(int)number << endl;
Why do the two outputs have the same hex value?
What does the (void *)(int)number really mean here?
This is, I believe, implementation defined behavior.
As a practical matter:
Formatted output of pointers tends to give the address they point to.
At the level of machine code, pointers tend to be stored as a 32-bit or 64-bit integer representing a memory address.
The cast tends to be implemented by simply reinterpreting the machine language integer as representing a pointer.
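A small demonstration of this (the exact pointer formatting, such as an 0x prefix or leading zeros, varies between implementations):
#include <cstdint>
#include <iostream>

int main()
{
    int number = 10;
    std::cout << std::hex << number << '\n'; // prints "a"
    // Going through uintptr_t makes the integer-to-pointer conversion
    // explicit; the formatted pointer is typically the same numeric value.
    std::cout << reinterpret_cast<void*>(static_cast<std::uintptr_t>(number)) << '\n';
}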

Type casting struct to integer and vice versa in C++

So, I've seen this thread Type casting struct to integer c++ about how to cast between integers and structs (bitfields), and undoubtedly, writing a proper conversion function or overloading the relevant casting operators is the way to go in any case where an operating system is involved.
However, when writing firmware for a small embedded system where only one flash image is run, the case might be different, insofar as security isn't so much of a concern while performance is.
Since I can test whether the code works properly (meaning the bits of a bitfield are arranged the way I would expect them to be) each time when compiling my code, the answer might be different here.
So, my question is whether there is a 'proper' way to convert between a bitfield and an unsigned int that compiles to no operations in g++ (maybe the shifts get optimised away when the compiler knows the bits are already arranged correctly in memory).
This is an excerpt from the original question:
struct {
    int part1 : 10;
    int part2 : 6;
    int part3 : 16;
} word;
I can then set part2 to be equal to whatever value is requested, and set the other parts as 0.
word.part1 = 0;
word.part2 = 9;
word.part3 = 0;
I now want to take that struct, and convert it into a single 32 bit integer. I do have it compiling by forcing the casting, but it does not seem like a very elegant or secure way of converting the data.
int x = *reinterpret_cast<int*>(&word);
EDIT:
Now, quite some time later, I have learned some things:
1) Type punning (changing the interpretation of data) by means of pointer casting is undefined behaviour since C99 and C++98. These language revisions introduced strict aliasing rules (they allow the compiler to assume that data is only accessed through pointers of compatible type) to enable better optimisations. In effect, the compiler need not preserve the ordering between accesses (or perform the off-type access at all). In most cases this does not seem to present an [immediate] problem, but at higher optimisation settings (for gcc that is -O2, which enables -fstrict-aliasing) it will become one.
For examples see https://blog.regehr.org/archives/959
2) Using unions for type punning also seems to involve undefined behaviour in C++ but not C (See https://stackoverflow.com/a/25672839/4360539), however GCC (and probably others) does explicitly allow it: (See https://gcc.gnu.org/bugs/#nonbugs).
3) The only really reliable way of doing type punning in C++ seems to be using memcpy to copy the data to a new location and perform whatever is to be done and then to use another memcpy to return the changes. I did read somewhere on SO, that GCC (or most compilers probably) should be able to optimise the memcpy to a mere register copy for register-sized data types, but I cannot find it again.
So probably the best thing to do here is to use the union, if you can be sure the code is compiled by a compiler that supports type punning through a union. For the other cases, further investigation would be needed into how the compiler treats bigger data structures and memcpy (a sketch of the memcpy approach follows below), and if it really involves copying back and forth, sticking with bitwise operations is probably the best idea.
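For reference, here is a minimal sketch of the memcpy approach from point 3 (the bitfield layout itself is still implementation-defined, so the resulting integer value is not portable):
#include <cstdint>
#include <cstring> // std::memcpy

struct Parts {
    int part1 : 10;
    int part2 : 6;
    int part3 : 16;
};

// Well-defined type punning: copy the object representation instead of
// reinterpreting a pointer. Compilers typically optimise the memcpy away
// for register-sized types.
std::uint32_t to_u32(const Parts& p)
{
    static_assert(sizeof(Parts) == sizeof(std::uint32_t), "layout assumption");
    std::uint32_t out;
    std::memcpy(&out, &p, sizeof out);
    return out;
}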
union {
    struct {
        int part1 : 10;
        int part2 : 6;
        int part3 : 16;
    } parts;
    int whole;
} word;
Then just use word.whole.
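Usage then looks something like this (well-defined in C, and explicitly supported as an extension by GCC in C++, as noted above):
word.parts.part1 = 0;
word.parts.part2 = 9;
word.parts.part3 = 0;
int x = word.whole; // reads the bits of all three fields at once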
I had the same problem. I am guessing this is not very relevant today. But this is how I solved it:
#include <iostream>

struct PACKED {
    int x : 10;
    int y : 10;
    int z : 12;

    // Assign from an int by reinterpreting this object's storage.
    // Note: this pointer-cast punning is exactly the strict-aliasing
    // violation discussed above.
    PACKED& operator=(int num)
    {
        *(int*)this = num;
        return *this;
    }

    // Read the object's storage back out as an int.
    operator int()
    {
        return *(int*)this;
    }
} __attribute__((packed));

int main(void) {
    std::cout << "size: " << sizeof(PACKED) << std::endl;
    PACKED bf;
    bf = 0xFFF00000;
    std::cout << "Values ( x, y, z ) = " << bf.x << " " << bf.y << " " << bf.z << std::endl;
    int testint;
    testint = bf;
    std::cout << "As integer: " << testint << std::endl;
    return 0;
}
This now fits in an int and is assignable from standard ints. However, I do not know how portable this solution is. The output is then:
size: 4
Values ( x, y, z ) = 0 0 -1
As integer: -1048576

Data member offset in C++

"Inside the C++ Object Model" says that the offset of a data member in a class is always 1 more than the actual offset in order to distinguish between the pointer to 0 and the pointer to the first data member, here is the example:
#include <cstdio>

class Point3d {
public:
    virtual ~Point3d();
public:
    static Point3d origin;
    float x, y, z;
};

// to be used later; ignore it for the first question
int main(void) {
    /*cout << "&Point3d::x = " << &Point3d::x << endl;
    cout << "&Point3d::y = " << &Point3d::y << endl;
    cout << "&Point3d::z = " << &Point3d::z << endl;*/
    printf("&Point3d::x = %p\n", &Point3d::x);
    printf("&Point3d::y = %p\n", &Point3d::y);
    printf("&Point3d::z = %p\n", &Point3d::z);
    getchar();
}
So in order to distinguish the two pointers below, the offset of a data member is always 1 more.
float Point3d::*p1 = 0;
float Point3d::*p2 = &Point3d::x;
The main function above is an attempt to get the offsets of the members to verify this claim; it is supposed to output 5, 9, 13 (considering the 4-byte vptr at the beginning). In MS Visual Studio 2012, however, the output is:
&Point3d::x = 00000004
&Point3d::y = 00000008
&Point3d::z = 0000000C
Question: So did the MS C++ compiler apply some optimization, or do something else, to prevent this mechanism?
tl;dr
Inside the C++ Object Model is a very old book, and most of its contents are implementation details of a particular compiler anyway. Don't worry about comparing your compiler to some ancient compiler.
Full version
An answer to the question linked to in a comment on this question addresses this quite well.
The offset of something is how many units it is from the start. The first thing is at the start so its offset is zero.
[...]
Note that the ISO standard doesn't specify where the items are laid out in memory. Padding bytes to create correct alignment are certainly possible. In a hypothetical environment where ints were only two bytes but their required alignment was 256 bytes, they wouldn't be at 0, 2 and 4 but rather at 0, 256 and 512.
And, if that book you're taking the excerpt from is really Inside the C++ Object Model, it's getting a little long in the tooth.
The fact that it's from '96 and discusses the internals underneath C++ (waxing lyrical about how good it is to know where the vptr is, missing the whole point that that's working at the wrong abstraction level and you should never care) dates it quite a bit.
[...]
The author apparently led the cfront 2.1 and 3 teams and, while this book seems of historical interest, I don't think it's relevant to the modern C++ language (and implementation), at least not those bits I've read.
The language doesn't specify how member-pointers are represented, so anything you read in a book will just be an example of how they might be represented.
In this case, as you say, it sounds like the vptr occupies the first four bytes of the object; again, this is not something specified by the language. If that is the case, no accessible members would have an offset of zero, so there's no need to adjust the offsets to avoid zero; a member-pointer could simply be represented by the member's offset, with zero reserved for "null". It sounds like that is what your compiler does.
You may find that the offsets for non-polymorphic types are adjusted as you describe; or you may find that the representation of "null" is not zero. Either would be valid.
class Point3d {
public:
virtual ~Point3d();
public:
static Point3d origin;
float x, y, z;
};
Since your class contains a virtual destructor, and compilers typically put a pointer to the virtual function table as the first element of the object, it makes sense that the first of your data members is at offset 4 (I'm guessing your compiler is a 32-bit compiler).
Note however that the C++ standard does not stipulate how data members should be stored inside the class, and even less how much space, if any, the virtual function table should take up.
[And yes, it's invalid (undefined behaviour) to take the address of an element that is not a "real" member object, but I don't think that is causing an issue in this particular example; it might with a different compiler or on a different processor architecture, etc.]
Unless you specify a different alignment, your expectation of the offsets being 5, ... would be wrong anyway. Normally, members bigger than a char are aligned on even addresses, and typically to the next 4-byte boundary. The reason is the efficiency of memory access in the CPU.
On some architectures, accessing an odd address could cause an exception (e.g. on the Motorola 68000), depending on the member type, or at least a performance slowdown.
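As a quick illustration (the exact numbers are platform-dependent), alignof shows why a float could not legally sit at offset 5:
#include <iostream>

int main()
{
    // On typical platforms float requires 4-byte alignment, so the first
    // float after a 4-byte vptr lands at offset 4, not 5.
    std::cout << "alignof(float) = " << alignof(float) << '\n'; // usually 4
}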
While it's true that a null pointer of type "pointer to member of a given type" must be different from any non-null value of that type, offsetting non-null pointers by one is not the only way a compiler can ensure this. For example, my compiler uses a non-zero representation for null pointer-to-members.
#include <iostream>

namespace {
struct a {
    int x, y;
};
}

int main() {
    int a::*p = &a::x, a::*q = &a::y, a::*r = nullptr;
    std::cout << "sizeof(int a::*) = " << sizeof(int a::*)
              << ", sizeof(unsigned long) = " << sizeof(unsigned long);
    std::cout << "\n&a::x = " << *reinterpret_cast<long*>(&p)
              << "\n&a::y = " << *reinterpret_cast<long*>(&q)
              << "\nnullptr = " << *reinterpret_cast<long*>(&r)
              << '\n';
}
Produces the following output:
sizeof(int a::*) = 8, sizeof(unsigned long) = 8
&a::x = 0
&a::y = 4
nullptr = -1
Your compiler is probably doing something similar, if not identical. This scheme is probably more efficient for most 'normal' use cases for the implementation because it won't have to do an extra "subtract 1" every time you use a non-null pointer-to-member.
That book (available at this link) should make it much clearer that it is just describing a particular implementation of a C++ compiler. Details like the one you mention are not part of the C++ language specification -- it's just how Stanley B. Lippman and his colleagues decided to implement a particular feature. Other compilers are free to do things a different way.