Why is this pointer null - c++

In Visual Studio, it seems like pointers to member variables are 32-bit signed integers behind the scenes (even in 64-bit mode), and a null pointer is -1 in that context. So if I have a class like:
#include <iostream>
#include <climits>

struct Foo
{
    char arr1[INT_MAX];
    char arr2[INT_MAX];
    char ch1;
    char ch2;
};

int main()
{
    auto p = &Foo::ch2;
    std::cout << (p ? "Not null" : "null") << '\n';
}
It compiles, and prints "null". So, am I causing some kind of undefined behavior, or was the compiler supposed to reject this code and this is a bug in the compiler?
Edit:
It appears that as long as I keep the "2 INT_MAX arrays plus 2 chars" pattern, the compiler lets me add as many members as I wish, and the second character is always considered to be null. See demo. If I change the pattern slightly (like 1 or 3 chars instead of 2 at some point), it complains that the class is too large.

The size limit of an object is implementation defined, per Annex B of the standard [1]. Your struct is of an absurd size.
If the struct is:
struct Foo
{
    char arr1[INT_MAX];
    //char arr2[INT_MAX];
    char ch1;
    char ch2;
};
... the size of your struct in a relatively recent version of 64-bit MSVC appears to be around 2147483649 bytes. If you then add in arr2, suddenly sizeof will tell you that Foo is of size 1.
The C++ standard (Annex B) states that the compiler must document its limitations, which MSVC does [2]. It states that it follows the recommended limits. Annex B, section 2.17 gives a recommended minimum of 262144 bytes for the size of an object. While it's clear that MSVC can handle more than that, it only documents that it follows the minimum recommendation, so I'd assume you should take care once your object size exceeds that.
[1] http://eel.is/c++draft/implimits
[2] https://learn.microsoft.com/en-us/cpp/cpp/compiler-limits?view=vs-2019

It's clearly a collision between an optimization on pointer-to-member representation (use only 4 bytes of storage when no virtual bases are present) and the pigeonhole principle.
For a type X containing N subobjects of type char, there are N+1 possible valid pointer-to-members of type char X::*... one for each subobject, and one for null-pointer-to-member.
This works when there are at least N+1 distinct values in the pointer-to-member representation, which for a 4-byte representation implies that N+1 <= 2^32 and therefore the maximum object size is 2^32 - 1.
Unfortunately the compiler in question made the maximum object-type size (before it rejects the program) equal to 2^32, which is one too large and creates a pigeonhole problem -- at least one pair of pointer-to-members must be indistinguishable. It's not necessary that the null pointer-to-member be one half of this pair, but as you've observed, in this implementation it is.
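To make the pigeonhole concrete with the numbers from the question, here is a rough sketch of the arithmetic (it assumes a 32-bit int, so INT_MAX == 2^31 - 1, and the 4-byte pointer-to-member representation described above):

#include <climits>
#include <cstdint>
#include <iostream>

int main() {
    // Object size of Foo: two INT_MAX arrays plus two chars.
    std::uint64_t size = 2ull * INT_MAX + 2;   // 2 * (2^31 - 1) + 2 == 2^32
    std::uint64_t last_offset = size - 1;      // offset of ch2 == 2^32 - 1 == 0xFFFFFFFF
    std::cout << std::hex << size << ' ' << last_offset << '\n';
    // 0xFFFFFFFF is also the bit pattern this implementation appears to use for a
    // null char Foo::*, so &Foo::ch2 becomes indistinguishable from null.
}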

The expression &Foo::ch2 is of type char Foo::*, which is a pointer to member of class Foo. By the rules, a pointer to member converted to bool should evaluate to false ONLY if it is a null member pointer, i.e. one that had nullptr assigned to it.
The fault here appears to be an implementation flaw. Normally, on gcc targeting x86-64, any assigned pointer to member evaluates to non-null unless it had nullptr assigned to it, but consider the following code:
#include <iostream>
#include <climits>
#include <cstddef>

struct foo
{
    char arr1[LLONG_MAX];
    char arr2[LLONG_MAX];
    char ch1;
    char ch2;
};

int main()
{
    char foo::* p1 = &foo::ch1;
    char foo::* p2 = &foo::ch2;
    std::cout << (p1 ? "Not null " : "null ") << '\n';
    std::cout << (p2 ? "Not null " : "null ") << '\n';
    std::cout << LLONG_MAX + LLONG_MAX << '\n';   // signed overflow: undefined behaviour, discussed below
    std::cout << ULLONG_MAX << '\n';
    std::cout << offsetof(foo, ch1) << '\n';
}
Output:
Not null
null
-2
18446744073709551615
18446744073709551614
Likely it's related to the fact that the class size exceeds platform limitations, leading to the offset of the member wrapping around 0 (the internal value of nullptr). The compiler doesn't detect it because it falls victim to signed integer overflow: it's the programmer's fault for causing UB inside the compiler by using signed literals as array sizes, and LLONG_MAX + LLONG_MAX = -2 would be the "size" of the two arrays combined.
Essentially, the size of the first two members is calculated as negative, and the offset of ch1 is -2, represented as the unsigned value 18446744073709551614.
Because that offset is -2 rather than the null value, the pointer is not null. Another compiler might clamp the value to 0, producing a nullptr, or actually detect the problem, as clang does.
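For the curious, the wrap-around numbers line up exactly as described; a small sketch, assuming a 64-bit long long and two's-complement conversion:

#include <climits>
#include <cstdint>
#include <iostream>

int main() {
    // Combined size of the two LLONG_MAX arrays, computed without overflow UB
    // by doing the arithmetic in unsigned 64-bit:
    std::uint64_t two_arrays = 2ull * static_cast<std::uint64_t>(LLONG_MAX);
    std::cout << two_arrays << '\n';                              // 18446744073709551614
    std::cout << static_cast<std::int64_t>(two_arrays) << '\n';   // -2 when viewed as signed
}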
If the offset of ch1 is -2, is the offset of ch2 then -1? Let's add this:
std::cout << static_cast<long long>(offsetof(foo, ch1)) << '\n';
std::cout << static_cast<long long>(offsetof(foo, ch2)) << '\n';
Additional output:
-2
-1
The offset of the first member is obviously 0, and if pointers to members represent offsets, then another value is needed to represent nullptr. It's logical to assume that this particular compiler considers only -1 to be the null value, which may or may not be the case for other implementations.

When I test the code, VS shows that 'Foo': the class is too large.
When I add char arr3[INT_MAX], Visual Studio reports Error C2089: 'Foo': 'struct' too large. Microsoft Docs explains it as "The specified structure or union exceeds the 4GB limit."

How to iterate over every bit of a type in C++

I wanted to write a Digital Search Tree in C++ using templates. To do that, given a type T and data of type T, I have to iterate over the bits of this data. Doing this on integers is easy: one can just shift the number to the right by an appropriate number of positions and "&" the number with 1, as described for example here: How to get nth bit values. The problem starts when one tries to get the i'th bit from the templated data. I wrote something like this:
#include <iostream>
#include <cstdint>

template<typename T>
bool getIthBit (T data, unsigned int bit) {
    return ((*(((char*)&data)+(bit>>3)))>>(bit&7))&1;
}

int main() {
    uint32_t a = 16;
    for (int i = 0; i < 32; i++) {
        std::cout << getIthBit (a, i);
    }
    std::cout << std::endl;
}
This works, but I am not exactly sure whether it is undefined behavior. The problem is that to iterate over all bits of the data, one has to know how many of them there are, which is hard for struct types because of padding. For example, here:
#include <iostream>
#include <cstdint>

struct s {
    uint32_t i;
    char c;
};

int main() {
    std::cout << sizeof (s) << std::endl;
}
The actual data takes 5 bytes, but the program's output says 8. I don't know how to get the actual size of the data, or if it is possible at all. A question about this was asked here: How to check the size of struct w/o padding?, but the answers are just "don't".
It's easy to know how many bits there are in a type: there are exactly CHAR_BIT * sizeof(T). sizeof(T) is the actual size of the type in bytes. But indeed, there isn't a general way within standard C++ to know which of the bits that are part of the type are padding.
I recommend not attempting to support types that have padding as keys of your DST.
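For illustration, a minimal sketch of the CHAR_BIT * sizeof(T) count mentioned above:

#include <climits>
#include <cstddef>
#include <cstdint>

template <typename T>
constexpr std::size_t bit_count() {
    return CHAR_BIT * sizeof(T);   // total bits of the object representation, padding included
}

static_assert(bit_count<std::uint32_t>() == 32, "uint32_t has exactly 32 bits and no padding");

int main() {}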
The following trick might work for finding the padding bits of trivially copyable classes:
1. Use std::memset to set all bits of the object to 0.
2. For each subobject with no subobjects of its own, set all its bits to 1 using std::memset.
3. For each subobject with subobjects of its own, perform the previous step and this step recursively.
4. Check which bits stayed 0.
I'm not sure if there are any technical guarantees that the padding actually stays 0, so whether this works may be unspecified. Furthermore, there can be non-class types that have padding, and the described trick won't detect those; long double is a typical example, and I don't know if there are others. This probably won't detect unused bits of integers that underlie bitfields either.
So, there are a lot of caveats, but it should work in your example case:
// Assumes the struct s from the question; place inside main()
// with <cstring>, <bitset>, <climits> and <iostream> included.
s sobj;
std::memset(&sobj, 0, sizeof sobj);
std::memset(&sobj.i, -1, sizeof sobj.i);
std::memset(&sobj.c, -1, sizeof sobj.c);
std::cout << "non-padding bits:\n";
unsigned long long ull = 0;
std::memcpy(&ull, &sobj, sizeof sobj);
std::cout << std::bitset<sizeof sobj * CHAR_BIT>(ull) << std::endl;
There's a standard way to know whether a type has a unique object representation or not: std::has_unique_object_representations, available since C++17 in <type_traits>.
So if an object has unique representations, it is safe to assume that every bit is significant.
There's no standard way to know whether a non-unique representation is caused by padding bytes/bits, as in struct { long long a; char b; }, or by equivalent representations¹. And there is no standard way to know the offsets of padding bits/bytes.
Note that the "actual size" concept may be misleading, as padding can be in the middle, as in struct { char a; long long b; }.
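A small sketch of how the trait can be queried (the two example structs here are just hypothetical illustrations; whether the second one actually has padding depends on the platform's alignment rules):

#include <cstdint>
#include <iostream>
#include <type_traits>

struct Tight  { std::uint32_t a; std::uint32_t b; };  // usually no padding
struct Padded { std::uint32_t a; char b; };           // usually 3 trailing padding bytes

int main() {
    std::cout << std::boolalpha
              << std::has_unique_object_representations_v<Tight>  << '\n'   // typically true
              << std::has_unique_object_representations_v<Padded> << '\n';  // typically false
}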
Internally, a compiler has to distinguish padding bits from value bits to implement C++20 atomic<T>::compare_exchange_*. MSVC does this by zeroing padding bits with __builtin_zero_non_value_bits. Other compilers may use another name, another approach, or not expose the atomic<T>::compare_exchange_* internals at this level.
¹ like multiple NaN floating point values

Hexadecimals addition end result is always wrong? C++

When I want to calculate the address of a function, I do the following:
HMODULE base = GetModuleHandle(L"program.exe"); // Module base addr
// Adding an offset to the base
std::cout << (base + 0x8F0A0) << std::endl; // Wrong!
I'm not sure why the result is wrong. I've tested it via online hex calculators and also used a debugger to check both values.
Could base be treated as decimal and the other value as hex, producing wrong results?
How can I get a result in hex?
As explained here, depending on whether STRICT is defined, HMODULE is essentially either a void* or a <unique type>*, the purpose of this being to make each handle type a different C++ type, meaning compiler errors when you mix and match. In the former case, pointer arithmetic won't compile. In the latter case, it will compile, but you can't rely on anything happening because pointer arithmetic takes the type's size into account and because pointer arithmetic is undefined if you leave the object/array being pointed to.
You should treat this pointer as pointing to nothing in particular, and therefore not do pointer arithmetic. You have to reinterpret_cast it to an integral type that you're sure is large enough (std::uintptr_t) and then do arithmetic on that integral value.
In my local header, this unique type contains an int member, so adding 1 will actually move the pointer ahead by 4 bytes (you know, except for the undefined behaviour and all). It just so happens that 0x00DE0000 + 4 * 0x8F0A0 is your 0x0101C280 value.
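For illustration, here's a minimal sketch of the uintptr_t approach described above (the module name and offset are just the placeholders from the question):

#include <windows.h>
#include <cstdint>
#include <iostream>

int main() {
    HMODULE base = GetModuleHandleW(L"program.exe");
    // Convert the handle to an integer first, so no pointer scaling or
    // out-of-bounds pointer arithmetic is involved.
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(base) + 0x8F0A0;
    std::cout << std::hex << address << std::endl;
}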
Your problem lies with the value GetModuleHandle(L"program.exe") returns: 00DE0000. You need to use C hexadecimal syntax, so you need to prepend "0x" to your hex number 00DE0000.
Hence, your base number should be cast to a numeric value: 0x00DE0000.
0x00DE0000 is equal to 00DE0000.
Try using std::to_string(int value) to convert it to a string, then convert your hex value (base) to C hexadecimal syntax (add "0x" at the beginning of your hex value). To finish off, convert your base value back to a numeric value (e.g. using std::stoi) and perform the addition, printing the result with std::hex.
Try this code:
#include <iostream>

int main () {
    int hex1 = 0x8F0A0;
    int hex2 = 0x00DE0000; // Using int values
    std::cout << std::hex << hex1 + hex2 << std::endl;
}
As Chris has said, I had the same case and solved it like this:
int offset = 0x8F0A0;
std::uintptr_t base = reinterpret_cast<std::uintptr_t>(GetModuleHandle(L"program.exe"));
// The offset here is additionally shifted by 0x1000 (4096) bytes.
std::cout << std::hex << (base + (offset + 4096)) << std::endl;

Char to Int Pointer Casting Not Working

I'm confused about casting a char pointer to an int pointer. I'm checking how pointer casting works, and the code below, casting int to char, works fine.
#include <iostream>
using namespace std;

int main(){
    int a=65;
    void *p=&a;
    cout << *static_cast<char*>(p);
}
Output
A
But when I try to cast from char to int, it does not show the correct value.
#include <iostream>
using namespace std;

int main(){
    char a='A';
    void *p=&a;
    cout << *static_cast<int*>(p);
}
What is the problem in the above code? The output is a garbage value.
First, you have to understand that the x86 architecture is what is called little-endian. This means that in multibyte variables, the bytes are ordered in memory from least to most significant. If you don't understand what that means, it'll become clear in a second.
A char is 8 bits -- one byte. When you store 'A' into one, it gets the value 0x41 and is happy. An int is larger; on many architectures it is 32 bits -- 4 bytes. When you assign the value 'A' to an int, it gets the value 0x00000041. This is numerically exactly the same, but there are three extra bytes of zeros in the int.
So your int contains 0x00000041. In memory, that is arranged in bytes, and because you're on a little-endian architecture, those bytes are arranged from least to most significant -- the opposite of how we normally write them! The memory actually looks like this:
+----+----+----+----+
int: | 41 | 00 | 00 | 00 |
+----+----+----+----+
+----+
char: | 41 |
+----+
When you take a pointer to the int and cast it to a char*, and then dereference it, the compiler will take the first byte of the int -- because chars are only one byte wide -- and print it out. The other three bytes get ignored! Now look back and notice that if the order of the bytes in the int were reversed, as on a big-endian architecture, you would have retrieved the value zero instead! So the behavior of this code -- the fact that the cast from int* to char* worked as you expected -- was strictly dependent on the machine you were running it on.
On the other hand, when you take a pointer to the char and cast it to an int*, and then dereference it, the compiler will grab the one byte in the char as you'd expect, but then it will also read three more bytes past it, because ints are four bytes wide! What is in those three bytes? You don't know! Your memory looks like this:
+----+
char: | 41 |
+----+
+----+----+----+----+
int: | 41 | ?? | ?? | ?? |
+----+----+----+----+
You get a garbage value in your int because you're reading memory that is uninitialized. On a different platform or under a different planetary alignment, your code might work perfectly fine, or it might segfault and crash. There's no telling. This is what is known as undefined behavior, and it is a dangerous game that we play with our compilers. We have to be very careful when working with memory like this; there's nothing scarier than nondeterministic code.
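If you want to see this byte order on your own machine, inspecting an int through an unsigned char* is the direction that is always allowed; here's a minimal sketch:

#include <cstddef>
#include <iostream>

int main() {
    int n = 65;  // 0x00000041
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&n);
    for (std::size_t i = 0; i < sizeof n; ++i)
        std::cout << static_cast<int>(bytes[i]) << ' ';
    std::cout << '\n';  // prints "65 0 0 0" on little-endian, "0 0 0 65" on big-endian
}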
You can safely represent anything as an array of char. It doesn't work the other way. This is part of the STRICT ALIASING rule.
You can read up on strict aliasing in other questions:
What is the strict aliasing rule?
More closely related to your question:
Once again: strict aliasing rule and char*
Quoting the answer given here: What is the strict aliasing rule?
[...] dereferencing a pointer that aliases another of an incompatible type is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.
Also related to your question: Once again: strict aliasing rule and char*
Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of type char). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.
(I must give credit for this second link to #Let_Me_Be)
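To make the "one way" nature of the rule concrete, here's a small sketch; the commented-out line is the direction that is not allowed:

#include <iostream>

int main() {
    int n = 65;
    char c = 'A';

    // Allowed: examining an int object through a char lvalue.
    char low = *reinterpret_cast<char*>(&n);
    std::cout << static_cast<int>(low) << '\n';  // 65 on little-endian, 0 on big-endian

    // Not allowed: examining a char object through an int lvalue.
    // int bad = *reinterpret_cast<int*>(&c);    // undefined behaviour
    (void)c;
}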
Here when you are doing:
cout << *static_cast<int*>(p);
you are actually saying that p points to an integer (represented by 4 bytes in memory), but you only wrote a char into it before (represented by 1 byte in memory), so when you read it as an integer, the 3 extra bytes are garbage.
But if you cast it back to a char, you will get your 'A' because you are slicing the int down to a char:
cout << (char) *static_cast<int*>(p);
Otherwise, if you just want the ASCII value, cast your void* to a char* (so when you dereference it you only access 1 byte) and cast what is inside it to int:
char a = 'A';
void *p=&a;
cout << static_cast<int>(*((char*)p));
The point is that static_cast understands that you want to convert a char to an int (and get its ASCII value), but when you cast a char* to an int*, it only changes how many bytes are read when you dereference the pointer.
According to the Standard, accessing a char object (or several chars) through an int lvalue is undefined behavior, and therefore any result is allowed. Most compilers will try to do what makes sense, so the following is a likely explanation of the behavior you are seeing on your specific architecture:
Assuming a 32-bit int, an int is the same size as 4 chars.
Different architectures will treat those four bytes differently when translating their value to an int; most commonly this is either little-endian or big-endian.
Looking at:
[Byte1][Byte2][Byte3][Byte4]
The int value would either be:
(Little endian) Byte1 + Byte2*256 + Byte3*256^2 + Byte4*256^3
(Big endian)    Byte4 + Byte3*256 + Byte2*256^2 + Byte1*256^3
In your case, either Byte1 or Byte4 is being set; the remaining bytes are whatever happens to be in memory, since you reserved only one byte where you need 4.
Try the following:
#include <iostream>
using namespace std;

int main(){
    char a[4]={'A', 0, 0, 0};
    void *p=a;
    cout << *static_cast<int*>(p);
}
You may have to switch the initialization to {0, 0, 0, 'A'} to get what you want, depending on the architecture.
As noted, this is undefined behavior, but should work with most compilers and give you a better idea of what is going on under the hood
Consider the following code:
#include <iostream>
#include <iomanip>
using namespace std;

int main(){
    {
        int a=65;
        cout << hex << static_cast<int>(a) << "\n";
        void *p=&a;
        cout << hex << setfill('0') << setw(2 * sizeof(int)) << *static_cast<int*>(p) << "\n";
    }
    {
        char a='A';
        cout << hex << static_cast<int>(a) << "\n";
        void *p=&a;
        cout << hex << *static_cast<int*>(p) << "\n";
    }
}
There is indeed the 'A' character code (0x41) in the output, but it's padded to the size of int with uninitialized values. You can see it when you output the hexadecimal values of the variables.

Size of references in 64bit environments

Came across this one while browsing the response to another question on SO (References Vs Variable Gets).
My question is: for all 64-bit environments, is it guaranteed that a reference to a variable will be 64 bits even if the original had a smaller size? As in, would a char reference in a 64-bit environment be larger than sizeof(char)? Is there any section in the standard which specifies this explicitly?
EDIT: For more clarity --
char c1 = 'a';
char& c2 = c1;
My question is: is sizeof(c2) > sizeof(c1) on 64-bit machines?
The Standard (ISO C++03) says the following about references:
It is unspecified whether or not a reference requires storage (3.7).
Please someone correct me if I am wrong or if I have not understood his question correctly.
EDIT:
My question is: is sizeof(c2) > sizeof(c1) on 64-bit machines?
No. As #Chubsdad noticed, sizeof(c2) == sizeof(c1); the relevant quote from the Standard is:
When applied to a reference or a reference type, the result is the size of the referenced type. (ISO C++ $5.3.3/2)
$8.3.2/3 - It is unspecified whether or not a reference requires storage.
sizeof applied to a reference is basically the size of the referent.
So if 'r' is an integer reference to 'i', it is unspecified whether there is actual storage for 'r'. However, sizeof(r) internally stands for sizeof(i).
If 'r' is a reference to a 'char', then sizeof(r) will always be sizeof(char) == 1 by definition.
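A quick compile-time check of the rule quoted above:

int main() {
    char c = 'a';
    char& r = c;
    // sizeof applied to a reference yields the size of the referenced type.
    static_assert(sizeof r == sizeof(char), "sizeof of a char reference is sizeof(char)");
    (void)r;
}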
Although sizeof(ref_var) returns the size of the referenced object, space is still required to store a reference in a structure, for instance, and in common implementations the space allocated to store a reference is the same as the space allocated to store a pointer. That may not be required by the standard, but this code at least shows the effect:
#include <iostream>
using namespace std;

char c1 = 'a';
char &c2 = c1;

struct x
{
    char c1;
    char c2;
    char c3;
    char c4;
    int i4a;
    char &r1;
    int i4b;
    int i4c;
    x() : r1(c1) { }
};

struct y
{
    char c1;
    char c2;
    char c3;
    char c4;
    int i4a;
    int i4b;
    int i4c;
};

int main()
{
    cout << sizeof(c2) << endl;
    cout << sizeof(y) << endl;
    cout << sizeof(x) << endl;
    return 0;
}
I make no pretense that it is 'great code' - it isn't - but it demonstrates a point. Compiled on MacOS X 10.6.4 with the C++ compiler from the GNU Compiler Collection (GCC 4.5.1) in default (64-bit) mode, the output is:
1
16
24
When compiled in 32-bit mode, the output is:
1
16
20
The first line of output demonstrates that 'sizeof(ref_var)' does indeed return the size of the referenced object. The second line shows that a structure with no reference in it has a size of 16 bytes. The third line shows that a very similar structure with a reference embedded in it at an 8-byte boundary (on a system where sizeof(int) == 4) is 8 bytes larger than the simpler structure under a 64-bit compilation and 4 bytes larger under a 32-bit compilation. By inference, the reference part of the structure occupies more than 4 bytes and not more than 8 bytes under the 64-bit compilation, and occupies not more than 4 bytes under the 32-bit compilation. This suggests that (in at least one popular implementation of C++) that a reference in a structure occupies the same amount of space as a pointer - as asserted in some of the other answers.
So, it may be implementation dependent, but the comment that a reference occupies the same space as a pointer holds true in at least one (rather widely used) implementation.

What does sizeof do?

What is the main function of sizeof? (I am new to C++.) For instance:
int k=7;
char t='Z';
What do sizeof (k) or sizeof (int) and sizeof (char) mean?
sizeof(x) returns the amount of memory (in bytes) that the variable or type x occupies. It has nothing to do with the value of the variable.
For example, if you have an array of some arbitrary type T then the distance between elements of that array is exactly sizeof(T).
int a[10];
assert(reinterpret_cast<char*>(&a[0]) + sizeof(int) == reinterpret_cast<char*>(&a[1]));
When used on a variable, it is equivalent to using it on the type of that variable:
T x;
assert(sizeof(T) == sizeof(x));
As a rule-of-thumb, it is best to use the variable name where possible, just in case the type changes:
int x;
std::cout << "x uses " << sizeof(x) << " bytes." << std::endl;
// If x is changed to a char, then the statement doesn't need to be changed.
// If we used sizeof(int) instead, we would need to change 2 lines of code
// instead of one.
When used on user-defined types, sizeof still returns the amount of memory used by instances of that type, but it's worth pointing out that this does not necessarily equal the sum of the sizes of its members.
struct Foo { int a; char b; };
While sizeof(int) + sizeof(char) is typically 5, on many machines sizeof(Foo) may be 8, because the compiler needs to pad out the structure so that it lies on 4-byte boundaries. This is not always the case, and it's quite possible that on your machine sizeof(Foo) will be 5, but you can't depend on it.
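A quick way to see this on your own machine (the exact numbers depend on your platform's alignment rules):

#include <iostream>

struct Foo { int a; char b; };

int main() {
    std::cout << sizeof(int) + sizeof(char) << '\n';  // typically 5
    std::cout << sizeof(Foo) << '\n';                 // often 8 because of trailing padding
}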
To add to Peter Alexander's answer: sizeof yields the size of a value or type in multiples of the size of a char, where char is defined as the smallest unit of memory addressable (by C or C++) for a given architecture (and, in C++ at least, is at least 8 bits in size according to the standard). This is what's generally meant by "bytes" (the smallest addressable unit for a given architecture), but it never hurts to clarify, and there are occasionally questions about the variability of sizeof(char), which is of course always 1.
sizeof returns the size of the argument passed to it.
See the sizeof page on cppreference.
sizeof is a compile-time unary operator that returns the size of a data type.
For example:
sizeof(int)
will return the size of an int in bytes.
Also remember that type sizes are platform dependent.
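For example, a quick sketch that prints a few of them; the results differ between platforms (e.g. long is commonly 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows):

#include <iostream>

int main() {
    std::cout << sizeof(short) << ' ' << sizeof(int) << ' '
              << sizeof(long) << ' ' << sizeof(long long) << '\n';
}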
Check this page for more details: sizeof in C/C++