I came across this while browsing the responses to another question on SO (References Vs Variable Gets).
My question is: in 64-bit environments, is it guaranteed that a reference to a variable will be 64 bits wide even if the original variable is smaller? In other words, would a char reference in a 64-bit environment be larger than sizeof(char)? Is there any section in the standard which specifies this explicitly?
EDIT: For more clarity --
char c1 = 'a';
char& c2 = c1;
My question: is sizeof(c2) > sizeof(c1) on 64-bit machines?
The Standard (ISO C++03) says the following about references:
It is unspecified whether or not a reference requires storage (3.7).
Please correct me if I am wrong or if I have not understood the question correctly.
EDIT:
Is sizeof(c2) > sizeof(c1) on 64-bit machines?
No; as @Chubsdad noted, sizeof(c2) == sizeof(c1). The relevant quote from the Standard is:
When applied to a reference or a reference type, the result is the size of the referenced type. (ISO C++ $5.3.3/2)
$8.3.2/3 - It is unspecified whether or not a reference requires storage.
sizeof applied to a reference is simply the size of the referenced object.
So if 'r' is an integer reference bound to 'i', it is unspecified whether there is any actual storage for 'r'; however, sizeof(r) is evaluated as sizeof(i).
If 'r' is a reference to a 'char', sizeof(r) will always be sizeof(char) == 1 by definition.
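As a quick illustration of that rule, here is a minimal sketch; it prints 1 1 on any conforming implementation, whether built as 32-bit or 64-bit:
#include <iostream>

int main() {
    char c1 = 'a';
    char& c2 = c1;
    // sizeof applied to a reference yields the size of the referenced type (C++03 5.3.3/2),
    // so both results are 1, independent of the pointer width of the target.
    std::cout << sizeof(c1) << ' ' << sizeof(c2) << '\n';  // prints: 1 1
}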
Although sizeof(ref_var) returns the size of the referenced object, space is still required to store a reference in a structure, for instance, and in common implementations the space allocated to store a reference is the same as the space allocated to store a pointer. That may not be required by the standard, but this code at least shows the effect:
#include <iostream>
using namespace std;
char c1 = 'a';
char &c2 = c1;
struct x
{
    char c1;
    char c2;
    char c3;
    char c4;
    int i4a;
    char &r1;
    int i4b;
    int i4c;
    x() : r1(c1) { }
};

struct y
{
    char c1;
    char c2;
    char c3;
    char c4;
    int i4a;
    int i4b;
    int i4c;
};

int main()
{
    cout << sizeof(c2) << endl;
    cout << sizeof(y) << endl;
    cout << sizeof(x) << endl;
    return 0;
}
I make no pretense that it is 'great code' - it isn't - but it demonstrates a point. Compiled on MacOS X 10.6.4 with the C++ compiler from the GNU Compiler Collection (GCC 4.5.1) in default (64-bit) mode, the output is:
1
16
24
When compiled in 32-bit mode, the output is:
1
16
20
The first line of output demonstrates that 'sizeof(ref_var)' does indeed return the size of the referenced object. The second line shows that a structure with no reference in it has a size of 16 bytes. The third line shows that a very similar structure with a reference embedded in it at an 8-byte boundary (on a system where sizeof(int) == 4) is 8 bytes larger than the simpler structure under a 64-bit compilation and 4 bytes larger under a 32-bit compilation. By inference, the reference part of the structure occupies more than 4 bytes and not more than 8 bytes under the 64-bit compilation, and occupies not more than 4 bytes under the 32-bit compilation. This suggests that (in at least one popular implementation of C++) a reference in a structure occupies the same amount of space as a pointer - as asserted in some of the other answers.
So, it may be implementation dependent, but the comment that a reference occupies the same space as a pointer holds true in at least one (rather widely used) implementation.
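A shorter sketch of the same observation, comparing a struct that holds a reference with one that holds a pointer (WithRef and WithPtr are illustrative names; the exact numbers are implementation-specific, and common 64-bit ABIs print 8 8):
#include <iostream>

int g = 0;

struct WithRef { int& r; WithRef() : r(g) {} };  // reference member needs storage here
struct WithPtr { int* p; };

int main() {
    // On common implementations both sizes equal the pointer size,
    // although the standard leaves the storage of a reference unspecified.
    std::cout << sizeof(WithRef) << ' ' << sizeof(WithPtr) << '\n';
}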
In Visual Studio, it seems like pointers to member variables are 32-bit signed integers behind the scenes (even in 64-bit mode), and a null pointer is -1 in that context. So if I have a class like:
#include <iostream>
#include <climits>   // INT_MAX

struct Foo
{
    char arr1[INT_MAX];
    char arr2[INT_MAX];
    char ch1;
    char ch2;
};

int main()
{
    auto p = &Foo::ch2;
    std::cout << (p ? "Not null" : "null") << '\n';
}
It compiles, and prints "null". So, am I causing some kind of undefined behavior, or was the compiler supposed to reject this code and this is a bug in the compiler?
Edit:
It appears that I can keep the "2 INT_MAX arrays plus 2 chars" pattern, and only in that case the compiler allows me to add as many members as I wish, with the second character always considered to be null. See demo. If I change the pattern slightly (like 1 or 3 chars instead of 2 at some point), it complains that the class is too large.
The size limit of an object is implementation defined, per Annex B of the standard [1]. Your struct is of an absurd size.
If the struct is:
struct Foo
{
    char arr1[INT_MAX];
    //char arr2[INT_MAX];
    char ch1;
    char ch2;
};
... the size of your struct in a relatively recent version of 64-bit MSVC appears to be around 2147483649 bytes. If you then add in arr2, suddenly sizeof will tell you that Foo is of size 1.
The C++ standard (Annex B) states that the compiler must document its limitations, which MSVC does [2]; it states that it follows the recommended limits. Annex B, Section 2.17 gives a recommended minimum of 262144 bytes for the size of an object. While it's clear that MSVC can handle far more than that, it only documents that it meets that minimum recommendation, so I'd assume you should take care when your object size is larger than that.
[1] http://eel.is/c++draft/implimits
[2] https://learn.microsoft.com/en-us/cpp/cpp/compiler-limits?view=vs-2019
It's clearly a collision between an optimization on pointer-to-member representation (use only 4 bytes of storage when no virtual bases are present) and the pigeonhole principle.
For a type X containing N subobjects of type char, there are N+1 possible valid pointer-to-members of type char X::*... one for each subobject, and one for null-pointer-to-member.
This works when there are at least N+1 distinct values in the pointer-to-member representation, which for a 4-byte representation implies that N+1 <= 2^32 and therefore the maximum object size is 2^32 - 1.
Unfortunately the compiler in question made the maximum object-type size (before it rejects the program) equal to 2^32, which is one too large and creates a pigeonhole problem -- at least one pair of pointer-to-members must be indistinguishable. It's not necessary that the null pointer-to-member be one half of this pair, but as you've observed, in this implementation it is.
The expression &Foo::ch2 is of type char Foo::*, i.e. a pointer to member of class Foo. By the rules, a pointer to member converted to bool should evaluate to false ONLY if it is a null pointer, i.e. if nullptr was assigned to it.
The fault here appears to be an implementation flaw. E.g. with gcc targeting x86-64 (-march=x86-64), any pointer to member that has been assigned a real member evaluates as non-null, unless nullptr was assigned to it, as the following code shows:
#include <iostream>
#include <climits>   // LLONG_MAX, ULLONG_MAX
#include <cstddef>   // offsetof

struct foo
{
    char arr1[LLONG_MAX];
    char arr2[LLONG_MAX];
    char ch1;
    char ch2;
};

int main()
{
    char foo::* p1 = &foo::ch1;
    char foo::* p2 = &foo::ch2;
    std::cout << (p1 ? "Not null " : "null ") << '\n';
    std::cout << (p2 ? "Not null " : "null ") << '\n';
    std::cout << LLONG_MAX + LLONG_MAX << '\n';   // signed overflow, itself UB
    std::cout << ULLONG_MAX << '\n';
    std::cout << offsetof(foo, ch1) << '\n';
}
Output:
Not null
null
-2
18446744073709551615
18446744073709551614
Likely it's related to the fact that the class size exceeds the platform's limits, causing the offset of the member to wrap around past 0 (the internal value of nullptr). The compiler doesn't detect it because it itself falls victim to signed integer overflow: using signed literals as the array sizes makes LLONG_MAX + LLONG_MAX == -2 the "size" of the two arrays combined.
Essentially, the size of the first two members is computed as negative, and the offset of ch1 is -2, represented as the unsigned value 18446744073709551614.
And since -2 is not the null value, the pointer is not null. Another compiler might clamp the value to 0, producing a null pointer-to-member, or actually detect the problem, as clang does.
If the offset of ch1 is -2, is the offset of ch2 then -1? Let's add this:
std::cout << static_cast<long long>(offsetof(foo, ch1)) << '\n';
std::cout << static_cast<long long>(offsetof(foo, ch2)) << '\n';
Additional output:
-2
-1
And the offset of the first member is obviously 0, so if a pointer to member is represented as an offset, it needs another value to represent nullptr. It's logical to assume that this particular compiler considers only -1 to be the null value, which may or may not be the case for other implementations.
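A purely hypothetical sketch (MemberPtr32 is an invented type, not MSVC's actual encoding) of why a 32-bit offset representation that reserves one bit pattern for null runs out of room once an object reaches 2^32 bytes: the offset of the last char subobject collides with the null value.
#include <cstdint>
#include <iostream>

// Hypothetical pointer-to-member-of-char encoded as a 32-bit offset,
// with 0xFFFFFFFF (i.e. -1) reserved as the null value.
struct MemberPtr32 {
    std::uint32_t offset;
    bool is_null() const { return offset == 0xFFFFFFFFu; }
};

int main() {
    MemberPtr32 null_pm{0xFFFFFFFFu};                 // the null pointer-to-member
    std::uint64_t byte_offset = 0xFFFFFFFFull;        // last char in a 2^32-byte object
    MemberPtr32 pm{static_cast<std::uint32_t>(byte_offset)};
    // Both print 1: the real member is indistinguishable from null.
    std::cout << null_pm.is_null() << ' ' << pm.is_null() << '\n';
}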
When I test the code, VS shows for Foo that the class is too large.
When I add char arr3[INT_MAX], Visual Studio reports Error C2089: 'Foo': 'struct' too large. Microsoft Docs explains it as "The specified structure or union exceeds the 4GB limit."
I was just trying something and I was wondering how this could be. I have the following code:
int var1 = 132;
int var2 = 200;
int *secondvariable = &var2;
cout << *(secondvariable+2) << endl << sizeof(int) << endl;
I get the output:
132
4
So how is it possible that the second int is only 2 addresses higher? I mean shouldn't it be 4 addresses? I'm currently under WIN10 x64.
With cout << *(secondvariable+2) you don't print a pointer, you print the value at secondvariable[2], which is invalid indexing and leads to undefined behavior.
If you want to print a pointer then drop the dereference and print secondvariable+2.
While you are already deep in undefined-behaviour territory (see Some programmer dude's answer), because you index an array out of bounds (a single variable is treated as an array of length 1 for such matters), here is some technical background:
Alignment! Compilers are allowed to place variables at addresses such that they can be accessed most efficiently. As you seem to have gotten valid output by adding 2*sizeof(int) to the second variable's address, you apparently reached the first one by accident: the compiler decided to leave a gap between the two variables so that both can be aligned to addresses divisible by 8.
Be aware, though, that you don't have any guarantee for such alignment, different compilers might decide differently (or same compiler on another system), and alignment even might be changed via compiler flags.
On the other hand, arrays are guaranteed to occupy contiguous memory, so you would have gotten the expected result in the following example:
int array[2];
int* a0 = &array[0];
int* a1 = &array[1];
// needs <cstdint>; reinterpret_cast, not static_cast, converts a pointer to an integer
uintptr_t diff = reinterpret_cast<uintptr_t>(a1) - reinterpret_cast<uintptr_t>(a0);
std::cout << diff;   // prints sizeof(int), typically 4
The cast to uintptr_t (or alternatively to char*) ensures that you get the address difference in bytes, not in multiples of sizeof(int)...
This is not how C++ works.
You can't "navigate" your scope like this.
Such pointer antics have completely undefined behaviour and shall not be relied upon.
You are not punching holes in tape now; you are writing a description of a program's semantics, which gets converted by your compiler into something executable by a machine.
Code to these abstractions and everything will be fine.
I'm confused about casting a char pointer to an int pointer. I'm checking how pointer casting works, and the code below, going from int to char, works fine.
#include <iostream>
using namespace std;

int main(){
    int a=65;
    void *p=&a;
    cout << *static_cast<char*>(p);
}
Output
A
But when I try to cast from char to int, it doesn't show the correct value.
#include <iostream>
using namespace std;

int main(){
    char a='A';
    void *p=&a;
    cout << *static_cast<int*>(p);
}
What is the problem in the above code? The output is a garbage value.
First, you have to understand that the x86 architecture is what is called little-endian. This means that in multibyte variables, the bytes are ordered in memory from least to most significant. If you don't understand what that means, it'll become clear in a second.
A char is 8 bits -- one byte. When you store 'A' into one, it gets the value 0x41 and is happy. An int is larger; on many architectures it is 32 bits -- 4 bytes. When you assign the value 'A' to an int, it gets the value 0x00000041. This is numerically exactly the same, but there are three extra bytes of zeros in the int.
So your int contains 0x00000041. In memory, that is arranged in bytes, and because you're on a little-endian architecture, those bytes are arranged from least to most significant -- the opposite of how we normally write them! The memory actually looks like this:
      +----+----+----+----+
 int: | 41 | 00 | 00 | 00 |
      +----+----+----+----+

      +----+
char: | 41 |
      +----+
When you take a pointer to the int and cast it to a char*, and then dereference it, the compiler will take the first byte of the int -- because chars are only one byte wide -- and print it out. The other three bytes get ignored! Now look back and notice that if the order of the bytes in the int were reversed, as on a big-endian architecture, you would have retrieved the value zero instead! So the behavior of this code -- the fact that the cast from int* to char* worked as you expected -- was strictly dependent on the machine you were running it on.
On the other hand, when you take a pointer to the char and cast it to an int*, and then dereference it, the compiler will grab the one byte in the char as you'd expect, but then it will also read three more bytes past it, because ints are four bytes wide! What is in those three bytes? You don't know! Your memory looks like this:
      +----+
char: | 41 |
      +----+

      +----+----+----+----+
 int: | 41 | ?? | ?? | ?? |
      +----+----+----+----+
You get a garbage value in your int because you're reading memory that is uninitialized. On a different platform or under a different planetary alignment, your code might work perfectly fine, or it might segfault and crash. There's no telling. This is what is known as undefined behavior, and it is a dangerous game that we play with our compilers. We have to be very careful when working with memory like this; there's nothing scarier than nondeterministic code.
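As a side note, a small sketch that inspects an int's bytes through an unsigned char* (which the aliasing rules do allow) makes the byte order visible directly; on a little-endian machine it prints 41 0 0 0:
#include <cstddef>
#include <iostream>

int main() {
    int a = 0x41;  // 'A'
    // Reading any object's bytes through unsigned char* is permitted.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&a);
    for (std::size_t i = 0; i < sizeof a; ++i)
        std::cout << std::hex << static_cast<int>(bytes[i]) << ' ';
    std::cout << '\n';
}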
You can safely represent anything as an array of char. It doesn't work the other way. This is part of the STRICT ALIASING rule.
You can read up on strict aliasing in other questions:
What is the strict aliasing rule?
More closely related to your question:
Once again: strict aliasing rule and char*
Quoting the answer given here: What is the strict aliasing rule?
[...] dereferencing a pointer that aliases another of an incompatible type is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.
Also related to your question: Once again: strict aliasing rule and char*
Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of type char). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.
(I must give credit for this second link to #Let_Me_Be)
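If you actually need to assemble an int from raw bytes, a sketch of the usual portable approach is to copy the bytes with std::memcpy rather than dereferencing a casted pointer; the value you get still depends on the machine's byte order:
#include <cstring>
#include <iostream>

int main() {
    unsigned char buf[sizeof(int)] = {0x41, 0, 0, 0};
    int value = 0;
    // memcpy avoids the aliasing violation of *reinterpret_cast<int*>(buf)
    // and has no alignment problems either.
    std::memcpy(&value, buf, sizeof value);
    std::cout << value << '\n';  // 65 on a little-endian machine
}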
Here when you are doing:
cout << *static_cast<int*>(p);
you are actually saying that p points to an integer (represented by 4 bytes in memory), but you only wrote a char there before (represented by 1 byte in memory), so when you read it as an integer you pick up 3 extra garbage bytes.
But if you cast it back to a char you will get your 'A' because you are slicing your int to a char:
cout << (char) *static_cast<int*>(p);
Otherwise, if you just want the ASCII value, cast your void* to a char* (so when you dereference it you only access 1 byte) and then cast what is inside it to int.
char a = 'A';
void *p=&a;
cout << static_cast<int>(*((char*)p));
The point is that static_cast understands that you want to convert a char to an int (and gives you its ASCII value), but casting a char* to an int* just changes how many bytes are read when you dereference the pointer.
According to the standard, reading a char object through an int* is undefined behavior, and therefore any result is allowed. Most compilers will try to do what makes sense, so the following is a likely explanation for the behavior you are seeing on your specific architecture:
Assuming a 32 bit int, an int is the same size as 4 chars
Different architectures will treat those four bytes differently to translate their value to an int, most commonly this is either little endian or big endian
Looking at:
[Byte1][Byte2][Byte3][Byte4]
The int value would either be:
(Little Endian) Byte1 + Byte2*256 + Byte3*256^2 + Byte4*256^3
(Big Endian)    Byte4 + Byte3*256 + Byte2*256^2 + Byte1*256^3
In your case either Byte1 or Byte4 is being set, the remaining bytes are whatever happens to be in memory since you are only reserving one byte where you need 4
Try the following:
#include <iostream>
using namespace std;

int main(){
    char a[4]={'A', 0, 0, 0};
    void *p=a;
    cout << *static_cast<int*>(p);
}
You may have to switch the initialization to {0,0,0, 'A'} to get what you want based on architecture
As noted, this is undefined behavior, but should work with most compilers and give you a better idea of what is going on under the hood
Consider the following code:
#include <iostream>
#include <iomanip>
using namespace std;

int main(){
    {
        int a=65;
        cout << hex << static_cast<int>(a) << "\n";
        void *p=&a;
        cout << hex << setfill('0') << setw(2 * sizeof(int)) << *static_cast<int*>(p) << "\n";
    }
    {
        char a='A';
        cout << hex << static_cast<int>(a) << "\n";
        void *p=&a;
        cout << hex << *static_cast<int*>(p) << "\n";
    }
}
The 'A' character code (0x41) is indeed in the output, but it's padded to the size of an int with uninitialized values. You can see this when you output the hexadecimal values of the variables.
#include <stdio.h>

union node {
    int i;
    char c[2];
};

int main() {
    union node n;
    n.c[0] = 0;
    n.c[1] = 2;
    printf("%d\n", n.i);
    return 0;
}
I think it should output 512, because c[0] is stored in the first byte and c[1] in the second byte, but it gives 1965097472. Why?
I compiled this program with Code::Blocks on Windows.
Your union allocates four bytes, starting off as:
[????] [????] [????] [????]
You set the two least significant bytes:
[????] [????] [0x02] [0x00]
You then print out all four bytes as an integer. You're not going to get 512, necessarily, because anything can be in those most significant two bytes. In this case, you had:
[0x75] [0x21] [0x02] [0x00]
Because of undefined behavior. Accessing a union member that wasn't set does that, simple as that. It can do anything, print anything, and even crash.
Undefined behavior is, well... undefined.
We can try to answer why a specific result was given (and the other answers do that by guessing compiler implementation details), but we cannot say why another result was not given. For all that we know, the compiler could have printed 0, formatted your hard drive, set your house on fire or transferred 100,000,000 USD to your bank account.
The int is compiled as a 32-bit number, little endian. By setting the two lower bytes to 0 and 2 respectively and then reading the whole int you get 1965097472. If you look at the hexadecimal representation 0x75210200 you can see your bytes again; besides that, it is undefined behaviour, and the result depends on the memory architecture of the platform the program is running on.
Note that your int is likely to be at least 4 bytes (not 2, like it was in the good ol' days). To let the sizes match, change the type of i to uint16_t.
Even after this, the standard does not really permit setting one union member, and then accessing a different one in an attempt to reinterpret the bytes. However, you could get the same effect with a reinterpret_cast.
#include <cstdint>
#include <iostream>

union node {
    uint16_t i;
    uint8_t c[2];
};

int main() {
    union node n;
    n.c[0] = 0;
    n.c[1] = 2;
    std::cout << *reinterpret_cast<uint16_t *>(&n) << std::endl;
    return 0;
}
What is the main purpose of sizeof? (I am new to C++.) For instance:
int k=7;
char t='Z';
What do sizeof(k), sizeof(int) and sizeof(char) mean?
sizeof(x) returns the amount of memory (in bytes) that the variable or type x occupies. It has nothing to do with the value of the variable.
For example, if you have an array of some arbitrary type T then the distance between elements of that array is exactly sizeof(T).
// distance between adjacent elements, measured in bytes (needs <cassert>)
int a[10];
assert(reinterpret_cast<char*>(&a[1]) - reinterpret_cast<char*>(&a[0]) == sizeof(int));
When used on a variable, it is equivalent to using it on the type of that variable:
T x;
assert(sizeof(T) == sizeof(x));
As a rule-of-thumb, it is best to use the variable name where possible, just in case the type changes:
int x;
std::cout << "x uses " << sizeof(x) << " bytes." << std::endl
// If x is changed to a char, then the statement doesn't need to be changed.
// If we used sizeof(int) instead, we would need to change 2 lines of code
// instead of one.
When used on user-defined types, sizeof still returns the amount of memory used by instances of that type, but it's worth pointing out that this does not necessarily equal the sum of its members.
struct Foo { int a; char b; };
While sizeof(int) + sizeof(char) is typically 5, on many machines sizeof(Foo) may be 8, because the compiler needs to pad out the structure so that it lies on 4-byte boundaries. This is not always the case, and it's quite possible that on your machine sizeof(Foo) will be 5, but you can't depend on it.
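A minimal sketch to observe this on your own machine (the exact numbers are implementation-dependent; a common result is 4 1 8):
#include <iostream>

struct Foo { int a; char b; };

int main() {
    // Typically three padding bytes follow b so that elements of a Foo[] array
    // keep the alignment required by the int member.
    std::cout << sizeof(int) << ' ' << sizeof(char) << ' ' << sizeof(Foo) << '\n';
}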
To add to Peter Alexander's answer: sizeof yields the size of a value or type in multiples of the size of a char, char being defined as the smallest unit of memory addressable (by C or C++) for a given architecture (and, in C++ at least, at least 8 bits in size according to the standard). This is what is generally meant by "bytes" (the smallest addressable unit for a given architecture), but it never hurts to clarify, and there are occasionally questions about the variability of sizeof(char), which is of course always 1.
sizeof() returns the size of the argument passed to it.
sizeof() cpp reference
sizeof is a compile-time unary operator that returns the size of a data type.
For example:
sizeof(int)
will return the size of an int in bytes.
Also remember that type sizes are platform dependent.
Check this page for more details: sizeof in C/C++
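A tiny sketch that makes the platform dependence visible (for example, long is typically 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows):
#include <iostream>

int main() {
    std::cout << "int: "    << sizeof(int)
              << ", long: " << sizeof(long)
              << ", void*: " << sizeof(void*) << '\n';
}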