Data member offset in C++

"Inside the C++ Object Model" says that the offset of a data member in a class is always 1 more than the actual offset, in order to distinguish between the pointer to 0 and the pointer to the first data member. Here is the example:
#include <cstdio>
class Point3d {
public:
    virtual ~Point3d();
public:
    static Point3d origin;
    float x, y, z;
};
// to be used later; ignore it for the first question
int main(void) {
    /*cout << "&Point3d::x = " << &Point3d::x << endl;
    cout << "&Point3d::y = " << &Point3d::y << endl;
    cout << "&Point3d::z = " << &Point3d::z << endl;*/
    printf("&Point3d::x = %p\n", &Point3d::x);
    printf("&Point3d::y = %p\n", &Point3d::y);
    printf("&Point3d::z = %p\n", &Point3d::z);
    getchar();
}
So in order to distinguish the two pointers below, the offset of a data member is always 1 more.
float Point3d::*p1 = 0;
float Point3d::*p2 = &Point3d::x;
The main function above is an attempt to get the offsets of the members to verify this claim; it is supposed to output 5, 9, 13 (assuming a 4-byte vptr at the beginning). In MS Visual Studio 2012, however, the output is:
&Point3d::x = 00000004
&Point3d::y = 00000008
&Point3d::z = 0000000C
Question: So did the MS C++ compiler do some optimization or something to prevent this mechanism?

tl;dr
Inside the C++ Object Model is a very old book, and most of its contents are implementation details of a particular compiler anyway. Don't worry about comparing your compiler to some ancient compiler.
Full version
An answer to the question linked to in a comment on this question addresses this quite well.
The offset of something is how many units it is from the start. The first thing is at the start so its offset is zero.
[...]
Note that the ISO standard doesn't specify where the items are laid out in memory. Padding bytes to create correct alignment are certainly possible. In a hypothetical environment where ints were only two bytes but their required alignment was 256 bytes, they wouldn't be at 0, 2 and 4 but rather at 0, 256 and 512.
And, if that book you're taking the excerpt from is really Inside the C++ Object Model, it's getting a little long in the tooth.
The fact that it's from '96 and discusses the internals underneath C++ (waxing lyrical about how good it is to know where the vptr is, missing the whole point that that's working at the wrong abstraction level and you should never care) dates it quite a bit.
[...]
The author apparently led the cfront 2.1 and 3 teams and, while this book seems of historical interest, I don't think it's relevant to the modern C++ language (and implementation), at least those bits I've read.

The language doesn't specify how member-pointers are represented, so anything you read in a book will just be an example of how they might be represented.
In this case, as you say, it sounds like the vptr occupies the first four bytes of the object; again, this is not something specified by the language. If that is the case, no accessible members would have an offset of zero, so there's no need to adjust the offsets to avoid zero; a member-pointer could simply be represented by the member's offset, with zero reserved for "null". It sounds like that is what your compiler does.
You may find that the offsets for non-polymorphic types are adjusted as you describe; or you may find that the representation of "null" is not zero. Either would be valid.
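What the language does guarantee is the observable behaviour, whatever bit pattern the compiler picks. A minimal sketch of those guarantees follows; the Plain struct and both helper functions are hypothetical names introduced here for illustration only:

```cpp
// Hypothetical non-polymorphic type, so x may well sit at offset 0.
struct Plain {
    float x, y, z;
};

// A null pointer-to-member must compare unequal to a pointer to any real
// member, including the first one, regardless of how either is stored.
bool null_differs_from_first_member() {
    float Plain::*null_pm = nullptr;
    float Plain::*first_pm = &Plain::x;
    return null_pm != first_pm;
}

// Applying a non-null pointer-to-member to an object yields that member.
float read_member(const Plain& obj, float Plain::*pm) {
    return obj.*pm;
}
```

Code written against these two properties does not need to know whether the implementation stores "offset + 1", a plain offset with a non-zero null, or something else.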

class Point3d {
public:
virtual ~Point3d();
public:
static Point3d origin;
float x, y, z;
};
Since your class contains a virtual destructor, and most compilers typically put a pointer to the virtual function table as the first element in the object, it makes sense that the first of your data members is at offset 4 (I'm guessing your compiler is a 32-bit compiler).
Note however that the C++ standard does not stipulate how data members should be stored inside the class, and even less how much space, if any, the virtual function table should take up.
[And yes, it's invalid (undefined behaviour) to take the address of an element that does not refer to a "real" member object, but I don't think this is causing an issue in this particular example; it may with a different compiler or on a different processor architecture, etc.]

Unless you specify a different alignment, your expectation of the offsets being 5, ... would be wrong anyway. Normally the addresses of elements bigger than char are aligned on even addresses, and I guess even to the next 4-byte boundary. The reason is the efficiency of memory access in the CPU.
On some architectures (e.g. the Motorola 68000), accessing an odd address could cause an exception, depending on the member, or at least a performance slowdown.
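The alignment point can be checked without guessing at addresses. The sketch below assumes a hypothetical struct Mixed (not from the question) and uses the standard offsetof macro and alignof operator:

```cpp
#include <cstddef>  // offsetof, std::size_t

// Hypothetical struct: a one-byte member followed by a float.
struct Mixed {
    char tag;
    float value;
};

// Byte offset of the float member; any padding after 'tag' keeps it aligned.
constexpr std::size_t value_offset() {
    return offsetof(Mixed, value);
}

// True when the float member starts on a multiple of its required alignment.
constexpr bool value_is_aligned() {
    return value_offset() % alignof(float) == 0;
}
```

On a typical platform with 4-byte floats, value_offset() comes out as 4 rather than 1, which is the same reason an offset of 5 for a float member would be unusual.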

While it's true that a null pointer of type "pointer to member of a given type" must be different from any non-null value of that type, offsetting non-null pointers by one is not the only way that a compiler can ensure this. For example, my compiler uses a non-zero representation of null pointer-to-members.
#include <iostream>
namespace {
    struct a {
        int x, y;
    };
}
int main() {
    int a::*p = &a::x, a::*q = &a::y, a::*r = nullptr;
    std::cout << "sizeof(int a::*) = " << sizeof(int a::*)
              << ", sizeof(unsigned long) = " << sizeof(unsigned long);
    // Reading the representation through long* is non-portable; it only illustrates the bit patterns.
    std::cout << "\n&a::x = " << *reinterpret_cast<long*>(&p)
              << "\n&a::y = " << *reinterpret_cast<long*>(&q)
              << "\nnullptr = " << *reinterpret_cast<long*>(&r)
              << '\n';
}
Produces the following output:
sizeof(int a::*) = 8, sizeof(unsigned long) = 8
&a::x = 0
&a::y = 4
nullptr = -1
Your compiler is probably doing something similar, if not identical. This scheme is probably more efficient for most 'normal' use cases for the implementation because it won't have to do an extra "subtract 1" every time you use a non-null pointer-to-member.
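If you want to look at the representation without casting to long* (which assumes the sizes match and sidesteps the aliasing rules), copying the bytes out is one option. This is only a sketch: the struct A, the aliases and raw_bytes are hypothetical names, and what the bytes contain remains entirely up to the implementation.

```cpp
#include <array>
#include <cstring>  // std::memcpy

struct A {
    int x, y;
};

using MemberPtr = int A::*;
using Bytes = std::array<unsigned char, sizeof(MemberPtr)>;

// Copies the object representation of a pointer-to-member into a byte array.
Bytes raw_bytes(MemberPtr pm) {
    Bytes out{};
    std::memcpy(out.data(), &pm, sizeof pm);
    return out;
}
```

Comparing raw_bytes(nullptr) with raw_bytes(&A::x) shows whether your compiler reserves all-zero bytes for null or for the first member.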

That book (available at this link) should make it much clearer that it is just describing a particular implementation of a C++ compiler. Details like the one you mention are not part of the C++ language specification -- it's just how Stanley B. Lippman and his colleagues decided to implement a particular feature. Other compilers are free to do things a different way.

Related

C++/Address Space: 2 Bytes per address?

I was just trying something and I was wondering how this could be. I have the following code:
int var1 = 132;
int var2 = 200;
int *secondvariable = &var2;
cout << *(secondvariable+2) << endl << sizeof(int) << endl;
I get the output
132
4
So how is it possible that the second int is only 2 addresses higher? I mean shouldn't it be 4 addresses? I'm currently under WIN10 x64.
Regards
With cout << *(secondvariable+2) you don't print a pointer, you print the value at secondvariable[2], which is invalid indexing and leads to undefined behavior.
If you want to print a pointer then drop the dereference and print secondvariable+2.
While you already are far in the field of undefined behaviour (see Some programmer dude's answer) due to indexing an array out of bounds (a single variable is considered an array of length 1 for such matters), some technical background:
Alignment! Compilers are allowed to place variables at addresses such that they can be accessed most efficiently. As you seem to have gotten valid output by adding 2*sizeof(int) to the second variable's address, you apparently have reached the first one by accident. Apparently, the compiler decided to leave a gap between the two variables so that both can be aligned to addresses divisible by 8.
Be aware, though, that you have no guarantee of such alignment: different compilers (or the same compiler on another system) might decide differently, and alignment might even be changed via compiler flags.
On the other hand, arrays are guaranteed to occupy contiguous memory, so you would have gotten the expected result in the following example:
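#include <cstdint>
#include <iostream>
int array[2];
int* a0 = &array[0];
int* a1 = &array[1];
// reinterpret_cast is required here; static_cast cannot convert a pointer to an integer
std::uintptr_t diff = reinterpret_cast<std::uintptr_t>(a1) - reinterpret_cast<std::uintptr_t>(a0);
std::cout << diff;
The cast to uintptr_t (or alternatively to char*) ensures that you get the address difference in bytes, not in units of sizeof(int).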
This is not how C++ works.
You can't "navigate" your scope like this.
Such pointer antics have completely undefined behaviour and shall not be relied upon.
You are not punching holes in tape now, you are writing a description of a program's semantics, that gets converted by your compiler into something executable by a machine.
Code to these abstractions and everything will be fine.

How to explain the value of sizeof(std::vector<int>)?

In order to understand the memory consumption of std::vector<int> I wrote:
std::cout << sizeof(std::vector<int>) << std::endl;
This yields 32. I tried to understand where this value comes from. A look at the source code revealed that std::vector stores the pointers _MyFirst, _MyLast and _MyEnd, which explains 24 bytes of memory consumption (on my 64-bit system).
What about the last 8 bytes? As I understand it, the stored allocator does not use any memory. Also, this might be implementation-defined (is it?), so maybe this helps: I am working with MSVC 2017.5. I cannot guarantee that I found all the members by looking at the code; it looks very obfuscated to me.
Everything seems to be nicely aligned, but might the answer be the one given in "Why isn't sizeof for a struct equal to the sum of sizeof of each member?"? I tested it with a simple struct Test { int *a, *b, *c; }; which satisfies sizeof(Test) == 24.
Some background
In my program, I will have a lot of vectors and it seems that most of them will be empty. This means that the critical memory consumption comes from their empty state, i.e. the heap-allocated memory is not so important.
A simple "just for this use case" vector is implemented pretty quickly, so I wondered whether I am missing anything and will need 32 bytes of memory anyway, even with my own implementation (note: I will most probably not implement my own, this is just curiosity).
Update
I tested it again with the following struct:
struct Test
{
    int *a, *b, *c;
    std::allocator<int> alloc;
};
which now gave sizeof(Test) == 32. It seems that even though std::allocator has no memory consuming members (I think), its presence raises Test's size to 32 byte.
I noticed that sizeof(std::allocator<int>) yields 1, but I thought this is how a compiler deals with empty structs and that this is optimized away when it is used as a member. But this seems to be a problem with my compiler.
The compiler cannot optimise away an empty member. It is explicitly forbidden by the standard.
Complete objects and member subobjects of an empty class type shall have nonzero size
An empty base class subobject, on the other hand, may have zero size. This is exactly how GCC/libstdc++ copes with the problem: it makes the vector implementation inherit the allocator.
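The difference between the two placements can be sketched with two small structs. All names here are hypothetical, and Empty merely stands in for a stateless allocator; the second check relies on the empty base optimization, which mainstream compilers apply to a simple case like this:

```cpp
struct Empty {};  // stands in for a stateless allocator (hypothetical)

// Empty stored as a data member: it needs its own address, so it takes space.
struct AsMember {
    int* first;
    Empty alloc;
};

// Empty used as a base class: the empty base optimization can give it zero size.
struct AsBase : Empty {
    int* first;
};

constexpr bool member_adds_size() { return sizeof(AsMember) > sizeof(int*); }
constexpr bool base_adds_no_size() { return sizeof(AsBase) == sizeof(int*); }
```

With 8-byte pointers, AsMember is typically 16 bytes (one byte for the member plus padding) while AsBase stays at 8, which mirrors the 32 versus 24 bytes seen for the vector.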
There doesn't seem to be anything standardized about the data members of std::vector, so you can assume it's implementation-defined.
You mention the three pointers, thus you can check the size of a class (or a struct) with three pointers as its data members.
I tried running this:
std::cout << sizeof(classWith3PtrsOnly) << " " << sizeof(std::vector<int>) << std::endl;
on Wandbox, and got:
24 24
which pretty much implies that the extra 8 bytes come from "padding added to satisfy alignment constraints".
I ran into the same question recently. Although I still haven't figured out how std::vector performs this optimization, I found a way to achieve it with C++20.
C++ attribute: no_unique_address (since C++20)
struct Empty {};
struct NonEmpty {
    int* p;
};
template<typename MayEmpty>
struct Test {
    int* a;
    [[no_unique_address]] MayEmpty mayEmpty;
};
static_assert(sizeof(Empty) == 1);
static_assert(sizeof(NonEmpty) == 8);
static_assert(sizeof(Test<Empty>) == 8);
static_assert(sizeof(Test<NonEmpty>) == 16);
If you ran the above test on Windows in a Debug build, be aware that the "vector" implementation inherits from "_Vector_val", which has an additional pointer member in its _Container_base class (in addition to Myfirst, Mylast, Myend):
_Container_proxy* _Myproxy
It increases the vector class size from 24 to 32 bytes in Debug builds only (where _ITERATOR_DEBUG_LEVEL == 2).

Type casting struct to integer and vice versa in C++

So, I've seen the thread "Type casting struct to integer c++" about how to cast between integers and structs (bit-fields), and undoubtedly writing a proper conversion function or overloading the relevant casting operators is the way to go whenever an operating system is involved.
However, when writing firmware for a small embedded system where only one flash image is run, the case might be different, insofar as security isn't so much of a concern while performance is.
Since I can test whether the code works properly (meaning the bits of a bitfield are arranged the way I would expect them to be) each time when compiling my code, the answer might be different here.
So, my question is, whether there is a 'proper' way to convert between bitfield and unsigned int that does compile to no operations in g++ (maybe shifts will get optimised away when the compiler knows the bits are arranged correctly in memory).
This is an excerpt from the original question:
struct {
    int part1 : 10;
    int part2 : 6;
    int part3 : 16;
} word;
I can then set part2 to be equal to whatever value is requested, and set the other parts as 0.
word.part1 = 0;
word.part2 = 9;
word.part3 = 0;
I now want to take that struct, and convert it into a single 32 bit integer. I do have it compiling by forcing the casting, but it does not seem like a very elegant or secure way of converting the data.
int x = *reinterpret_cast<int*>(&word);
EDIT:
Now, quite some time later, I have learned some things:
1) Type punning (changing the interpretation of data) by means of pointer casting is undefined behaviour since C99 and C++98. These language changes introduced strict aliasing rules (they allow the compiler to reason that data is only accessed through pointers of compatible type) to allow for better optimisations. In effect, the compiler does not need to keep the ordering between accesses (or do the off-type access at all). In most cases this does not seem to present an immediate problem, but with higher optimisation settings (for gcc that is -O2 and above, which enable -fstrict-aliasing) it will become one.
For examples see https://blog.regehr.org/archives/959
2) Using unions for type punning also seems to involve undefined behaviour in C++ but not C (See https://stackoverflow.com/a/25672839/4360539), however GCC (and probably others) does explicitly allow it: (See https://gcc.gnu.org/bugs/#nonbugs).
3) The only really reliable way of doing type punning in C++ seems to be using memcpy to copy the data to a new location and perform whatever is to be done and then to use another memcpy to return the changes. I did read somewhere on SO, that GCC (or most compilers probably) should be able to optimise the memcpy to a mere register copy for register-sized data types, but I cannot find it again.
So probably the best thing to do here is to use the union, if you can be sure the code is compiled by a compiler that supports type punning through a union. For the other cases, further investigation would be needed into how the compiler treats bigger data structures with memcpy; if this really involves copying back and forth, sticking with bitwise operations is probably the best idea.
union {
    struct {
        int part1: 10;
        int part2: 6;
        int part3: 16;
    } parts;
    int whole;
} word;
Then just use word.whole.
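For compilers that do not document union punning, the memcpy route mentioned in the question can be sketched as follows. The Word struct, to_integer and from_integer are hypothetical names; the sketch uses an unsigned underlying type and assumes the three bit-fields pack into exactly 32 bits, which the static_assert checks. Which bits hold which field stays implementation-defined.

```cpp
#include <cstdint>
#include <cstring>  // std::memcpy

// Same field widths as in the question, with an unsigned underlying type.
struct Word {
    std::uint32_t part1 : 10;
    std::uint32_t part2 : 6;
    std::uint32_t part3 : 16;
};
static_assert(sizeof(Word) == sizeof(std::uint32_t),
              "this sketch assumes the bit-fields pack into exactly 32 bits");

// Reinterprets the bytes of a Word as an integer without pointer casts.
std::uint32_t to_integer(const Word& w) {
    std::uint32_t out;
    std::memcpy(&out, &w, sizeof out);
    return out;
}

// The reverse direction: rebuilds a Word from the integer's bytes.
Word from_integer(std::uint32_t value) {
    Word w;
    std::memcpy(&w, &value, sizeof w);
    return w;
}
```

Whether the copies disappear at your optimisation level is worth confirming in the generated assembly rather than assuming.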
I had the same problem. I am guessing this is not very relevant today. But this is how I solved it:
#include <iostream>
struct PACKED {
    int x:10;
    int y:10;
    int z:12;
    // Overwrites all three fields from a plain int (relies on a pointer cast).
    PACKED operator=(int num)
    {
        *( int* )this = num;
        return *this;
    }
    // Reads the three fields back as one int (same pointer-cast caveat).
    operator int()
    {
        int *x;
        x = (int*)this;
        return *x;
    }
} __attribute__((packed));
int main(void) {
    std::cout << "size: " << sizeof(PACKED) << std::endl;
    PACKED bf;
    bf = 0xFFF00000;
    std::cout << "Values ( x, y, z ) = " << bf.x << " " << bf.y << " " << bf.z << std::endl;
    int testint;
    testint = bf;
    std::cout << "As integer: " << testint << std::endl;
    return 0;
}
This now fits in an int, and is assignable from standard ints. However, I do not know how portable this solution is. The output of this is then:
size: 4
Values ( x, y, z ) = 0 0 -1
As integer: -1048576
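On the portability question, the bitwise alternative mentioned in the question avoids both the pointer cast and the implementation-defined bit-field layout. A minimal sketch, assuming the same 10/10/12 split with x in the lowest bits (Fields, pack and unpack are hypothetical names, and the fields are unsigned here):

```cpp
#include <cstdint>

// Plain (non-bit-field) holder for the three logical fields: 10, 10 and 12 bits.
struct Fields {
    std::uint32_t x, y, z;
};

// Packs x into bits 0-9, y into bits 10-19 and z into bits 20-31.
constexpr std::uint32_t pack(Fields f) {
    return (f.x & 0x3FFu) | ((f.y & 0x3FFu) << 10) | ((f.z & 0xFFFu) << 20);
}

// Splits an integer back into the three fields using the same layout.
constexpr Fields unpack(std::uint32_t v) {
    return Fields{v & 0x3FFu, (v >> 10) & 0x3FFu, (v >> 20) & 0xFFFu};
}
```

Because the layout is spelled out in the shifts, the result does not depend on how a given compiler orders bit-fields.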

Pointer to data member address

I have read (in Inside the C++ Object Model) that the value of a pointer to a data member in C++ is the offset of the data member plus 1.
I am trying this on VC++ 2005, but I am not getting those offset values.
For example:
class X {
public:
    int a;
    int b;
    int c;
};
void x() {
    printf("Offsets of a=%d, b=%d, c=%d", &X::a, &X::b, &X::c);
}
Should print Offsets of a=1, b=5, c=9. But in VC++ 2005 it is coming out to be a=0,b=4,c=8.
I am not able to understand this behavior.
Excerpt from book:
"That expectation, however, is off by one—a somewhat traditional error
for both C and C++ programmers.
The physical offset of the three coordinate members within the class
layout are, respectively, either 0, 4, and 8 if the vptr is placed at
the end or 4, 8, and 12 if the vptr is placed at the start of the
class. The value returned from taking the member's address, however,
is always bumped up by 1. Thus the actual values are 1, 5, and 9, and
so on. The problem is distinguishing between a pointer to no data
member and a pointer to the first data member. Consider for example:
float Point3d::*p1 = 0;
float Point3d::*p2 = &Point3d::x;
// oops: how to distinguish?
if ( p1 == p2 ) {
cout << " p1 & p2 contain the same value — ";
cout << " they must address the same member!" << endl;
}
To distinguish between p1 and p2, each actual member offset value is
bumped up by 1. Hence, both the compiler (and the user) must remember
to subtract 1 before actually using the value to address a member."
The offset of something is how many units it is from the start. The first thing is at the start so its offset is zero.
Think in terms of your structure being at memory location 100:
100: class X { int a;
104: int b;
108: int c;
As you can see, the address of a is the same as the address of the entire structure, so its offset (what you have to add to the structure address to get the item address) is 0.
Note that the ISO standard doesn't specify where the items are laid out in memory. Padding bytes to create correct alignment are certainly possible. In a hypothetical environment where ints were only two bytes but their required alignment was 256 bytes, they wouldn't be at 0, 2 and 4 but rather at 0, 256 and 512.
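If byte offsets are what you are after, the standard offsetof macro reports them directly, without printing a pointer-to-member through printf. A small sketch using the class from the question (the three helper functions are hypothetical names):

```cpp
#include <cstddef>  // offsetof, std::size_t

class X {
public:
    int a;
    int b;
    int c;
};

// Byte offsets of each member from the start of an X object.
constexpr std::size_t offset_of_a() { return offsetof(X, a); }
constexpr std::size_t offset_of_b() { return offsetof(X, b); }
constexpr std::size_t offset_of_c() { return offsetof(X, c); }
```

For a simple class like this the first member is at offset 0; the later offsets depend on sizeof(int) and any padding the implementation adds.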
And, if that book you're taking the excerpt from is really Inside the C++ Object Model, it's getting a little long in the tooth.
The fact that it's from '96 and discusses the internals underneath C++ (waxing lyrical about how good it is to know where the vptr is, missing the whole point that that's working at the wrong abstraction level and you should never care) dates it quite a bit. In fact, the introduction even states "Explains the basic implementation of the object-oriented features ..." (my italics).
And the fact that nobody can find anything in the ISO standard saying this behaviour is required, along with the fact that neither MSVC nor gcc acts that way, leads me to believe that, even if this was true of one particular implementation far in the past, it's not true (or required to be true) of all.
The author apparently led the cfront 2.1 and 3 teams and, while this book seems of historical interest, I don't think it's relevant to the modern C++ language (and implementation), at least those bits I've read.
Firstly, the internal representation of values of a pointer to a data member type is an implementation detail. It can be done in many different ways. You came across a description of one possible implementation, where the pointer contains the offset of the member plus 1. It is rather obvious where that "plus 1" comes from: that specific implementation wants to reserve the physical zero value (0x0) for the null pointer, so the offset of the first data member (which could easily be 0) has to be transformed to something else to make it different from a null pointer. Adding 1 to all such pointers solves the problem.
However, it should be noted that this is a rather cumbersome approach (i.e. the compiler always has to subtract 1 from the physical value before performing access). That implementation was apparently trying very hard to make sure that all null-pointers are represented by a physical zero-bit pattern. To tell the truth, I haven't encountered implementations that follow this approach in practice these days.
Today, most popular implementations (like GCC or MSVC++) use just the plain offset (not adding anything to it) as the internal representation of the pointer to a data member. The physical zero will, of course, no longer work for representing null pointers, so they use some other physical value to represent null pointers, like 0xFFFF... (this is what GCC and MSVC++ use).
Secondly, I don't understand what you were trying to say with your p1 and p2 example. You are absolutely wrong to assume that the pointers will contain the same value. They won't.
If we follow the approach described in your post ("offset + 1"), then p1 will receive the physical value of the null pointer (apparently a physical 0x0), while p2 will receive the physical value 0x1 (assuming x has offset 0). 0x0 and 0x1 are two different values.
If we follow the approach used by modern GCC and MSVC++ compilers, then p1 will receive the physical value of 0xFFFF.... (null pointer), while p2 will be assigned a physical 0x0. 0xFFFF... and 0x0 are again different values.
P.S. I just realized that the p1 and p2 example is actually not yours, but a quote from a book. Well, the book, once again, is describing the same problem I mentioned above - the conflict of 0 offset with 0x0 representation for null pointer, and offers one possible viable approach to solving that conflict. But, once again, there are alternative ways to do it, and many compilers today use completely different approaches.
The behavior you're getting looks quite reasonable to me. What sounds wrong is what you read.
To complement AndreyT's answer: Try running this code on your compiler.
void test()
{
    using namespace std;
    int X::* pm = NULL;
    cout << "NULL pointer to member: "
         << " value = " << pm
         << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;
    pm = &X::a;
    cout << "pointer to member a: "
         << " value = " << pm
         << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;
    pm = &X::b;
    cout << "pointer to member b: "
         << " value = " << pm
         << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;
}
On Visual Studio 2008 I get:
NULL pointer to member: value = 0, raw byte value = 0xffffffff
pointer to member a: value = 1, raw byte value = 0x0
pointer to member b: value = 1, raw byte value = 0x4
So indeed, this particular compiler is using a special bit pattern to represent a NULL pointer and thus leaving an 0x0 bit pattern as representing a pointer to the first member of an object.
This also means that wherever the compiler generates code to translate such a pointer to an integer or a boolean, it must be taking care to look for that special bit pattern. Thus something like if(pm) or the conversion performed by the << stream operator is actually written by the compiler as a test against the 0xffffffff bit pattern (instead of how we typically like to think of pointer tests being a raw test against address 0x0).
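The "value = 0" and "value = 1" columns above come from that conversion: there is no stream operator for pointers to members, so the pointer is converted to bool before printing. A short sketch of both the null test and the streamed form (is_null and streamed are hypothetical helpers, and X is assumed to have three int members as in the question):

```cpp
#include <sstream>
#include <string>

struct X {
    int a, b, c;
};

// Null test on a pointer-to-member; the compiler compares against whatever
// bit pattern it uses for null, which need not be all zeros.
bool is_null(int X::*pm) {
    return pm == nullptr;
}

// Streams the pointer the way the code above does. There is no operator<<
// for pointers to members, so the value converts to bool first.
std::string streamed(int X::*pm) {
    std::ostringstream out;
    out << pm;
    return out.str();
}
```

This is why streaming a pointer-to-member never shows an offset; only the raw-byte dump does.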
I have read that address of pointer to
data member in C++ is the offset of
data member plus 1?
I have never heard that, and your own empirical evidence shows it's not the case. I think you misunderstood an odd property of structs & class in C++. If they are completely empty, they nevertheless have a size of 1 (so that each element of an array of them has a unique address)
§9.2/12 is interesting
Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).
This explains that such behavior is implementation defined. However the fact that 'a', 'b' and 'c' are at increasing addresses is in accordance with the Standard.

Explicit Address Manipulation in C++

Please check out the following func and its output
void main()
{
    Distance d1;
    d1.setFeet(256);
    d1.setInches(2.2);
    char *p=(char *)&d1;
    *p=1;
    cout<< d1.getFeet()<< " "<< d1.getInches()<< endl;
}
The class Distance gets its values through setFeet and setInches, which take int and float arguments respectively. It displays the values through the getFeet and getInches methods.
However, the output of this function is 257 2.2. Why am I getting these values?
This is a really bad idea:
char *p=(char *)&d1;
*p=1;
Your code should never make assumptions about the internal structure of the class. If your class had any virtual functions, for example, that code would cause a crash when you called them.
I can only conclude that your Distance class looks like this:
class Distance {
    short feet;
    float inches;
public:
    void setFeet(...
};
When you setFeet(256), it sets the high byte (MSB) to 1 (256 = 1 * 2^8) and the low byte (LSB) to 0. When you assign the value 1 to the char at the address of the Distance object, you're forcing the first byte of the short representing feet to 1. On a little-endian machine, the low byte is at the lower address, so you end up with a short with both bytes set to 1, which is 1 * 2^8 + 1 = 257.
On a big-endian machine, you would still have the value 256, but it would be purely coincidental because you happen to be forcing a value of 1 on a byte that would already be 1.
However, because you're using undefined behavior, depending on the compiler and the compile options, you might end up with literally anything. A famous expression from comp.lang.c is that such undefined behavior could "cause demons to fly out of your nose".
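The byte arithmetic itself can be reproduced in a well-defined way on a plain integer, where copying the bytes out and back is allowed. This is a sketch of the endianness effect only, not of the Distance class; both function names are hypothetical:

```cpp
#include <cstdint>
#include <cstring>  // std::memcpy

// Replaces the byte at the lowest address of a 16-bit value and returns the result.
std::uint16_t overwrite_first_byte(std::uint16_t value, unsigned char byte) {
    unsigned char raw[sizeof value];
    std::memcpy(raw, &value, sizeof value);
    raw[0] = byte;
    std::memcpy(&value, raw, sizeof value);
    return value;
}

// True when the least significant byte is stored at the lowest address.
bool is_little_endian() {
    const std::uint16_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

Calling overwrite_first_byte(256, 1) gives 257 on a little-endian machine and 256 on a big-endian one, matching the explanation above.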
You are illegally munging memory via the 'p' pointer.
The output of the program is undefined; as you are directly manipulating memory that is owned by an object through a pointer of another type without regard to the underlying types.
Your code is somewhat like this:
struct Dist
{
    int x;
    float y;
};
union Plop
{
    Dist s; // Your class
    char p; // The type you are pretending to use via 'p'
};
int main()
{
    Plop p;
    p.s.x = 5; // Set up the Dist structure.
    p.s.y = 2.3;
    p.p = 1; // The value of s is now undefined.
             // As you have scribbled over the memory used by s.
}
The behaviour based on the code given is going to be very unpredictable. Setting the first byte of d1's data could potentially clobber a vptr, compiler-specific memory, the sign/exponent of a floating point value, or LSB or MSB of an integer, all depending on the definition of Distance.
I assume you think doing *p = 1 will set one of the internal data members (presumably 'feet') in the Distance object. It may work, but (afaik) you've got no guarantees that the feet member is at the first address of the object, is of the correct size (unless its type is also char) or that it's aligned correctly.
If you want to do that why not make the 'feet' member public and do:
d1.feet = 1;
Another thing, to comment on the program: don't use void main(). It isn't standard, and it offers you no benefits. It will make people not take you as seriously when asking C or C++ questions, and could cause programs to not compile, or not work properly.
The C++ Standard, in 3.6.1 paragraph 2, says that main() always returns int, although the implementation may offer variations with different arguments.
This would be a good time to break the habit. If you're learning from a book that uses void main(), the book is unreliable. See about getting another book, if only for reference.
It looks like you are new to programming and could use some help with basic concepts.
It's good that you are looking for that, but SO may not be the right place to get it.
Good luck.
The definition of the class is
class Distance{
    int feet;
    float inches;
public:
    //...functions
};
Now the int feet (showing its two low-order bytes) holds 00000001 00000000, and on a little-endian machine the zero byte occupies the lower address, so *p reads 00000000. When you do *p = 1, that low byte becomes 00000001, so the int is now 00000001 00000001, which is exactly 257!