C/C++ Struct memory layout equivalency - c++

Consider the following C struct and C++ struct declarations:
extern "C" { // if this matters
typedef struct Rect1 {
int x, y;
int w, h;
} Rect1;
}
struct Vector {
int x;
int y;
};
struct Rect2 {
Vector pos;
Vector size;
};
Are the memory layouts of Rect1 and Rect2 objects always identical?
Specifically, can I safely reinterpret_cast from Rect2* to Rect1* and assume that all four int values in the Rect2 object are matched one on one to the four ints in Rect1?
Does it make a difference if I change Rect2 to a non-POD type, e.g. by adding a constructor?

I would think so, but I also think there could (legally) be padding between Rect2::pos and Rect2::size. So to make sure, I would add compiler-specific attributes to "pack" the fields, thereby guaranteeing all the ints are adjacent and compact. This is less about C vs. C++ and more about the fact that you are likely using two "different" compilers when compiling in the two languages, even if those compilers come from a single vendor.
If you use reinterpret_cast to convert a pointer to one type into a pointer to another type and then dereference it (which you would in this case), you are likely to violate the "strict aliasing" rules.
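If you need a Rect1 built from a Rect2 without reading through an aliased pointer, one option is to copy the bytes instead. This is a minimal sketch, assuming both types are trivially copyable and the same size (checked at compile time). `to_rect1` is a hypothetical helper name, not part of any library. std::memcpy copies the object representation, so no object is ever accessed through a pointer of the wrong type.

```cpp
#include <cstring>
#include <type_traits>

extern "C" {
typedef struct Rect1 {
    int x, y;
    int w, h;
} Rect1;
}

struct Vector {
    int x;
    int y;
};

struct Rect2 {
    Vector pos;
    Vector size;
};

// A byte copy is only meaningful for trivially copyable types of equal size.
static_assert(std::is_trivially_copyable<Rect1>::value, "Rect1 must be trivially copyable");
static_assert(std::is_trivially_copyable<Rect2>::value, "Rect2 must be trivially copyable");
static_assert(sizeof(Rect1) == sizeof(Rect2), "Rect1 and Rect2 must have the same size");

// Hypothetical helper: copies the object representation of a Rect2 into a
// Rect1 instead of reading the Rect2 through a Rect1*, so the strict
// aliasing rules are never involved.
inline Rect1 to_rect1(const Rect2& r)
{
    Rect1 out;
    std::memcpy(&out, &r, sizeof out);
    return out;
}
```

Compilers typically turn a fixed-size memcpy like this into plain loads and stores, so the copy is usually cheap.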
Adding a constructor will not change the layout (though it will make the class non-POD), but adding access specifiers like private between the two fields may change the layout (in practice, not only in theory).
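You can check the difference between adding a constructor and mixing access specifiers with the <type_traits> queries. In this minimal sketch, the hypothetical types RectWithCtor and RectMixedAccess stand in for modified versions of Rect2.

```cpp
#include <type_traits>

struct Vector {
    int x;
    int y;
};

// Same members as Rect2, plus a user-provided constructor.
struct RectWithCtor {
    RectWithCtor() : pos(), size() {}
    Vector pos;
    Vector size;
};

// Same members, but the two data members have different access control.
class RectMixedAccess {
public:
    Vector pos;
private:
    Vector size;
};

// A constructor makes the type non-trivial (so no longer POD) ...
static_assert(!std::is_trivial<RectWithCtor>::value, "user-provided ctor makes it non-trivial");
// ... but it stays standard-layout.
static_assert(std::is_standard_layout<RectWithCtor>::value, "ctor keeps standard layout");
// Different access control for non-static data members breaks standard layout.
static_assert(!std::is_standard_layout<RectMixedAccess>::value, "mixed access is not standard-layout");
```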

Are the memory layouts of Rect1 and Rect2 objects always identical?
Yes. As long as certain obvious requirements hold, they are guaranteed to be identical. Those requirements are that the target platform/architecture is the same in terms of alignment and word sizes. In other words, if you are foolish enough to compile the C and C++ code for different target platforms (e.g., 32-bit vs. 64-bit) and try to mix them, you'll be in trouble; otherwise, you don't have to worry. The C++ compiler is basically required to produce the same memory layout the C compiler would, and the C ABI is fixed for a given word size and alignment.
Specifically, can I safely reinterpret_cast from Rect2* to Rect1* and assume that all four int values in the Rect2 object are matched one on one to the four ints in Rect1?
Yes. That follows from the first answer.
Does it make a difference if I change Rect2 to a non-POD type, e.g. by adding a constructor?
No, or at least, not any more. The only important thing is that the class remains a standard-layout class, which is not affected by constructors or any other non-virtual member function. That has been the rule since the C++11 (2011) standard. Before that, the language talked about "POD types" instead. If you have a pre-C++11 compiler, it is very likely working by the same rules as the C++11 standard anyway, because the C++11 rules for standard-layout and trivial types were basically written to match what all compiler vendors already did.

For a standard-layout class like yours, you can easily check how far each member is positioned from the beginning of the structure:
#include <cstddef>
std::size_t x_offset = offsetof(struct Rect1, x); // probably 0
std::size_t y_offset = offsetof(struct Rect1, y); // probably 4
// ...
std::size_t pos_offset = offsetof(struct Rect2, pos); // probably 0
// ...
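You can also compare the same offsetof values at compile time, so that a platform that inserts unexpected padding fails the build instead of silently misreading data. This sketch assumes the Rect1, Vector and Rect2 definitions from the question.

```cpp
#include <cstddef>

extern "C" {
typedef struct Rect1 {
    int x, y;
    int w, h;
} Rect1;
}

struct Vector {
    int x;
    int y;
};

struct Rect2 {
    Vector pos;
    Vector size;
};

// Each int in Rect2 must sit at the same offset as its counterpart in Rect1.
static_assert(sizeof(Rect1) == sizeof(Rect2), "sizes differ");
static_assert(offsetof(Rect1, x) == offsetof(Rect2, pos) + offsetof(Vector, x), "x differs");
static_assert(offsetof(Rect1, y) == offsetof(Rect2, pos) + offsetof(Vector, y), "y differs");
static_assert(offsetof(Rect1, w) == offsetof(Rect2, size) + offsetof(Vector, x), "w differs");
static_assert(offsetof(Rect1, h) == offsetof(Rect2, size) + offsetof(Vector, y), "h differs");
```

These checks only confirm that the layouts match. They do not make a reinterpret_cast between the two types exempt from the aliasing rules.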
http://www.cplusplus.com/reference/cstddef/offsetof/

Yes, they will always be the same.
You could try running the example below on cpp.sh; it runs as you expect.
// Example program
#include <iostream>
#include <string>
typedef struct Rect1 {
int x, y;
int w, h;
} Rect1;
struct Vector {
int x;
int y;
};
struct Rect2 {
Vector pos;
Vector size;
};
struct Rect3 {
Rect3():
pos(),
size()
{}
Vector pos;
Vector size;
};
int main()
{
Rect1 r1;
r1.x = 1;
r1.y = 2;
r1.w = 3;
r1.h = 4;
Rect2* r2 = reinterpret_cast<Rect2*>(&r1);
std::cout << r2->pos.x << std::endl;
std::cout << r2->pos.y << std::endl;
std::cout << r2->size.x << std::endl;
std::cout << r2->size.y << std::endl;
Rect3* r3 = reinterpret_cast<Rect3*>(&r1);
std::cout << r3->pos.x << std::endl;
std::cout << r3->pos.y << std::endl;
std::cout << r3->size.x << std::endl;
std::cout << r3->size.y << std::endl;
}

Related

Zero sized array in struct managed by shared pointer

Consider the following structure:
struct S
{
int a;
int b;
double arr[0];
} __attribute__((packed));
As you can see, this structure is packed and has a zero-sized array at the end.
I'd like to send this as binary data over the network (assume I took care of endianness).
In C/C++ I could just use malloc to allocate as much space as I want and use free later.
I'd like this memory to be handled by std::shared_ptr.
Is there a straightforward way of doing so without special hacks?
Sure, there is:
shared_ptr<S> make_buffer(size_t s)
{
auto buffer = malloc(s); // allocate as usual
auto release = [](void* p) { free(p); }; // a deleter
shared_ptr<void> sptr(buffer, release); // make it shared
return { sptr, new(buffer) S }; // an aliased pointer
}
This works with any objects that are placed in a malloced buffer, not just when there are zero-sized arrays, provided that the destructor is trivial (performs no action) because it is never called.
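Here is a self-contained version of the same idea that you can experiment with. It assumes a hypothetical header type Packet with two ints and no zero-length array, so it compiles without GNU extensions. The variable-length payload is addressed with pointer arithmetic instead of a flexible member. make_packet, payload and write_payload are illustrative names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical fixed header; the variable-length payload follows it in memory.
struct Packet {
    int a;
    int b;
};

// Allocates the header plus payload_bytes with malloc. The shared_ptr<void>
// owns the raw buffer and frees it. The returned shared_ptr<Packet> shares that
// ownership through the aliasing constructor. Packet is trivially destructible,
// so never running its destructor is harmless.
std::shared_ptr<Packet> make_packet(std::size_t payload_bytes)
{
    void* raw = std::malloc(sizeof(Packet) + payload_bytes);
    if (raw == nullptr) {
        throw std::bad_alloc();
    }
    // If creating the control block throws, the deleter still frees raw.
    std::shared_ptr<void> owner(raw, [](void* p) { std::free(p); });
    Packet* header = new (raw) Packet{};  // start the header object's lifetime
    return std::shared_ptr<Packet>(owner, header);
}

// Returns a pointer to the first payload byte, just past the header.
inline unsigned char* payload(const std::shared_ptr<Packet>& p)
{
    return reinterpret_cast<unsigned char*>(p.get()) + sizeof(Packet);
}

// Copies len bytes into the payload area. The caller must not exceed the size
// passed to make_packet. memcpy avoids any alignment assumption.
inline void write_payload(const std::shared_ptr<Packet>& p, const void* data, std::size_t len)
{
    std::memcpy(payload(p), data, len);
}
```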
The usual caveats about zero-sized arrays and packed structures still apply as well, of course.
Zero-sized arrays are not allowed as data members (nor as any other variable) in C++. Furthermore, there is no such attribute as packed in C++; it is a language extension (and as such, it may be considered a special hack). __attribute__ itself is a language extension. The standard attribute syntax uses double square brackets, like this: [[example_attribute]].
I'd like to send this as binary data over the network
You probably should serialise the data properly. There are many serialisation specifications, although none of them is universal, and none of them is implemented in the C++ standard library. Indeed, there isn't a standard API for network communication either.
A straightforward solution is to pick an existing serialisation format and use an existing library that implements it.
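For illustration, serialising properly without a library means writing each field explicitly in a fixed byte order instead of copying a packed struct. This is a minimal sketch. The Message type, the field order and the big-endian choice are assumptions made for the example, not a standard format.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical message: two header fields and a variable number of values.
struct Message {
    std::uint32_t a = 0;
    std::uint32_t b = 0;
    std::vector<std::uint32_t> values;
};

// Appends v in big-endian (network) byte order, independent of the host.
void put_u32(std::vector<unsigned char>& out, std::uint32_t v)
{
    out.push_back(static_cast<unsigned char>(v >> 24));
    out.push_back(static_cast<unsigned char>(v >> 16));
    out.push_back(static_cast<unsigned char>(v >> 8));
    out.push_back(static_cast<unsigned char>(v));
}

// Reads a big-endian uint32 starting at in[pos] and advances pos.
std::uint32_t get_u32(const std::vector<unsigned char>& in, std::size_t& pos)
{
    if (pos > in.size() || in.size() - pos < 4) {
        throw std::runtime_error("truncated message");
    }
    std::uint32_t v = (std::uint32_t(in[pos]) << 24) | (std::uint32_t(in[pos + 1]) << 16) |
                      (std::uint32_t(in[pos + 2]) << 8) | std::uint32_t(in[pos + 3]);
    pos += 4;
    return v;
}

// Wire layout: a, b, element count, then the elements.
std::vector<unsigned char> serialize(const Message& m)
{
    std::vector<unsigned char> out;
    put_u32(out, m.a);
    put_u32(out, m.b);
    put_u32(out, static_cast<std::uint32_t>(m.values.size()));
    for (std::uint32_t v : m.values) {
        put_u32(out, v);
    }
    return out;
}

Message deserialize(const std::vector<unsigned char>& in)
{
    std::size_t pos = 0;
    Message m;
    m.a = get_u32(in, pos);
    m.b = get_u32(in, pos);
    std::uint32_t count = get_u32(in, pos);
    for (std::uint32_t i = 0; i < count; ++i) {
        m.values.push_back(get_u32(in, pos));
    }
    return m;
}
```

Fields of other types need an agreed representation on both ends. For example, doubles could be sent as their 64-bit IEEE 754 bit pattern, if both sides use that format.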
First, let me explain again why I have this packed structure:
it is used to serialize data sent over the network, so there's a header file with all the network packet structures.
I know it generates bad assembly due to alignment issues, but I guess that this problem persists with regular serialization (copying to a char * buffer with memcpy).
Zero-size arrays are supported as an extension by both gcc and clang, which I use.
Here's an example of a full program with a solution to my question and its output (same output for gcc and g++).
Compiled with the -O3 -std=c++17 flags:
#include <iostream>
#include <memory>
#include <type_traits>
#include <cstddef>
struct S1
{
~S1() {std::cout << "deleting s1" << std::endl;}
char a;
int b;
int c[0];
} __attribute__((packed));
struct S2
{
char a;
int b;
int c[0];
};
int main(int argc, char **argv)
{
auto s1 = std::shared_ptr<S1>(static_cast<S1 *>(::operator
new(sizeof(S1) + sizeof(int) * 1e6)));
std::cout << "is standart: " << std::is_standard_layout<S1>::value << std::endl;
for (int i = 0; i < 1e6; ++i)
{
s1->c[i] = i;
}
std::cout << sizeof(S1) << ", " << sizeof(S2) << std::endl;
std::cout << offsetof(S1, c) << std::endl;
std::cout << offsetof(S2, c) << std::endl;
return 0;
}
This is the output:
is standart: 1
5, 8
5
8
deleting s1
Is there anything wrong with doing this?
I used valgrind to make sure that all allocations and frees work properly.

Cast an object value without pointers

Let's assume that A and B are two classes (or structures) having no inheritance relationships (thus, object slicing cannot work). I also have an object b of the type B. I would like to interpret its binary value as a value of type A:
A a = b;
I could use reinterpret_cast, but I would need to use pointers:
A a = reinterpret_cast<A>(b); // error: invalid cast
A a = *reinterpret_cast<A *>(&b); // correct [EDIT: see *footnote]
Is there a more compact way (without pointers) that does the same? (Including the case where sizeof(A) != sizeof(B))
Example of code that works using pointers: [EDIT: see *footnote]
#include <iostream>
#include <string>
using namespace std;
struct C {
int i;
string s;
};
struct S {
unsigned char data[sizeof(C)];
};
int main() {
C c;
c.i = 4;
c.s = "this is a string";
S s = *reinterpret_cast<S *>(&c);
C s1 = *reinterpret_cast<C *>(&s);
cout << s1.i << " " << s1.s << endl;
cout << reinterpret_cast<C *>(&s)->i << endl;
return 0;
}
*footnote: It worked when I tried it, but it is actually undefined behavior (which means that it may or may not work); see the comments below.
No. I think there's nothing in the C++ syntax that allows you to implicitly ignore types. First, that's against the notion of static typing. Second, C++ lacks standardization at binary level. So, whatever you do to trick the compiler about the types you're using might be specific to a compiler implementation.
That being said, if you really wanna do it, you should check how your compiler's data alignment and padding work (e.g., "struct padding in c++") and whether there's a way to control them (e.g., "What is the meaning of "__attribute__((packed, aligned(4)))""). If you're planning to do this across compilers (e.g., with data transmitted across the network), you should be extra careful. There are also platform issues, like different addressing models and endianness.
Yes, you can do it without a pointer:
A a = reinterpret_cast<A &>(b); // note the '&'
Note that this may be undefined behaviour. Check out the exact conditions at http://en.cppreference.com/w/cpp/language/reinterpret_cast
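If the goal is to reuse the bits of b as an A by value, the defined-behaviour way is to copy the bytes. C++20 standardises this as std::bit_cast in <bit>. For older compilers, here is a memcpy-based sketch (bit_cast_copy is a hypothetical name) that only accepts trivially copyable types of equal size. That rules out the question's example, because std::string is not trivially copyable. It also does not cover the sizeof(A) != sizeof(B) case.

```cpp
#include <cstring>
#include <type_traits>

// Copies the object representation of `from` into a new To.
template <class To, class From>
To bit_cast_copy(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "To and From must have the same size");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    static_assert(std::is_default_constructible<To>::value, "this sketch default-constructs To");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}

// Two unrelated types with the same member layout, standing in for A and B.
struct A { int x; int y; };
struct B { int p; int q; };
```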

Padding in struct containing only one int array member in C++?

Given a simple structure (POD) containing only one array of shorts (or bytes, ints from <cstdint>, etc.), to which no more fields will be added later:
#define FIXED_SIZE 128 // 'fixed' in long term, shouldn’t change in future versions
struct Foo {
uint16_t bar[FIXED_SIZE];
};
Is there any possibility that the compiler adds padding at the end of the structure, for any reason?
It seems reasonable that there would be no padding, as there is no obvious need for it, but does the standard guarantee this (could you provide any links where it is explained)?
Later I would like to use arrays of Foo structs in simple serialization (IPC) within different platforms and don't want to use any libraries for this simple task (code simplified for demonstration):
#define FOO_ELEMS 1024
...
// sender
Foo *from = new Foo[FOO_ELEMS];
uint8_t *buff_to = new uint8_t[FOO_ELEMS * FIXED_SIZE * sizeof(uint16_t) ];
memcpy(buff_to, from, ...);
...
// receiver
uint8_t *buff_from = new uint8_t[ ... ];
Foo *to = new Foo[FOO_ELEMS];
memcpy(to, buff_from, ...);
I would like to use a struct here instead of a plain array because it will have some auxiliary methods, which seems more convenient than using plain functions plus array pointers.
This overlaps with the following (plain C) question, but seems a little different to me:
Alignment of char array struct members in C standard
The various standards provide for padding to occur (but not at the start).
There is no strict requirement at all that padding may only appear to align the members and the objects in arrays.
So the truly conformant answer is:
Yes, there may be padding, because the compiler is allowed to add it, but not at the start or between array elements.
There is no standard way of forcing packing either.
However, every time this comes up, and every time I ask, no one has ever identified a real compiler on a platform that pads structures for any reason other than internal alignment and array alignment.
So for all known practical purposes, that structure will not be padded on any known platform.
Please consider this yet another request for someone to find a real platform that breaks that principle.
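Since the standard does not forbid trailing padding, one practical safeguard is to turn the no-padding assumption into a compile-time check. A platform that did pad Foo would then fail to build instead of corrupting the IPC data. This sketch reuses Foo and FIXED_SIZE from the question. to_bytes and from_bytes are hypothetical helpers, and the bytes are in host byte order, so both ends must agree on endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

#define FIXED_SIZE 128

struct Foo {
    std::uint16_t bar[FIXED_SIZE];
};

// The build fails if the compiler adds any padding to Foo.
static_assert(sizeof(Foo) == FIXED_SIZE * sizeof(std::uint16_t), "Foo contains padding");
// memcpy of Foo objects is only valid while Foo stays trivially copyable.
static_assert(std::is_trivially_copyable<Foo>::value, "Foo must stay trivially copyable");

// Copies count Foo objects into a byte buffer (host byte order).
std::vector<unsigned char> to_bytes(const Foo* foos, std::size_t count)
{
    std::vector<unsigned char> out(count * sizeof(Foo));
    if (!out.empty()) {
        std::memcpy(out.data(), foos, out.size());
    }
    return out;
}

// Copies bytes back into at most max_count Foo objects and returns how many
// whole Foo objects were read.
std::size_t from_bytes(const std::vector<unsigned char>& in, Foo* foos, std::size_t max_count)
{
    std::size_t count = in.size() / sizeof(Foo);
    if (count > max_count) {
        count = max_count;
    }
    if (count != 0) {
        std::memcpy(foos, in.data(), count * sizeof(Foo));
    }
    return count;
}
```

Adding ordinary member functions to Foo, as planned in the question, keeps it trivially copyable as long as it has no user-provided copy/move operations or destructor.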
Since we are already guaranteed that there will be no padding at the beginning of the structure, we don't have to worry about that. At the end, I could see padding being added if the size of the array were not a multiple of the word size of the machine.
The only way I could get any padding added to the struct, though, was to add an int member as well. In doing so, the struct was padded, so that the versions with 127 and 128 array elements ended up the same size.
#include <iostream>
#include <cstdint>
struct a
{
uint16_t bar[128];
};
struct b
{
uint16_t bar[127];
};
struct c
{
int test;
uint16_t bar[128];
};
struct d
{
int test;
uint16_t bar[127];
};
struct e
{
uint16_t bar[128];
int test;
};
struct f
{
uint16_t bar[127];
int test;
};
int main()
{
std::cout << sizeof(a) << "\t" << sizeof(b) << "\t" << sizeof(c) << "\t" << sizeof(d) << "\t" << sizeof(e) << "\t" << sizeof(f);
}

Valid use of reinterpret_cast?

Empirically the following works (gcc and VC++), but is it valid and portable code?
#include <iostream>
typedef struct
{
int w[2];
} A;
struct B
{
int blah[2];
};
void my_func(B b)
{
using namespace std;
cout << b.blah[0] << b.blah[1] << endl;
}
int main(int argc, char* argv[])
{
using namespace std;
A a;
a.w[0] = 1;
a.w[1] = 2;
cout << a.w[0] << a.w[1] << endl;
// my_func(a); // compiler error, as expected
my_func(reinterpret_cast<B&>(a)); // reinterpret, magic?
my_func( *(B*)(&a) ); // is this equivalent?
return 0;
}
// Output:
// 12
// 12
// 12
Is the reinterpret_cast valid?
Is the C-style cast equivalent?
Where the intention is to have the bits located at &a interpreted as a type B, is this a valid approach, and is it the best one?
(Off topic: For those who want to know why I'm trying to do this, I'm dealing with two C libraries that want 128 bits of memory and use structs with different internal names, much like the structs in my example. I don't want memcpy, and I don't want to hack around in the 3rd-party code.)
In C++11, this is fully allowed if the two types are layout-compatible, which is true for structs that are identical and have standard layout. See this answer for more details.
You could also stick the two structs in the same union in previous versions of C++, which had some guarantees about being able to access identical data members (a "common initial sequence" of data members) in the same order for different structure types.
In this case, yes, the C-style cast is equivalent, but reinterpret_cast is probably more idiomatic.
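Here is a sketch of the union alternative mentioned above. It relies on the rule that, in a standard-layout union, the common initial sequence of two standard-layout struct members may be read through either member. A and B are taken from the question, while AB and sum_via_b are hypothetical names. This only helps if the data already lives in such a union; it does not reinterpret an existing standalone A.

```cpp
#include <type_traits>

typedef struct
{
    int w[2];
} A;

struct B
{
    int blah[2];
};

static_assert(std::is_standard_layout<A>::value, "A must be standard-layout");
static_assert(std::is_standard_layout<B>::value, "B must be standard-layout");

// A and B have the same single member type (int[2]), so their common initial
// sequence covers the whole struct.
union AB
{
    A a;
    B b;
};

// Reads through the B member, even if the A member was the one written.
inline int sum_via_b(const AB& u)
{
    return u.b.blah[0] + u.b.blah[1];
}
```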

Naming Array Elements, or Struct And Array Within a Union

Consider the following struct:
struct Vector4D
{
union
{
double components[4];
struct { double x, y, z, t; } Endpoint;
};
};
It seems to me that I have seen something similar in WinApi's IPAddress struct. The idea is to give me the possibility to use the array components both by index and by name, for example:
Vector4D v;
v.components[2] = 3.0;
ASSERT(v.Endpoint.z == 3.0); // let's ignore precision issues for now
In the C++ standard there is a guarantee that there will be no "empty" space at the beginning of a POD-struct, that is, the element x will be situated right at the beginning of the Endpoint struct. Good so far. But I can't seem to find any guarantee that there will be no empty space, or padding if you will, between x and y, or y and z, etc. I haven't checked the C99 standard though.
The problem is that if there is an empty space between Endpoint struct elements, then the idea will not work.
Questions:
1. Am I right that there is indeed no guarantee that this will work, either in C or C++?
2. Will this work in practice on any known implementation? In other words, do you know of any implementation where this doesn't work?
3. Is there any standard (I mean not compiler-specific) way to express the same idea? Might the C++0x alignment features help?
By the way, this isn't something I am doing in production code, don't worry, just curious. Thanks in advance.
1. Yes.
2. It depends on the alignment needs of the architecture and the compiler's strategy.
3. No, but you could make an object wrapper (though you would end up with .z() instead of just .z).
Most compilers should support squashing a structure using a pragma or an attribute, #pragma pack for example.
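Here is a sketch of the object-wrapper approach. The data is stored only as an array, and names are provided by accessor functions, so no union punning or layout assumption is needed. The accessor names mirror the question's x, y, z and t; the design is an illustration, not a standard type.

```cpp
#include <cstddef>

struct Vector4D
{
    double components[4] = {0.0, 0.0, 0.0, 0.0};

    // Named access to the array elements.
    double& x() { return components[0]; }
    double& y() { return components[1]; }
    double& z() { return components[2]; }
    double& t() { return components[3]; }
    const double& x() const { return components[0]; }
    const double& y() const { return components[1]; }
    const double& z() const { return components[2]; }
    const double& t() const { return components[3]; }

    // Indexed access, as with the raw array.
    double& operator[](std::size_t i) { return components[i]; }
    const double& operator[](std::size_t i) const { return components[i]; }
};
```

Unlike a version with reference members, this type copies correctly. With reference members, the implicitly generated copy constructor would bind the copy's references to the original object's array.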
You can circumvent any memory alignment issues by having a reference to each element of the array, as long as you declare the array before the references in the class so that they bind to valid data. Having said that, I doubt alignment would be an issue with doubles, but it could be for other types (float on a 64-bit architecture, perhaps?).
#include <iostream>
using namespace std;
struct Vector4D
{
Vector4D() : components(), x(components[0]), y(components[1]), z(components[2]), t(components[3]) { }
double components[4];
double& x;
double& y;
double& z;
double& t;
};
int main()
{
Vector4D v;
v.components[0] = 3.0;
v.components[1] = 1.0;
v.components[2] = 4.0;
v.components[3] = 15.0;
cout << v.x << endl;
cout << v.y << endl;
cout << v.z << endl;
cout << v.t << endl;
}
Hope this helps.
When it comes to the standard, there are two problems with it:
It is unspecified what happens when writing to an element in a union and reading from another, see the C standard 6.2.6.1 and K.1
The standard does not guarantee that the layout of the struct matches the layout of the array; see the C standard 6.7.2.1.10 for details.
Having said this, in practice this will work on normal compilers. In fact, this kind of code is widely spread and is often used to reinterpret values of one type into values of another type.
Padding bytes will not be an issue, as all members are of type double. In practice, the compiler will treat Vector4D as a double array, which means that v.Endpoint.z is essentially the same as v.components[2].