Consider the following structure:
struct S
{
int a;
int b;
double arr[0];
} __attribute__((packed));
As you can see, this structure is packed and has Zero sized array at the end.
I'd like to send this as binary data over the network (assume I took care of endianity).
In C/C++ I could just use malloc to allocate as much space as I want and use free later.
I'd like this memory to be handled by std::shared_ptr.
Is there a straight forward way of doing so without special hacks?
I'd like this memory to be handled by std::shared_ptr.
Is there a straight forward way of doing so without special hacks?
Sure, there is:
shared_ptr<S> make_buffer(size_t s)
{
auto buffer = malloc(s); // allocate as usual
auto release = [](void* p) { free(p); }; // a deleter
shared_ptr<void> sptr(buffer, release); // make it shared
return { sptr, new(buffer) S }; // an aliased pointer
}
This works with any objects that are placed in a malloced buffer, not just when there are zero-sized arrays, provided that the destructor is trivial (performs no action) because it is never called.
The usual caveats about zero-sized arrays and packed structures still apply as well, of course.
double arr[0];
} __attribute__((packed));
Zero sized arrays are not allowed as data members (nor as any other variable) in C++. Furthermore, there is no such attribute as packed in C++; it is a language extension (as such, it may be considered to be a special hack). __attribute__ itself is a language extension. The standard syntax for function attributes uses nested square brackets like this: [[example_attribute]].
I'd like to send this as binary data over the network
You probably should properly serialise the data. There are many serialisation specifications although none of them is universally ubiquitous and none of them is implemented in the C++ standard library. Indeed, there isn't a standard API for network commnication either.
A straightforward solution is to pick an existing serialisation format and use an existing library that implements it.
First, let me explain again why I have this packed structure:
it is used for serialization of data over the network so there's a header file with all network packet structures.
I know it generates bad assembly due to alignment issues, but I guess that this problem persists with regular serialization (copy to char * buffer with memcpy).
Zero size arrays are supported both by gcc and clang which I use.
Here's an example of a full program with a solution to my question and it's output (same output for gcc and g++).
compiled with -O3 -std=c++17 flags
#include <iostream>
#include <memory>
#include <type_traits>
#include <cstddef>
struct S1
{
~S1() {std::cout << "deleting s1" << std::endl;}
char a;
int b;
int c[0];
} __attribute__((packed));
struct S2
{
char a;
int b;
int c[0];
};
int main(int argc, char **argv)
{
auto s1 = std::shared_ptr<S1>(static_cast<S1 *>(::operator
new(sizeof(S1) + sizeof(int) * 1e6)));
std::cout << "is standart: " << std::is_standard_layout<S1>::value << std::endl;
for (int i = 0; i < 1e6; ++i)
{
s1->c[i] = i;
}
std::cout << sizeof(S1) << ", " << sizeof(S2) << std::endl;
std::cout << offsetof(S1, c) << std::endl;
std::cout << offsetof(S2, c) << std::endl;
return 0;
}
This is the output:
is standart: 1
5, 8
5
8
deleting s1
Is there anything wrong with doing this?
I made sure using valgrind all allocations/frees work properly.
Related
I was learning about the structures in C++ and got to know that if a structure in C++ has 3 variables (let each one be of some data type), then all of them are not allocated in a contiguous fashion. Is this correct?
If yes, then how much memory would be allocated to an object to that structure type?
For e.g. Let's say we have a structure like this:
struct a
{
int x;
int y;
char c;
};
Now how intuitively an object of type a, must occupy some space = sizeOf(int) + sizeOf(int) + sizeOf(char). But, if they are not allocated continuously, there could be some memory locations allocated for that object, that are just present for providing some padding - i.e. the memory allocated for that object could look something like this:
xxxx[4-bytes]xxxxx[4-bytes]xxxx[1-byte]xxx
(NOTE: In the above blockquote x corresponds to a memory location of size 1 byte. I also assumed that sizeOf(int) = 4-bytes and sizeOf(char) = 1- byte.)
So in the above one, we can see that the object a occupies more than 9-bytes (because there are some memory locations (x's) used for padding.)
So, does something like this happen?
Thanks for your replies!
P.S. Please let me know if I hadn't written something clearly.
When it comes to structs and classes, The layout is implementation-defined between compilers, os, and architecture... Some will use automatic alignment, others will use padding, some may even auto arrange it's members. If you need to know the size of a struct, use sizeof(Your Struct).
Here's a code snippet...
#include <iostream>
struct A {
char a;
float b;
int c;
};
struct B {
float a;
int b;
char c;
};
int main() {
std::cout << "Sizeof(A) = " << sizeof(A) << '\n';
std::cout << "Sizeof(B) = " << sizeof(B) << '\n';
return 0;
}
Output:
Sizeof(A) = 12
Sizeof(B) = 12
For my particular machine, I'm running Windows 7 - 64bit, It is an Intel Core2 Quad Extreme, and I'm using Visual Studio 2017 running it with C++17.
With my particular setup, both structures are being generated with a different layout, but have the same size in bytes.
In A's case...
char a; // 1 byte
// 3 bytes of padding
float b; // 4 bytes
int c; // 4 bytes (int is 32bit even on x64).
In B's case...
float a; // 4 bytes
int b; // 4 bytes
char c; // 1 byte
// 3 bytes of padding.
Also, your compiler flags and optimizations may have an effect. This isn't always guaranteed, as it is implementation-defined as stated in the standard.
--Edit--
Also, if you don't want this exact behavior there are some pragmas directives and macros such as pragma pack and alignas() that can be used to modify your implementation details. Here are a few references.
How to use alignas to replace pragma pack?
https://en.cppreference.com/w/cpp/preprocessor/impl
https://en.cppreference.com/w/cpp/language/alignas
https://learn.microsoft.com/en-us/cpp/cpp/alignment-cpp-declarations?view=msvc-160
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbclx01/pragma_pack.htm
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbclx01/packqua.htm
https://www.iditect.com/how-to/57426535.html
https://downloads.ctfassets.net/oxjq45e8ilak/1LriV4eAdhNlu9Zv06H9NJ/53576095f772b5f6cddbbedccb7ebd8a/Alexander_Titov_Know_your_hardware_CPU_memory_hierarchy.pdf
https://cpc110.blogspot.com/2020/10/vs2019-alignas-in-struct-definition.html
So, structure Object variables take contiguous memory locations and the variables are stored in memory in the same order in which they are defined.
Please refer to the code below you will get an idea.
struct c{
int a;
int b;
int c;
};
int main()
{
c obj;
cout << &(obj.a) << endl; //0x7ffe128fe1c4
cout << &(obj.b) << endl; //0x7ffe128fe1c8
cout << &(obj.c) << endl; //0x7ffe128fe1cc
return 0;
}
For Structures Padding Concept Refer here
Let's assume that A and B are two classes (or structures) having no inheritance relationships (thus, object slicing cannot work). I also have an object b of the type B. I would like to interpret its binary value as a value of type A:
A a = b;
I could use reinterpret_cast, but I would need to use pointers:
A a = reinterpret_cast<A>(b); // error: invalid cast
A a = *reinterpret_cast<A *>(&b); // correct [EDIT: see *footnote]
Is there a more compact way (without pointers) that does the same? (Including the case where sizeof(A) != sizeof(B))
Example of code that works using pointers: [EDIT: see *footnote]
#include <iostream>
using namespace std;
struct C {
int i;
string s;
};
struct S {
unsigned char data[sizeof(C)];
};
int main() {
C c;
c.i = 4;
c.s = "this is a string";
S s = *reinterpret_cast<S *>(&c);
C s1 = *reinterpret_cast<C *>(&s);
cout << s1.i << " " << s1.s << endl;
cout << reinterpret_cast<C *>(&s)->i << endl;
return 0;
}
*footnote: It worked when I tried it, but it is actually an undefined behavior (which means that it may work or not) - see comments below
No. I think there's nothing in the C++ syntax that allows you to implicitly ignore types. First, that's against the notion of static typing. Second, C++ lacks standardization at binary level. So, whatever you do to trick the compiler about the types you're using might be specific to a compiler implementation.
That being said, if you really wanna do it, you should check how your compiler's data alignment/padding works (i.e.: struct padding in c++) and if there's a way to control it (i.e.: What is the meaning of "__attribute__((packed, aligned(4))) "). If you're planning to do this across compilers (i.e.: with data transmitted across the network), then you should be extra careful. There are also platform issues, like different addressing models and endianness.
Yes, you can do it without a pointer:
A a = reinterpret_cast<A &>(b); // note the '&'
Note that this may be undefined behaviour. Check out the exact conditions at http://en.cppreference.com/w/cpp/language/reinterpret_cast
Giving simple structure (POD) containing only one array of shorts (bytes, ints from <cstdint>, etc) and no more fields will be added later:
#define FIXED_SIZE 128 // 'fixed' in long term, shouldn’t change in future versions
struct Foo {
uint16_t bar[FIXED_SIZE];
};
is it any possibility to end up with padding at the end of the structure added by compiler for any reason ?
It seems reasonable not to make any padding as it is no any obvious need of it, but is it any guarantees by standard (could you provide any links where it is explained)?
Later I would like to use arrays of Foo structs in simple serialization (IPC) within different platforms and don't want to use any libraries for this simple task (code simplified for demonstration):
#define FOO_ELEMS 1024
...
// sender
Foo *from = new Foo[FOO_ELEMS];
uint8_t *buff_to = new uint8_t[FOO_ELEMS * FIXED_SIZE * sizeof(uint16_t) ];
memcpy(buff_to, from, ...);
...
// receiver
uint8_t *buff_from = new uint8_t[ ... ];
Foo *to = new Foo[FOO_ELEMS];
memcpy(to, buff_from, ...);
I would like to use struct here instead of plain arrays as it will be some auxiliary methods within struct and it seems more convenient then to use plain functions + arrays pointers instead.
Intersects with this (plain C) question, but seems a little bit different for me:
Alignment of char array struct members in C standard
The various standards provide for padding to occur (but not at the start).
There is no strict requirement at all that it will only appear to align the members and the object in arrays.
So the truly conformant answer is:
Yes, there may be padding because the compiler can add it but not at the start or between array elements.
There is no standard way of forcing packing either.
However every time this comes up and every time I ask no one has ever identified a real compiler on a platform that pads structures for any other reason than for internal alignment and array alignment.
So for all know practical purposes that structure will not be packed on any known platform.
Please consider this yet another request for someone to find a real platform that breaks that principle.
Since we are already guaranteed that there will no padding at the beginning of the structure don't have to worry about that. At the end I could see padding being added if the sizeof of the array was not divisible by the word size of the machine.
The only way I could get any padding to be added to the struct though was to add an int member to the struct as well. In doing so the struct was padded to make them the same size.
#include <iostream>
#include <cstdint>
struct a
{
uint16_t bar[128];
};
struct b
{
uint16_t bar[127];
};
struct c
{
int test;
uint16_t bar[128];
};
struct d
{
int test;
uint16_t bar[127];
};
struct e
{
uint16_t bar[128];
int test;
};
struct f
{
uint16_t bar[127];
int test;
};
int main()
{
std::cout << sizeof(a) << "\t" << sizeof(b) << "\t" << sizeof(c) << "\t" << sizeof(d) << "\t" << sizeof(e) << "\t" << sizeof(f);
}
Live Example
Suppose there is non-POD struct below, is the alignment taken effect? If not, what would be expected?
struct S1
{
string s;
int32_t i;
double d;
} __attribute__ ((aligned (64)));
EDIT: the output of the sample code below is 64 even s is set to a long string.
int main(int argc,char *argv[])
{
S1 s1;
s1.s = "123451111111111111111111111111111111111111111111111111111111111111111111111111";
s1.i = 100;
s1.d = 20.123;
printf("%ld\n", sizeof(s1));
return 1;
}
Size of objects in C++ never changes. It's compile time constant. It has to be, because the compiler needs to know how much space to allocate for the object. In fact, sizeof(s1) is just alias for sizeof(S1) and no instance is involved with the later.
The std::string is a smallish object that wraps pointer to array of characters and reallocates that array to accommodate the value you set. So the string itself is stored outside the S1 object.
The __attribute__ ((aligned (64))) is not standard C++, it is GCC extension. It is honored even for non-POD objects. It simply tells the compiler to round the size of the structure up to next multiple of 64 bytes.
While it is honored on non-POD objects, I can't think of a reason to use it for one and can only think of very few reasons to use it for POD object. The default alignment is reasonable. You don't need to tweak it.
Yes, the alignment takes effect in GCC.
Example code:
#include <string>
#include <iostream>
#include <cstddef>
using namespace std;
struct S1
{
string s;
int i;
double d;
} __attribute__ ((aligned (64)));
struct S2
{
string s;
int i;
double d;
} __attribute__ ((aligned)); // let the compiler decide
struct S3
{
string s;
int i;
double d;
};
int main() {
cout << "S1 " << sizeof(S1) << endl;
cout << "S2 " << sizeof(S2) << endl;
cout << "S3 " << sizeof(S3) << endl;
}
Output:
S1 64
S2 16
S3 16
Also: You have observed that S1 is non-POD. Note that std::string is allowed to store its data externally (it usually does so because the data can be of arbitrary length, so a memory buffer must be allocated dynamically).
Remember that sizeof is only calculated during compilation, it can't depend on runtime values. Specifically, you can't ever query a size of dynamically allocated memory this way.
Also remember that every element in an array is always of the same type, so of the same size too.
Empirically the following works (gcc and VC++), but is it valid and portable code?
typedef struct
{
int w[2];
} A;
struct B
{
int blah[2];
};
void my_func(B b)
{
using namespace std;
cout << b.blah[0] << b.blah[1] << endl;
}
int main(int argc, char* argv[])
{
using namespace std;
A a;
a.w[0] = 1;
a.w[1] = 2;
cout << a.w[0] << a.w[1] << endl;
// my_func(a); // compiler error, as expected
my_func(reinterpret_cast<B&>(a)); // reinterpret, magic?
my_func( *(B*)(&a) ); // is this equivalent?
return 0;
}
// Output:
// 12
// 12
// 12
Is the reinterpret_cast valid?
Is the C-style cast equivalent?
Where the intention is to have the bits located at &a interpreted as a
type B, is this a valid / the best approach?
(Off topic: For those that want to know why I'm trying to do this, I'm dealing with two C libraries that want 128 bits of memory, and use structs with different internal names - much like the structs in my example. I don't want memcopy, and I don't want to hack around in the 3rd party code.)
In C++11, this is fully allowed if the two types are layout-compatible, which is true for structs that are identical and have standard layout. See this answer for more details.
You could also stick the two structs in the same union in previous versions of C++, which had some guarantees about being able to access identical data members (a "common initial sequence" of data members) in the same order for different structure types.
In this case, yes, the C-style cast is equivalent, but reinterpret_cast is probably more idiomatic.