Giving simple structure (POD) containing only one array of shorts (bytes, ints from <cstdint>, etc) and no more fields will be added later:
#define FIXED_SIZE 128 // 'fixed' in long term, shouldn’t change in future versions
struct Foo {
uint16_t bar[FIXED_SIZE];
};
is it any possibility to end up with padding at the end of the structure added by compiler for any reason ?
It seems reasonable not to make any padding as it is no any obvious need of it, but is it any guarantees by standard (could you provide any links where it is explained)?
Later I would like to use arrays of Foo structs in simple serialization (IPC) within different platforms and don't want to use any libraries for this simple task (code simplified for demonstration):
#define FOO_ELEMS 1024
...
// sender
Foo *from = new Foo[FOO_ELEMS];
uint8_t *buff_to = new uint8_t[FOO_ELEMS * FIXED_SIZE * sizeof(uint16_t) ];
memcpy(buff_to, from, ...);
...
// receiver
uint8_t *buff_from = new uint8_t[ ... ];
Foo *to = new Foo[FOO_ELEMS];
memcpy(to, buff_from, ...);
I would like to use struct here instead of plain arrays as it will be some auxiliary methods within struct and it seems more convenient then to use plain functions + arrays pointers instead.
Intersects with this (plain C) question, but seems a little bit different for me:
Alignment of char array struct members in C standard
The various standards provide for padding to occur (but not at the start).
There is no strict requirement at all that it will only appear to align the members and the object in arrays.
So the truly conformant answer is:
Yes, there may be padding because the compiler can add it but not at the start or between array elements.
There is no standard way of forcing packing either.
However every time this comes up and every time I ask no one has ever identified a real compiler on a platform that pads structures for any other reason than for internal alignment and array alignment.
So for all know practical purposes that structure will not be packed on any known platform.
Please consider this yet another request for someone to find a real platform that breaks that principle.
Since we are already guaranteed that there will no padding at the beginning of the structure don't have to worry about that. At the end I could see padding being added if the sizeof of the array was not divisible by the word size of the machine.
The only way I could get any padding to be added to the struct though was to add an int member to the struct as well. In doing so the struct was padded to make them the same size.
#include <iostream>
#include <cstdint>
struct a
{
uint16_t bar[128];
};
struct b
{
uint16_t bar[127];
};
struct c
{
int test;
uint16_t bar[128];
};
struct d
{
int test;
uint16_t bar[127];
};
struct e
{
uint16_t bar[128];
int test;
};
struct f
{
uint16_t bar[127];
int test;
};
int main()
{
std::cout << sizeof(a) << "\t" << sizeof(b) << "\t" << sizeof(c) << "\t" << sizeof(d) << "\t" << sizeof(e) << "\t" << sizeof(f);
}
Live Example
Related
Consider the following structure:
struct S
{
int a;
int b;
double arr[0];
} __attribute__((packed));
As you can see, this structure is packed and has Zero sized array at the end.
I'd like to send this as binary data over the network (assume I took care of endianity).
In C/C++ I could just use malloc to allocate as much space as I want and use free later.
I'd like this memory to be handled by std::shared_ptr.
Is there a straight forward way of doing so without special hacks?
I'd like this memory to be handled by std::shared_ptr.
Is there a straight forward way of doing so without special hacks?
Sure, there is:
shared_ptr<S> make_buffer(size_t s)
{
auto buffer = malloc(s); // allocate as usual
auto release = [](void* p) { free(p); }; // a deleter
shared_ptr<void> sptr(buffer, release); // make it shared
return { sptr, new(buffer) S }; // an aliased pointer
}
This works with any objects that are placed in a malloced buffer, not just when there are zero-sized arrays, provided that the destructor is trivial (performs no action) because it is never called.
The usual caveats about zero-sized arrays and packed structures still apply as well, of course.
double arr[0];
} __attribute__((packed));
Zero sized arrays are not allowed as data members (nor as any other variable) in C++. Furthermore, there is no such attribute as packed in C++; it is a language extension (as such, it may be considered to be a special hack). __attribute__ itself is a language extension. The standard syntax for function attributes uses nested square brackets like this: [[example_attribute]].
I'd like to send this as binary data over the network
You probably should properly serialise the data. There are many serialisation specifications although none of them is universally ubiquitous and none of them is implemented in the C++ standard library. Indeed, there isn't a standard API for network commnication either.
A straightforward solution is to pick an existing serialisation format and use an existing library that implements it.
First, let me explain again why I have this packed structure:
it is used for serialization of data over the network so there's a header file with all network packet structures.
I know it generates bad assembly due to alignment issues, but I guess that this problem persists with regular serialization (copy to char * buffer with memcpy).
Zero size arrays are supported both by gcc and clang which I use.
Here's an example of a full program with a solution to my question and it's output (same output for gcc and g++).
compiled with -O3 -std=c++17 flags
#include <iostream>
#include <memory>
#include <type_traits>
#include <cstddef>
struct S1
{
~S1() {std::cout << "deleting s1" << std::endl;}
char a;
int b;
int c[0];
} __attribute__((packed));
struct S2
{
char a;
int b;
int c[0];
};
int main(int argc, char **argv)
{
auto s1 = std::shared_ptr<S1>(static_cast<S1 *>(::operator
new(sizeof(S1) + sizeof(int) * 1e6)));
std::cout << "is standart: " << std::is_standard_layout<S1>::value << std::endl;
for (int i = 0; i < 1e6; ++i)
{
s1->c[i] = i;
}
std::cout << sizeof(S1) << ", " << sizeof(S2) << std::endl;
std::cout << offsetof(S1, c) << std::endl;
std::cout << offsetof(S2, c) << std::endl;
return 0;
}
This is the output:
is standart: 1
5, 8
5
8
deleting s1
Is there anything wrong with doing this?
I made sure using valgrind all allocations/frees work properly.
I was learning about the structures in C++ and got to know that if a structure in C++ has 3 variables (let each one be of some data type), then all of them are not allocated in a contiguous fashion. Is this correct?
If yes, then how much memory would be allocated to an object to that structure type?
For e.g. Let's say we have a structure like this:
struct a
{
int x;
int y;
char c;
};
Now how intuitively an object of type a, must occupy some space = sizeOf(int) + sizeOf(int) + sizeOf(char). But, if they are not allocated continuously, there could be some memory locations allocated for that object, that are just present for providing some padding - i.e. the memory allocated for that object could look something like this:
xxxx[4-bytes]xxxxx[4-bytes]xxxx[1-byte]xxx
(NOTE: In the above blockquote x corresponds to a memory location of size 1 byte. I also assumed that sizeOf(int) = 4-bytes and sizeOf(char) = 1- byte.)
So in the above one, we can see that the object a occupies more than 9-bytes (because there are some memory locations (x's) used for padding.)
So, does something like this happen?
Thanks for your replies!
P.S. Please let me know if I hadn't written something clearly.
When it comes to structs and classes, The layout is implementation-defined between compilers, os, and architecture... Some will use automatic alignment, others will use padding, some may even auto arrange it's members. If you need to know the size of a struct, use sizeof(Your Struct).
Here's a code snippet...
#include <iostream>
struct A {
char a;
float b;
int c;
};
struct B {
float a;
int b;
char c;
};
int main() {
std::cout << "Sizeof(A) = " << sizeof(A) << '\n';
std::cout << "Sizeof(B) = " << sizeof(B) << '\n';
return 0;
}
Output:
Sizeof(A) = 12
Sizeof(B) = 12
For my particular machine, I'm running Windows 7 - 64bit, It is an Intel Core2 Quad Extreme, and I'm using Visual Studio 2017 running it with C++17.
With my particular setup, both structures are being generated with a different layout, but have the same size in bytes.
In A's case...
char a; // 1 byte
// 3 bytes of padding
float b; // 4 bytes
int c; // 4 bytes (int is 32bit even on x64).
In B's case...
float a; // 4 bytes
int b; // 4 bytes
char c; // 1 byte
// 3 bytes of padding.
Also, your compiler flags and optimizations may have an effect. This isn't always guaranteed, as it is implementation-defined as stated in the standard.
--Edit--
Also, if you don't want this exact behavior there are some pragmas directives and macros such as pragma pack and alignas() that can be used to modify your implementation details. Here are a few references.
How to use alignas to replace pragma pack?
https://en.cppreference.com/w/cpp/preprocessor/impl
https://en.cppreference.com/w/cpp/language/alignas
https://learn.microsoft.com/en-us/cpp/cpp/alignment-cpp-declarations?view=msvc-160
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbclx01/pragma_pack.htm
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbclx01/packqua.htm
https://www.iditect.com/how-to/57426535.html
https://downloads.ctfassets.net/oxjq45e8ilak/1LriV4eAdhNlu9Zv06H9NJ/53576095f772b5f6cddbbedccb7ebd8a/Alexander_Titov_Know_your_hardware_CPU_memory_hierarchy.pdf
https://cpc110.blogspot.com/2020/10/vs2019-alignas-in-struct-definition.html
So, structure Object variables take contiguous memory locations and the variables are stored in memory in the same order in which they are defined.
Please refer to the code below you will get an idea.
struct c{
int a;
int b;
int c;
};
int main()
{
c obj;
cout << &(obj.a) << endl; //0x7ffe128fe1c4
cout << &(obj.b) << endl; //0x7ffe128fe1c8
cout << &(obj.c) << endl; //0x7ffe128fe1cc
return 0;
}
For Structures Padding Concept Refer here
Let's say I have the following
struct MyType { long a, b, c; char buffer[remainder] }
I wanted to do something like
char buffer[4096 - offsetof(MyType, buffer)]
But it appears that it's illegal
You can do:
struct ABC {long a,b,c; }
struct MyType : ABC {char buffer[4096-sizeof(ABC)];};
static_assert(sizeof(MyType)==4096,"!");
Your problem stems from trying to use the not-yet-fully-defined MyType type while defining it. You could do this with a union:
#include <iostream>
struct MyType {
union {
struct { long a, b, c; } data;
char buffer[4096];
};
};
static_assert(sizeof(MyType) == 4096, "MyType size should be exactly 4K");
int main() {
MyType x;
x.data.a = 42;
std::cout << sizeof(x) << " " << x.data.a << "\n";
return 0;
}
The output (on my system):
4096 42
Because it's a union, the type actually holds the a/b/c tuple and buffer area in an overlapped region of memory, big enough to hold the larger of the two. So, unless your long variable are really wide, that will be the 4K buffer area :-)
In any case, that size requirement is checked by the static_assert.
That may be less than ideal as buffer takes up the entire 4K. If instead you want to ensure that buffer is only the rest of the structure (after the long variables), you can use the following:
struct MyType {
long a, b, c;
char buffer[4096 - 3 * sizeof(long)];
};
and ensure that you use x.something rather than x.data.something when accessing the a, b, or c variables.
This solves your problem by using the size of three longs (these are fully defined) instead of the size of something not yet defined. It's still a good idea to keep the static_assert to ensure overall size is what you wanted.
Technically, the compiler has total control over padding and layout. A union/struct combo combined with a static_assert sanity check might be enough for government work, but std::aligned_storage is also there to give you memory blocks that are safe to put objects in.
struct MyType {
long a, b, c;
};
using MyTypeStorage = std::aligned_storage<4096, std::alignment_of<MyType>::value>::type;
/* ... */
MyTypeStorage myTypeStorage;
MyType* x = new (&myTypeStorage) MyType {};
https://godbolt.org/z/87e7Tc
I've come to work on an ongoing project where some unions are defined as follows:
/* header.h */
typedef union my_union_t {
float data[4];
struct {
float varA;
float varB;
float varC;
float varD;
};
} my_union;
If I understand well, unions are for saving space, so sizeof(my_union_t) = MAX of the variables in it. What are the advantages of using the statement above instead of this one:
typedef struct my_struct {
float varA;
float varB;
float varC;
float varD;
};
Won't be the space allocated for both of them the same?
And how can I initialize varA,varB... from my_union?
Unions are often used when implementing a variant like object (a type field and a union of data types), or in implementing serialisation.
The way you are using a union is a recipe for disaster.
You are assuming the the struct in the union is packing the floats with no gaps between then!
The standard guarantees that float data[4]; is contiguous, but not the structure elements. The only other thing you know is that the address of varA; is the same as the address of data[0].
Never use a union in this way.
As for your question: "And how can I initialize varA,varB... from my_union?". The answer is, access the structure members in the normal long-winded way not via the data[] array.
Union are not mostly for saving space, but to implement sum types (for that, you'll put the union in some struct or class having also a discriminating field which would keep the run-time tag). Also, I suggest you to use a recent standard of C++, at least C++11 since it has better support of unions (e.g. permits more easily union of objects and their construction or initialization).
The advantage of using your union is to be able to index the n-th floating point (with 0 <= n <= 3) as u.data[n]
To assign a union field in some variable declared my_union u; just code e.g. u.varB = 3.14; which in your case has the same effect as u.data[1] = 3.14;
A good example of well deserved union is a mutable object which can hold either an int or a string (you could not use derived classes in that case):
class IntOrString {
bool isint;
union {
int num; // when isint is true
str::string str; // when isint is false
};
public:
IntOrString(int n=0) : isint(true), num(n) {};
IntOrString(std::string s) : isint(false), str(s) {};
IntOrString(const IntOrString& o): isint(o.isint)
{ if (isint) num = o.num; else str = o.str); };
IntOrString(IntOrString&&p) : isint(p.isint)
{ if (isint) num = std::move (p.num);
else str = std::move (p.str); };
~IntOrString() { if (isint) num=0; else str->~std::string(); };
void set (int n)
{ if (!isint) str->~std::string(); isint=true; num=n; };
void set (std::string s) { str = s; isint=false; };
bool is_int() const { return isint; };
int as_int() const { return (isint?num:0; };
const std::string as_string() const { return (isint?"":str;};
};
Notice the explicit calls of destructor of str field. Notice also that you can safely use IntOrString in a standard container (std::vector<IntOrString>)
See also std::optional in future versions of C++ (which conceptually is a tagged union with void)
BTW, in Ocaml, you simply code:
type intorstring = Integer of int | String of string;;
and you'll use pattern matching. If you wanted to make that mutable, you'll need to make a record or a reference of it.
You'll better use union-s in a C++ idiomatic way (see this for general advices).
I think the best way to understand unions is to just to give 2 common practical examples.
The first example is working with images. Imagine you have and RGB image that is arranged in a long buffer.
What most people would do, is represent the buffer as a char* and then loop it by 3's to get the R,G,B.
What you could do instead, is make a little union, and use that to loop over the image buffer:
union RGB
{
char raw[3];
struct
{
char R;
char G;
char B;
} colors;
}
RGB* pixel = buffer[0];
///pixel.colors.R == The red color in the first pixel.
Another very useful use for unions is using registers and bitfields.
Lets say you have a 32 bit value, that represents some HW register, or something.
Sometimes, to save space, you can split the 32 bits into bit fields, but you also want the whole representation of that register as a 32 bit type.
This obviously saves bit shift calculation that a lot of programmers use for no reason at all.
union MySpecialRegister
{
uint32_t register;
struct
{
unsigned int firstField : 5;
unsigned int somethingInTheMiddle : 25;
unsigned int lastField : 6;
} data;
}
// Now you can read the raw register into the register field
// then you can read the fields using the inner data struct
The advantage is that with a union you can access the same memory in two different ways.
In your example the union contains four floats. You can access those floats as varA, varB... which might be more descriptive names or you can access the same variables as an array data[0], data[1]... which might be more useful in loops.
With a union you can also use the same memory for different kinds of data, you might find that useful for things like writing a function to tell you if you are on a big endian or little endian CPU.
No, it is not for saving space. It is for ability to represent some binary data as various data types.
for example
#include <iostream>
#include <stdint.h>
union Foo{
int x;
struct y
{
unsigned char b0, b1, b2, b3;
};
char z[sizeof(int)];
};
int main()
{
Foo bar;
bar.x = 100;
std::cout << std::hex; // to show number in hexadec repr;
for(size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)bar.z[i] << " "; // int is just to show values as numbers, not a characters
}
return 0;
}
output: 0x64 0x0 0x0 0x0 The same values are stored in struct bar.y, but not in array but in sturcture members. Its because my machine have a little endiannes. If it were big, than the output would be reversed: 0x0 0x0 0x0 0x64
You can achieve the same using reinterpret_cast:
#include <iostream>
#include <stdint.h>
int main()
{
int x = 100;
char * xBytes = reinterpret_cast<char*>(&x);
std::cout << std::hex; // to show number in hexadec repr;
for (size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)xBytes[i] << " "; // (int) is just to show values as numbers, not a characters
}
return 0;
}
its usefull, for example, when you need to read some binary file, that was written on a machine with different endianess than yours. You can just access values as bytearray and swap those bytes as you wish.
Also, it is usefull when you have to deal with bit fields, but its a whole different story :)
First of all: Avoid unions where the access goes to the same memory but to different types!
Unions did not save space at all. The only define multiple names on the same memory area! And you can only store one of the elements in one time in a union.
if you have
union X
{
int x;
char y[4];
};
you can store an int OR 4 chars but not both! The general problem is, that nobody knows which data is actually stored in a union. If you store a int and read the chars, the compiler will not check that and also there is no runtime check. A solution is often to provide an additional data element in a struct to a union which contains the actual stored data type as an enum.
struct Y
{
enum { IS_CHAR, IS_INT } tinfo;
union
{
int x;
char y[4];
};
}
But in c++ you always should use classes or structs which can derive from a maybe empty parent class like this:
class Base
{
};
class Int_Type: public Base
{
...
int x;
};
class Char_Type: public Base
{
...
char y[4];
};
So you can device pointers to base which actually can hold a Int or a Char Type for you. With virtual functions you can access the members in a object oriented way of programming.
As mentioned already from Basile's answer, a useful case can be the access via different names to the same type.
union X
{
struct data
{
float a;
float b;
};
float arr[2];
};
which allows different access ways to the same data with the same type. Using different types which are stored in the same memory should be avoided at all!
Sorry for a stupid question, but if I need to ensure the alignment of a structure / class / union, should I add attribute((aligned(align))) to typedef declaration?
class myAlignedStruct{} __attribute__ ((aligned(16)));
typedef myAlignedStruct myAlignedStruct2; // Will myAlignedStruct2 be aligned by 16 bytes or not?
should I add attribute((aligned(align))) to typedef declaration?
No... typedefs are just pseudonyms or aliases for the actual type specified, they don't exist as a separate type to have different alignment, packing etc..
#include <iostream>
struct Default_Alignment
{
char c;
};
struct Align16
{
char c;
} __attribute__ ((aligned(16)));
typedef Align16 Also_Align16;
int main()
{
std::cout << __alignof__(Default_Alignment) << '\n';
std::cout << __alignof__(Align16) << '\n';
std::cout << __alignof__(Also_Align16) << '\n';
}
Output:
1
16
16
The accepted answer ("No") is correct, but I wanted to clarify one potentially misleading part of it. I'd add a comment but need to format some code; hence the new answer.
typedefs are just pseudonyms or aliases for the actual type specified, they don't exist as a separate type to have different alignment, packing etc..
This is incorrect, at least for GCC (the OP's compiler), and GHS. For example, the following compiles without errors, showing that the alignment can be attached to a typedef.
The perverse alignment (greater than the size of the object) is merely for shock and amusement value.
#define CASSERT( expr ) { typedef char cassert_type[(expr) ? 1 : -1]; }
typedef __attribute__((aligned(64))) uint8_t aligned_uint8_t;
typedef struct
{
aligned_uint8_t t;
} contains_aligned_char_t;
void check_aligned_char_semantics()
{
CASSERT(__alignof(aligned_uint8_t) == 64);
CASSERT(sizeof(aligned_uint8_t) == 1);
CASSERT(sizeof(contains_aligned_char_t) == 64);
}