C struct sizes inconsistence [duplicate] - c++

I am using the following struct for network communication, but the compiler inserts a lot of unnecessary bytes between the fields.
Its size differs from the 8 bytes I expected.
struct HttpPacket {
    unsigned char x1;
    union {
        struct {
            unsigned char len;
            unsigned short host;
            unsigned char content[4];
        } packet;
        unsigned char bytes[7];
        unsigned long num;
    };
};
And the following gives a different size, even though all I am doing is removing a field from the union:
struct HttpPacket {
    unsigned char x1;
    union {
        struct {
            unsigned char len;
            unsigned short host;
            unsigned char content[4];
        } packet;
        unsigned long num;
    };
};
Also, a clearer example:
struct {
unsigned char len;
unsigned short host;
unsigned char content[4];
} packet;
And it gives a size of 8 instead of 7.
When I add one more field, it still gives the same size:
struct {
unsigned char EXTRAADDEDFIELD;
unsigned char len;
unsigned short host;
unsigned char content[4];
} packet;
Can someone please help me resolve this issue?
UPDATE: I need the format to hold while transmitting the packet, so I want to get rid of this padding.

C makes few guarantees about the size of a struct. Members keep their declaration order, but the compiler may insert padding between them and after the last one. Usually, as in this case, it pads so that each member sits on its natural alignment boundary, since that's fastest on most machines.

Ever heard of alignment and padding?
Basically, to ensure fast access, certain types have to be on certain bounds of memory addresses.
This is called alignment.
To satisfy it, the compiler is allowed to insert unused bytes into your data structure.
This is called padding.

By default, structure fields are aligned on natural boundaries. For example, a 4-byte field will start on a 4-byte boundary, and the compiler inserts pad bytes to achieve this. You can avoid the padding by using #pragma pack(1) or a similar compiler-specific directive.
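As a rough illustration of the padding described above, the sketch below declares the question's inner struct twice: once with the default layout and once under 1-byte packing. The names PlainPacket, PackedPacket and padding_bytes are made up for this example, and #pragma pack(push, 1) is a compiler extension (accepted by GCC, Clang and MSVC), not standard C or C++.

```cpp
#include <cstddef>

// Default layout: the compiler may pad so that `host` is suitably aligned.
struct PlainPacket {
    unsigned char  len;
    unsigned short host;
    unsigned char  content[4];
};

// Same members with 1-byte packing (compiler extension, not standard).
#pragma pack(push, 1)
struct PackedPacket {
    unsigned char  len;
    unsigned short host;
    unsigned char  content[4];
};
#pragma pack(pop)

// Number of padding bytes the compiler added to the default layout.
constexpr std::size_t padding_bytes() {
    return sizeof(PlainPacket) - sizeof(PackedPacket);
}
```

On a typical platform with a 2-byte unsigned short, padding_bytes() would be expected to return 1, which matches the 8-versus-7 observation in the question, but the exact figure is up to the compiler.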

If you have a C99 compiler and can use the "new" fixed-width types: make an array of uint8_t and do the separation in members yourself.
uint8_t data[8];
x1 = data[0];
len = data[1];
host = data[2] * 256 + data[3]; /* big endian */
content[0] = data[4];
content[1] = data[5];
content[2] = data[6];
content[3] = data[7];
/* ... */
You can follow the same procedure in C89 if you can rely on CHAR_BIT being 8.
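The byte-by-byte approach above can be wrapped into a pair of functions. This is only a sketch under assumptions: the Packet struct and the encode/decode names are hypothetical, and the 16-bit host field is assumed to travel in big-endian order, as in the snippet above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory form of the packet. Its own layout does not matter
// because it is never sent as-is.
struct Packet {
    std::uint8_t  x1;
    std::uint8_t  len;
    std::uint16_t host;
    std::uint8_t  content[4];
};

// Write the fields into exactly 8 bytes, with the 16-bit field big-endian.
std::array<std::uint8_t, 8> encode(const Packet& p) {
    std::array<std::uint8_t, 8> out{};
    out[0] = p.x1;
    out[1] = p.len;
    out[2] = static_cast<std::uint8_t>(p.host >> 8);    // high byte first
    out[3] = static_cast<std::uint8_t>(p.host & 0xFF);  // then low byte
    for (int i = 0; i < 4; ++i) out[4 + i] = p.content[i];
    return out;
}

// Inverse of encode(): rebuild the fields from the 8 wire bytes.
Packet decode(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr && size >= 8);
    Packet p{};
    p.x1   = data[0];
    p.len  = data[1];
    p.host = static_cast<std::uint16_t>((data[2] << 8) | data[3]);
    for (int i = 0; i < 4; ++i) p.content[i] = data[4 + i];
    return p;
}
```

Because only the 8-byte array goes on the wire, any padding the compiler adds to Packet never reaches the other side.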

Related

Reserving a bit for discriminating the type of a union in C++

I currently have code that looks like this:
union {
struct {
void* buffer;
uint64_t n : 63;
uint64_t flag : 1;
} a;
struct {
unsigned char buffer[15];
unsigned char n : 7;
unsigned char flag : 1;
} b;
} data;
It is part of an attempted implementation of a data structure that does small-size optimization. Although it works on my machine with the compiler I am using, I am aware that there is no guarantee that the two flag bits from each of the structs actually end up in the same bit. Even if they did, it would still technically be undefined behavior to read it from the struct that wasn't most recently written. I would like to use this bit to discriminate between which of the two types is currently stored.
Is there a safe and portable way to achieve the same thing without increasing the size of the union? For our purpose, it can not be larger than 16 bytes.
If not, could it be achieved by sacrificing an entire byte (of n in the first struct and of buffer in the second), instead of a bit?
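One possible reading of the "sacrifice an entire byte" idea, sketched under assumptions rather than offered as a definitive answer: keep all 16 bytes as raw unsigned char storage, reserve the last byte as the tag, and move the pointer and sizes in and out with memcpy and shifts so that no inactive union member is ever read. The class name TaggedStorage and its member names are hypothetical, and the sketch assumes pointers are at most 8 bytes wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

class TaggedStorage {
public:
    static constexpr std::size_t kInlineCapacity = 15;
    static constexpr unsigned char kHeapTag = 0xFF;

    // Inline mode: bytes 0..14 hold data, byte 15 holds the length (0..15).
    void set_inline(const unsigned char* src, std::size_t n) {
        assert(n <= kInlineCapacity);
        std::memcpy(bytes_, src, n);
        bytes_[15] = static_cast<unsigned char>(n);
    }

    // Heap mode: bytes 0..7 hold the pointer, bytes 8..14 a 56-bit size,
    // byte 15 holds kHeapTag.
    void set_heap(void* p, std::uint64_t n) {
        static_assert(sizeof(void*) <= 8, "pointer must fit in bytes 0..7");
        assert(n < (std::uint64_t{1} << 56));
        std::memcpy(bytes_, &p, sizeof p);
        for (int i = 0; i < 7; ++i)
            bytes_[8 + i] = static_cast<unsigned char>((n >> (8 * i)) & 0xFF);
        bytes_[15] = kHeapTag;
    }

    bool is_heap() const { return bytes_[15] == kHeapTag; }

    std::uint64_t size() const {
        if (!is_heap()) return bytes_[15];
        std::uint64_t n = 0;
        for (int i = 0; i < 7; ++i)
            n |= static_cast<std::uint64_t>(bytes_[8 + i]) << (8 * i);
        return n;
    }

    void* heap_pointer() const {
        assert(is_heap());
        void* p = nullptr;
        std::memcpy(&p, bytes_, sizeof p);
        return p;
    }

    const unsigned char* inline_data() const {
        assert(!is_heap());
        return bytes_;
    }

private:
    unsigned char bytes_[16] = {};
};

static_assert(sizeof(TaggedStorage) == 16, "storage must stay at 16 bytes");
```

The trade-off is that the heap size shrinks from 63 to 56 bits and that the storage has byte alignment, so every access to the pointer goes through memcpy.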

Put data from a QByteArray in a Struct

I'm working with a serial device that returns a byte array.
In this array are values that are stored in unsigned shorts and unsigned chars.
I have the following structure:
typedef struct {
unsigned short RPM; //0
unsigned short Intakepress; //1
unsigned short PressureV; //2
unsigned short ThrottleV; //3
unsigned short Primaryinp; //4
unsigned short Fuelc; //5
unsigned char Leadingign; //6
unsigned char Trailingign; //7
unsigned char Fueltemp; //8
unsigned char Moilp; //9
unsigned char Boosttp; //10
unsigned char Boostwg; //11
unsigned char Watertemp; //12
unsigned char Intaketemp; //13
unsigned char Knock; //14
unsigned char BatteryV; //15
unsigned short Speed; //16
unsigned short Iscvduty; //17
unsigned char O2volt; //18
unsigned char na1; //19
unsigned short Secinjpulse; //20
unsigned char na2; //21
} fc_adv_info_t;
What's the best way to map the array to this structure? The order of the values in the array received from the serial device matches the structure.
First of all, your description of the type of data in the structure using C-like syntax is ambiguous. It tells us nothing about the size of a short or char type, nor about the endianness of the data! A short int doesn't have to be 16 bits wide, nor is char always 8 bits! At the very least, you should use the fixed-width integer types, or their Qt equivalents, and specify their endianness.
Also, typedef struct is a C-ism, unnecessary in C++. Drop the typedef.
Assuming a big endian packet, unsigned short to mean uint16_t and unsigned char to mean uint8_t, here is how you could do it:
struct FcAdvInfo { // this structure shouldn't be packed or anything like that!
quint16 RPM;
quint16 IntakePress;
...
quint8 LeadingIgn;
...
static FcAdvInfo parse(const QByteArray &);
};
FcAdvInfo FcAdvInfo::parse(const QByteArray & src) {
FcAdvInfo p;
QDataStream ds(src);
ds.setByteOrder(QDataStream::BigEndian);
ds
>> p.RPM
>> p.IntakePress
...
>> p.LeadingIgn
...
;
return p;
}
Finally, if your struct comes from some C code, you must understand that its layout is not portable: even on the same CPU, the packing and the size of structure types can change when you upgrade or switch the compiler. So don't rely on it. A C/C++ struct declaration implies very little about how the data is arranged in memory, other than that the chosen arrangement doesn't lead to undefined behavior and agrees with the few other requirements of the standard. That's pretty much all.
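For readers without Qt at hand, the field-by-field reading that the QDataStream code above performs can be approximated with a small cursor class. This is a sketch, not Qt's implementation: ByteReader, Header and parse_header are hypothetical names, only the first two fields of the question's struct are shown, and the data is assumed to be big-endian as in the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical cursor over a received byte buffer.
class ByteReader {
public:
    ByteReader(const std::uint8_t* data, std::size_t size)
        : data_(data), size_(size) {}

    // Read one byte and advance.
    std::uint8_t u8() {
        assert(pos_ + 1 <= size_);
        return data_[pos_++];
    }

    // Read a 16-bit big-endian value and advance by two bytes.
    std::uint16_t u16be() {
        assert(pos_ + 2 <= size_);
        const std::uint16_t hi = data_[pos_];
        const std::uint16_t lo = data_[pos_ + 1];
        pos_ += 2;
        return static_cast<std::uint16_t>((hi << 8) | lo);
    }

    std::size_t remaining() const { return size_ - pos_; }

private:
    const std::uint8_t* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Only the first two fields of the question's struct, for brevity.
struct Header {
    std::uint16_t rpm;
    std::uint16_t intake_press;
};

Header parse_header(const std::uint8_t* data, std::size_t size) {
    ByteReader r(data, size);
    Header h{};
    h.rpm = r.u16be();
    h.intake_press = r.u16be();
    return h;
}
```

The remaining fields would follow the same pattern, one read per field in wire order.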
First, I would say that it is not safe to use the unsigned short type in structures that you are going to serialize/deserialize and exchange with other devices: unsigned short is usually 16 bits wide, but you can't take that as guaranteed; it is platform dependent.
It is even worse if struct members are not aligned, so that the compiler inserts padding into the struct.
If the binary data received from the serial port is kept in a QByteArray, and the byte order and the size of "unsigned short" are OK, then you can use the code below to map the data in the QByteArray onto the struct. Note that this is only correct if your struct is packed and has no padding gaps within it; use the struct packing technique for your compiler (see Structure padding and packing).
QByteArray bArr;
bArr.resize(sizeof(fc_adv_info_t));
// do something to fill bArr with received data
fc_adv_info_t* info=reinterpret_cast<fc_adv_info_t*>(bArr.data());

Working with individual bytes using unsigned char arrays

I've searched through many sites and can not seem to find anything relevant.
I would like to take the individual bytes of each built-in data type, such as short, unsigned short, int, unsigned int, float and double, and store each byte (its binary representation) at its own index of an unsigned char array. How can this be achieved?
For example:
int main() {
short sVal = 1;
unsigned short usVal = 2;
int iVal = 3;
unsigned int uiVal = 4;
float fVal = 5.0f;
double dVal = 6.0;
const unsigned int uiLengthOfShort = sizeof(short);
const unsigned int uiLengthOfUShort = sizeof(unsigned short);
const unsigned int uiLengthOfInt = sizeof(int);
const unsigned int uiLengthOfUInt = sizeof(unsigned int);
const unsigned int uiLengthOfFloat = sizeof(float);
const unsigned int uiLengthOfDouble = sizeof(double);
unsigned char ucShort[uiLengthOfShort];
unsigned char ucUShort[uiLengthOfUShort];
unsigned char ucInt[uiLengthOfInt];
unsigned char ucUInt[uiLengthOfUInt];
unsigned char ucFloat[uiLengthOfFloat];
unsigned char ucDouble[uiLengthOfDouble];
// Above I declared a variable val for each data type to work with
// Next I created a const unsigned int of each type's size.
// Then I created unsigned char[] using each data types size respectively
// Now I would like to take each individual byte of the above val's
// and store them into the indexed location of each unsigned char array.
// For Example: - I'll not use int here since the int is
// machine and OS dependent.
// I will use a data type that is common across almost all machines.
// Here I will use the short as my example
// We know that a short is 2-bytes or has 16 bits encoded
// I would like to take the 1st byte of this short:
// (the first 8 bit sequence) and to store it into the first index of my unsigned char[].
// Then I would like to take the 2nd byte of this short:
// (the second 8 bit sequence) and store it into the second index of my unsigned char[].
// How would this be achieved for any of the data types?
// A Short in memory is 2 bytes here is a bit representation of an
// arbitrary short in memory { 0101 1101, 0011 1010 }
// I would like ucShort[0] = sVal's { 0101 1101 } &
// ucShort[1] = sVal's { 0011 1010 }
// ucShort[0] = sVal's first byte (8-bit sequence)
// ucShort[1] = sVal's second byte (8-bit sequence)
// ... and so on for each data type.
return 0;
}
Ok, so first, don't do that if you can avoid it. It's dangerous and can be extremely dependent on architecture.
The commenters above are correct that a union is a common way to do it, although in C++ reading a union member other than the one most recently written is formally undefined behavior. You still have the endian problem, but at least you don't have the stack alignment problem (I assume this is for network code, so stack alignment is another potential architecture problem).
This is what I've found to be the most straight-forward way to do this:
uint32_t example_int;
char array[4];
//No endian switch
array[0] = ((char*) &example_int)[0];
array[1] = ((char*) &example_int)[1];
array[2] = ((char*) &example_int)[2];
array[3] = ((char*) &example_int)[3];
//Endian switch
array[0] = ((char*) &example_int)[3];
array[1] = ((char*) &example_int)[2];
array[2] = ((char*) &example_int)[1];
array[3] = ((char*) &example_int)[0];
If you're trying to write cross-architecture code, you will need to deal with endian problems one way or another. My suggestion is to construct a short endian test and build functions to "pack" and "unpack" byte arrays based on the above method. It should be noted that to "unpack" a byte array, simply reverse the above assignment statements.
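The suggested endian test and pack/unpack pair might look roughly like the sketch below. The function names are invented for illustration, the wire order is assumed to be big-endian, and only plain little-endian and big-endian hosts are handled (no mixed-endian machines).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Runtime check: true if the least significant byte is stored first.
bool is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Copy the 4 bytes of `value` into `out` in big-endian (network) order.
void pack_u32(std::uint32_t value, unsigned char out[4]) {
    assert(out != nullptr);
    unsigned char native[4];
    std::memcpy(native, &value, 4);
    for (int i = 0; i < 4; ++i)
        out[i] = is_little_endian() ? native[3 - i] : native[i];
}

// Reverse of pack_u32: rebuild the value from 4 big-endian bytes.
std::uint32_t unpack_u32(const unsigned char in[4]) {
    assert(in != nullptr);
    unsigned char native[4];
    for (int i = 0; i < 4; ++i)
        native[i] = is_little_endian() ? in[3 - i] : in[i];
    std::uint32_t value = 0;
    std::memcpy(&value, native, 4);
    return value;
}
```

An alternative that needs no endian test at all is to build the bytes with shifts (value >> 24, value >> 16, and so on), since shifts operate on values rather than on memory layout.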
The simplest correct way is:
// static_assert(sizeof ucShort == sizeof sVal);
memcpy( &ucShort, &sVal, sizeof ucShort);
The stuff you write in comments is not correct; all types have machine-dependent size, other than character types.
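The memcpy approach generalizes to any of the types in the question with a small template. This is a sketch with made-up names (to_bytes, from_bytes, byte_at); the bytes come out in the machine's native order, so no endianness conversion is implied.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copy the object representation of `value` into a byte array (native order).
template <typename T>
std::array<unsigned char, sizeof(T)> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    std::array<unsigned char, sizeof(T)> out{};
    std::memcpy(out.data(), &value, sizeof(T));
    return out;
}

// Rebuild a value from bytes previously produced by to_bytes on this machine.
template <typename T>
T from_bytes(const std::array<unsigned char, sizeof(T)>& in) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    T value;
    std::memcpy(&value, in.data(), sizeof(T));
    return value;
}

// Convenience accessor for a single byte of the representation.
template <typename T>
unsigned char byte_at(const T& value, std::size_t index) {
    assert(index < sizeof(T));
    return to_bytes(value)[index];
}
```

With these, something like to_bytes(sVal) yields an array of sizeof(short) bytes, whatever that size happens to be on the target.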
With the help of Raw N, who pointed me to a website, I searched for byte manipulation and found this thread: http://www.cplusplus.com/forum/articles/12/. It presents a solution similar to what I am looking for; however, I would have to repeat the process for every built-in data type.
After some testing, this is what I have come up with so far. It depends on the machine architecture, but the concept is the same on other machines.
typedef struct packed_2bytes {
unsigned char c0;
unsigned char c1;
} packed_2bytes;
typedef struct packed_4bytes {
unsigned char c0;
unsigned char c1;
unsigned char c2;
unsigned char c3;
} packed_4bytes;
typedef struct packed_8bytes {
unsigned char c0;
unsigned char c1;
unsigned char c2;
unsigned char c3;
unsigned char c4;
unsigned char c5;
unsigned char c6;
unsigned char c7;
} packed_8bytes;
typedef union {
short s;
packed_2bytes bytes;
} packed_short;
typedef union {
unsigned short us;
packed_2bytes bytes;
} packed_ushort;
typedef union { // 32bit machine, os, compiler only
int i;
packed_4bytes bytes;
} packed_int;
typedef union { // 32 bit machine, os, compiler only
unsigned int ui;
packed_4bytes bytes;
} packed_uint;
typedef union {
float f;
packed_4bytes bytes;
} packed_float;
typedef union {
double d;
packed_8bytes bytes;
} packed_double;
These are only the type declarations; there is no code that uses them yet. I do think they should record which endianness is in use, but the person using them has to know that ahead of time, just as they have to know the sizes of the built-in types on the target architecture. I am not sure whether signed int would be a problem because of one's complement, two's complement or sign-and-magnitude representations, but it is something to consider.

Limiting structures size by use of :

Why is this piece of code needed?
typedef struct corr_id_{
unsigned int size:8;
unsigned int valueType:8;
unsigned int classId:8;
unsigned int reserved:8;
} CorrId;
I did some investigation and found that this way we limit the memory consumption to just what we need.
For example:
#include <iostream>

typedef struct corr_id_new{
    unsigned int size;
    unsigned int valueType;
    unsigned int classId;
    unsigned int reserved;
} CorrId_NEW;
typedef struct corr_id_{
    unsigned int size:8;
    unsigned int valueType:8;
    unsigned int classId:8;
    unsigned int reserved:8;
} CorrId;
int main(){
    CorrId_NEW Obj1;
    CorrId Obj2;
    std::cout<<sizeof(Obj1)<<std::endl;
    std::cout<<sizeof(Obj2)<<std::endl;
}
Output:-
16
4
I want to understand the real use case for this. Why can't we declare the struct like this instead?
typedef struct corr_id_new{
unsigned _int8 size;
unsigned _int8 valueType;
unsigned _int8 classId;
unsigned _int8 reserved;
} CorrId_NEW;
Does this have something to do with compiler optimizations? Or what are the benefits of declaring the structure that way?
I want to understand the real use case of such scenarios?
For example, the status register of some CPU may contain fields whose widths are not whole bytes.
To represent it as a structure, you could use bit-fields:
struct CSR
{
unsigned N: 1;
unsigned Z: 1;
unsigned C: 1;
unsigned V: 1;
unsigned : 20;
unsigned I: 1;
unsigned : 2;
unsigned M: 5;
};
You can see here that the field widths are not multiples of 8, so you can't use int8_t or something similar.
Let's see a simple scenario:
struct student {
    unsigned int age:8;      // 8 bits are enough to store a student's age (up to 255 years)
    unsigned int roll_no:16; // roll_no can go up to 2^16 - 1, which is large enough
    unsigned int classId:4;  // class ID can be 4 bits long (0-15), as per need
    unsigned int reserved:4; // reserved
};
In the above case everything fits in 32 bits.
But if you used a plain unsigned int for each field, it would have taken 4*32 bits.
If we take age as a 32-bit integer, it can store values from 0 to 2^32 - 1. But a person's age is at most 100, 140 or 150 (even for someone still studying at that age), which needs at most 8 bits to store. So why waste the remaining 24 bits?
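The flip side of the narrow fields is that values outside the field's range cannot be stored intact. The sketch below adds a range check before writing; Student mirrors the struct above, while fits_in_bits and set_age are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>

struct Student {
    unsigned int age      : 8;   // holds 0..255
    unsigned int roll_no  : 16;  // holds 0..65535
    unsigned int class_id : 4;   // holds 0..15
    unsigned int reserved : 4;
};

// True if `value` is representable in an unsigned bit-field of `bits` bits.
constexpr bool fits_in_bits(unsigned long long value, unsigned bits) {
    return bits >= 64 || value < (1ULL << bits);
}

// Store an age only if the 8-bit field can represent it.
bool set_age(Student& s, unsigned int age) {
    if (!fits_in_bits(age, 8)) return false;
    s.age = age;
    return true;
}
```

A caller would check the return value of set_age and leave the field unchanged when the value does not fit.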
You are right, the last structure definition with unsigned _int8 is almost equivalent to the definition using :8. Almost, because byte order can make a difference here, so you might find that the memory layout is reversed in the two cases.
The main purpose of the :8 notation is to allow the use of fractional bytes, as in
struct foo {
uint32_t a:1;
uint32_t b:2;
uint32_t c:3;
uint32_t d:4;
uint32_t e:5;
uint32_t f:6;
uint32_t g:7;
uint32_t h:4;
};
To minimize padding, I strongly suggest learning the padding rules yourself; they are not hard to grasp. If you do, you will know that your version with unsigned _int8 does not add any padding. Or, if you don't feel like learning those rules, just use __attribute__((__packed__)) on your struct, but that may introduce a severe performance penalty.
It's often used with pragma pack to create bitfields with labels, e.g.:
#pragma pack(1)
struct eg {
    unsigned int one : 4;
    unsigned int two : 8;
    unsigned int three : 16;
};
Can be cast for whatever purpose to an int32_t, and vice versa. This might be useful when reading serialized data that follows a (language agnostic) protocol -- you extract an int and cast it to a struct eg to match the fields and field sizes defined in the protocol. You could also skip the conversion and just read an int sized chunk into such a struct, point being that the bitfield sizes match the protocol field sizes. This is extremely common in network programming -- if you want to send a packet following the protocol, you just populate your struct, serialize, and transmit.
Note that pragma pack is not standard C, but it is recognized by various common compilers. Without pragma pack, however, the compiler is free to place padding between fields, which reduces its usefulness for the purposes described above.
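Since both bit-field ordering and pragma pack depend on the compiler, a protocol word can also be assembled with explicit shifts and masks. The sketch below is one way to do that for the three fields above. The bit positions (one in the lowest 4 bits, then two, then three) are an assumption made for this example, as are the names Fields, pack_fields and unpack_fields.

```cpp
#include <cassert>
#include <cstdint>

// Unpacked view of the three protocol fields.
struct Fields {
    std::uint32_t one;    // 4 bits on the wire
    std::uint32_t two;    // 8 bits on the wire
    std::uint32_t three;  // 16 bits on the wire
};

// Assumed layout, from the least significant bit: one(4) | two(8) | three(16).
std::uint32_t pack_fields(const Fields& f) {
    assert(f.one < (1u << 4) && f.two < (1u << 8) && f.three < (1u << 16));
    return f.one | (f.two << 4) | (f.three << 12);
}

// Inverse of pack_fields: mask each field back out of the word.
Fields unpack_fields(std::uint32_t word) {
    Fields f{};
    f.one   = word & 0xFu;
    f.two   = (word >> 4) & 0xFFu;
    f.three = (word >> 12) & 0xFFFFu;
    return f;
}
```

The resulting 32-bit word still has to be written to the wire in a defined byte order, which is a separate decision from the bit layout.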

why add fillers in a c++ struct?

What is the effect of fillers in a C++ struct? I often see them in some C++ APIs. For example:
struct example
{
unsigned short a;
unsigned short b;
char c[3];
char filler1;
unsigned short e;
char filler2;
unsigned int g;
};
This struct is meant to be transported over the network:
struct example
{
unsigned short a; //2 bytes
unsigned short b;//2 bytes
//4 bytes consumed
char c[3];//3 bytes
char filler1;//1 bytes
//4 bytes consumed
unsigned short e;//2 bytes
char filler2;//1 bytes
//3 bytes consumed ,should be filler[2]
unsigned int g;//4 bytes
};
Because sometimes you don't actually control the format of the data you're using.
The format may be specified by something beyond your control. For example, it may be created in a system with different alignment requirements to yours.
Alternatively, the data may have real data in those filler areas that your code doesn't care about.
Those fillers are usually inserted to explicitly make sure some of the members of a structure are naturally aligned i.e. their offset inside a structure is a multiple of its size.
The example below assumes char is 1 byte, short is 2 and int is 4.
struct example
{
unsigned short a;
unsigned short b;
char c[3];
char filler1;
unsigned short e; // starts at offset 8
char filler2[2];
unsigned int g; // starts at offset 12
};
If you don't specify any fillers, a compiler will usually add the necessary padding bytes to ensure a proper alignment of the structure members.
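Whether the explicit fillers really leave no room for hidden padding can be checked at compile time. The sketch below restates the struct with fixed-width types (an assumption, since the original uses short and int) and asserts the offsets mentioned in the comments above.

```cpp
#include <cstddef>
#include <cstdint>

// The struct from above with fixed-width types and a 2-byte second filler.
struct Example {
    std::uint16_t a;
    std::uint16_t b;
    char          c[3];
    char          filler1;
    std::uint16_t e;
    char          filler2[2];
    std::uint32_t g;
};

// Compile-time checks: if the compiler inserted padding, the build fails.
static_assert(offsetof(Example, e) == 8,  "e should start at offset 8");
static_assert(offsetof(Example, g) == 12, "g should start at offset 12");
static_assert(sizeof(Example) == 16,      "no trailing padding expected");
```

If one of the assertions fails on some target, the layout differs there and the struct should not be sent as raw bytes.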
Btw, these fields can also be used for reserved fields that might appear in the future.
updated:
Since it has been mentioned that a structure is a network packet, the fillers are required to get a structure that is compatible with the one being passed from another host.
However, inserting filler bytes in this case might not be enough (especially, if portability is required). If these structures are to be sent via a network as is (i.e. without manually packing into a separate buffer for sending), you have to inform a compiler that the structure should be packed.
With the Microsoft compiler this can be achieved using #pragma pack:
#pragma pack(1)
struct T {
char t;
int i;
short j;
double k;
};
In gcc you can use __attribute__((packed)):
struct foo {
char c;
int x;
} __attribute__((packed));
However, many people prefer to manually pack/unpack structures into a raw byte array, because accessing misaligned data on some systems might not be [properly] supported.
Depending on what code you're working with, they may be attempting to align the structure on word boundaries (32 bits in your case). This is a speed optimization. Doing this by hand has largely been rendered obsolete by decent optimizing compilers; however, if the compiler was instructed not to optimize this piece of code, or the compiler is very low-end (e.g. for an embedded system), it may be better to handle this yourself. It basically boils down to how much you trust the compiler.
The other reason is for writing binary files, where reserved bytes have been left in the file format specification.