C++ struct containing unsigned char and int bug

OK, I have a struct in my C++ program that looks like this:
struct thestruct
{
unsigned char var1;
unsigned char var2;
unsigned char var3[2];
unsigned char var4;
unsigned char var5[8];
int var6;
unsigned char var7[4];
};
When I use this struct, 3 extra bytes get added before var6. If I delete var5, they still appear before var6, so I know they always sit right before var6.
But if I remove var6, the 3 extra bytes are gone.
If I use a struct containing only an int, there are no extra bytes.
So there seems to be a conflict between the unsigned char members and the int. How can I fix that?

The compiler is probably using its default alignment option, where members of size x are aligned on a memory boundary evenly divisible by x.
Depending on your compiler, you can affect this behaviour using a #pragma directive, for example:
#pragma pack(1)
will turn off the default alignment in Visual C++:
Specifies the value, in bytes, to be used for packing. The default value for n is 8. Valid values are 1, 2, 4, 8, and 16. The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller.
Note that for low-level CPU performance reasons, it is usually best to try to align your data members so that they fall on an aligned boundary. Some CPU architectures require alignment, while others (such as Intel x86) tolerate misalignment with a decrease in performance (sometimes quite significantly).
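To see where the 3 bytes come from, you can ask the compiler for the offset of var6. This is only a sketch: the constant names are made up for illustration, and the value 3 assumes a typical target where int is 4 bytes with 4-byte alignment.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct thestruct
{
    unsigned char var1;
    unsigned char var2;
    unsigned char var3[2];
    unsigned char var4;
    unsigned char var5[8];
    int var6;
    unsigned char var7[4];
};

// Bytes used by the members declared before var6: 1 + 1 + 2 + 1 + 8 = 13.
constexpr std::size_t bytes_before_var6 = 1 + 1 + 2 + 1 + 8;

// Padding the compiler inserted in front of var6
// (3 when int needs 4-byte alignment, since 13 rounds up to 16).
constexpr std::size_t padding_before_var6 =
    offsetof(thestruct, var6) - bytes_before_var6;

// Whatever the target, var6 starts on a multiple of its own alignment.
static_assert(offsetof(thestruct, var6) % alignof(int) == 0,
              "var6 must be aligned for int");
```

Printing padding_before_var6 and sizeof(thestruct) on your own compiler is a quick way to confirm what it actually does.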

Your data structure is being aligned so that your int falls on a word boundary, which for your target might be 32 or 64 bits.
You can reorganize your struct like this so that no padding is needed before the int:
struct thestruct
{
int var6;
unsigned char var1;
unsigned char var2;
unsigned char var3[2];
unsigned char var4;
unsigned char var5[8];
unsigned char var7[4];
};

Are you talking about padding bytes? That's not a bug. As allowed by the C++ standard, the compiler is adding padding to keep the members aligned. This is required for some architectures, and will greatly improve performance for others.

You're running into byte alignment. The compiler is adding padding to align the members. See the Wikipedia article on data structure alignment.

Read up on data structure alignment. Essentially, depending on the compiler and compile options, you'll get alignment onto different powers-of-2.
To avoid it, move multi-byte items (int or pointers) before single-byte (signed or unsigned char) items, although padding might still be added after your last item.

While rearranging the order you declare data members inside your struct is fine, it should be emphasized that overriding the default alignment by using #pragmas and such is a bad idea unless you know exactly what you're doing. Depending on your compiler and architecture, attempting to access unaligned data, particularly by storing the address in a pointer and later trying to dereference it, can easily give the dreaded Bus Error or other undefined behavior.
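If you do need to read an int out of a byte buffer at an offset that may not be aligned, one common way to stay out of trouble is to copy the bytes with std::memcpy instead of casting the address to an int pointer. A minimal sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value that starts at an arbitrary, possibly unaligned,
// address. std::memcpy has no alignment requirement, so no misaligned
// pointer is ever dereferenced.
inline std::uint32_t load_u32(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Writes a 32-bit value to an arbitrary, possibly unaligned, address.
inline void store_u32(unsigned char* bytes, std::uint32_t value)
{
    assert(bytes != nullptr);
    std::memcpy(bytes, &value, sizeof value);
}
```

The value is stored in the host's byte order here, so this only covers the alignment issue, not endianness.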

Related

what does colon used in struct def mean in C++? [duplicate]

Is bitfield a C concept or C++?
Can it be used only within a structure? What are the other places we can use them?
AFAIK, bitfields are special structure variables that occupy the memory only for specified no. of bits. It is useful in saving memory and nothing else. Am I correct?
I coded a small program to understand the usage of bitfields, but it does not work as I expected. I expected the size of the structure below to be 1+4+2 = 7 bytes (the size of unsigned int is 4 bytes on my machine), but to my surprise it turns out to be 12 bytes (4+4+4). Can anyone tell me why?
#include <stdio.h>
struct s{
unsigned int a:1;
unsigned int b;
unsigned int c:2;
};
int main()
{
printf("sizeof struct s = %zu bytes \n", sizeof(struct s)); /* sizeof yields size_t, so use %zu */
return 0;
}
OUTPUT:
sizeof struct s = 12 bytes

Because a and c are not contiguous, they each reserve a full int's worth of memory space. If you move a and c together, the size of the struct becomes 8 bytes.
Moreover, you are asking for a to occupy only 1 bit, not 1 byte. Even when a and c are adjacent and need only 3 bits in total (well under a single byte), they are still stored in a unit of their declared type, unsigned int, which is word-aligned on your 32-bit machine. Together they therefore occupy a full 4 bytes in addition to the int b.
Similarly, you would find that
struct s{
unsigned int b;
short s1;
short s2;
};
occupies 8 bytes, while
struct s{
short s1;
unsigned int b;
short s2;
};
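occupies 12 bytes because in the latter case each of the two shorts ends up in its own 32-bit-aligned slot: s1 is followed by 2 bytes of padding so that b is aligned, and s2 is followed by 2 bytes of padding at the end.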

1) They originated in C, but are part of C++ too, unfortunately.
2) Yes, or within a class in C++.
3) As well as saving memory, they can be used for some forms of bit twiddling. However, both memory saving and twiddling are inherently implementation dependent - if you want to write portable software, avoid bit fields.

It's C.
Your compiler has rounded the memory allocation up to 12 bytes for alignment purposes. Many memory subsystems access aligned words much more efficiently than arbitrary byte addresses, and some cannot do unaligned access at all.

Your program is working exactly as I'd expect. The compiler allocates adjacent bitfields into the same memory word, but yours are separated by a non-bitfield.
Move the bitfields next to each other and you'll probably get 8, which is the size of two ints on your machine. The bitfields would be packed into one int. This is compiler specific, however.
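Here is a small sketch of that comparison. The struct names are made up, and the sizes in the comments (12 and 8) are what mainstream compilers give with a 4-byte unsigned int; the standard leaves bit-field layout to the implementation.

```cpp
#include <cstddef>

struct separated        // bit-fields split by an ordinary member
{
    unsigned int a : 1;
    unsigned int b;
    unsigned int c : 2;
};                      // typically 12 bytes

struct adjacent         // bit-fields declared next to each other
{
    unsigned int a : 1;
    unsigned int c : 2;
    unsigned int b;
};                      // typically 8 bytes: a and c share one unsigned int

constexpr std::size_t bytes_saved = sizeof(separated) - sizeof(adjacent);

static_assert(sizeof(adjacent) <= sizeof(separated),
              "grouping the bit-fields should not make the struct larger");
```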
Bitfields are useful for saving space, but not much else.

Bitfields are widely used in firmware to map the different fields of registers. This saves a lot of manual bitwise operations that would otherwise be necessary to read or write those fields.
One disadvantage is that you can't take the address of a bitfield.
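To illustrate the trade-off, here is a sketch of a made-up 8-bit control register described with bit-fields, next to the manual mask-and-shift code it replaces. The register, its field names and the bit positions used by set_mode_manual are all assumptions for the example; real hardware and real compilers may order the bits differently.

```cpp
#include <cstdint>

// Hypothetical 8-bit control register, invented for illustration.
struct control_reg
{
    std::uint8_t enable : 1;
    std::uint8_t mode   : 3;
    std::uint8_t irq    : 1;
    std::uint8_t speed  : 3;
};
// With the bit-field version, updating a field is simply: reg.mode = 5;

// The same update written by hand, assuming mode lives in bits 1..3
// of the raw register byte.
constexpr std::uint8_t set_mode_manual(std::uint8_t raw, std::uint8_t mode)
{
    return static_cast<std::uint8_t>((raw & ~(0x7u << 1)) | ((mode & 0x7u) << 1));
}
```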

Why are the values returned by sizeof() compiler dependent?

struct A
{
char c;
double d;
} a;
In mingw32-gcc.exe: sizeof a = 16
In gcc 4.6.3(ubuntu): sizeof a = 12
Why are they different? I think it should be 16. Does gcc 4.6.3 do some optimizations?

Compilers might perform data structure alignment for a target architecture if needed. It might be done purely to improve the runtime performance of the application, or in some cases it is required by the processor (i.e. the program will not work if the data is not aligned).
For example, most (but not all) SSE2 instructions require data to be aligned on a 16-byte boundary. To put it simply, everything in computer memory has an address. Let's say we have a simple array of doubles, like this:
double data[256];
In order to use SSE2 instructions that require 16-byte alignment, one must make sure that the address &data[0] is a multiple of 16.
The alignment requirements differ from one architecture to another. On x86_64, it is recommended that all structures larger than 16 bytes be aligned on 16-byte boundaries. In general, for the best performance, align data as follows:
Align 8-bit data at any address
Align 16-bit data to be contained within an aligned four-byte word
Align 32-bit data so that its base address is a multiple of four
Align 64-bit data so that its base address is a multiple of eight
Align 80-bit data so that its base address is a multiple of sixteen
Align 128-bit data so that its base address is a multiple of sixteen
Interestingly enough, most x86_64 CPUs work with both aligned and non-aligned data. However, if the data is not aligned properly, the CPU may execute the code significantly slower.
When the compiler takes this into consideration, it may align the members of a structure implicitly, and that affects its size. For example, let's say we have a structure like this:
struct A {
char a;
int b;
};
Assuming x86_64, the size of int is 32 bits, or 4 bytes. It is therefore recommended to always make the address of b a multiple of 4. But because field a is only 1 byte long, b would not land on such an address by itself. Therefore, the compiler implicitly adds 3 bytes of padding between a and b:
struct A {
char a;
char __pad0[3]; /* This would be added by compiler,
without any field names - __pad0 is for
demonstration purposes */
int b;
};
How the compiler does this depends not only on the compiler and architecture, but also on the settings (flags) you pass to the compiler. This behavior can also be affected using special language constructs. For example, one can ask GCC not to perform any padding with the packed attribute, like this:
struct A {
char a;
int b;
} __attribute__((packed));
In your case, mingw32-gcc.exe has simply added 7 bytes between c and d to align d on an 8-byte boundary, whereas gcc 4.6.3 on Ubuntu has added only 3 to align d on a 4-byte boundary.
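You can measure that gap directly on each compiler instead of inferring it from sizeof. A sketch, where gap_before_d is a name invented for the example:

```cpp
#include <cstddef>  // offsetof, std::size_t

struct A
{
    char c;
    double d;
};

// Padding inserted between c and d on this target:
// 7 if d is placed on an 8-byte boundary, 3 if on a 4-byte boundary.
constexpr std::size_t gap_before_d = offsetof(A, d) - sizeof(char);

// The struct can never be smaller than the sum of its members.
static_assert(sizeof(A) >= sizeof(char) + sizeof(double),
              "members must fit inside the struct");
```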
Unless you are performing some optimizations, trying to use a special extended instruction set, or have specific requirements for your data structures, I'd recommend that you do not depend on specific compiler behavior. Always assume not only that your structure might get padded, but also that it might get padded differently between architectures, compilers and/or compiler versions. Otherwise you'd need to semi-manually ensure data alignment and structure sizes using compiler attributes and settings, and make sure it all works across all the compilers and platforms you are targeting using unit tests or maybe even static assertions.
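As a sketch of the static assertion approach: the record type below is hypothetical, uses fixed-width types, and includes an explicit reserved byte so that no implicit padding is needed. If a compiler ever lays it out differently, the build fails instead of the program misbehaving.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record layout, invented for illustration.
struct record
{
    std::uint32_t id;        // offset 0
    std::uint16_t kind;      // offset 4
    std::uint8_t  flags;     // offset 6
    std::uint8_t  reserved;  // offset 7, explicit filler
};

static_assert(sizeof(record) == 8, "record must be exactly 8 bytes");
static_assert(offsetof(record, kind) == 4, "kind must start at offset 4");
static_assert(offsetof(record, flags) == 6, "flags must start at offset 6");
static_assert(alignof(record) == alignof(std::uint32_t),
              "record must be aligned like its widest member");
```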
For more information, please check out:
Data Alignment article on Wikipedia
Data Alignment when Migrating to 64-Bit Intel® Architecture
GCC Variable Attributes
Hope it helps. Good Luck!
How to minimize padding:
It is always good to have all your struct members properly aligned and at the same time keep your structure size reasonable. Consider these 2 struct variants with their members rearranged (from now on, assume the sizes of char, short, int, long and long long are 1, 2, 4, 4 and 8 bytes respectively):
struct A
{
char a;
short b;
char c;
int d;
};
struct B
{
char a;
char c;
short b;
int d;
};
Both structures hold the same data, but while sizeof(struct A) will be 12 bytes, sizeof(struct B) will be 8 thanks to a well-thought-out member order that eliminates the implicit padding:
struct A
{
char a;
char __pad0[1]; // implicit compiler padding
short b;
char c;
char __pad1[3]; // implicit compiler padding
int d;
};
struct B // no implicit padding
{
char a;
char c;
short b;
int d;
};
Rearranging struct members becomes error-prone as the member count grows. To make it less error-prone, put the longest members at the beginning and the shortest at the end:
struct B // no implicit padding
{
int d;
short b;
char a;
char c;
};
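The padding in each variant can be computed rather than counted by hand. In this sketch the constant names are invented, and the values in the comments assume the member sizes stated above with natural alignment.

```cpp
#include <cstddef>

struct A { char a; short b; char c; int d; };   // interleaved sizes
struct B { char a; char c; short b; int d; };   // chars grouped together

// Bytes actually needed by the four members: 1 + 1 + 2 + 4 = 8.
constexpr std::size_t payload = 2 * sizeof(char) + sizeof(short) + sizeof(int);

constexpr std::size_t padding_in_A = sizeof(A) - payload;  // typically 4
constexpr std::size_t padding_in_B = sizeof(B) - payload;  // typically 0

static_assert(padding_in_B <= padding_in_A,
              "grouping members by size should not add padding");
```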
Implicit padding at the end of a struct:
Depending on the compiler, settings, platform etc. that you use, you may notice that the compiler adds padding not only before struct members but also at the end (i.e. after the last member). The structure below:
struct abcd
{
long long a;
char b;
};
may occupy 12 or 16 bytes (a compiler that applies no alignment at all could make it 9 bytes). This padding is easily overlooked, but it is very important if your structure will be an array element: it ensures that the a member in subsequent array elements is properly aligned too.
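The tail padding can be measured the same way as padding between members. A sketch, where tail_padding is a name made up for the example:

```cpp
#include <cstddef>  // offsetof, std::size_t

struct abcd
{
    long long a;
    char b;
};

// Bytes after the last member: 7 if long long is 8-byte aligned,
// 3 if it is only 4-byte aligned.
constexpr std::size_t tail_padding =
    sizeof(abcd) - (offsetof(abcd, b) + sizeof(char));

// The size is always a multiple of the alignment, which is what keeps
// every element of an abcd array correctly aligned.
static_assert(sizeof(abcd) % alignof(abcd) == 0,
              "struct size must be a multiple of its alignment");
```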
Final and random thoughts:
It will never hurt (and may actually save you) if, when working with structs, you follow this advice:
Do not rely on the compiler to interleave your struct members with proper padding.
Make sure your struct (if outside an array) is aligned to the boundary required by its longest member.
Make sure you arrange your struct members so that the longest are placed first and the last member is the shortest.
Make sure you explicitly pad your struct (if needed) so that if you create an array of structs, every structure member has proper alignment.
Make sure that arrays of your structs are properly aligned too: although your struct may require 8-byte alignment, your compiler may align your array at a 4-byte boundary.
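In C++11 and later, two of these points can be spelled out in the type itself: alignas requests a stricter alignment, and an explicit filler array makes the tail padding visible. This is just a sketch with hypothetical type names.

```cpp
#include <cstddef>

// Hypothetical type that asks for 16-byte alignment explicitly;
// arrays of vec4 then start on a 16-byte boundary as well.
struct alignas(16) vec4
{
    float x, y, z, w;
};

static_assert(alignof(vec4) == 16, "vec4 must be 16-byte aligned");
static_assert(sizeof(vec4) % alignof(vec4) == 0,
              "array elements of vec4 stay aligned");

// The long long / char struct from above with its tail padding written out.
struct padded_abcd
{
    long long a;
    char b;
    char filler[sizeof(long long) - sizeof(char)];  // explicit padding
};

static_assert(sizeof(padded_abcd) == 2 * sizeof(long long),
              "explicit filler should round the struct up to 16 bytes");
```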

The values returned by sizeof for structs are not mandated by the C or C++ standards. They are up to the compiler and machine architecture.
For example, it can be optimal to align data members on 4-byte boundaries, in which case char c will effectively occupy 4 bytes.

why add fillers in a c++ struct?

What is the effect of fillers in a C++ struct? I often see them in some C++ APIs. For example:
struct example
{
unsigned short a;
unsigned short b;
char c[3];
char filler1;
unsigned short e;
char filler2;
unsigned int g;
};
This struct is meant to be sent over the network.

struct example
{
unsigned short a; // 2 bytes
unsigned short b; // 2 bytes
// 4 bytes consumed
char c[3]; // 3 bytes
char filler1; // 1 byte
// 4 bytes consumed
unsigned short e; // 2 bytes
char filler2; // 1 byte
// 3 bytes consumed, should be filler2[2]
unsigned int g; // 4 bytes
};

Because sometimes you don't actually control the format of the data you're using.
The format may be specified by something beyond your control. For example, it may be created in a system with different alignment requirements to yours.
Alternatively, the data may have real data in those filler areas that your code doesn't care about.

Those fillers are usually inserted to explicitly make sure that the members of a structure are naturally aligned, i.e. that each member's offset inside the structure is a multiple of its size.
The example below assumes that char is 1 byte, short is 2 and int is 4.
struct example
{
unsigned short a;
unsigned short b;
char c[3];
char filler1;
unsigned short e; // starts at offset 8
char filler2[2];
unsigned int g; // starts at offset 12
};
If you don't specify any fillers, a compiler will usually add the necessary padding bytes to ensure proper alignment of the structure members.
By the way, these fields can also serve as reserved fields for data that might be added in the future.
Update:
Since it has been mentioned that the structure is a network packet, the fillers are required to get a structure that is compatible with the one being sent by the other host.
However, inserting filler bytes might not be enough in this case (especially if portability is required). If these structures are to be sent over a network as is (i.e. without manually packing them into a separate buffer for sending), you have to tell the compiler that the structure should be packed.
With the Microsoft compiler this can be achieved using #pragma pack:
#pragma pack(1)
struct T {
char t;
int i;
short j;
double k;
};
In gcc you can use __attribute__((packed))
struct foo {
char c;
int x;
} __attribute__((packed));
However, many people prefer to manually pack and unpack structures into a raw byte array, because accessing misaligned data might not be properly supported on some systems.
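A rough sketch of that manual approach follows. The message type, the helper names and the choice of big-endian (network) byte order are assumptions for the example; it only covers the integer fields a, b and g of the struct above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical message holding the integer fields of the example struct.
struct message
{
    std::uint16_t a;
    std::uint16_t b;
    std::uint32_t g;
};

constexpr std::size_t wire_size = 8;  // 2 + 2 + 4 bytes, no padding on the wire

// Writes msg into out[0..7], most significant byte first.
inline void pack(const message& msg, unsigned char* out)
{
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>(msg.a >> 8);
    out[1] = static_cast<unsigned char>(msg.a & 0xFF);
    out[2] = static_cast<unsigned char>(msg.b >> 8);
    out[3] = static_cast<unsigned char>(msg.b & 0xFF);
    out[4] = static_cast<unsigned char>(msg.g >> 24);
    out[5] = static_cast<unsigned char>(msg.g >> 16);
    out[6] = static_cast<unsigned char>(msg.g >> 8);
    out[7] = static_cast<unsigned char>(msg.g);
}

// Rebuilds a message from in[0..7]; works whatever the host's
// alignment rules and byte order are.
inline message unpack(const unsigned char* in)
{
    assert(in != nullptr);
    message msg;
    msg.a = static_cast<std::uint16_t>((in[0] << 8) | in[1]);
    msg.b = static_cast<std::uint16_t>((in[2] << 8) | in[3]);
    msg.g = (static_cast<std::uint32_t>(in[4]) << 24)
          | (static_cast<std::uint32_t>(in[5]) << 16)
          | (static_cast<std::uint32_t>(in[6]) << 8)
          |  static_cast<std::uint32_t>(in[7]);
    return msg;
}
```

Because only single bytes are read and written, the in-memory layout of message no longer matters to the other host.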

Depending on what code you're working with, they may be attempting to align the structure on word boundaries (32 bits in your case) as a speed optimization. Decent optimizing compilers have made doing this by hand largely obsolete. However, if the compiler was instructed not to optimize this piece of code, or the compiler is very low-end (e.g. for an embedded system), it may be better to handle this yourself. It basically boils down to how much you trust the compiler.
The other reason is for writing binary files, where reserved bytes have been left in the file format specification.

Bit setting question

In C or C++, it's apparently possible to restrict the number of bits a variable has, so for example:
unsigned char A:1;
unsigned char B:3;
I am unfamiliar however with how it works specifically, so a number of questions:
If I have a class with the following variables:
unsigned char A:1;
unsigned char B:3;
unsigned char C:1;
unsigned char D:3;
What is the above technique actually called?
Is above class four bytes in size, or one byte in size?
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
Is there someway of combining the bits to a centralised byte? So for example:
unsigned char MainByte;
unsigned char A:1; //Can this be made to point at the first bit in MainByte?
unsigned char B:3; //Etc etc
unsigned char C:1;
unsigned char D:3;
Is there an article that covers this topic in more depth?
If 'A:1' is treated like an entire byte, what is the point/purpose of it?
Feel free to mention any other considerations (like compiler restrictions or other limitations).
Thank you.

What is the above technique actually called?
Bitfields. And you're only supposed to use int (signed, unsigned or otherwise) as the "type", not char.
Is above class four bytes in size, or one byte in size?
Neither. It is probably sizeof(int) because the compiler generates a word-sized object. The actual bitfields will be stored within a byte, however. It'll just waste some space.
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
They represent only the bits specified, and will be packed as tightly as possible.
Is there someway of combining the bits to a centralised byte? So for example:
Use a union:
struct bits {
unsigned A:1;
unsigned B:3;
unsigned C:1;
unsigned D:3;
};
union SplitByte {
struct bits Bits;
unsigned char Byte[sizeof(struct bits)];
/* the array is a trick so the two fields
are guaranteed to be the same size and
thus be aligned on the same boundary */
} SplitByteObj;
// access the byte
SplitByteObj.Byte[0]
// access a bitfield
SplitByteObj.Bits.B
Note that there are problems with bitfields, for example when using threads. Each bitfield cannot be accessed individually, so you may get errors if you try to use a mutex to guard each of them. Also, the order in which the fields are laid out is not clearly specified by the standard. Many people prefer to use bitwise operators to implement bitfields manually for that reason.
Is there an article that covers this topic in more depth?
Not many. The first few you'll get when you Google it are about all you'll find. They're not a widely used construct. You'll be best off nitpicking the standard to figure out exactly how they work so that you don't get bitten by a weird edge case. I couldn't tell you exactly where in the standard they're specified.
If 'A:1' is treated like an entire byte, what is the point/purpose of it?
It's not, but I've addressed this already.
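For reference, doing it manually with bitwise operators might look like the sketch below. The helper names are invented, and the bit positions (A in bit 0, B in bits 1-3, C in bit 4, D in bits 5-7) are simply a layout chosen for the example.

```cpp
#include <cstdint>

// Extracts `width` bits starting at bit `shift` from one byte.
constexpr unsigned get_bits(std::uint8_t byte, unsigned shift, unsigned width)
{
    return (byte >> shift) & ((1u << width) - 1u);
}

// Returns a copy of `byte` with `width` bits at `shift` replaced by `value`.
constexpr std::uint8_t set_bits(std::uint8_t byte, unsigned shift,
                                unsigned width, unsigned value)
{
    return static_cast<std::uint8_t>(
        (byte & ~(((1u << width) - 1u) << shift)) |
        ((value & ((1u << width) - 1u)) << shift));
}

// Field positions assumed for this sketch.
constexpr unsigned B_shift = 1, B_width = 3;
constexpr unsigned D_shift = 5, D_width = 3;
```

For example, MainByte = set_bits(MainByte, B_shift, B_width, 5) stores 5 in field B, and get_bits(MainByte, B_shift, B_width) reads it back. Unlike compiler bit-fields, the layout here is fully under your control.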

These are bit-fields.
The details of how these fields are arranged in memory are largely implementation-defined. Typically, you will find that the compiler packs them in some way. But it may take various alignment issues into account.

Forcing unaligned bitfield packing in MSVC

I've a struct of bitfields that add up to 48 bits. On GCC this correctly results in a 6 byte structure, but in MSVC the structure comes out 8 bytes. I need to find some way to force MSVC to pack the struct properly, both for interoperability and because it's being used in a memory-critical environment.
The struct seen below consists of three 15-bit numbers, one 2-bit number, and a 1-bit sign. 15+15+15+2+1 = 48, so in theory it should fit into six bytes, right?
struct S
{
unsigned short a:15;
unsigned short b:15;
unsigned short c:15;
unsigned short d:2;
unsigned short e:1;
};
However, compiling this on both GCC and MSVC results in sizeof(S) == 8. Thinking that this might have to do with alignment, I tried using #pragma pack(1) before the struct declaration, telling the compiler to pack to byte, not int, boundaries. On GCC, this worked, resulting in sizeof(S) == 6.
However, on MSVC05, the sizeof still came out to 8, even with pack(1) set! After reading this other SO answer, I tried replacing unsigned short d with unsigned char and unsigned short e with bool. The result is sizeof(S) == 7!
I found that if I split d into two one-bit fields and wedged them in between the other members, the struct finally packed properly.
struct S
{
unsigned short a:15;
unsigned short dHi : 1;
unsigned short b:15;
unsigned short dLo : 1;
unsigned short c:15;
unsigned short e:1;
};
printf( "%u\n", (unsigned)sizeof(S) ); // "6"
But having d split like that is cumbersome and causes trouble for me later on when I have to work on the struct. Is there some way I can force MSVC to pack this struct into 6 bytes, exactly as GCC does?

It is implementation defined how fields will be placed in the structure. Visual Studio will fit consecutive bitfields into an underlying type, if it can, and waste the leftover space. (C++ Bit Fields in VS)

If you use the type "unsigned __int64" to declare all elements of the structure, you'll get an object with sizeof(S)=8, but the last two bytes will be unused and the first six will contain the data in the format you want.
Alternatively, if you can accept some structure reordering, this will work
#pragma pack(1)
struct S3
{
unsigned int a:15;
unsigned int b:15;
unsigned int d:2;
unsigned short c:15;
unsigned short e:1;
};
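If neither reordering nor a wider underlying type is acceptable, another route that sidesteps compiler-specific bit-field layout is to keep the 48 bits in a plain 6-byte array and pack them by hand. This is only a sketch: the fields type, the helper names and the bit positions are assumptions, not something either compiler prescribes.

```cpp
#include <cassert>
#include <cstdint>

// The five values from the question, held unpacked.
struct fields
{
    std::uint16_t a, b, c;  // 15 bits each
    std::uint8_t  d;        // 2 bits
    std::uint8_t  e;        // 1 bit
};

// Bit positions chosen for this sketch:
// a = bits 0-14, b = 15-29, c = 30-44, d = 45-46, e = 47.
inline void pack48(const fields& f, unsigned char out[6])
{
    assert(f.a < (1u << 15) && f.b < (1u << 15) && f.c < (1u << 15));
    assert(f.d < 4 && f.e < 2);
    const std::uint64_t bits =
          static_cast<std::uint64_t>(f.a)
        | (static_cast<std::uint64_t>(f.b) << 15)
        | (static_cast<std::uint64_t>(f.c) << 30)
        | (static_cast<std::uint64_t>(f.d) << 45)
        | (static_cast<std::uint64_t>(f.e) << 47);
    for (int i = 0; i < 6; ++i)
        out[i] = static_cast<unsigned char>(bits >> (8 * i));  // low byte first
}

inline fields unpack48(const unsigned char in[6])
{
    std::uint64_t bits = 0;
    for (int i = 0; i < 6; ++i)
        bits |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    fields f;
    f.a = static_cast<std::uint16_t>(bits & 0x7FFF);
    f.b = static_cast<std::uint16_t>((bits >> 15) & 0x7FFF);
    f.c = static_cast<std::uint16_t>((bits >> 30) & 0x7FFF);
    f.d = static_cast<std::uint8_t>((bits >> 45) & 0x3);
    f.e = static_cast<std::uint8_t>((bits >> 47) & 0x1);
    return f;
}
```

The stored form is exactly 6 bytes on every compiler, at the cost of an explicit pack or unpack step whenever the values are used.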

I don't think so, and I think it's MSVC's behavior that is actually correct, and GCC that deviates from the standard.
AFAIK, the standard does not permit bitfields to cross word boundaries of the underlying type.
