sizeof(struct) returns unexpected value - c++

This should be simple but I have no clue where to look for the issue:
I have a struct:
struct region
{
public:
    long long int x;
    long long int y;
    long long int width;
    long long int height;
    unsigned char scale;
};
When I do sizeof(region) it gives me 40 when I am expecting 33.
Any ideas?
(mingw gcc, win x64 os)

It's padding the struct to fit an 8-byte boundary. So it actually is taking 40 bytes in memory - sizeof is returning the correct value.
If you want it to take only 33 bytes, specify the packed attribute:
struct region
{
public:
    long long int x;
    long long int y;
    long long int width;
    long long int height;
    unsigned char scale;
} __attribute__ ((packed));

long long int values are 8 bytes each. scale is only 1 byte but is padded for alignment, so it effectively takes up 8 bytes too. 5*8 = 40.
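If you want to see exactly where the padding lives, a quick check with offsetof makes it visible. A minimal sketch (the offsets shown in the comments assume a typical 64-bit ABI, not something the standard guarantees):

#include <cstddef>   // offsetof
#include <iostream>

struct region
{
public:
    long long int x;
    long long int y;
    long long int width;
    long long int height;
    unsigned char scale;
};

int main()
{
    // On a typical 64-bit ABI: x=0, y=8, width=16, height=24, scale=32,
    // followed by 7 trailing padding bytes so sizeof is a multiple of 8.
    std::cout << offsetof(region, scale) << '\n';  // expected: 32
    std::cout << sizeof(region) << '\n';           // expected: 40
    return 0;
}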

As others have said, structs are padded for alignment, and the padding depends not only on the types of the members, but also on the order in which the members are defined.
For example, consider the two structs A and B below. Both are identical in terms of members and types; the only difference is the order in which the members are defined:
struct A
{
    int i;
    int j;
    char c;
    char d;
};
struct B
{
    int i;
    char c;
    int j;
    char d;
};
Would sizeof(A) be equal to sizeof(B) just because they have the same number of members of the same types? No. Try printing the size of each:
cout << "sizeof(A) = "<< sizeof(A) << endl;
cout << "sizeof(B) = "<< sizeof(B) << endl;
Output:
sizeof(A) = 12
sizeof(B) = 16
Surprised? The difference is padding: in B, the char c is followed by an int, so three padding bytes are inserted after c (and three more after d) to keep each int 4-byte aligned, whereas in A the two chars sit together at the end and need only two bytes of tail padding. See the output yourself: http://ideone.com/yCX4S

Related

Why is the size of the union greater than expected?

#include <iostream>
typedef union dbits {
    double d;
    struct {
        unsigned int M1: 20;
        unsigned int M2: 20;
        unsigned int M3: 12;
        unsigned int E: 11;
        unsigned int s: 1;
    };
};
int main(){
    std::cout << "sizeof(dbits) = " << sizeof(dbits) << '\n';
}
output: sizeof(dbits) = 16, but if
typedef union dbits {
    double d;
    struct {
        unsigned int M1: 12;
        unsigned int M2: 20;
        unsigned int M3: 20;
        unsigned int E: 11;
        unsigned int s: 1;
    };
};
Output: sizeof(dbits) = 8
Why does the size of the union increase? The bit fields in both structs have the same total number of bits, so why are the sizes different?
I would like to write it like this:
typedef union dbits {
    double d;
    struct {
        unsigned long long M: 52;
        unsigned int E: 11;
        unsigned int s: 1;
    };
};
But then sizeof(dbits) = 16, not 8. Why?
And is it actually convenient to use bit fields in a struct to pick apart the bits of a double like this?
Members of a bit field will not cross the boundaries of the specified storage type. So
    unsigned int M1: 20;
    unsigned int M2: 20;
will occupy two unsigned ints, using 20 of the 32 bits in each.
In your second case, 12 + 20 == 32 fits in a single unsigned int.
As for your last case, members with different storage types can never share a storage unit. So you get one unsigned long long and one unsigned int instead of the single unsigned long long you wanted.
You should use uint64_t so you get exact bit counts; unsigned int could be anything from 16 to 128 (or more) bits.
Note: bit fields are highly implementation-defined; this is just the way it commonly works.
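Following that rule, giving all three fields the same 64-bit storage type lets them share one storage unit. A minimal sketch (the inner struct is given a name here to stay within standard C++; the actual bit-field layout remains implementation-defined):

#include <cstdint>
#include <iostream>

union dbits
{
    double d;
    struct
    {
        std::uint64_t M : 52;
        std::uint64_t E : 11;
        std::uint64_t s : 1;
    } parts;    // 52 + 11 + 1 = 64 bits in a single uint64_t
};

int main()
{
    // All three bit fields share one uint64_t, so on common
    // implementations the union is 8 bytes, the size of a double.
    std::cout << "sizeof(dbits) = " << sizeof(dbits) << '\n';
    return 0;
}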

Size of structure - padding and alignment

I have an explicitly sized structure as follows:
typedef struct
{
    unsigned long A : 4;
    unsigned long B : 12;
    union
    {
        unsigned long C1 : 8;
        unsigned long C2 : 8;
        unsigned long C3 : 8;
    };
    unsigned long D : 8;
} FooStruct;
In theory the total size of this struct should be 32 bits (4 bytes). However, sizeof gives me a 12-byte size, so there must be some padding and alignment happening here.
I just don't see why and where. Can someone explain to me how this structure takes 12 bytes in memory?
The union forces the start of a new unsigned long, and the member after the union starts yet another unsigned long. Assuming long is 4 bytes, that means your struct has three unsigned longs, for a total of 12 bytes. (A union with three equally sized members also seems odd.)
If you want this to have a size of 4 bytes, why not change it to:
typedef struct
{
    unsigned short A : 4;
    unsigned short B : 12;
    union
    {
        unsigned char C1 : 8;
        unsigned char C2 : 8;
        unsigned char C3 : 8;
    };
    unsigned char D : 8;
} FooStruct;
Additionally, if you are using gcc and want to disable structure padding, you can use __attribute__((packed)):
struct FooStruct
{
    unsigned long A : 4;
    unsigned long B : 12;
    union
    {
        unsigned long C1 : 8;
        unsigned long C2 : 8;
        unsigned long C3 : 8;
    } __attribute__((packed)) C;
    unsigned long D : 8;
} __attribute__((packed));
But beware that some architectures may have penalties for unaligned data access, or may not allow it at all.

C++ Bits in 64 bit integer

Hello, I have a struct here that is 7 bytes, and I'd like to write it to a 64-bit integer. Later, I'd like to extract the struct back out of the 64-bit integer.
Any ideas on this?
#include "stdafx.h"
struct myStruct
{
    unsigned char a;
    unsigned char b;
    unsigned char c;
    unsigned int someNumber;
};
int _tmain(int argc, _TCHAR* argv[])
{
    myStruct * m = new myStruct();
    m->a = 11;
    m->b = 8;
    m->c = 12;
    m->someNumber = 30;
    printf("\n%s\t\t%i\t%i\t%i\t%i\n\n", "struct", m->a, m->b, m->c, m->someNumber);
    unsigned long num = 0;
    // todo: use bitwise operations from m into num (total of 7 bytes)
    printf("%s\t\t%i\n\n", "ulong", num);
    m = new myStruct();
    // todo: use bitwise operations from num into m;
    printf("%s\t\t%i\t%i\t%i\t%i\n\n", "struct", m->a, m->b, m->c, m->someNumber);
    return 0;
}
You could do something like this:
#include <cstdint>
#include <cstddef>

class structured_uint64
{
    uint64_t data;
public:
    structured_uint64(uint64_t x = 0) : data(x) {}
    operator uint64_t&() { return data; }

    uint8_t low_byte(size_t n) const { return static_cast<uint8_t>(data >> (n * 8)); }
    void low_byte(size_t n, uint8_t val)
    {
        uint64_t mask = static_cast<uint64_t>(0xff) << (8 * n);
        data = (data & ~mask) | (static_cast<uint64_t>(val) << (8 * n));
    }
    uint32_t hi_word() const { return static_cast<uint32_t>(data >> 24); } // the 32 bits above the three low bytes
    // et cetera
};
(there is, of course, lots of room for variation on the details of the interface and where among the 64 bits the constituents are placed)
Using different types to alias the same portion of memory is a generally bad idea. The thing is, it's very valuable for the optimizer to be able to use reasoning like:
"Okay, I've read a uint64_t at the start of this block, and nowhere in the middle does the program write to any uint64_ts, therefore the value must be unchanged!"
which means it will get the wrong answer if you try to change the value of the uint64_t object through a uint32_t reference. And since this depends heavily on which optimizations are possible and actually performed, it is easy to never hit the problem in test cases but see it in the real program you're trying to write -- and you'll spend forever trying to find the bug because you've convinced yourself it's not this problem.
So, you really should do the insertion/extraction of the fields with bit twiddling (or intrinsics, if profiling shows that this is a performance issue and there are useful ones available) rather than trying to set up a clever struct.
If you really know what you're doing, you can make the aliasing work, I believe. But it should only be done if you really know what you're doing, and that includes knowing relevant rules from the standard inside and out (which I don't, and so I can't advise you on how to make it work). And even then you probably shouldn't do it.
Also, if you intend your integral types to be a specific size, you should really use the correct types. For example, never use unsigned int for an integer that is supposed to be exactly 32 bits. Instead use uint32_t. Not only is it self-documenting, but you won't run into a nasty surprise when you try to build your program in an environment where unsigned int is not 32 bits.
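In that spirit, here is a hedged sketch of the plain bit-twiddling approach with exact-width types; the field positions chosen here are arbitrary, for illustration only:

#include <cstdint>

// Layout (by choice): bits 0..31 hold n, 32..39 hold c, 40..47 hold b, 48..55 hold a.
std::uint64_t pack(std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint32_t n)
{
    return (std::uint64_t(a) << 48) | (std::uint64_t(b) << 40)
         | (std::uint64_t(c) << 32) | n;
}

void unpack(std::uint64_t x, std::uint8_t& a, std::uint8_t& b, std::uint8_t& c, std::uint32_t& n)
{
    n = static_cast<std::uint32_t>(x & 0xFFFFFFFFu);
    c = static_cast<std::uint8_t>((x >> 32) & 0xFF);
    b = static_cast<std::uint8_t>((x >> 40) & 0xFF);
    a = static_cast<std::uint8_t>((x >> 48) & 0xFF);
}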
Use a union. Each element of a union occupies the same address space. The struct is one element, the unsigned long long is another.
#include <stdio.h>
union data
{
    struct
    {
        unsigned char a;
        unsigned char b;
        unsigned char c;
        unsigned int d;
    } e;
    unsigned long long f;
};
int main()
{
    data dat;
    dat.f = 0xFFFFFFFFFFFFFFFF;
    dat.e.a = 1;
    dat.e.b = 2;
    dat.e.c = 3;
    dat.e.d = 4;
    printf("f=%016llX\n", dat.f);
    printf("%02X %02X %02X %08X\n", dat.e.a, dat.e.b, dat.e.c, dat.e.d);
    return 0;
}
Output, but note that one byte of the original unsigned long long survives. Compilers like to align data such as 4-byte integers on addresses divisible by 4, so the three chars are followed by a pad byte, the integer lands at offset 4, and the struct has a total size of 8.
f=00000004FF030201
01 02 03 00000004
This can be controlled in a compiler-dependent fashion. Below is for Microsoft C++:
#include <stdio.h>
#pragma pack(push,1)
union data
{
    struct
    {
        unsigned char a;
        unsigned char b;
        unsigned char c;
        unsigned int d;
    } e;
    unsigned long long f;
};
#pragma pack(pop)
int main()
{
    data dat;
    dat.f = 0xFFFFFFFFFFFFFFFF;
    dat.e.a = 1;
    dat.e.b = 2;
    dat.e.c = 3;
    dat.e.d = 4;
    printf("f=%016llX\n", dat.f);
    printf("%02X %02X %02X %08X\n", dat.e.a, dat.e.b, dat.e.c, dat.e.d);
    return 0;
}
Note the struct occupies seven bytes now and the highest byte of the unsigned long long is now unchanged:
f=FF00000004030201
01 02 03 00000004
Got it.
static unsigned long long compress(char a, char b, char c, unsigned int someNumber)
{
    unsigned long long x = 0;
    x = x | (unsigned char)a;   // cast avoids sign extension if char is signed
    x = x << 8;
    x = x | (unsigned char)b;
    x = x << 8;
    x = x | (unsigned char)c;
    x = x << 32;
    x = x | someNumber;
    return x;
}
myStruct * decompress(unsigned long long x)
{
    // printBinary(x);  // debug helper, not defined here
    myStruct * m = new myStruct();
    m->someNumber = x & 0xFFFFFFFFull;  // mask off the low 32 bits
    x = x >> 32;
    m->c = x & 0xFF;
    x = x >> 8;
    m->b = x & 0xFF;
    x = x >> 8;
    m->a = x & 0xFF;
    return m;
}
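A quick round-trip check of the two helpers above (assuming the corrected myStruct with members a, b, c, and someNumber):

int main()
{
    unsigned long long packed = compress(11, 8, 12, 30);
    myStruct * m = decompress(packed);
    printf("%d %d %d %u\n", m->a, m->b, m->c, m->someNumber);  // expected: 11 8 12 30
    delete m;
    return 0;
}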

How to get array of bits in a structure?

I was pondering (and therefore am looking for a way to learn this, and not a better solution) if it is possible to get an array of bits in a structure.
Let me demonstrate by an example. Imagine such a code:
#include <stdio.h>
struct A
{
    unsigned int bit0:1;
    unsigned int bit1:1;
    unsigned int bit2:1;
    unsigned int bit3:1;
};
int main()
{
    struct A a = {1, 0, 1, 1};
    printf("%u\n", a.bit0);
    printf("%u\n", a.bit1);
    printf("%u\n", a.bit2);
    printf("%u\n", a.bit3);
    return 0;
}
In this code, we have 4 individual bits packed in a struct. They can be accessed individually, leaving the job of bit manipulation to the compiler. What I was wondering is if such a thing is possible:
#include <stdio.h>
typedef unsigned int bit:1;
struct B
{
    bit bits[4];
};
int main()
{
    struct B b = {{1, 0, 1, 1}};
    for (int i = 0; i < 4; ++i)
        printf("%u\n", b.bits[i]);
    return 0;
}
I tried declaring bits in struct B as unsigned int bits[4]:1 or unsigned int bits:1[4] or similar things, to no avail. My best guess was to typedef unsigned int bit:1; and use bit as the type, yet that still doesn't work.
My question is: is such a thing possible? If yes, how? If not, why not? The 1-bit unsigned int is a valid type, so why shouldn't you be able to get an array of it?
Again, I don't want a replacement for this, I am just wondering how such a thing is possible.
P.S. I am tagging this as C++, although the code is written in C, because I assume the method would be existent in both languages. If there is a C++ specific way to do it (by using the language constructs, not the libraries) I would also be interested to know.
UPDATE: I am completely aware that I can do the bit operations myself. I have done it a thousand times in the past. I am NOT interested in an answer that says use an array/vector instead and do bit manipulation. I am only thinking if THIS CONSTRUCT is possible or not, NOT an alternative.
Update: Answer for the impatient (thanks to neagoegab):
Instead of
typedef unsigned int bit:1;
I could use
typedef struct
{
    unsigned int value:1;
} bit;
properly using #pragma pack
NOT POSSIBLE - A construct like that IS NOT possible (here) - NOT POSSIBLE
One could try to do this, but the result will be that one bit is stored in one byte:
#include <cstdint>
#include <iostream>
using namespace std;

#pragma pack(push, 1)
struct Bit
{
    // one bit is stored in one BYTE
    uint8_t a_:1;
};
#pragma pack(pop)
typedef Bit bit;

struct B
{
    bit bits[4];
};

int main()
{
    struct B b = {{0, 0, 1, 1}};
    for (int i = 0; i < 4; ++i)
        cout << b.bits[i].a_ << endl;
    cout << sizeof(Bit) << endl;
    cout << sizeof(B) << endl;
    return 0;
}
output:
0 //bit[0] value
0 //bit[1] value
1 //bit[2] value
1 //bit[3] value
1 //sizeof(Bit), **one bit is stored in one byte!!!**
4 //sizeof(B), ** 4 bytes, each bit is stored in one BYTE**
In order to access individual bits from a byte, here is an example (please note that the layout of bit fields is implementation-dependent):
#include <iostream>
#include <cstdint>
using namespace std;

#pragma pack(push, 1)
struct Byte
{
    Byte(uint8_t value):
        _value(value)
    {
    }
    union
    {
        uint8_t _value;
        struct {
            uint8_t _bit0:1;
            uint8_t _bit1:1;
            uint8_t _bit2:1;
            uint8_t _bit3:1;
            uint8_t _bit4:1;
            uint8_t _bit5:1;
            uint8_t _bit6:1;
            uint8_t _bit7:1;
        };
    };
};
#pragma pack(pop)

int main()
{
    Byte myByte(8);
    cout << "Bit 0: " << (int)myByte._bit0 << endl;
    cout << "Bit 1: " << (int)myByte._bit1 << endl;
    cout << "Bit 2: " << (int)myByte._bit2 << endl;
    cout << "Bit 3: " << (int)myByte._bit3 << endl;
    cout << "Bit 4: " << (int)myByte._bit4 << endl;
    cout << "Bit 5: " << (int)myByte._bit5 << endl;
    cout << "Bit 6: " << (int)myByte._bit6 << endl;
    cout << "Bit 7: " << (int)myByte._bit7 << endl;
    if (myByte._bit3)
    {
        cout << "Bit 3 is on" << endl;
    }
}
In C++ you use std::bitset<4>. This will use a minimal number of words for storage and hide all the masking from you. It's really hard to separate the C++ library from the language because so much of the language is implemented in the standard library. In C there's no direct way to create an array of single bits like this, instead you'd create one element of four bits or do the manipulation manually.
EDIT:
The 1 bit unsigned int is a valid type, so why shouldn't you be able
to get an array of it?
Actually you can't use a 1 bit unsigned type anywhere other than the context of creating a struct/class member. At that point it's so different from other types it doesn't automatically follow that you could create an array of them.
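For reference, here is a minimal std::bitset version of the four-bit example from the question:

#include <bitset>
#include <cstddef>
#include <iostream>

int main()
{
    std::bitset<4> bits;   // four bits packed into (at least) one word
    bits[0] = 1;
    bits[1] = 0;
    bits[2] = 1;
    bits[3] = 1;
    for (std::size_t i = 0; i < bits.size(); ++i)
        std::cout << bits[i] << '\n';
    std::cout << sizeof(bits) << '\n';  // implementation-defined; typically 4 or 8
    return 0;
}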
C++ would use std::vector<bool> or std::bitset<N>.
In C, to emulate std::vector<bool> semantics, you use a struct like this:
struct Bits {
    size_t word_count;
    Word word[];  // flexible array member must come last
};
where Word is an implementation-defined type equal in width to the data bus of the CPU; wordsize, as used later on, is equal to the width of the data bus.
E.g. Word is uint_fast32_t for 32-bit machines and uint_fast64_t for 64-bit machines;
wordsize is 32 for 32-bit machines, and 64 for 64-bit machines.
You use functions/macros to set/clear bits.
To extract a bit, use GET_BIT(bits, bit), defined as ((bits)->word[(bit) / wordsize] & ((Word)1 << ((bit) % wordsize))).
To set a bit, use SET_BIT(bits, bit), defined as ((bits)->word[(bit) / wordsize] |= ((Word)1 << ((bit) % wordsize))).
To clear a bit, use CLEAR_BIT(bits, bit), defined as ((bits)->word[(bit) / wordsize] &= ~((Word)1 << ((bit) % wordsize))).
To flip a bit, use FLIP_BIT(bits, bit), defined as ((bits)->word[(bit) / wordsize] ^= ((Word)1 << ((bit) % wordsize))).
To add resizeability as per std::vector<bool>, write a resize function that calls realloc on Bits.word and updates Bits.word_count accordingly. The exact details are left as an exercise.
The same goes for proper range-checking of bit indices.
This is abusive, and relies on an extension... but it worked for me:
#include <stdio.h>

struct __attribute__ ((__packed__)) A
{
    unsigned int bit0:1;
    unsigned int bit1:1;
    unsigned int bit2:1;
    unsigned int bit3:1;
};

union U
{
    struct A structVal;
    int intVal;
};

int main()
{
    struct A a = {1, 0, 1, 1};
    union U u;
    u.structVal = a;
    for (int i = 0; i < 4; i++)
    {
        int mask = 1 << i;
        printf("%d\n", (u.intVal & mask) >> i);
    }
    return 0;
}
You can also use an array of integers (ints or longs) to build an arbitrarily large bit mask. The select() system call uses this approach for its fd_set type; each bit corresponds to the numbered file descriptor (0..N). Macros are defined: FD_CLR to clear a bit, FD_SET to set a bit, FD_ISSET to test a bit, and FD_SETSIZE is the total number of bits. The macros automatically figure out which integer in the array to access and which bit in the integer. On Unix, see "sys/select.h"; under Windows, I think it is in "winsock.h". You can use the FD technique to make your own definitions for a bit mask. In C++, I suppose you could create a bit-mask object and overload the [] operator to access individual bits.
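As a sketch of that last idea, here is what a small C++ bit-mask class along FD_SET lines might look like; the names and the fixed capacity are illustrative only:

#include <cstddef>
#include <cstdint>

class BitMask
{
    static const std::size_t kBits = 256;   // illustrative capacity
    std::uint32_t words_[kBits / 32];       // 32 bits per array element
public:
    BitMask() { for (std::size_t i = 0; i < kBits / 32; ++i) words_[i] = 0; }
    void set(std::size_t n)   { words_[n / 32] |=  std::uint32_t(1) << (n % 32); }   // like FD_SET
    void clear(std::size_t n) { words_[n / 32] &= ~(std::uint32_t(1) << (n % 32)); } // like FD_CLR
    bool operator[](std::size_t n) const                                            // like FD_ISSET
    {
        return (words_[n / 32] >> (n % 32)) & 1u;
    }
};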
You can create a bit list by using a struct pointer. This will use much more than one bit of space per bit written, though, since each array element takes the size of the whole struct (typically the size of an unsigned int):
struct bitfield {
    unsigned int bit : 1;
};
struct bitfield *bitstream;
Then after this:
bitstream = malloc(sizeof(struct bitfield) * numberofbitswewant);
You can access them like so:
bitstream[bitpointer].bit=...

C++: sizeof of struct with bit fields

Why is gcc returning 13 as the sizeof of the following class?
It seems to me that we should get e (4 bytes) + d (4 bytes) + 1 byte (for a and b) = 9 bytes. If it were alignment, aren't most 32-bit systems aligned on 8-byte boundaries?
#include <iostream>
using namespace std;

class A {
    unsigned char a:1;
    unsigned char b:4;
    unsigned int d;
    A* e;
} __attribute__((__packed__));

int main(int argc, char *argv[])
{
    cout << sizeof(A) << endl;
}
./a.out
13
You are very likely running on a 64-bit platform, where the size of a pointer is not 4 but 8 bytes. Just do a sizeof on A * and print it out.
The actual size of structs with bit fields is implementation-dependent, so whatever size gcc decides on is right.