Understanding sizeof(char) in 32 bit C compilers

sizeof(char) always returns 1 with the 32-bit GCC compiler.
But since the basic block size of a 32-bit compiler is 4 bytes, how does a char occupy a single byte when the basic size is 4 bytes?
Considering the following :
struct st
{
int a;
char c;
};
sizeof(st) returns 8, which agrees with the default block size of 4 bytes (two blocks are allotted).
I cannot understand why sizeof(char) returns 1 when it is allotted a block of size 4.
Can someone please explain this?
I would be very thankful for any replies explaining it.
EDIT: The typo 'bits' has been changed to 'bytes'. My apologies to the person who made the first edit; I rolled it back because I did not notice the change you had made.
Thanks to all those who pointed out that it must be changed, especially #Mike Burton for downvoting the question and #jalf, who seemed to jump to conclusions about my understanding of the concepts.

sizeof(char) is always 1. Always. The 'block size' you're talking about is just the native word size of the machine - usually the size that will result in most efficient operation. Your computer can still address each byte individually - that's what the sizeof operator is telling you about. When you do sizeof(int), it returns 4 to tell you that an int is 4 bytes on your machine. Likewise, your structure is 8 bytes long. There is no information from sizeof about how many bits there are in a byte.
The reason your structure is 8 bytes long rather than 5 (as you might expect), is that the compiler is adding padding to the structure in order to keep everything nicely aligned to that native word length, again for greater efficiency. Most compilers give you the option to pack a structure, either with a #pragma directive or some other compiler extension, in which case you can force your structure to take minimum size, regardless of your machine's word length.
char is size 1, since that's the smallest access size your computer can handle - for most machines an 8-bit value. The sizeof operator gives you the size of all other quantities in units of how many char objects would be the same size as whatever you asked about. The padding (see link below) is added by the compiler to your data structure for performance reasons, so it is larger in practice than you might think from just looking at the structure definition.
There is a wikipedia article called Data structure alignment which has a good explanation and examples.
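The points above can be checked at compile time. This is only a minimal sketch, assuming a C++11 compiler; the struct is the one from the question, while `bits_in` and `padding_in_st` are names made up for this illustration. Nothing in it assumes a particular word size.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

struct st {
    int a;
    char c;
};

// sizeof counts in units of char, so this holds on every implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// sizeof says nothing about bits; CHAR_BIT does (8 on most machines).
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits occupied by an object of type T.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// Padding is whatever the members themselves do not account for.
constexpr std::size_t padding_in_st = sizeof(st) - sizeof(int) - sizeof(char);
```

With a 4-byte int aligned to 4 bytes, `padding_in_st` would come out as 3, which matches the 8-byte struct described in the question; other platforms may give a different value.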

It is structure alignment with padding: c uses 1 byte, and the other 3 bytes are unused.

Sample code demonstrating structure alignment:
struct st
{
int a;
char c;
};
struct stb
{
int a;
char c;
char d;
char e;
char f;
};
struct stc
{
int a;
char c;
char d;
char e;
char f;
char g;
};
std::cout<<sizeof(st) << std::endl; //8
std::cout<<sizeof(stb) << std::endl; //8
std::cout<<sizeof(stc) << std::endl; //12
The size of the struct is bigger than the sum of its individual components, since it was set to be divisible by 4 bytes by the 32 bit compiler. These results may be different on different compilers, especially if they are on a 64 bit compiler.

First of all, sizeof returns a number of bytes, not bits. sizeof(char) == 1 tells you that a char is one byte long (eight bits on nearly all machines; the exact number is given by CHAR_BIT). All of the fundamental data types in C are at least one byte long.
Your structure returns a size of 8. This is a sum of three things: the size of the int, the size of the char (which we know is 1), and the size of any extra padding that the compiler added to the structure. Since many implementations use a 4-byte int, this would imply that your compiler is adding 3 bytes of padding to your structure. Most likely this is added after the char in order to make the size of the structure a multiple of 4 (a 32-bit CPU accesses data most efficiently in 32-bit chunks, and 32 bits is four bytes).
Edit: Just because the block size is four bytes doesn't mean that a data type can't be smaller than four bytes. When the CPU loads a one-byte char into a 32-bit register, the value will be sign-extended or zero-extended (by the hardware, depending on whether the char is signed) to make it fill the register. The CPU is smart enough to handle data in N-byte increments (where N is a power of 2), as long as it isn't larger than the register. When storing the data on disk or in memory, there is no reason to store every char as four bytes. The char in your structure happened to look like it was four bytes long because of the padding added after it. If you changed your structure to have two char variables instead of one, you should see that the size of the structure is the same (you added an extra byte of data, and the compiler added one fewer byte of padding).

All object sizes in C and C++ are defined in terms of bytes, not bits. A byte is the smallest addressable unit of memory on the computer. A bit is a single binary digit, a 0 or a 1.
On most computers, a byte is 8 bits (so a byte can store values from 0 to 255), although computers exist with other byte sizes.
A memory address identifies a byte, even on 32-bit machines. Addresses N and N+1 point to two subsequent bytes.
An int, which is typically 32 bits, covers 4 bytes, meaning that 4 different memory addresses exist that each point to part of the int.
In a 32-bit machine, all the 32 actually means is that the CPU is designed to work efficiently with 32-bit values, and that an address is 32 bits long. It doesn't mean that memory can only be addressed in blocks of 32 bits.
The CPU can still address individual bytes, which is useful when dealing with chars, for example.
As for your example:
struct st
{
int a;
char c;
};
sizeof(st) returns 8 not because all structs have a size divisible by 4, but because of alignment. For the CPU to efficiently read an integer, it must be located on an address that is divisible by the size of the integer (4 bytes). So an int can be placed on address 8, 12 or 16, but not on address 11.
A char only requires its address to be divisible by the size of a char (1), so it can be placed on any address.
So in theory, the compiler could have given your struct a size of 5 bytes... Except that this wouldn't work if you created an array of st objects.
In an array, each object is placed immediately after the previous one, with no padding. So if the first object in the array is placed at an address divisible by 4, then the next object would be placed at a 5 bytes higher address, which would not be divisible by 4, and so the second struct in the array would not be properly aligned.
To solve this, the compiler inserts padding inside the struct, so its size becomes a multiple of its alignment requirement.
Not because it is impossible to create objects that don't have a size that is a multiple of 4, but because one of the members of your st struct requires 4-byte alignment, and so every time the compiler places an int in memory, it has to make sure it is placed at an address that is divisible by 4.
If you create a struct of two chars, it won't get a size of 4. It will usually get a size of 2, because when it contains only chars, the object can be placed at any address, and so alignment is not an issue.
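The array argument can be checked directly. A small sketch, assuming a C++11 compiler; `two_chars`, `two_chars_size` and `fits_array_stride` are names invented for this example, and the "typically 2" comment assumes a common ABI such as x86.

```cpp
#include <cstddef>

struct st {
    int a;
    char c;
};

struct two_chars {
    char x;
    char y;
};

// Hypothetical helper: true when consecutive array elements of type T
// all land on addresses that satisfy T's alignment requirement.
template <typename T>
constexpr bool fits_array_stride() {
    return sizeof(T) % alignof(T) == 0;
}

// The struct inherits the strictest alignment among its members.
static_assert(alignof(st) >= alignof(int), "st is at least as aligned as its int member");

// Arrays have no padding between elements, so the padding lives inside st.
static_assert(sizeof(st[4]) == 4 * sizeof(st), "array elements are contiguous");
static_assert(fits_array_stride<st>(), "sizeof(st) is a multiple of alignof(st)");

// A struct holding only chars usually needs no padding at all.
constexpr std::size_t two_chars_size = sizeof(two_chars);  // typically 2
```

The design point is that the compiler pads the element type once, so that array indexing can stay a plain multiplication by sizeof(st).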

sizeof returns the value in bytes. You were talking about bits. 32-bit architectures are word aligned and byte referenced. It is irrelevant how the architecture stores a char, but to the compiler, you must reference chars 1 byte at a time, even if they use up less than 1 byte.
This is why sizeof(char) is 1.
ints are 32 bit, hence sizeof(int)= 4, doubles are 64 bit, hence sizeof(double) = 8, etc.

Because of optimisation, padding is added so that the size of an object is 1, 2 or n*4 bytes (or something like that, talking about x86). That's why padding is added to a 5-byte object and not to a 1-byte one. A single char doesn't have to be padded; it can be allocated in 1 byte, and we can store it in space allocated with malloc(1). st cannot be stored in space allocated with malloc(5), because when an st struct is copied, the whole 8 bytes are copied.

It works the same way as using half a piece of paper. You use one part for a char and the other part for something else. The compiler will hide this from you since loading and storing a char into a 32bit processor register depends on the processor.
Some processors have instructions to load and store only parts of the 32bit others have to use binary operations to extract the value of a char.
Addressing a char works because a char is, AFAIR, by definition the smallest addressable unit of memory. On a 32-bit system, pointers to two different ints will be at least 4 address points apart, while char addresses will be only 1 apart.

Related

what does colon used in struct def mean in C++? [duplicate]

Is bitfield a C concept or C++?
Can it be used only within a structure? What are the other places we can use them?
AFAIK, bitfields are special structure variables that occupy the memory only for specified no. of bits. It is useful in saving memory and nothing else. Am I correct?
I coded a small program to understand the usage of bitfields - But, I think it is not working as expected. I expect the size of the below structure to be 1+4+2 = 7 bytes (considering the size of unsigned int is 4 bytes on my machine), But to my surprise it turns out to be 12 bytes (4+4+4). Can anyone let me know why?
#include <stdio.h>
struct s{
unsigned int a:1;
unsigned int b;
unsigned int c:2;
};
int main()
{
printf("sizeof struct s = %zu bytes \n", sizeof(struct s)); /* %zu is the format for size_t */
return 0;
}
OUTPUT:
sizeof struct s = 12 bytes

Because a and c are not contiguous, they each reserve a full int's worth of memory space. If you move a and c together, the size of the struct becomes 8 bytes.
Moreover, you are telling the compiler that you want a to occupy only 1 bit, not 1 byte. So even though a and c next to each other should occupy only 3 bits total (still under a single byte), the combination of a and c still become word-aligned in memory on your 32-bit machine, hence occupying a full 4 bytes in addition to the int b.
Similarly, you would find that
struct s{
unsigned int b;
short s1;
short s2;
};
occupies 8 bytes, while
struct s{
short s1;
unsigned int b;
short s2;
};
occupies 12 bytes because in the latter case, the two shorts each sit in their own 32-bit alignment.
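To see the effect of moving the bit-fields together, the two layouts can be put side by side. This is a sketch only: bit-field layout is implementation-defined, so the concrete sizes in the comments (12 and 8) are what I would expect from GCC or Clang with a 4-byte unsigned int, not a guarantee. The struct names are invented for this example.

```cpp
#include <cstddef>

// The layout from the question: an ordinary member splits the bit-fields.
struct separated {
    unsigned int a : 1;
    unsigned int b;      // ordinary member between the two bit-fields
    unsigned int c : 2;
};

// Same members, but the bit-fields are declared next to each other.
struct adjacent {
    unsigned int a : 1;
    unsigned int c : 2;  // can share a storage unit with a
    unsigned int b;
};

constexpr std::size_t separated_size = sizeof(separated);  // typically 12
constexpr std::size_t adjacent_size = sizeof(adjacent);    // typically 8

// Grouping the bit-fields should never make the struct larger.
static_assert(adjacent_size <= separated_size, "adjacent bit-fields pack at least as well");
```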

1) They originated in C, but are part of C++ too, unfortunately.
2) Yes, or within a class in C++.
3) As well as saving memory, they can be used for some forms of bit twiddling. However, both memory saving and twiddling are inherently implementation dependent - if you want to write portable software, avoid bit fields.

It's C.
Your compiler has rounded the memory allocation to 12 bytes for alignment purposes. Most computer memory subsystems can't handle byte addressing.

Your program is working exactly as I'd expect. The compiler allocates adjacent bitfields into the same memory word, but yours are separated by a non-bitfield.
Move the bitfields next to each other and you'll probably get 8, which is the size of two ints on your machine. The bitfields would be packed into one int. This is compiler specific, however.
Bitfields are useful for saving space, but not much else.

Bitfields are widely used in firmware to map different fields in registers. This saves a lot of the manual bitwise operations that would otherwise be necessary to read and write those fields.
One disadvantage is you can't take address of bitfields.

Writing bits to file?

I'm trying to implement a Huffman tree.
Content of my simple .txt file that I want to do a simple test:
aaaaabbbbccd
Frequencies of characters: a:5, b:4, c:2, d:1
Code Table: (Data type of 1s and 0s: string)
a:0
d:100
c:101
b:11
Result that I want to write as binary: (22 bits)
0000011111111101101100
How can I write bit-by-bit each character of this result as a binary to ".dat" file? (not as string)

Answer: You can't.
The minimum amount you can write to a file (or read from it), is a char or unsigned char. For all practical purposes, a char has exactly eight bits.
You are going to need to have a one char buffer, and a count of the number of bits it holds. When that number reaches 8, you need to write it out, and reset the count to 0. You will also need a way to flush the buffer at the end. (Note that you cannot write 22 bits to a file - you can only write 16 or 24. You will need some way to mark which bits at the end are unused.)
Something like:
struct BitBuffer {
    FILE* file; // Initialization skipped.
    unsigned char buffer = 0;
    unsigned count = 0;
    void outputBit(unsigned char bit) {
        buffer <<= 1; // Make room for next bit.
        if (bit) buffer |= 1; // Set if necessary.
        count++; // Remember we have added a bit.
        if (count == 8) {
            fwrite(&buffer, sizeof(buffer), 1, file); // Error handling elided.
            buffer = 0;
            count = 0;
        }
    }
};
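The flush step mentioned above is not shown in that snippet. One possible way to fill it in is sketched here, under stated assumptions: bytes are 8 bits wide, bits are packed most significant bit first, and the output goes to an in-memory vector instead of a FILE* so the result is easy to inspect. `BitWriter`, `flush` and `pack_bits` are names invented for this illustration, not part of the answer above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects bits MSB-first and emits whole bytes into an in-memory sink.
// A real encoder would hand each byte to fwrite or std::ofstream::write.
class BitWriter {
public:
    void outputBit(bool bit) {
        buffer_ = static_cast<unsigned char>((buffer_ << 1) | (bit ? 1 : 0));
        if (++count_ == 8) {
            bytes_.push_back(buffer_);
            buffer_ = 0;
            count_ = 0;
        }
    }

    // Pads the last partial byte with zero bits and returns how many
    // padding bits were added (0 if the data ended on a byte boundary).
    unsigned flush() {
        if (count_ == 0) return 0;
        const unsigned padding = 8 - count_;
        bytes_.push_back(static_cast<unsigned char>(buffer_ << padding));
        buffer_ = 0;
        count_ = 0;
        return padding;
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
    unsigned char buffer_ = 0;
    unsigned count_ = 0;
};

// Hypothetical convenience wrapper: packs a string of '0'/'1' characters.
// The padding count is stored in *padding_bits so that a decoder can
// later tell where the real data stops.
inline std::vector<unsigned char> pack_bits(const std::string& bits,
                                            unsigned* padding_bits) {
    BitWriter writer;
    for (char ch : bits) {
        assert(ch == '0' || ch == '1');
        writer.outputBit(ch == '1');
    }
    *padding_bits = writer.flush();
    return writer.bytes();
}
```

For the 22-bit string from the question this yields three bytes with two padding bits at the end. The padding count (or the original bit count) has to be stored somewhere, for example in a file header, or the decoder cannot tell padding from data.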

The OP asked:
How can I write bit-by-bit each character of this result as a binary to ".dat" file? (not as string)
You can not and here is why...
Memory model
Defines the semantics of a computer memory storage for the purpose of C++ abstract machine.
The memory available to a C++ program is one or more contiguous sequences of bytes. Each byte in memory has a unique address.
Byte
A byte is the smallest addressable unit of memory. It is defined as a contiguous sequence of bits, large enough to hold the value of any UTF-8 code unit (256 distinct values) and of (since C++14) any member of the basic execution character set (the 96 characters that are required to be single-byte). Similar to C, C++ supports bytes of sizes 8 bits and greater.
The types char, unsigned char, and signed char use one byte for both storage and value representation. The number of bits in a byte is accessible as CHAR_BIT or std::numeric_limits<unsigned char>::digits.
Compliments of cppreference.com
You can find this page here: cppreference:memory model
This comes from the 2017-03-21: standard
©ISO/IEC N4659
4.4 The C++ memory model [intro.memory]
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (5.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits,4 the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.
[ Note: The representation of types is described in 6.9. —end note ]
A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having nonzero width. [ Note: Various features of the language, such as references and virtual functions, might involve additional memory locations that are not accessible to programs but are managed by the implementation. —end note ] Two or more threads of execution (4.7) can access separate memory locations without interfering
with each other.
[ Note: Thus a bit-field and an adjacent non-bit-field are in separate memory locations, and therefore can be concurrently updated by two threads of execution without interference. The same applies to two bit-fields, if one is declared inside a nested struct declaration and the other is not, or if the two are separated by a zero-length bit-field declaration, or if they are separated by a non-bit-field declaration. It is not safe to concurrently update two bit-fields in the same struct if all fields between them are also bit-fields of nonzero width. —end note ]
[ Example: A structure declared as
struct {
char a;
int b:5,
c:11,
:0,
d:8;
struct {int ee:8;} e;
}
contains four separate memory locations: The field a and bit-fields d and e.ee are each separate memory
locations, and can be modified concurrently without interfering with each other. The bit-fields b and c
together constitute the fourth memory location. The bit-fields b and c cannot be concurrently modified, but
b and a, for example, can be. —end example ]
4) The number of bits in a byte is reported by the macro CHAR_BIT in the header <climits>.
This version of the standard can be found here:
www.open-std.org section § 4.4 on pages 8 & 9.
The smallest possible memory module that can be written to in a program is 8 contiguous bits or more for a standard byte. Even with bit fields, the 1 byte requirement still holds. You can manipulate, toggle, set, individual bits within a byte but you can not write individual bits.
What can be done is to have a byte buffer with a count of bits written. When your required bits are written you will need to have the rest of the unused bits marked as padding or un-used buffer bits.
Edit
[Note:] -- When using bit fields or unions, one thing that you must take into consideration is the endianness of the specific architecture.

Answer: You can, in a way.
Hello, from my experience I have found a simple way to do that. For the task you need to define an array of characters (each element needs to be, for instance, 1 byte; it can be bigger). After that you must define functions to access a specific bit of any element. For example, here is how to write an expression to get the value of the 3rd bit of a char in C++.
/* position is [1,..,n], and bytes
are in little endian and index from 0 */
int bit_at(int position, unsigned char byte)
{
    /* shift the wanted bit down to position 0 so the result is 0 or 1 */
    return (byte >> (position - 1)) & 1;
}
Now you can vision the array of bytes as this
[b1,...,bn]
Now what we actually have in memory is 8 * n bits of memory
We can try to visualize it like so.
NOTE: the array is zeroed!
|0000 0000|0000 0000|...|0000 0000|
Now from this you or whoever wants can figure out how to manipulate it to get a specific bit from this array. Of course there will be some sort of conversion, but that is not such a problem.
In the end, for the encoding you provide, that is:
a:0
d:100
c:101
b:11
We can encode the message "abcd",
and make an array that holds the bits
of the message, using the elements
of the array as arrays for bits, like so:
|0111 0110|0000 0000|
You can write this to memory and you will have an excess of at most 7 bits.
This is a simple example, but it can be extended into much more.
I hope this gave some answers to your question.
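Reading a bit back out of such a byte array could look like the sketch below. It assumes 8-bit bytes and numbers bits from the left of the picture above (index 0 is the most significant bit of the first byte), which is a different convention from `bit_at`; `get_bit` and `encoded_abcd` are names made up for this example.

```cpp
#include <cstddef>

// Returns bit number `index` (0 = first bit of the message) of a byte
// array, counting from the most significant bit of each byte.
constexpr int get_bit(const unsigned char* bytes, std::size_t index) {
    return (bytes[index / 8] >> (7 - index % 8)) & 1;
}

// "abcd" with a:0, b:11, c:101, d:100 is 0 11 101 100, i.e. nine bits,
// padded with zeros to two bytes: |0111 0110|0000 0000|.
constexpr unsigned char encoded_abcd[] = {0x76, 0x00};
```

A decoder would call `get_bit` with increasing indices and walk the Huffman tree one bit at a time, stopping after the nine meaningful bits.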

size of struct-array

If I have a struct A defined as:
struct A {
char* c;
float f;
int i;
};
and an array
A col[5];
then why is
sizeof(*(col+0))
16?

On your platform, 16 bytes are required to hold that structure, the structure being of type A.
You should keep in mind that *(col+0) is identical to col[0], so it's only one of the structures, not the entire array of them. If you wanted the size of the array, you would use sizeof(col).

Possibly because:
either you are on a 64-bit platform and char* takes 8 bytes while int and float take 4 bytes each,
or you are on a 32-bit platform and char* takes 4 bytes, but your compiler decided that the array would be faster if it inserted 4 bytes of padding there. Padding can be controlled on most compilers by #pragma pack(push,1) and #pragma pack(pop) respectively.
If you want to be sure, you can use offsetof (from <stddef.h>) or create an object and examine the addresses of its member fields to inspect which fields actually got padded and by how much.
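A sketch of the offsetof approach, assuming a C++11 compiler. The struct is the one from the question; the `gap_after_*`, `tail_padding` and `total_padding` constants are names invented here to make the padding visible.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct A {
    char* c;
    float f;
    int i;
};

// Bytes of padding between the end of one member and the start of the next.
constexpr std::size_t gap_after_c = offsetof(A, f) - (offsetof(A, c) + sizeof(char*));
constexpr std::size_t gap_after_f = offsetof(A, i) - (offsetof(A, f) + sizeof(float));

// Bytes of padding after the last member.
constexpr std::size_t tail_padding = sizeof(A) - (offsetof(A, i) + sizeof(int));

constexpr std::size_t total_padding = gap_after_c + gap_after_f + tail_padding;
```

On a typical 64-bit build I would expect all three values to be 0 (8 + 4 + 4 = 16 with no padding). On a 32-bit build that still reports 16, whichever constant is non-zero shows where the extra bytes went.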

For starters, your original declaration was incorrect (this has now been fixed in a question edit). A is the name of the type; to declare an array named col, you want
A col[5];
not
col A[5];
sizeof(*(col+0)) is the same as sizeof col[0], which is the same as sizeof (A).
It's 16 because that's the size of that structure, for the compiler and system you're using (you haven't mentioned what it is).
I take it from the question that you were expecting something different, but you didn't say so.
Compilers may insert padding bytes between members, or after the last member, to ensure that each member is aligned properly. I find 16 bytes to be an unsurprising size for that structure on a 64-bit system -- and in this particular case, it's probable that no padding is even required.
And in case you weren't aware, sizeof yields a result in bytes, where a byte is usually (but not always) 8 bits.

Your problem is most likely that your processor platform uses 8-byte alignment on floats. So, your char* will take 4 (assuming you're on a 32-bit system) since it's a pointer which is an address. Your float will take 8, and your int will take another 4 which totals 16 bytes.
Compilers will often make certain types align on certain byte boundaries in order to speed up computation on the hardware platform in use.
For example, if you did:
struct x {
char y;
int z;
};
Your system would (probably) say the size of x was 8, padding the char out to an int inside the structure.
You can add pragmas (implementation dependent) to stop this:
#pragma pack(1)
struct x {
char y;
int z;
};
#pragma pack(0)
which would make the size of this equal to 5.
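A variant of the same idea using the push/pop form, which saves and restores the previous packing instead of setting it explicitly. This is a sketch assuming a compiler that supports #pragma pack(push, n) and #pragma pack(pop), as GCC, Clang and MSVC do; the struct names are invented for this example.

```cpp
#include <cstddef>

struct x_default {
    char y;
    int z;
};

#pragma pack(push, 1)  // save the current packing, then pack to 1 byte
struct x_packed {
    char y;
    int z;
};
#pragma pack(pop)      // restore the saved packing

struct x_after {       // declared after the pop, so default layout again
    char y;
    int z;
};

static_assert(sizeof(x_packed) == sizeof(char) + sizeof(int), "packed struct has no padding");
static_assert(sizeof(x_after) == sizeof(x_default), "pop restores the default layout");
```

The trade-off is that `z` inside `x_packed` is no longer aligned, so reading it may be slower, and on some architectures taking a pointer to it and dereferencing that pointer is unsafe.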

Edit: There seem to be two parts to this question. "Why is sizeof(A) equal to 16?" On balance, I see now that this is probably the question that was intended. Instead I am answering the second part, i.e. "Why is sizeof(*(col+0)) == sizeof(A)?"
col is an array. col + 0 is meaningless for arrays, so the compiler must convert col to a pointer first. Then col is effectively just an A*. Adding zero to a pointer changes nothing. Finally, you dereference it with * and are left with a simple A of size 16.
In short, sizeof(A) == sizeof(*(col+0))
PS: I have not addressed the question "Why does that one element of the array take up 16 bytes?" Others have answered that well.
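The equality can be stated as compile-time checks. A minimal sketch, assuming C++11; `element_count` is a name added for this illustration.

```cpp
#include <cstddef>

struct A {
    char* c;
    float f;
    int i;
};

A col[5];

// sizeof does not evaluate its operand; it only looks at the type.
// col decays to A*, col + 0 is still an A*, and *(col + 0) is an A.
static_assert(sizeof(*(col + 0)) == sizeof(A), "one element");
static_assert(sizeof(col[0]) == sizeof(A), "same thing, spelled with []");
static_assert(sizeof(col) == 5 * sizeof(A), "the whole array");

// The usual idiom for the number of elements in an array.
constexpr std::size_t element_count = sizeof(col) / sizeof(col[0]);
```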

On a modern x86-64 processor, char* is 8 bytes, float is 4 bytes, int is 4 bytes. So the sizes of the members added together is 16. What else would you be expecting? Did someone tell you a pointer is 4 bytes? Because that's only true for x86-32.

Memory alignment in C-structs

I'm working on a 32-bit machine, so I suppose that the memory alignment should be 4 bytes. Say I have this struct:
typedef struct {
unsigned short v1;
unsigned short v2;
unsigned short v3;
} myStruct;
The plain added size is 6 bytes, and I suppose that the aligned size should be 8, but sizeof(myStruct) returns me 6.
However if I write:
typedef struct {
unsigned short v1;
unsigned short v2;
unsigned short v3;
int i;
} myStruct;
the plain added size is 10 bytes, aligned size shall be 12, and this time sizeof(myStruct) == 12.
Can somebody explain what is the difference?

At least on most machines, a type is only ever aligned to a boundary as large as the type itself [Edit: you can't really demand any "more" alignment than that, because you have to be able to create arrays, and you can't insert padding into an array]. On your implementation, short is apparently 2 bytes, and int 4 bytes.
That means your first struct is aligned to a 2-byte boundary. Since all the members are 2 bytes apiece, no padding is inserted between them.
The second contains a 4-byte item, which gets aligned to a 4-byte boundary. Since it's preceded by 6 bytes, 2 bytes of padding is inserted between v3 and i, giving 6 bytes of data in the shorts, two bytes of padding, and 4 more bytes of data in the int for a total of 12.

Forget about having different members: even if you write two structs whose members are exactly the same and differ only in the order in which they are declared, the size of each struct can be (and often is) different.
For example, see this,
#include <iostream>
using namespace std;
struct A
{
char c;
char d;
int i;
};
struct B
{
char c;
int i; //note the order is different!
char d;
};
int main() {
cout << sizeof(A) << endl;
cout << sizeof(B) << endl;
}
Compile it with gcc-4.3.4, and you get this output:
8
12
That is, the sizes are different even though both structs have the same members!
Code at Ideone : http://ideone.com/HGGVl
The bottom line is that the Standard doesn't specify how padding should be done, so compilers are free to make their own decisions and you cannot assume that all compilers make the same one.

By default, values are aligned according to their size. So a 2-byte value like a short is aligned on a 2-byte boundary, and a 4-byte value like an int is aligned on a 4-byte boundary.
In your example, 2 bytes of padding are added before i to ensure that i falls on a 4-byte boundary.
(The entire structure is aligned on a boundary at least as big as the biggest value in the structure, so your structure will be aligned to a 4-byte boundary.)
The actual rules vary according to the platform - the Wikipedia page on Data structure alignment has more details.
Compilers typically let you control the packing via (for example) #pragma pack directives.

Assuming:
sizeof(unsigned short) == 2
sizeof(int) == 4
Then I personally would use the following (your compiler may differ):
unsigned shorts are aligned to 2 byte boundaries
int will be aligned to 4 byte boundaries.
typedef struct
{
unsigned short v1; // 0 bytes offset
unsigned short v2; // 2 bytes offset
unsigned short v3; // 4 bytes offset
} myStruct; // End 6 bytes.
// No part is required to align tighter than 2 bytes.
// So whole structure can be 2 byte aligned.
typedef struct
{
unsigned short v1; // 0 bytes offset
unsigned short v2; // 2 bytes offset
unsigned short v3; // 4 bytes offset
/// Padding // 6-7 padding (so i is 4 byte aligned)
int i; // 8 bytes offset
} myStruct; // End 12 bytes
// Whole structure needs to be 4 byte aligned.
// So that i is correctly aligned.

Firstly, while the specifics of padding are left up to the compiler, the OS also imposes some rules as to alignment requirements. This answer assumes that you are using gcc, though the OS may vary.
To determine the space occupied by a given struct and its elements, you can follow these rules:
First, assume that the struct always starts at an address that is properly aligned for all data types.
Then for every entry in the struct:
The minimum space needed is the raw size of the element given by sizeof(element).
The alignment requirement of the element is the alignment requirement of the element's base type.
Notably, this means that the alignment requirement for a char[20] array is the same as
the requirement for a plain char.
Finally, the alignment requirement of the struct as a whole is the maximum of the alignment requirements of each of its elements.
gcc will insert padding after a given element to ensure that the next one (or the struct if we are talking about the last element) is correctly aligned. It will never rearrange the order of the elements in the struct, even if that will save memory.
Now the alignment requirements themselves are also a bit odd.
32-bit Linux requires that 2-byte data types have 2-byte alignment (their addresses must be even). All larger data types must have 4-byte alignment (addresses ending in 0x0, 0x4, 0x8 or 0xC). Note that this applies to types larger than 4 bytes as well (such as double and long double).
32-bit Windows is more strict in that if a type is K bytes in size, it must be K byte aligned. This means that a double can only be placed at an address ending in 0x0 or 0x8. The only exception to this is the long double which is still 4-byte aligned even though it is actually 12-bytes long.
For both Linux and Windows, on 64-bit machines, a K byte type must be K byte aligned. Again, the long double is an exception and must be 16-byte aligned.
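The layout rules listed in this answer (start aligned, pad before each member, round the total up to the struct's own alignment) can be sketched as a small compile-time calculator. It assumes C++14 or later for the loop inside a constexpr function, and it takes the member sizes and alignments as inputs rather than knowing any platform's rules. `Member`, `round_up`, `struct_size` and the two sample arrays are names invented for this illustration.

```cpp
#include <cstddef>

struct Member {
    std::size_t size;   // sizeof the member
    std::size_t align;  // alignment requirement of the member (non-zero)
};

// Rounds n up to the next multiple of align.
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Size of a struct whose members are laid out in declaration order.
template <std::size_t N>
constexpr std::size_t struct_size(const Member (&members)[N]) {
    std::size_t cursor = 0;     // offset where the next member could start
    std::size_t max_align = 1;  // alignment of the struct as a whole
    for (std::size_t i = 0; i < N; ++i) {
        cursor = round_up(cursor, members[i].align);  // padding before member
        cursor += members[i].size;
        if (members[i].align > max_align) max_align = members[i].align;
    }
    return round_up(cursor, max_align);  // padding after the last member
}

// Assumed sizes: short = 2, int = 4, each aligned to its own size.
constexpr Member three_shorts[] = {{2, 2}, {2, 2}, {2, 2}};
constexpr Member three_shorts_int[] = {{2, 2}, {2, 2}, {2, 2}, {4, 4}};
```

Under those assumed sizes, the calculator reproduces the 6 and 12 reported in the question. It deliberately ignores bit-fields, packing pragmas and empty structs.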

Each data type needs to be aligned on a memory boundary of its own size. So a short needs to be aligned on a 2-byte boundary, and an int needs to be on a 4-byte boundary. Similarly, a long long would need to be on an 8-byte boundary.

The reason for the second sizeof(myStruct) being 12 is the padding that gets inserted between v3 and i to align i at a 32-bit boundary. There are two bytes of it.
Wikipedia explains the padding and alignment reasonably clearly.

In your first struct, since every item is of size short, the whole struct can be aligned on short boundaries, so it doesn't need to add any padding at the end.
In the second struct, the int (presumably 32 bits) needs to be word aligned so it inserts padding between v3 and i to align i.

It sounds like each variable is being aligned to a boundary based on its size, so that its address is a multiple of the size being accessed (shorts are aligned to 2, ints to 4, and so on). If you moved one of the shorts after the int, sizeof(myStruct) would still be 12: the int would need no padding in front of it, but 2 bytes of padding would be added at the end to keep the size a multiple of 4. Of course, this all depends on the compiler being used and on its settings.

The standard doesn't say much about the layout of structs with complete types - it's up to the compiler. It decided that it needs the int to start on a boundary to access it, but since it has to do sub-boundary memory addressing for the shorts, there is no need to pad them.

struct size is different from typedef version?

I have the following struct declaration and typedef in my code:
struct blockHeaderStruct {
bool allocated;
unsigned int length;
};
typedef struct blockHeaderStruct blockHeader;
When I do sizeof(blockheader), I get the value of 4 bytes back, but when I do sizeof(struct blockHeaderStruct), I get 8 bytes.
Why is this happening? Why am I not getting 5 back instead?

Firstly, you cannot do sizeof(blockHeaderStruct). That simply will not compile. What you can do is sizeof(struct blockHeaderStruct), which could indeed give you 8 bytes as result.
Secondly, getting a different result from sizeof(blockheader) is highly unlikely. Judging by your reference to sizeof(blockHeaderStruct) (which, again, will not even compile) your description of the problem is inaccurate. Take a closer look at what is it you are really doing. Most likely, you are taking a sizeof of a pointer type (which gives you 4), not a struct type.
In any case, try posting real code.

Looking at the definition of your struct, you have a 1-byte value followed by a 4-byte integer. This integer needs to be allocated on a 4-byte boundary, which forces the compiler to insert 3 bytes of padding after your 1-byte bool. That makes the size of the struct 8 bytes. Changing the order of the elements can reduce padding when several small members can share it, but for this particular struct the size would typically still be rounded up to 8.
Also, regarding the two sizeof calls returning different values: are you sure you do not have a typo here, and that you are not taking the size of a pointer, a different type, or some integer variable?

The most likely scenario is that you are actually looking at the size of a pointer, not the struct, on a 32-bit system.
However, int may be 2 bytes (16 bits). In that case, the expected size of the structure is 4:
2 bytes for the int
1 byte for the bool
round up to the next multiple of 2, because the size of the struct is usually rounded to a multiple of the size of its largest primitive member.
Nothing could explain sizeof(blockHeader) != sizeof(struct blockHeaderStruct), though, given that typedef. That is completely impossible.

Struct allocation normally occurs on a 4 byte boundary. That is the compiler will pad data types within a struct up to the 4 byte boundary before starting with the next data type. Given that this is c++ (bool is a sizeof 1) and not c (bool needs to be #define as something)
struct blockHeaderStruct {
bool allocated; // 1 byte followed by 3 pad bytes
unsigned int length; // 4 bytes
};
typedef struct blockHeaderStruct blockHeader;
typedef struct blockHeaderStruct *blockHeaderPtr;
A sizeof operation would result:
sizeof(blockHeader) == 8
sizeof(struct blockHeaderStruct) == 8
sizeof(blockHeaderPtr) == 4
(Note: The last entry will be 8 for a 64 bit compiler.)
There should be no difference in sizes between the first two lines of code. A typedef merely assigns an alias to an existing type. The third is taking the sizeof a pointer which is 4 bytes in a 32 bit machine and 8 bytes on a 64 bit machine.
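That a typedef cannot change the size can be stated as compile-time checks. A minimal sketch, assuming C++11 and the declarations from the question; the numeric sizes are deliberately left out because they depend on the platform.

```cpp
#include <type_traits>

struct blockHeaderStruct {
    bool allocated;
    unsigned int length;
};
typedef struct blockHeaderStruct blockHeader;
typedef struct blockHeaderStruct* blockHeaderPtr;

// A typedef introduces a second name for the same type, not a new type.
static_assert(std::is_same<blockHeader, blockHeaderStruct>::value,
              "blockHeader and blockHeaderStruct are one type");
static_assert(sizeof(blockHeader) == sizeof(struct blockHeaderStruct),
              "so their sizes cannot differ");

// The pointer typedef is a different type: its size is that of a pointer
// and is unrelated to the layout of the struct it points to.
static_assert(std::is_same<blockHeaderPtr, blockHeaderStruct*>::value,
              "blockHeaderPtr is a pointer to the struct");
```

If two sizeof expressions in real code disagree, at least one of them is therefore naming something other than this struct, such as a pointer or a differently spelled identifier.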
To fix this, simply apply the #pragma pack directive before a structure is defined. This forces the compiler to pack on the specified boundary. Usually set as 1,2,or 4 (although 4 is normally the default and doesn't need to be set).
#include <stddef.h>
#include <stdio.h>
#pragma pack(1)
struct blockHeaderStruct {
bool allocated;
unsigned int length;
};
typedef struct blockHeaderStruct blockHeader;
int main()
{
printf("sizeof(blockHeader) == %zu\n", sizeof(blockHeader));
printf("sizeof(struct blockHeader) == %zu\n", sizeof(struct blockHeaderStruct));
return 0;
}
Compiled with g++ (Ubuntu 4.4.1-4ubuntu9) 4.4.1
Results in:
sizeof(blockHeader) == 5
sizeof(struct blockHeader) == 5
You don't normally need this directive. Just remember to pack your structs efficiently. Group smaller data types together. Do not alternate < 4 byte datatypes and 4 byte data types as your structs will be mostly unused space. This can cause unnecessary bandwidth for network related applications.

"I actually copied that snippet direct from my source file."
OK.
"When I do sizeof(blockheader), I get the value of 4 bytes back"
It looks like blockheader is typedef'ed somewhere and its type occupies 4 bytes or its type requires 4 byte alignment.
If you try sizeof(blockHeader) you'll get your type.
"when I do sizeof(blockHeaderStruct), I get 8 bytes."
The reason why alignment matters is that if you need an array of blockHeaders then you can compute how much memory you need. Or if you have an array and need to compute how much memory to copy, you can compute it.
If you want to align all struct members to addresses that are multiples of 1 instead of 4 or instead of your compiler's defaults, your compiler might offer a #pragma to do it. Then you'll save memory but accesses might be slower in order to access unaligned data.

Some compilers will optimize to always allocate data by powers of 2 (4 stays 4, but 5 is rounded up to 8).
