Weird sizeof() result

Weird sizeof() result - c++

When I run sizeof(r) on my Mac. It says sizeof(r) = 1. My understanding is that the size of a union is the size of its largest element. In this case shouldn't the largest element be the struct s?
union
{
struct
{
char i:1;
char j:2;
char m:3;
}s;
char ch;
}r;

Your union composes of two parts, a struct, and a character. The size of the union, since it shares the memory, is the size of the largest element, plus the size of any padding it sticks on (which in your case is 0 bytes).
First, let's see the size ideone reports for each:
http://ideone.com/LAhop
Okay, both are 1. Therefore, the union's size must be 1 as well.
The struct is composed of bitfields. One is 1 bit, one is 2, and one is 3. This gives a total of 6 out of the 8 bits in one byte. Since it has to be at least one byte anyway (bitfields aren't really sized in bits), the size is 1.
As for char, here's what the C++11 standard says in § 3.9.1/1 [basic.fundamental]:
Objects declared as characters (char) shall be large enough to store any member
of the implementation’s basic character set.
For pretty much every platform, this is one byte.
This is one byte.

The struct s is taking up 1 + 2 + 3 = 6 bits which fit into 1 byte and its unioning with a char which is 1 byte. Hence the answer 1 byte.

Related

Why the size of my Person is 10 bytes, and not 16 ? [duplicate]

Someone explain me how does the order of the member declaration inside a class determines the size of that class.
For Example :
class temp
{
public:
int i;
short s;
char c;
};
The size of above class is 8 bytes.
But when the order of the member declaration is changed as below
class temp
{
public:
char c;
int i;
short s;
};
then the size of class is 12 bytes.
How?

The reason behind above behavior is data structure alignment and padding. Basically if you are creating a 4 byte variable e.g. int, it will be aligned to a four byte boundary i.e. it will start from an address in memory, which is multiple of 4. Same applies to other data types. 2 byte short should start from even memory address and so on.
Hence if you have a 1 byte character declared before the int (assume 4 byte here), there will be 3 free bytes left in between. The common term used for them is 'padded'.
Data structure alignment
Another good pictorial explanation
Reason for alignment
Padding allows faster memory access i.e. for cpu, accessing memory areas that are aligned is faster e.g. reading a 4 byte aligned integer might take a single read call where as if an integer is located at a non aligned address range (say address 0x0002 - 0x0006), then it would take two memory reads to get this integer.
One way to force compiler to avoid alignment is (specific to gcc/g++) to use keyword 'packed' with the structure attribute. packed keyword Also the link specifies how to enforce alignment by a specific boundary of your choice (2, 4, 8 etc.) using the aligned keyword.
Best practice
It is always a good idea to structure your class/struct in a way that variables are already aligned with minimum padding. This reduces the size of the class overall plus it reduces the amount of work done by the compiler i.e. no rearrangement of structure. Also one should always access member variables by their names in the code, rather than trying to read a specific byte from structure assuming a value would be located at that byte.
Another useful SO question on performance advantage of alignment
For the sake of completion, following would still have a size of 8 bytes in your scenario (32 bit machine), but it won't get any better since full 8 bytes are now occupied, and there is no padding.
class temp
{
public:
int i;
short s;
char c;
char c2;
};

class temp
{
public:
int i; //size 4 alignment 4
short s; //size 2 alignment 2
char c; //size 1 alignment 1
}; //Size 8 alignment max(4,2,1)=4
temp[i[0-4];s[4-2];c[6-7]]] -> 8
Padding in (7-8)
class temp
{
public:
char c; //size 1 alignment 1
int i; //size 4 alignment 4
short s; //size 2 alignment 2
};//Size 12 alignment max(4,2,1)=4
temp[c[0-1];i[4-8];s[8-10]]] -> 12
Padding in (1-4) and (10-12)

Why a pointer to an integer increments by 4 bytes? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Question about pointer increment
When i increment a int pointer then its address have a gap of 4 bytes. why it is so ? why a int pointer takes 4 bytes to store whereas a char takes 2 bytes ?

When you increment a pointer of a type A, you move that pointer forward in the memory by the size of the type it points to. On your machine, int takes 4 bytes, so the pointer moves by 4 bytes.
As for "why does int take 4 bytes on my machine?":
The C++ standard says (4.9.1. paragraph 2):
There are five standard signed integer types : “signed char”, “short
int”, “int”, “long int”, and “long long int”. In this list, each type
provides at least as much storage as those preceding it in the list.
<...> Plain ints have the natural size suggested by the architecture
of the execution environment[44]; the other signed integer types are
provided to meet special needs.
[44]: that is, large enough to contain any value in the
range of INT_MIN and INT_MAX, as defined in the header .
Basically, the sizes of fundamental types are not set in stone, and are implementation-defined. The accepted answer to this SO question has some information about it.

Here is the general rule:
If the type is T, its size N is calculated as sizeof(T) bytes. So pointer of type T* is increased by N bytes if you increment the pointer by 1.
Mathematically,
T *p = getT();
size_t diff = static_cast<size_t>(p+1) - static_cast<size_t>(p);
bool alwaysTrue = (diff == sizeof(T)); //alwaysTrue is always true!

the size of the pointer to any data types always be the same as supported by your system
If system is 32 -bit the size would be 4 bytes for all the pointers.
In pointer arithmetic when you do ptr++ or ptr-- the increments and decrements takes place according to the size of the data type this ptrpointer points to .
char *cptr;
int *iptr;
char c[5];
int a[5];
cptr=c;
iptr=a;
By doing cptr++ you will get c[1] and pointer will increments by only one byte
You can check the address of each char.
Similarly iptr++ will give you a[1] here pointer increased by 4 bytes.
int main()
{
int i;
for(i=0;i<5;i++)
{
printf("%p\t",&c[i]); //internally pointer arithmeitc: (c+sizeof(char)*i) ,
printf("%p\n",&a[i]); //intenally pointer arithmetic : (a+sizeof(int)*i)
}
}
Size of int or other data types are implementation defined

Pointers increment by the size in bytes of the things they point to. ints take 4 bytes on a 32-bit machine.

Because, on your computer, sizeof (int) == 4, so stepping from one int to the next requires an increment of four bytes.
Most integer types have different sizes on different computers. int must have at least 16 bits, and is supposed to be a "natural" size for the computer. Most 32 or 64-bit platforms choose 32 bits as a "natural" size, and most computers have 8-bit bytes, so 4 bytes is a very common size for int.
However, sizeof (char) == 1 on all computers, so I'm rather surprised that you say "a char takes 2 bytes". It should only take one.

because the size of data (int) which the pointer is pointing has 4 byte size so the pointer increments 4 bytes (size of data (int))
another example: if you have structure with size 8 byte and you have pointer pointing to this structure the increment of this pointer will be 8 byte:
struct test {
int x;
int y;
}
struct test ARRAY[50];
struct test *p=ARRAY; // p pointer is pointing here to the first element ARRAY[0]. ARRAY[0] is with size 8 bytes
p++; // this will increment p with 8 byte (size of struct test). So p now is pointing to the second element ARRAY[1]

Memory alignment in C-structs

I'm working on a 32-bit machine, so I suppose that the memory alignment should be 4 bytes. Say I have this struct:
typedef struct {
unsigned short v1;
unsigned short v2;
unsigned short v3;
} myStruct;
The plain added size is 6 bytes, and I suppose that the aligned size should be 8, but sizeof(myStruct) returns me 6.
However if I write:
typedef struct {
unsigned short v1;
unsigned short v2;
unsigned short v3;
int i;
} myStruct;
the plain added size is 10 bytes, aligned size shall be 12, and this time sizeof(myStruct) == 12.
Can somebody explain what is the difference?

At least on most machines, a type is only ever aligned to a boundary as large as the type itself [Edit: you can't really demand any "more" alignment than that, because you have to be able to create arrays, and you can't insert padding into an array]. On your implementation, short is apparently 2 bytes, and int 4 bytes.
That means your first struct is aligned to a 2-byte boundary. Since all the members are 2 bytes apiece, no padding is inserted between them.
The second contains a 4-byte item, which gets aligned to a 4-byte boundary. Since it's preceded by 6 bytes, 2 bytes of padding is inserted between v3 and i, giving 6 bytes of data in the shorts, two bytes of padding, and 4 more bytes of data in the int for a total of 12.

Forget about having different members, even if you write two structs whose members are exactly same, with a difference is that the order in which they're declared is different, then size of each struct can be (and often is) different.
For example, see this,
#include <iostream>
using namespace std;
struct A
{
char c;
char d;
int i;
};
struct B
{
char c;
int i; //note the order is different!
char d;
};
int main() {
cout << sizeof(A) << endl;
cout << sizeof(B) << endl;
}
Compile it with gcc-4.3.4, and you get this output:
8
12
That is, sizes are different even though both structs has same members!
Code at Ideone : http://ideone.com/HGGVl
The bottomline is that the Standard doesn't talk about how padding should be done, and so the compilers are free to make any decision and you cannot assume all compilers make the same decision.

By default, values are aligned according to their size. So a 2-byte value like a short is aligned on a 2-byte boundary, and a 4-byte value like an int is aligned on a 4-byte boundary
In your example, 2 bytes of padding are added before i to ensure that i falls on a 4-byte boundary.
(The entire structure is aligned on a boundary at least as big as the biggest value in the structure, so your structure will be aligned to a 4-byte boundary.)
The actual rules vary according to the platform - the Wikipedia page on Data structure alignment has more details.
Compilers typically let you control the packing via (for example) #pragma pack directives.

Assuming:
sizeof(unsigned short) == 2
sizeof(int) == 4
Then I personally would use the following (your compiler may differ):
unsigned shorts are aligned to 2 byte boundaries
int will be aligned to 4 byte boundaries.
typedef struct
{
unsigned short v1; // 0 bytes offset
unsigned short v2; // 2 bytes offset
unsigned short v3; // 4 bytes offset
} myStruct; // End 6 bytes.
// No part is required to align tighter than 2 bytes.
// So whole structure can be 2 byte aligned.
typedef struct
{
unsigned short v1; // 0 bytes offset
unsigned short v2; // 2 bytes offset
unsigned short v3; // 4 bytes offset
/// Padding // 6-7 padding (so i is 4 byte aligned)
int i; // 8 bytes offset
} myStruct; // End 12 bytes
// Whole structure needs to be 4 byte aligned.
// So that i is correctly aligned.

Firstly, while the specifics of padding are left up to the compiler, the OS also imposes some rules as to alignment requirements. This answer assumes that you are using gcc, though the OS may vary
To determine the space occupied by a given struct and its elements, you can follow these rules:
First, assume that the struct always starts at an address that is properly aligned for all data types.
Then for every entry in the struct:
The minimum space needed is the raw size of the element given by sizeof(element).
The alignment requirement of the element is the alignment requirement of the element's base type.
Notably, this means that the alignment requirement for a char[20] array is the same as
the requirement for a plain char.
Finally, the alignment requirement of the struct as a whole is the maximum of the alignment requirements of each of its elements.
gcc will insert padding after a given element to ensure that the next one (or the struct if we are talking about the last element) is correctly aligned. It will never rearrange the order of the elements in the struct, even if that will save memory.
Now the alignment requirements themselves are also a bit odd.
32-bit Linux requires that 2-byte data types have 2-byte alignment (their addresses must be even). All larger data types must have 4-byte alignment (addresses ending in 0x0, 0x4, 0x8 or 0xC). Note that this applies to types larger than 4 bytes as well (such as double and long double).
32-bit Windows is more strict in that if a type is K bytes in size, it must be K byte aligned. This means that a double can only placed at an address ending in 0x0 or 0x8. The only exception to this is the long double which is still 4-byte aligned even though it is actually 12-bytes long.
For both Linux and Windows, on 64-bit machines, a K byte type must be K byte aligned. Again, the long double is an exception and must be 16-byte aligned.

Each data type needs to be aligned on a memory boundary of its own size. So a short needs to be on aligned on a 2-byte boundary, and an int needs to be on a 4-byte boundary. Similarly, a long long would need to be on an 8-byte boundary.

The reason for the second sizeof(myStruct) being 12 is the padding that gets inserted between v3 and i to align i at a 32-bit boundary. There is two bytes of it.
Wikipedia explains the padding and alignment reasonably clearly.

In your first struct, since every item is of size short, the whole struct can be aligned on short boundaries, so it doesn't need to add any padding at the end.
In the second struct, the int (presumably 32 bits) needs to be word aligned so it inserts padding between v3 and i to align i.

Sounds like its being aligned to bounderies based on the size of each var, so that the address is a multiple of the size being accessed(so shorts are aligned to 2, ints aligned to 4 etc), if you moved one of the shorts after the int, sizeof(mystruct) should be 10. Of course this all depends on the compiler being used and what settings its using in turn.

The standard doesn't say much about the layout of structs with complete types - it's up to to the compiler. It decided that it needs the int to start on a boundary to access it, but since it has to do sub-boundary memory addressing for the shorts there is no need to pad them

Understanding sizeof(char) in 32 bit C compilers

(sizeof) char always returns 1 in 32 bit GCC compiler.
But since the basic block size in 32 bit compiler is 4, How does char occupy a single byte when the basic size is 4 bytes???
Considering the following :
struct st
{
int a;
char c;
};
sizeof(st) returns as 8 as agreed with the default block size of 4 bytes (since 2 blocks are allotted)
I can never understand why sizeof(char) returns as 1 when it is allotted a block of size 4.
Can someone pls explain this???
I would be very thankful for any replies explaining it!!!
EDIT : The typo of 'bits' has been changed to 'bytes'. I ask Sorry to the person who made the first edit. I rollbacked the EDIT since I did not notice the change U made.
Thanks to all those who made it a point that It must be changed especially #Mike Burton for downvoting the question and to #jalf who seemed to jump to conclusions over my understanding of concepts!!

sizeof(char) is always 1. Always. The 'block size' you're talking about is just the native word size of the machine - usually the size that will result in most efficient operation. Your computer can still address each byte individually - that's what the sizeof operator is telling you about. When you do sizeof(int), it returns 4 to tell you that an int is 4 bytes on your machine. Likewise, your structure is 8 bytes long. There is no information from sizeof about how many bits there are in a byte.
The reason your structure is 8 bytes long rather than 5 (as you might expect), is that the compiler is adding padding to the structure in order to keep everything nicely aligned to that native word length, again for greater efficiency. Most compilers give you the option to pack a structure, either with a #pragma directive or some other compiler extension, in which case you can force your structure to take minimum size, regardless of your machine's word length.
char is size 1, since that's the smallest access size your computer can handle - for most machines an 8-bit value. The sizeof operator gives you the size of all other quantities in units of how many char objects would be the same size as whatever you asked about. The padding (see link below) is added by the compiler to your data structure for performance reasons, so it is larger in practice than you might think from just looking at the structure definition.
There is a wikipedia article called Data structure alignment which has a good explanation and examples.

It is structure alignment with padding. c uses 1 byte, 3 bytes are non used. More here

Sample code demonstrating structure alignment:
struct st
{
int a;
char c;
};
struct stb
{
int a;
char c;
char d;
char e;
char f;
};
struct stc
{
int a;
char c;
char d;
char e;
char f;
char g;
};
std::cout<<sizeof(st) << std::endl; //8
std::cout<<sizeof(stb) << std::endl; //8
std::cout<<sizeof(stc) << std::endl; //12
The size of the struct is bigger than the sum of its individual components, since it was set to be divisible by 4 bytes by the 32 bit compiler. These results may be different on different compilers, especially if they are on a 64 bit compiler.

First of all, sizeof returns a number of bytes, not bits. sizeof(char) == 1 tells you that a char is eight bits (one byte) long. All of the fundamental data types in C are at least one byte long.
Your structure returns a size of 8. This is a sum of three things: the size of the int, the size of the char (which we know is 1), and the size of any extra padding that the compiler added to the structure. Since many implementations use a 4-byte int, this would imply that your compiler is adding 3 bytes of padding to your structure. Most likely this is added after the char in order to make the size of the structure a multiple of 4 (a 32-bit CPU access data most efficiently in 32-bit chunks, and 32 bits is four bytes).
Edit: Just because the block size is four bytes doesn't mean that a data type can't be smaller than four bytes. When the CPU loads a one-byte char into a 32-bit register, the value will be sign-extended automatically (by the hardware) to make it fill the register. The CPU is smart enough to handle data in N-byte increments (where N is a power of 2), as long as it isn't larger than the register. When storing the data on disk or in memory, there is no reason to store every char as four bytes. The char in your structure happened to look like it was four bytes long because of the padding added after it. If you changed your structure to have two char variables instead of one, you should see that the size of the structure is the same (you added an extra byte of data, and the compiler added one fewer byte of padding).

All object sizes in C and C++ are defined in terms of bytes, not bits. A byte is the smallest addressable unit of memory on the computer. A bit is a single binary digit, a 0 or a 1.
On most computers, a byte is 8 bits (so a byte can store values from 0 to 256), although computers exist with other byte sizes.
A memory address identifies a byte, even on 32-bit machines. Addresses N and N+1 point to two subsequent bytes.
An int, which is typically 32 bits covers 4 bytes, meaning that 4 different memory addresses exist that each point to part of the int.
In a 32-bit machine, all the 32 actually means is that the CPU is designed to work efficiently with 32-bit values, and that an address is 32 bits long. It doesn't mean that memory can only be addressed in blocks of 32 bits.
The CPU can still address individual bytes, which is useful when dealing with chars, for example.
As for your example:
struct st
{
int a;
char c;
};
sizeof(st) returns 8 not because all structs have a size divisible by 4, but because of alignment. For the CPU to efficiently read an integer, its must be located on an address that is divisible by the size of the integer (4 bytes). So an int can be placed on address 8, 12 or 16, but not on address 11.
A char only requires its address to be divisible by the size of a char (1), so it can be placed on any address.
So in theory, the compiler could have given your struct a size of 5 bytes... Except that this wouldn't work if you created an array of st objects.
In an array, each object is placed immediately after the previous one, with no padding. So if the first object in the array is placed at an address divisible by 4, then the next object would be placed at a 5 bytes higher address, which would not be divisible by 4, and so the second struct in the array would not be properly aligned.
To solve this, the compiler inserts padding inside the struct, so its size becomes a multiple of its alignment requirement.
Not because it is impossible to create objects that don't have a size that is a multiple of 4, but because one of the members of your st struct requires 4-byte alignment, and so every time the compiler places an int in memory, it has to make sure it is placed at an address that is divisible by 4.
If you create a struct of two chars, it won't get a size of 4. It will usually get a size of 2, because when it contains only chars, the object can be placed at any address, and so alignment is not an issue.

Sizeof returns the value in bytes. You were talking about bits. 32 bit architectures are word aligned and byte referenced. It is irrelevant how the architecture stores a char, but to compiler, you must reference chars 1 byte at a time, even if they use up less than 1 byte.
This is why sizeof(char) is 1.
ints are 32 bit, hence sizeof(int)= 4, doubles are 64 bit, hence sizeof(double) = 8, etc.

Because of optimisation padding is added so size of an object is 1, 2 or n*4 bytes (or something like that, talking about x86). That's why there is added padding to 5-byte object and to 1-byte not. Single char doesn't have to be padded, it can be allocated on 1 byte, we can store it on space allocated with malloc(1). st cannot be stored on space allocated with malloc(5) because when st struct is being copied whole 8 bytes are being copied.

It works the same way as using half a piece of paper. You use one part for a char and the other part for something else. The compiler will hide this from you since loading and storing a char into a 32bit processor register depends on the processor.
Some processors have instructions to load and store only parts of the 32bit others have to use binary operations to extract the value of a char.
Addressing a char works as it is AFAIR by definition the smallest addressable memory. On a 32bit system pointers to two different ints will be at least 4 address points apart, char addresses will be only 1 apart.

struct size is different from typedef version?

I have the following struct declaration and typedef in my code:
struct blockHeaderStruct {
bool allocated;
unsigned int length;
};
typedef struct blockHeaderStruct blockHeader;
When I do sizeof(blockheader), I get the value of 4 bytes back, but when I do sizeof(struct blockHeaderStruct), I get 8 bytes.
Why is this happening? Why am I not getting 5 back instead?

Firstly, you cannot do sizeof(blockHeaderStruct). That simply will not compile. What you can do is sizeof(struct blockHeaderStruct), which could indeed give you 8 bytes as result.
Secondly, getting a different result from sizeof(blockheader) is highly unlikely. Judging by your reference to sizeof(blockHeaderStruct) (which, again, will not even compile) your description of the problem is inaccurate. Take a closer look at what is it you are really doing. Most likely, you are taking a sizeof of a pointer type (which gives you 4), not a struct type.
In any case, try posting real code.

Looking at the definition of your struct, you have 1 byte value followed by 4 byte Integer. This integer needs to be allocated on 4 byte boundary, which will force compiler to insert a 3 byte padding after your 1 byte bool. Which makes the size of struct to 8 byte. To avoid this you can change order of elements in the struct.
Also for two sizeof calls returning different values, are you sure you do not have a typo here and you are not taking size of pointer or different type or some integer variable.

The most likely scenario is that you are actually looking at the size of a pointer, not the struct, on a 32-bit system.
However, int may be 2 bytes (16 bits). In that case, the expected size of the structure is 4:
2 bytes for the int
1 byte for the bool
round up to the next multiple of 2, because the size of the struct is usually rounded to a multiple of the size of its largest primitive member.
Nothing could explain sizeof(blockHeaderStruct) != sizeof(struct blockHeader), though, given that typedef. That is completely impossible.

Struct allocation normally occurs on a 4 byte boundary. That is the compiler will pad data types within a struct up to the 4 byte boundary before starting with the next data type. Given that this is c++ (bool is a sizeof 1) and not c (bool needs to be #define as something)
struct blockHeaderStruct {
bool allocated; // 1 byte followed by 3 pad bytes
unsigned int length; // 4 bytes
};
typedef struct blockHeaderStruct blockHeader;
typedef struct blockHeaderStruct *blockHeaderPtr;
A sizeof operation would result:
sizeof(blockHeader) == 8
sizeof(struct blockHeader) == 8
sizeof(blockHeaderPtr) == 4
(Note: The last entry will be 8 for a
64 bit compiler. )
There should be no difference in sizes between the first two lines of code. A typedef merely assigns an alias to an existing type. The third is taking the sizeof a pointer which is 4 bytes in a 32 bit machine and 8 bytes on a 64 bit machine.
To fix this, simply apply the #pragma pack directive before a structure is defined. This forces the compiler to pack on the specified boundary. Usually set as 1,2,or 4 (although 4 is normally the default and doesn't need to be set).
#include <stddef.h>
#include <stdio.h>
#pragma pack(1)
struct blockHeaderStruct {
bool allocated;
unsigned int length;
};
typedef struct blockHeaderStruct blockHeader;
int main()
{
printf("sizeof(blockHeader) == %li\n", sizeof(blockHeader));
printf("sizeof(struct blockHeader) == %li\n", sizeof(struct blockHeaderStruct));
return 0;
}
Compiled with g++ (Ubuntu
4.4.1-4ubuntu9) 4.4.1
Results in:
sizeof(blockHeader) == 5
sizeof(structblockHeader) == 5
You don't normally need this directive. Just remember to pack your structs efficiently. Group smaller data types together. Do not alternate < 4 byte datatypes and 4 byte data types as your structs will be mostly unused space. This can cause unnecessary bandwidth for network related applications.

I actually copied that snippet direct
from my source file.
OK.
When I do sizeof(blockheader), I get
the value of 4 bytes back
It looks like blockheader is typedef'ed somewhere and its type occupies 4 bytes or its type requires 4 byte alignment.
If you try sizeof(blockHeader) you'll get your type.
when I do sizeof(blockHeaderStruct), I
get 8 bytes.
The reason why alignment matters is that if you need an array of blockHeaders then you can compute how much memory you need. Or if you have an array and need to compute how much memory to copy, you can compute it.
If you want to align all struct members to addresses that are multiples of 1 instead of 4 or instead of your compiler's defaults, your compiler might offer a #pragma to do it. Then you'll save memory but accesses might be slower in order to access unaligned data.

Some compilers will optimize to always allocate data by powers of 2 (4 stays 4, but 5 is rounded up to 8).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js