On VS (release), I run the following:
int main(void)
{
char b[] = "123";
char a[] = "1234567";
printf("%x %x\n", b,a);
return 0;
}
I can see that, the mem address of a is b+3(the length of the string). Which shows that the memory are allocated with no gaps. And this guarantee that least memories are used.
So, I now kind of believe that all compilers will do so.
I want to make sure of this guess here. Can somebody give me an more formal proof or tell me that my guess is rooted on a coincidence.
No, it's not guaranteed that there will always be perfect packing of data.
For example, I compiled and runned this code on g++, and the difference is 8.
You can read more about this here.
tl;dr: Compilers can align objects in memory to only addresses divisible by some constant (always machine-word length) to help processor(for them it's easier to work with such addresses)
UPD: one interesting example about alignment:
#include <iostream>
using namespace std;
struct A
{
int a;
char b;
int c;
char d;
};
struct B
{
int a;
int c;
char b;
char d;
};
int main()
{
cout << sizeof(A) << " " << sizeof(B) << "\n";
}
For me, it prints
16 12
There is no guarantee what addresses will be chosen for each variable. Different processors may have different requirements or preferences for alignment of variables for instance.
Also, I hope there were at least 4 bytes between the addresses in your example. "123" requires 4 bytes - the extra byte being for the null terminator.
Try reversing the order of declaring a[] and b[], and/or increase the length of b.
You are making a very big assumption about how storage is allocated. Depending on your compiler the string literals might get stored in a literal pool that is NOT on the stack. Yet a[] and b[] do occupy elements on the stack. So, another test would be to add int c and compare those addresses.
Related
How to copy to flexible array inside struct in c?
#include <stdio.h>
#include <string.h>
typedef struct
{
int val;
char buf[];
} foo;
int main()
{
foo f;
f.val = 4;
// f.buf = "asd"; -> invalid use of flexible array member
memcpy(f.buf, "asd\0", 4);
printf("%s\n", f.buf);
}
output:
asd
*** stack smashing detected ***: terminated
Aborted (core dumped)
Also, if the struct was declared as:
typedef struct
{
char buf[];
} foo
vscode editor gives error:
incomplete type is not allow
and gcc gives error:
error: flexible array member in a struct with no named members
6 | char buf[];
Why is array in struct now allowed but pointer is? (char *buf).
Also, If a struct has a flexible array, what is its sizeof(struct containsFlexArray)? How can I dynamically resolve its array, when it has no dimension?
EDIT:
if the above works in C++, because the incomplete array "decay" to pointer of known length (8 bytes in x64), why is this not also the case in c? If I peek to asm, I see the program does not allocate enough stack for the struct (it allocates only space for foo.val member, but not bur foo.buf member, in which case the program tries to use override the foo.val member (by using its address instead of foo.buf), which causes the stack smashing detected. But why is it implemented this wrong way? (So I want to know the rationale behind introducing flexible array as well)
You may want to read information on flexible array member here.
It seems as, when using a flexible array in a struct there must be at least one other data member and the flexible array member must be last.
And also you may have an element of clarification concerning the usage of flexible array members in C here
lets use intel/amd architecture here where char => 1 byte int => 4 and long is 8 bytes long.
Struct alignment is an issue here. You see when you declare a struct in c, compiler looks at it as individual block. So if you have a struct like this:
struct a {
long l;
char c1;
char c2;
}
compiler looks at the first type used, and allocates 8 bytes of memory for l, looks at c and determines that c1 is shorter than l and rather than figure out what c1's type is, it allocates 8 bytes of data for c1. It does the same for c2. So you end up with struct that is 24 bytes long and only 10 are used. Where if you use this:
struct b {
char c1;
long l;
char c2;
}
this will allocate 1 byte for c1, 8 byte for l, and 8 bytes for c2. So you end up with 17 bytes and 10 used. Where as if you have this:
struct b {
char c1;
char c2;
long l;
}
well it allocates 1 byte for c1, 1 byte for c2, and 8 bytes for l. In total 10 bytes but all 10 are used.
So what does it have to do with array? You see if you have:
struct aa {
char a;
long b[];
}
This will know to allocate at least one byte for b initially. Where when you do not have char a,
struct aa {
long b[];
}
Compiler might not allocate any memory (allocate 0 bytes), because it simply does not know how much to allocate.
EDIT:
Left my PC and in mean time other answer popped up. The other answer is very good!!! But I hope this helps you understand what is going on.
You did not initialize the buf[] array when you declared an instance in main(). The compiler does not know how much memory to allocate. The stack smashing is a compiler feature that keeps your program from doing... bad things to you computer. Add a number to your array declaration in typedef struct.
Like this:
` #include <stdio.h>
#include <string.h>
typedef struct
{
int val;
char buf[5];
} foo;
int main()
{
foo f;
f.val = 4;
// f.buf = "asd"; -> invalid use of flexible array member
memcpy(f.buf, "asd\0", 4);
printf("%s\n", f.buf);
}`
I was learning about the structures in C++ and got to know that if a structure in C++ has 3 variables (let each one be of some data type), then all of them are not allocated in a contiguous fashion. Is this correct?
If yes, then how much memory would be allocated to an object to that structure type?
For e.g. Let's say we have a structure like this:
struct a
{
int x;
int y;
char c;
};
Now how intuitively an object of type a, must occupy some space = sizeOf(int) + sizeOf(int) + sizeOf(char). But, if they are not allocated continuously, there could be some memory locations allocated for that object, that are just present for providing some padding - i.e. the memory allocated for that object could look something like this:
xxxx[4-bytes]xxxxx[4-bytes]xxxx[1-byte]xxx
(NOTE: In the above blockquote x corresponds to a memory location of size 1 byte. I also assumed that sizeOf(int) = 4-bytes and sizeOf(char) = 1- byte.)
So in the above one, we can see that the object a occupies more than 9-bytes (because there are some memory locations (x's) used for padding.)
So, does something like this happen?
Thanks for your replies!
P.S. Please let me know if I hadn't written something clearly.
When it comes to structs and classes, The layout is implementation-defined between compilers, os, and architecture... Some will use automatic alignment, others will use padding, some may even auto arrange it's members. If you need to know the size of a struct, use sizeof(Your Struct).
Here's a code snippet...
#include <iostream>
struct A {
char a;
float b;
int c;
};
struct B {
float a;
int b;
char c;
};
int main() {
std::cout << "Sizeof(A) = " << sizeof(A) << '\n';
std::cout << "Sizeof(B) = " << sizeof(B) << '\n';
return 0;
}
Output:
Sizeof(A) = 12
Sizeof(B) = 12
For my particular machine, I'm running Windows 7 - 64bit, It is an Intel Core2 Quad Extreme, and I'm using Visual Studio 2017 running it with C++17.
With my particular setup, both structures are being generated with a different layout, but have the same size in bytes.
In A's case...
char a; // 1 byte
// 3 bytes of padding
float b; // 4 bytes
int c; // 4 bytes (int is 32bit even on x64).
In B's case...
float a; // 4 bytes
int b; // 4 bytes
char c; // 1 byte
// 3 bytes of padding.
Also, your compiler flags and optimizations may have an effect. This isn't always guaranteed, as it is implementation-defined as stated in the standard.
--Edit--
Also, if you don't want this exact behavior there are some pragmas directives and macros such as pragma pack and alignas() that can be used to modify your implementation details. Here are a few references.
How to use alignas to replace pragma pack?
https://en.cppreference.com/w/cpp/preprocessor/impl
https://en.cppreference.com/w/cpp/language/alignas
https://learn.microsoft.com/en-us/cpp/cpp/alignment-cpp-declarations?view=msvc-160
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbclx01/pragma_pack.htm
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbclx01/packqua.htm
https://www.iditect.com/how-to/57426535.html
https://downloads.ctfassets.net/oxjq45e8ilak/1LriV4eAdhNlu9Zv06H9NJ/53576095f772b5f6cddbbedccb7ebd8a/Alexander_Titov_Know_your_hardware_CPU_memory_hierarchy.pdf
https://cpc110.blogspot.com/2020/10/vs2019-alignas-in-struct-definition.html
So, structure Object variables take contiguous memory locations and the variables are stored in memory in the same order in which they are defined.
Please refer to the code below you will get an idea.
struct c{
int a;
int b;
int c;
};
int main()
{
c obj;
cout << &(obj.a) << endl; //0x7ffe128fe1c4
cout << &(obj.b) << endl; //0x7ffe128fe1c8
cout << &(obj.c) << endl; //0x7ffe128fe1cc
return 0;
}
For Structures Padding Concept Refer here
#include <iostream>
using namespace std;
struct node1{
char b[3];
int c[0];
};
struct node2{
int c[0];
};
struct node3{
char b[3];
};
int main() {
cout << sizeof(node1) << endl; // prints 4
cout << sizeof(node2) << endl; // prints 0
cout << sizeof(node3) << endl; // prints 3
}
My Question is why does the compiler allocate 0 bytes for int c[0] in node2
but allocate 1 byte for its when part of node1.
I'm assuming that this 1 byte is the reason why sizeof(node1) returns 4 since without it (like in node3) its size is 3 or is that due to padding??
Also trying to understand that shouldn't node2 have enough space to hold a pointer to an array (which will be allocated in the further down in the code as part of the flexible array/struct hack?
Yes, it's about padding/alignment. If you add __attribute__((__packed__)) to the end [useful when writing device drivers], you'll get 3 0 3 for your output.
If node1 had defined c[1], the size is 8 not 7, because the compiler will align c to an int boundary. With packed, sizeof would be 7
Yes, padding makes the difference. The reason why node1 has a padding byte, while node3 doesn't, lies in the typical usage of zero-length arrays.
Zero-length arrays are typically used with casting: You cast a larger, (possibly variable-sized) object to the struct containing the zero-length array. Then you access the "rest" of the large object using the zero-length array, which, for this purpose, has to be aligned properly. The padding byte is inserted before the zero-sized array, such that the ints are aligned. Since you can't do that with node3, no padding is needed.
Example:
struct Message {
char Type[3];
int Data[]; // it compiles without putting 0 explicitly
};
void ReceiveMessage(unsigned char* buffer, size_t length) {
if(length < sizeof(Message))
return;
Message* msg = (Message*)buffer;
if(!memcmp(msg->Type, "GET", 3)) {
HandleGet(msg->Data, (length - sizeof(Message))/sizeof(int));
} else if....
Note: this is rather hackish, but efficient.
c doesn't allocate one byte in node1. Its because of the padding added to b.
For b, to be easily obtainable by a 32-bit CPU, it is four bytes big. 32-bit CPUs can read 4 consecutive bytes from memory at a time. To read three, they have to read four and then remove the one not necessary. Therefore, to optimize this behavior, the compiler padds the struct with some bytes.
You can observe similar compiler optimizations when values are pushed on the stack (that is, arguments or local variables are allocated). The stack is always kept aligned to the CPU's data bus size (commonly 32 or 64 bits).
int main() {
cout << sizeof(node1) << endl; // prints 4
cout << sizeof(node2) << endl; // prints 0
cout << sizeof(node3) << endl; // prints 3
}
the main function queries the the size of the user defined structs, not of the array members. sizeof() will return the number of bytes allocated to the struct, with each character allocated in the character array being allocated 1 byte. A character array is really a C style string which is terminated by the sentinel character '\0'. It is likely to include the byte allocated to hold the sentinel character when evaluating the sizeof(node1) as there is another variable after it so it reads over it, but not include the sentinel in sizeof(node3) where the string and the struct terminates
if i have a struct , say:
struct A {
int a,b,c,d,e;
}
A m;//struct if 5 ints
int n[5];//array of 5 ints.
i know that elements in the array are stored one after other so we can use *(n+i) or n[i]
But in case of struct ,is each element is stored next to each other (in the struct A)?
The compiler may insert padding as it wishes, except before the first item.
In C++03 you were guaranteed increasing addresses of items between access specifiers.
I'm not sure if the access specifier restriction is still there in C++11.
The only thing that is granted is that members are stored in the same order.
Between elements there can be some "padding" the compiler may insert so that each value is aligned with the processor word length.
Different compiler can make different choices also depending on the target platform and can be forced to keep a given alignment by option switches or pragma-s.
Your particular case is "luky" for the most of compiler since int is normally implemented as "the integral that better fits the integer arithmetic of the processor". With this idea, a sequence of int-s is aligned by definition. But that may not be the case, for example if you have
struct test
{
char a;
short b;
long c;
long long d;
};
You can dscovery that (&a)+1 != &b and (&b)+1 != &c or (&b)-1 != &a etc.
What is granted is the progression &a < &b; &b < &c; &c < &d;
Structs members in general are stored in increasing addresses but they are not guaranteed to be contiguous.so elements may not always be contiguous. In the example above, given $base is the base address of the struct the layout will be the following.
a will be stored at $base+0
b will be stored at $base+4
c will be stored at $base+8 ... etc
You can see the typical alignment values at http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86
I have written simple program to show strut elements are next to each other
int main() {
struct S {
int a;
int b;
int c;
};
S s = {1,2,3};
int* p = reinterpret_cast <int *> (&s);
cout<<p[0]<<" "<<p[1]<<" "<<p[2];
return 0;
}
Output : 1,2,3
Remember, [] or *(i+1) are symantic construct, that suits with pointers, not with struct variables directly.
As suggested in Cheers and hth. - Alf's answer, there can be padding, before or after struct elements.
I'm wondering if the arrays a[] and b[] in the code below are contiguous in memory:
int main() {
int a[3];
int b[3];
return 0;
}
a[0],a[1] and a[2] should be contiguous and the same for b, but are there any guarantees on where b will be allocated in relation to a?
If not, is there any way to force a and b to be contiguous with eachother? i.e. - so that they are allocated next to eachother in the stack.
No, C++ makes no guarantee about the location of these variables in memory. One or both may not even be in memory (for example if they are optimised out)!
In order to obtain a relative ordering between the two, at least, you would have to encapsulate them into a struct or class and, even then, there would be padding/alignment to consider.
There's no guarantee that individual variables will be next to each other on the stack.
In your example you could "cheat" and simulate what you're looking for by doing this:
int main()
{
int buffer[6]
int *a = buffer;
int *b = buffer+3;
return 0;
}
Now, when you write to a[4] you'll actually write to b[0].