Where and how are constants stored? - c++

I read this question and a related one in the c-faq, but I don't understand the exact reason behind the following behaviour:
#include <iostream>
using namespace std;
int main()
{
//const int *p1 = (int*) &(5); //error C2101: '&' on constant
//cout << *p1;
const int five = 5;
const int *p2 = &(five);
cout << *p2 << endl;
char *chPtr = (char*) &("abcde");
for (int i=0; i<4; i++) cout << *(chPtr+i);
cout << endl;
return 0;
}
I was wondering how constants, whether integer or string literals, get stored. My understanding of string literals is that they are created in global static memory when the program starts and persist until it exits. In the case of "abcde", even though I did not give it a variable name, I can take its address (chPtr), and I assume I could dereference chPtr at any time before program termination and the character values would still be there, even if I dereferenced it outside the scope where it was declared. Is the const int variable "five" also placed in global static memory, so that the address p2 can also be dereferenced at any time?
Why can I take the address of "five" but not ask for &(5)? Are the constants "5" and "five" stored differently? And where does "5" get stored in memory?

You cannot take the address of a literal (e.g. &(5)) because the literal is not "stored" anywhere - it is actually written in the assembly instruction. Depending on the platform, you'll get different instructions, but a MIPS64 addition example would look like this:
DADDUI R1, R1, #5
Trying to take the address of the immediate is meaningless as it doesn't reside in (data) memory, but is actually part of the instruction.
If you declare a const int i = 5 and never need its address, the compiler can (and likely will) treat it like a literal and place 5 directly in the appropriate assembly instructions. Once you take the address of i, the compiler can no longer do that everywhere and has to give i a place in memory. This does not work in reverse for a bare literal: 5 is a value, not a named object, so there is nothing for & to refer to and the compiler has never been asked to allocate space for it.
String constants are stored in the static portion of the data memory - which is why you can take the address of them.

"It depends" is probably not a satisfying answer, but it is the correct one. The compiler will store some const variables in the stack if it needs to (such as if you ever take the address of it). However, there has always been the idea of a "constexpr" variable in compilers, even if we didn't always have the mechansim to call it directly: If an expression can be calculated at compile time, then instead of caluclating it at run time, we can calculate it durring compile time. And if we can calculate it at compile time, and we never do anything that requires it to be something different, then we can remove it all together and turn it into a literal, which would be part of the instruction!
Take for example, the following code:
int main(int argc, char** argv)
{
const int a = 2;
const int b = 3;
const int c = a+b;
volatile int d = 6;
volatile int e = c+d;
std::cout << e << std::endl;
return 0;
}
Look at how smart the compiler is:
37 const int a = 2;
38 const int b = 3;
39 const int c = a+b;
40
41 volatile int d = 6;
0x400949 <+0x0009> movl $0x6,0x8(%rsp)
42 volatile int e = c+d;
0x400951 <+0x0011> mov 0x8(%rsp),%eax
0x400955 <+0x0015> add $0x5,%eax
0x400958 <+0x0018> mov %eax,0xc(%rsp)
43
44 std::cout << e << std::endl;
0x400944 <+0x0004> mov $0x601060,%edi
0x40095c <+0x001c> mov 0xc(%rsp),%esi
0x400960 <+0x0020> callq 0x4008d0 <_ZNSolsEi@plt>
45 return 0;
46 }
(volatile tells the compiler not to do fancy memory tricks with that variable.) In line 42, where I use c, the add is done with the LITERAL 0x5, even though c is itself computed from other variables. Lines 37-39 contain NO instructions.
Now let's change the code so that I need the location of a:
int main(int argc, char** argv)
{
const int a = 2;
const int b = 3;
const int c = a+b;
volatile int d = 6;
volatile int e = c+d;
volatile int* f = (int*)&a;
volatile int g = *f;
std::cout << e << std::endl;
std::cout << g << std::endl;
return 0;
}
37 const int a = 2;
0x400955 <+0x0015> movl $0x2,(%rsp)
38 const int b = 3;
39 const int c = a+b;
40
41 volatile int d = 6;
0x400949 <+0x0009> movl $0x6,0x4(%rsp)
42 volatile int e = c+d;
0x400951 <+0x0011> mov 0x4(%rsp),%eax
0x40095c <+0x001c> add $0x5,%eax
0x40095f <+0x001f> mov %eax,0x8(%rsp)
43 volatile int* f = (int*)&a;
44 volatile int g = *f;
0x400963 <+0x0023> mov (%rsp),%eax
0x400966 <+0x0026> mov %eax,0xc(%rsp)
45
46 std::cout << e << std::endl;
0x400944 <+0x0004> mov $0x601060,%edi
0x40096a <+0x002a> mov 0x8(%rsp),%esi
0x40096e <+0x002e> callq 0x4008d0 <_ZNSolsEi@plt>
47 std::cout << g << std::endl;
0x40097b <+0x003b> mov 0xc(%rsp),%esi
0x40097f <+0x003f> mov $0x601060,%edi
0x400984 <+0x0044> callq 0x4008d0 <_ZNSolsEi@plt>
48 return 0;
So we can see that a is initialized in actual memory, on the stack (I can tell because the address is relative to rsp). But wait... c depends on a, yet wherever I use c it is still the literal 5! What is happening here? The compiler knows that a needs a memory location because its address is taken. However, it also knows that a's value is never anything but 2, so wherever a is used in a way that doesn't need the memory, the compiler can substitute the literal 2. In effect, the a in line 37 is not the same as the a in line 43.
So where are const variables stored? They are stored where they NEED to be stored. CRAZY.
(By the way, these were all compiled with g++ -g -O2. Different compilers and flags will optimize differently; this mostly demonstrates what the compiler can do. The only guarantee is that your code will behave correctly.)

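Here's an example that takes the address of a const int and demonstrates that (in gcc on my machine, at least) it's stored as a local variable on the stack, not in global static memory. Note that reading *p in main after func() has returned is undefined behavior, so the last line of the output is only what happened to be there on my machine.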
#include <iostream>
const int *func() {
const int five = 5;
const int *p = &(five);
std::cout << *p << '\n';
return p;
}
// function to overwrite stack values left by earlier function call
int func2(int n, int x) {
for (int i = 0; i < x; ++i)
n *= 2;
return n;
}
int main() {
const int *p = func();
std::cout << func2(2, 10) << '\n';
std::cout << *p << '\n';
return 0;
}
Example output:
5
2048
1

Related

Pointer typecasting c++

#include <iostream>
#define print(x) std::cout << x;
#define println(x) std::cout << x << std::endl;
int main() {
int ex[5];
int* ptr = ex;
for (int i = 0; i < 5; i++) {
ex[i] = 2;
}
ex[2] = 3;
*(int*)((char*)ptr + 8) = 4;
println(ex[2]);
}
In the line with the cast I'm using (char*). When I run println(sizeof(char*)) it says 4 bytes, but my instructor says it's 1 byte long, so we need to add 8 bytes to reach the value in ex[2]. How can this be? I don't understand.
It depends on the architecture you use. By definition char is the type that has size 1, so sizeof(char) always evaluates to 1, although that does not automatically mean a char is 8 bits. Note that sizeof(char*) is something else: it is the size of a pointer to char, which is the 4 bytes you measured.
To reach the next value, add sizeof(int) to the char pointer; that makes your code work independently of the architecture it runs on.
When you work with pointers, the pointer type tells the compiler how much memory the pointed-to value occupies, so adding 1 to the pointer moves it forward by that many units (bytes). So if you cast your int pointer to a char pointer, you must add sizeof(int) to the char pointer to get the same effect as adding 1 to the int pointer. This works because char is 1 unit by definition. With a pointer type whose element size is not 1 it would not work, and apart from the character types there is no architecture-independent specification of the sizes of types.

Why does memcpy to int not work after calling memcpy to bool value

I was playing around with memcpy when I stumbled on a strange result: a memcpy of an int gives an unexpected value when it follows a memcpy of a bool from the same block of memory.
I created a simple test struct that has a bunch of variables of different types. I cast the struct to an unsigned char pointer and then use memcpy to copy data from that pointer into separate variables. I tried playing around with the memcpy offset and moving the int memcpy before the bool one (changing the layout of the test struct so that the int comes before the bool too). Surprisingly, that reordering fixed the problem.
// Simple struct containing 3 floats
struct vector
{
float x;
float y;
float z;
};
// My test struct
struct test2
{
float a;
vector b;
bool c;
int d;
};
int main()
{
// I create my structure on the heap here and assign values
test2* test2ptr = new test2();
test2ptr->a = 50;
test2ptr->b.x = 100;
test2ptr->b.y = 101;
test2ptr->b.z = 102;
test2ptr->c = true;
test2ptr->d = 5;
// Then turn the struct into an array of single bytes
unsigned char* data = (unsigned char*)test2ptr;
// Variable for keeping track of the offset
unsigned int offset = 0;
// Variables that I want the memory copied into
float a;
vector b;
bool c;
int d;
// I copy the memory here in the same order as it is defined in the struct
std::memcpy(&a, data, sizeof(float));
// Add the copied data size in bytes to the offset
offset += sizeof(float);
std::memcpy(&b, data + offset, sizeof(vector));
offset += sizeof(vector);
std::memcpy(&c, data + offset, sizeof(bool));
offset += sizeof(bool);
// It all works until here the results are the same as the ones I assigned
// however the int value becomes 83886080 instead of 5
// moving this above the bool memcpy (and moving the variable in the struct too) fixes the problem
std::memcpy(&d, data + offset, sizeof(int));
offset += sizeof(int);
return 0;
}
So I expected the value of d to be 5; however, it becomes 83886080, which I presume is just random uninitialized memory.
You are ignoring the padding between the members of your struct.
Take a look at the following simplified example:
#include <iostream>
struct X
{
bool b;
int i;
};
int main()
{
X x;
std::cout << "Address of b " << (void*)(&x.b) << std::endl;
std::cout << "Address of i " << (void*)(&x.i) << std::endl;
}
On my PC this prints:
Address of b 0x7ffce023f548
Address of i 0x7ffce023f54c
As you can see, the bool in the struct occupies 4 bytes here even though its content needs less. The compiler adds padding bytes to the struct so that the CPU can access each member directly at an aligned address. If the data were laid out back to back, as your code assumes, the compiler would have to generate extra instructions on every access to cope with the misaligned data, which slows the program down.
You can force the compiler to use that packed layout with #pragma pack or something similar for your compiler. All such pragmas are compiler specific!
In your program, you have to use the address of each member for the memcpy, not the accumulated sizes of the members before it, because that sum ignores the padding bytes.
If I add a pragma pack(1) before my program, the output is:
Address of b 0x7ffd16c79cfb
Address of i 0x7ffd16c79cfc
As you can see, there are no longer any padding bytes between the bool and the int. But the code that accesses i later may be larger and slower, depending on the architecture, so avoid #pragma pack unless you really need it!
You've got the answer you need so I'll not get into detail. I just made an extraction function with logging to make it easier to follow what's happening.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
// Simple struct containing 3 floats
struct vector {
float x;
float y;
float z;
};
// My test struct
struct test2 {
float a;
vector b;
bool c;
int d;
};
template<typename T>
void extract(T& dest, unsigned char* data, size_t& offset) {
std::uintptr_t dp = reinterpret_cast<std::uintptr_t>(data + offset);
size_t align_overstep = dp % alignof(T);
std::cout << "sizeof " << sizeof(T) << " alignof " << alignof(T) << " data "
<< dp << " mod " << align_overstep << "\n";
if(align_overstep) {
size_t missing = alignof(T) - align_overstep;
std::cout << "misaligned - adding " << missing << " to align it again\n";
offset += missing;
}
std::memcpy(&dest, data + offset, sizeof(dest));
offset += sizeof(dest);
}
int main() {
std::cout << std::boolalpha;
// I create my structure on the heap here and assign values
test2* test2ptr = new test2();
test2ptr->a = 50;
test2ptr->b.x = 100;
test2ptr->b.y = 101;
test2ptr->b.z = 102;
test2ptr->c = true;
test2ptr->d = 5;
// Then turn the struct into an array of single bytes
unsigned char* data = reinterpret_cast<unsigned char*>(test2ptr);
// Variable for keeping track of the offset
size_t offset = 0;
// Variables that I want the memory copied into
float a;
vector b;
bool c;
int d;
// I copy the memory here in the same order as it is defined in the struct
extract(a, data, offset);
std::cout << "a " << a << "\n";
extract(b, data, offset);
std::cout << "b.x " << b.x << "\n";
std::cout << "b.y " << b.y << "\n";
std::cout << "b.z " << b.z << "\n";
extract(c, data, offset);
std::cout << "c " << c << "\n";
extract(d, data, offset);
std::cout << "d " << d << "\n";
std::cout << offset << "\n";
delete test2ptr;
}
Possible output
sizeof 4 alignof 4 data 12840560 mod 0
a 50
sizeof 12 alignof 4 data 12840564 mod 0
b.x 100
b.y 101
b.z 102
sizeof 1 alignof 1 data 12840576 mod 0
c true
sizeof 4 alignof 4 data 12840577 mod 1
misaligned - adding 3 to align it again
d 5
24
There are apparently three padding bytes between the bool and the subsequent int. This is allowed by the standard due to alignment considerations (accessing a 4 byte int that is not aligned on a 4 byte boundary may be slow or crash on some systems).
So when you do offset += sizeof(bool), you are not incrementing enough. The int follows 4 bytes after, not 1. The result is that the 5 is not the first byte you read but the last one - you are reading three padding bytes plus the first one from test2ptr->d into d. And it is no coincidence that 83886080 = 2^24 * 5 (the padding bytes were apparently all zeros).

static const cached result

In the following example -
#include <iostream>
int someMethod(){
static int a = 20;
static const int result = a + 1;
++a;
std::cout << " [" << a << "] ";
return result;
}
int main(){
std::cout << someMethod() << "\n";
std::cout << someMethod() << "\n";
std::cout << someMethod() << "\n";
}
The output comes as -
[21] 21
[22] 21
[23] 21
What is the reason that is preventing result value to be modified on subsequent calls made to the same function? I've printed the output of variable a as well, which is certainly being incremented and since it is static as well there must not be multiple copies existing for the same method.
The const here only distracts the reader from the real cause. This code behaves exactly the same:
int someMethod(){
static int a = 20;
static int result = a + 1;
++a;
std::cout << " [" << a << "] ";
return result;
}
The real reason is that the = sign can represent two different operations in C++: an assignment (executed each time) or an initialization (executed only when the variable is created). The difference is whether it is part of the declaration/definition of the variable.
In a normal context (no block-scope static variables), the two are equivalent because an automatic variable is created every time the block in which it is declared is run (or at least the compiler must ensure that everything behaves as if that were the case).
But for a block-scope static variable the initialization occurs only once, so here the variable result is initialized to 21 and its value never changes.
These variants would behave quite differently:
int someMethod(){
static int a = 20;
static int result;
result = a + 1; // assignment: result changes whenever a changes
...
int someMethod(){
static int a = 20;
static const int result = a;
result = a + 1; // Error: modification of a variable declared const
Since result is static, it will be initialized only once at run time. Therefore the following line is executed only once, no matter how many times you call someMethod():
static const int result = a + 1;
static gives the variable static storage duration: it is initialized only once, the first time execution reaches its declaration, and it is typically placed in the program's data segment rather than on the stack.
As explained by others, result is a static variable, so it is initialized only on the first execution of someMethod().
And result is also const, so the first value, assigned with
static const int result = a + 1;
remains unmodified for the rest of the program's execution.
I suppose you were expecting something that is achievable using a reference; if you modify the preceding line as follows
static const int & result = a; // note the &
you bind result to a, so modifying a also changes what result shows, and the output is
[21] 21
[22] 22
[23] 23
The problem is that you can bind a reference to another variable, not to an expression; so you can bind result to a, but not to a+1 (not to the dynamic value of a+1); so the following line
static const int & result = a + 1;
compiles but (if I'm not wrong) binds result to the unnamed temporary that stores the result of the expression a + 1 (evaluated on the first execution of someMethod()), so the output is again
[21] 21
[22] 21
[23] 21

memory location of pointer variable itself?

Is it possible to find the memory location of a pointer variable itself?
i.e. I don't want to know the memory location the pointer is referencing, I want to know what the memory location of the pointer variable is.
int A = 5;
int *k = &A;
cout << k;
will give me the location of A. Does k have a location?
The location of int *k is &k, like so:
// For this example, assume the variables start at 0x00 and are 32 bits each.
int A = 9; // 0x00 = 0x09
int * k = &A; // 0x04 = 0x00
int ** k_2 = &k; // 0x08 = 0x04
// Thus:
cout << "Value of A: " << A; // "Value of A: 9"
cout << "Address of A: " << k; // "Address of A: 0x00"
cout << "Address of k: " << k_2; // "Address of k: 0x04"
assert( A == *k);
assert(&A == k);
assert(&A == *k_2);
assert( A == **k_2);
assert( k == *k_2);
assert(&k == k_2);
A pointer is a variable that stores a memory address, typically 32 bits wide in 32-bit binaries and 64 bits in 64-bit ones. Like any other variable, it has an address of its own, and the syntax for taking it is identical. For most intents and purposes, pointers act like unsigned integers of the same size.
When you take &k, you get an int **, with all the interesting complexities that go with that.
Given int ** k_2 = &k, *k_2 is k and **k_2 is *k, so everything works about how you'd expect.
This can be a useful way to handle object creation across library boundaries (pointers are rather fundamental and mostly safe to pass). Double pointers are commonly used in interfaces where you pass a pointer that will be filled in with an object, without exposing how the object is created (DirectX, for example).
Yes. You can use &k to print out the memory location of k itself.
int A = 5;
int *k = &A;
cout << k;
A's address is k;
k's address is &k.
Yo dawg, we put an address at your address so you can address while you address.

C tips and tricks explanation

Can you explain the mechanism behind the following code samples? (I think I know, but I need a second opinion.)
1)--------------------------
using namespace std;
int * f(int x) {
return &x;
}
int * g(int x, int y) {
return &y;
}
int * h(int x, int y, int z) {
return &z;
}
int main() {
cout << *f(42) << endl;
int * y1 = g(43, 44);
int * y2 = g(45, 46);
cout << *y1 << ", " << *y2 << endl;
int * z1 = h(47, 48, 49);
int * z2 = h(50, 51, 52);
cout << *z1 << ", " << *z2 << endl;
return 0;
}
2)--------------------------
int *a, *b;
void f(int x) {
int i[3];
i[0] = x;
i[1] = x + 1;
i[2] = x + 2;
a = i;
}
void g(int x) {
int i[3];
i[0] = x;
i[1] = x + 1;
i[2] = x + 2;
b = i;
}
int main() {
f(1);
printf("a = {%d,%d,%d}\n", a[0], a[1], a[2]);
g(2);
printf("a = {%d,%d,%d}\n", a[0], a[1], a[2]);
}
3)--------------------------
int main() {
char * hello = "hello, world!" + 3;
char * charstring = 'h' + "ello, world!";
printf("hello=%s, charstring=%s.\n", hello, charstring);
return 0;
}
Thank you.
I would expect those programs to crash or do other weird things when you run them.
Example 1: The functions f, g and h return the memory addresses of their arguments. Those arguments are stored on the stack, and when the functions return, the stack is unwound and the addresses are no longer valid. You could get lucky and the value will still be there, but the program could just as well crash or give you some random value instead of the one you passed to the function.
Example 2: The functions f and g set the global variables a and b to the addresses of local variables declared in the functions. Just like in the first example, those local variables are gone when the functions return, leaving a and b pointing to something invalid.
Example 3: This is doing weird pointer arithmetic. hello points to the start of the text plus 3 characters, so you get "lo, world!" printed for it (pointer arithmetic on a char pointer moves in steps of one character, so this part is well defined). The case with charstring is similar, only here you add 'h' (ASCII value 104, so you're adding 104 to the pointer). That is far past the end of the string, which is undefined behavior: it may print garbage or, quite possibly, crash the program.
I think it's a little easier for a beginner to understand these concepts if you explain step by step what is happening in the background.
1.
cout << *f(42) << endl; // Call f with the value 42
int * f(int x) { // Push an integer, x, on the stack (x = 42)
return &x; // Return a pointer to var x
} // Pop x off the stack
// The pointer now refers to stack memory that no longer belongs to x.
// Dereferencing it (as cout does here) is undefined behavior: it may
// print a stale value, print garbage, or crash
2.
f(1); // Call f with the value 1
void f(int x) { // Push an integer, x, on the stack (x = 1)
int i[3]; // Declare a local array of 3 ints (local! stack!)
i[0] = x; // Define values of the array
a = i; // Set a to the address of the first element of i
} // i is now out of scope, and since it was declared locally,
// rather than allocated with malloc (or new in C++), it was on the stack
// and has now been popped off, so a points to memory that later
// calls (such as g) are free to reuse and overwrite
3.
char * hello = "hello, world!" + 3; // The literal is an array of characters
// that decays to a pointer to its first
// character. Adding 3 moves the pointer
// three characters past the first one,
// so hello points to "lo, world!".
char * charstring = 'h' + "ello, world!"; // Again the literal decays to a pointer
// to the start of "ello, world!". Adding
// 'h' does not prepend a character: it
// shifts the pointer by 104 characters,
// the ASCII value of 'h', which is far
// past the end of the array (undefined
// behavior).