constexpr pointers and memory management in C++

constexpr pointers and memory management in C++ - c++

Quoting from C++ Primer:
The address of an object defined outside of any function is a constant expression, and so may be used to initialize a constexpr pointer.
In fact, each time I compile and run the following piece of code:
#include <iostream>
using namespace std;
int a = 1;
int main()
{
constexpr int *p = &a;
cout << "p = " << p << endl;
}
I always get the output:
p = 0x601060
Now, how is that possible? How can the address of an object (global or not) be known at compile time and be assigned to a constexpr? What if that part of the memory is being used for something else when the program is executed?
I always assumed that the memory is managed so that a free portion is allocated when a program is executed, but doesn't matter what particular part of the memory. However, since here we have a constexpr pointer, the program will always require a specific portion, that has to be free to allow the program execution. This doesn't make sense to me, could someone explain this behaviour please? Thanks.
EDIT: After reading your answers and a few articles online, I realized that I missed the whole concept of virtual memory... now it makes sense. It's quite surprising that neither C++ Primer nor Accelerated C++ mention this concept (maybe they will do it in later chapters, I'm still reading...).
However, quoting again C++ Primer:
A constant expression is an expression whose value cannot change and that can be evaluated at compile time.
Given that the linker has a major role in computing the fixed address of global objects, the book would have been more precise if it said "constant expression can be evaluated at link time", not "at compile time".

It's not actually true that the address of an object is known at compile time. What is known at compile time is the offset. When the program is compiled, the address is not emitted into the object file, but a marker to indicate the offset and the section.
To be simplistic about it, the linker then comes along, measures the size of each section, stitches them together and calculates the address of each marker in each object file now that it has a concrete 'base address' for each section.
Of course it's not quite that simple. A linker can also emit a map of the locations of all these adjusted values in its output, so that a loader or load-time linker can re-adjust them just prior to run time.
The point is, logically, for all intents and purposes, the address is a constant from the program's point of view. It's just that the constant isn't given a value until link/load time. When that value is available, every reference to that constant is overwritten by the linker/loader.
If your question is "why is it always the same address?" It's because your OS uses a standard virtual memory layout layered over the virtual memory manager. Addresses in a process are not real memory addresses - they are logical memory addresses. The piece of silicon at that 'address' is mapped in by the virtual memory management circuitry. Thus each process can use the "same" address, while actually using a different area of the memory chips.
I could go on about paging memory in and out, which is related, but it's a long topic. Further reading is encouraged.

It works because global variables are in static storage.

This is because the space for the global/static variable is allocated at compile time within the binary your compiler generates, in a region next to the program's machine code called the "data" segment. When the binary is copied and loaded into memory, the data segment becomes read-write.
This Wikipedia article includes a nice diagram of where the "data" segment fits into the virtual address space:
https://en.wikipedia.org/wiki/Data_segment
Automatic variables are not stored in the data segment because they may be instantiated as many times as their parent function is called. Moreover, they may be allocated at any depth of the stack. Thus it is not possible to know the address of an automatic variable at compile time in the general case.
This is not the case for global variables, which are clearly unique throughout the lifetime of the program. This allows the compiler to assign a fixed address for the variable which is separate from the stack.

Related

How to set a constexpr pointer to a physical Address

In embedded programming you often need to set pointers that point to a physical address. The address is non relocatable and fixed. These are not set by the linker as typically they represent registers or in this case calibration data located at a predetermined address in OPT memory. This data is set when the device is first tested in production by the chip manufacturer.
so the first attempt was:
static constexpr uint16_t *T30_CAL = reinterpret_cast<uint16_t *>(0x1FFFF7B8u);
But that leads to following warning / error under GCC and is 'illegal' according to the standard (c++ 14) .
..xyz/xxxx/calibration.cpp:23:40: error: reinterpret_cast from integer to pointer
Now i can fudge it by
constexpr uint32_t T30_ADDR = 0x1FFFF7B8u;
static constexpr inline uint16_t *T30_CAL(){
return reinterpret_cast<uint16_t *>(T30_ADDR);
}
which compiles without warnings but ......
I suppose GCC can optionally compile this to a function instead of a constexpr, though it does inline this every time.
Is there a simpler and more standards compliant way of doing this ?
For embedded code these definitions are required all the time, so it would be nice if there was a simple way of doing this that does not require function definitions.
The answers to the previous questions generally resulted in a answer that says this is not allowed in the standard and leaves it at that.
That is not really what I want. I need to get a compliant way of using C++ to generate compile time constant pointers to a fixed address. I want to do it without using Macros as that sprinkles my code with casts that cause problems with compliance checkers. It results in the need to get compliance exceptions in multiple places rather then one. Each exception is a process and takes time and effort.
Constexpr guarantees, on embedded systems, that the constant is placed in .text section (flash) whilst const does not. It may be placed in valuable ram and initialised by the .bss startup code. Typically embedded devices have much more flash then RAM. Also the code to access variables in RAM is often much more inefficient as it typically involves at least two memory access on embedded targets such as ARM. One to load the variable's RAM address and the second to load the actual constant pointer value from the variable's location. Constexpr results in the constant pointer being coded directly into the instruction stream or results in a single constant load.
If this was just a single instance it would not be an issue, but you generally have many different peripherals each controlled via there own register sets and then this becomes a problem.
A lot of the embedded code ends up reading and writing peripheral registers.

Use this instead:
static uint16_t * const T30_CAL = reinterpret_cast<uint16_t *>(0x1FFFF7B8u);
GCC will store T30_CAL in flash on an ARM target, not RAM. The point is that the 'const' must come after the '*' because it is T30_CAL that is const, not what T30_CAL points to.

As was already pointed out in the comments: reinterpret_cast is not allowed in a constant expression This is because the compiler has to be able to evaluate constexpr at compile time, but the reinterpret_cast might use runtime instructions to do its job.
You already suggested to use macros. This seems a fine way for me, because the compiler will definitely not produce any overhead. However, I would not suggest using your second way of hiding the reinterpret_cast, because as you said, a function is generated. This function will likely take way more memory away than an additional pointer.
In any case, the most reasonable way seems to me, to just declare a const pointer. As soon as you use optimizations, the compiler will just insert the memory location into your executable instead of using a variable. (See https://godbolt.org/g/8KnUKg )

Address of variable needs to be loaded into memory?

I have an issue - with the following code I am trying to find out what is stored at a certain address and how long my static variable is stored at this specific position. (I read that static variables are stored infinitely and was quite surprised - wanted to test if this was true).
The code defines a static variable (its address on my system is 0x1000020c0 - This is probably rather random but was continuously the case)
If I now want to find out what integer value is stored at this address I have to first print out the address with $number, which then gives 0x1000020c0. The recasting/reinterpreting of the address (0x1000020c0) gives 100 only! if the address was printed before or if I use &number in the reinterpreting/recasting.
Can someone explain why this is the case?
int static number = 100;
// std::cout << number << std::endl; <- prints 100
// prints 100 and the address 0x1000020c0 in my case
// std::cout << number << " " << &number << std::endl;
// this does not work unless &number is printed previously
// std::cout << "Value is : " << *reinterpret_cast<int*>(0x1000020c0) << std::endl;
// this does work and show the correct value (100)
std::cout << "Value is : " << *reinterpret_cast<int*>(&number) << std::endl;

In any given program, the object might, or might not be stored in the address 0x1000020c0. There are no guarantees either way. The address of the object is decided at compile (or possibly at link) time. A change to the program can change the address.
If you never take the address of the local static object, and never modify it, the compiler may optimize the variable away, so that no memory is used. If the object doesn't exist at all, then it definitely doesn't exist at the memory location 0x1000020c0.
If you use the object in a way that requires the object to exist, it will be in some memory location. Taking the address of the object usually triggers such requirement. This is strikingly similar to observer effect in physics.
If you dereference a pointer which does not point to an object (of appropriate type), the behaviour is undefined.
when I print recast/reinterpret the value that is at 0x1000020c0 it prints nothing
As I explained above, the object is not guaranteed to exist at the memory location 0x1000020c0.
even though the object was used since i printed its value via std::cout << number;
Accessing the value of an object doesn't necessarily require the object to exist. The compiler may be able to prove that the value of the object is 100, so it can store that value as a constant and not store static object at all.
Besides, even if the static object did exist, it wouldn't necessarily exist in the address 0x1000020c0, unless you take the address and observe it to be so.
As a consequence: Don't ever cast an arbitrary number to a pointer (unless you work on some embedded platform that has hardcoded memory mappings). Seeing that the address of an object in one program is 0x1000020c0, doesn't make 0x1000020c0 non-arbitrary in another program.

Assuming specific address of number is not the best idea.
In C++, static inside a function body is created at first invocation or when first time C++ program flow encounters the variable. They are never created if never used.
If possible, a compiler may choose to optimize the static and replace it with the value.

You're missing one very major statement here - the platform you're running this code on.
A static variable isn't stored "infinitely"; it's stored in that location for the duration of the program execution. If you're running on an embedded platform where your code jumps to main() at power-up, then you don't have anything else which will get in the way. If you're running on any other platform (Windows, Linux or whatever), that location becomes free when the program completes, and is immediately available for anything else to use.
It's possible that if you run your program, it completes, and you run it again, then maybe you'll get the same chunk of memory for your program to run in. In that case you'll get the same addresses for static variables. If something else has asked for a chunk of memory between your first run finishing and the next run starting (e.g. Chrome needed a bit more space for the pictures you were browsing), then your program won't be given the same chunk of memory and the variable won't be in the same place.
It gets more fun for DLLs. The same kind of rules apply, except they apply for the duration of the DLL being loaded instead of for the duration of program execution. A DLL could be loaded on startup and stay loaded all the way through, or it could be loaded and unloaded by applications as needed.
All this means that you're making some very strange assumptions. If you get the address of a static variable in your program, and then your program checks the contents of that address, you'll always get whatever's in that static variable. That's how static variables work. If you run your code twice, you'll be getting the address of the location for that static variable at your next run, as set up by your program when you run it that second time. In between runs of your program, that address is 100% free for anything else to use.
As others have already pointed out, after this you may also be seeing effects of compiler optimisation in the specific behaviour you're asking about. But the reason you're asking about this specific behaviour is that you seem to have misunderstood something fundamental to how static variables work.

Why can function pointers be `constexpr`?

How does the compiler know where in memory the square root will be before the program is executed? I thought the address would be different everytime the program is executed, but this works:
constexpr double(*fp)(double) = &sqrt;
cout << fp(5.0);
Is it because the address is relative to another address in memory? I don't think so because the value of fp is large: 0x720E1B94.

At compile time, the compiler doesn't know the address of sqrt. However, you cannot do anything at compile time with a constexpr function pointer that would allow you to access that pointer's address. Therefore, a function pointer at compile time can be treated as an opaque value.
And since you can't change a constexpr variable after it has been initialized, every constexpr function pointer can be boiled down to the location of a specific function.
If you did something like this:
using fptr = float(*)(float);
constexpr fptr get_func(int x)
{
return x == 3 ? &sqrtf : &sinf;
}
constexpr fptr ptr = get_func(12);
The compiler can detect exactly which function get_func will return for any particular compile time value. So get_func(12) reduces down to &sinf. So whatever &sinf would compile to is exactly what get_func(12) would compile to.

Address value is assigned by a linker, so the compiler does not know the exact address value.
cout << fp(5.0);
This works because it is evaluated at run-time after exact address has been resolved.
In general, you cannot use the actual value (address) of constexpr pointer because it is not known at compile-time.
Bjarne Stroustrup's C++ Programming language 4th edition mentions:
10.4.5 Address Constant Expressions
The address of a statically allocated object (§6.4.2), such as a global variable, is a constant. However, its value is assigned by the linker, rather than the compiler, so the compiler cannot know the value of such an address constant. That limits the range of constant expressions of pointer and reference type. For example:
constexpr const char∗ p1 = "asdf";
constexpr const char∗ p2 = p1; // OK
constexpr const char∗ p2 = p1+2; // error : the compiler does not know the value of p1
constexpr char c = p1[2]; // OK, c==’d’; the compiler knows the value pointed to by p1

How does the compiler know where in memory the square root will be before the program is executed?
The tool chain gets to decide where it puts the functions.
Is it because the address is relative to another address in memory?
If the produced program is either relocatable or position independent then yes, that's the case. If the program is neither, then the address can even be absolute.
Why would the exact same memory spots be available next time the program is run?
Because the memory space is virtual.

It's simple.
Consider how compiler knows the address to call in this code:
puts("hey!");
Compiler has no idea of the location of puts, and it also doesn't add a runtime lookup for it (that'd be rather bad for performance, though it is actually what virtual methods of classes need to do). The possibility of having a different version of dynamic library at runtime (not to mention address space layout randomization even if it is the exact same library file) makes sure the build time toolchain linker doesn't know it either.
So it's up to the dynamic linker to fix the address, when it starts the compiled binary program. This is called relocation.
Exact same thing happens with your constexpr: compiler adds every place in the code using this address to the relocation table, and then dynamic linker does its job every time the program starts.

does a variable consume memory in addition to just its content (e.g. type, location)?

Quite likely this has been asked/answered before, but not sure how to phrase it best, a link to a previously answered question would be great.
If you define something like
char myChar = 'a';
I understand that this will take up one byte in memory (depending on implementation and assuming no unicode and so on, the actual number is unimportant).
But I would assume the compiler/computer would also need to keep a table of variable types, addresses (i.e. pointers), and possibly more. Otherwise it would have the memory reserved, but would not be able to do anything with it. So that's already at least a few more bytes of memory consumed per variable.
Is this a correct picture of what happens, or am I misunderstanding what happens when a program gets compiled/executed? And if the above is correct, is it more to do with compilation, or execution?

The compiler will keep track of the properties of a variable - its name, lifetime, type, scope, etc. This information will exist in memory only during compilation. Once the program has been compiled and the program is executed, however, all that is left is the object itself. There is no type information at run-time (except if you use RTTI, then there will be some, but only because you required it for your program to function - such as is required for dynamic_casting).
Everything that happens in the code that accesses the object has been compiled into a form that treats it exactly as a single byte (because it's a char). The address that the object is located at can only be known at run-time anyway. However, variables with automatic storage duration (like local variables), are typically located simply by some fixed offset from the current stack frame. That offset is hard-baked into the executable.

Wether a variable contains extra information depends on the type of the variable and your compiler options. If you use RTTI, extra information is stored. If you compile with debug information then there will also extra overhead be added.
For native datatypes like your example of char there is usually no overhead, unless you have structs which also can cotnain padding bytes. If you define classes, there may be a virtual table associated with your class. However, if you dynamically allocate memory, then there usually will be some overhead along with your allocated memory.
Somtimes a variable may not even exist, because the optimizer realizes that there is no storage needed for it, and it can wrap it up in a register.
So in total, you can not rely on counting your used variables and sum their size up to calculate the amount of memory it requires because there is not neccessarily a 1:1: relation.

Some types can be detected in compile type, say in this code:
void foo(char c) {...}
it is obvious what type of variable c in compile time is.
In case of inheritance you cannot know the real type of the variable in the compile type, like:
void draw(Drawable* drawable); // where drawable can be Circle, Line etc.
But C++ compiler can help to determine the type of the Drawable using dynamic_cast. In this case it uses pointer to a virtual method tables, associated with an object to determine the real type.

const variable in c++

these are the some silly question ..i want to ask..please help me to comprehend it
const int i=100; //1
///some code
long add=(long)&i; //2
Doubt:for the above code..will compiler first go through the whole code
for deciding whether memory should be allocated or not..or first it ll store the
variable in read only memory place and then..allocate stroage as well at 2
doubt:why taking address of variable enforce compiler to store variable on memory..even
though rom or register too have address

In your code example, add contains the address, not the value, of i. I believe you may have thought that i was not stored in normal memory unless/until you take its address. This is not the case.
const does not mean the value is stored in ROM. It is stored in normal memory (often the stack) just like any other variable. const means the compiler will go to some lengths to prevent you from modifying the value.
const is not, and was never intended, to be some sort of security mechanism. If you obtain the address of the memory and want to modify it, you can do so. Of course this is almost always a bad idea, but if you really need to do it, it is possible.

I never wrote a compiler implementing this, but I think that it would be simple to just handle the variable as a normal variable but using the constant value where the variable value is used and using the address of the variable if the address is used.
If at the end of the scope of the variable no one took the address then I can just drop it instead of doing a real allocation because for all other uses the constant value has been used instead of compiling a variable loading operation.

constant values (not the only use for const, but the one used here) are not 'stored in normal memory' (nor in ROM, of course). the compiler simply uses the value (100 in this case) whenever the code uses the variable.
Of course, if the value isn't stored anywhere, there's no meaning of an address for the constant.
Other uses of const are stored in 'normal memory', and you can take their address, but the result is a 'pointer to const value', so it's (in principle) unusable for modification of the value. A hard cast would of course change that, so they trigger a nasty compiler warning.
also, remember that the C/C++ compiler operates totally at compile time (by definition!), it's nothing unusual that some use at a later part affects the code generation of an early part.
A very obvious example is the declaration of stack variables: the compiler has to take into account all the variables declared at any given level to be able to generate the stack allocation at the block entry.

I am a little confused about what you are asking but looking at your code:
i = 100 with a address of 0x?????????????
add = whatever the address is stored as a long int

There is no (dynamic) memory allocation in this code. The two local variables are created on stack. The address of i is taken and brutally cast into long, which is then assigned to the second variable.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js