Why can function pointers be `constexpr`? - c++

How does the compiler know where in memory the square root will be before the program is executed? I thought the address would be different every time the program is executed, but this works:
constexpr double(*fp)(double) = &sqrt;
cout << fp(5.0);
Is it because the address is relative to another address in memory? I don't think so because the value of fp is large: 0x720E1B94.

At compile time, the compiler doesn't know the address of sqrt. However, you cannot do anything at compile time with a constexpr function pointer that would expose that pointer's numeric value. Therefore, a function pointer at compile time can be treated as an opaque value.
And since you can't change a constexpr variable after it has been initialized, every constexpr function pointer can be boiled down to the location of a specific function.
If you did something like this:
using fptr = float(*)(float);
constexpr fptr get_func(int x)
{
return x == 3 ? &sqrtf : &sinf;
}
constexpr fptr ptr = get_func(12);
The compiler can determine exactly which function get_func will return for any particular compile-time value. So get_func(12) reduces to &sinf, and whatever &sinf would compile to is exactly what get_func(12) would compile to.

Address value is assigned by a linker, so the compiler does not know the exact address value.
cout << fp(5.0);
This works because it is evaluated at run time, after the exact address has been resolved.
In general, you cannot use the actual value (address) of a constexpr pointer, because it is not known at compile time.
Bjarne Stroustrup's The C++ Programming Language, 4th edition, mentions:
10.4.5 Address Constant Expressions
The address of a statically allocated object (§6.4.2), such as a global variable, is a constant. However, its value is assigned by the linker, rather than the compiler, so the compiler cannot know the value of such an address constant. That limits the range of constant expressions of pointer and reference type. For example:
constexpr const char* p1 = "asdf";
constexpr const char* p2 = p1; // OK
constexpr const char* p2 = p1+2; // error: the compiler does not know the value of p1
constexpr char c = p1[2]; // OK, c=='d'; the compiler knows the value pointed to by p1

How does the compiler know where in memory the square root will be before the program is executed?
The tool chain gets to decide where it puts the functions.
Is it because the address is relative to another address in memory?
If the produced program is either relocatable or position independent then yes, that's the case. If the program is neither, then the address can even be absolute.
Why would the exact same memory spots be available next time the program is run?
Because the memory space is virtual: each process gets its own virtual address space.

It's simple.
Consider how the compiler knows the address to call in this code:
puts("hey!");
The compiler has no idea where puts is located, and it also doesn't add a runtime lookup for it (that would be rather bad for performance, though it is actually what virtual member functions need to do). The possibility of a different version of the dynamic library being present at run time (not to mention address space layout randomization, even with the exact same library file) ensures that the build-time linker cannot know it either.
So it's up to the dynamic linker to fix up the address when it loads the compiled program. This is called relocation.
The exact same thing happens with your constexpr: the compiler records every place in the code that uses this address in the relocation table, and then the dynamic linker does its job every time the program starts.

Related

How does C++ know how many bytes to copy when returning an object from a function?

This question popped into my head when I saw the following warning.
struct Wrapper {
    char somechar[10];
};
auto returnArray() {
    char somechar[10];
    return somechar; // Warning: address of stack memory returned!
}
auto returnWrapper() {
    Wrapper wpr;
    return wpr; // OK
}
I understand the warning. But I don't get why it goes away when I use the wrapper.
I guess the compiler can figure out how big the class is and copy the right amount of bytes but can't do the same for the pointer (char[])??
Where is the "Size" of the class stored (I guess not in the stack since that would be wasteful, maybe in the data segment?)
EDIT:
I think I phrased the question wrong. I know about sizeof and have some experience with C++. I get that char[] decays to a char*. And I understand that the first example returns a pointer, and the second one returns an object.
What I'm asking is more how the internals of the compiler work.
For the compiler, the memory layout is exactly the same (give or take); what it needs to do is copy the array and return it. It needs to know how many bytes to copy, though. My question was: why is it able to figure that out when using an object, but not with a char[]?
Functions cannot return arrays in C++. This is simply how the language is specified. This is the same in C.
In the first example, the array name decays to a pointer to the first element of the array, and the deduced return type is pointer to char. Pointers can be returned, but this pointer will be invalid, because the automatic storage of the pointed-to array has been deallocated when the function returns.
I don't get why it goes away when I use the wrapper.
When you use the wrapper, you don't return a pointer. You return a copy of the local class instance. The array is stored within the class.
I guess the compiler can figure out how big the class is
The compiler knows how the class has been defined, and therefore it knows the size. You can find out what the compiler knows yourself by using sizeof(Wrapper).
Where is the "Size" of the class stored
It depends on the language implementation. In a typical case, the compiler will generate a CPU instruction such as "adjust the stack pointer by 10", because the compiler knows that the size is 10. It knows the size because it knows the definition.
Where is this definition stored (the symbol table, the data segment, the stack, the text segment)?
None of those. You've stored the definition in the source file. The compiler will parse that and store the information in some internal representation within the memory of the compiler process.
It is the compiler that produces the instructions for the CPU, so only the compiler needs to know the size. The produced program doesn't produce instructions (self-modification is not possible in the C++ language), so it doesn't need to know about type definitions.

How is the type of a pointer implemented in c++?

Pointer types like int*, char*, and float* point to different types. But I have heard that pointers are simply implemented as links to other addresses - then how is this link associated with a type that the compiler can match with the type of the linked address (the variable at this location)?
Types are mostly compile-time things in C++. A variable's type is used at compile time to determine what operations (in other C++ code) do to that variable.
So when you ++ a variable bob of type int*, at run time that maps to a generic pointer-sized integer being increased by sizeof(int).
To a certain extent this is a lie; C++'s behavior is specified in terms of an abstract machine, not a concrete one. The compiler interprets your code as expressing operations on that abstract machine (which doesn't exist), then writes concrete assembly code that realizes those operations (insofar as they are defined) on concrete hardware.
In that abstract machine, int* and double* are not just numbers. If you dereference an int* and write to some memory, then do the same with a double*, and the memory overlaps, in the abstract machine the result is undefined behavior.
In the concrete implementation of that abstract machine, pointers are just numbers, and dereferencing an int* and a double* that hold the same address has quite well-defined behavior.
This difference is important. The compiler is free to assume the abstract machine (where int* and double* are very distinct things) is the only reality that matters. So if you write through an int*, write through a double*, then read back through the int*, the compiler can skip the read-back, because in the abstract machine writing through a double* cannot change the value that an int* points to.
So
int buf[10]={0};
int* a = &buf[0];
double* d = reinterpret_cast<double*>(&buf[0]);
*a = 77;
*d = 3.14; // undefined behavior: writes a double into int storage
std::cout << *a;
the apparent read at std::cout << *a can be skipped by the compiler. Meanwhile, if it actually happened on real hardware, it would read bits generated by the *d write.
When reasoning about C++ you have to think of 3 things at once: what happens at compile time, the abstract machine behavior, and the concrete implementation of your code. In two of these (compile time and abstract machine) int* is implemented differently than float*. At actual run time, int* and float* are both going to be 64- or 32-bit integers in a register or in memory somewhere.
Type checking is done at compile time. The error happens then, or never, excluding cases of RTTI (runtime type information).
RTTI covers things like dynamic_cast, which does not work on pointers to primitives like float* or int*.
At compile time that variable carries with it the fact that it is an int* everywhere it goes. In the abstract machine, ditto. In the concrete compiled output, it has forgotten that it is an int*.
There's no particular "link" at this stage, nor any hidden metadata stored somewhere. Since C and C++ are compiled and eventually produce a standalone executable, the compiler "trusts" the programmer and simply provides a data type that represents a memory address.
If nothing in particular is known about what lives at this address, you can use a void* pointer. If you know that it will be the location of something in particular, you can qualify it with a certain data type like int* or char*. The compiler will then be able to access the object that lies behind it directly, but the address itself is stored the same way, in the same format, in every case.
Note that this qualification exists at compile time only. It disappears completely in the final executable code. The generated code will be produced to handle certain kinds of objects, but nothing will tell you which ones if you disassemble the machine code. You'll have to figure that out yourself.
Variables represent data which is stored in one or more memory cells or "bytes". The compiler will associate this group of bytes with a name and a type when the variable is defined.
The hardware uses a binary number to access a memory cell. This is known as the "address" of the memory cell.
When you store some data in a variable, the compiler will look up the name of the variable and check that the data you want to store is compatible with its type. If it is, it then generates code which will save it in the memory cell(s) at that address.
Since this address is a number, it can itself be stored in a variable. The type of this address variable will be "pointer to T", where T is the type of the data stored in that address.
It is the responsibility of the programmer to make sure that this address variable does correspond to valid data and not some random area of memory. The compiler will not check this for you.

constexpr pointers and memory management in C++

Quoting from C++ Primer:
The address of an object defined outside of any function is a constant expression, and so may be used to initialize a constexpr pointer.
In fact, each time I compile and run the following piece of code:
#include <iostream>
using namespace std;
int a = 1;
int main()
{
constexpr int *p = &a;
cout << "p = " << p << endl;
}
I always get the output:
p = 0x601060
Now, how is that possible? How can the address of an object (global or not) be known at compile time and be assigned to a constexpr? What if that part of the memory is being used for something else when the program is executed?
I always assumed that the memory is managed so that a free portion is allocated when a program is executed, but doesn't matter what particular part of the memory. However, since here we have a constexpr pointer, the program will always require a specific portion, that has to be free to allow the program execution. This doesn't make sense to me, could someone explain this behaviour please? Thanks.
EDIT: After reading your answers and a few articles online, I realized that I missed the whole concept of virtual memory... now it makes sense. It's quite surprising that neither C++ Primer nor Accelerated C++ mention this concept (maybe they will do it in later chapters, I'm still reading...).
However, quoting again C++ Primer:
A constant expression is an expression whose value cannot change and that can be evaluated at compile time.
Given that the linker has a major role in computing the fixed address of global objects, the book would have been more precise if it said "constant expression can be evaluated at link time", not "at compile time".
It's not actually true that the address of an object is known at compile time. What is known at compile time is the offset. When the program is compiled, the address is not emitted into the object file; instead, a marker is emitted that records the offset and the section.
To be simplistic about it, the linker then comes along, measures the size of each section, stitches them together and calculates the address of each marker in each object file now that it has a concrete 'base address' for each section.
Of course it's not quite that simple. A linker can also emit a map of the locations of all these adjusted values in its output, so that a loader or load-time linker can re-adjust them just prior to run time.
The point is, logically, for all intents and purposes, the address is a constant from the program's point of view. It's just that the constant isn't given a value until link/load time. When that value is available, every reference to that constant is overwritten by the linker/loader.
If your question is "why is it always the same address?" It's because your OS uses a standard virtual memory layout layered over the virtual memory manager. Addresses in a process are not real memory addresses - they are logical memory addresses. The piece of silicon at that 'address' is mapped in by the virtual memory management circuitry. Thus each process can use the "same" address, while actually using a different area of the memory chips.
I could go on about paging memory in and out, which is related, but it's a long topic. Further reading is encouraged.
It works because global variables are in static storage.
This is because the space for the global/static variable is allocated at compile time within the binary your compiler generates, in a region next to the program's machine code called the "data" segment. When the binary is copied and loaded into memory, the data segment becomes read-write.
This Wikipedia article includes a nice diagram of where the "data" segment fits into the virtual address space:
https://en.wikipedia.org/wiki/Data_segment
Automatic variables are not stored in the data segment because they may be instantiated as many times as their parent function is called. Moreover, they may be allocated at any depth of the stack. Thus it is not possible to know the address of an automatic variable at compile time in the general case.
This is not the case for global variables, which are clearly unique throughout the lifetime of the program. This allows the compiler to assign a fixed address for the variable which is separate from the stack.

The address of const variable, C++

Recently I was rereading Effective C++ by Scott Meyers (3rd edition). According to Meyers:
"Also, though good compilers won’t set aside storage for const objects of integral types (unless you create a pointer or reference to the object), sloppy compilers may, and you may not be willing to set aside memory for such objects."
Here in my code I can print the address of a const variable, but I have not created a pointer or reference to it. I use Visual Studio 2012.
#include <iostream>
int main()
{
    const int x = 8;
    std::cout<<x<<" "<<&x<<std::endl;
}
The output is:
8 0015F9F4
Can anybody explain the mismatch between the book and my code to me? Or have I made a mistake somewhere?
By using the address-of operator on a variable, you are in fact creating a pointer. The pointer is a temporary object, not a declared variable, but it's very much there.
Furthermore there is a declared variable of pointer type that points to your variable: the argument to the overloaded operator << that you used to print the pointer.
std::cout<<x<<" "<<&x<<std::endl;
You took the address of the variable x, so the compiler has to generate code that sets aside storage for this const object.
By &x, you ODR-used the variable, which makes allocating actual storage for x necessary.
A good compiler (when using optimizations) will try to replace any compile-time constant by its value in your code to avoid making a memory access. However, if you do request the address of a constant (like you do) it can't do the optimization of not allocating memory to it.
However, one important thing to note is that this doesn't mean the search-and-replace wasn't done in your code. As you are not supposed to change the value of the constant, the compiler will assume it is safe to do a "search and replace" on it. If you do change the value with a const_cast, you will get undefined behavior. It tends to work fine if you compile in debug mode but usually fails once your compiler optimizes the code.
In C++, for constants of basic data types, the compiler will keep the value in its symbol table without allocating storage space, whereas const objects of an ADT (Abstract Data Type) or UDT (User Defined Type), which are typically large, need storage space. Some other cases also require storage space, such as constants that are explicitly declared extern or constants whose address is taken.

I'm changing the value of a const variable by accessing the memory location. Why doesn't it work?

I am trying to understand const in c++.
I wrote this following code snippet:
const int x=5;
int *ptr;
ptr=(int*)&x;
cout<<"address of x="<<&x<<endl;
cout<<"value of ptr="<<ptr<<endl;
*ptr=11;
cout<<"*ptr="<<*ptr<<endl;
cout<<"x="<<x;
The output is
address of x=0x28fef8
value of ptr=0x28fef8
*ptr=11
x=5
Since ptr is pointing to x, I was sure the values of *ptr and x would be the same.
Why are the values different?
I understand that x is const; however, I am changing the value at the memory address by doing *ptr.
Please tell me what I am missing.
Your C-style cast removes the constness, making the assignment possible. After that you write to an object declared const. This invokes undefined behavior, and afterwards anything goes. This also means that there is no way to explain the output you are seeing. Most likely the compiler assumed the value never changes and just used constant folding, hence you get x=5, but we will never know for sure.
The take-away: C-style casts are evil and almost never needed.
Official answer (according to the C++ language standard):
Undefined behavior.
Practical answer (depending on compiler implementation):
With a global const int x=5, the variable is allocated in the RO-data section of the executable image.
The result of executing *ptr=11 will therefore typically be an illegal memory access exception at run time.
With a local const int x=5, the variable is allocated on the (read-write) stack.
But since x is constant, the compiler replaces every read of this variable with the value 5.
"const" tells the compiler: if anyone writes code that tries to modify this object/variable, raise a compile error to prevent them from doing so, and by the way, you are free to do any optimization assuming that the value does not change in this context. A C-style cast / const_cast says: I know what I am doing, don't bother throwing errors at me, as I am going to do it anyway. So by doing both you are living dangerously. Sometimes you get away with it, sometimes the system gets you. Whether writes to variables placed in .rodata cause exceptions is platform dependent. If you do not have hardware memory protection and all your code runs from RAM, then you can pretty much write wherever you want, including overwriting code (.text). For me the beauty of const is that it is infectious (others have called it evil or messy for exactly this reason).