Does binding a reference actually evaluate the operand? - c++

Consider this code:
int& x=*new int;
Does RHS of the assignment actually dereference the newly-created pointer, leading to UB due to reading uninitialized variable? Or can this be legitimately used to later assign a value like x=5;?

As far as I can tell, none of what you've done involves undefined behavior.
It does, however, immediately create a risk of a memory leak. It can be quickly resolved (since &x would resolve to the address of the leaked memory, and can therefore be deleted) but if you were to leave scope, you would have no way of retrieving that pointer.
Edit: to the point, if you were to write
int& x=*new int;
x = 5;
std::cout << x << std::endl;
std::cin >> x;
std::cout << x << std::endl;
The code would behave as though you had simply declared x as int x;, except that the pointer would also be left dangling after the program exited scope.
You would achieve undefined behavior if you were to attempt to read the uninitialized variable before assigning it a value, but that wouldn't be untrue if x were stack allocated.

I would not advise doing this. Aliasing dynamic memory with a variable that looks static seems dangerous to me. Consider the following code
#include <iostream>
int* dynamicAlloc() {
int& x = *new int;
std::cout << "Inside dynamicAlloc: " << x << std::endl;
std::cout << "Modified: " << (x = 1) << std::endl;
return &x;
}
int main() {
int *dall = dynamicAlloc();;
std::cout << "dall (address): " << dall << std::endl;
std::cout << "dall -> " << *dall << std::endl; // Memory not cleaned
delete dall; // Don't forget to delete
return 0;
}
I get output like this:
Inside dynamicAlloc: -842150451
Modified: 1
dall (address): 00F642A0
dall -> 1
Note how dereferencing dall doesn't result in a segfault. It wasn't deallocated when x fell out of scope. Otherwise a segfault would have likely occurred.
In conclusion:
int& x makes x an alias for the dynamically allocated memory. You can legitimately modify that memory, but it will not cleanup as it falls out of scope automatically easily leading to potential memory leaks.

Related

What's going on, when trying to print uninitialized string

I'm just decided to test malloc and new. Here is a code:
#include <iostream>
#include <string>
struct C
{
int a = 7;
std::string str = "super str";
};
int main()
{
C* c = (C*)malloc(sizeof(C));
std::cout << c->a << "\n";
std::cout << c->str << "\n";
free(c);
std::cout << "\nNew:\n\n";
c = new C();
std::cout << c->a << "\n";
std::cout << c->str << "\n";
}
Why an output of this program stops at std::cout << c->a << "\n";:
-842150451
C:\Code\Temp\ConsoleApplication12\x64\Debug\ConsoleApplication12.exe (process 22636) exited with code 0.
Why does compiler show no errors - I thought, std::string isn't initialized properly in case of malloc, so it should break something.
If I comment out printing of the string, I'm getting a full output:
-842150451
New:
7
super str
C:\Code\Temp\ConsoleApplication12\x64\Debug\ConsoleApplication12.exe (process 21652) exited with code 0.
I use MSVS2022.
You've used malloc. One of the reasons to not do this is that it hasn't actually initialized your object. It's just allocated memory for it. As a result, when accessing member fields, you get undefined behavior.
You have also forgotten to delete the C object you created with new. But you may wish to use a std::unique_ptr in this scenario, to avoid having to explicitly delete the object at all. The smart pointer will automatically free the memory when it goes out of scope at the end of main.
auto c = std::make_unique<C>();
std::cout << c->a << std::endl;
std::cout << c->str << std::endl;
(C*) is an Explicit Conversion. It tells the compiler to shut up and do exactly what it was told to do, no matter how stupid that may be. You don't get an error message because the code explicitly told the compiler not to give you one.
To allow the programmer and compiler the maximum amount of leeway, C++ will not prevent you from doing some pretty insane things. It assumes you know what you are doing and are not trying to shoot yourself.

Force C++ to assign new addresses to arguments

It seems that when I pass different integers directly to a function, C++ assigns them the same address as opposed to assigning different addresses to different values. Is this by design, or an optimization that can be turned off? See the code below for an illustration.
#include <iostream>
const int *funct(const int &x) { return &x; }
int main() {
int a = 3, b = 4;
// different addresses
std::cout << funct(a) << std::endl;
std::cout << funct(b) << std::endl;
// same address
std::cout << funct(3) << std::endl;
std::cout << funct(4) << std::endl;
}
The bigger context of this question is that I am trying to construct a list of pointers to integers that I would add one by one (similar to funct(3)). Since I cannot modify the method definition (similar to funct's), I thought of storing the address of each argument, but they all ended up having the same address.
The function const int *funct(const int &x) takes in a reference that is bound to an int variable.
a and b are int variables, so x can be bound to them, and they will have distinct memory addresses.
Since the function accepts a const reference, that means the compiler will also allow x to be bound to a temporary int variable as well (whereas a non-const reference cannot be bound to a temporary).
When you pass in a numeric literal to x, like funct(3), the compiler creates a temporary int variable to hold the literal value. That temporary variable is valid only for the lifetime of the statement that is making the function call, and then the temporary goes out of scope and is destroyed.
As such, when you are making multiple calls to funct() in separate statements, the compiler is free to reuse the same memory for those temporary variables, eg:
// same address
std::cout << funct(3) << std::endl;
std::cout << funct(4) << std::endl;
Is effectively equivalent to this:
// same address
int temp;
{
temp = 3;
std::cout << funct(temp) << std::endl;
}
{
temp = 4;
std::cout << funct(temp) << std::endl;
}
However, if you make multiple calls to funct() in a single statement, the compiler will be forced to make separate temporary variables, eg:
// different addresses
std::cout << funct(3) << std::endl << funct(4) << std::endl;
Is effectively equivalent to this:
// different addresses
{
int temp1 = 3;
int temp2 = 4;
std::cout << funct(temp1) << std::endl << funct(temp2) << std::endl;
}
Demo
The function
const int *funct(const int &x) { return &x; }
will return the address of whatever x is referencing.
So this will, as you expected, print the address of a:
std::cout << funct(a) << std::endl;
The problem with the expression funct(3) is that it is impossible to make a reference of a constant and pass it as a parameter. A constant doesn't have an address, and therefore for practical reasons C++ doesn't support taking a reference of a constant. What C++ actually does support is making a temporary object, initializing it with the value 3, and taking the reference of that object.
Basically, the compiler will, in this case, translate this:
std::cout << funct(3) << std::endl;
into something equivalent to this:
{
int tmp = 3;
std::cout << funct(tmp) << std::endl;
}
Unless you do something to extend the lifetime of a temporary object, it will go out of scope after the function call (or right before the next sequence point, I am not sure).
Since the temporary created by 3 goes out of scope before you create a temporary from 4, the memory used by the first temporary may be reused for the second temporary.

Does a const variable store the value in the variable?

I was messing around in C++ and wrote this code
#include <iostream>
int main() {
const int x = 10;
const int* ptr = &x;
*((int*)ptr) = 11;
std::cout << "ptr -> " << ptr << " -> " << *((int*)ptr) << std::endl;
std::cout << "x -> " << &x << " -> " << x << std::endl;
}
Interestingly, the output in the console is:
ptr -> 00CFF77C -> 11
x -> 00CFF77C -> 10
I checked the value at the memory address of ptr and the value is 11, so does this imply that const variables store the value in them, and not in memory? And if this were the case, why have a memory address storing the value?
Your code has Undefined Behavior.
x is a compile time constant, so the compiler expects it to not change value at runtime. As such, the compiler is allowed to optimize code by substituting uses of x's value with the literal value 10 at compile time.
So, at the point where << x is used, your compiler is emitting code for << 10 instead.
But, since your code is also using &x, the compiler is required to make sure that x has a memory address accessible at runtime. That means allocating a physical int and initializing its value to 10. Which you are then fiddling with in an unsafe manner, despite the fact that most compilers will likely place that int in read-only memory.
That is why you are seeing the output you see. But you are lucky your code is not crashing outright, trying to assign a new value to a constant at runtime.
The behavior of a program that writes to a const variable is undefined. It could print "10", it could print "11", it could print "3822", it could crash, or it could make demons fly out of your nose.
That said, in this case the behavior makes perfect sense. The compiler knows x is never allowed to change, so it transforms your std::cout << x to basically std::cout << 10.

C++ Why does my code for overwriting const int x = *(&y); work?

Why does my code for overwriting const int variable work? Is it safe?
#include <iostream>
#include <cstring>
using namespace std;
int z = 5;
const int x = *(&z);
int main()
{
cout << "A:" << x << ", " << &x << endl;
int y = 7;
cout << "B:" << y << ", " << &y << endl;
memcpy((int*)&x, &y, sizeof(int));
cout << "C:" << x << ", " << &x << endl;
}
Output would be:
A:5, 0x600f94
B:7, 0x7a7efb68019c
C:7, 0x600f94
I am not sure if this has been asked before since I don't know what to search for in this situation.
Answering your questions:
It's not safe; should never be done, it's example of how not to use C.
It's based on undefined behaviour; that means specification doesn't give exact instruction how such attempt should be treated in the code.
Final answer to question why it works? The answer here is true for the GCC only as other compilers may optimise/treat const different way. You need to understand that technically const int x is a declaration of a variable with qualifier. That means (until it's optimised) it has it's place in the memory (in some circumstances in read-only section). When you removed the qualifier and give the address of the variable to the memcpy() (which is dummy library call unaware of memory protection) it makes attempt to write the new data to that address. If compiler puts that variable into read-only section (I faced that epic failure in the past) the execution on any Unix would end with segmentation fault caused by writing instruction violation memory protection of the read-only memory segment used by your program to hold constant data.
In C++ real constants are qualified by constexpr, however there are other implications.

Address of Stack and Heap in C++

Correction:
I messed up with the concept of pointer address and the address the pointer points to, so the following code has been modified. And now it prints out what I want, variable a, c, i, j, k, p are on the stack, and variable b,d are on the heap. Static and global variables are on another segment. Thanks a lot for all of you!
Well, I know that these two concepts are deeply discussed...but I still have questions for the following code:
#include <iostream>
using namespace std;
class A {
};
int N = 10;
void f(int p) {
int j = 1;
float k = 2.0;
A c;
A* d = new A();
static int l = 23;
static int m = 24;
cout << "&c: " << &c << endl;
cout << "&d: " << d << endl;
cout << "&j: " << &j << endl;
cout << "&k: " << &k << endl;
cout << "&l: " << &l << endl;
cout << "&m: " << &m << endl;
cout << "&p: " << &p << endl;
}
int main() {
int i = 0;
A* a;
A* b = new A();
cout << "&a: " << &a << endl;
cout << "&b: " << b << endl;
cout << "&i: " << &i << endl;
cout << "&N: " << &N << endl;
f(10);
return 0;
}
My result is:
&a: 0x28ff20
&b: 0x7c2990
&i: 0x28ff1c
&N: 0x443000
&c: 0x28fef3
&d: 0x7c0f00
&j: 0x28feec
&k: 0x28fee8
&l: 0x443004
&m: 0x443008
&p: 0x28ff00
This is pretty interesting, coz except the global variable N, and two static variables in function f, which are l and m, the addresses of all the other variables seem to be together. (Note: The code and the results have been modified and not corresponding to what is said here.)
I've searched a lot about stack and heap. The common sense is that, if an object is created by "new", then it is on the heap. And local variables (such as j and k in the above sample) are on stack. But it seems not to be the case in my example. Does it depend on different compilers, or my understanding is wrong?
Thanks a lot for all of you.
Your understanding is wrong. For example, b is a pointer - if you want the address of the object created by new, you need to print out b, not &b. b is a local variable, so it itself (found at &b) is on the stack.
For your example, N, l, and m are presumably somewhere in your executable's data section. As you can see, they have similar addresses. Every other variable you are printing out is on the stack - their addresses are likewise similar to one another. Some of them are pointers pointing to objects allocated from the heap, but none of your printouts would show that.
Your understanding is correct.
local variables are allocated on the stack.
dynamically allocated objects are allocated on the heap.
Though you are taking the address of the local variable consistently in your example.
Example: print out d not the address of d. As d is a local variable (so the address is similar to c) but it is a pointer variable that points at a dynamically allocated object (that is on the heap).
How the compiler implements the stack and heap though will vary.
In modern OS the stack and heap may even share the same area (ie you can implement the stack by allocating chunks in the heap).
If you want to print the address of whatever d points to (in this case it points to an object on the heap), do
cout << "d: " << d << endl;
That will print the value of the pointer, and the value of a pointer is the address of the object it points to.
Your code had
cout << "&d: " << &d << endl;
This prints the address of d , as you defined d inside main, it'll be on the stack, you're printing the address of your pointer. There's the pointer itself, and the object it points to, they're 2 seperate things, with seperate addresses.
The only ones that are not 'together' in your example are l, m and N. They are two statics and one global, therefore they are not allocated on the stack for sure. They aren't from a heap neither, most likely, they are from a .data segment of your module. The only one from a heap should be the address b points to, but you're printing the address of b itself, not what it points to.
The distinction between the two forms, from a pure C++ perspective, is only to do with how lifetime of objects are managed.
From here, good read
Static variables are in the data segment. Also you mix the address of a pointer and it value. For example a:
a is a local variable of type A* on the stack. &a also gives the address where a actaully resieds (on the stack). The value of a is an address of a heap object of type A;
You can't depend on various compilers doing things the same way. For almost every bit of code you will write the distinction between stack and heap are meaningless. Don't worry about it.
One may not safely assume anything about the relative addresses of things in the stack relative to those on the heap nor, for that matter, the relative addresses of any pointers that are not all derived from the same array or allocated (via malloc, calloc, etc.) block. I'm not even sure pointers are required to be rankable.