Hello guys) I would love to get an answer for this question:
Let me share a small portion of code I have written in C++:
#include <iostream>
using namespace std;
int main() {
int* ptr;
int var = 7;
ptr = &var;
cout << "var address:" << &var << endl;
int &ref = var;
cout << "ref address:" << &ref << endl;
cout << "Var value: " << var << endl;
cout << "ref value: " << ref << endl;
return 0;
}
After writing this block of code, I realized that var and ref have the same address, and that they hold the same value too. What I don't get is how many variables the OS actually allocated memory for: did it allocate memory for ref and var separately, or did it allocate only one spot in memory for both? If so, does that mean the spot in memory has two names/identifiers (ref and var)?
I have tried to understand what is actually going on in memory and how the memory gets allocated. I would also like to know how the memory system is actually designed, since I cannot work this out just by drawing box/pointer diagrams.
References are not objects and don't have all the basic properties of objects, such as storage or size. There may be no storage to take the address of. If you try to get the address or size of a reference, you will instead get the address or the size of the referred object.
From https://en.cppreference.com/w/cpp/language/reference :
References are not objects; they do not necessarily occupy storage, although the compiler may allocate storage if it is necessary to implement the desired semantics
So a reference might occupy addressable storage, but only when the implementation needs that storage to implement reference semantics (such as by using a pointer in the background). But this is an implementation detail. It is mostly hidden from the developer, and even then trying to get its address will still give a pointer to the referred object instead.
Note that pointers are objects. They occupy storage, have a size and their address can be obtained with &, just like an int. This is why you observe different behavior with references and with pointers.
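To make the difference concrete, here is a minimal sketch (the function names are made up for illustration). It checks that & and sizeof applied to a reference report the target's address and size, while a pointer is an object with an address of its own.

```cpp
#include <cassert>

// True when the reference reports the same address and size as its target.
bool reference_aliases_target() {
    int var = 7;
    int& ref = var;  // another name for var, not a separate object
    return &ref == &var && sizeof(ref) == sizeof(var);
}

// True when the pointer has an address of its own, distinct from its target's.
bool pointer_is_separate_object() {
    int var = 7;
    int* ptr = &var;  // ptr is an object that stores var's address
    assert(*ptr == var);
    // Compare as void* because &ptr is int** while &var is int*.
    return static_cast<void*>(&ptr) != static_cast<void*>(&var);
}
```

Both functions should return true on any conforming implementation, since these properties come from the language rules rather than from a particular compiler.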
In general, references are glorified pointers that auto-dereference on use, can't be changed to point to a different object after being created, and aren't allowed to be null.
In simple cases like this one, the reference will almost surely be optimized away, and every access to ref will be replaced directly with var.
But in the general case they occupy storage like pointers do, though you can't easily get the address of that storage, because applying & to a reference returns the address of the target object.
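One place where the hidden storage becomes visible is a reference stored as a class member. The sketch below uses a hypothetical RefHolder type; on common implementations sizeof(RefHolder) comes out equal to sizeof(int*), but that is an implementation detail and not something the standard promises.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper whose only member is a reference.
struct RefHolder {
    int& ref;  // typically implemented as a hidden pointer
};

// Size of the wrapper; the exact value depends on the implementation.
std::size_t holder_size() { return sizeof(RefHolder); }

// &h.ref still yields the target's address, never the hidden storage.
bool member_ref_aliases_target() {
    int target = 42;
    RefHolder h{target};
    assert(h.ref == 42);
    return &h.ref == &target;
}
```

So even when the reference clearly takes up space inside the struct, there is no way to name that space through the reference itself.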
Sometimes, I see that there is a mix of concepts between the duration of the storage and where does this occur. That is because sometimes I've seen the following statement:
int i; // This is in the stack!
int* j = new int; // This is in the heap!
But is this really true 100% of the time? Does C++ ensure where the storage takes place? Or, is it decided by the compiler?
Is the location of the storage independent from the duration?
For example, taking those two snippets:
void something()
{
int i;
std::cout << "i is " << i << std::endl;
}
vs:
void something()
{
int* i = new int;
std::cout << "i is " << i << std::endl;
delete i;
}
Both are more or less equivalent regarding the lifetime of i, which is created at the beginning and deleted at the end of the block. Here the compiler could just use the stack (I don't know!), and the opposite could happen too:
void something()
{
int n[100000000]; // Man this is big
}
vs:
void something()
{
int* n = new int[100000000];
delete[] n; // arrays allocated with new[] must be released with delete[]
}
Those two cases should be in the heap to avoid a stack overflow (or at least that is what I've been told so far...). Does the compiler also take that into account, besides the storage duration?
Is the location of the storage independent from the duration?
A0: Duration specifies expected/required behavior.
A1: The standard does not specify how that is implemented.
A2: The standard does not even require a heap or stack!
void something()
{
int i;
std::cout << "i is " << i << std::endl;
}
void something()
{
int* i = new int;
std::cout << "i is " << i << std::endl;
delete i;
}
In the first example you have "automatic" storage duration and the second case is "dynamic" storage duration. The difference is that "automatic" will always be destroyed at the end of scope while the second will only be destroyed if the delete is executed.
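A small sketch can make the two durations visible. Tracer is a hypothetical helper (not from the question) that appends a marker to a log when it is constructed and when it is destroyed.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: logs "<id>+" on construction and "<id>-" on destruction.
struct Tracer {
    std::string& log;
    char id;
    Tracer(std::string& l, char c) : log(l), id(c) { log += id; log += '+'; }
    ~Tracer() { log += id; log += '-'; }
};

std::string trace_durations() {
    std::string log;
    Tracer* dynamic_obj = nullptr;
    {
        Tracer automatic_obj(log, 'a');      // automatic: destroyed at the closing brace
        dynamic_obj = new Tracer(log, 'd');  // dynamic: lives until delete is executed
    }
    assert(log == "a+d+a-");  // only the automatic object is gone at this point
    log += '|';
    delete dynamic_obj;       // the dynamic object is destroyed here
    return log;
}
```

Calling trace_durations() returns "a+d+a-|d-": the automatic object is destroyed at the end of its scope, while the dynamic one survives until the delete. Nothing in this says where either object is stored.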
Where the objects are created is not specified by the standard and is left completely to the implementation.
On implementations that use an underlying stack, that would be the easy implementation choice for the first example, but it is not a requirement. The implementation could just as well ask the OS for dynamic memory for the space required for the integer and still behave as the standard defines, as long as the code to release the memory is also planted and executed when the object goes out of scope.
Conversely, the easy way to implement dynamic storage duration (second example) is to allocate memory from the runtime and then release it (assuming your implementation has this ability) when you hit the delete. But this is not a requirement. If the compiler can prove that there are no exceptions and that you will always hit the delete, then it could just as easily put the object on the stack and destroy it normally. NOTE: If the compiler determines that the object is always leaked, it could still put it on the stack and simply not destroy it when it goes out of scope (that is a perfectly valid implementation).
The second set of examples adds some complications:
Code:
int n[100000000]; // Man this is big
This is indeed very large. Some implementations may not be able to support this on a stack (the stack frame size may be limited by the OS or hardware or compiler).
A perfectly valid implementation is to dynamically allocate the memory for this and ensure that the memory is released when the object goes out of scope.
Another implementation is to pre-allocate the memory not on the stack but in the BSS segment (going from memory here: this is a region of the program image reserved for zero-initialized static data). That works as long as it implements the expected behavior of calling any destructors at the end of scope (int does not have a destructor, which makes that easy).
Does C++ ensure where the storage takes place? Or it is decided by the compiler?
When you declare a variable like:
int i;
It has automatic storage. It could indeed be on the stack, but it's also common to just allocate a register for it, if enough registers are available. Theoretically it is also valid for the compiler to allocate heap memory for this variable, but in practice this does not happen.
When you use new, it is actually up to the standard library to allocate the memory for you. By default, it will use the heap. In theory it could also allocate the memory on the stack, but of course this would normally be the wrong thing to do, as any stack storage disappears when you return from the function where you called new.
In fact, new is just an operator, like +, and you can overload it. Typically, you would overload it inside a class, but you can also overload the global new operator (and similarly, the delete operator), and have it allocate storage from wherever you want.
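As a rough illustration of overloading new inside a class, here is a sketch with a hypothetical Counted type. Its operator new still forwards to std::malloc, but it could hand out memory from any other source, such as a pre-allocated pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type that counts how many of its objects are currently allocated.
struct Counted {
    int value = 0;
    static int live_allocations;

    // Class-specific allocation function, used by `new Counted`.
    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc{};
        ++live_allocations;
        return p;
    }
    // Matching deallocation function, used by `delete`.
    static void operator delete(void* p) noexcept {
        if (p) --live_allocations;
        std::free(p);
    }
};
int Counted::live_allocations = 0;

// Returns the counter value observed while one object is alive.
int allocations_while_alive() {
    Counted* c = new Counted;  // calls Counted::operator new
    int during = Counted::live_allocations;
    delete c;                  // calls Counted::operator delete
    assert(Counted::live_allocations == during - 1);
    return during;
}
```

The code that writes new Counted does not change at all; only the class decides where the bytes come from.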
Is the location of the storage independent from the duration?
In principle yes, but in practice automatic variables, whose lifetime ends when their function returns, are placed on the stack, whereas data you allocate with new is usually intended to outlive the function that called it, and that goes on the heap.
Those two cases should be in the heap to avoid a stack overflow (or at least that is what I've been told so far...). Does the compiler also take that into account, besides the storage duration?
GCC and Clang never use heap allocation for variables with automatic storage as far as I can tell, regardless of their size. So you have to either use new and delete yourself, or use a container that manages the storage for you. Some containers, like std::string, will avoid heap allocations if you only store a small number of elements in them.
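For the large-array case, a container is usually the least error-prone option. The sketch below uses std::vector; the function name and the fill value are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sums `count` ones held in a buffer that is too large to risk on the stack.
long long sum_large_buffer(std::size_t count) {
    std::vector<int> n(count, 1);  // elements live in dynamically allocated storage
    assert(n.size() == count);
    return std::accumulate(n.begin(), n.end(), 0LL);
}  // the vector's destructor releases the storage here, no delete[] needed
```

The vector object n itself has automatic storage duration and is small; only its elements are allocated dynamically, and they are released automatically at the end of the scope.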
I am playing around with boost scoped pointers and I don't understand this behaviour:
#include <iostream>
#include <boost/scoped_ptr.hpp>
int main()
{
boost::scoped_ptr<int> p{new int{1}};
std::cout << &p << '\n';
p.reset(new int {2});
std::cout << &p << '\n';
return 0;
}
I get the following output:
0x7fff5fbff650
0x7fff5fbff650
Shouldn't the reset function change the address pointed by p?
This is the case if I use a scoped array instead of a scoped pointer and print the address of the first element in the code above.
When you do
std::cout << &p << '\n';
you are getting the address of p, not what p points to. To get that you need
std::cout << static_cast<void*>(p.get()) << '\n';
The static_cast<void*>() is not strictly needed in this example, since streaming any pointer other than a char*/const char* prints the address it holds, but I added it just to be safe.
You're taking the address of the scoped_ptr called p. There's only one of them!
If you'd printed &*p or p.get() instead (though prefer (void*)p.get() for sanity) then you'd be printing the address of the thing it currently points to.
This address will always change, because you create the second object (using new) slightly before the first one is destroyed, and objects cannot share addresses.
If you'd done a .reset() first, though, then you may or may not see this address changing, depending on what the innards of new did; objects don't have to have addresses unique to the lifetime of your program, as long as they don't share the address of another object that still exists! However, even then, in practice, to be honest, I'd be surprised if the second dynamically-allocated int wound up at the same address as the first.
You are printing the address of the object p, which is a boost::scoped_ptr.
You should use p.get() to get the address of the managed object.
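The same experiment can be sketched without Boost, assuming std::unique_ptr as a stand-in for boost::scoped_ptr (both provide get() and reset()). The ResetObservation struct is a hypothetical helper for returning the results.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical record of what was observed around the reset() call.
struct ResetObservation {
    bool handle_address_same;      // did &p stay the same?
    bool pointee_address_changed;  // did p.get() change?
    int new_value;
};

ResetObservation observe_reset() {
    std::unique_ptr<int> p{new int{1}};
    assert(*p == 1);
    const void* handle_before = &p;
    // Store the old address as an integer so it can be compared after the int is freed.
    auto pointee_before = reinterpret_cast<std::uintptr_t>(p.get());
    p.reset(new int{2});  // the new int exists before the old one is deleted
    auto pointee_after = reinterpret_cast<std::uintptr_t>(p.get());
    return {&p == handle_before, pointee_after != pointee_before, *p};
}
```

The address of the smart pointer itself never moves, while the address it holds changes because the second int is created while the first one still exists.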
As far as I know, std::move (same as static_cast<T&&>) casts a variable to an rvalue and assigns it to an lvalue, and because of this I think that in the following code:
int a = 1;
int b = static_cast<int&&>(a);
b and a have the same address, but in VS, it prints different addresses.
int a = 1;
int b = static_cast<int&&>(a);
cout << hex << &a << endl;
cout << hex << &b << endl;
If a and b still occupy different memory locations after this, what is the benefit of using std::move in this case?
Just because you "move" them doesn't mean they will share the same address. Moving a value is a high-level abstraction; for basic types like int, moving and copying are exactly the same, which is what is happening here. I suggest you read the excellent post on std::move to learn what it does and what its uses are.
No, b is its own object, which is copy initialized from an rvalue reference to another int. This is the same as just copying the referenced object.
Move semantics only shine when the "copying" can be performed by resource stealing (since we know the other object's storage is about to go away anyway).
For a type like an integer, it's still a plain copy.
There is no benefit of using std::move on an int. In your example you are basically copying the value from a to b.
Move semantics are only meaningful for resources whose ownership you want to transfer, e.g. dynamically allocated memory. Take std::unique_ptr as an example.
auto ptr = std::make_unique<int>(1);
auto ptrCopy = ptr; // copy will not work compilation error.
auto ptrMove = std::move(ptr);
In the above example ptrMove has taken over ownership of the object that ptr managed, and ptr is now empty.
When you use std::move in C++ you do not move the object itself, you move the value of the object or its contents. So its address does not change.
Moving is no different from copying with an int. But for a complex type with internal pointers to allocated memory, that memory can be transferred without copying using a std::move (assuming it has been designed to respond to std::move).
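Here is a short sketch of these points, with made-up function names. It contrasts moving an int, binding an rvalue reference, and moving a std::vector, whose heap buffer changes owner while both vector objects stay at their own addresses.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// For int, "moving" is a copy: b is a separate object with its own address.
bool int_move_is_copy() {
    int a = 1;
    int b = std::move(a);
    return &a != &b && a == 1 && b == 1;
}

// Binding an rvalue reference creates no new object, so the addresses match.
bool rvalue_ref_aliases() {
    int a = 1;
    int&& r = std::move(a);
    return &r == &a;
}

// For std::vector, the heap buffer is handed over instead of being copied.
bool vector_move_transfers_buffer() {
    std::vector<int> a(1000, 7);
    const int* buffer = a.data();
    std::vector<int> b = std::move(a);  // b takes over a's buffer
    assert(b.size() == 1000);
    return b.data() == buffer && &a != &b;
}
```

All three functions should return true. The only case where addresses coincide is the reference, because it is the only case where no second object is created.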
I have a code snippet that looks as follows.
I assume this is a bad way to return a pointer since I am returning a local reference. What is good practice, returning DbTable copy or pointer DbTable *?
DbTable * Catalog::addTable(PartitionScheme &partScheme, BoundBases &bounds, std::vector<int> &colsPartitioned, const size_t defaultMaxFragmentSize, const TupleDesc &tupleDesc , std::string tableName){
// some code ...
DbTable * dbTable = new DbTable(tableId, basePath, defaultMaxFragmentSize, tupleDesc, partScheme, bounds, colsPartitioned);
cout << "adding dbTable with name: " << tableName << " and table Id " << tableId << endl;
// some code ...
return dbTable;
}
Unless there is a valid reason not to, prefer to return an object over a pointer allocated on heap.
Advantages of returning an object
Performance. It takes more time to allocate memory from heap.
Fewer programming errors. With a pointer you have to deal with memory management issues: make sure the pointer returned is valid, make sure the allocated memory is deallocated, make sure that the object is not deallocated behind your back leaving you with a dangling pointer, etc.
When does it make sense to return a pointer?
The sizes of your objects are large. Passing them around and keeping multiple copies would be expensive, both in the memory usage and performance.
You have a comprehensive system in place to manage life time of objects -- where they are allocated, who manages them while they are alive, and who manages their deallocation.
You have an application in which there is a deep hierarchy of objects and many functions take as input pointers to base classes but rely on the polymorphic behavior of the objects to work correctly.
I suspect there are many other reasons that support both use cases. I just listed a few that jumped out in my mind.
When returning a pointer from a function, if the pointer points to memory that is allocated locally (not on the heap), that memory is released as soon as the function ends. So in that case, returning the pointer is not good practice.
In your case, since the memory is allocated on the heap, you can return the pointer.
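As a sketch of the two options, assuming a heavily simplified, hypothetical Table type in place of the DbTable from the question: one factory returns by value, the other returns an owning std::unique_ptr so the caller cannot forget the delete.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the DbTable in the question.
struct Table {
    int id;
    std::string name;
};

// Return by value: no manual memory management; the copy is usually elided or moved.
Table make_table(int id, std::string name) {
    return Table{id, std::move(name)};
}

// Return an owning smart pointer when the object has to live on the heap.
std::unique_ptr<Table> make_heap_table(int id, std::string name) {
    auto table = std::unique_ptr<Table>(new Table{id, std::move(name)});
    assert(table->id == id);
    return table;  // ownership moves to the caller
}
```

With the unique_ptr version the ownership question is answered by the return type: whoever holds the pointer owns the table, and it is destroyed when that holder goes away.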
#include <list>
#include <iostream>
struct Foo
{
Foo(int a):m_a(a)
{}
~Foo()
{
std::cout << "Foo destructor" << std::endl;
}
int m_a;
};
int main( )
{
std::list<Foo> a;
Foo b(10);
std::cout << &b << std::endl;
a.push_back(b);
Foo* c = &(*a.begin());
std::cout << c << std::endl;
a.erase(a.begin());
std::cout << a.size() << std::endl;
c->m_a = 20;
std::cout << c->m_a << std::endl;
std::cout << b.m_a << std::endl;
}
The result is:
0x7fff9920ee70
0x1036020
Foo destructor
0
20
10
Foo destructor
I usually think that after I erase an object from a list, I can't access the member variables of that object any more. But in the code above I can still access c->m_a after I have erased the object that c points to. Why?
With Foo* c = &(*a.begin()); you have created a pointer to an object which you then intentionally destroy (via the erase()). However, the memory for the object is still there (since this is a very simple application, and that memory has not been reused for something else yet).
So you effectively use memory which is not yours anymore to use.
Well, data integrity is only guaranteed as long as that part of the memory is still allocated to you (whether on the stack or on the heap using new/malloc).
What happens when you access your data after you have freed it is undefined: the language makes no guarantee at all. The most efficient way of freeing memory is to simply mark it as available, leaving the old data in place until a later allocation (for example through malloc or new) reuses that memory and overwrites it. This is how most implementations handle it.
C++ does not check whether the memory you are reading or writing still belongs to a live object. The operating system only steps in, with a segmentation fault, when your program touches memory that the process has no access to at all.
In your case you will free the memory and then you check for its value immediately. C++ will happily execute your code. Since you only freed it recently, the odds are very high that your data is still there (but certainly not guaranteed: sooner or later it will get overwritten).
Welcome to the wild world of pointers. What you have here is a case of a dangling pointer (read the wiki article, it explains it in detail).
Basically, what happened here is that after removing the item from the list, the pointer c became a dangling pointer (a pointer to a memory location which is no longer occupied by a Foo object). C++ will still let you read and write through this pointer, but the effects are completely non-deterministic (meaning anything can happen). Because the code is so simple, you just got lucky during testing (or unlucky, as these kinds of problems become very difficult and dangerous once they have gone unnoticed for a while).
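One way to stay out of trouble is to copy out whatever you need before erasing, and never to touch pointers to the element afterwards. A minimal sketch, using a stripped-down Foo and made-up function names:

```cpp
#include <cassert>
#include <list>

// Stripped-down version of the Foo from the question.
struct Foo { int m_a; };

// Copies the value out while the element is still alive, then erases it.
int read_before_erase() {
    std::list<Foo> a;
    a.push_back(Foo{10});
    int saved = a.begin()->m_a;  // safe: the element still exists
    a.erase(a.begin());          // element destroyed; pointers to it now dangle
    assert(a.empty());
    return saved;
}

// The list stores its own copy, so changing it never affects the original b.
bool list_copy_is_independent() {
    std::list<Foo> a;
    Foo b{10};
    a.push_back(b);      // copies b into a list node on the heap
    a.front().m_a = 20;  // modifies the copy only
    return b.m_a == 10 && a.front().m_a == 20 && &a.front() != &b;
}
```

This also shows why b.m_a was still 10 in the original output: push_back copied b, so the list element and b were always two different objects at two different addresses.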