Got to thinking about the this pointer (from what I can tell it's not a pointer per se but rather an expression resulting in the address of the object) and started to wonder about what "this" actually refers to when an object is created and destroyed within a function scope? So not created using the "new" operator. So something like this:
void Foo()
{
SomeObject o;
}
What exactly happens when an object is created as described above and what happens with "this" when it is?
this is a pointer to that object, within the scope of its member functions.
Every object has an address, no matter how it was allocated or what its storage duration is. So, whether or not you used new is irrelevant.
You will find, though, that the address of dynamically allocated objects is numerically distant from the address of other ones, because they're typically stored in different places in virtual memory (your "heap" vs "stack" nomenclature).
Pointers are not limited to manually allocated memory; they can point to any part of memory, including regions that are not meant to hold variables, such as the code segment, where the machine instructions to be executed are stored.
You can think of a pointer as a large index into the computer's RAM, and of RAM as a big array of bytes.
When you declare an object, as in your example, the compiler reserves memory for it somewhere. That memory has its own address (the big index mentioned above), and we can use it like any other memory address.
So, in your case, if you declare:
SomeObject O;
...then the "this" pointer has the same value as a manually declared pointer like this:
SomeObject O;
SomeObject *MyThis = &O;
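To make that concrete, here is a minimal sketch that compares this with &O for an object created without new. The self() member and the function name are made up for illustration; only SomeObject comes from the question.

```cpp
#include <cassert>

// Hypothetical class body; only the name SomeObject comes from the question.
class SomeObject {
public:
    // Returns the value of `this` as seen inside a member function.
    const SomeObject* self() const { return this; }
};

// Returns true when `this` equals the address taken with & on an
// automatic object, and also when the object lives in dynamic storage.
bool thisMatchesAddressOf() {
    SomeObject O;                              // automatic storage, no new involved
    bool sameForAutomatic = (O.self() == &O);

    SomeObject* p = new SomeObject();          // dynamic storage
    bool sameForDynamic = (p->self() == p);
    delete p;

    assert(sameForAutomatic && sameForDynamic);
    return sameForAutomatic && sameForDynamic;
}
```

How the object was allocated makes no difference: inside a member function, this is simply the address of the object the function was called on.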
Related
This is an educational question:
If I have created a class
class bank_account
And in the main function, I declared
bank_account *pointer = new bank_account();
Then I am initializing variables such as follows
(*pointer).account_name ="Random Name";
My confusion is about what is happening here, because I usually declare an object with a NAME, not a pointer. If that object is a pointer, and a pointer is just some variable which holds an address to a variable, what does it mean if a pointer is declared as an object and what it is actually representing? Is the pointer to an object referring to an invisible object?
and a pointer is just some variable which holds an address to a variable
Correction: A pointer can point at any object; Not necessarily a variable. Variables have names. There can be objects that are not directly named by a variable such as sub-objects, temporaries, and objects in dynamic storage.
In your program for example, the expression new bank_account() creates an object in dynamic storage.
What does it mean if a pointer is declared as an object
It's really unclear what you mean by "declared as an object". If you declare a pointer to have the type bank_account*, it means that it can point at an object of type bank_account, which happens to be a class.
If you declare a variable to have a pointer type, then the object named by the variable is a pointer.
and what it is actually representing?
A pointer represents the address of an object. Besides containing an address of an object, it can also have the null pointer value (which points to no object) or it can have an invalid value (an address that may have contained an object, but that object no longer exists).
Then I am initializing variables such as follows
(*pointer).account_name ="Random Name";
To be pedantic, this technically does not initialise a variable. Initialisation is performed on objects when they are created. This member variable has been created earlier and this expression assigns a value to it. But if the variable is previously uninitialised, then colloquially speaking, it would not be terribly wrong to talk about initialisation.
when I declare an object as pointer what is the pointer pointing to?
In your example program, pointer points to an object that was created in dynamic storage, using the keyword new.
In general, pointer points at some object whose address is stored in the pointer, or a pointer might not point at an object at all (invalid, or null value).
You said an object is created
Yes. The new-expression creates an object in dynamic storage.
but I declared a pointer
Yes. You did.
so the pointer is pointing to the object?
You've initialised the value of the pointer with the result of the new-expression. The pointer points at the object that was created in dynamic storage.
and what is the name of that object then?
Objects do not have names. However: Variables do have names, and variables are associated with an object, so one could colloquially say that those objects associated with a variable have a name. But objects in dynamic storage are not named by a variable.
A pointer is a variable which contains the address of another variable. Any pointer takes up the space in memory needed to hold that address; on 64-bit platforms this is usually 8 bytes.
When you create a class object, it also resides in memory and occupies as many bytes as it needs. The pointer is assigned the address of this class object.
bank_account *pointer = new bank_account();
The above declares a pointer to the object of type bank_account. new allocates space for the object in the memory and returns its address. It also calls a constructor of the class. The address returned by the new gets assigned to the pointer variable named pointer. Later you can use it to access the object as
(*pointer).account_name ="Random Name";
or equivalently
pointer->account_name ="Random Name";
pointer is just an address. The pointer type is syntactic sugar that lets the compiler do its job and give you useful information about your program.
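A short sketch of the two access syntaxes side by side, assuming a minimal bank_account with a single std::string member (the real class from the question is not shown, so this layout and the function name are guesses):

```cpp
#include <cassert>
#include <string>

// Minimal stand-in for the question's class; only account_name is taken
// from the question, the rest is assumed.
struct bank_account {
    std::string account_name;
};

// Writes and reads the same member through both syntaxes.
std::string nameViaBothSyntaxes() {
    bank_account* pointer = new bank_account();      // unnamed object in dynamic storage
    (*pointer).account_name = "Random Name";         // dereference, then member access
    assert(pointer->account_name == "Random Name");  // -> reaches the same member
    pointer->account_name = "Another Name";
    std::string result = (*pointer).account_name;    // sees the write made through ->
    delete pointer;                                  // we created it with new, so we delete it
    return result;
}
```

Both forms reach the same member of the same object; -> is just shorter to type.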
A pointer is a variable that holds a memory address, and it exists whether or not it points to something meaningful. You can declare a pointer without creating an object for it to point to; it can simply hold nullptr, which generally signifies that the object does not exist at that moment. This alone is useful: you can use it as a placeholder or to keep track of the program's state.
Another property is that it can point to an array of objects, so you may use it to create a dynamic number of objects at once instead of just one or a predetermined number of objects.
But the most important property is that an object you create with new does not belong to a particular scope: when the function ends, it is not automatically deleted. The object can be created in a subroutine and then exist for the rest of the program's life, or until you delete it. To pass the object around, you only need to pass its pointer, which is a small piece of data compared with copying the whole object; this can matter a lot for performance.
You have to watch out for memory leaks, though. Since the object is not deleted automatically, you have to delete it yourself when it is no longer needed; otherwise, the longer the program runs, the more memory it will use, until it runs out.
You can also have multiple pointers pointing to the same place, which is very useful when iterating through linked lists, arrays and all sorts of structures, so a pointer's purpose doesn't necessarily have to be that of holding a specific object, but that of a tool to browse data in memory efficiently.
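As a rough illustration of these points, the sketch below creates nodes inside a helper function, walks them with a second "cursor" pointer, and deletes them by hand. Node and all function names are invented for the example.

```cpp
#include <cassert>

// Hypothetical singly linked list node.
struct Node {
    int value;
    Node* next;
};

// Creates a node in dynamic storage; it outlives this function call.
Node* makeNode(int value, Node* next) {
    return new Node{value, next};
}

// Walks the list with a second pointer without changing `head`.
int sumList(const Node* head) {
    int total = 0;
    for (const Node* cursor = head; cursor != nullptr; cursor = cursor->next) {
        total += cursor->value;
    }
    return total;
}

// Nothing deletes the nodes automatically, so we do it ourselves.
void destroyList(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}

// Builds the list 1 -> 2 -> 3, sums it, and frees it.
int buildSumAndFree() {
    Node* head = makeNode(1, makeNode(2, makeNode(3, nullptr)));
    int total = sumList(head);
    destroyList(head);
    return total;
}
```

If destroyList were never called, the three nodes would stay allocated until the program exits, which is exactly the leak described above.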
Suppose that you have the following function:
void doSomething(){
int *data = new int[100];
}
Why will this produce a memory leak? Since I can not access this variable outside the function, why doesn't the compiler call delete by itself every time a call to this function ends?
Why will this produce a memory leak?
Because you're responsible for deleting anything you create with new.
Why doesn't the compiler call delete by itself every time a call to this function ends?
Often, the compiler can't tell whether or not you still have a pointer to the allocated object. For example:
void doSomething(){
int *data = new int[100];
doSomethingElse(data);
}
Does doSomethingElse just use the pointer during the function call (in which case we still want to delete the array here)? Does it store a copy of the pointer for use later (in which case we don't want to delete it yet)? The compiler has no way of knowing; it's up to you to tell it. Rather than making a complicated, error-prone rule (like "you must delete it unless you can figure out that the compiler must know that there are no other references to it"), the rule is kept simple: you must delete it.
Luckily, we can do better than juggling a raw pointer and trying to delete it at the right time. The principle of RAII allows objects to take ownership of allocated resources, and automatically release them when their destructor is called as they go out of scope. Containers allow dynamic objects to be maintained within a single scope, and copied if needed; smart pointers allow ownership to be moved or shared between scopes. In this case, a simple container will give us a dynamic array:
void doSomething(){
std::vector<int> data(100);
} // automatically deallocated
Of course, for a small fixed-size array like this, you might as well just make it automatic:
void doSomething(){
int data[100];
} // all automatic variables are deallocated
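For the case where ownership really does have to move between scopes, here is a small sketch using std::unique_ptr. The function names are made up; the point is only that the array is freed exactly once, by whichever scope ends up owning it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hands ownership of a dynamic array to the caller.
std::unique_ptr<int[]> makeData(std::size_t count) {
    std::unique_ptr<int[]> data(new int[count]());  // elements start at 0
    return data;
}

// Takes ownership; the array is freed when `data` goes out of scope here.
int consumeFirst(std::unique_ptr<int[]> data) {
    return data[0];
}

// Returns true if the value survived the move and the source was emptied.
bool ownershipMoves() {
    std::unique_ptr<int[]> data = makeData(100);
    data[0] = 42;
    int first = consumeFirst(std::move(data));  // ownership moves into the callee
    // After the move, `data` holds nullptr, so nothing is deleted twice.
    return first == 42 && data == nullptr;
}
```

No delete[] appears anywhere, and the compiler does not have to guess who still holds the pointer: the type says so.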
The whole point of dynamic allocation like this is that you manually control the lifetime of the allocated object. It is needed and appropriate a lot less often than many people seem to think. In fact, I cannot think of a valid use of new[].
It is indeed a good idea to let the compiler handle the lifetime of objects. This is called RAII. In this particular case, you should use std::vector<int> data(100). Now the memory gets freed automatically as soon as data goes out of scope in any way.
Actually, you can access this memory outside of your doSomething() function; you just need to know the address that the pointer holds.
In the picture below, each rectangle is one memory cell; the address of the cell is shown above the rectangle and its value inside it.
For example, in the picture above, if you know that the allocated array starts at address 0x200, then outside of your function you can do the following:
int *data = (int *)0x200;
std::cout << data[0];
So the compiler can't delete this memory for you, but please pay attention to the compiler warning:
main.cpp: In function ‘void doSomething()’:
main.cpp:3:10: warning: unused variable ‘data’ [-Wunused-variable]
int *data = new int[100];
When you use new, memory is allocated for you (ultimately obtained from the OS), so you need to signal when you no longer need it by calling delete. Only you know when delete should be executed, not the compiler.
This is memory allocated on the heap, and is therefore available outside the function to anything which has its address. The compiler can't be expected to know what you intend to do with heap-allocated memory, and it will not deduce at compile time the need to free the memory by (for example) tracking whether anything has an active reference to the memory. There is also no automatic run-time garbage collection as in Java. In C++ the responsibility is on the programmer to free all memory allocated on the heap.
See also: Why doesn't C++ have a garbage collector?
Here you have an int array dynamically allocated on the heap with a pointer data pointing to the first element of your array.
Since you explicitly allocated your array on the heap with new, it will not be removed from memory when you go out of scope. When you go out of scope, the pointer itself is destroyed, and you no longer have access to the array -> memory leak.
Since I can not access this variable outside the function, why doesn't
the compiler call delete by itself every time a call to this function
ends?
Because you should not allocate memory on the heap that you wouldn't use anyway in the first place.
That's how C++ works.
EDIT:
Also, if the compiler deleted the pointed-to memory after the function returns, there would be no way of returning a pointer from a function.
This is a basic question of which I can't find any answer.
Given the next code, a memory leak will occur:
int main(){
A* a = new A();
// 1
}
//2
Let's say that a got the value 1000. That is, the address 1000 on the heap is now taken by an A object. At 1, a == 1000, and at 2, a is out of scope. But some information is missing.
In real life, the address 1000 is the address of a byte in memory. This byte carries no information saying that it stores something valuable.
My questions:
who keeps this information?
how is this information kept?
which component "knows" from where to where the pointer a points to? How can the computer know that a points to sizeof(A) bytes?
Thanks!
This information is kept in your program, in the variable a
The compiler knows this while compiling. At run time, only the allocator knows that "at this particular address sizeof(A) bytes are reserved", and you can't use that information; you're simply expected to treat these bytes as if they contained an A.
The language standard doesn't say.
All we know is that if we do delete a, the memory is released again.
There are several options, like allocating everything that is sizeof(A) bytes from a certain memory pool with addresses 1000 to 1000+x. Or someone (the language runtime or the OS) can keep a table somewhere. Or something else.
Typically new and delete operators are implemented on top of malloc and free, though this detail is unspecified. malloc and free both point to a data structure which tracks which regions of memory are allocated, which are not, and how big each region is. Knuth's Art of Computer Programming Vol 1 has a pretty good description of a few allocator designs.
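To give a feel for the "keep a table somewhere" idea, here is a toy sketch in which class A supplies its own operator new and operator delete and records each allocation in a map. This is only an illustration: real allocators usually keep their bookkeeping in headers next to the block or in size-segregated pools, and allocationTable and the contents of A are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>

// Toy bookkeeping table: address -> number of bytes reserved there.
std::map<void*, std::size_t>& allocationTable() {
    static std::map<void*, std::size_t> table;
    return table;
}

struct A {
    int payload[4];  // arbitrary contents, just to give A a size

    // Class-specific allocation: get raw bytes, then remember their size.
    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (p == nullptr) throw std::bad_alloc();
        allocationTable()[p] = size;
        return p;
    }

    // Class-specific deallocation: forget the entry, then release the bytes.
    static void operator delete(void* p) noexcept {
        allocationTable().erase(p);
        std::free(p);
    }
};

// Returns true if the table knew about the object while it was alive
// and forgot it again after delete.
bool tableTracksAllocation() {
    A* a = new A();
    bool recorded = allocationTable().count(a) == 1 &&
                    allocationTable()[a] == sizeof(A);
    delete a;
    return recorded && allocationTable().empty();
}
```

The pointer a itself still carries nothing but the address; the size lives in the allocator's bookkeeping, and the type A* lives only in the compiler's head.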
The memory address is stored in the variable a.
It's simply kept as a value on the stack like any other local variable. Just a plain old number that refers to a memory location that stores an A object.
The compiler knows that the type of a is A*. So it knows that the object at the location held in a should be an A object. That's why you get problems if you have other objects at that location. The generated machine code will act as though an A object is there, so if there isn't (perhaps there is a derived class there) then it must be built to be able to survive it (virtual function tables in the case of a derived class).
Once a local pointer goes out of scope, it will be taken off the stack. If that was the only pointer to the object on the heap, then you've lost any way to delete it. That's why you get a memory leak. Suddenly you have no idea where the object is any more because you lost your only pointer.
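One way to watch this happen is to count live objects. The sketch below is an assumption-laden demo: Tracked and its counter are invented, and the raw-pointer case smuggles out a copy of the pointer only so the demo can clean up after itself.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that counts how many instances are currently alive.
struct Tracked {
    static int liveCount;
    Tracked() { ++liveCount; }
    ~Tracked() { --liveCount; }
};
int Tracked::liveCount = 0;

// The local pointer dies at the closing brace, but the object does not.
int liveAfterRawPointerScope(Tracked*& rescue) {
    {
        Tracked* a = new Tracked();
        rescue = a;  // demo only; without this copy the object would be leaked
    }                // `a` is gone here, the Tracked object is still alive
    return Tracked::liveCount;
}

// A unique_ptr deletes the object when the pointer goes out of scope.
int liveAfterUniquePtrScope() {
    {
        std::unique_ptr<Tracked> a(new Tracked());
    }                // destructor of `a` runs delete for us
    return Tracked::liveCount;
}

// Returns true if the raw pointer left one object alive and the
// unique_ptr left none.
bool leakDemo() {
    Tracked* rescue = nullptr;
    int afterRaw = liveAfterRawPointerScope(rescue);
    delete rescue;   // clean up what the raw pointer left behind
    int afterSmart = liveAfterUniquePtrScope();
    return afterRaw == 1 && afterSmart == 0;
}
```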
Q1. In Java, all objects, arrays and class variables are stored on the heap? Is the same true for C++? Is data segment a part of Heap?
What about the following code in C++?
class MyClass{
private:
static int counter;
static int number;
};
MyClass::number = 100;
Q2. As far as my understanding goes, variables which are given a specific value by compiler are stored in data segment, and uninitialized global and static variables are stored in BSS (Block started by symbol). In this case, MyClass::counter being static is initialized to zero by the compiler and so it is stored at BSS and MyClass::number which is initialized to 100 is stored in the data segment. Am I correct in making the conclusion?
Q3. Consider following piece of codes:
void doHello(MyClass &localObj){
// 3.1 localObj is a reference parameter, where will this get stored in Heap or Stack?
// do something
}
void doHelloAgain(MyClass localObj){
// 3.2 localObj is a parameter, where will this get stored in Heap or Stack?
// do something
}
int main(){
MyClass *a = new MyClass(); // stored in heap
MyClass localObj;
// 3.3 Where is this stored in heap or stack?
doHello(localObj);
doHelloAgain(localObj);
}
I hope I have made my questions clear to all
EDIT:
Please refer this article for some understanding on BSS
EDIT1: Changed the class name from MyInstance to MyClass as it was a poor name. Sincere Apologies
EDIT2: Changed the class member variable number from non-static to static
This is somewhat simplified but mostly accurate to the best of my knowledge.
In Java, all objects are allocated on the heap (including all your member variables). Most other stuff (parameters) are references, and the references themselves are stored on the stack along with native types (ints, longs, etc) except string which is more of an object than a native type.
In C++, if you were to allocate all objects with the "new" keyword it would be pretty much the same situation as java, but there is one unique case in C++ because you can allocate objects on the stack instead (you don't always have to use "new").
Also note that Java's heap performance is closer to C's stack performance than C's heap performance, the garbage collector does some pretty smart stuff. It's still not quite as good as stack, but much better than a heap. This is necessary since Java can't allocate objects on the stack.
Q1
Java also stores variables on the stack but class instances are allocated on the heap. In C++ you are free to allocate your class instances either on the stack or on the heap. By using the new keyword you allocate the instance on the heap.
The data segment is not part of the heap, but is allocated when the process starts. The heap is used for dynamic memory allocations, while the data segment is static and its contents are known at compile time.
The BSS segment is simply an optimization where all the data belonging to the data segment (e.g. strings, constant numbers etc.) that is not initialized or is initialized to zero is moved to the BSS segment. The data segment has to be embedded into the executable, and by moving "all the zeros" to the end they can be removed from the executable. When the executable is loaded, the BSS segment is allocated and initialized to zero, and the compiler is still able to know the addresses of the various buffers, variables etc. inside the BSS segment.
Q2
MyClass::number is stored where the instance of MyClass class is allocated. It could be either on the heap or on the stack. Notice in Q3 how a points to an instance of MyClass allocated on the heap while localObj is allocated on the stack. Thus a->number is located on the heap while localObj.number is located on the stack.
As MyClass::number is an instance variable you cannot assign it like this:
MyClass::number = 100;
However, you can assign MyClass::counter as it is static (except that it is private):
MyClass::counter = 100;
Q3
When you call doHello the variable localObj (in main) is passed by reference. The variable localObj in doHello refers back to that variable on the stack. If you change it the changes will be stored on the stack where localObj in main is allocated.
When you call doHelloAgain the variable localObj (in main) is copied onto the stack. Inside doHelloAgain the variable localObj is allocated on the stack and only exists for the duration of the call.
In C++, objects may be allocated on the stack...for example, localObj in your Q3 main routine.
I sense some confusion about classes versus instances. "MyInstance" makes more sense as a variable name than a class name. In your Q1 example, "number" is present in each object of type MyInstance. "counter" is shared by all instances. "MyInstance::counter = 100" is a valid assignment, but "MyInstance::number = 100" is not, because you haven't specified which object should have its "number" member assigned to.
Q1. In Java, all objects, arrays and
class variables are stored on the
heap? Is the same true for C++? Is
data segment a part of Heap?
No, the data section is separate from the heap. Basically, the data section is allocated at load time, everything there has a fixed location after that. In addition, objects can be allocated on the stack.
The only time objects are on the heap is if you use the new keyword, or if you use something from the malloc family of functions.
Q2. As far as my understanding goes,
variables which are given a specific
value by compiler are stored in data
segment, and uninitialized global and
static variables are stored in BSS
(Block started by symbol). In this
case, MyInstance::counter being static
is initialized to zero by the compiler
and so it is stored at BSS and
MyInstance::number which is
initialized to 100 is stored in the
data segment. Am I correct in making
the conclusion?
Yes, your understanding of the BSS section is correct. However, since number isn't static the code:
MyInstance::number = 100;
isn't legal, it needs to be either made static or initialized in the constructor properly. If you initialize it in the constructor, it will exist wherever the owning object is allocated. If you make it static, it will end up in the data section... if anywhere. Often static const int variables can be inlined directly into the code used such that a global variable isn't needed at all.
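For reference, a sketch of what the static version could look like. The members are made public here purely so the example can touch them, and perObject is an invented non-static member added for contrast. Whether the linker actually places counter in BSS and number in the data section is up to the toolchain; the language only guarantees the initial values.

```cpp
#include <cassert>

class MyClass {
public:                   // public only for the sake of the example
    static int counter;   // one shared object for the whole class
    static int number;    // likewise shared
    int perObject = 7;    // hypothetical non-static member: one per instance
};

// Static data members need a definition outside the class, with the type.
int MyClass::counter;        // no initializer: zero-initialized
int MyClass::number = 100;   // explicit initializer

// Returns true if both instances see the same static member but each
// has its own copy of the non-static member.
bool staticsAreShared() {
    MyClass first;
    MyClass second;
    MyClass::counter = 5;    // no object needed to assign a static member
    first.perObject = 1;     // changes only `first`
    bool sharedStatic = second.counter == 5 &&
                        &first.counter == &second.counter;
    bool separateMembers = second.perObject == 7 &&
                           &first.perObject != &second.perObject;
    assert(sharedStatic && separateMembers);
    return sharedStatic && separateMembers;
}
```

Note the type in front of the definitions: a bare "MyClass::number = 100;" at namespace scope does not compile even when number is static.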
Q3. Consider following piece of codes: ...
void doHello(MyInstance &localObj){
localObj is a reference to the passed object. As far as you know, there is no storage; it refers to wherever the variable being passed is. In reality, under the hood, a pointer may be passed on the stack to facilitate this, but the compiler may just as easily optimize that out if it can.
void doHelloAgain(MyInstance localObj){
a copy of the passed parameter is placed on the stack.
MyInstance localObj;
// 3.3 Where is this stored in heap or stack?
localObj is on the stack.
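The difference between the two parameter styles can be checked by comparing addresses. In this sketch MyClass is reduced to a single int member and all helper names are invented:

```cpp
#include <cassert>

// Stripped-down stand-in for the question's class.
struct MyClass {
    int value = 0;
};

// A reference parameter names the caller's object, so the address matches.
const MyClass* addressSeenByReference(const MyClass& localObj) {
    return &localObj;
}

// A by-value parameter is a separate object living only for this call.
bool copyIsDistinct(MyClass localObj, const MyClass* original) {
    return &localObj != original;
}

void setThroughReference(MyClass& localObj) { localObj.value = 1; }  // changes the caller's object
void setOnCopy(MyClass localObj) { localObj.value = 2; }             // changes only the copy

// Returns true if the reference aliases the original and the copy does not.
bool referenceVersusCopy() {
    MyClass localObj;  // automatic storage
    bool sameObject = addressSeenByReference(localObj) == &localObj;
    bool distinctCopy = copyIsDistinct(localObj, &localObj);
    setOnCopy(localObj);
    bool unchangedByCopy = localObj.value == 0;
    setThroughReference(localObj);
    bool changedByReference = localObj.value == 1;
    return sameObject && distinctCopy && unchangedByCopy && changedByReference;
}
```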
All memory areas in C++ are listed here
I have this situation:
{
float foo[10];
for (int i = 0; i < 10; i++) {
foo[i] = 1.0f;
}
object.function1(foo); // stores the float pointer to a const void* member of object
}
object.function2(); // uses the stored void pointer
Are the contents of the float pointer unknown in the second function call? It seems that I get weird results when I run my program. But if I declare the float foo[10] to be const and initialize it in the declaration, I get correct results. Why is this happening?
For the first question, yes using foo once it goes out of scope is incorrect. I'm not sure if it's defined behavior in the spec or not but it's definitely incorrect to do so. Best case scenario is that your program will immediately crash.
As for the second question, why does making it const work? This is an artifact of implementation. Likely what's happening is that the data is being written out to the data section of the DLL and hence is valid for the life of the program. The original sample instead puts the data on the stack, where it has a much shorter lifetime. The code is still wrong; it just happens to work.
Yes, foo[] is out of scope when you call function2. It is an automatic variable, stored on the stack. When the code exits the block it was defined in, it is deallocated. You may have stored a reference (pointer) to it elsewhere, but that is meaningless.
In both cases you are getting undefined behaviour. Anything might happen.
You are storing a pointer to the locally declared array, but once the scope containing the array definition is exited the array - and all its members are destroyed.
The pointer that you have stored now no longer points to a float or even a valid memory address that could be used for a float. It might be an address that is reused for something else or it might continue to contain the original data unchanged. Either way, it is still not valid to attempt to dereference the pointer, either for reading or writing a float value.
For any declaration like this:
{
type_1 variable_name_1;
type_2 variable_name_2;
type_3 variable_name_3;
}
the variables are allocated on the stack.
You can print out the address of each variable:
printf("%p\n", (void*)&variable_name);
and you'll see that the addresses differ by small amounts, roughly (but not always exactly) equal to the amount of space each variable needs to store its data.
The memory used by stack variables is recycled when the '}' is reached and the variables go out of scope. This is done efficiently, just by adjusting a special pointer called the 'stack pointer', which says where new stack variables will have their data allocated. By incrementing and decrementing the stack pointer, programs have an extremely fast way of working out where the memory for variables will live. It's such an important concept that every major processor maintains a dedicated register just for the stack pointer.
The memory for your array is also pushed and popped from the program's data stack and your array pointer is a pointer into the program's stack memory. While the language specification says accessing the data owned by out-of-scope variables has undefined consequences, the result is typically easy to predict. Usually, your array pointer will continue to hold its original data until new stack variables are allocated and assigned data (i.e. the memory is reused for other purposes).
So don't do it. Copy the array.
I'm less clear about what the standard says about constant arrays (probably the same thing -- the memory is invalid when the original declaration goes out of scope). However, your different behavior is explainable if your compiler allocated a chunk of memory for constants that is initialized when your program starts, and later, foo is made to point to that data when it comes into scope. At least, if I were writing a compiler, that's probably what I'd do, as it's both very fast and leads to using the smallest amount of memory. This theory is easily testable in the following way:
void f()
{
const float foo[2] = {99, 101};
printf( "-- %f\n", foo[0] );
// Writing to a const object is undefined behavior; for experimentation only.
const_cast<float*>(foo)[0] = 666;
}
Call f() twice. If the printed value changes between calls (or an invalid memory access exception is thrown), it's a fair bet that the data for foo is allocated in a special area for constants that the above code wrote over.
Allocating the memory in a special area doesn't work for non-const data because recursive functions may cause many separate copies of a variable to exist on the stack at the same time, each of which may hold different data.
It's undefined behavior in both cases. You should consider the stack based variable deallocated when control leaves the block.
What's happening is currently you're probably just setting a pointer (can't see the code, so I can't be sure). This pointer will point to the object foo, which is in scope at that point. But when it goes out of scope, all hell can break loose, and the C standard can make no guarantees about what happens to that data once it goes out of scope. It can be overwritten by anything. It works for a const array because you're lucky. Don't do that.
If you want the code to work correctly as it is, function1() is going to need to copy the data into the object member. Which means you'll also have to know the length of the array, which means you'll have to pass it in or have some nice termination method.
The memory associated with foo goes out of scope and is reclaimed.
Outside the {}, the pointer is invalid.
It is a good idea to make objects manage their own memory rather than refer to an external pointer. In this specific case your object could allocate its own foo internally and copy the data into it. However it really depends on what you are trying to achieve.
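A sketch of that idea, assuming function1 can be changed to take a length and that function2 just needs to read the values back (here it sums them, which is a made-up stand-in for whatever the real function does; Holder is an invented class name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replacement for the question's object: it owns a copy of
// the data instead of remembering a pointer to someone else's array.
class Holder {
public:
    // Copies `count` floats out of the caller's array.
    void function1(const float* data, std::size_t count) {
        values_.assign(data, data + count);
    }

    // Uses the stored copy; sums it here purely for demonstration.
    float function2() const {
        float total = 0.0f;
        for (float v : values_) {
            total += v;
        }
        return total;
    }

private:
    std::vector<float> values_;  // lives as long as the Holder does
};

// Mirrors the question's code, but stays valid after foo is gone.
float sumAfterScopeEnds() {
    Holder object;
    {
        float foo[10];
        for (int i = 0; i < 10; i++) {
            foo[i] = 1.0f;
        }
        object.function1(foo, 10);  // data is copied while foo still exists
    }                               // foo is destroyed here
    return object.function2();      // reads the copy, not foo
}
```

The cost is one copy of ten floats; in exchange, function2 no longer depends on the lifetime of anything outside the object.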
For simple problems like this it is better to give a simple answer, not 3 paragraphs about stacks and memory addresses.
There are 2 pairs of braces {}, one is inside the other. The array was declared after the first left brace { so it stops existing before the last brace }
The end
When answering a question you must answer it at the level of the person asking regardless of how well you yourself comprehend the issue or you may confuse the student.
-experienced ESL teacher