Going through the Modern C++ Design book, I came across this passage where the author talks about Singletons:
C++ guarantees that static objects' memory lasts for the duration of the program
(He uses this to ensure it's always possible to recreate the "Phoenix" Singleton.)
Is it the case that I can delete the static object but can never release its memory?
If so, creating large static objects would mean their memory is lost forever (until the program ends).
This sounds as if it were a downside of static objects. Not sure, but maybe you missed the point. Sometimes you need an object to be alive for the whole duration of the program. How do you ensure that? You make it a static object, because...
C++ guarantees that static objects' memory lasts for the duration of the program
If you don’t want the object to be alive for the whole duration of the program, then don’t make it static.
I have a method that needs a lot of memory (compared with stack size) to store temporary results.
I'm considering using static variables as local temporary storage.
I did some research and found out that initialization of static variables is not thread-safe in C++98.
So, question is, what if I just need the space but don't care about the initialization?
Or, further, what is the problem of using static variables as local storage?
It's not just a matter of initialization. A static variable, even one defined within a function, is a single instance that's shared by all calls to that function — even calls in different threads. If you use a static variable as a scratch buffer, all your threads will be sharing the same scratch buffer, and you'll need to use explicit synchronization to keep them from interfering with each other. You'll also need to figure out how to make the threads share the buffer effectively; if they're all trying to use the same parts at the same time and frequently have to wait for mutex locks, you lose the benefit of concurrency and might as well just use a single thread.
It'd be much easier to have the function just allocate its scratch area on the heap and delete it before returning. (You can use std::unique_ptr in C++11 to ensure that the buffer is deleted when the pointer goes out of scope, or std::auto_ptr if you're stuck with C++98.)
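For illustration, here is a minimal sketch of that idea; the function name and buffer size are made up, and it assumes C++11 for std::unique_ptr:

#include <memory>

void process()   // hypothetical function that needs temporary storage
{
    // Each call gets its own buffer, so concurrent calls cannot interfere.
    std::unique_ptr<char[]> scratch(new char[1 << 20]);   // 1 MB of temporary space

    // ... compute temporary results in scratch.get() ...
}   // scratch is released here automatically, even if an exception is thrown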
static variables exist only once. That means if you have multithreaded code where two threads would need the space, you are out of luck. It will bite you one day.
static variables exist forever. That means once you are done with the task needing the space, the space is gone forever. That will bite you from day one.
If you need more memory than you are willing to allocate on the stack (and many implementations have much more severe restrictions for stack memory), then allocate the memory on the heap, and free it when you are done with it.
I have overridden the global new/delete to catch memory leaks. When the process exits, I need to call assertNoMemoryLeak to assert that all the memory allocated by new has been released.
But it seems I cannot make assertNoMemoryLeak the last function called in my process, because some global variable's destructor will always run after it.
atexit doesn't work, for the following reason. AFAIK, the global destructors generated by gcc are pushed onto the atexit list when the corresponding constructors run, and I cannot make my push-assertNoMemoryLeak snippet run before that during startup, so my assertNoMemoryLeak will still not run as the last function.
Another workaround is to write the new/delete info to a file and then, after the process exits, analyze the file (using a script). I don't want to do it this way because it is complex to automate for every developer.
So any way to make assertNoMemoryLeak the last function called?
If finding and managing all the global static objects is practical, then you can make one global static object whose job it is to construct and destruct all other global objects in the desired order. If you don't have control over all of them, or they are in too many different places in different libraries, this will not be practical.
You can do this by changing all global static objects to pointers, or by making them members of the one master object, or by rewriting the code without global objects. The point is to eliminate anything with startup and shutdown behavior that you don't control, and then run your assert checks at the proper time in the destructor of the one global object that you do control.
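A rough sketch of that idea follows; Config and Logger stand in for whatever globals you currently have, and assertNoMemoryLeak is assumed to be the check you already wrote:

void assertNoMemoryLeak();     // assumed: your existing leak check

struct Config { /* ... */ };   // hypothetical former globals
struct Logger { /* ... */ };

struct Master {
    Config *config;
    Logger *logger;

    Master() : config(new Config), logger(new Logger) {}

    ~Master() {
        delete logger;         // tear down everything we own, in the order we choose...
        delete config;
        assertNoMemoryLeak();  // ...then run the leak check as the very last step
    }
};

static Master theMaster;       // the only object with static storage duration we keep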
I understand that this may be construed as one of those "what's your preference" questions, but I really want to know why you would choose one of the following methods over the other.
Suppose you had a super complex class, such as:
class CDoSomthing {
public:
    CDoSomthing(const char *sUserName, const char *sPassword)
    {
        //Do something...
    }

    ~CDoSomthing()
    {
        //Do something...
    }
};
How should I declare a local instance within a global function?
int main(void)
{
    CDoSomthing *pDoSomthing = new CDoSomthing("UserName", "Password");
    //Do something...
    delete pDoSomthing;
}
-- or --
int main(void)
{
    CDoSomthing DoSomthing("UserName", "Password");
    //Do something...
    return 0;
}
Prefer local variables, unless you need the object's lifetime to extend beyond the current block. (Local variables are the second option). It's just easier than worrying about memory management.
P.S. If you need a pointer, because you need it to pass to another function, just use the address-of operator:
SomeFunction(&DoSomthing);
There are two main considerations when you declare a variable on the stack vs. on the heap: lifetime control and resource management.
Allocating on the stack works really well when you have tight control over the lifetime of the object. That means you are not going to pass a pointer or a reference to that object to code outside the scope of the local function: no out parameters, no COM calls, no new threads. Quite a lot of limitations, but you get the object cleaned up properly for you on normal or exceptional exit from the current scope (though you might want to read up on stack-unwinding rules with virtual destructors). The biggest drawback of stack allocation is that the stack is limited in size (often around 1 MB by default, and much less in some environments), so you might want to be careful what you put on it.
Allocating on the heap, on the other hand, requires you to clean up the instance manually. That also means you have a lot of freedom in how you control the lifetime of the instance. You need to do this in two scenarios: a) you are going to pass that object out of scope, or b) the object is too big and allocating it on the stack could cause a stack overflow.
BTW, a nice compromise between these two is allocating the object on the heap and holding a smart pointer to it on the stack. This ensures that you are not wasting precious stack memory while still getting the automatic cleanup on scope exit.
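As a rough sketch of that compromise (BigObject is a made-up type, and std::unique_ptr assumes C++11; boost::scoped_ptr would play the same role earlier):

#include <memory>

struct BigObject {
    char data[4 * 1024 * 1024];   // far too large to place on the stack directly
};

void work()
{
    std::unique_ptr<BigObject> obj(new BigObject);   // the object itself lives on the heap
    // ... use obj->data ...
}   // the smart pointer on the stack deletes the BigObject on scope exit, even on exceptions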
The second form is the so-called RAII (Resource Acquisition Is Initialization) pattern. It has many advantages over the first.
When you use new, you have to use delete yourself and guarantee that the object is always deleted, even if an exception is thrown. You must guarantee all of that yourself.
If you use the second form, when the variable goes out of scope, it is always cleaned up automatically. And if an exception is thrown, the stack unwinds and it is also cleaned up.
So, you should prefer RAII (the second option).
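To make the exception point concrete, here is a small contrast using the CDoSomthing class from the question; DoWork is a made-up function that might throw:

void DoWork();   // hypothetical function that might throw

void leaky()
{
    CDoSomthing *p = new CDoSomthing("UserName", "Password");
    DoWork();    // if this throws, the delete below never runs and the object leaks
    delete p;
}

void safe()
{
    CDoSomthing d("UserName", "Password");
    DoWork();    // if this throws, d's destructor still runs during stack unwinding
}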
In addition to what has been said so far, there are performance considerations to be taken into account, particularly in memory-allocation-intensive applications:
Using new will allocate memory from the heap. In the case of intense (extremely frequent) allocation and deallocation, you will be paying a high price in:
locking: the heap is a resource shared by all threads in your process. Operations on the heap may require locking in the heap manager (done for you in the runtime library), which may slow things down significantly.
fragmentation: the heap fragments. You may see the time it takes malloc/new and free/delete to return increase 10-fold. This compounds the locking problem above, as it takes more time to manage a fragmented heap and more threads queue up waiting for the heap lock. (On Windows there is a special flag you can set so the heap manager heuristically attempts to reduce fragmentation.)
Using the RAII pattern, memory is simply taken off the stack. The stack is a per-thread resource, it does not fragment, there is no locking involved, and it may turn out to work to your advantage in terms of memory locality (i.e. memory caching at the CPU level).
So, when you need objects for a brief (or scoped) period of time, definitely use the second approach (a local variable, on the stack). If you need to share data between threads, use new/malloc (on the one hand you have to; on the other hand such objects are typically long-lived enough that you pay essentially zero cost vis-à-vis the heap manager).
The second version will unwind the stack if an exception is thrown. The first will not. I don't see much difference otherwise.
The biggest difference between the two is that new gives you back a pointer to the object.
When you create the object without new, it is stored on the stack. When you create it with new, you get back a pointer to the new object that has been created on the heap; it is actually a memory address that points to the new object. When this happens, you have to manage the memory yourself: when you are done using the variable, you need to call delete on it to avoid a memory leak. Without the new operator, the memory is freed automatically when the variable goes out of scope.
So if you need the object to outlive the current scope, using new is the way to go. However, if you only need a temporary, keeping the object on the stack is better, since you don't have to worry about memory management.
Mark Ransom is right; you'll also need to instantiate with new if you are going to pass the object as a parameter to a CreateThread-esque function.
I've searched, but I haven't understood these three concepts very well. When do I have to use dynamic allocation (on the heap), and what is its real advantage? What are the problems of static and stack? Could I write an entire application without allocating variables on the heap?
I heard that other languages incorporate a "garbage collector" so you don't have to worry about memory. What does the garbage collector do?
What could you do manipulating the memory by yourself that you couldn't do using this garbage collector?
Once someone said to me that with this declaration:
int * asafe=new int;
I have a "pointer to a pointer". What does it mean? Is it different from:
asafe=new int;
?
A similar question was asked, but it didn't ask about statics.
Summary of what static, heap, and stack memory are:
A static variable is basically a global variable, even if you cannot access it globally. Usually there is an address for it that is in the executable itself. There is only one copy for the entire program. No matter how many times you go into a function call (or class) (and in how many threads!) the variable is referring to the same memory location.
The heap is a bunch of memory that can be used dynamically. If you want 4 KB for an object then the dynamic allocator will look through its list of free space in the heap, pick out a 4 KB chunk, and give it to you. Generally, the dynamic memory allocator (malloc, new, etc.) starts at the end of memory and works backwards.
Explaining how a stack grows and shrinks is a bit outside the scope of this answer, but suffice to say you always add and remove from the end only. Stacks usually start high and grow down to lower addresses. You run out of memory when the stack meets the dynamic allocator somewhere in the middle (but refer to physical versus virtual memory and fragmentation). Multiple threads will require multiple stacks (the process generally reserves a minimum size for the stack).
When you would want to use each one:
Statics/globals are useful for memory that you know you will always need and that you know you never want to deallocate. (By the way, embedded environments may be thought of as having only static memory... the stack and heap are part of a known address space shared with a third memory type: the program code. Programs will often do dynamic allocation out of their static memory when they need things like linked lists. But regardless, the static memory itself (the buffer) is not itself "allocated"; rather, other objects are allocated out of the memory held by the buffer for this purpose. You can do this in non-embedded code as well, and console games will frequently eschew the built-in dynamic memory mechanisms in favor of tightly controlling the allocation process by using buffers of preset sizes for all allocations.)
Stack variables are useful when you know that as long as the function is in scope (on the stack somewhere), you will want the variables to remain. Stacks are nice for variables that you need in the code where they are located but that aren't needed outside that code. They are also really nice for when you are accessing a resource, like a file, and want the resource to automatically go away when you leave that code.
Heap allocations (dynamically allocated memory) are useful when you want to be more flexible than the above. Frequently, a function gets called to respond to an event (the user clicks the "create box" button). The proper response may require allocating a new object (a new Box object) that should stick around long after the function has exited, so it can't be on the stack. But you don't know how many boxes you will want at the start of the program, so it can't be a static.
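A tiny sketch of that scenario (Box and the handler name are invented for illustration):

#include <vector>

struct Box { int x, y, w, h; };

std::vector<Box*> boxes;      // how many boxes? not known until runtime

void onCreateBoxClicked()     // hypothetical event handler
{
    Box *b = new Box();       // must outlive this function, so it can't be a stack variable
    boxes.push_back(b);       // remember it; delete it later when the box is destroyed
}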
Garbage Collection
I've heard a lot lately about how great Garbage Collectors are, so maybe a bit of a dissenting voice would be helpful.
Garbage collection is a wonderful mechanism for when performance is not a huge issue. I hear GCs are getting better and more sophisticated, but the fact is, you may be forced to accept a performance penalty (depending upon the use case). And if you're lazy, it still may not work properly. At the best of times, garbage collectors recognize that your memory goes away when there are no more references to it (see reference counting). But if you have an object that refers to itself (possibly by referring to another object which refers back), then reference counting alone will not indicate that the memory can be deleted. In this case, the GC needs to look at the entire reference soup and figure out if there are any islands that are only referred to by themselves. Offhand, I'd guess that to be an O(n^2) operation, but whatever it is, it can get bad if you are at all concerned with performance. (Edit: Martin B points out that it is O(n) for reasonably efficient algorithms. That is still O(n) too much if you are concerned with performance and can deallocate in constant time without garbage collection.)
Personally, when I hear people say that C++ doesn't have garbage collection, my mind tags that as a feature of C++, but I'm probably in the minority. Probably the hardest thing for people to learn about programming in C and C++ are pointers and how to correctly handle their dynamic memory allocations. Some other languages, like Python, would be horrible without GC, so I think it comes down to what you want out of a language. If you want dependable performance, then C++ without garbage collection is the only thing this side of Fortran that I can think of. If you want ease of use and training wheels (to save you from crashing without requiring that you learn "proper" memory management), pick something with a GC. Even if you know how to manage memory well, it will save you time which you can spend optimizing other code. There really isn't much of a performance penalty anymore, but if you really need dependable performance (and the ability to know exactly what is going on, when, under the covers) then I'd stick with C++. There is a reason that every major game engine that I've ever heard of is in C++ (if not C or assembly). Python, et al are fine for scripting, but not the main game engine.
The following is of course all not quite precise. Take it with a grain of salt when you read it :)
Well, the three things you refer to are automatic, static, and dynamic storage duration, which have to do with how long objects live and when their lives begin.
Automatic storage duration
You use automatic storage duration for short-lived and small data that is needed only locally within some block:
if (some_condition) {
    int a[3]; // array a has automatic storage duration
    fill_it(a);
    print_it(a);
}
The lifetime ends as soon as we exit the block, and it starts as soon as the object is defined. Automatic storage duration is the simplest kind, and it is much faster than dynamic storage duration in particular.
Static storage duration
You use static storage duration for free variables that might be accessed by any code at any time, if their scope allows such usage (namespace scope); for local variables that need to extend their lifetime beyond the exit of their scope (local scope); and for member variables that need to be shared by all objects of their class (class scope). Their lifetime depends on the scope they are in: they can have namespace scope, local scope, or class scope. What is true about all of them is that once their life begins, it ends at the end of the program. Here are two examples:
#include <iostream>
#include <string>
using namespace std;

void foo();

// static storage duration. in global namespace scope
string globalA;

int main() {
    foo();
    foo();
}

void foo() {
    // static storage duration. in local scope
    static string localA;
    localA += "ab";
    cout << localA;
}
The program prints ababab, because localA is not destroyed upon exit of its block. You can say that objects that have local scope begin their lifetime when control reaches their definition. For localA, that happens when the function's body is entered. For objects in namespace scope, lifetime begins at program startup. The same is true for static objects of class scope:
class A {
public:
    static string classScopeA;
};

string A::classScopeA;

A a, b;
// &a.classScopeA == &b.classScopeA == &A::classScopeA
As you see, classScopeA is not bound to a particular object of its class, but to the class itself. The address of all three names above is the same, and all denote the same object. There are special rules about when and how static objects are initialized, but let's not concern ourselves with that now; that's what is meant by the term static initialization order fiasco.
Dynamic storage duration
The last storage duration is dynamic. You use it if you want objects to live on another isle, and you want to put pointers around that reference them. You also use it if your objects are big, or if you want to create arrays whose size is only known at runtime. Because of this flexibility, objects with dynamic storage duration are complicated and slow to manage. Objects with dynamic storage duration begin their lifetime when an appropriate new operator invocation happens:
#include <iostream>
#include <string>
using namespace std;

void foo(string *s);

int main() {
    // the object that s points to has dynamic storage
    // duration
    string *s = new string;
    // pass a pointer pointing to the object around.
    // the object itself isn't touched
    foo(s);
    delete s;
}

void foo(string *s) {
    cout << s->size();
}
Their lifetime ends only when you call delete on them. If you forget that, those objects never end their lifetime, and class objects with a user-declared destructor will never have that destructor called. Objects with dynamic storage duration require manual handling of their lifetime and the associated memory resource. Libraries exist to ease their use. Explicit garbage collection for particular objects can be established by using a smart pointer:
#include <iostream>
#include <memory>
#include <string>
using namespace std;

void foo(shared_ptr<string> s);

int main() {
    shared_ptr<string> s(new string);
    foo(s);
}

void foo(shared_ptr<string> s) {
    cout << s->size();
}
You don't have to care about calling delete: the shared_ptr does it for you when the last pointer that references the object goes out of scope. The shared_ptr itself has automatic storage duration, so its lifetime is automatically managed, allowing it to check in its destructor whether it should delete the pointed-to dynamic object. For the shared_ptr reference, see the boost documentation: http://www.boost.org/doc/libs/1_37_0/libs/smart_ptr/shared_ptr.htm
It's all been said elaborately above, so here is just "the short answer":
static variable (class)
lifetime = program runtime (1)
visibility = determined by access modifiers (private/protected/public)
static variable (global scope)
lifetime = program runtime (1)
visibility = the compilation unit it is instantiated in (2)
heap variable
lifetime = defined by you (new to delete)
visibility = defined by you (whatever you assign the pointer to)
stack variable
visibility = from declaration until scope is exited
lifetime = from declaration until declaring scope is exited
(1) more exactly: from initialization until deinitialization of the compilation unit (i.e. C / C++ file). Order of initialization of compilation units is not defined by the standard.
(2) Beware: if you instantiate a static variable in a header, each compilation unit gets its own copy.
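A quick illustration of (2), assuming a hypothetical header included by two source files:

// counter.h
static int counter = 0;          // every .cpp that includes this gets its own counter

// a.cpp
#include "counter.h"
void bumpA() { ++counter; }      // increments a.cpp's copy only

// b.cpp
#include "counter.h"
int readB() { return counter; }  // still 0, no matter how often bumpA() is called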
The main difference is speed and size.
Stack
Dramatically faster to allocate. It is done in O(1), since the space is reserved when setting up the stack frame, so it is essentially free. The drawback is that if you run out of stack space you are in deep trouble. You can adjust the stack size, but, IIRC, you have ~2 MB to play with. Also, as soon as you exit the function, everything on the stack is cleared, so it can be problematic to refer to it later (pointers to stack-allocated objects lead to bugs).
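For example, the classic bug that last remark warns about:

int *dangling()
{
    int local = 42;
    return &local;   // BUG: local is destroyed when the function returns,
                     // so the caller receives a pointer into dead stack memory
}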
Heap
Dramatically slower to allocate. But, you have GB to play with, and point to.
Garbage Collector
The garbage collector is some code that runs in the background and frees memory. When you allocate memory on the heap it is very easy to forget to free it, which is known as a memory leak. Over time, the memory your application consumes grows and grows until it crashes. Having a garbage collector periodically free the memory you no longer need helps eliminate this class of bugs. Of course, this comes at a price, as the garbage collector slows things down.
What are the problems of static and stack?
The problem with "static" allocation is that the allocation is made at compile time: you can't use it to allocate a variable amount of data whose size isn't known until run time.
The problem with allocating on the "stack" is that the allocation is destroyed as soon as the subroutine which does the allocation returns.
Could I write an entire application without allocating variables on the heap?
Perhaps, but not a non-trivial, normal, big application (though so-called "embedded" programs might be written without the heap, using a subset of C++).
What does the garbage collector do?
It keeps watching your data ("mark and sweep") to detect when your application is no longer referencing it. This is convenient for the application, because the application doesn't need to deallocate the data ... but the garbage collector might be computationally expensive.
Garbage collectors aren't a usual feature of C++ programming.
What could you do manipulating the memory by yourself that you couldn't do using this garbage collector?
Learn the C++ mechanisms for deterministic memory deallocation:
'static': never deallocated
'stack': as soon as the variable "goes out of scope"
'heap': when the pointer is deleted (explicitly deleted by the application, or implicitly deleted within some-or-other subroutine)
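A compact sketch showing all three side by side:

#include <string>

std::string g;                     // 'static': never deallocated while the program runs

void demo()
{
    std::string local = "stack";   // 'stack': destroyed when it goes out of scope

    std::string *p = new std::string("heap");
    delete p;                      // 'heap': freed only when the pointer is deleted
}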
Stack memory allocation (function variables, local variables) can be problematic when your stack is too "deep" and you overflow the memory available to stack allocations. The heap is for objects that need to be accessed from multiple threads or throughout the program lifecycle. You can write an entire program without using the heap.
You can leak memory quite easily without a garbage collector, but you can also dictate when objects and memory are freed. I have run into issues with Java when it runs the GC while I have a real-time process, because the GC is an exclusive thread (nothing else can run). So if performance is critical and you can guarantee there are no leaked objects, not using a GC is very helpful. Otherwise it just makes you hate life when your application consumes memory and you have to track down the source of a leak.
What if your program does not know up front how much memory to allocate (and hence you cannot use stack variables)? Take linked lists: a list can grow without knowing up front what its size will be, so allocating on the heap makes sense for a linked list when you don't know how many elements will be inserted into it.
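A minimal hand-rolled sketch of that (in real code you would normally reach for std::list or std::vector instead):

struct Node {
    int   value;
    Node *next;
};

Node *head = nullptr;          // the list starts empty; its final size is unknown up front

void push_front(int v)
{
    head = new Node{v, head};  // each element is allocated on the heap as it arrives
}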
An advantage of GC in some situations is an annoyance in others; reliance on GC encourages not thinking much about it. In theory, it waits until an 'idle' period, or until it absolutely must, at which point it steals bandwidth and causes response latency in your app.
But you don't have to 'not think about it.' Just as with everything else in multithreaded apps, when you can yield, you can yield. So for example, in .Net, it is possible to request a GC; by doing this, instead of less frequent longer running GC, you can have more frequent shorter running GC, and spread out the latency associated with this overhead.
But this defeats the primary attraction of GC which appears to be "encouraged to not have to think much about it because it is auto-mat-ic."
If you were first exposed to programming before GC became prevalent and were comfortable with malloc/free and new/delete, then it might even be the case that you find GC a little annoying and/or are distrustful of it (as one might be distrustful of 'optimization', which has had a checkered history). Many apps tolerate random latency. But for apps that don't, where random latency is less acceptable, a common reaction is to eschew GC environments and move in the direction of purely unmanaged code (or, god forbid, a long-dying art, assembly language).
I had a summer student here a while back, an intern, smart kid, who was weaned on GC; he was so adamant about the superiority of GC that even when programming in unmanaged C/C++ he refused to follow the malloc/free and new/delete model because, quote, "you shouldn't have to do this in a modern programming language." And you know? For tiny, short-running apps you can indeed get away with that, but not for long-running, performant apps.
The stack is memory set aside by the compiler: whenever we compile a program, the compiler by default reserves some memory from the OS (you can change the amount in the compiler settings of your IDE), and it is the OS that actually gives you the memory, depending on how much is available on the system and many other things. Stack memory comes into play when we declare variables: copies of them (formal parameters are passed by value) are pushed onto the stack, following a calling convention, which by default is CDECL in Visual Studio.
For example, with infix notation:
c = a + b;
the pushing onto the stack is done right to left: b is pushed onto the stack, then the operator, then a, and the result of those, i.e. c, is pushed onto the stack.
In prefix notation:
= c + a b
Here all the variables are pushed onto the stack first (right to left), and then the operations are performed.
The memory the compiler sets aside for the stack is fixed. So let's assume 1 MB of memory is allocated to our application; say the variables use 700 KB of memory (all local variables are pushed onto the stack unless they are dynamically allocated), so the remaining 324 KB is left for the heap.
And the stack has a shorter lifetime: when the scope of a function ends, its stack frame is cleared.