I'm looking for a way to quickly exit a C++ program that has allocated a lot of structures in memory using C++ classes. The program finishes correctly, but after the final "return" in the program, all of the auto-destructors kick in. The problem is the program has allocated about 15GB of memory through lots of C++ class structures, and this auto-destruct process takes about 1 more hour itself to complete as it walks through all of the structures - even though I don't care about the results. The program only took 1 hour to complete the task up to this point. I would like to just return to the OS and let it do its normal wholesale reclamation of the process's memory - which is very quick. I've been doing this by manually killing the process during the cleanup stage, but am looking for a better programmatic solution.
I would like to return a success to the OS, but don't care to keep any of the memory content. The program does perform a lot of dynamic allocation/deallocation during the normal processing, so it's not just simple heap management.
Any opinions?
In Standard C++ you only have abort(), but that has the process return failure to the OS.
On many platforms (Unix, MS Windows) you can use _exit() to exit the program without running cleanup and destructors.
C++0x std::quick_exit is what you are looking for if your compiler already supports it (g++-4.4.5 does).
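A minimal sketch of that approach (assuming a C++11-capable standard library; on older toolchains the function may only be reachable as ::quick_exit from <cstdlib>):

#include <cstdlib>

int main()
{
    // ... roughly an hour of work building ~15 GB of structures ...

    // Return success to the OS immediately: no static or automatic destructors
    // run, only handlers registered with std::at_quick_exit (none here).
    std::quick_exit(EXIT_SUCCESS);
}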
If the 15 GB of memory is being allocated to a reasonably small number of classes, you could override operator delete for those classes. Just pass the call to the standard delete, but set up a global flag that, if set, will make the call to delete a no-op. Or, if the logic of your program is such that these objects are not deleted in the normal course of building your data structures, you could simply ignore delete in all cases for these classes.
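A sketch of that idea, with illustrative names only (nothing here comes from the question):

#include <new>

bool g_shutting_down = false;   // set to true just before returning from main()

struct BigRecord {
    // ... the data that makes up the 15 GB ...

    static void operator delete(void* p)
    {
        if (g_shutting_down)
            return;               // deliberately leak during shutdown
        ::operator delete(p);     // normal behaviour while processing
    }
};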
As Naveen says, this can't be a matter of memory deallocation. I've written neural network simulations with evolutionary algorithms that allocated and freed lots of memory in small and large chunks, and this was never a major issue.
If you have a C99 compiler, you can use the _Exit function to end immediately without calling global object destructors or any functions registered with atexit; whether or not unwritten buffered file data is flushed, open streams are closed, or temporary files are removed is implementation-defined (C99 §7.20.4.4).
If you're on Windows, you can also use ExitProcess to achieve the same effect.
But, as others have said, your destructors should really not be taking an hour to run unless you're doing a fair amount of I/O (writing log files, etc.). I strongly, strongly recommend you profile your program to see where the time is spent.
The possible strategies depend on the number of objects that are directly visible in main through which you access the 15GB of data and if these are local to main or statically allocated.
If all access to the 15GB of data is through local objects in main, then you can simply replace the return 0; at the end of main with exit(0);.
exit will terminate your application and trigger cleanup of statically allocated variables, but not of local variables.
If the data is accessed through a handful of statically allocated variables, you could turn them into pointers (or references) to dynamically allocated memory and deliberately leak that.
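A sketch of that idea, with an illustrative container and names: the static object becomes a raw pointer with nothing to destroy, so exit() never walks the data behind it.

#include <map>
#include <string>

// Before:  static std::map<std::string, int> g_table;   // destructor walks everything at exit
// After:   a deliberately leaked heap object behind a raw pointer.
static std::map<std::string, int>* g_table = new std::map<std::string, int>();

int main()
{
    (*g_table)["answer"] = 42;
    // ... the rest of the program uses *g_table as before ...
    return 0;   // the pointer itself needs no cleanup; the OS reclaims the
                // whole address space wholesale when the process exits
}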
Related
It is well known that the usual cause of a std::bad_alloc being thrown is when memory is exhausted.
I'm executing an embedded, bare metal (without Operating System) application. The initial allocation sometimes succeeds, and sometimes fails. Since there is no other code running (no OS, no other processes), I have good reason to believe that this std::bad_alloc is more complex than allocating more memory than is available to the system. When it works, there is ~500kB of memory allocated. The hardware has been allocated 16MB.
Instead, it seems that the system has an incorrect record of how much memory is allocated, specifically that when the allocator begins, it thinks some non-zero amount of memory has already been allocated.
When a bare metal application starts, it should have identically zero memory allocated. It seems that is not the case here.
Is it possible that some state in the memory allocator is being retained between soft resets? How can I find out how the allocation is being done?
This is running on an ARMv7 with gcc (Sourcery CodeBench Lite 2013.05-40) 4.7.3, and linking against libstdc++6.0. We are compiling with -O0 -g.
We are allocating both std:: objects and user-defined objects with the new keyword, and the standard allocator.
If state is retained between restarts, then this is most likely due to incorrect initialisation at start-up.
It is the responsibility of the C++ runtime start-up to initialise the standard library. In a bare-metal system this includes initialising the heap and memory allocator. How this is done, and whether you have to perform any special action will depend on your particular tool-chain and library.
In ARM RealView (also used by Keil MDK-ARM), for example, you normally specify the heap size explicitly in the start-up code configuration. In other cases it is common for the linker script to assign to the heap all available memory that is not statically allocated or reserved for the stack. The linker script still needs to know the location and size of available RAM, of course. Check your linker's map file output to verify the size and location of the heap.
Most embedded system libraries include stubs that must be user-implemented to match the library to the target - mostly this is related to I/O, but in the Newlib library (often used with GCC bare-metal toolchains) for example, the _sbrk (or _sbrk_r) stub must be reimplemented for correct heap operation. The implementation of _sbrk is critical to the correct operation of the heap.
Added
It appears that Sourcery CodeBench Lite uses Newlib. I have seen implementations that start as follows:
caddr_t _sbrk(int incr)
{
    extern char _ebss; // Defined by the linker
    static char *heap_end;
    char *prev_heap_end;

    if (heap_end == 0) { ...
This is potentially unsafe if the runtime initialisation does not correctly initialise static data to zero (some embedded systems skip that for faster start-up, though it is seldom worth the potential for bugs). Verify that your start-up performs zero-initialisation correctly, and in any case explicitly initialise heap_end to zero:
static char *heap_end = 0;
in order to guarantee that it will work regardless of the strict correctness or otherwise of the start-up.
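For reference, a complete Newlib-style _sbrk commonly looks roughly like the sketch below. The _ebss symbol and the lack of an end-of-heap check are assumptions; a real implementation should match your linker script and guard against colliding with the stack.

#include <sys/types.h>   /* caddr_t */

caddr_t _sbrk(int incr)
{
    extern char _ebss;            /* end of static data, defined by the linker */
    static char *heap_end = 0;    /* explicit, so it works even without .bss init */
    char *prev_heap_end;

    if (heap_end == 0)
        heap_end = &_ebss;

    prev_heap_end = heap_end;
    heap_end += incr;             /* no overflow check here, for brevity */
    return (caddr_t)prev_heap_end;
}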
A good implementation of memory allocation should not get corrupted by allocating all available memory (or trying to allocate more than all), such that std::bad_alloc is thrown. It is of course possible that there are bugs in the memory allocator in the OS you are using - there is no telling. But a "good" allocator should not get corrupted or stop working simply because your application allocated until it got std::bad_alloc.
However, whether just exiting the application with exit(1) at that point will free up the memory allocated by the application code or not depends on your operating system. If it handles cleanup of exiting application code, then you're safe. If the OS doesn't, then you need to catch the exception and clean everything up yourself. Since there are literally many hundreds of operating systems in existence, it's impossible to answer the question in a general way.
It should be pretty easy to test this: just write an application that allocates a lot of memory in a loop, run it several times and determine whether the number of iterations in the loop stays about the same or, for example, shrinks with each run of the application - in the latter case, you have some sort of problem.
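A sketch of such a test program (assuming new/std::bad_alloc as in the question): count how many fixed-size blocks can be allocated before bad_alloc, and compare the count across soft resets.

#include <cstdio>
#include <new>

int main()
{
    unsigned long blocks = 0;
    try {
        for (;;) {
            new char[1024];   // deliberately never freed; we only measure capacity
            ++blocks;
        }
    } catch (const std::bad_alloc&) {
        std::printf("allocated %lu KiB before bad_alloc\n", blocks);
    }
    return 0;   // if the count shrinks after each soft reset, allocator state
                // is surviving the reset
}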
I have a C++ app I'm developing on Linux. I'm allocating some dynamic memory and ultimately calling forkpty(). The child process calls execl() and, as we know, execl() never returns if it succeeds in executing the command. Furthermore, as we know, forkpty() makes a copy of all the parent's data. So, if the child process never returns control back to my application to ultimately do memory cleanup, is it safe to say one had better not have any dynamic memory allocated at the time execl() is called from the child process? I can't believe I could not find this one on here... Thanks in advance.
Allocated memory is part of the process image; when you call execl, the entire process image is replaced, and any memory in it simply "disappears" like the rest of it, returning to the OS, which will then use it elsewhere.
All of the "forked" process memory is freed as part of execl() (if the call is successful).
If this wasn't the case, there would be a lot of memory leaks all over a regular linux system, as it's almost impossible to write anything even a little complex without allocating memory, and, for example, if the arguments to execl() are allocated, you couldn't possibly free them before calling execl().
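A small sketch of the situation described in the question (the command and argument are just placeholders): memory allocated before forkpty() needs no special handling in the child, because execl() replaces the entire image.

#include <pty.h>        // forkpty (link with -lutil on Linux)
#include <unistd.h>     // execl, _exit
#include <sys/types.h>
#include <string>

int main()
{
    std::string arg = "-l";                     // heap allocation in the parent
    int master;
    pid_t pid = forkpty(&master, NULL, NULL, NULL);
    if (pid == 0) {
        // Child: its copy of 'arg' vanishes with the rest of the image here.
        execl("/bin/ls", "ls", arg.c_str(), static_cast<char*>(NULL));
        _exit(127);                             // only reached if execl fails
    }
    // Parent: keeps using its own copy of 'arg' and frees it normally.
    return 0;
}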
I have been looking for a way to dynamically load functions into C++ for some time now, and I think I have finally figured it out. Here is the plan:
1. Pass the function as a string into C++ (via a socket connection, a file, or something).
2. Write the string into a file.
3. Have the C++ program compile the file and execute it. If there are any errors, catch them and return them.
4. Have the newly executed program with the new function pass the memory location of the function to the currently running program.
5. Save the location of the function to a function pointer variable (the function will always have the same return type and arguments, so this simplifies the declaration of the pointer).
6. Run the new function with the function pointer.
The issue is that after step 4, I do not want to keep the new program running since if I do this very often, many running programs will suck up threads. Is there some way to close the new program, but preserve the memory location where the new function is stored? I do not want it being overwritten or made available to other programs while it is still in use.
If you guys have any suggestions for the other steps, that would be appreciated as well. There might be other libraries that do things similar to this, and it is fine to recommend them, but this is the approach I want to look into - if not for the accomplishment of it, then for the knowledge of knowing how to do so.
Edit: I am aware of dynamically linked libraries. This is something I am largely looking into to gain a better understanding of how things work in C++.
I can't see how this can work. When you run the new program it'll be a separate process and so any addresses in its process space have no meaning in the original process.
And not just that, but the code you want to call doesn't even exist in the original process, so there's no way to call it in the original process.
As Nick says in his answer, you need either a DLL/shared library or you have to set up some form of interprocess communication so the original process can send data to the new process to be operated on by the function in question and then sent back to the original process.
How about a Dynamic Link Library?
These can be linked/unlinked/replaced at runtime (see the dlopen sketch below).
Or, if you really want to communicate between processes, you could use a named pipe.
Edit: you can also create named shared memory.
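For the shared-library route suggested above, a minimal dlopen/dlsym sketch (the file name plugin.so and the exported symbol user_func are assumptions; the library would be built separately with something like g++ -shared -fPIC):

#include <dlfcn.h>      // dlopen, dlsym, dlclose (link with -ldl)
#include <cstdio>

typedef int (*user_func_t)(int);   // the fixed signature mentioned in the plan

int main()
{
    void* handle = dlopen("./plugin.so", RTLD_NOW);
    if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    user_func_t f = reinterpret_cast<user_func_t>(dlsym(handle, "user_func"));
    if (f)
        std::printf("result: %d\n", f(42));     // call the dynamically loaded function

    dlclose(handle);                            // unload when no longer needed
    return 0;
}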
For step 4: we can't directly pass a memory location (address) from one process to another, because the two processes use different virtual memory spaces. One process can't use memory in another process.
So you need to create shared memory between the two processes, copy your function into that memory, and then you can close the newly created process.
For shared memory on Windows, see Creating Named Shared Memory:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx
After that, you still need to allocate yet another memory region and copy the function into it again.
The idea is that normally allocated memory only has read/write permissions; if you execute code in it, the CPU will generate an exception.
So, on Windows, you need to use VirtualAlloc to allocate the memory with the PAGE_EXECUTE_READWRITE flag (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx):
void* address = NULL;
address = VirtualAlloc(NULL,
                       sizeof(emitcode),
                       MEM_COMMIT | MEM_RESERVE,
                       PAGE_EXECUTE_READWRITE);
After copying the function to address, you can call the function at that address, but you need to be very careful to keep the stack balanced.
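Continuing that sketch (emitcode is assumed to hold valid, position-independent machine code for an int(int) function; this is only an illustration, not production code):

#include <windows.h>
#include <cstring>

typedef int (*injected_fn)(int);

int call_injected(const unsigned char* emitcode, size_t size)
{
    void* address = VirtualAlloc(NULL, size,
                                 MEM_COMMIT | MEM_RESERVE,
                                 PAGE_EXECUTE_READWRITE);
    std::memcpy(address, emitcode, size);       // copy the code into executable memory

    injected_fn fn = reinterpret_cast<injected_fn>(address);
    int result = fn(42);                        // call the copied code

    VirtualFree(address, 0, MEM_RELEASE);       // release when done
    return result;
}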
Dynamic libraries are best suited to your problem. Also, forget about launching a different process - that is another problem in itself. But, in addition to the post above, provided that you did the VirtualAlloc correctly, just call your function within the same "loader" process; then you shouldn't have to worry, since you will be running on the same stack in the same address space.
The real problems are:
1 - Compiling the function you want to load, offline from the main program.
2 - Extract the relevant code from the binary produced by the compiler.
3 - Load the string.
1 and 2 require deep understanding of the entire compiler suite, including compiler flag options, linker, etc ... not just the IDE's push buttons ...
If you are OK with 1 and 2, you should know why using a std::string, or anything but a pure char *, is harmful.
I could continue the entire story, but it definitely deserves its own book. Since this is the hacker/cracker way of doing things, I strongly recommend that the normal user stick to dynamic libraries - this is why they exist.
Usually we call this code injection ...
Basically, for the sake of security, any modern operating system forbids executing something that was loaded after the initial program load, so we must fall back to OS-wide validated dynamic libraries.
That said, once you have valid compiled code, if you really want to achieve that effect you must load your function into memory and then mark it as executable (clear the NX bit) in a system-specific way.
But let's be clear: your function must be position-independent code, and you have no help from the dynamic linker to resolve symbols... that's the hard part of the job.
Every process can use heap memory to store and share data within the process. We have a rule in programming: whenever we take some space in heap memory, we need to release it once the job is done, else it leads to memory leaks.
int *pIntPtr = new int;
.
.
.
delete pIntPtr;
My question: Is heap memory per-process?
If YES,
then a memory leak is possible only while the process is running.
If NO,
then it means the OS is able to retain data in memory somewhere. If so, is there a way to access this memory from another process? Also, this may become a way for inter-process communication.
I suppose the answer to my question is YES. Please provide your valuable feedback.
On almost every system currently in use, heap memory is per-process. On older systems without protected memory, heap memory was system-wide. (In a nutshell, that's what protected memory does: it makes your heap and stack private to your process.)
So in your example code on any modern system, if the process terminates before delete pIntPtr is called, pIntPtr will still be freed (though its destructor, not that an int has one, would not be called.)
Note that protected memory is an implementation detail, not a feature of the C++ or C standards. A system is free to share memory between processes (modern systems just don't because it's a good way to get your butt handed to you by an attacker.)
In most modern operating systems each process has its own heap that is accessible by that process only and is reclaimed once the process terminates - that "private" heap is usually used by new. Also there might be a global heap (look at Win32 GlobalAlloc() family functions for example) which is shared between processes, persists for the system runtime and indeed can be used for interprocess communications.
Generally the allocation of memory to a process happens at a lower level than heap management.
In other words, the heap is built within the process virtual address space given to the process by the operating system and is private to that process. When the process exits, this memory is reclaimed by the operating system.
Note that C++ does not mandate this, this is part of the execution environment in which C++ runs, so the ISO standards do not dictate this behaviour. What I'm discussing is common implementation.
In UNIX, the brk and sbrk system calls were used to allocate more memory from the operating system to expand the heap. Then, once the process finished, all this memory was given back to the OS.
The normal way to get memory which can outlive a process is with shared memory (under UNIX-type operating systems, not sure about Windows). This can result in a leak but more of system resources rather than process resources.
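A minimal POSIX shared-memory sketch of that (the name "/demo_shm" is illustrative; on some systems you must link with -lrt). The segment outlives the process until some process calls shm_unlink.

#include <fcntl.h>      // O_* flags
#include <sys/mman.h>   // shm_open, mmap, shm_unlink
#include <unistd.h>     // ftruncate, close
#include <cstring>

int main()
{
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    if (fd < 0) return 1;
    ftruncate(fd, 4096);

    void* p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;
    std::strcpy(static_cast<char*>(p), "visible to other processes");

    munmap(p, 4096);
    close(fd);
    // No shm_unlink() here, so the segment persists after this process exits.
    return 0;
}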
There are some special purpose operating systems that will not reclaim memory on process exit. If you're targeting such an OS you likely know.
Most systems will not allow you to access the memory of another process, but again...there are some unique situations where this is not true.
The C++ standard deals with this situation by not making any claim about what will happen if you fail to release memory and then exit, nor what will happen if you attempt to access memory that isn't explicitly yours to access. This is the very essence of what "undefined behavior" means and is the core of what it means for a pointer to be "invalid". There are more issues than just these two, but these two play a part.
Normally the O/S will reclaim any leaked memory when the process terminates.
For that reason I reckon it's OK for C++ programmers to never explicitly free any memory which is needed until the process exits; for example, any 'singletons' within a process are often not explicitly freed.
This behaviour may be O/S-specific, though (although it's true for e.g. both Windows and Linux): not theoretically part of the C++ standard.
For practical purposes, the answer to your question is yes. Modern operating systems will generally release memory allocated by a process when that process is shut down. However, to depend on this behavior is a very shoddy practice. Even if we can be assured that operating systems will always function this way, the code is fragile. If some function that fails to free memory suddenly gets reused for another purpose, it might translate to an application-level memory leak.
Nevertheless, the nature of this question and the example posted requires, ethically, that I point you and your team toward RAII.
int *pIntPtr = new int;
...
delete pIntPtr;
This code reeks of memory leaks. If anything in [...] throws, you have a memory leak. There are several solutions:
int *pIntPtr = 0;
try
{
    pIntPtr = new int;
    ...
}
catch (...)
{
    delete pIntPtr;
    throw;
}
delete pIntPtr;
Second solution using nothrow (not necessarily much better than first, but allows sensible initialization of pIntPtr at the time it is defined):
int *pIntPtr = new(nothrow) int;
if (pIntPtr)
{
    try
    {
        ...
    }
    catch (...)
    {
        delete pIntPtr;
        throw;
    }
    delete pIntPtr;
}
And the easy way:
scoped_ptr<int> pIntPtr(new int);
...
In this last and finest example, there is no need to call delete on pIntPtr as this is done automatically regardless of how we exit this block (hurray for RAII and smart pointers).
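For reference, a sketch of the same pattern with the standard smart pointer available since C++11, equivalent in spirit to the scoped_ptr version above:

#include <memory>

void work()
{
    std::unique_ptr<int> pIntPtr(new int(0));
    // ...
}   // delete happens automatically here, even if the body throws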
I have a C++ program which, during execution, will allocate about 3-8Gb of memory to store a hash table (I use tr1/unordered_map) and various other data structures.
However, at the end of execution, there will be a long pause before returning to shell.
For example, at the very end of my main function I have
std::cout << "End of execution" << endl;
But the execution of my program will go something like
$ ./program
do stuff...
End of execution
[long pause of maybe 2 min]
$ -- returns to shell
Is this expected behavior or am I doing something wrong?
I'm guessing that the program is deallocating the memory at the end. But, commercial applications which use large amounts of memory (such as photoshop) do not exhibit this pause when you close the application.
Please advise :)
Edit: The biggest data structure is an unordered_map keyed with a string and stores a list of integers.
I am using g++ -O2 on Linux; the computer I am using has 128GB of memory (with most of that free). There are a few giant objects.
Solution: I ended up getting rid of the hashtable since it was almost full anyways. This solved my problem.
If the data structures are sufficiently complicated when your program finishes, freeing them might actually take a long time.
If your program actually must create such complicated structures (do some memory profiling to make sure), there probably is no clean way around this.
You can short cut that freeing of memory by a dirty hack - at least on those operating systems where all memory allocated by a process is automatically freed when the process terminates.
You would do that by directly calling the libc's exit(3) function or the operating system's _exit(2). However, I would be very careful about verifying this does not short-circuit any other (important) cleanups some C++ destructor code might be doing. And what this does or does not do is highly system dependent (operating system, compiler, libc, the APIs you were using, ...).
Yes, the deallocation of memory can take some time, and you may also have code executing, such as destructors being called. Photoshop does not use 3-8GB of memory.
Also you should perhaps add profiling to your application to confirm it is the deallocation of memory and not something else.
(I started this as a reply to ndim, but it got to long)
As ndim already posted, termination can take a long time.
Likely reasons are:
you have lots of allocations, and parts of the heap are swapped to disk.
long running destructors
other atexit routines
OS specific cleanup, such as notifying DLL's of thread & process termination on Windows (don't know what exactly happens on Linux.)
exit is not the worst workaround here; however, actual behavior is system dependent. E.g. exit on Windows / the MSVC CRT will run global destructors / atexit routines, then call ExitProcess, which does close handles (but not necessarily flush them - at least it's not guaranteed).
Downsides: Destructors of heap allocated objects don't run - if you rely on them (e.g. to save state), you are toast. Also, tracking down real memory leaks gets much harder.
Find the cause
You should first analyze what is happening.
E.g. by manually freeing the root objects that are still allocated, you can separate the deallocation time from other process cleanup. Memory is the likely cause according to your description, but it's not the only possible one. Some cleanup code deadlocking before it runs into a timeout is possible, too. Monitoring stats (such as CPU/swap activity/disk use) can give clues.
Check the release build - debug builds usually use extra data on the heap that can immensely increase cleanup cost.
Different allocators
If deallocation is the problem, you might benefit a lot from using custom allocation mechanisms. Example: if your map only grows (items are never removed), an arena allocator can help a lot (see the sketch below). If your lists of integers have many nodes, switch to a vector, or use a rope if you need random insertion.
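One possible shape of such an arena (a sketch, not a drop-in STL allocator): objects are bump-allocated out of large blocks, and the whole arena is released with one free() per block rather than one per object.

#include <cstddef>
#include <cstdlib>
#include <vector>

class Arena {
public:
    explicit Arena(std::size_t block_size = 1 << 20)
        : block_size_(block_size), offset_(block_size) {}

    // Alignment and requests larger than block_size_ are ignored for brevity.
    void* allocate(std::size_t n) {
        if (offset_ + n > block_size_) {                      // current block exhausted
            blocks_.push_back(static_cast<char*>(std::malloc(block_size_)));
            offset_ = 0;
        }
        void* p = blocks_.back() + offset_;
        offset_ += n;
        return p;
    }

    ~Arena() {
        for (std::size_t i = 0; i < blocks_.size(); ++i)      // one free per block,
            std::free(blocks_[i]);                            // not one per object
    }

private:
    std::size_t block_size_;
    std::size_t offset_;
    std::vector<char*> blocks_;
};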
Certainly it's possible.
About 7 years ago I had a similar problem on a project, there was much less memory but computers were slower too I suppose.
We had to look at the assembly language for free() in the end to work out why it was so slow, and it seemed that it was essentially keeping the freed blocks in a linked list so they could be reallocated, and was also scanning that list looking for blocks to combine. Scanning the list was an O(n) operation, but freeing 'n' objects turned it into O(n^2).
Our test data took about 5 seconds to free the memory, but some customers had about 10 times as much data as we ever used, and it was taking 5-10 minutes to shut down the program on their systems.
We fixed it, as has been suggested, by just terminating the process instead and letting the operating system clear up the mess (which we knew was safe to do in our application).
Perhaps you have a more sensible free function than we had several years ago, but I just wanted to post that it's entirely possible if you have many objects to free and an O(n) free operation.
I can't imagine how you'd use enough memory for it to matter, but one way I sped up a program was to use boost::object_pool to allocate memory for a binary tree. The major benefit for me was that I could just put the object pool as a member variable of the tree, and when the tree went out of scope or was deleted, the object pool would be deleted all at once (letting me avoid a recursive destructor for the nodes). object_pool does call all of its objects' destructors at exit, though. I'm not sure if it handles empty destructors in a special way or not.
If you don't need your allocator to call a constructor, you can also use boost::pool, which I think may deallocate faster because it doesn't have to call destructors at all and just releases the chunk of memory in one free().
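A sketch of the object_pool pattern described above (Node is an illustrative type): every node is owned by the pool, so there is no per-node delete and everything is reclaimed when the pool itself is destroyed.

#include <boost/pool/object_pool.hpp>

struct Node {
    Node* left;
    Node* right;
    int value;
};

void build_tree()
{
    boost::object_pool<Node> pool;      // owns every Node it constructs

    Node* root  = pool.construct();     // no matching destroy() required
    root->left  = pool.construct();
    root->right = pool.construct();
    // ... build and use the tree ...
}   // the pool's destructor runs ~Node() for survivors and frees all blocks at once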
Freeing memory may well take time - data structures are being updated. How much time depends on the allocator being used.
Also there might be more than just memory deallocation going on - if destructors are being executed, there may be a lot more than that going on.
2 minutes does sound like a lot of time though - you might want to step through the clean up code in a debugger (or use a profiler if that's more convenient) to see what's actually taking all the time.
The time is probably not entirely wasted deallocating memory, but calling all the destructors. You can provide your own allocator that does not call the destructor (if the object in the map doesn't need to be destructed, but only deallocated).
Also take a look at this other question: C++ STL-conforming Allocators
Normally, deallocating memory as a process ends is not taken care of as part of the process, but rather as an operating system cleanup function. You might try something like valgrind to make sure your memory is being dealt with properly. However, the compiler also does certain things to set up and tear down your program, so some sort of performance profiling, or using a debugger to step through what is taking place at teardown time might be useful.
When your program exits, the destructors of all the global objects are called.
If one of them takes a long time, you will see this behavior.
Look for global objects and investigate their destructors.
Sorry, but this is a terrible question. You need to show the source code showing the specific algorithms and data structures that you are using.
It could be de-allocating, but that's just a wild guess. What are your destructors doing? Maybe it is paging like crazy. Just because your application allocates X amount of memory, that doesn't mean it will get it. Most likely it will be paging out to virtual memory. Depending on the specifics of your application and OS, you might be doing a lot of page faults.
In such cases, it might help to run iostat and vmstat in the background to see what the heck is going on. If you see a lot of I/O, that's a sure sign you are page faulting. I/O operations will always be more expensive than memory ops.
I would be very surprised if indeed all that lapsed time at the end is purely due to de-allocation.
Run vmstat and iostat as soon as you get the "ending" message, and look for any indications of I/O going bananas.
The objects in memory are organized in a heap. They are not deleted at once, they are deleted one by one, and the cost of deleting an object is O(log n). Freeing them takes loooong.
The answer is then, yes, it takes so much time.
You can avoid free being called on an object by using a destructor call my_object->~my_class() instead of delete my_object. You can avoid free on all objects of a class by overriding and nullifying operator delete( void * ) {} inside the class. Derived classes with virtual destructors will inherit that delete, otherwise you can copy-paste (or maybe using base::operator delete;).
This is much cleaner than calling exit. Just be sure you don't need that memory back!
I guess your unordered map is a global variable, whose constructor is called at process startup, and destructor is called at process exit.
How could you know if the map is guilty?
You can test whether your unordered_map is responsible (and I guess it is) by allocating it with new and, well, ahem... forgetting to delete it.
If your process' exit goes faster, then you have your culprit.
Why is this so sloooooow?
Now, just by reading your post, for your unordered map, I see potential allocations for:
each string's allocated buffer
list items (each one being a string + other things)
unordered map items + the bucket array
If you have 3-8 Gb of data in this unordered map, this means that each item above will need some kind of new and delete. And if you free every item, one by one, it could take time.
Other reasons?
Note that if you add items to your map one by one while your process is executing, the individual calls to new are hardly perceptible... But the moment you want to clean up, all your allocated items must be destroyed at the same time, which could explain the perceived difference between construction/use and destruction...
Now, the destructors could take time for an additional reason.
For example, on Visual C++ 2008 in debug mode, upon destruction of STL iterators, the destructor verifies the iterators are still correct. This caused quite a slowdown upon my object destruction (which was basically a tree of nodes, each node having a list of child nodes, with iterators everywhere).
You are working on gcc, so perhaps they have their own debug testing, or perhaps your destructors are doing additional work (e.g. logging?)...
In my experience, the calls to free or delete should not take a significant amount of time. That said, I have seen plenty of cases where it does take non-trivial time to destruct objects because of destructors that did non-trivial things. If you can't tell what's taking time during the destruction, use a debugger and/or a profiler to determine what's going on. If the profiler shows you that it really is calls to free() that take a lot of time, then you should improve your memory allocation scheme, because you must be creating an extremely large number of small objects.
As you noted, plenty of applications allocate large amounts of memory and incur no significant delay during shutdown, so there's no reason your program can't do the same.
I would recommend (as some others have) a simple forced process termination, if you're certain that you've nothing left to do but free memory (for example, no file i/o and such left to do).
The thing is that when you free memory, typically, it's not actually returned to the OS - it's held in a list to be reallocated, and this is obviously slow. However, if you terminate the process, the OS will reclaim all your memory at once, which should be substantially faster. However, as others have said, if you have any destructors that need to run, you should ensure that they are run before force-calling exit() or ExitProcess or any such function.
What you should be aware of is that deallocating memory that is spread out (e.g., two nodes in a map) is much slower due to cache effects than deallocating memory in a vector, because the CPU needs to access the memory to free it and run any destructors. If you deallocated a very large amount of memory that's very fragmented, you could be falling afoul of this, and should consider changing to some more contiguous structures.
I actually had a problem where allocating memory was faster than de-allocating it, and after allocating memory and then de-allocating it, it looked as though I had a memory leak. Eventually, I worked out that this was why.
I am currently facing a similar issue with a CPU- and memory-intensive research program of mine. It runs until a specified time limit, prints a solution and exits. The destructor call of a single object (containing up to 10⁶ relatively small objects) was what unexpectedly took time at the end of execution (about 10 seconds to free 5 GB of data).
I was not satisfied by the answers advising to avoid executing every destructor, so here is the solution I came up with:
Original code:
void process() {
    vector<unordered_map<State, int>> large_obj(100);
    // Processing...
} // Takes a few seconds to exit (destructor calls)
Solution:
void process(bool free_mem = false) {
    auto* large_obj_ = new vector<unordered_map<State, int>>(100);
    auto& large_obj = *large_obj_;
    // Processing...
    // (No changes required here, 'large_obj' can be used exactly as before)
    if (free_mem)
        delete large_obj_;
}
It has the advantage of being completely transparent apart from a few lines to insert, and it can even be parametrized to take some time to free the memory if needed. It is explicit which object will intentionally not be freed to avoid leaving things in an "unstable" state. Memory is cleaned up instantly by the OS on exit when free_mem = false.