Heap memory management of child process upon forkpty() and execl()?

I have a C++ app I'm developing in Linux. I'm allocating some dynamic memory and ultimately calling forkpty(). The child process calls execl() and, as we know, execl() never returns if it succeeds in executing the command. Furthermore, as we know, forkpty() makes a copy of all the parent's data. So, if the child process never returns control to my application so that it can ultimately do memory cleanup, is it safe to say one had better not have any dynamic memory allocated at the time execl() is called from the child process? I can't believe I could not find this one on here... Thanks in advance.

Allocated memory is part of the process image; when you call
execl, the entire process image is replaced, and any memory in
it simply "disappears" like the rest of it, returning to the OS,
which will then use it elsewhere.

All of the "forked" process memory is freed as part of execl() (if the call is successful).
If this wasn't the case, there would be a lot of memory leaks all over a regular linux system, as it's almost impossible to write anything even a little complex without allocating memory, and, for example, if the arguments to execl() are allocated, you couldn't possibly free them before calling execl().
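To make the point about heap-allocated arguments concrete, here is a minimal sketch. It uses plain fork() and execvp() rather than forkpty() and execl() so that it needs no extra library, and run_in_child is a hypothetical helper name, not something from the question. The argv array is built on the child's heap and is never freed; a successful exec discards it together with the rest of the old process image.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs args[0] with the given arguments in a child
// process and returns its exit status, or -1 on error.
int run_in_child(const std::vector<std::string>& args) {
    if (args.empty()) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        // Child: this vector lives on the child's heap. It is never freed
        // explicitly; a successful exec replaces the whole process image.
        std::vector<char*> argv;
        for (const std::string& a : args) {
            argv.push_back(const_cast<char*>(a.c_str()));
        }
        argv.push_back(nullptr);

        execvp(argv[0], argv.data());
        _exit(127);  // Only reached if exec failed.
    }

    // Parent: its own copy of args is untouched by whatever the child did.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If exec fails, the child is still running the original image, so that path exits with _exit() instead of returning into the parent's code.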

Related

C++ DLL Unload Destructor?

So I have a DLL I wrote in C++.
However, it allocates memory using GlobalAlloc(). To avoid memory leaks, I want to keep track of these allocations and de-allocate all of them on the destruction of the DLL.
Is there any way to write a function that will be called when my DLL is unloaded?
One thing I can think of is creating a global object in my DLL and writing the memory free calls in its destructor, but this seems like overkill.
My other idea is to just rely on the operating system to free the memory when the DLL unloads, but this seems dirty.

Is there any way to write a function that will be called when my DLL is unloaded? One thing I can think of is creating a global object in my DLL and writing the memory free calls in its destructor
That's possible, although I believe exactly when your object's destructor will be called will be undefined.
You might be interested in DLL_PROCESS_DETACH, and although you should avoid doing anything significant in DllMain, it seems deallocating resources is acceptable here. Note the caveats:
When a DLL is unloaded from a process as a result of an unsuccessful load of the DLL, termination of the process, or a call to FreeLibrary, the system does not call the DLL's entry-point function with the DLL_THREAD_DETACH value for the individual threads of the process. The DLL is only sent a DLL_PROCESS_DETACH notification. DLLs can take this opportunity to clean up all resources for all threads known to the DLL.
When handling DLL_PROCESS_DETACH, a DLL should free resources such as heap memory only if the DLL is being unloaded dynamically (the lpvReserved parameter is NULL). If the process is terminating (the lpvReserved parameter is non-NULL), all threads in the process except the current thread either have exited already or have been explicitly terminated by a call to the ExitProcess function, which might leave some process resources such as heaps in an inconsistent state. In this case, it is not safe for the DLL to clean up the resources. Instead, the DLL should allow the operating system to reclaim the memory.
You might need to elaborate on why your DLL can hold on to memory, if you have numerous objects created by the DLL, they should have a defined lifecycle and clean themselves up at the end of their life.
If they're not objects (i.e. memory being allocated and returned to the caller via functions) why not put the responsibility back onto whoever is consuming your DLL? They can free the memory. The Terminal Services library follows this pattern (WTSFreeMemory).
If the resources are long-lived and must exist for the lifetime of your library, let the consumer control the lifecycle of your library. Write two functions: MyFrameworkStartup and MyFrameworkShutdown as appropriate. Winsock follows this pattern (WSAStartup and WSACleanup).
My other idea is to just rely on the operating system to free the memory when the DLL unloads, but this seems dirty.
You'll be okay if the process is exiting:
Don't worry about freeing memory; it will all go away when the process address space is destroyed. Don't worry about closing handles; handles are closed automatically when the process handle table is destroyed. Don't try to call into other DLLs, because those other DLLs may already have received their DLL_PROCESS_DETACH notifications, in which case they may behave erratically in the same way that a Delphi object behaves erratically if you try to use it after its destructor has run.
Make sure you read the whole article and comments and understand it before implementing the "do nothing" strategy.

How/when is the memory allocated? Usually, the sanest option is to try to preserve some kind of symmetry (constructor allocates, destructor deallocates. Or memory allocated when DLL is loaded, and freed when DLL is unloaded).
In any case, if you want to be notified when the DLL is unloaded, look into the DllMain function, and specifically the DLL_PROCESS_DETACH parameter.

The DllMain function is called, with fdwReason set to DLL_PROCESS_DETACH, when a DLL is unloaded. As described in the documentation, make sure you check the value of lpvReserved and only free memory if it is NULL; you should not free memory if the process is terminating.
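A rough sketch of the DLL_PROCESS_DETACH rule described in these answers follows. The names g_block and should_free_on_detach are hypothetical, and the single GlobalAlloc block stands in for whatever the DLL really tracks. The decision is split into a small portable function so it can be checked without Windows headers; the DllMain part only compiles on Windows.

```cpp
// Mirrors the documented rule: lpvReserved is NULL when the DLL is unloaded
// via FreeLibrary and non-NULL when the process is terminating.
bool should_free_on_detach(const void* reserved) {
    return reserved == nullptr;
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical allocation owned by the DLL for its whole lifetime.
static HGLOBAL g_block = nullptr;

BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID reserved) {
    if (reason == DLL_PROCESS_ATTACH) {
        g_block = GlobalAlloc(GMEM_FIXED, 4096);
        return g_block != nullptr ? TRUE : FALSE;
    }
    if (reason == DLL_PROCESS_DETACH) {
        // Free only on a dynamic unload; at process exit, leave the
        // memory for the operating system to reclaim.
        if (should_free_on_detach(reserved) && g_block != nullptr) {
            GlobalFree(g_block);
            g_block = nullptr;
        }
    }
    return TRUE;
}
#endif
```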

What happens to malloc'ed memory after exec() changes the program image?

I know that when I call one of the exec() system calls in Linux that it will replace the currently running process with a new image. So when I fork a new process and run exec(), the child will be replaced with the new process.
What happens to any memory I've allocated from the heap? Say I want to parse an arbitrary number of commands and send it into exec(). To hold this arbitrary number, I'll likely have to allocate memory at some point since I don't think I can do it correctly with static sized arrays, so I'll likely use malloc() or something equivalent.
I need to keep this memory allocated until after I've called exec(), but exec() never returns.
Does the memory get reclaimed by the operating system?

When you call fork(), a copy of the calling process is created. This child process is (almost) exactly the same as the parent, i.e. memory allocated by malloc() is preserved and you're free to read or modify it. The modifications will not be visible to the parent process, though, as the parent and child processes are completely separate.
When you call exec() in the child, the child process is replaced by a new process. From execve(2):
execve() does not return on success, and the text, data, bss, and stack
of the calling process are overwritten by that of the program loaded.
By overwriting the data segment, the exec() call effectively reclaims the memory that was allocated before by malloc().
The parent process is unaffected by all this. Assuming that you allocated the memory in the parent process before calling fork(), the memory is still available in the parent process.
EDIT: Modern implementations of malloc() use anonymous memory mappings, see mmap(2). According to execve(2), memory mappings are not preserved over an exec() call, so this memory is also reclaimed.
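The separation between parent and child heaps can be checked with a small experiment. This is only a sketch; heap_after_fork is a made-up name, and the child reports its value through its exit status purely for convenience.

```cpp
#include <cstdlib>
#include <utility>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns {value the parent sees after the child has run,
//          value the child reported}, or {-1, -1} on error.
std::pair<int, int> heap_after_fork() {
    int* p = static_cast<int*>(std::malloc(sizeof(int)));
    if (p == nullptr) return {-1, -1};
    *p = 1;

    pid_t pid = fork();
    if (pid < 0) {
        std::free(p);
        return {-1, -1};
    }

    if (pid == 0) {
        *p = 2;     // Changes only the child's private copy of the heap.
        _exit(*p);  // No free() needed: the child's memory goes away with it.
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        std::free(p);
        return {-1, -1};
    }
    int parent_value = *p;  // Still 1: the child's write is not visible here.
    std::free(p);
    return {parent_value, WIFEXITED(status) ? WEXITSTATUS(status) : -1};
}
```

The parent still has to free its own allocation; only the child's copy is reclaimed when the child exits or calls exec().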

The entire heap -- allocated memory, and all of the logic malloc
uses to manage it -- is part of the process image which gets
replaced. It simply disappears, as far as your process is
concerned. The system, of course, recovers it and recycles it.

Quick successful exit from C++ with lots of objects allocated

I'm looking for a way to quickly exit a C++ program that has allocated a lot of structures in memory using C++ classes. The program finishes correctly, but after the final "return" in the program, all of the auto-destructors kick in. The problem is the program has allocated about 15GB of memory through lots of C++ class structures, and this auto-destruct process takes about 1 more hour itself to complete as it walks through all of the structures - even though I don't care about the results. The program only took 1 hour to complete the task up to this point. I would like to just return to the OS and let it do its normal wholesale process allocation deletion - which is very quick. I've been doing this by manually killing the process during the cleanup stage - but am looking for a better programmatic solution.
I would like to return a success to the OS, but don't care to keep any of the memory content. The program does perform a lot of dynamic allocation/deallocation during the normal processing, so it's not just simple heap management.
Any opinions?

In Standard C++ you only have abort(), but that has the process return failure to the OS.
On many platforms (Unix, MS Windows) you can use _exit() to exit the program without running cleanup and destructors.

C++0x std::quick_exit is what you are looking for if your compiler already supports it (g++-4.4.5 does).

If the 15 GB of memory is being allocated to a reasonably small number of classes, you could override operator delete for those classes. Just pass the call to the standard delete, but set up a global flag that, if set, will make the call to delete a no-op. Or, if the logic of your program is such that these objects are not deleted in the normal course of building your data structures, you could simply ignore delete in all cases for these classes.
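A minimal sketch of the flag-controlled operator delete idea, assuming a hypothetical Node class; the frees counter exists only so the effect can be observed. Destructors still run when the flag is set; only the return of the memory is skipped.

```cpp
#include <new>

// Hypothetical class standing in for one of the heavily allocated types.
struct Node {
    static bool skip_free;  // Set to true just before shutdown.
    static int frees;       // Counts real deallocations, for illustration.

    int value;

    // Class-specific operator delete: forwards to the global one unless
    // the flag asks for a no-op.
    static void operator delete(void* p) noexcept {
        if (skip_free) return;  // Deliberately leak; the OS reclaims it at exit.
        ++frees;
        ::operator delete(p);
    }
};

bool Node::skip_free = false;
int Node::frees = 0;
```

Leak checkers will report everything deleted while the flag is set, so this is something to switch on only at the very end of the run.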

As Naveen says, this can't be a matter of memory deallocation. I've written neural network simulations with evolutionary algorithms that were allocating and freeing lots of memory in small and large chunks, and this was never a major issue.

If you have a C99 compiler, you can use the _Exit function to end immediately without having global object destructors or any functions registered with atexit to be called; whether or not unwritten buffered file data is flushed, open streams are closed, or temporary files are removed is implementation-defined (C99 §7.20.4.4).
If you're on Windows, you can also use ExitProcess to achieve the same effect.
But, as others have said, your destructors should really not be taking an hour to run unless you're doing a fair amount of I/O (writing log files, etc.). I strongly, strongly recommend you profile your program to see where the time is spent.
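To see what skipping cleanup means in practice, the following sketch forks a child that owns an object whose destructor writes one byte to a pipe, and counts how many bytes arrive. It assumes a POSIX system; Noisy and destructor_writes are names invented for this example. With std::_Exit the destructor never runs.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Writes one byte to fd when destroyed, so the parent can observe it.
struct Noisy {
    int fd;
    ~Noisy() {
        char c = 'd';
        ssize_t ignored = write(fd, &c, 1);
        (void)ignored;
    }
};

// Returns how many destructor bytes the child produced, or -1 on error.
int destructor_writes(bool fast_exit) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        {
            Noisy n{fds[1]};
            if (fast_exit) std::_Exit(0);  // Leaves at once; ~Noisy is skipped.
        }                                  // Normal path: ~Noisy runs here.
        std::_Exit(0);
    }

    close(fds[1]);
    int count = 0;
    char c;
    while (read(fds[0], &c, 1) == 1) ++count;  // Reads until the child exits.
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return count;
}
```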

The possible strategies depend on the number of objects that are directly visible in main through which you access the 15GB of data and if these are local to main or statically allocated.
If all access to the 15GB of data is through local objects in main, then you can simply replace the return 0; at the end of main with exit(0);.
exit will terminate your application and trigger cleanup of statically allocated variables, but not of local variables.
If the data is accessed through a handful of statically allocated variables, you could turn them into pointers (or references) to dynamically allocated memory and deliberately leak that.

Does it take time to deallocate memory?

I have a C++ program which, during execution, will allocate about 3-8Gb of memory to store a hash table (I use tr1/unordered_map) and various other data structures.
However, at the end of execution, there will be a long pause before returning to shell.
For example, at the very end of my main function I have
std::cout << "End of execution" << endl;
But the execution of my program will go something like
$ ./program
do stuff...
End of execution
[long pause of maybe 2 min]
$ -- returns to shell
Is this expected behavior or am I doing something wrong?
I'm guessing that the program is deallocating the memory at the end. But, commercial applications which use large amounts of memory (such as photoshop) do not exhibit this pause when you close the application.
Please advise :)
Edit: The biggest data structure is an unordered_map keyed with a string and stores a list of integers.
I am using g++ -O2 on linux, the computer I am using has 128GB of memory (with most of that free). There are a few giant objects
Solution: I ended up getting rid of the hashtable since it was almost full anyways. This solved my problem.

If the data structures are sufficiently complicated when your program finishes, freeing them might actually take a long time.
If your program actually must create such complicated structures (do some memory profiling to make sure), there probably is no clean way around this.
You can short cut that freeing of memory by a dirty hack - at least on those operating systems where all memory allocated by a process is automatically freed when the process terminates.
You would do that by directly calling the libc's exit(3) function or the operating system's _exit(2). However, I would be very careful about verifying this does not short-circuit any other (important) cleanups some C++ destructor code might be doing. And what this does or does not do is highly system dependent (operating system, compiler, libc, the APIs you were using, ...).

Yes the deallocation of memory can take some time, and also possibly you have code executing like destructors being called. Photoshop does not use 3-8GB of memory.
Also you should perhaps add profiling to your application to confirm it is the deallocation of memory and not something else.

(I started this as a reply to ndim, but it got too long)
As ndim already posted, termination can take a long time.
Likely reasons are:
you have lots of allocations, and parts of the heap are swapped to disk.
long running destructors
other atexit routines
OS specific cleanup, such as notifying DLL's of thread & process termination on Windows (don't know what exactly happens on Linux.)
exit is not the worst workaround here; however, actual behavior is system dependent. E.g. exit on Windows / MSVC CRT will run global destructors / atexit routines, then call ExitProcess, which does close handles (but not necessarily flush them - at least it's not guaranteed).
Downsides: Destructors of heap allocated objects don't run - if you rely on them (e.g. to save state), you are toast. Also, tracking down real memory leaks gets much harder.
Find the cause
You should first analyze what is happening, e.g. by manually freeing the root objects that are still allocated, you can separate the deallocation time from other process cleanup. Memory is the likely cause according to your description, but it's not the only possible one. Some cleanup code deadlocking before it runs into a timeout is possible, too. Monitoring stats (such as CPU/swap activity/disk use) can give clues.
Check the release build - debug builds usually use extra data on the heap that can immensely increase cleanup cost.
Different allocators
If deallocation is the problem, you might benefit a lot from using custom allocation mechanisms. Example: if your map only grows (items are never removed), an arena allocator can help a lot. If your lists of integers have many nodes, switch to a vector, or use a rope if you need random insertion.
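For readers unfamiliar with the term, an arena allocator hands out memory from a few large chunks and releases those chunks in one go, instead of freeing each small object. The class below is a bare-bones sketch written for this illustration, not a production allocator: it never runs destructors, has no per-object free, and assumes the requested alignment does not exceed what malloc provides.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Minimal bump-pointer arena (hypothetical, for illustration only).
class Arena {
public:
    explicit Arena(std::size_t chunk_size = 1 << 20) : chunk_size_(chunk_size) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // One free() per chunk, however many objects were carved out of it.
    ~Arena() {
        for (void* chunk : chunks_) std::free(chunk);
    }

    void* allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        // Round the current offset up to the requested alignment.
        std::size_t start = (used_ + align - 1) / align * align;
        if (chunks_.empty() || start + n > capacity_) {
            // Current chunk is full (or absent): start a new one.
            std::size_t new_capacity = n > chunk_size_ ? n : chunk_size_;
            void* chunk = std::malloc(new_capacity);
            if (chunk == nullptr) return nullptr;
            chunks_.push_back(chunk);
            capacity_ = new_capacity;
            start = 0;
        }
        used_ = start + n;
        return static_cast<char*>(chunks_.back()) + start;
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t chunk_size_;
    std::size_t capacity_ = 0;  // Size of the newest chunk.
    std::size_t used_ = 0;      // Bytes handed out from the newest chunk.
    std::vector<void*> chunks_;
};
```

Teardown cost then scales with the number of chunks rather than the number of objects, which is the property the suggestion above relies on.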

Certainly it's possible.
About 7 years ago I had a similar problem on a project, there was much less memory but computers were slower too I suppose.
We had to look at the assembly language for free in the end to work out why it was so slow, and it seemed that it was essentially keeping the freed blocks in a linked list so they could be reallocated and was also scanning that list looking for blocks to combine. Scanning the list was an O(n) operation, but freeing 'n' objects turned it into O(n^2).
Our test data took about 5 seconds to free the memory, but some customers had about 10 times as much data as we ever used and it was taking 5-10 minutes to shut down the program on their systems.
We fixed it, as has been suggested by just terminating the process instead and letting the operating system clear up the mess (which we knew was safe to do on our application).
Perhaps you have a more sensible free function than we had several years ago, but I just wanted to post that it's entirely possible if you have many objects to free and an O(n) free operation.

I can't imagine how you'd use enough memory for it to matter, but one way I sped up a program was to use boost::object_pool to allocate memory for a binary tree. The major benefit for me was that I could just put the object pool as a member variable of the tree, and when the tree went out of scope or was deleted, the object pool would be deleted all at once (letting me not have to use a recursive destructor for the nodes). object_pool does call all of its objects' destructors at exit though. I'm not sure if it handles empty destructors in a special way or not.
If you don't need your allocator to call a constructor, you can also use boost::pool, which I think may deallocate faster because it doesn't have to call destructors at all and just deletes the chunk of memory in one free().

Freeing memory may well take time - data structures are being updated. How much time depends on the allocator being used.
Also there might be more than just memory deallocation going on - if destructors are being executed, there may be a lot more than that going on.
2 minutes does sound like a lot of time though - you might want to step through the clean up code in a debugger (or use a profiler if that's more convenient) to see what's actually taking all the time.
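Short of a full profiler run, one low-tech check is to time the destruction of the big structure explicitly before main returns. The sketch below assumes a table shaped like the one in the question (string keys, lists of integers, here as vectors); build_table and seconds_to_destroy are hypothetical helpers.

```cpp
#include <chrono>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Table = std::unordered_map<std::string, std::vector<int>>;

// Builds a table with n entries; stands in for the program's real data.
std::unique_ptr<Table> build_table(int n) {
    std::unique_ptr<Table> table(new Table);
    for (int i = 0; i < n; ++i) {
        (*table)["key" + std::to_string(i)] = {i, i + 1, i + 2};
    }
    return table;
}

// Destroys the table and returns how long that took, in seconds.
double seconds_to_destroy(std::unique_ptr<Table> table) {
    auto start = std::chrono::steady_clock::now();
    table.reset();  // Runs every destructor and frees every node.
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

If the number reported here accounts for most of the pause, the time is going into destructors and frees; if not, the cause lies elsewhere in process teardown.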

The time is probably not entirely wasted deallocating memory, but calling all the destructors. You can provide your own allocator that does not call the destructor (if the object in the map doesn't need to be destructed, but only deallocated).
Also take a look at this other question: C++ STL-conforming Allocators

Normally, deallocating memory as a process ends is not taken care of as part of the process, but rather as an operating system cleanup function. You might try something like valgrind to make sure your memory is being dealt with properly. However, the compiler also does certain things to set up and tear down your program, so some sort of performance profiling, or using a debugger to step through what is taking place at teardown time might be useful.

when your program exits the destructors of all the global objects are called.
if one of them takes a long time, you will see this behavior.
look for global objects and investigate their destructors.

Sorry, but this is a terrible question. You need to show the source code showing the specific algorithms and data structures that you are using.
It could be de-allocating, but that's just a wild guess. What are your destructors doing? Maybe it is paging like crazy. Just because your application allocates X amount of memory, that doesn't mean it will get it. Most likely it will be paging off virtual memory. Depending on the specifics of your application and OS, you might be doing a lot of page faults.
In such cases, it might help to run iostat and vmstat in the background to see what the heck is going on. If you see a lot of I/O, that's a sure sign you are page faulting. I/O operations will always be more expensive than memory ops.
I would be very surprised if indeed all that lapsed time at the end is purely due to de-allocation.
Run vmstat and iostat as soon as you get the "ending" message, and look for any indications of I/O going bananas.

The objects in memory are organized in a heap. They are not deleted at once, they are deleted one by one, and the cost of deleting an object is O(log n). Freeing them takes loooong.
The answer is then, yes, it takes so much time.

You can avoid free being called on an object by using a destructor call my_object->~my_class() instead of delete my_object. You can avoid free on all objects of a class by overriding and nullifying operator delete( void * ) {} inside the class. Derived classes with virtual destructors will inherit that delete, otherwise you can copy-paste (or maybe using base::operator delete;).
This is much cleaner than calling exit. Just be sure you don't need that memory back!

I guess your unordered map is a global variable, whose constructor is called at process startup, and destructor is called at process exit.
How could you know if the map is guilty?
You can test if your unordered_map is responsible (and I guess it is) by allocating it with a new, and, well, ahem... forget to delete it.
If your process' exit goes faster, then you have your culprit.
Why this is so sloooooow?
Now, just by reading your post, for your unordered map, I see potential allocations for:
strings allocated buffer
list items (each one being a string + other things)
unordered map items + the bucket array
If you have 3-8 Gb of data in this unordered map, this means that each item above will need some kind of new and delete. And if you free every item, one by one, it could take time.
Other reasons?
Note that if you add items to your map one by one while your process is executing, the individual new calls are not exactly perceptible... But the moment you want to clean everything up, all your allocated items must be destroyed at the same time, which could explain the perceived difference between construction/use and destruction...
Now, the destructors could take time for an additional reason.
For example, on Visual C++ 2008 in debug mode, upon destruction of STL iterators, the destructor verifies the iterators are still correct. This caused quite a slowdown upon my object destruction (which was basically a tree of nodes, each node having a list of child nodes, with iterators everywhere).
You are working on gcc, so perhaps they have their own debug testing, or perhaps your destructors are doing additional work (e.g. logging?)...

In my experience, the calls to free or delete should not take a significant amount of time. That said, I have seen plenty of cases where it does take non-trivial time to destruct objects because of destructors that did non-trivial things. If you can't tell what's taking time during the destruction, use a debugger and/or a profiler to determine what's going on. If the profiler shows you that it really is calls to free() that take a lot of time, then you should improve your memory allocation scheme, because you must be creating an extremely large number of small objects.
As you noted, plenty of applications allocate large amounts of memory and incur no significant delay during shutdown, so there's no reason your program can't do the same.

I would recommend (as some others have) a simple forced process termination, if you're certain that you've nothing left to do but free memory (for example, no file i/o and such left to do).
The thing is that when you free memory, typically, it's not actually returned to the OS - it's held in a list to be reallocated, and this is obviously slow. However, if you terminate the process, the OS will reclaim all your memory in one lump, which should be substantially faster. However, as others have said, if you have any destructors that need to run, you should ensure that they are run before force-calling exit() or ExitProcess or any such function.
What you should be aware of is that deallocating memory that is spread out (e.g., two nodes in a map) is much slower due to cache effects than deallocating memory in a vector, because the CPU needs to access the memory to free it and run any destructors. If you deallocated a very large amount of memory that's very fragmented, you could be falling afoul of this, and should consider changing to some more contiguous structures.
I actually had a problem where allocating memory was faster than de-allocating it, and after allocating memory and then de-allocating it, I had a memory leak. Eventually, I worked out that this is why.

I am currently facing a similar issue, with a CPU & memory intensive research program of mine. It runs until a specified time limit, prints a solution and exits. The destructor call of a single object (containing up to 10⁶ relatively small objects) was what unexpectedly took time at the end of execution (about 10sec. to free 5Gb of data).
I was not satisfied by the answers advising to avoid executing every destructor, so here is the solution I came up with:
Original code:
void process() {
vector<unordered_map<State, int>> large_obj(100);
// Processing...
} // Takes a few seconds to exit (destructor calls)
Solution:
void process(bool free_mem = false) {
auto * large_obj_ = new vector<unordered_map<State, int>>(100);
auto &large_obj = *large_obj_;
// Processing...
// (No changes required here, 'large_obj' can be used exactly as before)
if(free_mem)
delete large_obj_;
}
It has the advantage of being completely transparent apart from a few lines to insert, and it can even be parametrized to take some time to free the memory if needed. It is explicit which object will intentionally not be freed to avoid leaving things in an "unstable" state. Memory is cleaned up instantly by the OS on exit when free_mem = false.

Vectored Exception Handling During StackOverflowException

If I've registered my very own vectored exception handler (VEH) and a stack overflow exception occurs in my process, will I be able to allocate more memory on the stack once I reach the VEH? Will the allocation overwrite some other memory? What will happen?
I know that in .NET this is why the entire stack is committed during the thread's creation, but let's say I'm writing native code and such a scenario occurs... what will I be able to do inside the VEH? What about memory allocation?

In the case of a stack overflow, you'll have a tiny bit of stack to work with. It's enough stack to start a new thread, which will have an entirely new stack. From there, you can do whatever you need to do before terminating.
You cannot recover from a stack overflow; it would involve unwinding the stack, but your entire program would be destroyed in the process. Here's some code I wrote for a stack-dumping utility:
// stack overflows cannot be handled, try to get output then quit
set_current_thread(get_current_thread());
boost::thread t(stack_fail_thread);
t.join(); // will never exit
All this did was get the thread's handle so the stack dumping mechanism knew which thread to dump, start a new thread to do the dumping/logging, and wait for it to finish (which won't happen, the thread calls exit()).
For completeness, get_current_thread() looked like this:
const HANDLE process = GetCurrentProcess();
HANDLE thisThread = 0;
DuplicateHandle(process, GetCurrentThread(), process,
&thisThread, 0, true, DUPLICATE_SAME_ACCESS);
All of these are "simple" functions that don't require a lot of room to work (and keep in mind, the compiler will most likely inline these, removing a function call). You cannot, by contrast, throw an exception. Not only does that require much more work, but destructors can do quite a bit of work (like deallocating memory), which tends to be complex as well.
Your best bet is to start a new thread, save as much information about your application as you can or want, then terminate.

No, you can't allocate memory in vectored exception handler.
MSDN says it explicitly:
"The handler should not call functions that acquire synchronization objects or allocate memory, because this can cause problems. Typically, the handler will simply access the exception record and return."

Stacks need to be contiguous, so you can't just allocate any random memory but have to allocate the next part of the address space.
If you are willing to preallocate the address space (i.e. just reserve a range of addresses without actually allocating memory), you can use VirtualAlloc. First you call it with the MEM_RESERVE flag to set aside the address space. Later, in your exception handler you can call it again with MEM_COMMIT to allocate physical memory to your pre-reserved address space.
