Shielding app from library leaks - c++

I have to use a function from a shared library which leaks a small amount of memory (let's assume I can't modify the library). Unfortunately, I have to call this function a huge number of times, which obviously makes this leak catastrophic.
Is there any way to fix this problem? If so, is there a fast way of doing it? (The function must be called a few hundred thousand times; the leak becomes problematic after about 10k calls.)

I can think of a couple of approaches, but I don't know what will work for you.
Switch to a garbage-collecting memory allocator like Boehm's gc. This can sweep up those leaks, and may even be a performance gain because free() becomes a no-op.
exit(): The Ultimate Deallocator. Fork off a subprocess, run it 10k times, pass the results back to the parent process. Apache's web server does this to contain damage from third-party library leaks.
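To make the second approach concrete, here is a minimal POSIX sketch of the fork-and-pipe idea, under the assumption that the leaky call can be wrapped as a hypothetical leaky_compute() whose result fits in a fixed-size struct (both names are placeholders, not part of any real API):
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

// Placeholders standing in for the leaky library call and its result.
struct Result { double value; };
Result leaky_compute(int i) {
    new char[16];              // simulate the library's small per-call leak
    return Result{ i * 0.5 };
}

// Run `count` leaky calls in a child process and read the results back
// through a pipe; everything the child leaked dies with it at _exit().
bool run_batch(int first, int count, Result* out) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) return false;

    if (pid == 0) {                             // child
        close(fds[0]);
        for (int i = 0; i < count; ++i) {
            Result r = leaky_compute(first + i);
            write(fds[1], &r, sizeof r);        // ignoring short writes for brevity
        }
        _exit(0);                               // OS reclaims all leaked memory here
    }

    close(fds[1]);                              // parent
    for (int i = 0; i < count; ++i)
        read(fds[0], &out[i], sizeof out[i]);   // ignoring short reads for brevity
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

int main() {
    Result results[1000];
    for (int batch = 0; batch < 300; ++batch)   // ~300k calls, 1000 per child
        run_batch(batch * 1000, 1000, results);
}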

I'm not sure this is easier than rewriting the function yourself, but you could write your own small memory allocator, specific to your task, which would work roughly as follows (it would have to replace the default memory allocation calls, including the ones made inside your library):
1) It should let you enter a leak-reverting mode which, for example, disposes of everything allocated while that mode is active.
2) Before your function processes something, enter that leak-reverting mode, and exit it when the function finishes.
Basically, if the dependencies in your code aren't too tight, this would help. A rough sketch of the idea follows.
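As an illustration only, here is one way such a leak-reverting mode could look, implemented by overriding the global operator new/delete. It assumes the library's allocations actually go through this process's global operator new; if the library calls malloc() directly or uses its own heap, these hooks will never see them, and you would need malloc-level hooks instead. The function names are made up for the example.
#include <cstdlib>
#include <mutex>
#include <new>
#include <unordered_set>

namespace {
    std::mutex g_mutex;
    bool g_reverting = false;
    std::unordered_set<void*>* g_live = nullptr;  // allocations made in the mode
    thread_local bool g_in_hook = false;          // guards against re-entry
}

void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    if (!g_in_hook) {
        g_in_hook = true;
        {
            std::lock_guard<std::mutex> lock(g_mutex);
            if (g_reverting && g_live) g_live->insert(p);  // the set's own nodes skip this
        }
        g_in_hook = false;
    }
    return p;
}

void operator delete(void* p) noexcept {
    if (p && !g_in_hook) {
        g_in_hook = true;
        {
            std::lock_guard<std::mutex> lock(g_mutex);
            if (g_live) g_live->erase(p);
        }
        g_in_hook = false;
    }
    std::free(p);
}

// operator new[] / delete[] and the nothrow variants would need the same treatment.

void enter_leak_reverting_mode() {
    auto* set = new std::unordered_set<void*>();  // created while the mode is still off
    std::lock_guard<std::mutex> lock(g_mutex);
    g_live = set;
    g_reverting = true;
}

void exit_leak_reverting_mode() {
    std::unordered_set<void*>* live = nullptr;
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_reverting = false;
        live = g_live;
        g_live = nullptr;
    }
    if (!live) return;
    for (void* p : *live)   // whatever the library never released; assumes nothing
        std::free(p);       // will try to delete these pointers later
    delete live;
}
Usage around the leaky call would then be: enter_leak_reverting_mode(); call the library function; exit_leak_reverting_mode();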
Another way would be to make a second application and pair it with the main one. When the second one exits, its memory is automatically released. You may want to look at how the googletest framework runs its child tests and how the pipes between the processes are constructed there.

In short, no. If you have time, you can rewrite the function yourself; "catastrophic" usually means this is the way to go. One other possibility: can you load and unload the library (like a .so)? It's possible that this will release the leaked memory.
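If you want to experiment with the load/unload idea on a POSIX system, a sketch looks like the following, assuming the library is some liblib.so exporting a C function leaky_fn (both hypothetical). Note that dlclose only unmaps the library's own code and static data; memory it leaked through plain malloc generally stays allocated, so this only helps if the leak lives in the library's statics or a private allocator.
#include <dlfcn.h>
#include <cstdio>

typedef double (*leaky_fn_t)(int);   // hypothetical signature of the leaky call

double call_in_fresh_library(int arg) {
    void* lib = dlopen("./liblib.so", RTLD_NOW | RTLD_LOCAL);
    if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 0.0; }

    double result = 0.0;
    if (auto fn = reinterpret_cast<leaky_fn_t>(dlsym(lib, "leaky_fn")))
        result = fn(arg);

    dlclose(lib);   // unmaps the library's segments; heap leaks are not reclaimed
    return result;
}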

Related

Cleaning up when process goes down - WHY

I'm writing C++ and have multiple objects I created during the life of the process. When the process goes down, all its memory is released to the OS, so why should I clean up and release stuff?
Because it forces you to write clean reusable software. Let's imagine the situation where, say, you finish your game and all is fine even though you don't have clean-up code.
Now, given the huge success, you make a sequel and plan to reuse the code. But this time round you have a coop mode, a matchmaking lobby and a story mode. You realize you could simply delete the main Game* object, and create a new one with different parameters, when switching modes.
Then you realize: oops, although this would work nicely, all the memory you never freed prevents you from using this approach.
This is the pattern that always happens, and it comes in different flavors. You want to write unit tests, but between tests you need to clean everything up. No clean-up: no way to have a clean test after the first one.
It looks like extra work, but it will pay off a lot in the future.
I can think of multiple reasons:
Modularity
An application is usually a collection of modules
Modules should have a well defined purpose, and should be reusable throughout your application, or even better, applications
When writing a module, one should not assume how or when someone else is going to use it, so s/he should exercise best-practices and write it correctly
Analysis
When intentionally leaking resources, analyzing actual memory leaks becomes a nearly impossible task. You can't differentiate the modules that should free resources from those that shouldn't.
Because what you think is wrong. In some cases, indeed, you are right. If you instantiate objects like this: A a;, then A will be destroyed when it goes out of scope. Now if you do A *a = new A[12];, then 12 memory blocks the size of an instance of A are reserved. If you do nothing, at the end of the scope a is destroyed, but a is a pointer, so only the address of the array goes away; the 12 blocks are still allocated in memory and no longer accessible. They are lost for everyone.
And there are many other examples. Try to use Valgrind and you will see.
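A minimal illustration of the difference described above (Valgrind will flag the second case if the delete[] is omitted):
struct A { int data[4]; };

void scope_example() {
    A a;                  // automatic storage: destroyed when the scope ends
    A* arr = new A[12];   // 12 objects on the heap
    // ... use a and arr ...
    // Returning without `delete[] arr;` destroys only the pointer `arr`;
    // the 12 objects stay allocated and unreachable -- a leak.
    delete[] arr;         // the matching delete[] releases them
}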
Now, as your program gets more and more complicated, all these memory leaks add up and can become a problem within your own program. Of course, when your program is over, the memory is given back to the OS, but that is definitely not a reason to tolerate memory leaks.
There are many good reasons.
1) Some systems do not clean up after you. Especially true in embedded systems, or systems running without an OS.
2) Usually an application main() is merely instantiating modules and subsystems that are reusable and should manage their own resources. If main() is simply a minor function that instantiates some objects then runs the event loop, then your "application cleanup" code is not really application cleanup, but module cleanup. A big difference.
3) Imagine your library was to run in someone else's code and you do not know how long it will be used, relative to the lifetime of the whole application. If your code leaks, it will quickly be discarded as a low quality piece of code.
4) If it is difficult to clean up after yourself, it likely is an indicator of bugs in your design and memory / resource management. By properly managing resources, you force a clean design and do not leave such management as an afterthought. Therefore, I argue that your code is less buggy when you design it to clean up when it is finished.

Testing C++ code and IsBadWritePtr

I am currently writing some basic tests for some functions of my C++ code (it is a game engine that I am writing mostly for educational purposes). One of the features I want to test is the memory allocation code. The tests currently consist of a function that runs every startup if the code is in debug mode. This forces me to always test the code when I am debugging.
To test my memory allocation code my instinct is to do something like this:
int* test = MemoryManager::AllocateMemory<int>();
assert(!IsBadWritePtr(test, sizeof(int)), "Memory allocation test failed: allocate");
MemoryManager::FreeMemory(test);
assert(IsBadWritePtr(test, sizeof(int)), "Memory free test failed: free");
This code is working fine, but all the resources I can find say not to use the IsBadWritePtr function (this is a WinAPI function for those unfamiliar). Is the use of this function OK in this case? The three main warnings against using it I found were:
This might cause issues with guard pages
This isn't an issue as the memory allocation code is right there and I know I am not allocating a guard page.
It's better to fail earlier
This isn't a consideration as I am literally using it to fail as early as possible.
It isn't thread safe
The test code is executed right at the beginning of execution, long before any other threads exist. It also acts on memory to which no other pointers are created, and therefore could not exist in other threads.
So basically I want to know if the use of this function is a good idea in this case, and if there is anything I am missing about the function. I am also aware that something that points to the wrong place will still pass this test, but at least it detects most memory allocation errors, right? (What are the chances I get a pointer to valid memory if the allocation fails?)
I was going to write this as a comment but it's too long.
I'll bring up the elephant in the room: why don't you just test for failure in the traditional way (by returning null or throwing an exception)?
Nevermind the fact that IsBadWritePtr is so frowned upon (even its documentation says that it's obsolete and you shouldn't use it), but your use case doesn't even seem appropriate. From the MSDN documentation:
This function is typically used when working with pointers returned from third-party libraries, where you cannot determine the memory management behavior in the third-party DLL.
But you are not using it to test anything passed/returned from a DLL, you just seem to be using it to test for allocation success, which is not only unnecessary (because you already know that from the return value of HeapAlloc, GlobalAlloc, etc.), but it's not what IsBadWritePtr is intended for.
Also, testing for allocation success is not something you should only do in debug mode, or with asserts, as it's obviously out of your control and you can't try to "fix" it by debugging.
Building on user1610015's answer, there is one reason why IsBadReadPtr should NOT work in your scenario.
Basically, IsBadReadPtr works at whole-page granularity. This means that, for the above code to be correct, each and every allocation you make would have to consume a whole page (at least 4 KB).
Modern allocators use a variety of tricks to pack lots of allocations into pages (low-fragmentation bucket heaps, linked lists of allocations, etc.). If you don't pack small allocations like this, then things like STL maps and other libraries that make lots of small allocations will absolutely kill your game (both in memory use and because cache coherency is destroyed by so much unused padding).
As a side note, your last comment about thread safety is dangerous. Lots of apps and libraries you link against can spawn threads from global object constructors (which run before main is called) and use other tricks to insert code into your process. So by all means verify that this holds for your code right now, but more importantly, check it again later as you add third-party libraries to your code.

Allocation numbers in C++ (Windows) and their predictability

I am using _CrtDumpMemoryLeaks to identify memory leaks in our software. We are using a third-party library in a multi-threaded application. This library does have memory leaks, and therefore in our tests we want to identify the leaks that are ours and discard those we do not have any control over.
We use continuous integration so new functions/algorithms/bug fixes get added all the time.
So the question is: is there a safe way of separating the leaks that are ours from those of the third-party library? We thought about using allocation numbers, but is that safe?
In a big application I worked on, the global new and delete operators were overridden (e.g. see How to properly replace global new & delete operators) and used private heaps (e.g. HeapCreate). Third-party libraries would use the process heap, so their allocations were clearly separated.
Frankly, I don't think you can get far with allocation numbers. Using explicit separate heaps for the app and the libraries (and maybe even separate per-component heaps within your own app) would be much more manageable. Consider that you can add your own app-specific header to each allocated block and thus enable very fancy memory tracking: capturing the entire call stack of each allocation for debugging, per-component accounting, and so on.
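A rough Windows-only sketch of that separation, assuming your own code allocates through the global operator new while the third-party DLL keeps using the default process heap (how true that assumption is depends on how the DLL was built):
#include <windows.h>
#include <cstddef>
#include <new>

namespace {
    HANDLE app_heap() {
        static HANDLE h = HeapCreate(0, 0, 0);  // growable private heap for the app
        return h;
    }
}

void* operator new(std::size_t size) {
    void* p = HeapAlloc(app_heap(), 0, size ? size : 1);
    if (!p) throw std::bad_alloc();
    return p;
}

void operator delete(void* p) noexcept {
    if (p) HeapFree(app_heap(), 0, p);
}

// operator new[] / delete[] (and the nothrow variants) would need the same treatment.
// Anything still reported as leaked on the process heap is then the library's;
// HeapWalk on app_heap() covers your own allocations.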
You might be able to do this using Microsoft's heap debugging library without using any third-party solutions. Based on what I learned from a previous question here, you should just make sure that all memory allocated in your code is allocated through a call to _malloc_dbg where the second argument is set to _CLIENT_BLOCK. Then you can set a callback function with _CrtSetDumpClient, and that callback will only receive information about the client blocks that were allocated, not the other ones.
You can easily use the preprocessor to convert all the calls to malloc and free to actually call their debugging versions (e.g. _malloc_dbg); just look at how it's done in crtdbg.h which comes with Visual Studio.
The tricky part for me would be figuring out how to override the new and delete operators to call debugging functions like _malloc_dbg. It might be hard to find a solution where only the news and deletes in your own code are affected, and not in the third-party library.
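Here is a rough sketch of that approach (debug CRT only; in release builds these calls compile away or fall back to the normal allocator). The operator overrides and the callback are illustrative, not a drop-in solution:
#include <crtdbg.h>
#include <cstddef>
#include <cstdio>
#include <new>

void* operator new(std::size_t size) {
    // __FILE__/__LINE__ here always point at this file; a macro wrapper would
    // be needed to record the real call site.
    void* p = _malloc_dbg(size ? size : 1, _CLIENT_BLOCK, __FILE__, __LINE__);
    if (!p) throw std::bad_alloc();
    return p;
}

void operator delete(void* p) noexcept {
    _free_dbg(p, _CLIENT_BLOCK);
}

// Invoked for each _CLIENT_BLOCK still allocated when the leak dump runs.
void __cdecl dump_client_block(void* data, size_t size) {
    std::printf("our leak: %p, %zu bytes\n", data, size);
}

int main() {
    _CrtSetDumpClient(dump_client_block);
    int* leaked = new int;      // allocated as a _CLIENT_BLOCK
    (void)leaked;
    _CrtDumpMemoryLeaks();      // our blocks hit the callback; third-party
                                // _NORMAL_BLOCKs are listed separately
}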
You may want to use the DebugDiag tool provided by Microsoft. For complete information about the tool, see: http://www.microsoft.com/en-sg/download/details.aspx?id=40336
DebugDiag can be used for identifying various issues. We can follow these steps to track down the leaks (both ours and those of third-party modules):
1) Configure DebugDiag with the rule type "Native (non-.NET) Memory and Handle Leak".
2) Re-run the application for some time and capture dump files. DebugDiag can also be configured to capture a dump file after a specified interval.
3) Open and analyze the captured dump files in DebugDiag under "Performance Analyzers".
Once the analysis is complete, DebugDiag automatically generates a report listing the modules/DLLs where a leak is likely (with a probability). With that module information, we can concentrate on the module in question with static code analysis; if the module belongs to a third-party DLL, we can share the DebugDiag report with its vendor. In addition, if you run/attach your application with the appropriate PDB files, DebugDiag also provides the call stack from which the memory leak likely originates.
This information has been very useful in the past while debugging memory leaks in Windows applications. Hopefully it will be useful to you as well.
The answer really depends on the actual implementation of the third-party library. Does it only leak a consistent number of items, or does that depend on, for example, the number of threads, which functions within the library are used, or some such? When are the allocations made?
Even if it is a consistent number of leaks regardless of library usage, I'd be hesitant to rely on the allocation number. By all means, give it a try: if all the allocations are made very early on, and they don't depend on any of "your" code, then it could work, and it is a really simple thing. But try adding, for example, a static std::vector<int>(100) to see whether memory allocations in static variables affect the allocation number. If they do, this method is probably doomed (unless you have very strict rules on static objects).
Using a separate heap (with new/delete operators replaced) would be the correct solution, as this can probably be expanded to gather other statistics too [like number of allocations made, to detect parts of the code that makes excessive allocations - of course, this has to be analysed based on what the code actually does].
The newer Doug Lea mallocs include the mspace abstraction. An mspace is a separate heap. In our application of a couple hundred thousand NCSL (non-comment source lines), we use a dozen different mspaces for different parts of the code. We use custom allocators to have STL containers allocate memory from the right mspace.
Some of the benefits
Third-party code does not use mspaces, so its allocations (and leaks) do not mix with ours
We can look at the memory usage of each mspace to see which piece of code might have memory leaks
Any memory corruption is contained within one mspace thus limiting the amount of code we need to look at for debugging.
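For illustration, a sketch of what using mspaces looks like, assuming Doug Lea's malloc.c has been built with mspaces enabled (e.g. ONLY_MSPACES=1) and its header is available; the component name is made up:
#include "malloc.h"   // dlmalloc's header exposing the mspace API
#include <new>

// One mspace per component keeps its allocations (and leaks) separate both
// from other components and from third-party code on the default heap.
static mspace g_parser_space = create_mspace(0 /*default capacity*/, 0 /*no locking*/);

struct Node { Node* next; int value; };

Node* make_node(int value) {
    void* mem = mspace_malloc(g_parser_space, sizeof(Node));
    return mem ? new (mem) Node{nullptr, value} : nullptr;
}

void destroy_node(Node* n) {
    if (n) mspace_free(g_parser_space, n);
}

// mspace_footprint(g_parser_space) reports this component's memory usage;
// destroy_mspace(g_parser_space) would release everything at once.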

C++ delete [] - how to check if "all is deleted"?

I was wondering: throughout my program I am using a lot of char* pointers to C strings, and other pointers.
I want to make sure that I have deleted all pointers by the time the program is done, even though Visual Studio and Code Blocks both do it for me (I think...).
Is there a way to check whether all memory is cleared? That nothing is still 'using memory'?
The obvious answer on Linux would be valgrind, but the VS mention makes me think you're on Windows. Here is a SO thread discussing valgrind alternatives for windows.
I was wondering, throughout a program I am using a lot of char* pointers to cstrings
Why? I write C++ code every day and I very rarely use a pointer at all. Really it's only when using third party API's, and even then I can usually work around it to some degree. Not that pointers are inherently bad, but if you can avoid them, do so as it simplifies your program.
I want to make sure that I have delete all pointers after the program is done
This is a bit of a pointless exercise. The OS will do it for you when it cleans up your process.
Visual Studio is an IDE. It isn't even there by the time your code is deployed. It doesn't release memory for you.
For what you want you can look into tools like this:
http://sourceforge.net/apps/mediawiki/cppcheck/index.php?title=Main_Page
You might want to use some sort of a smart pointer (see the Boost library, for example). The idea is that instead of having to manage memory manually (that is call delete explicitly when you don't need the object any more), you enlist the power of RAII to do the work for you.
The problem with manual memory management (and in general resource management) is that it is hard to write a program that properly deallocates all memory -- because you forget, or later when you change your code do not realize there was some other memory that needed to be deallocated.
RAII in C++ takes advantage of the fact that the destructor of a stack-allocated object is called automatically when that object goes out of scope. If the destructor logic is written properly, the last object that references (manages) the dynamically allocated data will be the one (and only one) that deallocates that memory. This could be achieved via reference counting, maintaining a list of references, etc.
The RAII paradigm for memory is in a sense similar to the garbage collection of managed languages, except that it runs when needed and is dictated by your code, rather than at certain intervals largely independent of your code.
Unless you're writing driver code, there's absolutely nothing you can do with heap/free-store-allocated memory that would cause it to remain leaked once the program terminates. Leaked memory is only an issue during the lifetime of a given process; once a particular process has leaked its entire address space, it can't get any more for that particular process.
And it's not your compiler that does the ultimate cleanup, it's the OS. In all modern operating systems that support separate process address spaces (i.e. in which one process cannot read/write another process's address space, at least not without OS assistance), when a program terminates, the process's entire address space is reclaimed by the OS, whether or not the program has cleanly free()d or deleted all of its heap/free-store-allocated memory. You can easily test this: write a program that allocates 256 MB of space and then exits or returns normally without freeing/deleting the allocated memory. Then run your program 16 times (or more). Shoot, run it 1000 times. If exiting without releasing that memory made it leak into oblivion until reboot, you'd very soon run out of available memory. You'll find this is not the case.
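A minimal version of that experiment, for anyone who wants to see it first-hand:
#include <cstddef>
#include <cstdio>
#include <cstdlib>

int main() {
    // Deliberately "leak" 256 MB and exit without freeing it. Run this many
    // times in a row: available system memory does not shrink across runs,
    // because the OS reclaims the whole address space at process exit.
    const std::size_t bytes = 256u * 1024 * 1024;
    char* block = static_cast<char*>(std::malloc(bytes));
    if (!block) return 1;
    for (std::size_t i = 0; i < bytes; i += 4096)
        block[i] = 1;   // touch each page so the memory is actually committed
    std::printf("allocated and touched 256 MB, exiting without free()\n");
    return 0;
}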
This same carelessness is not acceptable for certain other kinds of resources, however: if your program is holding onto any "handles" (a number that identifies some resource granted to the process by the operating system), exiting abnormally or uncleanly without releasing them can, in some cases, leave those resources lost until the next reboot. But unless your program is calling OS functions that give you such handles, that isn't a concern anyway.
If this is windows specific you can call this at the start of your program :-
_CrtSetDbgFlag(_crtDbgFlag | _CRTDBG_LEAK_CHECK_DF);
After including
#include <crtdbg.h>
And when your program exits the C++ runtime will output to the debugger a list of all memory blocks that are still allocated so you can see if you forgot to free anything.
Note that this only happens in DEBUG builds; in RELEASE builds the code does nothing (which is probably what you want anyway).

Is it acceptable not to deallocate memory

I'm working on a project that is supposed to be used from the command line with the following syntax:
program-name input-file
The program is supposed to process the input, compute some stuff and spit out results on stdout.
My language of choice is C++ for several reasons I'm not willing to debate. The computation phase will be highly symbolic (think compiler) and will use pretty complex dynamically allocated data structures. In particular, it's not amenable to RAII style programming.
I'm wondering if it is acceptable to forget about freeing memory, given that I expect the entire computation to consume less than the available memory and that the OS is free to reclaim all the memory in one step after the program finishes (assume program terminates in seconds). What are your feeling about this?
As a backup plan, if ever my project will require to run as a server or interactively, I figured that I can always refit a garbage collector into the source code. Does anyone have experience using garbage collectors for C++? Do they work well?
It shouldn't cause any problems in the specific situation described the question.
However, it's not exactly normal. Static analysis tools will complain about it. Most importantly, it builds bad habits.
Sometimes not deallocating memory is the right thing to do.
I used to write compilers. After building the parse tree and traversing it to write the intermediate code, we would simply exit. Deallocating the tree would have:
added a bit of slowness to the compiler, which of course we wanted to be as fast as possible;
taken up code space;
taken time to code and test the deallocators;
violated the "no code executes better than 'no code'" dictum.
HTH! FWIW, this was "back in the day" when memory was non-virtual and minimal, the boxes were much slower, and the first two were non-trivial considerations.
My feeling would be something like "WTF!!!"
Look at it this way:
You chose a programming language that does not include a garbage collector; we are not allowed to ask why.
You are basically stating that you are too lazy to care about freeing the memory.
Well, WTF again. Laziness isn't a good reason for anything, least of all for playing around with memory without freeing it.
Just free the memory. Not doing so is bad practice; the scenario may change, and then there can be a million reasons you need that memory freed, while the only reason for not doing it is laziness. Don't pick up bad habits; get used to doing things right, and that way you'll tend to do them right in the future!
Not deallocating memory should not be a problem, but it is a bad practice.
Joel Coehoorn is right:
It shouldn't cause any problems. However, it's not exactly normal. Static analysis tools will complain about it. Most importantly, it builds bad habits.
I'd also like to add that thinking about deallocation as you write the code is probably a lot easier than trying to retrofit it afterwards. So I would probably make it deallocate memory; you don't know how your program might be used in future.
If you want a really simple way to free memory, look at the "pools" concept that Apache uses.
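For reference, a rough sketch of the pool idea using the Apache Portable Runtime (APR), which is where Apache's pools live; it assumes libapr-1 and its headers are installed:
#include <apr_general.h>
#include <apr_pools.h>

int main() {
    apr_initialize();

    apr_pool_t* pool = nullptr;
    apr_pool_create(&pool, nullptr);     // one pool for the whole computation

    // Allocate freely from the pool; there is no per-object free().
    int*  numbers = static_cast<int*>(apr_palloc(pool, 1000 * sizeof(int)));
    char* buffer  = static_cast<char*>(apr_palloc(pool, 4096));
    (void)numbers; (void)buffer;

    apr_pool_destroy(pool);              // everything allocated above goes away at once
    apr_terminate();
    return 0;
}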
Well, I think that it's not acceptable. You've already alluded to potential future problems yourself. Don't think they're necessarily easy to solve.
Things like "… given that I expect the entire computation to consume less …" are famous last words. Similarly, retrofitting code with some feature later is one of those things everyone talks about and never does.
Not deallocating memory might sound good in the short run but can potentially create a huge load of problems in the long run. Personally, I just don't think that's worth it.
There are two strategies. Either you build in the GC design from the very beginning. It's more work but it will pay off. For a lot of small objects it might pay to use a pool allocator and just keep track of the memory pool. That way, you can keep track of the memory consumption and simply avoid a lot of problems that similar code, but without allocation pool, would create.
Or you use smart pointers throughout the program from the beginning. I actually prefer this method even though it clutters the code. One solution is to rely heavily on templates, which takes out a lot of redundancy when referring to types.
Take a look at projects such as WebKit. Their computation phase resembles yours since they build parse trees for HTML. They use smart pointers throughout their program.
Finally: “It’s a question of style … Sloppy work tends to be habit-forming.”
– Silk in Castle of Wizardry by David Eddings.
will use pretty complex dynamically allocated data structures. In particular, it's not amenable to RAII style programming.
I'm almost sure that's an excuse for lazy programming. Why can't you use RAII? Is it because you don't want to keep track of your allocations, that there's no pointer to them that you keep? If so, how do you expect to use the allocated memory? There's always a pointer to it somewhere.
Is it because you don't know when it should be released? Leave the memory in RAII objects, each one referenced by something, and they'll all free each other in a trickle-down fashion when the containing object gets freed. This is particularly important if you want to run it as a server one day: each iteration of the server effectively runs a 'master' object that holds all the others, so you can just delete it and all the memory disappears. It also saves you from having to retrofit a GC later.
Is it because all your memory is allocated and kept in use all the time, and only freed at the end? If so, see above.
If you really, really cannot come up with a design that does not leak memory, at least have the decency to use a private heap. Destroy that heap before you quit and you'll have a better design already, if a little 'hacky'.
There are instances where memory leaks are ok - static variables, globally initialised data, things like that. These aren't generally large though.
Reference-counting smart pointers like shared_ptr in Boost and TR1 could also help you manage your memory in a simple manner.
The drawback is that you have to wrap every raw pointer with one of these objects.
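A minimal example of that wrapping with today's std::shared_ptr (the Boost/TR1 versions work the same way):
#include <memory>
#include <vector>

struct Symbol {
    int value = 0;
    std::vector<std::shared_ptr<Symbol>> children;  // owning links, reference-counted
};

int main() {
    auto root  = std::make_shared<Symbol>();
    auto child = std::make_shared<Symbol>();
    root->children.push_back(child);
    // No explicit delete anywhere: when the last shared_ptr to each Symbol
    // goes out of scope, its memory is released automatically.
    return 0;
}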
I've done this before, only to find that, much later, I needed the program to be able to process several inputs without separate commands, or that the guts of the program were so useful that they needed to be turned into a library routine that could be called many times from within another program that was not expected to terminate. It was much harder to go back later and re-engineer the program than it would have been to make it leak-less from the start.
So, while it's technically safe as you've described the requirements, I advise against the practice since it's likely that your requirements may someday change.
If the run time of your program is very short, it should not be a problem. However, being too lazy to free what you allocate and losing track of what you allocate are two entirely different things. If you have simply lost track, it's time to ask yourself whether you actually know what your code is doing to a computer.
If you are just in a hurry or lazy and the life of your program is small in relation to what it actually allocates (i.e. allocating 10 MB per second is not small if the program runs for 30 seconds), then you should be OK.
The only 'noble' argument regarding freeing allocated memory comes up when a program exits: should one free everything just to keep Valgrind from complaining about leaks, or simply let the OS do it? That depends entirely on the OS and on whether your code might become a library rather than a short-running executable.
Leaks during run time are generally bad, unless you know your program will run for a short time and will not cause other programs (far more important than yours, as far as the OS is concerned) to be pushed into heavy paging.
What are your feeling about this?
Some O/Ses might not reclaim the memory, but I guess you're not intending to run on those O/Ses.
As a backup plan, if ever my project will require to run as a server or interactively, I figured that I can always refit a garbage collector into the source code.
Instead, I figure you can spawn a child process to do the dirty work, grab the output from the child process, let the child process die as soon as possible after that and then expect the O/S to do the garbage collection.
I have not personally used this, but since you are starting from scratch you may wish to consider the Boehm-Demers-Weiser conservative garbage collector
The answer really depends on how large your program will be and what performance characteristics it needs to exhibit. If you never deallocate memory, your process's memory footprint will be much larger than it would otherwise be. Depending on the system, this could cause a lot of paging and slow down performance for you or for other applications on the system.
Beyond that, what everyone above says is correct. It probably won't cause harm in the short term, but it's a bad practice that you should avoid. You'll never be able to reuse the code. Trying to retrofit a GC afterwards will be a nightmare. Just think about going to each place you allocate memory and trying to retrofit it without breaking anything.
One more reason to avoid doing this: reputation. If you fail to deallocate, everyone who maintains the code will curse your name and your rep in the company will take a hit. "Can you believe how dumb he was? Look at this code."
If it is non-trivial for you to determine where to deallocate the memory, I would be concerned that other aspects of the data structure manipulation may not be fully understood either.
Apart from the fact that the OS (kernel and/or C/C++ library) can choose not to free the memory when the execution ends, your application should always provide proper freeing of allocated memory as a good practice. Why? Suppose you decide to extend that application or reuse the code; you'll quickly get in trouble if the code you had previously written hogs up the memory unnecessarily, after finishing its job. It's a recipe for memory leaks.
In general, I agree it's a bad practice.
For a one-shot program it can be OK, but it kind of looks like you don't know what you are doing.
There is one solution to your problem, though: use a custom allocator that preallocates larger blocks from malloc, and then, after the computation phase, instead of freeing all the little blocks from your custom allocator, just release the large preallocated blocks of memory. Then you don't need to keep track of all the objects you need to deallocate and when. One guy who also wrote a compiler explained this approach to me many years ago, so if it worked for him, it will probably work for you as well. A rough sketch of this kind of allocator follows.
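This sketch assumes individual allocations are never larger than the block size and that nothing needs to be destructed individually:
#include <cstddef>
#include <cstdlib>
#include <vector>

// Small allocations are bumped out of large preallocated chunks; nothing is
// freed individually, and the whole arena is released in one pass at the end.
class Arena {
public:
    explicit Arena(std::size_t block_size = 1 << 20) : block_size_(block_size) {}

    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        offset_ = (offset_ + align - 1) & ~(align - 1);          // align the cursor
        if (blocks_.empty() || offset_ + size > block_size_) {   // assumes size <= block_size_
            blocks_.push_back(static_cast<char*>(std::malloc(block_size_)));
            offset_ = 0;
        }
        void* p = blocks_.back() + offset_;
        offset_ += size;
        return p;
    }

    ~Arena() {                            // the only place anything is freed
        for (char* b : blocks_) std::free(b);
    }

private:
    std::size_t block_size_;
    std::size_t offset_ = 0;
    std::vector<char*> blocks_;
};

// Usage: Arena arena; void* mem = arena.allocate(sizeof(Node)); then placement-new into mem.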
Try to use automatic variables in methods so that they will be freed automatically from the stack.
The only useful reason not to free heap memory is to save the tiny amount of computational power used by the free() calls. You might lose that advantage if page faults become an issue due to large virtual memory needs with small physical memory resources. Some factors to consider are:
Whether you are allocating a few huge chunks of memory or many small chunks.
Whether the memory will need to be locked into physical memory.
Whether you are absolutely positive the code and memory needed will fit into 2 GB on a Win32 system, including memory holes and padding.
That's generally a bad idea. You might encounter cases where the program tries to consume more memory than is available. Plus, you risk being unable to start several copies of the program.
You can still do this if you don't care of the mentioned issues.
When you exit a program, the memory it allocated is automatically returned to the system, so strictly speaking you don't have to deallocate it.
But deallocation becomes necessary when you move to bigger programs such as an OS or embedded systems, where the program is meant to run forever; there, even a small memory leak can be catastrophic.
Hence it is always recommended to deallocate the memory you have allocated.