Does anybody know a technique to discover memory leaks caused by smart pointers? I am currently working on a large C++ project that heavily uses smart pointers with reference counting. Obviously we have some memory leaks caused by smart pointers that are still referenced somewhere in the code, so their memory never gets freed. It's very hard to find the line of code with the "needless" reference that keeps the corresponding object from being freed (even though it's not of use any longer).
I found some advice on the web that proposed collecting call stacks of the increment/decrement operations on the reference counter. This gives me a good hint as to which piece of code caused the reference counter to be increased or decreased.
But what I need is some kind of algorithm that groups the corresponding "increase/decrease call stacks" together. After removing these pairs of call stacks, I would hopefully have (at least) one "increase call stack" left over that shows me the piece of code with the "needless" reference that caused the corresponding object not to be freed. Then it would be no big deal to fix the leak!
But does anybody have an idea for an algorithm that does the grouping?
Development takes place under Windows XP.
(I hope someone understood, what I tried to explain ...)
EDIT: I am talking about leaks caused by circular references.
Note that one source of leaks with reference-counting smart pointers is pointers with circular dependencies. For example, A has a smart pointer to B, and B has a smart pointer to A. Neither A nor B will ever be destroyed. You will have to find, and then break, the dependencies.
If possible, use boost smart pointers: shared_ptr for pointers that are supposed to own the data, and weak_ptr for pointers that are not supposed to call delete.
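To make the cycle-breaking concrete, here is a minimal sketch (shown with std::shared_ptr, whose interface mirrors boost's; the class names are invented): the owning direction uses shared_ptr, the back-reference uses weak_ptr, so the pair can still be destroyed.

```cpp
#include <memory>

struct B;

struct A {
    std::shared_ptr<B> b;   // owning direction
};

struct B {
    std::weak_ptr<A> a;     // non-owning back-reference: breaks the cycle
};

bool leaks_after_release() {
    std::weak_ptr<A> watch;
    {
        auto a = std::make_shared<A>();
        a->b = std::make_shared<B>();
        a->b->a = a;        // were this a shared_ptr, neither object would die
        watch = a;
    }
    return !watch.expired();   // true would mean the pair leaked
}
```

If B's back-reference were a shared_ptr, leaks_after_release() would return true; with weak_ptr both objects are destroyed when the last outside owner goes away.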
The way I do it is simply:
- on every AddRef() record call-stack,
- matching Release() removes it.
This way, at the end of the program I'm left with the AddRefs() without matching Releases(). There is no need to match pairs.
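A minimal sketch of that bookkeeping (capture_stack() is a stand-in for a real stack-capture call such as CaptureStackBackTrace plus symbol lookup, and the LIFO pairing of each Release to an AddRef is an assumption of this sketch):

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for a real stack capture (e.g. CaptureStackBackTrace on Windows).
std::string capture_stack() { return "<call stack>"; }

// Per-object list of call stacks, one entry per outstanding AddRef().
static std::map<void*, std::vector<std::string>> g_stacks;

void on_addref(void* obj) {
    g_stacks[obj].push_back(capture_stack());
}

void on_release(void* obj) {
    auto& stacks = g_stacks[obj];
    stacks.pop_back();                 // a Release cancels one AddRef
    if (stacks.empty())
        g_stacks.erase(obj);
}
// At program exit, whatever remains in g_stacks is an AddRef()
// that never got a matching Release().
```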
If you can reproduce the leak in a deterministic way, a simple technique I often used is to number all your smart pointers in their order of construction (use a static counter in the constructor), and report this ID together with the leak. Then run the program again, and trigger a DebugBreak() when the smart pointer with the same ID gets constructed.
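A sketch of that construction counter (all names here are invented; debug_break() stands in for Windows' DebugBreak(), and the plain static counter is not thread-safe):

```cpp
// Stand-in for __debugbreak() / DebugBreak() on Windows.
static void debug_break() { /* drop into the debugger here */ }

static int g_next_id = 0;       // not thread-safe: one counter for all ptrs
static int g_break_on_id = -1;  // set this to the ID from the leak report

template <typename T>
class counted_ptr {
public:
    explicit counted_ptr(T* p = nullptr) : p_(p), id_(g_next_id++) {
        if (id_ == g_break_on_id)
            debug_break();      // second run: stop where the leaker is born
    }
    int id() const { return id_; }   // report this together with the leak
private:
    T* p_;
    int id_;
};
```

On the first run you log the leaked pointer's id(); on the second (deterministic) run you set g_break_on_id to that value and land in the debugger at the exact construction site.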
You should also consider this great tool : http://www.codeproject.com/KB/applications/visualleakdetector.aspx
To detect reference cycles you need to have a graph of all reference-counted objects. Such a graph is not easy to construct, but it can be done.
Create a global set<CRefCounted*> to register living reference-counted objects. This is easier if you have a common AddRef() implementation - just add this pointer to the set when an object's reference count goes from 0 to 1. Similarly, in Release(), remove the object from the set when its reference count goes from 1 to 0.
Next, provide some way to get the set of referenced objects from each CRefCounted*. It could be a virtual set<CRefCounted*> CRefCounted::get_children() or whatever suits you. Now you have a way to walk the graph.
Finally, implement your favorite algorithm for cycle detection in a directed graph. Start the program, create some cycles and run cycle detector. Enjoy! :)
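A compact sketch of the whole scheme (the CRefCounted shape and get_children() are assumptions matching the outline above; the detector is a plain three-color depth-first search, and real code would also delete the object when its count hits zero):

```cpp
#include <map>
#include <set>
#include <vector>

struct CRefCounted;
static std::set<CRefCounted*> g_live;   // all living ref-counted objects

struct CRefCounted {
    int refs = 0;
    std::vector<CRefCounted*> children;          // objects this one references
    void AddRef()  { if (refs++ == 0) g_live.insert(this); }
    void Release() { if (--refs == 0) g_live.erase(this); }
    std::vector<CRefCounted*> get_children() const { return children; }
};

// Depth-first search: 0 = unvisited, 1 = on current path, 2 = done.
static bool dfs(CRefCounted* n, std::map<CRefCounted*, int>& color) {
    color[n] = 1;
    for (CRefCounted* c : n->get_children()) {
        if (color[c] == 1) return true;          // back edge => cycle
        if (color[c] == 0 && dfs(c, color)) return true;
    }
    color[n] = 2;
    return false;
}

bool has_cycle() {
    std::map<CRefCounted*, int> color;
    for (CRefCounted* n : g_live)
        if (color[n] == 0 && dfs(n, color)) return true;
    return false;
}
```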
What I do is wrap the smart pointer with a class that takes FUNCTION and LINE parameters. Increment a count for that function and line every time the constructor is called, and decrement the count every time the destructor is called. Then, write a function that dumps the function/line/count information. That tells you where all of your outstanding references were created.
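A sketch of such a wrapper (the names and the TRACKED macro are invented; __func__ and __LINE__ supply the call site, and copying is disabled to keep the sketch simple):

```cpp
#include <cstdio>
#include <map>
#include <string>

// Outstanding constructions per "function:line" call site.
static std::map<std::string, int> g_site_counts;

template <typename T>
class tracked_ptr {
public:
    tracked_ptr(T* p, const char* func, int line)
        : p_(p), site_(std::string(func) + ":" + std::to_string(line)) {
        ++g_site_counts[site_];        // constructor: count the site up
    }
    ~tracked_ptr() {
        --g_site_counts[site_];        // destructor: count it back down
        delete p_;
    }
    tracked_ptr(const tracked_ptr&) = delete;
    tracked_ptr& operator=(const tracked_ptr&) = delete;
private:
    T* p_;
    std::string site_;
};

#define TRACKED(T, p) tracked_ptr<T>((p), __func__, __LINE__)

// Dump sites whose count never returned to zero: your leak suspects.
void dump_sites() {
    for (const auto& kv : g_site_counts)
        if (kv.second != 0)
            std::printf("%s: %d outstanding\n", kv.first.c_str(), kv.second);
}
```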
What I have done to solve this is to override the malloc/new and free/delete operators such that they keep track, in a data structure, of as much as possible about each operation being performed.
For example, when overriding malloc/new, you can create a record of the caller's address, the number of bytes requested, the assigned pointer value returned, and a sequence ID so all your records can be ordered (if you deal with threads, you need to take that into account too).
When writing the free/delete routines, I also keep track of the caller's address and the pointer info. Then I look backwards through the list and try to match the malloc/new counterpart, using the pointer as my key. If I don't find it, I raise a red flag.
If you can afford it, you can embed the sequence ID in your data to be absolutely sure who made each allocation call and when. The key here is to uniquely identify each transaction pair as far as we can.
Then you will have a third routine that displays your memory allocation/deallocation history, along with the functions invoking each transaction (this can be accomplished by parsing the symbolic map out of your linker). You will know how much memory you had allocated at any time and who did it.
If you don't have enough resources to perform these transactions (my typical case for 8-bit microcontrollers), you can output the same information via a serial or TCP link to another machine with enough resources.
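The transaction log described above can be sketched like this (the record layout and names are invented; the "caller address" would come from a return address or stack capture, and a real version would guard the log with a mutex when threaded):

```cpp
#include <cstddef>
#include <vector>

struct AllocRecord {
    const void*   caller;  // who asked for the memory
    void*         ptr;     // what was handed back
    std::size_t   bytes;   // how much was requested
    unsigned long seq;     // global ordering of transactions
    bool          freed;   // matched by a free/delete yet?
};

static std::vector<AllocRecord> g_log;
static unsigned long g_seq = 0;

void record_alloc(const void* caller, void* ptr, std::size_t bytes) {
    g_log.push_back(AllocRecord{caller, ptr, bytes, g_seq++, false});
}

// Walk backwards to find the matching allocation, using the pointer as key.
bool record_free(void* ptr) {
    for (auto it = g_log.rbegin(); it != g_log.rend(); ++it) {
        if (it->ptr == ptr && !it->freed) {
            it->freed = true;
            return true;
        }
    }
    return false;   // red flag: freeing something never (or doubly) allocated
}
```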
It's not a matter of finding a leak. In the case of smart pointers, it will most probably point to some generic place like CreateObject(), which is called thousands of times. It's a matter of determining what place in the code didn't call Release() on the ref-counted object.
Since you said that you're using Windows, you may be able to take advantage of Microsoft's user-mode dump heap utility, UMDH, which comes with the Debugging Tools for Windows. UMDH makes snapshots of your application's memory usage, recording the stack used for each allocation, and lets you compare multiple snapshots to see which calls to the allocator "leaked" memory. It also translates the stack traces to symbols for you using dbghelp.dll.
There's also another Microsoft tool called "LeakDiag" that supports more memory allocators than UMDH, but it's a bit more difficult to find and doesn't seem to be actively maintained. The latest version is at least five years old, if I recall correctly.
If I were you I would take the log and write a quick script to do something like the following (mine is in Ruby):
def allocation?(line)
  # determine if this line is a log line indicating allocation/deallocation
end

def unique_stack(line)
  # return a string that is equal for the members of an allocation/deallocation pair
end

allocations = []
file = File.new "the-log.log"
file.each_line { |line|
  # custom function to determine if line is an alloc/dealloc
  if allocation? line
    # custom function to get a unique stack trace where the return value
    # is the same for an alloc and its matching dealloc
    allocations << unique_stack(line)
  end
}
allocations.sort!
# go through and remove matching pairs of allocations;
# ideally only the unmatched ones will remain
index = 0
while index < allocations.size - 1
  if allocations[index] == allocations[index + 1]
    allocations.slice!(index, 2)  # drop both halves of the matched pair
  else
    index = index + 1
  end
end
allocations.each { |line|
  puts line
}
This basically goes through the log, captures each allocation/deallocation, and stores a unique value for each pair; then it sorts the list and removes the pairs that match, leaving only the unmatched entries.
For Windows, check out:
MFC Memory Leak Detection
I am a big fan of Google's Heapchecker -- it will not catch all leaks, but it gets most of them. (Tip: link it into all your unit tests.)
First step could be to find out which class is leaking.
Once you know it, you can find who is increasing the reference count:
1. Put a breakpoint on the constructor of the class that is wrapped by shared_ptr.
2. Step into shared_ptr with the debugger when it is increasing the reference count, and look at the variable pn->pi_->use_count_. Take its address by evaluating an expression (something like &this->pn->pi_->use_count_); you will get an address.
3. In the Visual Studio debugger, go to Debug -> New Breakpoint -> New Data Breakpoint... and enter the address of the variable.
4. Run the program. It will now stop every time some point in the code increases or decreases the reference counter.
Then you need to check whether those match up.
Related
I'm trying to cut down on string copying (which has been measured to be a performance bottleneck in my application) by putting the strings into an unordered_set<string> and then passing around shared_ptr<string>'s. It's hard to know when all references to the string in the set have been removed, so I hope that the shared_ptr can help me. This is the untested code that illustrates how I hope to be able to write it:
unordered_set<string> string_pool;
:
shared_ptr<string> a = &(*string_pool.emplace("foo").first); // .first is an iterator
:
shared_ptr<string> b = &(*string_pool.emplace("foo").first);
In the above, only one instance of the string "foo" should be in string_pool; both a and b should point to it; and at such time that both a and b are destructed, "foo" should be erased from the string_pool.
The documentation on emplace() suggests, but doesn't make clear to me, that the pointer a can survive a rehash caused by the insertion of b. It also appears to guarantee that the second emplacement of "foo" will not cause any reallocation, because it is recognized as already present in the set.
Am I on the right track here? I need to keep the string_pool from growing endlessly, but there's no single point at which I can simply clear() it, nor is there any clear "owner" of the strings therein.
UPDATE 1
The history of this problem: this is a "traffic cop" app that reads from servers, parcels out data to other servers, receives their answers, parcels those out to others, receives, and finally assembles and returns a summary answer. It includes an application protocol stack that receives TCP messages, parses them into string scalar values, which the application then assembles into other TCP messages, sends, receives, etc. I originally wrote it using strings, vectors<string>s, and string references, and valgrind reported a "high number" of string constructors (even compiled with -O3), and high CPU usage that was focused in library routines related to strings. I was asked to investigate ways to reduce string copying, and designed a "memref" class (char* and length pointing into an input buffer) that could be copied around in lieu of the string itself. Circumstances then arose requiring the input buffer to be reused while memrefs into it still needed to be valid, so I paid to copy each buffer substring into an internment area (an unordered_set<string>), and have the memref point there instead. Then I discovered it was difficult and inconvenient to find a spot in the process when the internment area could be cleared all at once (to prevent its growing without bound), and I began trying to redesign the internment area so that when all memrefs to an interned string were gone, the string would be removed from the pool. Hence the shared_ptr.
As I mentioned in my comment to @Peter R, I was even less comfortable with move semantics and containers and references than I am now, and it's quite possible I didn't code my simple, string-based solution to use all that C++11 can offer. By now I seem to have been traveling in a great circle.
The unordered_set owns the strings. When it goes out of scope your strings will be freed.
My first impression is that your approach does not sound like it will result in a positive experience with respect to maintainability or testability. Certainly this
shared_ptr<string> a = &(*string_pool.emplace("foo").first);
is wrong. You already have an owner for the string in your unordered_set. Trying to put another ownership layer on it with the shared_ptr is not going to work. You could have an unordered_set<shared_ptr<string>> but even that I would not recommend.
Without understanding the rest of your code base it's hard to recommend a 'solution' here. The combination of move semantics and passing const string& should handle most requirements at a low level. If there are still performance issues they may then be architectural. Certainly using only shared_ptr<string> may solve your life-time issues if there is no natural owner of the string, and they are cheap to copy, just don't use the unordered_set<string> in that case.
You've gone a bit wayward. shared_ptrs conceptually form a set of shared owners of an object... the first shared_ptr should be created with make_shared, then the other copies are created automatically (with "value" semantics) when that value is copied. What you're attempting to do is flawed in that:
the string_pool itself stores strings that don't partake in the shared ownership, nor is there any way in which the string_pool is notified or updated when the shared_ptr's reference count hits 0
the shared_ptrs have no relationship to each other (you're giving both of them raw pointers rather than copying one to make the other)
For your usage, you need to decide whether you'll pro-actively erase the string from the string_pool at some point in time, otherwise you may want to put a weak_ptr in the string_pool and check whether the shared string actually still exists before using it. You can google weak_ptr if you're not already familiar with the concept.
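One way to combine the two ideas is a pool that holds weak_ptrs (the class and method names here are invented for illustration): the pool hands out shared_ptrs, but itself holds only weak_ptrs, so it never keeps a string alive on its own.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

class StringPool {
public:
    std::shared_ptr<const std::string> intern(const std::string& s) {
        auto it = pool_.find(s);
        if (it != pool_.end())
            if (auto sp = it->second.lock())   // interned copy still alive?
                return sp;                     // share it: refcount goes up
        auto sp = std::make_shared<const std::string>(s);
        pool_[s] = sp;                         // pool keeps only a weak_ptr
        return sp;
    }
private:
    std::unordered_map<std::string, std::weak_ptr<const std::string>> pool_;
};
```

A real version would also sweep expired map entries now and then (a custom deleter could erase its own entry). The key point is the one the question was missing: sharing comes from copying a shared_ptr, not from taking raw addresses of set elements.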
Separately, it's worth checking whether your current observation that string copying is a performance problem is due to inefficient coding. For example:
are your string variables passed around by reference where possible, e.g. const std::string& function parameters whenever you won't change them?
do you use static const strings rather than continual run-time recreation from string literals/character arrays?
are you compiling with a sensible level of optimisation (e.g. -O2, /O2)
are there places where keeping a reference to a string, plus offsets within the string, would massively improve performance and reduce memory usage (the referenced string must be kept around as long as it's used, even indirectly)? It is very common to implement a "string_ref" or similar class for this in medium- and larger-sized C++ projects.
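The last point is the "pointer + length" idea (later standardized as std::string_view in C++17). A minimal sketch of such a class:

```cpp
#include <cstddef>
#include <string>

// Non-owning view into a buffer someone else keeps alive.
class string_ref {
public:
    string_ref(const char* data, std::size_t len) : data_(data), len_(len) {}
    std::size_t size() const { return len_; }
    string_ref substr(std::size_t pos, std::size_t n) const {
        return string_ref(data_ + pos, n);   // no copy, no allocation
    }
    // Copy into a real string only when one is actually required.
    std::string to_string() const { return std::string(data_, len_); }
private:
    const char* data_;
    std::size_t len_;
};
```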
I have one working solution for parallelization. However, execution time is only very slightly improved by it. I think this comes from the fact that I new and delete some variables inside the loop. I would like them to be stack-allocated, BUT the Command class is abstract and must remain abstract. What can I do to work around that? How can I improve the time spent in these very long loops?
#pragma omp parallel for reduction(+:functionEvaluation)
for (int i = rowStart; i < rowEnd + 1; i++)
{
    Model model_(varModel_);
    model_.addVariable("i", i);
    model_.addVariable("j", 1);

    Command* command_ = formulaCommand->duplicate(&model_);
    functionEvaluation += command_->execute().toDouble();
    delete command_;
}
The problem may also lie elsewhere! Advice welcome!!
Thanks and regards.
You may want to play with the private or firstprivate clauses.
Your #pragma would include ...private(varModel, formulaCommand)... or similar, and then each thread would have its own copy of those variables. Using firstprivate will ensure that the thread-specific variable has the initial value copied in, instead of being uninitialized. This would remove the need to new and delete, assuming you can just modify instances for each loop iteration.
This may or may not work as desired, as you haven't provided a lot of detail.
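An equivalent way to get per-thread state, sketched on a toy loop (the per-thread double stands in for the question's Model/Command objects): open a parallel region, construct the expensive state once per thread, and put only the for-loop worksharing inside it. If OpenMP is not enabled, the pragmas are ignored and the loop simply runs serially with the same result.

```cpp
// Toy stand-in: threadScratch plays the role of the per-thread Model/Command
// state, constructed once per thread instead of once per iteration.
double evaluate(int rowStart, int rowEnd) {
    double functionEvaluation = 0.0;
    #pragma omp parallel reduction(+:functionEvaluation)
    {
        double threadScratch = 0.0;       // per-thread, built once
        #pragma omp for
        for (int i = rowStart; i < rowEnd + 1; i++) {
            threadScratch = i * 0.5;      // reuse state; no new/delete
            functionEvaluation += threadScratch;
        }
    }
    return functionEvaluation;
}
```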
I think you should try to use a mechanism to reuse allocated memory. You probably know neither the size nor the alignment of the Command object coming, so a "big enough" buffer will not suffice. I'd make your duplicate method take two arguments, the second being a reference to a boost::pool. If the pool object is big enough, just construct the new Command object inside it; if it's not, expand it, and construct into it. boost::pool will handle alignment issues for you, so you don't have to think about it. This way, you'll have to do dynamic memory allocation only a few times per thread.
By the way, it's in general not good practice to return raw pointers in C++. Use smart pointers instead; it's simply better that way, without any buts... Well, there is a but in this case :), since with my suggestion you'd be doing some under-the-hood custom memory management. Still, the best practice would be to write a custom smart pointer that handles your special case gracefully, without risking the user messing things up. You could of course do like everyone else and make an exception in this case :) (My advice still holds under normal circumstances though; e.g. in the question above, you would normally use something like boost::scoped_ptr.)
My application problem is the following -
I have a large structure, foo. Because these are large, and for memory-management reasons, we do not wish to delete them when processing of the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of the boost::shared_ptr<foo> except for the one in the vector. This drops the reference count in the shared_ptr to 1. Is it practical to use shared_ptr::use_count(), checking whether it is equal to 1, to know when all other parts of my app are done with the data?
One additional reason I'm asking is that the boost documentation on shared_ptr recommends against using use_count() in production code.
Edit -
What I did not say is that when we need a new foo, we scan the vector of foo pointers looking for one that is not currently in use, and use that foo for the next round of processing. This is why I was thinking that a reference count of 1 would be a safe way to ensure that a particular foo object is no longer in use.
My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.
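A bare-bones sketch of the "links stored right in the free objects" idea from above (the struct and its size are invented; this version is not thread-safe and never returns memory to the system):

```cpp
#include <cstddef>
#include <new>

struct foo {
    char payload[64];                 // stand-in for the real large structure

    void* operator new(std::size_t) {
        if (free_list) {              // reuse a previously released slot:
            foo* p = free_list;       //   just pop the head of the list
            free_list = *reinterpret_cast<foo**>(p);
            return p;
        }
        return ::operator new(sizeof(foo));
    }
    void operator delete(void* p) noexcept {
        *reinterpret_cast<foo**>(p) = free_list;   // link lives in the object
        free_list = static_cast<foo*>(p);
    }
    static foo* free_list;
};
foo* foo::free_list = nullptr;
```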
If you want to know whether or not the use count is 1, use the unique() member function.
I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.
I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
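A sketch of the custom-deleter idea (notify_released() is a hypothetical hook; std::shared_ptr is used here, but boost::shared_ptr accepts a custom deleter the same way):

```cpp
#include <memory>

struct foo { int data = 0; };

static bool g_released = false;   // stand-in for "mark this slot free"

void notify_released(foo* p) {
    g_released = true;            // e.g. return the foo to the free pool
    delete p;                     // or skip this to keep the object cached
}

void demo() {
    std::shared_ptr<foo> p(new foo, notify_released);
    std::shared_ptr<foo> q = p;   // use_count == 2
    p.reset();                    // object still alive: q holds it
}                                 // q dies here: notify_released() runs once
```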
I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. This way you have full control, rather than relying on the shared_ptr's count which, as you rightly note, is not recommended. You can also pre-set your own counter to account for the number of threads you know will need to act on the data, rather than relying on them all being initialized at the beginning to get their copies of the structure.
What is the best way to measure the memory used by a C++ program, or by a block within a C++ program? The measurement code should be part of the program itself, not measured from outside. I know the difficulty of that task, so it does not have to be 100% accurate, but it should at least give me a good impression of the memory usage.
Measuring at the block level will be difficult (at best) unless you're willing to explicitly add instrumentation directly to the code under test.
I wouldn't start with overloads of new and delete at the class level to try to do this. Instead, I'd use overloads of ::operator new and ::operator delete. That's basically the tip of the funnel (so to speak) -- all the other dynamic memory management eventually comes down to calling those (and most do so fairly directly). As such, they will generally do the most to tell you about dynamic memory usage of the program as a whole.
The main time you'd need to deal with overloads of new and delete for an individual class would be if they're already overloaded so they're managing a separate pool, and you care about how much of that pool is in use at a given time. In that case, you'd (just about) need to add instrumentation directly to them, to get something like a high-water mark on their memory usage during a given interval.
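A minimal sketch of instrumenting the global operators: it over-allocates a small header in front of each block to remember its size, so the running total can be kept exact. (Real code must also handle thread safety, over-aligned types, and the array/nothrow overloads; here the defaults forward to these two.)

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

static std::size_t g_allocated = 0;   // bytes currently outstanding
                                      // (not thread-safe in this sketch)

// Header is max_align_t-sized so the returned pointer stays aligned.
static const std::size_t kHeader = alignof(std::max_align_t);

void* operator new(std::size_t size) {
    void* raw = std::malloc(size + kHeader);
    if (!raw) throw std::bad_alloc();
    *static_cast<std::size_t*>(raw) = size;   // remember the size
    g_allocated += size;
    return static_cast<char*>(raw) + kHeader;
}

void operator delete(void* p) noexcept {
    if (!p) return;
    char* raw = static_cast<char*>(p) - kHeader;
    g_allocated -= *reinterpret_cast<std::size_t*>(raw);
    std::free(raw);
}
```

Reading g_allocated on entry and exit of a block gives you that block's net dynamic memory usage; a high-water mark is one extra line in operator new.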
Overloading new and delete is probably the way to go, not only on a specific class but maybe more generally. But you also need to mark where each new and delete is made, and you need some kind of start/reset marker to record when you enter and when you exit the block. Then you need to keep the history so you can inspect the usage after it happened.
Class level override of the new and delete operators are what you want.
As others have pointed out, you can overload new and delete to measure how much heap was allocated. To do the same for the stack, if you feel adventurous, you'll have to do some ASM. In GCC in x86-64 to get the stack position:
int64_t x = 0;
asm("movq %%rsp, %0;" : "=r" (x) );
This will put the address of the stack pointer into x. Put this in a few places around your code and compare the values before/after entering the block.
Please note that this might need some work to get what you want because of how/when the compiler might allocate memory; it is not as intuitive or trivial as it sounds.
I have an object which implements reference counting mechanism. If the number of references to it becomes zero, the object is deleted.
I found that my object is never deleted, even when I am done with it. This is leading to memory overuse. All I have is the number of references to the object and I want to know the places which reference it so that I can write appropriate cleanup code.
Is there some way to accomplish this without having to grep in the source files? (That would be very cumbersome.)
A huge part of getting reference counting (refcounting) right in C++ is to use Resource Acquisition Is Initialization (RAII), so it is much harder to accidentally leak references. However, this doesn't solve everything with refcounts.
That said, you can implement a debug feature in your refcounting which tracks what is holding references. You can then analyze this information when necessary, and remove it from release builds. (Use a configuration macro similar in purpose to how DEBUG macros are used.)
Exactly how you should implement it is going to depend on all your requirements, but there are two main ways to do this (with a brief overview of the differences):
- store the information on the referenced object itself:
  - accessible from your debugger
  - easier to implement
- output to a special trace file every time a reference is acquired or released:
  - still available after the program exits (even abnormally)
  - possible to use while the program is running, without running under your debugger
  - can be used even in special release builds and sent back to you for analysis
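The first variant can be as simple as an owner set kept next to the count (the names are invented; the "owner" is any opaque token, usually the address of the referring object or smart pointer, and a release build would compile these hooks away behind a macro):

```cpp
#include <set>

// Debug-build refcounting that also records *who* holds each reference.
class RefCounted {
public:
    void AddRef(const void* owner) {
        ++refs_;
        owners_.insert(owner);
    }
    void Release(const void* owner) {
        --refs_;
        owners_.erase(owner);
        // if (refs_ == 0) delete this;  // real code would destroy here
    }
    int refs() const { return refs_; }
    // On a leak report, dump this set and resolve the addresses to names.
    const std::set<const void*>& owners() const { return owners_; }
private:
    int refs_ = 0;
    std::set<const void*> owners_;
};
```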
The basic problem, of knowing what is referencing a given object, is hard to solve in general, and will require some work. Compare: can you tell me every person and business that knows your postal address or phone number?
One known weakness of reference counting is that it does not work when there are cyclic references, i.e. (in the simplest case) when one object has a reference to another object which in turn has a reference to the former object. This sounds like a non-issue, but in data structures such as binary trees with back-references to parent nodes, there you are.
If you don't explicitly provide for a list of "reverse" references in the referenced (un-freed) object, I don't see a way to figure out who is referencing it.
In the following suggestions, I assume that you don't want to modify your source, or if so, just a little.
You could of course walk the whole heap/free store and search for the memory address of your un-freed object, but if its address turns up, it's not guaranteed to actually be a memory reference; it could just as well be a random floating-point number or anything else. However, if the found value lies inside a block of memory that your application allocated for an object, the chances improve a little that it's indeed a pointer to another object.
One possible improvement over this approach would be to modify the memory allocator you use -- e.g. your global operator new -- so that it keeps a list of all allocated memory blocks and their sizes. (In a complete implementation of this, operator delete would have to remove the list entry for the freed block of memory.) Now, at the end of your program, you have a clue where to search for the un-freed object's memory address, since you have a list of the memory blocks that your program actually used.
The above suggestions don't sound very reliable to me, to be honest; but maybe defining a custom global operator new and operator delete that does some logging / tracing goes in the right direction to solve your problem.
I am assuming you have some class with say addRef() and release() member functions, and you call these when you need to increase and decrease the reference count on each instance, and that the instances that cause problems are on the heap and referred to with raw pointers. The simplest fix may be to replace all pointers to the controlled object with boost::shared_ptr. This is surprisingly easy to do and should enable you to dispense with your own reference counting - you can just make those functions I mentioned do nothing. The main change required in your code is in the signatures of functions that pass or return your pointers. Other places to change are in initializer lists (if you initialize pointers to null) and if()-statements (if you compare pointers with null). The compiler will find all such places after you change the declarations of the pointers.
If you do not want to use the shared_ptr - maybe you want to keep the reference count intrinsic to the class - you can craft your own simple smart pointer just to deal with your class. Then use it to control the lifetime of your class objects. So for example, instead of pointer assignment being done with raw pointers and you "manually" calling addRef(), you just do an assignment of your smart pointer class which includes the addRef() automatically.
I don't think it's possible without a code change. With a code change you can, for example, remember the pointers of the objects that increase the reference count, then see which pointers are left over and examine them in the debugger. If possible, store more verbose information, such as the object's name.
I have created one for my own needs. You can compare your code with it and see what's missing. It's not perfect, but it should work in most cases.
http://sites.google.com/site/grayasm/autopointer
When I use it, I do:
util::autopointer<A> aptr=new A();
I never do it like this:
A* ptr = new A();
util::autopointer<A> aptr = ptr;
and later start fooling around with ptr; that's not allowed.
Further, I use only aptr to refer to this object.
If I am wrong, I now have the chance to get corrections. :) See ya!