Know what references an object - c++

I have an object which implements a reference counting mechanism. If the number of references to it becomes zero, the object is deleted.
I found that my object is never deleted, even when I am done with it. This is leading to memory overuse. All I have is the number of references to the object and I want to know the places which reference it so that I can write appropriate cleanup code.
Is there some way to accomplish this without having to grep in the source files? (That would be very cumbersome.)

A huge part of getting reference counting (refcounting) done correctly in C++ is to use Resource Acquisition Is Initialization (RAII), so it is much harder to accidentally leak references. However, this doesn't solve everything with refcounts.
That said, you can implement a debug feature in your refcounting which tracks what is holding references. You can then analyze this information when necessary, and remove it from release builds. (Use a configuration macro similar in purpose to how DEBUG macros are used.)
Exactly how you should implement it will depend on your requirements, but there are two main ways to do this (a brief overview of the differences follows, with a sketch of the first option after the list):
store the information on the referenced object itself
accessible from your debugger
easier to implement
output to a special trace file every time a reference is acquired or released
still available after the program exits (even abnormally)
possible to use while the program is running, without running in your debugger
can be used even in special release builds and sent back to you for analysis
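As a rough illustration of the first option, here is a minimal sketch of a refcounted base class that, under a debug macro, records a tag for every holder of a reference. The names RefCounted, acquire and release are invented for this example; they are not the poster's API.

#include <cassert>
#include <map>
#include <string>

#define REFCOUNT_DEBUG 1   // define to 0 (or not at all) in release builds

class RefCounted {
public:
    void acquire(const char* owner) {
        ++count_;
#if REFCOUNT_DEBUG
        ++owners_[owner];                  // remember who took the reference
#endif
    }
    void release(const char* owner) {
#if REFCOUNT_DEBUG
        assert(owners_[owner] > 0 && "releasing a reference that was never acquired");
        if (--owners_[owner] == 0) owners_.erase(owner);
#endif
        if (--count_ == 0) delete this;
    }
protected:
    virtual ~RefCounted() {}
private:
    long count_ = 0;
#if REFCOUNT_DEBUG
    std::map<std::string, long> owners_;   // owner tag -> outstanding references
#endif
};

Inspecting owners_ in the debugger then tells you which call sites still hold references; with the macro turned off, the bookkeeping disappears from release builds.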
The basic problem, of knowing what is referencing a given object, is hard to solve in general, and will require some work. Compare: can you tell me every person and business that knows your postal address or phone number?

One known weakness of reference counting is that it does not work when there are cyclic references, i.e. (in the simplest case) when one object holds a reference to another object which in turn holds a reference to the first. This may sound like a non-issue, but it arises naturally in data structures such as binary trees with back-references to parent nodes.
If you don't explicitly provide for a list of "reverse" references in the referenced (un-freed) object, I don't see a way to figure out who is referencing it.
In the following suggestions, I assume that you don't want to modify your source, or if so, just a little.
You could of course walk the whole heap / free store and search for the memory address of your un-freed object, but even if its address turns up, it's not guaranteed to actually be a pointer to it; it could just as well be a random floating point number, or anything else. However, if the found value lies inside a block of memory that your application allocated for an object, the chances improve a little that it is indeed a pointer to another object.
One possible improvement over this approach would be to modify the memory allocator you use -- e.g. your global operator new -- so that it keeps a list of all allocated memory blocks and their sizes. (In a complete implementation of this, operator delete would have to remove the list entry for the freed block of memory.) Now, at the end of your program, you have a clue where to search for the un-freed object's memory address, since you have a list of memory blocks that your program actually used.
The above suggestions don't sound very reliable to me, to be honest; but maybe defining a custom global operator new and operator delete that does some logging / tracing goes in the right direction to solve your problem.
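As a sketch of that last idea, logging replacements for the global operator new and operator delete could look roughly like this (the log format and the use of fprintf are assumptions for illustration; a real implementation would likely record more context, and would also cover the array forms):

#include <cstdio>
#include <cstdlib>
#include <new>

// Log every allocation and deallocation so that un-freed blocks can later be
// cross-referenced against suspicious pointer values found on the heap.
void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    std::fprintf(stderr, "alloc %p size %zu\n", p, size);
    return p;
}

void operator delete(void* p) noexcept {
    std::fprintf(stderr, "free  %p\n", p);
    std::free(p);
}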

I am assuming you have some class with say addRef() and release() member functions, and you call these when you need to increase and decrease the reference count on each instance, and that the instances that cause problems are on the heap and referred to with raw pointers. The simplest fix may be to replace all pointers to the controlled object with boost::shared_ptr. This is surprisingly easy to do and should enable you to dispense with your own reference counting - you can just make those functions I mentioned do nothing. The main change required in your code is in the signatures of functions that pass or return your pointers. Other places to change are in initializer lists (if you initialize pointers to null) and if()-statements (if you compare pointers with null). The compiler will find all such places after you change the declarations of the pointers.
If you do not want to use the shared_ptr - maybe you want to keep the reference count intrinsic to the class - you can craft your own simple smart pointer just to deal with your class. Then use it to control the lifetime of your class objects. So for example, instead of pointer assignment being done with raw pointers and you "manually" calling addRef(), you just do an assignment of your smart pointer class which includes the addRef() automatically.
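A minimal sketch of such an intrusive smart pointer might look like the following, assuming your class already exposes addRef() and release(); the name intrusive_ptr here is only illustrative (Boost in fact ships a ready-made boost::intrusive_ptr along these lines):

#include <utility>

template <typename T>
class intrusive_ptr {
public:
    intrusive_ptr(T* p = 0) : p_(p) { if (p_) p_->addRef(); }
    intrusive_ptr(const intrusive_ptr& other) : p_(other.p_) { if (p_) p_->addRef(); }
    intrusive_ptr& operator=(intrusive_ptr other) {   // copy-and-swap keeps the counts right
        std::swap(p_, other.p_);
        return *this;
    }
    ~intrusive_ptr() { if (p_) p_->release(); }

    T* operator->() const { return p_; }
    T& operator*() const { return *p_; }
    T* get() const { return p_; }
private:
    T* p_;
};

Assigning one intrusive_ptr to another now calls addRef() and release() automatically, so the "manual" calls scattered through the code disappear.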

I don't think it's possible to do this without a code change. With a code change you can, for example, remember the pointers of the objects which increase the reference count, and then see which pointers are left over and examine them in the debugger. If possible, store more verbose information, such as an object name.

I have created one for my needs. You can compare your code with this one and see what's missing. It's not perfect but it should work in most of the cases.
http://sites.google.com/site/grayasm/autopointer
when I use it I do:
util::autopointer<A> aptr=new A();
I never do it like this:
A* ptr = new A();
util::autopointer<A> aptr = ptr;
and later start fooling around with ptr; that's not allowed.
Further, I use only aptr to refer to this object.
If I am wrong, I now have the chance to get corrections. :) See ya!


Scenarios where we are forced to use pointers in C++

I was in an interview and was asked to give an example or scenario in C++ where we can't proceed without pointers, i.e. where we necessarily have to use a pointer.
I gave the example of a function returning an array whose size is not known: we then need to return a pointer, since the name of the array is actually a pointer. But the interviewer said that is internal to arrays and asked for some other example.
So please help me with some other scenarios for the same.
If you are using a C Library which has a function that returns a pointer, you have to use pointers then, not a reference.
There are many other cases (explicitly dealing with memory, for instance) - but these two came to my mind first:
linked data-structures
How: You need to reference parts of your structure in multiple places. You use pointers for that, because containers (which also use pointers internally) do not cover all your data-structure needs. For example,
class BinTree {
BinTree *left, *right;
public:
// ...
};
Why there is no alternative: there are no generic tree implementations in the standard (not counting the sorted associative containers such as std::set and std::map).
pointer-to-implementation pattern (pimpl)
How: Your public .hpp file has the methods, but only refers to internal state via an opaque Whatever *; and your internal implementation actually knows what that means and can access its fields. See:
Is the pImpl idiom really used in practice?
Why there is no alternative: if you provide your implementation in binary-only form, users of the header cannot access internals without decompiling/reverse engineering. It is a much stronger form of privacy.
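A minimal sketch of the idiom, with made-up names (Widget and Impl are invented for this example):

// widget.hpp -- public header: no internal state is visible here
class Widget {
public:
    Widget();
    ~Widget();
    void draw();
private:
    struct Impl;      // opaque forward declaration
    Impl* impl_;      // pointer-to-implementation
};

// widget.cpp -- only the implementation file knows what Impl contains
#include "widget.hpp"
struct Widget::Impl {
    int internal_state;
};
Widget::Widget() : impl_(new Impl()) {}
Widget::~Widget() { delete impl_; }
void Widget::draw() { ++impl_->internal_state; /* real drawing code would go here */ }

Clients include only widget.hpp, so the layout of Impl can change (or stay hidden in a binary-only library) without recompiling or exposing anything to them.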
Any place where you would want to use a reference but have to allow for null values.
This is common in library APIs where, if you pass a non-null pointer, it will be filled in with a value.
It is also a convention to pass arguments that a function will modify as pointers rather than references, to emphasize to the caller that the value can be changed.
Here are some cases:
Objects with a long lifetime. You created some object in a function and you need that object afterwards (not just a copy of it).
But if you created it without pointers, on the stack, the object would die when the function finishes. So you need to create the object using dynamic memory and return a pointer to it (a sketch follows after this list).
Not enough stack space. You need an object which requires a lot of memory; allocating it on the stack won't fit, since the stack is usually much smaller than the heap. So again you need to create the object using dynamic memory on the heap and return a pointer to it.
You need reference semantics. You have a structure which you pass to some function and you want the function to modify it. In this case you need to pass a pointer to the structure; otherwise a copy of it will be passed to the function and the original cannot be modified.
Note: in this last case a pointer is not strictly necessary, since you can use a reference instead.
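For the first case, a minimal sketch (the load_config function and the Config type are invented for illustration):

#include <memory>
#include <string>

struct Config {
    std::string name;
};

// The Config must outlive this function, so it cannot live on this function's
// stack; it is created with dynamic memory and handed back through a (smart) pointer.
std::unique_ptr<Config> load_config(const std::string& name) {
    std::unique_ptr<Config> cfg(new Config);
    cfg->name = name;
    return cfg;
}

int main() {
    std::unique_ptr<Config> cfg = load_config("game.cfg");   // the object survives past load_config()
    return cfg->name.empty() ? 1 : 0;
}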
PS. You can browse here for more scenarios, and decide in which cases pointer usage is necessary.
Pointers are also important for performance; function arguments are an example of this. Normally, when you pass a value to a function, the value of the argument is copied into the parameter.
But with a pointer you pass only the address, access the value indirectly, and do what you want with it, avoiding the copy.

C++ implicit data sharing [duplicate]

I am building a game engine library in C++. A little while back I was using Qt to build an application and was rather fascinated with its use of Implicit Sharing. I am wondering if anybody could explain this technique in greater detail or could offer a simple example of this in action.
The key idea behind implicit sharing seems to go around using the more common term copy-on-write. The idea behind copy-on-write is to have each object serve as a wrapper around a pointer to the actual implementation. Each implementation object keeps track of the number of pointers into it. Whenever an operation is performed on the wrapper object, it's just forwarded to the implementation object, which does the actual work.
The advantage of this approach is that copying and destruction of these objects are cheap. To make a copy of the object, we just make a new instance of a wrapper, set its pointer to point at the implementation object, and then increment the count of the number of pointers to the object (this is sometimes called the reference count, by the way). Destruction is similar - we drop the reference count by one, then see if anyone else is pointing at the implementation. If not, we free its resources. Otherwise, we do nothing and just assume someone else will do the cleanup later.
The challenge in this approach is that it means that multiple different objects will all be pointing at the same implementation. This means that if someone ends up making a change to the implementation, every object referencing that implementation will see the changes - a very serious problem. To fix this, every time an operation is performed that might potentially change the implementation, the operation checks to see if any other objects also reference the implementation by seeing if the reference count is identically 1. If no other objects reference the object, then the operation can just go ahead - there's no possibility of the changes propagating. If there is at least one other object referencing the data, then the wrapper first makes a deep-copy of the implementation for itself and changes its pointer to point to the new object. Now we know there can't be any sharing, and the changes can be made without a hassle.
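A minimal sketch of a copy-on-write wrapper (the class name CowString and the detach helper are invented for this example, and std::shared_ptr stands in for a hand-rolled reference count):

#include <cstddef>
#include <memory>
#include <string>

class CowString {
public:
    explicit CowString(const std::string& s) : impl_(std::make_shared<std::string>(s)) {}

    // Reading never needs a copy: just forward to the shared implementation.
    char at(std::size_t i) const { return impl_->at(i); }
    std::size_t size() const { return impl_->size(); }

    // Writing may need to detach first, so other owners don't see the change.
    void set(std::size_t i, char c) {
        detach();
        (*impl_)[i] = c;
    }
private:
    void detach() {
        if (impl_.use_count() > 1)                          // someone else shares the data
            impl_ = std::make_shared<std::string>(*impl_);  // deep-copy before modifying
    }
    std::shared_ptr<std::string> impl_;
};

// Copies of CowString share one std::string until one of them is modified:
//   CowString a("hello");
//   CowString b = a;   // cheap: both wrappers point at the same buffer
//   b.set(0, 'H');     // b detaches and copies; a still reads "hello"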
If you'd like to see some examples of this in action, take a look at lecture examples 15.0 and 16.0 from Stanford's introductory C++ programming course. It shows how to design an object to hold a list of words using this technique.
Hope this helps!

Boost shared_ptr use_count function

My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of the boost::shared_ptr<foo> except for the one in the vector. This drops the reference count in the shared_ptr to 1. Is it practical to use shared_ptr::use_count(), checking whether it is equal to 1, to know when all other parts of my app are done with the data?
One additional reason I'm asking is that the Boost documentation for shared_ptr recommends not using use_count() in production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.
My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for the class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case where linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object you just grab the first item on the list.
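A minimal sketch of a class-level free-list allocator along those lines (the foo layout and the fallback to the global heap are assumptions for illustration):

#include <cstddef>
#include <new>

struct foo {
    char payload[256];                       // stand-in for the large structure

    static void* operator new(std::size_t size) {
        if (free_list) {                     // reuse a previously released foo
            void* p = free_list;
            free_list = free_list->next;
            return p;
        }
        return ::operator new(size);         // otherwise fall back to the heap
    }

    static void operator delete(void* p) noexcept {
        // Instead of returning memory to the system, push it onto the free list.
        Node* n = static_cast<Node*>(p);
        n->next = free_list;
        free_list = n;
    }

private:
    struct Node { Node* next; };             // link stored inside the free object itself
    static Node* free_list;
};

foo::Node* foo::free_list = 0;

With this in place, code that says new foo / delete foo (including a shared_ptr whose deleter simply calls delete) transparently recycles objects instead of hitting the general-purpose allocator.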
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.
If you want to know whether or not the use count is 1, use the unique() member function.
I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.
I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
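For illustration, a sketch of that idea (recycle_foo and the single pool_entry are invented for this example, not taken from the Boost docs):

#include <boost/shared_ptr.hpp>
#include <iostream>

struct foo { /* large structure */ };

// Runs when the last outstanding shared_ptr to this foo goes away;
// instead of deleting, it could mark the foo in the vector as free for reuse.
void recycle_foo(foo* p) {
    std::cout << "foo at " << p << " is no longer in use\n";
}

int main() {
    foo pool_entry;                                    // owned by the pool, not by the shared_ptr
    {
        boost::shared_ptr<foo> handle(&pool_entry, recycle_foo);
        boost::shared_ptr<foo> another = handle;       // handed out to other parts of the app
    }                                                  // last copy released -> recycle_foo runs
    return 0;
}

As the answer notes, the vector would then hold something other than the shared_ptr itself (for example a raw pointer or an index), so only the outstanding processing keeps the count up.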
I would suggest that instead of trying to use the shared_ptr's use_count() to keep track, it might be better to implement your own usage counter. This way you will have full control, rather than relying on the shared_ptr's count which, as you rightly suggest, is not recommended. You can also pre-set your own counter to the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.

Tracking automatic variable lifetime?

This may not be possible, but I figured I'd ask...
Is there any way anyone can think of to track whether or not an automatic variable has been deleted without modifying the class of the variable itself? For example, consider this code:
const char* pStringBuffer;
{
std::string sString( "foo" );
pStringBuffer = sString.c_str();
}
Obviously, after the block, pStringBuffer is a dangling pointer which may or may not be valid. What I would like is a way to have a wrapper class which contains pStringBuffer (with a casting operator for const char*), but asserts that the variable it's referencing is still valid. By changing the type of the referenced variable I can certainly do it (boost shared_ptr/weak_ptr, for example), but I would like to be able to do it without imposing restrictions on the referenced type.
Some thoughts:
I'll probably need to change the assignment syntax to include the referenced variable (which is fine)
I might be able to look at the stack pointer to detect if my wrapper class was allocated "later" than the referenced class, but this seems hacky and not standard (C++ doesn't define stack behavior). It could work, though.
Thoughts / brilliant solutions?
In general, it's simply not possible from within C++ as pointers are too 'raw'. Also, looking to see if you were allocated later than the referenced class wouldn't work, because if you change the string, then the c_str pointer may well change.
In this particular case, you could check to see if the string is still returning the same value for c_str. If it is, you are probably still valid and if it isn't then you have an invalid pointer.
As a debugging tool, I would advise using an advanced memory tracking system like Valgrind (available only for Linux, I'm afraid; similar programs exist for Windows but I believe they all cost money -- this program is the only reason I have Linux installed on my Mac). At the cost of much slower execution of your program, Valgrind detects whenever you read from an invalid pointer. While it isn't perfect, I've found it detects many bugs, in particular ones of this type.
One technique you may find useful is to replace the new/delete operators with your own implementations which mark the memory pages used (allocated by your operator new) as non-accessible when released (deallocated by your operator delete). You will need to ensure that the memory pages are never re-used however so there will be limitations regarding run-time length due to memory exhaustion.
If your application accesses memory pages once they've been deallocated, as in your example above, the OS will trap the attempted access and raise an error. It's not exactly tracking per se as the application will be halted immediately but it does provide feedback :-)
This technique is applicable in narrow scenarios and won't catch all types of memory abuses but it can be useful. Hope that helps.
You could make a wrapper class that works in the simple case you mentioned. Maybe something like this:
X<const char*> pStringBuffer;
{
std::string sString( "foo" );
Trick trick(pStringBuffer);
trick.set(sString.c_str());
} // trick goes out of scope; its destructor marks pStringBuffer as invalid
But it doesn't help more complex cases:
X<const char*> pStringBuffer;
{
std::string sString( "foo" );
{
Trick trick(pStringBuffer);
trick.set(sString.c_str());
} // trick goes out of scope; its destructor marks pStringBuffer as invalid
}
Here, the invalidation happens too soon.
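A rough sketch of what the hypothetical X and Trick classes could look like (these are not standard types; both names come from the snippets above and the implementation here is invented):

#include <cassert>

// Wrapper around a value plus a flag saying whether it may still be used.
template <typename T>
class X {
public:
    X() : value_(), valid_(false) {}
    operator T() const {
        assert(valid_ && "referenced variable has gone out of scope");
        return value_;
    }
    void set(T v)     { value_ = v; valid_ = true; }
    void invalidate() { valid_ = false; }
private:
    T value_;
    bool valid_;
};

// Scope guard: while alive it can assign into an X, and on destruction
// it marks the X as invalid, catching later accesses via the assert above.
template <typename T>
class TrickT {
public:
    explicit TrickT(X<T>& target) : target_(target) {}
    ~TrickT() { target_.invalidate(); }
    TrickT& set(T v) { target_.set(v); return *this; }
private:
    X<T>& target_;
};
typedef TrickT<const char*> Trick;   // matches the usage in the snippets above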
Mostly you should just write code which is as safe as possible (see: smart pointers), but no safer (see: performance, low-level interfaces), and use tools (valgrind, Purify) to make sure nothing slips through the cracks.
Given "pStringBuffer" is the only part of your example existing after sString goes out of scope, you need some change to it, or a substitute, that will reflect this. One simple mechanism is a kind of scope guard, with scope matching sString, that affects pStringBuffer when it is destroyed. For example, it could set pStringBuffer to NULL.
To do this without changing the class of "the variable" can only be done in so many ways:
Introduce a distinct variable in the same scope as sString (to reduce verbosity, you might consider a macro to generate the two things together). Not nice.
Wrap with a template à la X<std::string> sString: it's arguable whether this is "modifying the type of the variable"... the alternative perspective is that sString becomes a wrapper around the same variable. It also suffers in that the best you can do is have a templated constructor that forwards arguments to the wrapped constructors, up to some finite number of arguments N.
Neither of these help much as they rely on the developer remembering to use them.
A much better approach is to make "const char* pStringBuffer" simply "std::string some_meaningful_name", and assign to it as necessary. Given reference counting, it's not too expensive 99.99% of the time.

Find memory leaks caused by smart pointers

Does anybody know a "technique" to discover memory leaks caused by smart pointers? I am currently working on a large project written in C++ that heavily uses smart pointers with reference counting. Obviously we have some memory leaks caused by smart pointers, that are still referenced somewhere in the code, so that their memory does not get free'd. It's very hard to find the line of code with the "needless" reference, that causes the corresponding object not to be free'd (although it's not of use any longer).
I found some advice in the web, that proposed to collect call stacks of the increment/decrement operations of the reference counter. This gives me a good hint, which piece of code has caused the reference counter to get increased or decreased.
But what I need is some kind of algorithm that groups the corresponding "increase/decrease call stacks" together. After removing these pairs of call stacks, I hopefully have (at least) one "increase call stack" left over, that shows me the piece of code with the "needless" reference, that caused the corresponding object not to be freed. Now it will be no big deal to fix the leak!
But has anybody an idea for an "algorithm" that does the grouping?
Development takes place under Windows XP.
(I hope someone understood what I tried to explain...)
EDIT: I am talking about leaks caused by circular references.
Note that one source of leaks with reference-counting smart pointers is pointers with circular dependencies. For example, A has a smart pointer to B, and B has a smart pointer to A. Neither A nor B will ever be destroyed. You will have to find, and then break, the dependencies.
If possible, use boost smart pointers, and use shared_ptr for pointers which are supposed to be owners of the data, and weak_ptr for pointers not supposed to call delete.
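A minimal sketch of breaking such a cycle with boost::weak_ptr (the Parent and Child types are invented for illustration):

#include <boost/shared_ptr.hpp>
#include <boost/weak_ptr.hpp>

struct Child;

struct Parent {
    boost::shared_ptr<Child> child;   // the parent owns the child
};

struct Child {
    boost::weak_ptr<Parent> parent;   // back-reference does NOT keep the parent alive
};

int main() {
    boost::shared_ptr<Parent> p(new Parent);
    p->child.reset(new Child);
    p->child->parent = p;             // no cycle of owning references is formed
    return 0;                         // both objects are destroyed here
}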
The way I do it is simple:
- on every AddRef(), record the call stack;
- a matching Release() removes it.
This way, at the end of the program I'm left with the AddRef() calls that have no matching Release(). There is no need to match up pairs.
If you can reproduce the leak in a deterministic way, a simple technique I often used is to number all your smart pointers in their order of construction (use a static counter in the constructor), and report this ID together with the leak. Then run the program again, and trigger a DebugBreak() when the smart pointer with the same ID gets constructed.
You should also consider this great tool : http://www.codeproject.com/KB/applications/visualleakdetector.aspx
To detect reference cycles you need to have a graph of all reference-counted objects. Such a graph is not easy to construct, but it can be done.
Create a global set<CRefCounted*> to register living reference-counted objects. This is easier if you have a common AddRef() implementation - just add the this pointer to the set when the object's reference count goes from 0 to 1. Similarly, in Release() remove the object from the set when its reference count goes from 1 to 0.
Next, provide some way to get the set of referenced objects from each CRefCounted*. It could be a virtual set<CRefCounted*> CRefCounted::get_children() or whatever suits you. Now you have a way to walk the graph.
Finally, implement your favorite algorithm for cycle detection in a directed graph. Start the program, create some cycles and run cycle detector. Enjoy! :)
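A rough sketch of such a registry and a depth-first cycle detector (CRefCounted and get_children() follow the description above; the registry name and the traversal code are invented for illustration):

#include <set>

class CRefCounted {
public:
    virtual ~CRefCounted() {}
    // Every reference-counted object reports the reference-counted objects it points to.
    virtual std::set<CRefCounted*> get_children() const = 0;
};

// Registry of all living reference-counted objects, maintained by AddRef()/Release().
std::set<CRefCounted*> g_live_objects;

// Classic DFS with an "on the current path" set: reaching a node already on
// the path means we have found a back edge, i.e. a reference cycle.
static bool has_cycle(CRefCounted* node,
                      std::set<CRefCounted*>& visited,
                      std::set<CRefCounted*>& on_path) {
    if (on_path.count(node)) return true;         // back edge -> cycle
    if (visited.count(node)) return false;        // already fully explored
    visited.insert(node);
    on_path.insert(node);
    std::set<CRefCounted*> children = node->get_children();
    for (std::set<CRefCounted*>::iterator it = children.begin(); it != children.end(); ++it)
        if (has_cycle(*it, visited, on_path)) return true;
    on_path.erase(node);
    return false;
}

bool any_reference_cycle() {
    std::set<CRefCounted*> visited, on_path;
    for (std::set<CRefCounted*>::iterator it = g_live_objects.begin(); it != g_live_objects.end(); ++it)
        if (has_cycle(*it, visited, on_path)) return true;
    return false;
}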
What I do is wrap the smart pointer with a class that takes FUNCTION and LINE parameters. Increment a count for that function and line every time the constructor is called, and decrement the count every time the destructor is called. Then write a function that dumps the function/line/count information. That tells you where all of your references were created.
What I have done to solve this is to override the malloc/new & free/delete operators such that they keep track in a data structure as much as possible about the operation you are performing.
For example, when overriding malloc/new, You can create a record of the caller's address, the amount of bytes requested, the assigned pointer value returned and a sequence ID so all your records can be sequenced (I do not know if you deal with threads but you need to take that into account, too).
When writing the free/delete routines, I also keep track of the caller's address and the pointer info. Then I look backwards into the list and try to match the malloc/new counterpart using the pointer as my key. If I don't find it, raise a red flag.
If you can afford it, you can embed in your data the sequence ID to be absolutely sure who and when allocation call was made. The key here is to uniquely identify each transaction pair as much as we can.
Then you will have a third routine displaying your memory allocations/deallocation history, along with the functions invoking each transaction. (this can be accomplished by parsing the symbolic map out of your linker). You will know how much memory you will have allocated at any time and who did it.
If you don't have enough resources to perform these transactions (my typical case for 8-bit microcontrollers), you can output the same information via a serial or TCP link to another machine with enough resources.
It's not a matter of finding a leak. In the case of smart pointers it will most probably point to some generic place like CreateObject(), which is called thousands of times. It's a matter of determining which place in the code didn't call Release() on the ref-counted object.
Since you said that you're using Windows, you may be able to take advantage of Microsoft's user-mode dump heap utility, UMDH, which comes with the Debugging Tools for Windows. UMDH makes snapshots of your application's memory usage, recording the stack used for each allocation, and lets you compare multiple snapshots to see which calls to the allocator "leaked" memory. It also translates the stack traces to symbols for you using dbghelp.dll.
There's also another Microsoft tool called "LeakDiag" that supports more memory allocators than UMDH, but it's a bit more difficult to find and doesn't seem to be actively maintained. The latest version is at least five years old, if I recall correctly.
If I were you I would take the log and write a quick script to do something like the following (mine is in Ruby):
def allocation?(line)
  # determine if this line is a log line indicating allocation/deallocation
end

def unique_stack(line)
  # return a string that is equal for pairs of allocation/deallocation
end

allocations = []
file = File.new "the-log.log"
file.each_line { |line|
  # custom function to determine if line is an alloc/dealloc
  if allocation? line
    # custom function to get unique stack trace where the return value
    # is the same for an alloc and its matching dealloc
    allocations[allocations.length] = unique_stack line
  end
}
allocations.sort!

# go through and remove pairs of allocations that are equal;
# ideally 1 will be remaining....
index = 0
while index < allocations.size - 1
  if allocations[index] == allocations[index + 1]
    # drop both entries of the matched pair
    allocations.delete_at index
    allocations.delete_at index
  else
    index = index + 1
  end
end

allocations.each { |line|
  puts line
}
This basically goes through the log, captures each allocation/deallocation, stores a comparable value for each one, sorts them, and removes pairs that match; what's left points at the unmatched allocations.
For Windows, check out:
MFC Memory Leak Detection
I am a big fan of Google's Heapchecker -- it will not catch all leaks, but it gets most of them. (Tip: Link it into all your unittests.)
The first step could be to find out which class is leaking.
Once you know it, you can find who is increasing the reference count:
1. Put a breakpoint on the constructor of the class that is wrapped by shared_ptr.
2. Step into shared_ptr with the debugger while it is increasing the reference count and look at the variable pn.pi_->use_count_. Take the address of that variable by evaluating an expression (something like &this->pn.pi_->use_count_); you will get an address.
3. In the Visual Studio debugger, go to Debug -> New Breakpoint -> New Data Breakpoint... and enter the address of the variable.
4. Run the program. It will now stop every time some point in the code increases or decreases the reference counter.
Then you need to check whether those match up.