Using a shared_ptr<string> into an unordered_set<string>

I'm trying to cut down on string copying (which has been measured to be a performance bottleneck in my application) by putting the strings into an unordered_set<string> and then passing around shared_ptr<string>'s. It's hard to know when all references to the string in the set have been removed, so I hope that the shared_ptr can help me. This is the untested code that illustrates how I hope to be able to write it:
unordered_set<string> string_pool;
:
shared_ptr<string> a = &(*string_pool.emplace("foo").first); // .first is an iterator
:
shared_ptr<string> b = &(*string_pool.emplace("foo").first);
In the above, only one instance of the string "foo" should be in string_pool; both a and b should point to it; and at such time that both a and b are destructed, "foo" should be erased from the string_pool.
The doc on emplace() suggests, but doesn't make clear to me, that pointer a can survive a rehashing caused by the allocation of pointer b. It also appears to guarantee that the second emplacement of "foo" will not cause any reallocation, because it is recognized as already present in the set.
Am I on the right track here? I need to keep the string_pool from growing endlessly, but there's no single point at which I can simply clear() it, nor is there any clear "owner" of the strings therein.
UPDATE 1
The history of this problem: this is a "traffic cop" app that reads from servers, parcels out data to other servers, receives their answers, parcels those out to others, receives, and finally assembles and returns a summary answer. It includes an application protocol stack that receives TCP messages, parses them into string scalar values, which the application then assembles into other TCP messages, sends, receives, etc. I originally wrote it using strings, vector<string>s, and string references, and valgrind reported a "high number" of string constructors (even compiled with -O3), and high CPU usage that was focused in library routines related to strings. I was asked to investigate ways to reduce string copying, and designed a "memref" class (char* and length pointing into an input buffer) that could be copied around in lieu of the string itself. Circumstances then arose requiring the input buffer to be reused while memrefs into it still needed to be valid, so I paid to copy each buffer substring into an internment area (an unordered_set<string>), and have the memref point there instead. Then I discovered it was difficult and inconvenient to find a spot in the process when the internment area could be cleared all at once (to prevent its growing without bound), and I began trying to redesign the internment area so that when all memrefs to an interned string were gone, the string would be removed from the pool. Hence the shared_ptr.
As I mentioned in my comment to @Peter R, I was even less comfortable with move semantics and containers and references than I am now, and it's quite possible I didn't code my simple, string-based solution to use all that C++11 can offer. By now I seem to have been traveling in a great circle.

The unordered_set owns the strings. When it goes out of scope your strings will be freed.
My first impression is that your approach does not sound like it will result in a positive experience with respect to maintainability or testability. Certainly this
shared_ptr<string> a = &(*string_pool.emplace("foo").first);
is wrong. You already have an owner for the string in your unordered_set. Trying to put another ownership layer on it with the shared_ptr is not going to work. You could have an unordered_set<shared_ptr<string>> but even that I would not recommend.
Without understanding the rest of your code base it's hard to recommend a 'solution' here. The combination of move semantics and passing const string& should handle most requirements at a low level. If there are still performance issues they may then be architectural. Certainly using only shared_ptr<string> may solve your life-time issues if there is no natural owner of the string, and they are cheap to copy, just don't use the unordered_set<string> in that case.

You've gone a bit wayward. shared_ptrs conceptually form a set of shared owners of an object... the first shared_ptr should be created with make_shared, then the other copies are created automatically (with "value" semantics) when that value is copied. What you're attempting to do is flawed in that:
the string_pool itself stores strings that don't partake in the shared ownership, nor is there any way in which the string_pool is notified or updated when the shared_ptr's reference count hits 0
the shared_ptrs have no relationship to each other (you're giving both of them raw pointers rather than copying one to make the other)
For your usage, you need to decide whether you'll pro-actively erase the string from the string_pool at some point in time, otherwise you may want to put a weak_ptr in the string_pool and check whether the shared string actually still exists before using it. You can google weak_ptr if you're not already familiar with the concept.
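To make the weak_ptr suggestion concrete, here is a minimal sketch of a pool that only holds weak references. The names StringPool, intern and purge_expired are hypothetical (they are not from the question), and the sketch assumes single-threaded use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical interning pool: the map stores weak_ptrs only, so the pool
// never keeps a string alive on its own.
class StringPool {
public:
    using Handle = std::shared_ptr<const std::string>;

    // Returns the shared instance for `text`, creating one if none is alive.
    Handle intern(const std::string& text) {
        std::weak_ptr<const std::string>& slot = pool_[text];
        Handle existing = slot.lock();
        if (existing) {
            return existing;            // every caller shares one control block
        }
        Handle fresh = std::make_shared<std::string>(text);
        slot = fresh;
        return fresh;
    }

    // Erases entries whose strings no longer have any owner.
    std::size_t purge_expired() {
        std::size_t removed = 0;
        for (auto it = pool_.begin(); it != pool_.end();) {
            if (it->second.expired()) {
                it = pool_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_map<std::string, std::weak_ptr<const std::string>> pool_;
};
```

Two calls to intern("foo") hand back shared_ptrs to the same string, and once both are destroyed the entry is expired and goes away on the next purge_expired(). The key is stored a second time inside the map, so this trades some memory for simplicity; whether that is acceptable depends on your string sizes.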
Separately, it's worth checking whether your current observation that string copying is a performance problem is due to inefficient coding. For example:
are your string variables passed around by reference where possible, e.g. const std::string& function parameters whenever you won't change them?
do you use static const strings rather than continual run-time recreation from string literals/character arrays?
are you compiling with a sensible level of optimisation (e.g. -O2, /O2)?
are there places where keeping a reference to a string, and offsets within the string, would massively improve performance and reduce memory usage (the referenced string must be kept around as long as it's used even indirectly)? It is very common to implement a "string_ref" or similar class for this in medium- and larger-sized C++ projects
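As a rough illustration of the "string_ref" idea in the last point, the sketch below uses std::string_view (C++17) to split a buffer into fields without copying any characters. The function name split_fields and the comma-separated input are assumptions made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits `buffer` on `delimiter`. Each returned view points into the
// caller's buffer, so the buffer must outlive every view.
std::vector<std::string_view> split_fields(std::string_view buffer, char delimiter) {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = buffer.find(delimiter, start);
        if (end == std::string_view::npos) {
            fields.push_back(buffer.substr(start));      // last field
            break;
        }
        fields.push_back(buffer.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}
```

The lifetime caveat in the bullet applies in full: if the buffer is reused or destroyed while views are still around, they dangle, which is exactly the situation the questioner ran into with the "memref" class.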

Related

Returning Large Objects by Value (move semantic) or by pointer?

I have read several articles and answers in SO (in particular this), but they do not provide the full answer to my question. They tend to focus on special cases where the move semantic is as fast as copying a pointer, but that is not always the case.
For example consider this class:
struct Big {
    map<string, unsigned> m;
    vector<unsigned> v;
    set<string> s;
};
And this function:
Big foo();
If foo returns by value and the copy cannot be optimized via RVO, the compiler will apply the move semantic, which implies 3 moves, one for each class member. If the class members were more than 3, then I would have even more operations. If foo returned the Big object by pointer (smart pointer maybe) it would always be 1 operation.
To make things even more interesting, Big objects have a non local life span: they are kept in some data structures for the duration of the application. So you might expect the Big objects to be moved around multiple times during their life and the cost of 3 operations (move semantic) vs 1 operation (pointer) keeps burdening the performance long after the objects were returned by foo.
Given that background information, here are my questions:
1 - First of all I would like to be sure about my understanding of the move semantic performance: is it true that in the example above moving Big object is slower than copying pointers?
2 - Assuming the move semantic is indeed slower, should I accept returning Big objects by pointer, or are there better ways to achieve both speed and a nice API (I consider returning by value a better API)?
[EDIT]
Bottom line: I like to return by value, because if I introduce one single pointer in the API then they spread everywhere. So I would like to avoid them. However I want to be sure about the performance impact. C++ is all about speed and I cannot accept blindly the move semantic without understanding the performance hit.

they are kept in some data structures for the duration of the application. So you might expect the Big objects to be moved around multiple times during their life
I don't agree with this conclusion. Elements of most data structures tend to be quite stable in memory. The exceptions are unreserved std::vector and std::string, and other structures based on vector such as flat maps.
If foo returns by value and the copy cannot be optimized via RVO
So, implement foo in a way that can be optimised via RVO. Preferably in such way that a non-move is guaranteed in C++17. This is fast, and a convenient API, so is what you should prefer.
1 - First of all I would like to be sure about my understanding of the move semantic performance: is it true that in the example above moving Big object is slower than copying pointers?
It is true. Moving Big is relatively slower than copying a pointer. They are both rather light operations in absolute terms though (depending on context).
When you think about returning a pointer to a newly created object, you must also think about the lifetime of the object and where it is stored. If you're thinking of allocating it dynamically, and returning a pointer to the dynamic object, then you must consider that the dynamic allocation may be much more expensive than the few moves of the member objects. And furthermore, all of this may be insignificant in relation to all of the allocations that the std::map and other containers will do, so none of this deliberation may end up mattering in the end.
In conclusion: If you want to know what is faster, then measure. If one implementation measures significantly faster, then that implementation is probably the one that is faster (depending on how good you are at measuring).
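For reference, a sketch of what "return by value, then move into storage" can look like for the Big struct from the question. The functions make_big and store are hypothetical names, and the loop body is only there to give the containers some content.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Big {
    std::map<std::string, unsigned> m;
    std::vector<unsigned> v;
    std::set<std::string> s;
};

// One named local returned on every path, which lets the compiler
// construct it directly in the caller's storage (NRVO).
Big make_big(unsigned count) {
    Big result;
    for (unsigned i = 0; i < count; ++i) {
        const std::string key = "key" + std::to_string(i);
        result.v.push_back(i);
        result.s.insert(key);
        result.m[key] = i;
    }
    return result;
}

// Storing the object costs one move per member, not a deep copy.
void store(std::vector<Big>& archive, Big&& value) {
    archive.push_back(std::move(value));
}
```

After store(archive, std::move(big)) the vector's element buffer is the same allocation that make_big filled; only the container handles were transferred. Whether that is fast enough for your case is still something to measure.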

Passing objects by value, performance and unique_ptr

I am working with a C++ library that was not written by me.
Currently, I am trying to improve the library a bit by removing a lot of circular dependencies.
The library communicates over the network and has some message classes, which are created when reading data received from the network.
Currently, it works like this:
The message is parsed and a std::unique_ptr<Message> is being created.
The Message is passed to the parent object with std::move(msg)
I removed the circular dependency from the network parser to the parent object and used a signal instead, which emits a std::shared_ptr<Message>.
I am wondering if it would be a bad idea to just pass the Message by value instead of a shared_ptr. Would this decrease the performance?
The library passes many objects with unique_ptr like in this case. Is this good practice?
Thanks in advance
EDIT:
The Message consists of three unsigned int and a std::map<string, string> which will not hold more than a few strings

How big is the message, and are the big parts movable?
If your message is, say, an int and two std::strings, the move constructor should be pretty fast. Whether that degrades your performance depends on how much you do it.
However, beware of copying the message if it's large.
Based on the additional info: pass it by value. If you can, use a single std::function instead of a boost::signal, because that allows you to move the message.
Pass by value is, as you say, preferable due to its readability, and should always be your first choice. Only if that shows to be slow in profiling should you look for other solutions.
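A small sketch of the single-std::function variant, assuming a Message shaped like the one described in the edit. The member names (type, sequence, flags, fields) and the Parser class are invented for illustration and are not taken from the library in question.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Three unsigned ints and a small map, as described in the question.
struct Message {
    unsigned type = 0;
    unsigned sequence = 0;
    unsigned flags = 0;
    std::map<std::string, std::string> fields;
};

// Hypothetical parser front end with exactly one by-value callback.
class Parser {
public:
    using Handler = std::function<void(Message)>;

    explicit Parser(Handler handler) : handler_(std::move(handler)) {
        assert(handler_ && "a handler is required");
    }

    void on_data(unsigned type, const std::string& key, const std::string& value) {
        Message msg;
        msg.type = type;
        msg.fields[key] = value;
        handler_(std::move(msg));   // moved into the handler, no deep copy
    }

private:
    Handler handler_;
};
```

The parser no longer needs to know its parent, and the receiver gets its own Message that it can keep or move on. With several subscribers a move is no longer possible, since each one needs its own copy or a shared object.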

In general, non-const objects wrapped into shared_pointers don't fit well in inherently asynchronous and threaded applications. The application can get much more complex due to preventing potential data races.
It would be probably better to pass objects by value or unique_pointer, unless those objects are const.
Regarding performance: you can utilize move semantics.

Consequences of only using stack in C++

Let's say I know a guy who is new to C++. He does not pass around pointers (rightly so) but he refuses to pass by reference. He uses pass by value always. Reason being that he feels that "passing objects by reference is a sign of a broken design".
The program is a small graphics program and most of the passing in question is mathematical Vector(3-tuple) objects. There are some big controller objects but nothing more complicated than that.
I'm finding it hard to find a killer argument against only using the stack.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
On the pro side, I believe the stack is faster at allocating/deallocating memory and has a constant allocation time.
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?

Subtyping-polymorphism is a case where passing by value wouldn't work because you would slice the derived class to its base class. Maybe to some, using subtyping-polymorphism is bad design?
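A minimal sketch of the slicing problem, with made-up Shape and Circle types:

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "Shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "Circle"; }
};

// The parameter is a fresh Shape copied from the argument's base part,
// so the Circle override is lost.
std::string describe_by_value(Shape shape) { return shape.name(); }

// The parameter refers to the original object, so dynamic dispatch works.
std::string describe_by_reference(const Shape& shape) { return shape.name(); }
```

Called with a Circle, the by-value version reports "Shape" and the by-reference version reports "Circle".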

Your friend's problem is not his idea as much as his religion. Given any function, always consider the pros and cons of passing by value, reference, const reference, pointer or smart pointer. Then decide.
The only sign of broken design I see here is your friend's blind religion.
That said, there are a few signatures that don't bring much to the table. Taking a const by value might be silly, because if you promise not to change the object then you might as well not make your own copy of it. Unless it's a primitive, of course, in which case the compiler can be smart enough to take a reference still. Or, sometimes it's clumsy to take a pointer to a pointer as argument. This adds complexity; instead, you might be able to get away with it by taking a reference to a pointer, and get the same effect.
But don't take these guidelines as set in stone; always consider your options because there is no formal proof that eliminates any alternative's usefulness.
If you need to change the argument for your own needs, but don't want to affect the client, then take the argument by value.
If you want to provide a service to the client, and the client is not closely related to the service, then consider taking an argument by reference.
If the client is closely related to the service then consider taking no arguments but write a member function.
If you wish to write a service function for a family of clients that are closely related to the service but very distinct from each other then consider taking a reference argument, and perhaps make the function a friend of the clients that need this friendship.
If you don't need to change the client at all then consider taking a const-reference.

There are all sorts of things that cannot be done without using references - starting with a copy constructor. References (or pointers) are fundamental and whether he likes it or not, he is using references. (One advantage, or maybe disadvantage, of references is that you do not have to alter the code, in general, to pass a (const) reference.) And there is no reason not to use references most of the time.
And yes, passing by value is OK for smallish objects without requirements for dynamic allocation, but it is still silly to hobble oneself by saying "no references" without concrete measurements that the so-called overhead is (a) perceptible and (b) significant. "Premature optimization is the root of all evil" (variously attributed, including to C. A. R. Hoare, although apparently he disclaims it).

I think there is a huge misunderstanding in the question itself.
There is no relationship between stack or heap allocated objects on the one hand and pass by value or reference or pointer on the other.
Stack vs Heap allocation
Always prefer stack when possible, the object's lifetime is then managed for you which is much easier to deal with.
It might not be possible in a couple of situations though:
Virtual construction (think of a Factory)
Shared Ownership (though you should always try to avoid it)
And I might miss some, but in this case you should use SBRM (Scope Bound Resources Management) to leverage the stack lifetime management abilities, for example by using smart pointers.
Pass by: value, reference, pointer
First of all, there is a difference of semantics:
value, const reference: the passed object will not be modified by the method
reference: the passed object might be modified by the method
pointer/const pointer: same as reference (for the behavior), but might be null
Note that some languages (the functional kind like Haskell) do not offer reference/pointer by default. The values are immutable once created. Apart from some work-arounds for dealing with the exterior environment, they are not that restricted by this use and it somehow makes debugging easier.
Your friend should learn that there is absolutely nothing wrong with pass-by-reference or pass-by-pointer: for example think of swap, it cannot be implemented with pass-by-value.
Finally, Polymorphism does not allow pass-by-value semantics.
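The swap case mentioned above can be seen in a few lines. Vector3 here is a stand-in for the friend's 3-tuple type, and both function names are made up:

```cpp
#include <cassert>

struct Vector3 {
    double x, y, z;
};

// Swaps two local copies; the caller's objects are untouched.
void swap_by_value(Vector3 a, Vector3 b) {
    Vector3 temp = a;
    a = b;
    b = temp;
}

// Swaps the caller's objects themselves.
void swap_by_reference(Vector3& a, Vector3& b) {
    Vector3 temp = a;
    a = b;
    b = temp;
}
```

Only the second version has any effect visible to the caller.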
Now, let's speak about performances.
It's usually well accepted that built-ins should be passed by value (to avoid an indirection) and user-defined big classes should be passed by reference/pointer (to avoid copying). big in fact generally means that the Copy Constructor is not trivial.
There is however an open question regarding small user-defined classes. Some articles published recently suggest that in some case pass-by-value might allow better optimization from the compiler, for example, in this case:
Object foo(Object d) { d.bar(); return d; }
int main(int argc, char* argv[])
{
    Object o;
    o = foo(o);
    return 0;
}
Here a smart compiler is able to determine that o can be modified in place without any copying! (It is necessary that the function definition be visible I think, I don't know if Link-Time Optimization would figure it out)
Therefore, there is only one possibility to the performance issue, like always: measure.

Reason being that he feels that "passing objects by reference is a sign of a broken design".
Although this is wrong in C++ for purely technical reasons, always using pass-by-value is a good enough approximation for beginners – it’s certainly much better than passing everything by pointers (or perhaps even than passing everything by reference). It will make some code inefficient but, hey! As long as this doesn’t bother your friend, don’t be unduly disturbed by this practice. Just remind him that someday he might want to reconsider.
On the other hand, this:
There are some big controller objects but nothing more complicated than that.
is a problem. Your friend is talking about broken design, and then all the code uses are a few 3D vectors and large control structures? That is a broken design. Good code achieves modularity through the use of data structures. It doesn’t seem as though this were the case.
… And once you use such data structures, code without pass-by-reference may indeed become quite inefficient.

First thing is, stack rarely overflows outside this website, except in the recursion case.
About his reasoning, I think he might be wrong because he over-generalizes, but what he has done might be correct... or not?
For example, the Windows Forms library uses a Rectangle struct that has 4 members, Apple's QuartzCore also has a CGRect struct, and those structs are always passed by value. I think we can compare that to a Vector with 3 floating-point variables.
However, as I have not seen the code, I feel I should not judge what he has done, though I have a feeling he might have done the right thing in spite of his over-generalized idea.

I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
It's not quite as obvious as you might think. C++ compilers perform copy elision very aggressively, so you can often pass by value without incurring the cost of a copy operation. And in some cases, passing by value might even be faster.
Before condemning the issue for performance reasons, you should at the very least produce the benchmarks to back it up. And they might be hard to create because the compiler typically eliminates the performance difference.
So the real issue should be one of semantics. How do you want your code to behave? Sometimes, reference semantics are what you want, and then you should pass by reference. If you specifically want/need value semantics then you pass by value.
There is one point in favor of passing by value. It's helpful in achieving a more functional style of code, with fewer side effects and where immutability is the default. That makes a lot of code easier to reason about, and it may make it easier to parallelize the code as well.
But in truth, both have their place. And never using pass-by-reference is definitely a big warning sign.
For the last 6 months or so, I've been experimenting with making pass-by-value the default. If I don't explicitly need reference semantics, then I try to assume that the compiler will perform copy elision for me, so I can pass by value without losing any efficiency.
So far, the compiler hasn't really let me down. I'm sure I'll run into cases where I have to go back and change some calls to passing by reference, but I'll do that when I know that
performance is a problem, and
the compiler failed to apply copy elision
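If you want to check what your compiler actually does rather than assume, an instrumented type makes the copies visible. This is a sketch with hypothetical names (Tracer, make_tracer, consume); the zero-copy results for temporaries are guaranteed from C++17 on, and earlier standards merely permit the elision.

```cpp
#include <cassert>
#include <utility>

// Counts how often it is copy- or move-constructed.
struct Tracer {
    static int copies;
    static int moves;

    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
    Tracer& operator=(const Tracer&) = default;
    Tracer& operator=(Tracer&&) = default;
};

int Tracer::copies = 0;
int Tracer::moves = 0;

// Returns a temporary, which is constructed directly in the destination.
Tracer make_tracer() { return Tracer{}; }

// Takes its argument by value.
void consume(Tracer) {}
```

Passing a temporary, as in consume(make_tracer()), bumps neither counter. Passing a named object costs one copy, and passing std::move(object) costs one move. Those last two are the cases where pass-by-value is not free.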

I would say that Not using pointers in C is a sign of a newbie programmer.
It sounds like your friend is scared of pointers.
Remember, C++ pointers were actually inherited from the C language, and C was developed when computers were much less powerful. Nevertheless, speed and efficiency continue to be vital until this day.
So, why use pointers? They allow the developer to optimize a program to run faster or use less memory than it would otherwise! Referring to the memory location of data is much more efficient than copying all the data around.
Pointers usually are a concept that is difficult to grasp for those beginning to program, because all the experiments done involve small arrays, maybe a few structs, but basically they consist of working with a couple of megabytes (if you're lucky) when you have 1GB of memory laying around the house. In this scene, a couple of MB are nothing and it usually is too little to have a significant impact on the performance of your program.
So let's exaggerate that a little bit. Think of a char array with 2147483648 elements - 2GB of data - that you need to pass to a function that will write all the data to the disk. Now, what technique do you think is going to be more efficient/faster?
Pass by value, which is going to have to re-copy those 2GB of data to another location in memory before the program can write the data to the disk, or
Pass by reference, which will just refer to that memory location.
What happens when you just don't have 4GB of RAM? Will you spend $ and buy chips of RAM just because you are afraid of using pointers?
Re-copying the data in memory sounds a bit redundant when you don't have to, and it's a waste of computer resources.
Anyway, be patient with your friend. If he would like to become a serious/professional programmer at some point in his life he will eventually have to take the time to really understand pointers.
Good Luck.

As already mentioned the big difference between a reference and a pointer is that a pointer can be null. If a class requires data a reference declaration will make it required. Adding const will make it 'read only' if that is what is desired by the caller.
The pass-by-value 'flaw' mentioned is simply not true. Passing everything by value will completely change the performance of an application. It is not so bad when primitive types (i.e. int, double, etc.) are passed by value, but when a class instance is passed by value temporary objects are created, which requires constructors and later on destructors to be called on the class and on all of the member variables in the class. This is exacerbated when large class hierarchies are used because parent class constructors/destructors must be called as well.
Also, just because the vector is passed by value does not mean that it only uses stack memory. The heap may be used for each element as it is created in the temporary vector that is passed to the method/function. The vector itself may also have to reallocate via the heap if it reaches its capacity.
If pass by value is being used so that the caller's values are not modified, then just use a const reference.

The answers that I've seen so far have all focused on performance: cases where pass-by-reference is faster than pass-by-value. You may have more success in your argument if you focus on cases that are impossible with pass-by-value.
Small tuples or vectors are a very simple type of data-structure. More complex data-structures share information, and that sharing can't be represented directly as values. You either need to use references/pointers or something that simulates them such as arrays and indices.
Lots of problems boil down to data that forms a Graph, or a Directed-Graph. In both cases you have a mixture of edges and nodes that need to be stored within the data-structure. Now you have the problem that the same data needs to be in multiple places. If you avoid references then firstly the data needs to be duplicated, and then every change needs to be carefully replicated in each of the other copies.
Your friend's argument boils down to saying: tackling any problem complex enough to be represented by a Graph is a bad-design....
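A tiny sketch of the sharing problem, using indices as the "something that simulates references". The Graph layout and the function name sum_of_targets are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each node value is stored exactly once; edges refer to nodes by index
// instead of holding their own copies.
struct Graph {
    struct Edge {
        std::size_t from;
        std::size_t to;
    };
    std::vector<int> node_values;
    std::vector<Edge> edges;
};

// Adds up the value of the target node of every edge.
int sum_of_targets(const Graph& graph) {
    int total = 0;
    for (const Graph::Edge& edge : graph.edges) {
        total += graph.node_values[edge.to];
    }
    return total;
}
```

Changing one entry of node_values is immediately visible through every edge that points at it. If each edge carried its own copy of the node, the same change would have to be replicated by hand in every copy.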

The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Well, gosh, where to start...
As you mention, "there is a lot of unnecessary copying occurring in the code". Let's say you've got a loop where you call a function on these objects. Using a pointer instead of duplicating the objects can accelerate execution by one or more orders of magnitude.
You can't pass variable-sized data structures, arrays, etc. around on the stack. You have to dynamically allocate them and pass a pointer or reference to the beginning. If your friend hasn't run into this, then yes, he's "new to C++."
As you mention, the program in question is simple and mostly uses quite small objects like graphics 3-tuples, which if the elements are doubles would be 24 bytes apiece. But in graphics, it's common to deal with 4x4 arrays, which handle both rotation and translation. Those would be 128 bytes apiece, so a program that had to deal with those would be about five times slower per function call with pass-by-value due to the increased copying. With pass-by-reference, passing a 3-tuple or a 4x4 array in a 32-bit executable would just involve duplicating a single 4-byte pointer.
On register-rich CPU architectures like ARM, PowerPC, 64-bit x86, 680x0 - but not 32-bit x86 - pointers (and references, which are secretly pointers wearing fancy syntactical clothing) are commonly passed or returned in a register, which is really freaking fast compared to the memory access involved in a stack operation.
You mention the improbability of running out of stack space. And yes, that's so on a small program one might write for a class assignment. But a couple of months ago, I was debugging commercial code that was probably 80 function calls below main(). If they'd used pass-by-value instead of pass-by-reference, the stack would have been ginormous. And lest your friend think this was a "broken design", this was actually a WebKit-based browser implemented on Linux using GTK+, all of which is very state-of-the-art, and the function call depth is normal for professional code.
Some executable architectures limit the size of an individual stack frame, so even though you might not run out of stack space per se, you could exceed that and wind up with perfectly valid C++ code that wouldn't build on such a platform.
I could go on and on.
If your friend is interested in graphics, he should take a look at some of the common APIs used in graphics: OpenGL and XWindows on Linux, Quartz on Mac OS X, Direct X on Windows. And he should look at the internals of large C/C++ systems like the WebKit or Gecko HTML rendering engines, or any of the Mozilla browsers, or the GTK+ or Qt GUI toolkits. They all pass anything much larger than a single integer or float by reference, and often fill in results by reference rather than as a function return value.
Nobody with any serious real world C/C++ chops - and I mean nobody - passes data structures by value. There's a reason for this: it's just flipping inefficient and problem-prone.

Wow, there are already 13 answers… I didn't read all in detail but I think this is quite different from the others…
He has a point. The advantage of pass-by-value as a rule is that subroutines cannot subtly modify their arguments. Passing non-const references would indicate that every function has ugly side effects, indicating poor design.
Simply explain to him the difference between vector3 & and vector3 const&, and demonstrate how the latter may be initialized by a constant as in vec_function( vector3(1,2,3) );, but not the former. Pass by const reference is a simple optimization of pass by value.
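Something like the following could serve as that demonstration. The vector3 constructor and the two functions are assumptions for the sake of the example.

```cpp
#include <cassert>

struct vector3 {
    double x, y, z;
    vector3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
};

// Read-only access: binds to named objects and to temporaries alike.
double length_squared(vector3 const& v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Mutating access: requires a named, non-const object.
void scale_in_place(vector3& v, double factor) {
    v.x *= factor;
    v.y *= factor;
    v.z *= factor;
}

// length_squared(vector3(1, 2, 3)) compiles.
// scale_in_place(vector3(1, 2, 3), 2.0) is rejected in standard C++,
// because a temporary cannot bind to a non-const reference.
```

The signature alone tells the caller whether the argument can change, which addresses the side-effect worry without paying for copies.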

Buy your friend a good C++ book. Passing non-trivial objects by reference is a good practice and saves you a lot of unnecessary constructor/destructor calls. This also has nothing to do with allocating on the free store vs. using the stack. You can (or should) pass objects allocated on the program stack by reference without any free store usage. You can also ignore the free store completely, but that throws you back to the old Fortran days, which your friend probably didn't have in mind - otherwise he would pick an ancient f77 compiler for your project, wouldn't he...?

Boost shared_ptr use_count function

My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of boost::shared_ptr<foo> except for the one in the vector. This will drop the reference counter in the shared_ptr to 1. Is it practical to use shared_ptr.use_count() to see if it is equal to 1 to know when all other parts of my app are done with the data?
One additional reason I'm asking the question is that the boost documentation on the shared pointer shared_ptr recommends not using "use_count" for production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.

My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for the class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost.Pool allocator, which is generally at least somewhat similar.
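As a rough illustration of the free-list idea described above, here is a minimal sketch of class-level operator new and operator delete. The class name Foo, its payload member, and the helper slot_is_reused() are assumptions made for the example, not names from the question. The sketch is not thread-safe, never returns pooled memory to the system, and is bypassed by make_shared, which allocates the object inside its own control block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for the question's "foo".
class Foo {
public:
    int payload = 0;

    // Reuse the first slot on the free list if there is one.
    static void* operator new(std::size_t size) {
        if (size == sizeof(Foo) && head() != nullptr) {
            FreeNode* node = head();
            head() = node->next;
            return node;
        }
        // Make every slot large enough to hold the free-list link later.
        return ::operator new(std::max(size, sizeof(FreeNode)));
    }

    // Push the slot onto the free list instead of releasing it.
    static void operator delete(void* p, std::size_t size) noexcept {
        if (p == nullptr) return;
        if (size != sizeof(Foo)) { ::operator delete(p); return; }
        // The link is stored inside the freed object's own storage.
        head() = ::new (p) FreeNode{head()};
    }

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode*& head() {
        static FreeNode* first_free = nullptr;
        return first_free;
    }
};

// Returns true if a freed slot is handed back by the next allocation.
bool slot_is_reused() {
    Foo* first = new Foo;
    const auto first_address = reinterpret_cast<std::uintptr_t>(first);
    delete first;                       // goes onto the free list
    Foo* second = new Foo;              // should come off the free list
    const bool reused =
        reinterpret_cast<std::uintptr_t>(second) == first_address;
    delete second;
    return reused;
}
```

Code that writes shared_ptr<Foo>(new Foo) picks this up without any changes, because the default deleter calls delete on a Foo*, which resolves to the class's own operator delete.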

If you want to know whether or not the use count is 1, use the unique() member function.

I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't rely on shared_ptr's features to eliminate all references; your application architecture should be able to do that. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.

I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
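A minimal sketch of that arrangement follows, written with std::shared_ptr (boost::shared_ptr accepts a deleter in the same constructor position). The names FooPool, Slot, acquire() and free_count() are made up for the example. The vector owns each Foo through a unique_ptr, and the shared_ptr handed out carries a deleter that only marks the slot as free. The sketch assumes the pool outlives every handle, is never resized, and is used from a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Foo { int payload = 0; };  // hypothetical stand-in

class FooPool {
public:
    explicit FooPool(std::size_t count) : slots_(count) {
        for (auto& slot : slots_) slot.object.reset(new Foo);
    }

    // Hands out a Foo that is not in use, or an empty pointer if all are busy.
    std::shared_ptr<Foo> acquire() {
        for (auto& slot : slots_) {
            if (!slot.in_use) {
                slot.in_use = true;
                Slot* owner = &slot;
                // The deleter runs when the last copy goes away. It does not
                // destroy the Foo; it only marks the slot as reusable.
                return std::shared_ptr<Foo>(
                    slot.object.get(),
                    [owner](Foo*) { owner->in_use = false; });
            }
        }
        return nullptr;
    }

    std::size_t free_count() const {
        std::size_t count = 0;
        for (const auto& slot : slots_) {
            if (!slot.in_use) ++count;
        }
        return count;
    }

private:
    struct Slot {
        std::unique_ptr<Foo> object;  // the pool is the real owner
        bool in_use = false;
    };
    std::vector<Slot> slots_;
};
```

With this shape, scanning the vector for a free foo becomes a check of the in_use flag, and use_count() is never consulted.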

I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. This way you will have full control over it, rather than relying on the shared_ptr's counter, which, as you rightly point out, is not recommended. You can also pre-set your own counter to the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.

Find memory leaks caused by smart pointers

Does anybody know a "technique" to discover memory leaks caused by smart pointers? I am currently working on a large project written in C++ that heavily uses smart pointers with reference counting. Obviously we have some memory leaks caused by smart pointers that are still referenced somewhere in the code, so that their memory does not get freed. It's very hard to find the line of code with the "needless" reference that causes the corresponding object not to be freed (although it's no longer of use).
I found some advice on the web that proposed collecting call stacks of the increment/decrement operations on the reference counter. This gives me a good hint as to which piece of code caused the reference counter to be increased or decreased.
But what I need is some kind of algorithm that groups the corresponding "increase/decrease call stacks" together. After removing these pairs of call stacks, I hopefully have (at least) one "increase call stack" left over, that shows me the piece of code with the "needless" reference, that caused the corresponding object not to be freed. Now it will be no big deal to fix the leak!
But has anybody an idea for an "algorithm" that does the grouping?
Development takes place under Windows XP.
(I hope someone understood, what I tried to explain ...)
EDIT: I am talking about leaks caused by circular references.

Note that one source of leaks with reference-counting smart pointers is pointers with circular dependencies. For example, A has a smart pointer to B, and B has a smart pointer to A. Neither A nor B will be destroyed. You will have to find the dependencies and then break them.
If possible, use boost smart pointers, and use shared_ptr for pointers which are supposed to be owners of the data, and weak_ptr for pointers not supposed to call delete.
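A small sketch of that ownership split, using std::shared_ptr and std::weak_ptr (the Boost types behave the same way here). The structs A and B, the live_objects counter, and objects_left_after_scope() are illustrative names only. A owns B, while B refers back to A through a weak_ptr, so no cycle of owners exists.

```cpp
#include <cassert>
#include <memory>

int live_objects = 0;  // counts constructed-but-not-destroyed A and B objects

struct B;

struct A {
    std::shared_ptr<B> b;  // owning reference
    A() { ++live_objects; }
    ~A() { --live_objects; }
};

struct B {
    std::weak_ptr<A> a;    // non-owning back reference
    B() { ++live_objects; }
    ~B() { --live_objects; }
};

// Builds A <-> B, drops the only external reference, and reports how
// many objects are still alive afterwards.
int objects_left_after_scope() {
    {
        std::shared_ptr<A> a = std::make_shared<A>();
        a->b = std::make_shared<B>();
        a->b->a = a;  // back link does not raise A's use count

        // A weak_ptr must be locked before use; lock() yields an empty
        // shared_ptr if the A has already been destroyed.
        std::shared_ptr<A> owner = a->b->a.lock();
        assert(owner == a);
    }
    return live_objects;
}
```

If B::a were a shared_ptr<A> instead, both objects would still be alive when the function returns.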

The way I do it is simply:
- on every AddRef() record call-stack,
- matching Release() removes it.
This way, at the end of the program I'm left with the AddRef() calls that have no matching Release(). No need to match pairs.
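One way to picture this bookkeeping is a registry with one entry per live reference. The sketch below is only an approximation of the answer's approach: it keys each entry by the address of the handle object and stores a caller-supplied label where a real implementation would store a captured call stack (stack capture is platform-specific and left out). TrackedPtr, live_refs() and demo_outstanding() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// One entry per live handle, keyed by the handle's own address.
// The string stands in for a recorded call stack (assumption of this sketch).
std::map<const void*, std::string>& live_refs() {
    static std::map<const void*, std::string> refs;
    return refs;
}

template <typename T>
class TrackedPtr {
public:
    // Taking a reference records where it was taken (the "AddRef" side).
    TrackedPtr(std::shared_ptr<T> target, std::string site)
        : ptr_(std::move(target)) {
        live_refs()[this] = std::move(site);
    }
    TrackedPtr(const TrackedPtr& other, std::string site) : ptr_(other.ptr_) {
        live_refs()[this] = std::move(site);
    }
    // Plain copies are disabled so that every reference names its site.
    TrackedPtr(const TrackedPtr&) = delete;
    TrackedPtr& operator=(const TrackedPtr&) = delete;

    // Dropping the reference removes its record (the matching "Release").
    ~TrackedPtr() { live_refs().erase(this); }

    T& operator*() const { return *ptr_; }

private:
    std::shared_ptr<T> ptr_;
};

// Returns the number of references still recorded after a temporary
// copy has come and gone while the original handle is alive.
std::size_t demo_outstanding() {
    TrackedPtr<int> owner(std::make_shared<int>(7), "main owner");
    {
        TrackedPtr<int> temp(owner, "worker copy");
        assert(live_refs().size() == 2);
    }
    return live_refs().size();
}
```

Whatever is still in live_refs() at shutdown is the list of references that were never released, each with the label of the place that took it.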

If you can reproduce the leak in a deterministic way, a simple technique I often used is to number all your smart pointers in their order of construction (use a static counter in the constructor), and report this ID together with the leak. Then run the program again, and trigger a DebugBreak() when the smart pointer with the same ID gets constructed.
You should also consider this great tool : http://www.codeproject.com/KB/applications/visualleakdetector.aspx

To detect reference cycles you need to have a graph of all reference-counted objects. Such a graph is not easy to construct, but it can be done.
Create a global set<CRefCounted*> to register living reference-counted objects. This is easier if you have a common AddRef() implementation: just add the this pointer to the set when the object's reference count goes from 0 to 1. Similarly, in Release(), remove the object from the set when its reference count goes from 1 to 0.
Next, provide some way to get the set of referenced objects from each CRefCounted*. It could be a virtual set<CRefCounted*> CRefCounted::get_children() or whatever suits you. Now you have a way to walk the graph.
Finally, implement your favorite algorithm for cycle detection in a directed graph. Start the program, create some cycles and run cycle detector. Enjoy! :)
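The three steps above can be sketched as follows. This is a simplified illustration, not the answerer's code: the base class is called RefCounted here, Release() only unregisters the object and does not delete it, and get_children() returns a vector rather than a set. Node, link(), has_cycle() and demo_cycle_found() are hypothetical names. The cycle check is a depth-first search with three node states.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

class RefCounted {
public:
    virtual ~RefCounted() { registry().erase(this); }
    // Register on the 0 -> 1 transition, unregister on 1 -> 0.
    void AddRef()  { if (count_++ == 0) registry().insert(this); }
    void Release() { if (--count_ == 0) registry().erase(this); }
    // Objects this object holds counted references to.
    virtual std::vector<RefCounted*> get_children() const = 0;

    static std::set<RefCounted*>& registry() {
        static std::set<RefCounted*> live;
        return live;
    }

private:
    int count_ = 0;
};

enum class Mark { Unvisited, InProgress, Done };

// Depth-first search; meeting an InProgress node again means a cycle.
bool visit(RefCounted* node, std::map<RefCounted*, Mark>& marks) {
    marks[node] = Mark::InProgress;
    for (RefCounted* child : node->get_children()) {
        const Mark state = marks[child];  // new entries start as Unvisited
        if (state == Mark::InProgress) return true;
        if (state == Mark::Unvisited && visit(child, marks)) return true;
    }
    marks[node] = Mark::Done;
    return false;
}

bool has_cycle() {
    std::map<RefCounted*, Mark> marks;
    for (RefCounted* node : RefCounted::registry()) {
        if (marks[node] == Mark::Unvisited && visit(node, marks)) return true;
    }
    return false;
}

// Test node with at most one outgoing counted reference.
class Node : public RefCounted {
public:
    Node* next = nullptr;
    std::vector<RefCounted*> get_children() const override {
        std::vector<RefCounted*> children;
        if (next != nullptr) children.push_back(next);
        return children;
    }
};

void link(Node& from, Node& to) {
    from.next = &to;
    to.AddRef();
}

bool demo_cycle_found() {
    Node a, b;
    a.AddRef();                          // external reference to a
    link(a, b);                          // a -> b
    const bool before = has_cycle();
    link(b, a);                          // b -> a closes the loop
    const bool after = has_cycle();
    return !before && after;
}
```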

What I do is wrap the smart pointer with a class that takes __FUNCTION__ and __LINE__ parameters. Increment a count for that function and line every time the constructor is called, and decrement the count every time the destructor is called. Then write a function that dumps the function/line/count information. That tells you where all of your references were created.
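A minimal sketch of such a wrapper is shown below. SitePtr, MAKE_SITE_PTR, site_counts(), dump_sites() and demo_live_handles() are names invented for the example, and the standard __func__ is used in place of the compiler-specific __FUNCTION__. Copying is disabled to keep the sketch short, and the counts are not protected against concurrent access.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Live handle count per creation site, keyed as "function:line".
std::map<std::string, int>& site_counts() {
    static std::map<std::string, int> counts;
    return counts;
}

template <typename T>
class SitePtr {
public:
    SitePtr(std::shared_ptr<T> target, const char* function, int line)
        : ptr_(std::move(target)),
          site_(std::string(function) + ":" + std::to_string(line)) {
        ++site_counts()[site_];   // constructor: one more live handle here
    }
    SitePtr(const SitePtr&) = delete;
    SitePtr& operator=(const SitePtr&) = delete;
    ~SitePtr() { --site_counts()[site_]; }  // destructor: one fewer

    T& operator*() const { return *ptr_; }

private:
    std::shared_ptr<T> ptr_;
    std::string site_;
};

// Declares a SitePtr<T> called `name`, stamped with the current location.
#define MAKE_SITE_PTR(T, name, source) \
    SitePtr<T> name((source), __func__, __LINE__)

// One line per site that still has live handles.
std::string dump_sites() {
    std::string report;
    for (const auto& entry : site_counts()) {
        if (entry.second != 0) {
            report += entry.first + " " + std::to_string(entry.second) + "\n";
        }
    }
    return report;
}

// Returns the total number of live handles while `first` is still in scope.
int demo_live_handles() {
    std::shared_ptr<int> data = std::make_shared<int>(1);
    MAKE_SITE_PTR(int, first, data);
    {
        MAKE_SITE_PTR(int, second, data);
    }
    int live = 0;
    for (const auto& entry : site_counts()) live += entry.second;
    return live;
}
```

Calling dump_sites() at the point where everything should have been released lists only the sites whose handles are still alive.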

What I have done to solve this is to override the malloc/new & free/delete operators such that they keep track in a data structure as much as possible about the operation you are performing.
For example, when overriding malloc/new, you can create a record of the caller's address, the number of bytes requested, the pointer value returned, and a sequence ID so that all your records can be ordered (I do not know if you deal with threads, but you need to take that into account, too).
When writing the free/delete routines, I also keep track of the caller's address and the pointer info. Then I look backwards into the list and try to match the malloc/new counterpart using the pointer as my key. If I don't find it, raise a red flag.
If you can afford it, you can embed the sequence ID in your data to be absolutely sure who made each allocation call and when. The key here is to identify each transaction pair as uniquely as we can.
Then you will have a third routine displaying your memory allocations/deallocation history, along with the functions invoking each transaction. (this can be accomplished by parsing the symbolic map out of your linker). You will know how much memory you will have allocated at any time and who did it.
If you don't have enough resources to perform these transactions (my typical case for 8-bit microcontrollers), you can output the same information via a serial or TCP link to another machine with enough resources.

It's not a matter of finding a leak. In the case of smart pointers, a leak detector will most probably point to some generic place like CreateObject(), which is called thousands of times. It's a matter of determining which place in the code didn't call Release() on the ref-counted object.

Since you said that you're using Windows, you may be able to take advantage of Microsoft's user-mode dump heap utility, UMDH, which comes with the Debugging Tools for Windows. UMDH makes snapshots of your application's memory usage, recording the stack used for each allocation, and lets you compare multiple snapshots to see which calls to the allocator "leaked" memory. It also translates the stack traces to symbols for you using dbghelp.dll.
There's also another Microsoft tool called "LeakDiag" that supports more memory allocators than UMDH, but it's a bit more difficult to find and doesn't seem to be actively maintained. The latest version is at least five years old, if I recall correctly.

If I were you I would take the log and write a quick script to do something like the following (mine is in Ruby):
def allocation?(line)
  # determine if this line is a log line indicating allocation/deallocation
end

def unique_stack(line)
  # return a string that is equal for pairs of allocation/deallocation
end

allocations = []
File.foreach("the-log.log") do |line|
  # custom function to determine if line is an alloc/dealloc
  if allocation?(line)
    # custom function to get unique stack trace where the return value
    # is the same for an alloc and dealloc
    allocations << unique_stack(line)
  end
end

allocations.sort!

# go through and remove pairs of allocations that are equal,
# ideally 1 will be remaining....
index = 0
while index < allocations.size - 1
  if allocations[index] == allocations[index + 1]
    # drop both halves of the matched pair
    allocations.delete_at(index)
    allocations.delete_at(index)
  else
    index += 1
  end
end

allocations.each { |line| puts line }
This basically goes through the log, captures each allocation/deallocation, and stores a value that is the same for both halves of a pair. It then sorts the values and removes the pairs that match, so you can see what's left.
Update: Sorry for all the intermediary edits (I accidentally posted before I was done)

For Windows, check out:
MFC Memory Leak Detection

I am a big fan of Google's Heapchecker -- it will not catch all leaks, but it gets most of them. (Tip: Link it into all your unittests.)

First step could be to know what class is leaking.
Once you know it, you can find who is increasing the reference:
1. Put a breakpoint on the constructor of the class that is wrapped by shared_ptr.
2. Step into shared_ptr with the debugger when it's increasing the reference count, and look at the variable pn.pi_->use_count_
Take the address of that variable by evaluating an expression (something like this: &this->pn.pi_->use_count_); you will get an address.
3. In the Visual Studio debugger, go to Debug->New Breakpoint->New Data Breakpoint...
Enter the address of the variable.
4. Run the program. It will stop every time some point in the code increases or decreases the reference counter.
Then you need to check if those are matching.
