How to safely implement reusable scratch memory in C++?


It is very common that even pure functions require some additional scratch memory for their operations. If the size of this memory is known at compile time, we can allocate this memory on the stack with std::array or a C array. But the size often depends on the input, so we often resort to dynamic allocations on the heap through std::vector.
Consider a simple example of building a wrapper around some C API:
void addShapes(std::span<const Shape> shapes) {
std::vector<CShape> cShapes;
cShapes.reserve(shapes.size());
// Convert shapes to a form accepted by the API
for (const Shape& shape : shapes) {
cShapes.push_back(static_cast<CShape>(shape));
}
cAddShapes(context, cShapes.data(), cShapes.size());
}
Let's say that we call this function repeatedly and that we identify that the overhead of std::vector memory allocations is significant, even with the call to reserve(). So what can we do?
We could declare the vector as static to reuse the allocated space between calls, but that comes with several problems. First, it is no longer thread safe, but that can be fixed easily enough by using thread_local instead. Second, the memory doesn't get released until the program or thread terminates. Let's say we are fine with that. And lastly, we have to remember to clear the vector every time, because it's not just the memory that will persist between function calls, but the data as well.
void addShapes(std::span<const Shape> shapes) {
thread_local std::vector<CShape> cShapes;
cShapes.clear();
// Convert shapes to a form accepted by the API
for (const Shape& shape : shapes) {
cShapes.push_back(static_cast<CShape>(shape));
}
cAddShapes(context, cShapes.data(), cShapes.size());
}
This is the pattern I use whenever I would like to avoid the dynamic allocation on every call. The issue is, I don't think the semantics of this are very apparent if you aren't aware of the pattern. thread_local looks scary, you have to remember to clear the vector and even though the lifetime of the object now extends beyond the scope of the function, it is unsafe to return a reference to it, because another call to the same function would modify it.
My first attempt to make this a bit easier was to define a helper function like this:
template <typename T, typename Cleaner = void (T&)>
T& getScratch(Cleaner cleaner = [] (T& o) { o.clear(); }) {
thread_local T scratchObj;
cleaner(scratchObj);
return scratchObj;
}
void addShapes(std::span<const Shape> shapes) {
std::vector<CShape>& cShapes = getScratch<std::vector<CShape>>();
// Convert shapes to a form accepted by the API
for (const Shape& shape : shapes) {
cShapes.push_back(static_cast<CShape>(shape));
}
cAddShapes(context, cShapes.data(), cShapes.size());
}
But of course, that creates a thread_local variable for each template instantiation of the getScratch function, rather than for each place the function is called. So if we asked for two vectors of the same type at once, we'd get two references to the same vector. Not good.
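The aliasing described above can be reproduced with a minimal sketch. The helper is reduced to the default "clear" behavior, and `scratchRequestsAlias` is a hypothetical name used only for this illustration:

```cpp
#include <cassert>
#include <vector>

// Same idea as getScratch above, without the customizable cleaner.
template <typename T>
T& getScratch() {
    thread_local T scratchObj;  // one object per thread and per type T
    scratchObj.clear();
    return scratchObj;
}

// Returns true if two simultaneous requests refer to the same vector.
bool scratchRequestsAlias() {
    std::vector<int>& a = getScratch<std::vector<int>>();
    a.push_back(1);
    // The second request clears the very vector that `a` refers to.
    std::vector<int>& b = getScratch<std::vector<int>>();
    assert(a.empty());  // the value pushed through `a` is gone
    return &a == &b;
}
```

Both references name the same thread_local object, so the second request silently wipes the data written through the first.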
What would be a good way to implement this sort of a reusable memory safely and cleanly? Are there already existing solutions? Or should we not use thread local storage in this way and just use local allocations despite the performance benefits that reusing them brings: https://quick-bench.com/q/VgkPLveFL_K5wT5wX6NL1MRSE8c ?

To answer my own question, I came up with a solution that builds upon the last example. Rather than keeping only one object for each thread and type, let's keep a free list of them. Upon request, we either reuse an object from the free list or create a new one. The user keeps a RAII-style handle that returns the object into the free list when it leaves the scope. Since we still use thread_local, this is thread safe without any effort. We can wrap all this into a simple class:
template <typename T>
class Scratch {
public:
template <typename Cleaner = void (T&)>
explicit Scratch(Cleaner cleaner = [] (T& o) { o.clear(); }) : borrowedObj(acquire()) {
cleaner(borrowedObj);
}
T& operator*() {
return borrowedObj;
}
T* operator->() {
return &borrowedObj;
}
~Scratch() {
release(std::move(borrowedObj));
}
private:
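// inline (C++17) makes this declaration a definition; without it the member needs an out-of-class definition
static inline thread_local std::vector<T> freeList;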
T borrowedObj;
static T acquire() {
if (!freeList.empty()) {
T obj = std::move(freeList.back());
freeList.pop_back();
return obj;
} else {
return T();
}
}
static void release(T&& obj) {
freeList.push_back(std::move(obj));
}
};
That can be used simply as:
void addShapes(std::span<const Shape> shapes) {
Scratch<std::vector<CShape>> cShapes;
// Convert shapes to a form accepted by the API
for (const Shape& shape : shapes) {
cShapes->push_back(static_cast<CShape>(shape));
}
cAddShapes(context, cShapes->data(), cShapes->size());
}
You might want to extend this as needed, perhaps add a [] operator for convenience if it's going to be used with containers. You could keep its intended use to be a local object in a function and explicitly make it non-copyable and non-movable, or it could be turned into a general purpose handle like unique_ptr. But beware that the object must be destroyed by the same thread that created it.
In both cases it addresses my issues with a raw thread_local. The clear is implicit and returning a reference to the scratch object or its data is now obviously wrong. It still doesn't automatically free memory, which is what we want after all, but at least it's now easier to implement the functionality to free it on demand as needed.
In general, it should have lower memory usage than the raw thread_local method, too, since allocations of the same type can be reused across different call sites. But there is a scenario in which this behavior will result in a higher memory usage, too. Let's say we have a function that needs a std::vector<int> of size 10000. If we call this function and then ask for a vector of the same type, we will get the one with capacity 10000. If we then call the function again while holding this vector, it will have to create another one, resizing it to 10000 elements, too.
For those reasons I would recommend using it only where you don't expect to see large amounts of data, but rather want to avoid lots of small, but frequent and short-lived allocations.
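As a rough sketch of the extensions mentioned above (non-copyable handle, freeing on demand), something along these lines could work. The names `releaseUnused`, `pooledCount` and `demoPool` are hypothetical additions, the customizable cleaner is dropped for brevity, and C++17 is assumed for the inline static member:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
class Scratch {
public:
    Scratch() : borrowedObj(acquire()) { borrowedObj.clear(); }
    ~Scratch() { freeList.push_back(std::move(borrowedObj)); }

    // Intended as a function-local handle only.
    Scratch(const Scratch&) = delete;
    Scratch& operator=(const Scratch&) = delete;

    T& operator*() { return borrowedObj; }
    T* operator->() { return &borrowedObj; }

    // Hypothetical addition: drop every pooled object of the calling thread.
    static void releaseUnused() {
        freeList.clear();
        freeList.shrink_to_fit();
    }
    static std::size_t pooledCount() { return freeList.size(); }

private:
    static inline thread_local std::vector<T> freeList;
    T borrowedObj;

    static T acquire() {
        if (freeList.empty()) return T();
        T obj = std::move(freeList.back());
        freeList.pop_back();
        return obj;
    }
};

// Nested handles get distinct objects; returned objects are reused later.
std::size_t demoPool() {
    using Buf = Scratch<std::vector<int>>;
    Buf::releaseUnused();
    {
        Buf outer;
        outer->assign(100, 0);
        {
            Buf inner;  // outer is still borrowed, so this is a second object
            assert(&*inner != &*outer);
        }
        assert(Buf::pooledCount() == 1);  // inner went back to the free list
    }
    assert(Buf::pooledCount() == 2);
    {
        Buf again;  // takes the most recently returned object
        assert(again->empty());
    }
    Buf::releaseUnused();  // free the pooled memory on demand
    return Buf::pooledCount();
}
```

Deleting the copy operations keeps the handle tied to one scope, and `releaseUnused` only affects the free list of the thread that calls it.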

"We could declare the vector as static to reuse the allocated space between calls, but that comes with several problems. First, it is no longer thread safe, but that can be fixed easily enough by using thread_local instead. Second, the memory doesn't get released until the program or thread terminates."
Exactly. Only the caller knows how and when the function will be called, so the caller should be responsible both for reusing the space and for releasing it, because only the caller knows whether it will be needed again. So add a cache parameter to your function and keep the reusable state there to speed up later calls.
void addShapes(std::span<const Shape> shapes, std::vector<CShape>& cache) {
    cache.clear(); // drop the data of the previous call; the capacity is kept
    cache.reserve(shapes.size());
    // Convert shapes to a form accepted by the API
    for (const Shape& shape : shapes) {
        cache.push_back(static_cast<CShape>(shape));
    }
    cAddShapes(context, cache.data(), cache.size());
}
Or you could wrap it in a class, like:
class shapes {
public:
    void add(std::span<const Shape> items) {
        cache.clear(); // drop the data of the previous call; the capacity is kept
        cache.reserve(items.size());
        // Convert shapes to a form accepted by the API
        for (const Shape& shape : items) {
            cache.push_back(static_cast<CShape>(shape));
        }
        cAddShapes(context, cache.data(), cache.size());
    }
    // Release the cached memory once it is no longer needed.
    void clear_cache() {
        std::vector<CShape>().swap(cache);
    }
private:
    std::vector<CShape> cache;
};

Related

Creating template types without new/delete

I have a C++ Object class like this:
class Component {};
template <typename T>
concept component = std::is_base_of_v<Component, T>;
class Object
{
std::map<std::type_index, Component*> components;
public:
template<component T>
T* add()
{
if(components.find(typeid(T)) == components.cend())
{
T* value{new T{}};
components[typeid(T)] = static_cast<Component*>(value);
return value;
}
return nullptr; // a component of this type already exists
}
template<component T, typename... Args>
T* add(Args &&... args)
{
if(components.find(typeid(T)) == components.cend())
{
T* value{new T{std::forward<Args>(args)...}};
components[typeid(T)] = static_cast<Component*>(value);
return value;
}
return nullptr; // a component of this type already exists
}
};
Components that are added to class Object are deleted in another function that is not related to my question. AFAIK, doing a lot of new/delete calls (heap allocations) hurts performance, and there will supposedly be 20-30 (or even more) Objects with 3-10 Object::add calls on each one. I thought that I could just call T's constructor without new and then static_cast<Component*>(&value), but the Component added to the map is "invalid", meaning all of T's members have wrong values (e.g. in a class with some int members, they are all equal to 0 instead of the custom values passed to its constructor). I am aware that value goes out of scope and the pointer in the map becomes a dangling one, but I can't find a way to instantiate T objects without calling new or without declaring them as static. Is there any way to do this?
EDIT: If I declare value as static, everything works as expected, so I guess it's a lifetime issue related to value.

I suppose, you think of this as the alternative way of creating your objects
T value{std::forward<Args>(args)...};
components[typeid(T)] = static_cast<Component*>(&value);
This creates a local variable on the stack. The assignment then stores a pointer to that local variable in the map.
When you leave the add() method, the local object is destroyed and you have a dangling pointer in the map. This, in turn, will bite you eventually.
As long as you want to store pointers, there's no way around new and delete. You can mitigate this a bit with some sort of memory pool.
If you can store objects instead of pointers in the map, you could create the components in place with std::map::emplace. When you do this, you must also remove the call to delete and clean up the objects some other way.

Trying to avoid heap allocations before you've proven that they indeed hurt your programs' performance is not a good approach in my opinion. If that was the case, you should probably get rid of std::map in your code as well. That being said, if you really want to have no new/delete calls there, it can be done, but requires explicit enumeration of the Component types. Something like this could be what you are looking for:
#include <array>
#include <variant>
// Note that components no longer have to implement any specific interface, which might actually be useful.
struct Component1 {};
struct Component2 {};
// Component now is a variant enumerating all known component types.
using Component = std::variant<std::monostate, Component1, Component2>;
struct Object {
// Now there is no need for std::map, as we can use variant size
// and indexes to create and access a std::array, which avoids more
// dynamic allocations.
std::array<Component, std::variant_size_v<Component> - 1> components;
bool add (Component component) {
// components elements hold std::monostate by default, and holding std::monostate
// is indicated by returning index() == 0.
if (component.index() > 0 && components[component.index() - 1].index() == 0) {
components[component.index() - 1] = std::move(component);
return true;
}
return false;
}
};
Component enumerates all known component types. This avoids dynamic allocation in Object but can increase memory usage, as the memory used for a single Object is roughly number_of_component_types * size_of_largest_component.
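A short usage sketch of the variant-based Object may make the slot logic easier to follow. The `value` member, the `get` accessor and `demoVariantObject` are hypothetical additions for illustration and are not part of the answer's code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>

struct Component1 { int value = 0; };
struct Component2 {};
using Component = std::variant<std::monostate, Component1, Component2>;

struct Object {
    // One slot per real component type (std::monostate is excluded).
    std::array<Component, std::variant_size_v<Component> - 1> components;

    bool add(Component component) {
        const std::size_t i = component.index();
        if (i > 0 && components[i - 1].index() == 0) {
            components[i - 1] = std::move(component);
            return true;
        }
        return false;
    }

    // Hypothetical accessor: pointer to the stored T, or nullptr if absent.
    template <typename T>
    T* get() {
        for (Component& c : components) {
            if (T* p = std::get_if<T>(&c)) return p;
        }
        return nullptr;
    }
};

int demoVariantObject() {
    Object obj;
    const bool first = obj.add(Component1{42});
    const bool second = obj.add(Component1{7});    // slot already taken
    const bool empty = obj.add(std::monostate{});  // nothing to store
    assert(first && !second && !empty);
    assert(obj.get<Component2>() == nullptr);
    const Component1* c1 = obj.get<Component1>();
    return c1 ? c1->value : -1;
}
```

The second add of the same component type is rejected, so the first stored value stays in place.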

While the other answers made clear what the problem is, I want to propose a way to get around it entirely.
You know at compile time which types can be in the map at most, since you know which instantiations of the add template were used. Hence you can get rid of the map and do everything at compile time.
template<component... Comps>
struct object {
    std::tuple<std::optional<Comps>...> components;
    template<component comp, class... Args>
    void add(Args&&... args) {
        std::get<std::optional<comp>>(components).emplace(std::forward<Args>(args)...);
    }
};
Of course this forces you to list all the possible components when you declare the object, but that is no more information than you already need to have; it is just less convenient to supply.
You could add the following overload of add to make the errors easier to read:
template<component T>
void add(...) {
    // The condition depends on T, so it is only checked when this overload is instantiated.
    static_assert(sizeof(T) == 0, "Please add T to the component list of this object");
}
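A compilable variant of the tuple-based object could look roughly like this. To stay C++17-compatible the sketch replaces the `component` concept with a static_assert, and `Position`, `Health`, `get` and `demoTupleObject` are hypothetical names introduced only for the example:

```cpp
#include <cassert>
#include <optional>
#include <tuple>
#include <type_traits>
#include <utility>

class Component {};

struct Position : Component {
    int x, y;
    Position(int x_, int y_) : x(x_), y(y_) {}
};
struct Health : Component {
    int points;
    explicit Health(int p) : points(p) {}
};

template <typename... Comps>
struct object {
    static_assert((std::is_base_of_v<Component, Comps> && ...),
                  "every listed type must derive from Component");

    // One optional slot per component type; no heap allocation involved.
    std::tuple<std::optional<Comps>...> components;

    template <typename Comp, typename... Args>
    Comp& add(Args&&... args) {
        return std::get<std::optional<Comp>>(components)
            .emplace(std::forward<Args>(args)...);
    }

    template <typename Comp>
    const Comp* get() const {
        const auto& slot = std::get<std::optional<Comp>>(components);
        return slot ? &*slot : nullptr;
    }
};

int demoTupleObject() {
    object<Position, Health> obj;
    assert(obj.get<Health>() == nullptr);  // slots start out empty
    obj.add<Position>(3, 4);               // constructed in place inside obj
    const Position* p = obj.get<Position>();
    assert(p != nullptr);
    return p->x + p->y;
}
```

Requesting a type that is not in the list fails at compile time inside std::get, which is the error the extra overload above tries to make more readable.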

C++: Optimizing out destructor call

There is a little code example here:
struct Data {
};
struct Init {
Data *m_data;
Init() : m_data(new Data) { }
~Init() {
delete m_data;
}
};
class Object {
private:
int m_initType;
Data *m_data;
public:
Object(const Init &init) : m_initType(0), m_data(init.m_data) { }
Object(Init &&init) : m_initType(1), m_data(init.m_data) { init.m_data = nullptr; }
~Object() {
if (m_initType==1) {
delete m_data;
}
}
};
Object can be initialized two ways:
const Init &: this initialization just stores m_data as a pointer, m_data is not owned, so ~Object() doesn't have to do anything (in this case, m_data will be destroyed at ~Init())
Init &&: this initialization transfers ownership of m_data, Object becomes the owner of m_data, so ~Object() needs to destroy it
Now, there is a function:
void somefunction(Object object);
This function is called in callInitA and callInitB:
void callInitA() {
Init x;
somefunction(x); // calls the "const Init &" constructor
}
void callInitB() {
somefunction(Init()); // calls the "Init &&" constructor
}
Now, here's what I'd like to accomplish: in the callInitA case, I'd like to make the compiler to optimize away the destructor call of the resulting temporary Object (Object is used frequently, and I'd like to decrease code size).
However, the compiler doesn't optimize it away (tested with GCC and clang).
Object is designed so that it doesn't have any functions which alter m_initType, so the compiler should be able to figure out that if m_initType is set to 0 at construction time, it won't change, so in the destructor it will still be 0 -> no need to call the destructor at all, as it would do nothing.
Even, m_initType is an unnecessary member of Object: it is only needed at destruct time.
Do you have any design ideas how to accomplish this?
UPDATE: I mean that using some kind of c++ construct (helper class, etc.). C++ is a powerful language, maybe with some kind of c++ trickery this can be done.
(My original problem is more complex than this simplified one: Object can be initialized with other kinds of Init structures, but all of Object's constructors boil down to getting a "Data*" somehow)

void callInitA() {
Init x;
somefunction(x); // calls the "const Init &" constructor
}
The destruction of x cannot be optimized away, regardless of the contents of Init. Doing so would violate the design of the language.
It's not just a matter of whether Init contains resources or not. Init x, like all objects, will allocate space on the stack that later needs to be cleaned up, as an implicit (not part of code that you yourself write) part of the destructor. It's impossible to avoid.
If the intention is for x to be an object that somefunction can call without having to repeatedly create and delete references to x, you should be handling it like this:
void callInitA(Init & x) { //Or Init const& x
somefunction(x); // calls the "const Init &" constructor
}
A few other notes:
Make sure you implement the Rule of Five (sometimes known as Rule of Three) on any object that owns resources.
You might consider wrapping all pointers inside std::unique_ptr, as it doesn't seem like you need functionality beyond what std::unique_ptr offers.

Your m_initType actually distinguishes between two kinds of Objects - those which own their memory and those which don't. Also, you mention that actually there are many kinds of Objects which can be initialized with all sorts of inputs; so actually there are all sorts of Objects. That would suggest Object should better be some abstract base class. Now, that wouldn't speed anything up or avoid destructor calls, but it might make your design more reasonable. Or maybe Object could be an std::variant (new in C++17, you can read up on it).
But then, you say that temporary Objects are "used frequently". So perhaps you should go another way: In your example, suppose you had
template <bool Owning> class Object;
which you would then specialize for the non-owning case, with only a const Init& constructor and default destruction, and the owning case, with only an Init&& constructor (considering the two you mentioned) and a destructor which deletes. This would mean templatizing the code that uses Object, which may mean larger code size, as well as having to know what kind of Objects you pass in; but it would avoid the condition check if that really bugs you so much.
I'd like to decrease code size
I kind of doubt that you do. Are you writing code for an embedded system? In that case it's kind of strange you use lots of temporary Objects which are sort-of polymorphic.
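The two specializations described above might be sketched as follows. The `alive` counter, the `data` accessor and `demoOwnership` are hypothetical instrumentation for the example, and C++17 is assumed (inline static member, passing a non-copyable prvalue by value):

```cpp
#include <cassert>
#include <type_traits>

struct Data {
    static inline int alive = 0;  // demo-only instrumentation
    Data() { ++alive; }
    ~Data() { --alive; }
};

struct Init {
    Data* m_data;
    Init() : m_data(new Data) {}
    Init(const Init&) = delete;
    Init& operator=(const Init&) = delete;
    ~Init() { delete m_data; }  // deleting nullptr is a no-op
};

template <bool Owning> class Object;

// Non-owning case: only the const Init& constructor, trivial destructor.
template <>
class Object<false> {
    Data* m_data;
public:
    Object(const Init& init) : m_data(init.m_data) {}
    Data* data() const { return m_data; }
};

// Owning case: only the Init&& constructor, destructor deletes.
template <>
class Object<true> {
    Data* m_data;
public:
    Object(Init&& init) : m_data(init.m_data) { init.m_data = nullptr; }
    Object(const Object&) = delete;
    Object& operator=(const Object&) = delete;
    ~Object() { delete m_data; }
    Data* data() const { return m_data; }
};

// No destructor code needs to be emitted for the non-owning variant.
static_assert(std::is_trivially_destructible_v<Object<false>>);

template <bool Owning>
void somefunction(Object<Owning>) {}

int demoOwnership() {
    {
        Init x;
        somefunction(Object<false>(x));  // x keeps ownership of its Data
        assert(Data::alive == 1);
    }
    assert(Data::alive == 0);
    somefunction(Object<true>(Init{}));  // ownership moves into the Object
    return Data::alive;
}
```

The price is that callers now have to spell out which variant they construct, which is the trade-off the answer mentions.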

Keeping track of (stack-allocated) objects

In a rather large application, I want to keep track of some statistics about objects of a certain class. In order to not degrade performance, I want the stats to be updated in a pull-configuration. Hence, I need to have a reference to each live object in some location. Is there an idiomatic way to:
Create, search, iterate such references
Manage it automatically (i.e. remove the reference upon destruction)
I am thinking in terms of a set of smart pointers here, but the memory management would be somewhat inverted: Instead of destroying the object when the smart pointer is destroyed, I'd want the smart pointer to be removed, when the object is destroyed. Ideally, I do not want to reinvent the wheel.
I could live with a delay in the removal of the pointers, I'd just need a way to invalidate them quickly.
edit: Because paddy asked for it: The reason for pull-based collection is that obtaining the information may be relatively costly. Pushing is obviously a clean solution but considered too expensive.

There is no special feature of the language that will allow you to do this. Sometimes object tracking is handled by rolling your own memory allocator, but this doesn't work easily on the stack.
But if you're using only the stack it actually makes your problem easier, assuming that the objects being tracked are on a single thread. C++ makes special guarantees about the order of construction and destruction on the stack. That is, the destruction order is exactly the reverse of construction order.
And so, you can leverage this to store a single pointer in each object, plus one static pointer to track the most recent one. Now you have an object stack represented as a linked list.
template <typename T>
class Trackable
{
public:
    Trackable()
        : previous( current() )
    {
        current() = this;
    }
    ~Trackable()
    {
        current() = previous;
    }
    // External interface. Trackable has no virtual functions, so dynamic_cast
    // would not compile here; static_cast is valid because T derives from Trackable<T>.
    static const T *head() { return static_cast<const T*>( current() ); }
    const T *next() const { return static_cast<const T*>( previous ); }
private:
    static Trackable * & current()
    {
        static Trackable *ptr = nullptr;
        return ptr;
    }
    Trackable *previous;
};
Example:
struct Foo : Trackable<Foo> {};
struct Bar : Trackable<Bar> {};
// :::
// Walk linked list of Foo objects currently on stack.
for( const Foo *foo = Foo::head(); foo; foo = foo->next() )
{
// Do kung foo
}
Now, admittedly this is a very simplistic solution. In a large application you may have multiple stacks using your objects. You could handle stacks on multiple threads by making current() use thread_local semantics. Although you need some magic to make this work, as head() would need to point at a registry of threads, and that would require synchronization.
You definitely don't want to synchronize all stacks into a single list, because that will kill your program's performance scalability.
As for your pull-requirement, I presume it's a separate thread wanting to walk over the list. You would need a way to synchronize such that all new object construction or destruction is blocked inside Trackable<T> while the list is being iterated. Or similar.
But at least you could take this basic idea and extend it to your needs.
Remember, you can't use this simple list approach if you allocate your objects dynamically. For that you would need a bi-directional list.
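For dynamically allocated objects, the bi-directional list mentioned above could be sketched like this. It is a single-threaded illustration only; `head_ref`, `countFoos` and `demoTracking` are hypothetical names, and copies are assumed to register as new nodes:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

template <typename T>
class Trackable {
public:
    Trackable() : prev_(nullptr), next_(head_ref()) {
        if (next_) next_->prev_ = this;
        head_ref() = this;  // new objects are pushed at the front
    }
    Trackable(const Trackable&) : Trackable() {}  // a copy is a new node
    Trackable& operator=(const Trackable&) { return *this; }  // links stay put
    ~Trackable() {
        // Unlink from any position, so destruction order no longer matters.
        if (prev_) prev_->next_ = next_; else head_ref() = next_;
        if (next_) next_->prev_ = prev_;
    }

    static const T* head() { return static_cast<const T*>(head_ref()); }
    const T* next() const { return static_cast<const T*>(next_); }

private:
    static Trackable*& head_ref() {
        static Trackable* ptr = nullptr;
        return ptr;
    }
    Trackable* prev_;
    Trackable* next_;
};

struct Foo : Trackable<Foo> {
    int id;
    explicit Foo(int i) : id(i) {}
};

std::size_t countFoos() {
    std::size_t n = 0;
    for (const Foo* f = Foo::head(); f; f = f->next()) ++n;
    return n;
}

std::size_t demoTracking() {
    auto a = std::make_unique<Foo>(1);
    auto b = std::make_unique<Foo>(2);
    Foo c(3);
    assert(countFoos() == 3);
    a.reset();  // destroyed out of construction order
    assert(countFoos() == 2);
    b.reset();
    return countFoos();  // only c is left at this point
}
```

The extra pointer per object is what allows removal from the middle of the list; any access from another thread would still need synchronization.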

The simplest approach is to have code inside each object so that it registers itself on instantiation and removes itself upon destruction. This code can easily be injected using a CRTP:
template <class T>
struct AutoRef {
static auto &all() {
static std::set<T*> theSet;
return theSet;
}
private:
friend T;
AutoRef() { all().insert(static_cast<T*>(this)); }
~AutoRef() { all().erase(static_cast<T*>(this)); }
};
Now a Foo class can inherit from AutoRef<Foo> to have its instances referenced inside AutoRef<Foo>::all().
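A small usage sketch, assuming a hypothetical `hits` statistic on Foo. The copy constructor on AutoRef is an addition made for this example so that copied objects are registered too; it is not part of the code above:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

template <class T>
struct AutoRef {
    static auto& all() {
        static std::set<T*> theSet;
        return theSet;
    }
private:
    friend T;
    AutoRef() { all().insert(static_cast<T*>(this)); }
    AutoRef(const AutoRef&) : AutoRef() {}  // assumption: copies are tracked too
    ~AutoRef() { all().erase(static_cast<T*>(this)); }
};

struct Foo : AutoRef<Foo> {
    int hits = 0;  // hypothetical statistic to be pulled later
};

// Pull-style collection: walk all live objects only when the stats are needed.
int demoAutoRef() {
    Foo a;
    a.hits = 2;
    {
        Foo b = a;  // registered through the copy constructor
        assert(Foo::all().size() == 2);
    }
    assert(Foo::all().size() == 1);  // b removed itself on destruction
    int total = 0;
    for (const Foo* f : Foo::all()) total += f->hits;
    return total;
}
```

The set is not synchronized, so this only covers the single-threaded case.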

Parallel Command Pattern

I want to know how to make my use of the command pattern thread-safe while maintaining performance. I have a simulation where I perform upwards of tens of billions of iterations; performance is critical.
In this simulation, I have a bunch of Moves that perform commands on objects in my simulation. The base class looks like this:
class Move
{
public:
virtual ~Move(){}
// Perform a move.
virtual void Perform(Object& obj) = 0;
// Undo a move.
virtual void Undo() = 0;
};
The reason I have the object passed in on Perform rather than the constructor, as is typical with the Command pattern, is that I cannot afford to instantiate a new Move object every iteration. Rather, a concrete implementation of Move would simply take Object, maintain a pointer to it and its previous state for when it's needed. Here's an example of a concrete implementation:
class ConcreteMove : public Move
{
    std::string _ns;
    std::string _prev;
    Object* _obj = nullptr;
public:
    ConcreteMove(std::string newstring): _ns(newstring) {}
    virtual void Perform(Object& obj) override
    {
        _obj= &obj;
        _prev = obj.GetIdentifier();
        obj.SetIdentifier(_ns);
    }
    virtual void Undo() override
    {
        _obj->SetIdentifier(_prev);
    }
};
Unfortunately, what this has cost me is thread-safety. I want to parallelize my loop, where multiple iterators perform moves on a bunch of objects simultaneously. But obviously one instance of ConcreteMove cannot be reused because of how I implemented it.
I considered having Perform return a State object which can be passed into Undo, that way making the implementation thread-safe, since it is independent of the ConcreteMove state. However, the creation and destruction of such an object on each iteration is too costly.
Furthermore, the simulation has a vector of Moves because multiple moves can be performed every iteration stored in a MoveManager class which contains a vector of Move object pointers instantiated by the client. I set it up this way because the constructors of each particular Concrete moves take parameters (see above example).
I considered writing a copy operator for Move and MoveManager such that it can be duplicated amongst the threads, but I don't believe that is a proper answer because then the ownership of the Move objects falls on MoveManager rather than the client (who is only responsible for the first instance). Also, the same would be said for MoveManager and responsibility of maintaining that.
Update: Here's my MoveManager if it matters
class MoveManager
{
private:
std::vector<Move*> _moves;
public:
void PushMove(Move& move)
{
_moves.push_back(&move);
}
void PopMove()
{
_moves.pop_back();
}
// Select a move by index.
Move* SelectMove(int i)
{
return _moves[i];
}
// Get the number of moves.
int GetMoveCount()
{
return (int)_moves.size();
}
};
Clarification: All I need is one collection of Move objects per thread. They are re-used every iteration, where Perform is called on different objects each time.
Does anyone know how to solve this problem efficiently in a thread-safe manner?
Thanks!

What about the notion of a thread ID? Also, why not preconstruct the identifier strings and pass pointers to them?
class ConcreteMove : public Move
{
    const std::string *_ns;
    std::vector<std::string> _prev;   // one saved identifier per thread
    std::vector<Object *> _obj;       // one target object per thread
public:
    ConcreteMove(unsigned numthreads, const std::string *newstring)
        : _ns(newstring),
          _prev(numthreads),
          _obj(numthreads)
    {
    }
    // The base class Move must declare Perform and Undo with the same
    // threadid parameter for these overrides to compile.
    virtual void Perform(unsigned threadid, Object &obj) override
    {
        _obj[threadid] = &obj;
        _prev[threadid] = obj.GetIdentifier();
        obj.SetIdentifier(*_ns);
    }
    virtual void Undo(unsigned threadid) override
    {
        _obj[threadid]->SetIdentifier(_prev[threadid]);
    }
};

Impossible with stated requirements. Specifically,
Use the command pattern. "the command pattern is a behavioral design pattern in which an object is used to represent and encapsulate all the information needed to call a method at a later time." Thus you're storing data.
You "can't afford" to allocate memory.
You have "billions" of iterations, which means some large static allocation won't suffice.
You want to store data without any place to store it. Thus there is no answer. However, if you're willing to change your requirements, there are undoubtedly many ways to solve your problem (whatever it may be -- I couldn't tell from the description.)
I also can't estimate how many Move objects you need at once. If that number is reasonably low then a specialized allocation scheme might solve part of your problem. Likewise, if most of the Move objects are duplicates, a different specialized allocation scheme might help.
In general what you're asking can't be solved, but relax the requirements and it shouldn't be hard.

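Your MoveManager should not contain a vector of pointers; it should hold the Move objects themselves:
std::vector<Move> _moves;
Note that Move as shown in the question is abstract, so this only works as written if the element type is a concrete class (or, for example, a std::variant of the concrete move types).
It seems you will have one MoveManager per thread, so there are no multi-threading issues. Reserve the vector's maximum capacity up front, and then call Perform and the other operations on the moves stored in the vector.
That way there are no new allocations, and you reuse the Move objects.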
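One way to get "one collection of Move objects per thread" while the client still creates only the first set is to let each worker clone the client's moves once, before its loop. This is a sketch under assumptions: `Clone`, `RenameMove`, `cloneAll` and the simplified `Object` are hypothetical, and the two workers are simulated sequentially to keep the example short:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Object { std::string id; };  // simplified stand-in for the real Object

class Move {
public:
    virtual ~Move() = default;
    virtual void Perform(Object& obj) = 0;
    virtual void Undo() = 0;
    virtual std::unique_ptr<Move> Clone() const = 0;  // hypothetical addition
};

class RenameMove : public Move {
    std::string ns_;
    std::string prev_;
    Object* obj_ = nullptr;
public:
    explicit RenameMove(std::string ns) : ns_(std::move(ns)) {}
    void Perform(Object& obj) override { obj_ = &obj; prev_ = obj.id; obj.id = ns_; }
    void Undo() override { obj_->id = prev_; }
    std::unique_ptr<Move> Clone() const override {
        return std::make_unique<RenameMove>(ns_);
    }
};

// Called once per worker before its loop; the copies are reused every iteration.
std::vector<std::unique_ptr<Move>> cloneAll(
        const std::vector<std::unique_ptr<Move>>& prototypes) {
    std::vector<std::unique_ptr<Move>> copies;
    copies.reserve(prototypes.size());
    for (const auto& p : prototypes) copies.push_back(p->Clone());
    return copies;
}

bool demoPerWorkerMoves() {
    std::vector<std::unique_ptr<Move>> prototypes;
    prototypes.push_back(std::make_unique<RenameMove>("renamed"));
    auto workerA = cloneAll(prototypes);
    auto workerB = cloneAll(prototypes);
    Object a{"a"}, b{"b"};
    workerA[0]->Perform(a);
    workerB[0]->Perform(b);  // does not disturb the state saved by workerA
    assert(a.id == "renamed" && b.id == "renamed");
    workerA[0]->Undo();
    workerB[0]->Undo();
    return a.id == "a" && b.id == "b";
}
```

The allocations happen once per worker rather than once per iteration, and each worker owns its copies, so the moves can stay stateful without locking.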

Simplest way to count instances of an object

I would like to know the exact number of instances of certain objects allocated at a certain point of execution, mostly for hunting possible memory leaks (I mostly use RAII and almost no new, but I could still forget .clear() on a vector before adding new elements, or something similar). Of course I could have an
atomic<int> cntMyObject;
that I decrement in the destructor and increment in the constructor and copy constructor (I hope I covered everything :)).
But that is hardcoding for every class. And it is not simple to disable it in "Release" mode.
So is there any simple, elegant way to count object instances that can be easily disabled?

Have a "counted object" class that does the proper reference counting in its constructor(s) and destructor, then derive your objects that you want to track from it. You can then use the curiously recurring template pattern to get distinct counts for any object types you wish to track.
// warning: pseudo code
template <class Obj>
class CountedObj
{
public:
    CountedObj() {++total_;}
    CountedObj(const CountedObj& obj) {++total_;}
    ~CountedObj() {--total_;}
    static size_t OustandingObjects() {return total_;}
private:
    static inline size_t total_ = 0; // inline (C++17) defines the counter in the header
};
class MyClass : private CountedObj<MyClass>
{};
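To address the "easily disabled" part of the question, the counting base could be compiled out behind a macro. This is a sketch, not a definitive implementation: `COUNT_INSTANCES`, `liveCount` and `demoCounting` are hypothetical names, and C++17 is assumed for the inline static member:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

#ifndef COUNT_INSTANCES
#define COUNT_INSTANCES 1  // hypothetical switch; define as 0 for release builds
#endif

#if COUNT_INSTANCES
template <class Obj>
class CountedObj {
public:
    static std::size_t liveCount() { return total_.load(); }
protected:
    CountedObj() { ++total_; }
    CountedObj(const CountedObj&) { ++total_; }  // also used for moves
    CountedObj& operator=(const CountedObj&) = default;
    ~CountedObj() { --total_; }
private:
    static inline std::atomic<std::size_t> total_{0};  // one counter per Obj
};
#else
// Empty stand-in: same interface, no counting and no per-object cost.
template <class Obj>
class CountedObj {
public:
    static std::size_t liveCount() { return 0; }
};
#endif

class MyClass : private CountedObj<MyClass> {
public:
    using CountedObj<MyClass>::liveCount;
};

// The values below assume COUNT_INSTANCES is 1.
std::size_t demoCounting() {
    MyClass a;
    {
        MyClass b = a;  // the copy is counted as well
        assert(MyClass::liveCount() == 2);
    }
    return MyClass::liveCount();  // a is still alive here
}
```

Because the base class has no data members, deriving from it normally adds no size to the tracked class.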

You can apply this approach:
#include <iostream>
#ifdef DEBUG
class ObjectCount {
static int count;
protected:
ObjectCount() {
count++;
}
public:
static void showCount() {
std::cout << count;
}
};
int ObjectCount::count = 0;
class Employee : public ObjectCount {
#else
class Employee {
#endif
public:
Employee(){}
Employee(const Employee & emp) {
}
};
In DEBUG mode, calling ObjectCount::showCount() prints the number of objects created so far.

Better off to use memory profiling & leak detection tools like Valgrind or Rational Purify.
If you can't and want to implement your own mechanism then,
You should overload the new and delete operators for your class and then implement the memory diagnostic in them.
Have a look at this C++ FAQ answer to know how to do that and what precautions you should take.

This is a sort of working example of something similar: http://www.almostinfinite.com/memtrack.html (just copy the code at the end of the page and put it in Memtrack.h, and then run TrackListMemoryUsage() or one of the other functions to see diagnostics)
It overrides operator new and does some arcane macro stuff to make it 'stamp' each allocation with information that allows it to count how many instances of an object exist and how much memory they're using. It's not perfect though; the macros they use break down under certain conditions. If you decide to try this out, make sure to include it after any standard headers.

Without knowing your code and your requirements, I see 2 reasonable options:
a) Use boost::shared_ptr. It has the atomic reference counts you suggested built in and takes care of your memory management (so that you'd never actually care to look at the count). Its reference count is available through the use_count() member.
b) If the implications of a), like dealing with pointers and having shared_ptrs everywhere, or possible performance overhead, are not acceptable for you, I'd suggest to simply use available tools for memory leak detection (e.g. Valgrind, see above) that'll report your loose objects at program exit. And there's no need to use intrusive helper classes for (anyway debug-only) tracking object counts, that just mess up your code, IMHO.

We used to have the solution of a base class with internal counter and derive from it, but we changed it all into boost::shared_ptr, it keeps a reference counter and it cleans up memory for you. The boost smart pointer family is quite useful:
boost smart pointers

My approach, which outputs leakage count to Debug Output (via the DebugPrint function implemented in our code base, replace that call with your own...)
#include <typeinfo>
#include <string>
class CountedObjImpl
{
public:
CountedObjImpl(const char* className) : mClassName(className) {}
~CountedObjImpl()
{
DebugPrint(_T("**##** Leakage count for %hs: %Iu\n"), mClassName.c_str(), mInstanceCount);
}
size_t& GetCounter()
{
return mInstanceCount;
}
private:
size_t mInstanceCount = 0;
std::string mClassName;
};
template <class Obj>
class CountedObj
{
public:
CountedObj() { GetCounter()++; }
CountedObj(const CountedObj& obj) { GetCounter()++; }
~CountedObj() { GetCounter()--; }
static size_t OustandingObjects() { return GetCounter(); }
private:
static size_t& GetCounter() // static, because the static OustandingObjects() calls it
{
static CountedObjImpl mCountedObjImpl(typeid(Obj).name());
return mCountedObjImpl.GetCounter();
}
};
Example usage:
class PostLoadInfoPostLoadCB : public PostLoadCallback, private CountedObj<PostLoadInfoPostLoadCB>

Adding counters to individual classes was discussed in some of the answers. However, that requires picking the classes to be counted and modifying them in one way or another. The assumption in the following is that you are adding such counters to find bugs where more objects of certain classes are kept alive than expected.
To briefly recap some things mentioned already: For real memory leaks, there is certainly valgrind:memcheck and the leak sanitizers. However, they do not help in other scenarios without real leaks (uncleared vectors, map entries with keys never accessed, cycles of shared_ptrs, ...).
But, since this was not mentioned: In the valgrind tool suite there is also massif, which can provide you with the information about all pieces of allocated memory and where they were allocated. However, let's assume that valgrind:massif is also not an option for you, and you truly want instance counts.
For the purpose of occasional bug hunting - if you are open for some hackish solution and if the above options don't work - you might consider the following: Nowadays, many objects on the heap are effectively held by smart pointers. This could be the smart pointer classes from the standard library, or the smart pointer classes of the respective helper libraries you use. The trick is then the following (picking the shared_ptr as an example): You can get instance counters for many classes at once by patching the shared_ptr implementation, namely by adding instance counts to the shared_ptr class. Then, for some class Foo, the counter belonging to shared_ptr<Foo> will give you an indication of the number of instances of class Foo.
Certainly, it is not quite as accurate as adding the counters to the respective classes directly (instances referenced only by raw pointers are not counted), but possibly it is accurate enough for your case. And, certainly, this is not about changing the smart pointer classes permanently - only during the bug hunting. At least, the smart pointer implementations are not too complex, so patching them is simple.

This approach is much simpler than the rest of the solutions here.
Make a variable for the count and make it static. Increment that variable inside the constructor and decrement it inside the destructor.
Make sure you define the variable (before C++17 it cannot be initialized inside the header because it's static; since C++17 it can be declared inline static instead).
.h
// Pseudo code warning
class MyObject
{
public:
    MyObject();
    ~MyObject();
    static int totalObjects;
};
.cpp
int MyObject::totalObjects = 0;
MyObject::MyObject()
{
++totalObjects;
}
MyObject::~MyObject()
{
--totalObjects;
}
For every new instance you make, the constructor is called and totalObjects automatically grows by 1.
