Parallel Command Pattern - C++

I want to know how to make my use of the command pattern thread-safe while maintaining performance. I have a simulation where I perform upwards of tens of billions of iterations; performance is critical.
In this simulation, I have a bunch of Moves that perform commands on objects in my simulation. The base class looks like this:
class Move
{
public:
    virtual ~Move() {}
    // Perform a move.
    virtual void Perform(Object& obj) = 0;
    // Undo a move.
    virtual void Undo() = 0;
};
The reason I have the object passed in on Perform, rather than in the constructor as is typical with the Command pattern, is that I cannot afford to instantiate a new Move object every iteration. Rather, a concrete implementation of Move simply takes the Object and maintains a pointer to it and to its previous state for when they are needed. Here's an example of a concrete implementation:
class ConcreteMove : public Move
{
    std::string _ns;
    std::string _prev;
    Object* _obj;
public:
    ConcreteMove(std::string newstring) : _ns(newstring) {}
    virtual void Perform(Object& obj) override
    {
        _obj = &obj;
        _prev = obj.GetIdentifier();
        obj.SetIdentifier(_ns);
    }
    virtual void Undo() override
    {
        _obj->SetIdentifier(_prev);
    }
};
Unfortunately, what this has cost me is thread-safety. I want to parallelize my loop, where multiple threads perform moves on a bunch of objects simultaneously. But obviously one instance of ConcreteMove cannot be reused across threads because of how I implemented it.
I considered having Perform return a State object which can be passed into Undo, thereby making the implementation thread-safe, since it is independent of the ConcreteMove state. However, the creation and destruction of such an object on each iteration is too costly.
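For clarity, here is a rough sketch of what that rejected alternative would look like (using the Object interface from above; the names are illustrative):

#include <string>

struct MoveState
{
    Object* obj;
    std::string prev;
};

class StatelessMove
{
    std::string _ns;
public:
    explicit StatelessMove(std::string newstring) : _ns(std::move(newstring)) {}
    // Returns the undo state instead of storing it in the move itself.
    MoveState Perform(Object& obj) const
    {
        MoveState state{ &obj, obj.GetIdentifier() };
        obj.SetIdentifier(_ns);
        return state; // constructing/destroying this every iteration is the cost I want to avoid
    }
    void Undo(const MoveState& state) const
    {
        state.obj->SetIdentifier(state.prev);
    }
};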
Furthermore, because multiple moves can be performed every iteration, the simulation stores the Moves in a MoveManager class, which contains a vector of Move object pointers instantiated by the client. I set it up this way because the constructors of the particular concrete Moves take parameters (see the example above).
I considered writing a copy operator for Move and MoveManager so that they could be duplicated across threads, but I don't believe that is a proper answer, because then ownership of the Move objects falls on MoveManager rather than the client (who is only responsible for the first instance). The same goes for MoveManager itself and the responsibility for maintaining it.
Update: Here's my MoveManager, if it matters:
class MoveManager
{
private:
    std::vector<Move*> _moves;
public:
    void PushMove(Move& move)
    {
        _moves.push_back(&move);
    }
    void PopMove()
    {
        _moves.pop_back();
    }
    // Select a move by index.
    Move* SelectMove(int i)
    {
        return _moves[i];
    }
    // Get the number of moves.
    int GetMoveCount()
    {
        return (int)_moves.size();
    }
};
Clarification: All I need is one collection of Move objects per thread. They are re-used every iteration, where Perform is called on different objects each time.
Does anyone know how to solve this problem efficiently in a thread-safe manner?
Thanks!

What about the notion of a thread ID? Also, why not preconstruct the identifier strings and pass pointers to them?
class ConcreteMove : public Move
{
    std::string* _ns;
    std::vector<std::string> _prev;
    std::vector<Object*> _obj;
public:
    // Note: the base class Move would need matching Perform/Undo signatures
    // that take the thread id.
    ConcreteMove(unsigned numthreads, std::string* newstring)
        : _ns(newstring),
          _prev(numthreads),
          _obj(numthreads)
    {
    }
    virtual void Perform(unsigned threadid, Object& obj)
    {
        _obj[threadid] = &obj;
        _prev[threadid] = obj.GetIdentifier();
        obj.SetIdentifier(*_ns);
    }
    virtual void Undo(unsigned threadid)
    {
        _obj[threadid]->SetIdentifier(_prev[threadid]);
    }
};

Impossible with the stated requirements. Specifically:
1. You use the command pattern. "The command pattern is a behavioral design pattern in which an object is used to represent and encapsulate all the information needed to call a method at a later time." Thus you're storing data.
2. You "can't afford" to allocate memory.
3. You have "billions" of iterations, which means some large static allocation won't suffice.
You want to store data without any place to store it. Thus there is no answer. However, if you're willing to change your requirements, there are undoubtedly many ways to solve your problem (whatever it may be -- I couldn't tell from the description.)
I also can't estimate how many Move objects you need at once. If that number is reasonably low then a specialized allocation scheme might solve part of your problem. Likewise, if most of the Move objects are duplicates, a different specialized allocation scheme might help.
In general what you're asking can't be solved, but relax the requirements and it shouldn't be hard.

Your MoveManager should not contain a vector of pointers; it should hold a vector of Move objects:
std::vector<Move> _moves;
It seems you will have one MoveManager per thread, so there are no multi-threading problems; set the vector capacity to the maximum up front, and then apply Perform and the other actions on the moves in the vector.
No new allocations, and you will be reusing the Move objects, as sketched below.
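For concreteness, a minimal sketch of that per-thread layout, assuming a single concrete move type so the moves can be stored by value (with several concrete move types you would need per-type storage or a variant, since storing the abstract Move by value would slice):

#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Object interface assumed from the question.
class Object
{
public:
    const std::string& GetIdentifier() const { return _id; }
    void SetIdentifier(const std::string& id) { _id = id; }
private:
    std::string _id;
};

// Concrete move stored by value; no per-iteration allocation.
class ConcreteMove
{
public:
    explicit ConcreteMove(std::string newstring) : _ns(std::move(newstring)), _obj(nullptr) {}
    void Perform(Object& obj)
    {
        _obj = &obj;
        _prev = obj.GetIdentifier();
        obj.SetIdentifier(_ns);
    }
    void Undo() { _obj->SetIdentifier(_prev); }
private:
    std::string _ns;
    std::string _prev;
    Object* _obj;
};

// One of these per thread, reused every iteration.
class ThreadLocalMoveManager
{
public:
    void PushMove(ConcreteMove move) { _moves.push_back(std::move(move)); }
    ConcreteMove& SelectMove(std::size_t i) { return _moves[i]; }
    std::size_t GetMoveCount() const { return _moves.size(); }
private:
    std::vector<ConcreteMove> _moves;
};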

Related

How to safely implement reusable scratch memory in C++?

It is very common that even pure functions require some additional scratch memory for their operations. If the size of this memory is known at compile time, we can allocate this memory on the stack with std::array or a C array. But the size often depends on the input, so we often resort to dynamic allocations on the heap through std::vector.
Consider a simple example of building a wrapper around some C api:
void addShapes(std::span<const Shape> shapes) {
    std::vector<CShape> cShapes;
    cShapes.reserve(shapes.size());
    // Convert shapes to a form accepted by the API
    for (const Shape& shape : shapes) {
        cShapes.push_back(static_cast<CShape>(shape));
    }
    cAddShapes(context, cShapes.data(), cShapes.size());
}
Let's say that we call this function repeatedly and that we identify that the overhead of std::vector memory allocations is significant, even with the call to reserve(). So what can we do?
We could declare the vector as static to reuse the allocated space between calls, but that comes with several problems. First, it is no longer thread safe, but that can be fixed easily enough by using thread_local instead. Second, the memory doesn't get released until the program or thread terminates. Let's say we are fine with that. And lastly, we have to remember to clear the vector every time, because it's not just the memory that will persist between function calls, but the data as well.
void addShapes(std::span<const Shape> shapes) {
    thread_local std::vector<CShape> cShapes;
    cShapes.clear();
    // Convert shapes to a form accepted by the API
    for (const Shape& shape : shapes) {
        cShapes.push_back(static_cast<CShape>(shape));
    }
    cAddShapes(context, cShapes.data(), cShapes.size());
}
This is the pattern I use whenever I would like to avoid the dynamic allocation on every call. The issue is, I don't think the semantics of this are very apparent if you aren't aware of the pattern: thread_local looks scary, you have to remember to clear the vector, and even though the lifetime of the object now extends beyond the scope of the function, it is unsafe to return a reference to it, because another call to the same function would modify it.
My first attempt to make this a bit easier was to define a helper function like this:
template <typename T, typename Cleaner = void (T&)>
T& getScratch(Cleaner cleaner = [] (T& o) { o.clear(); }) {
    thread_local T scratchObj;
    cleaner(scratchObj);
    return scratchObj;
}
void addShapes(std::span<const Shape> shapes) {
    std::vector<CShape>& cShapes = getScratch<std::vector<CShape>>();
    // Convert shapes to a form accepted by the API
    for (const Shape& shape : shapes) {
        cShapes.push_back(static_cast<CShape>(shape));
    }
    cAddShapes(context, cShapes.data(), cShapes.size());
}
But of course, that creates a thread_local variable for each template instantiation of the getScratch function, rather than for each place the function is called. So if we asked for two vectors of the same type at once, we'd get two references to the same vector. Not good.
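To make the failure mode concrete, a tiny hypothetical illustration:

void twoScratchVectors() {
    auto& a = getScratch<std::vector<int>>();
    auto& b = getScratch<std::vector<int>>(); // clears and returns the *same* thread_local object
    a.push_back(1);
    b.push_back(2);
    // &a == &b, so both "scratch" vectors now contain {1, 2}.
}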
What would be a good way to implement this sort of a reusable memory safely and cleanly? Are there already existing solutions? Or should we not use thread local storage in this way and just use local allocations despite the performance benefits that reusing them brings: https://quick-bench.com/q/VgkPLveFL_K5wT5wX6NL1MRSE8c ?
To answer my own question, I came up with a solution that builds upon the last example. Rather than keeping only one object for each thread and type, let's keep a free list of them. Upon request, we either reuse an object from the free list or create a new one. The user keeps a RAII-style handle that returns the object to the free list when it leaves the scope. Since we still use thread_local, this is thread safe without any effort. We can wrap all this into a simple class:
template <typename T>
class Scratch {
public:
    template <typename Cleaner = void (T&)>
    explicit Scratch(Cleaner cleaner = [] (T& o) { o.clear(); }) : borrowedObj(acquire()) {
        cleaner(borrowedObj);
    }
    T& operator*() {
        return borrowedObj;
    }
    T* operator->() {
        return &borrowedObj;
    }
    ~Scratch() {
        release(std::move(borrowedObj));
    }
private:
    static thread_local std::vector<T> freeList;
    T borrowedObj;
    static T acquire() {
        if (!freeList.empty()) {
            T obj = std::move(freeList.back());
            freeList.pop_back();
            return obj;
        } else {
            return T();
        }
    }
    static void release(T&& obj) {
        freeList.push_back(std::move(obj));
    }
};
// The static thread_local member still needs a definition:
template <typename T>
thread_local std::vector<T> Scratch<T>::freeList;
That can be used simply as:
void addShapes(std::span<const Shape> shapes) {
    Scratch<std::vector<CShape>> cShapes;
    // Convert shapes to a form accepted by the API
    for (const Shape& shape : shapes) {
        cShapes->push_back(static_cast<CShape>(shape));
    }
    cAddShapes(context, cShapes->data(), cShapes->size());
}
You might want to extend this as needed, perhaps add a [] operator for convenience if it's going to be used with containers. You could keep its intended use to be a local object in a function and explicitly make it non-copyable and non-movable, or it could be turned into a general purpose handle like unique_ptr. But beware that the object must be destroyed by the same thread that created it.
In both cases it addresses my issues with a raw thread_local. The clear is implicit and returning a reference to the scratch object or its data is now obviously wrong. It still doesn't automatically free memory, which is what we want after all, but at least it's now easier to implement the functionality to free it on demand as needed.
In general, it should have lower memory usage than the raw thread_local method, too, since allocations of the same type can be reused across different call sites. But there is a scenario in which this behavior will result in a higher memory usage, too. Let's say we have a function that needs a std::vector<int> of size 10000. If we call this function and then ask for a vector of the same type, we will get the one with capacity 10000. If we then call the function again while holding this vector, it will have to create another one, resizing it to 10000 elements, too.
For those reasons I would recommend using it only where you don't expect to see large amounts of data, but rather want to avoid lots of small, but frequent and short-lived allocations.
We could declare the vector as static to reuse the allocated space between calls, but that comes with several problems. First, it is no longer thread safe, but that can be fixed easily enough by using thread_local instead. Second, the memory doesn't get released until the program or thread terminates.
Exactly. Because only the user of the function knows how and when they want to call it, only the user of the function should be responsible for reusing space if they want to, and for clearing it up, because the user knows whether it is going to be used again later. So add a cache object to your function, where you cache the state to speed it up later:
void addShapes(std::span<const Shape> shapes, std::vector<CShape>& cache) {
    cache.reserve(shapes.size());
    // Convert shapes to a form accepted by the API
    for (const Shape& shape : shapes) {
        cache.push_back(static_cast<CShape>(shape));
    }
    cAddShapes(context, cache.data(), cache.size());
}
Or you could objectify it a bit, like:
class shapes {
    std::vector<CShape> cache;
public:
    void add(std::span<const Shape> shapes) {
        cache.reserve(shapes.size());
        // Convert shapes to a form accepted by the API
        for (const Shape& shape : shapes) {
            cache.push_back(static_cast<CShape>(shape));
        }
        cAddShapes(context, cache.data(), cache.size());
    }
    void clear_cache() {
        cache.clear();
    }
};

Pointers to stack-allocated object and move-construction

Note: This is a complete re-wording of a question I posted a while ago. If you find they are duplicates, please close the other one.
My problem is quite general but it seems that it could be explained more easily based on a concrete simple example.
So imagine I want to simulate the electricity consumption in an office through time. Let's assume that there is only a light and heating.
class Simulation {
public:
    Simulation(Time const& t, double lightMaxPower, double heatingMaxPower)
        : time(t)
        , light(&time, lightMaxPower)
        , heating(&time, heatingMaxPower) {}
private:
    Time time; // Note : stack-allocated
    Light light;
    Heating heating;
};
class Light {
public:
    Light(Time const* time, double lightMaxPower)
        : timePtr(time)
        , lightMaxPower(lightMaxPower) {}
    bool isOn() const {
        if (timePtr->isNight()) {
            return true;
        } else {
            return false;
        }
    }
    double power() const {
        if (isOn()) {
            return lightMaxPower;
        } else {
            return 0.;
        }
    }
private:
    Time const* timePtr; // Note : non-owning pointer
    double lightMaxPower;
};
// Same kind of stuff for Heating
The important points are:
1. Time cannot be moved to be a data member of Light or Heating, since its changes do not come from either of these classes.
2. Time does not have to be explicitly passed as a parameter to Light. Indeed, a reference to Light could be held in any part of the program that does not want to provide Time as a parameter.
class SimulationBuilder {
public:
    Simulation build() {
        Time time("2015/01/01-12:34:56");
        double lightMaxPower = 42.;
        double heatingMaxPower = 43.;
        return Simulation(time, lightMaxPower, heatingMaxPower);
    }
};
int main() {
    SimulationBuilder builder;
    auto simulation = builder.build();
    WeaklyRelatedPartOfTheProgram lightConsumptionReport;
    lightConsumptionReport.editReport(simulation.getLight()); // No need to supply Time information
    return 0;
}
Now, Simulation is perfectly fine as long as it is not copy/move constructed. Because if it is, Light will also get copy/move constructed, and by default its pointer to Time will still point to the Time in the old Simulation instance which is copied/moved from.
However, Simulation actually is copy/move constructed between the return statement in SimulationBuilder::build() and the object creation in main().
Now there are a number of ways to solve the problem:
1: Rely on copy elision. In this case (and in my real code) copy elision seems to be allowed by the standard, but it is not required, and as a matter of fact it is not performed by clang -O3. To be more precise, clang elides the Simulation copy, but does call the move ctor for Light. Also notice that relying on implementation-dependent behavior is not robust.
2: Define a move-ctor in Simulation:
Simulation::Simulation(Simulation&& old)
    : time(old.time)
    , light(old.light)
    , heating(old.heating)
{
    light.resetTimePtr(&time);
    heating.resetTimePtr(&time);
}
void Light::resetTimePtr(Time const* t) {
    timePtr = t;
}
This does work, but the big problem here is that it weakens encapsulation: now Simulation has to know that Light needs more info during a move. In this simplified example, this is not too bad, but imagine timePtr is not directly in Light but in one of its sub-sub-sub-members. Then I would have to write
Simulation::Simulation(Simulation&& old)
    : time(old.time)
    , subStruct(old.subStruct)
{
    subStruct.getSubMember().getSubMember().getSubMember().resetTimePtr(&time);
}
which completely breaks encapsulation and the Law of Demeter. Even when delegating functions I find it horrible.
3: Use some kind of observer pattern where Time is being observed by Light and sends a message when it is copy/move constructed so that Light change its pointer when receiving the message.
I must confess I am too lazy to write a complete example of it, but I think it would be so heavyweight that I am not sure the added complexity is worth it.
4: Use a owning pointer in Simulation:
class Simulation {
private:
    std::unique_ptr<Time> const time; // Note : heap-allocated
};
Now when Simulation is moved, the Time memory is not, so the pointer in Light is not invalidated. Actually this is what almost every other object-oriented language does since all objects are created on the heap.
For now, I favor this solution, but I still think it is not perfect: heap allocation could be slow, but more importantly it simply does not seem idiomatic. I've heard B. Stroustrup say that you should not use a pointer when it is not needed, and "needed" meant more or less "polymorphic".
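For concreteness, a sketch of this option (with the unique_ptr member left non-const so the defaulted move constructor can be used; Light and Heating are the classes from above):

#include <memory>

class Simulation {
public:
    Simulation(Time const& t, double lightMaxPower, double heatingMaxPower)
        : time(std::make_unique<const Time>(t))
        , light(time.get(), lightMaxPower)
        , heating(time.get(), heatingMaxPower) {}

    // The defaulted move leaves the Time object at the same heap address,
    // so the non-owning pointers inside Light and Heating stay valid.
    Simulation(Simulation&&) = default;

private:
    std::unique_ptr<const Time> time; // Note : heap-allocated
    Light light;
    Heating heating;
};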
5: Construct Simulation in-place, without it being returned by SimulationBuilder (the copy/move ctor/assignment in Simulation can then all be deleted). For instance:
class Simulation {
public:
    Simulation(SimulationBuilder const& builder) {
        builder.build(*this);
    }
private:
    Time time; // Note : stack-allocated
    Light light;
    Heating heating;
    ...
};
class SimulationBuilder {
public:
    void build(Simulation& simulation) {
        simulation.time = Time("2015/01/01-12:34:56");
        simulation.lightMaxPower = 42.;
        simulation.heatingMaxPower = 43.;
    }
};
Now my questions are the following:
1: What solution would you use? Can you think of another one?
2: Do you think there is something wrong in the original design? What would you do to fix it?
3: Did you ever come across this kind of pattern? I find it rather common throughout my code. Generally though, this is not a problem since Time is indeed polymorphic and hence heap-allocated.
4: Coming back to the root of the problem, which is "There is no need to move, I only want an unmovable object to be created in-place, but the compiler won't allow me to do so": why is there no simple solution in C++, and is there a solution in another language?
If all classes need access to the same const (and therefore immutable) feature, you have (at least) 2 options to make the code clean and maintainable:
store copies of the SharedFeature rather than references - this is reasonable if SharedFeature is both small and stateless.
store a std::shared_ptr<const SharedFeature> rather than a reference to const - this works in all cases, with almost no additional expense. std::shared_ptr is of course fully move-aware.
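For illustration, a rough sketch of the shared_ptr option mapped onto the question's Simulation/Light classes (the exact constructor changes are my assumption):

#include <memory>

class Light {
public:
    Light(std::shared_ptr<const Time> time, double lightMaxPower)
        : timePtr(std::move(time))
        , lightMaxPower(lightMaxPower) {}
    bool isOn() const { return timePtr->isNight(); }
private:
    std::shared_ptr<const Time> timePtr; // stays valid however Simulation is copied or moved
    double lightMaxPower;
};

class Simulation {
public:
    Simulation(Time const& t, double lightMaxPower, double heatingMaxPower)
        : time(std::make_shared<const Time>(t))
        , light(time, lightMaxPower) {}
private:
    std::shared_ptr<const Time> time;
    Light light; // Heating would hold the shared_ptr the same way
};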
EDIT: Due to the class naming and ordering I completely missed the fact that your two classes are unrelated.
It's really hard to help you with such an abstract concept as "feature" but I'm going to completely change my thought here. I would suggest moving the feature's ownership into MySubStruct. Now copying and moving will work fine because only MySubStruct knows about it and is able to make the correct copy. Now MyClass needs to be able to operate on feature. So, where needed just add delegation to MySubStruct: subStruct.do_something_with_feature(params);.
If your feature needs data members from both sub struct AND MyClass then I think you split responsibilities incorrectly and need to reconsider all the way back to the split of MyClass and MySubStruct.
Original answer based on the assumption that MySubStruct was a child of MyClass:
I believe the correct answer is to remove featurePtr from the child and provide a proper protected interface to feature in the parent (note: I really do mean an abstract interface here, not just a get_feature() function). Then the parent doesn't have to know about children and the child can operate on the feature as needed.
To be completely clear: MySubStruct will not know that the parent class even HAS a member called feature. For example, perhaps something like this:
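A rough sketch of that idea (all names here - MyClass, MySubStruct, Feature - are illustrative, not taken from the current code):

class Feature {
public:
    void apply(int /*param*/) { /* ... */ }
};

class MyClass {
protected:
    // Protected interface exposing *operations* on the feature,
    // not the feature itself (and not a get_feature() accessor).
    void applyFeature(int param) { feature.apply(param); }
private:
    Feature feature; // the child never learns this member exists
};

class MySubStruct : public MyClass {
public:
    void doSomething() {
        applyFeature(42); // operate on the feature through the parent's interface
    }
};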
1: What solution would you use? Can you think of another one?
Why not apply a few design patterns? I see uses for a factory and a singleton in your solution. There are probably a few others that we could claim work but I am way more experienced with applying a Factory during a simulation than anything else.
Simulation turns into a Singleton.
The build() function in SimulationBuilder gets moved into Simulation. The constructor for Simulation gets privatized, and your main call becomes Simulation * builder = Simulation::build();. Simulation also gets a new variable static Simulation * _Instance;, and we make a few changes to Simulation::build()
class Simulation
{
public:
    static Simulation * build()
    {
        // Note: If you don't need a singleton, just create and return a pointer here.
        if (_Instance == nullptr)
        {
            Time time("2015/01/01-12:34:56");
            double lightMaxPower = 42.;
            double heatingMaxPower = 43.;
            _Instance = new Simulation(time, lightMaxPower, heatingMaxPower);
        }
        return _Instance;
    }
private:
    static Simulation * _Instance;
};
Simulation * Simulation::_Instance = nullptr;
Light and Heating objects get provided as a Factory.
This thought is worthless if you are only going to have 2 objects inside the simulation. But if you are going to be managing 1...N objects and multiple types, then I would strongly encourage you to utilize a factory and a dynamic list (vector, deque, etc.). You would need to make Light and Heating inherit from a common template, set things up to register those classes with the factory, set the factory up so that it is templated and an instance of the factory can only create objects of a specific template, and initialize the factory for the Simulation object. Basically, the factory would look something like this:
template<class T>
class Factory
{
    // I made this a singleton so you don't have to worry about
    // which instance of the factory creates which product.
    static std::shared_ptr<Factory<T>> _Instance;
    // This map just needs a key, and a pointer to a constructor function.
    std::map<std::string, std::function<T* (void)>> m_Objects;
public:
    ~Factory() {}
    static std::shared_ptr<Factory<T>> CreateFactory()
    {
        // Hey create a singleton factory here. Good Luck.
        return _Instance;
    }
    // This will register a predefined function definition.
    template<typename P>
    void Register(std::string name)
    {
        m_Objects[name] = [](void) -> T* { return new P(); };
    }
    // This could be tweaked to register an unknown function definition.
    void Register(std::string name, std::function<T* (void)> constructor)
    {
        m_Objects[name] = constructor;
    }
    std::shared_ptr<T> GetProduct(std::string name)
    {
        auto it = m_Objects.find(name);
        if (it != m_Objects.end())
        {
            return std::shared_ptr<T>(it->second());
        }
        return nullptr;
    }
};
// We need to instantiate the singleton instance for this type.
template<class T>
std::shared_ptr<Factory<T>> Factory<T>::_Instance = nullptr;
That may seem a bit weird, but it really makes creating templated objects fun. You can register them by doing this:
// To load a product we would call it like this:
pFactory.get()->Register<Light>("Light");
pFactory.get()->Register<Heating>("Heating");
And then when you need to actually get an object all you need is:
std::shared_ptr<Light> light = pFactory.get()->GetProduct("Light");
2: Do you think there is something wrong in the original design? What would you do to fix it?
Yeah I certainly do, but unfortunately I don't have much to expound upon from my answer to item 1.
If I were to fix anything, I would start "fixing" by seeing what a profiling session tells me. If I were worried about things like the time to allocate memory, then profiling is the best way to get an accurate idea of how long to expect allocations to take. All the theories in the world cannot make up for profiling when you are not reusing known, profiled implementations.
Also, if I were truly worried about the speed of things like memory allocation, then I would take into consideration things from my profiling run such as the number of times an object is created vs. the object's lifetime; hopefully my profiling session told me this. An object like your Simulation class should be created at most once for a given simulation run, while an object like Light might be created 0..N times during the run. So, I would focus on how creating Light objects affected my performance.
3: Did you ever come across this kind of pattern? I find it rather common throughout my code. Generally though, this is not a problem since Time is indeed polymorphic and hence heap-allocated.
I do not typically see simulation objects maintain a way to see the current state change variables such as Time. I typically see an object maintain its state, and only update when a time change occurs through a function such as SetState(Time & t){...}. If you think about it, that kind of makes sense. A simulation is a way to see the change of objects given a particular parameter(s), and the parameter should not be required for the object to report its state. Thus, an object should only update by single function and maintain its state between function calls.
// This little snippet is to give you an example of how to update the state.
// I guess you could also do a publish/subscribe for the SetState function.
class Light
{
public:
    Light(double maxPower)
        : currentPower(0.0)
        , maxPower(maxPower)
    {}
    void SetState(const Time & t)
    {
        currentPower = t.isNight() ? maxPower : 0.0;
    }
    double GetCurrentPower() const
    {
        return currentPower;
    }
private:
    double currentPower;
    double maxPower;
};
Keeping an object from performing its own check on Time helps alleviate multithreaded stresses such as "How do I handle the case where the time changes and invalidates my on/off state after I read the time, but before I returned my state?"
4: Coming back to the root of the problem, which is "There is no need to move, I only want an unmovable object to be created in-place, but the compiler won't allow me to do so": why is there no simple solution in C++, and is there a solution in another language?
If you only ever want 1 object to be created you can use the Singleton Design Pattern. When correctly implemented a Singleton is guaranteed to only ever make 1 instance of an object, even in a multithreaded scenario.
In the comment to your second solution, you're saying that it weakens the encapsulation, because the Simulation has to know that Light needs more information during a move. I think it is the other way around. The Light needs to know that it is being used in a context where the provided reference to the Time object may become invalid during Light's lifetime. Which is not good, because it forces a design of Light based on how it is being used, not based on what it should do.
Passing a reference between two objects creates (or should create) a contract between them. When passing a reference to a function, that reference should be valid until the function being called returns. When passing a reference to an object constructor, that reference should be valid throughout the lifetime of the constructed object. The object passing a reference is responsible for its validity. If you don't follow this, you may create very hard to trace relationships between the user of the reference and an entity maintaining lifetime of the referenced object. In your example, the Simulation is unable to uphold the contract between it and the Light object it creates when it is moved. Since the lifetime of the Light object is tightly coupled to the lifetime of the Simulation object, there are 3 ways to resolve this:
1) your solution number 2)
2) pass a reference to the Time object to constructor of the Simulation. If you assume the contract between the Simulation and the outer entity passing the reference is reliable, so will be the contract between Simulation and Light. You may, however, consider the Time object to be internal detail of the Simulation object and thus you would break encapsulation.
3) make the Simulation unmovable. Since C++(11/14) does not have any "in-place constructor methods" (don't know how good a term that is), you cannot create an in-place object by returning it from some function. Copy/move elision is currently an optimization, not a feature. For this, you can either use your solution 5) or use lambdas, like this:
class SimulationBuilder {
public:
    template< typename SimOp >
    void withNewSimulation(const SimOp& simOp) {
        Time time("2015/01/01-12:34:56");
        double lightMaxPower = 42.;
        double heatingMaxPower = 43.;
        Simulation simulation(time, lightMaxPower, heatingMaxPower);
        simOp(simulation);
    }
};
int main() {
    SimulationBuilder builder;
    builder.withNewSimulation([] (Simulation& simulation) {
        WeaklyRelatedPartOfTheProgram lightConsumptionReport;
        lightConsumptionReport.editReport(simulation.getLight()); // No need to supply Time information
    });
    return 0;
}
If none fits your needs, then you either have to reevaluate your needs (might be a good option, too) or use heap allocation and pointers somewhere.

Efficient factory functions without pointers (including smart pointers) or copies?

Suppose I have some kind of factory function which creates objects that are largely used for a very short timespan only (possibly just for the duration of the scope of the function where this factory function is called).
Like this:
foo factory(some_parameter fancy_parameter)
{
    return foo(fancy_parameter);
}
// this gets called all the time... very often
void every_frame_function()
{
    for(int i = 0; i < big_number; ++i)
        do_something_with(factory(some_parameter(i)));
} // don't need those foos out here!
Is there a way to implement such factories without having the user care about memory management (by returning a pointer), without having to deal with smart pointer overhead, and without returning a foo object that has to be hard-copied?
Maybe I'm asking for a goose that lays golden eggs here, but maybe there are some move semantics to be used here (I just don't know how).
Use std::unique_ptr<T>, it has zero overhead compared with a raw pointer.
Or simply return by value, but then you cannot do subtype polymorphism.
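For illustration, a minimal sketch of the unique_ptr variant (Base, Derived and the int parameter are placeholders, not types from the question):

#include <memory>

struct Base {
    virtual ~Base() = default;
};

struct Derived : Base {
    explicit Derived(int /*fancy_parameter*/) {}
};

// The caller gets ownership without writing delete, and no copy of the object is made.
std::unique_ptr<Base> factory(int fancy_parameter)
{
    return std::make_unique<Derived>(fancy_parameter);
}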
The compiler will in many cases optimize out the copy, depending on what you do with it. For example:
type create();
void test() {
    type local = create(); // Copy will be elided
    type const & ref = create(); // Extra copy will be elided
    local = create(); // Assignment from a temporary; this copy/move is not elided
}
This is assuming that create is implemented in a way that (N)RVO can be applied, or else there would be another internal copy before returning from create.
Now, what is interesting is determining whether this is the correct approach or not, which we cannot do, since you are not providing enough information about the problem. For example, how costly the objects returned by the factory are to create, whether they hold resources, or whether you could reuse the object...
I think that's a question of deep copy versus shallow copy. Pointers are similar to a shallow copy.
Details for the two copies: http://en.wikipedia.org/wiki/Object_copy#In_C.2B.2B
Without using pointers in another function, you have to use a deep copy in your code. If the object is very big, the deep copy will cost a lot of time and make the program very slow. In your example with 2-dimensional vectors, the difference between the two copies is not obvious. But if the object is as large as a complicated dialog, the effect is very clear.
Well, you can't do it with a factory function, but you can do it with a class:
class Factory {
public:
    Base& create_obj(some_parameter p)
    {
        d.p = p;
        return d;
    }
private:
    Derived d;
};
void every_frame_function() {
    Factory f;
    for(int i = 0; i < bignumber; i++)
    {
        do_something_with(f.create_obj(some_parameter(i)));
    }
}

Simplest way to count instances of an object

I would like to know the exact number of instances of certain objects allocated at a certain point of execution, mostly for hunting possible memory leaks (I mostly use RAII, almost no new, but I could still forget to .clear() a vector before adding new elements, or something similar). Of course I could have an
atomic<int> cntMyObject;
that I decrement (--) in the destructor and increment (++) in the constructor and copy constructor (I hope I covered everything :)).
But that is hardcoding for every class. And it is not simple to disable it in "Release" mode.
So is there any simple elegant way that can be easily disabled to count object instances?
Have a "counted object" class that does the proper reference counting in its constructor(s) and destructor, then derive your objects that you want to track from it. You can then use the curiously recurring template pattern to get distinct counts for any object types you wish to track.
// warning: pseudo code
template <class Obj>
class CountedObj
{
public:
    CountedObj() { ++total_; }
    CountedObj(const CountedObj& obj) { ++total_; }
    ~CountedObj() { --total_; }
    static size_t OustandingObjects() { return total_; }
private:
    static size_t total_;
};
// The static member still needs a definition:
template <class Obj>
size_t CountedObj<Obj>::total_ = 0;
class MyClass : private CountedObj<MyClass>
{};
You can apply this approach:
#include <iostream>
#ifdef DEBUG
class ObjectCount {
    static int count;
protected:
    ObjectCount() {
        count++;
    }
public:
    static void showCount() {
        std::cout << count;
    }
};
int ObjectCount::count = 0;
class Employee : public ObjectCount {
#else
class Employee {
#endif
public:
    Employee() {}
    Employee(const Employee& emp) {
    }
};
In DEBUG mode, invoking the ObjectCount::showCount() method will print the count of objects created.
You are better off using memory profiling & leak detection tools like Valgrind or Rational Purify.
If you can't, and you want to implement your own mechanism, then you should overload the new and delete operators for your class and then implement the memory diagnostics in them.
Have a look at this C++ FAQ answer to know how to do that and what precautions you should take.
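For a rough idea of the shape such an overload can take (illustrative only; note that it counts only heap allocations made with new, not stack or member instances):

#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

class Tracked {
public:
    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        ++liveAllocations; // one more heap-allocated Tracked
        return p;
    }
    static void operator delete(void* p) noexcept {
        if (p) {
            --liveAllocations;
            std::free(p);
        }
    }
    static std::atomic<long> liveAllocations;
};

std::atomic<long> Tracked::liveAllocations{0};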
This is a sort of working example of something similar: http://www.almostinfinite.com/memtrack.html (just copy the code at the end of the page and put it in Memtrack.h, and then run TrackListMemoryUsage() or one of the other functions to see diagnostics)
It overrides operator new and does some arcane macro stuff to make it 'stamp' each allocation with information that allows it to count how many instances of an object there are and how much memory they're using. It's not perfect though; the macros they use break down under certain conditions. If you decide to try this out, make sure to include it after any standard headers.
Without knowing your code and your requirements, I see 2 reasonable options:
a) Use boost::shared_ptr. It has the atomic reference counts you suggested built in and takes care of your memory management (so that you'd never actually care to look at the count). Its reference count is available through the use_count() member.
b) If the implications of a), like dealing with pointers and having shared_ptrs everywhere, or possible performance overhead, are not acceptable for you, I'd suggest to simply use available tools for memory leak detection (e.g. Valgrind, see above) that'll report your loose objects at program exit. And there's no need to use intrusive helper classes for (anyway debug-only) tracking object counts, that just mess up your code, IMHO.
We used to have the solution of a base class with an internal counter to derive from, but we changed it all to boost::shared_ptr; it keeps a reference count and it cleans up memory for you. The boost smart pointer family is quite useful:
boost smart pointers
My approach, which outputs leakage count to Debug Output (via the DebugPrint function implemented in our code base, replace that call with your own...)
#include <typeinfo>
#include <string>
class CountedObjImpl
{
public:
    CountedObjImpl(const char* className) : mClassName(className) {}
    ~CountedObjImpl()
    {
        DebugPrint(_T("**##** Leakage count for %hs: %Iu\n"), mClassName.c_str(), mInstanceCount);
    }
    size_t& GetCounter()
    {
        return mInstanceCount;
    }
private:
    size_t mInstanceCount = 0;
    std::string mClassName;
};
template <class Obj>
class CountedObj
{
public:
    CountedObj() { GetCounter()++; }
    CountedObj(const CountedObj& obj) { GetCounter()++; }
    ~CountedObj() { GetCounter()--; }
    static size_t OustandingObjects() { return GetCounter(); }
private:
    static size_t& GetCounter()
    {
        static CountedObjImpl mCountedObjImpl(typeid(Obj).name());
        return mCountedObjImpl.GetCounter();
    }
};
Example usage:
class PostLoadInfoPostLoadCB : public PostLoadCallback, private CountedObj<PostLoadInfoPostLoadCB>
Adding counters to individual classes was discussed in some of the answers. However, it requires you to pick the classes to be counted and to modify them in one way or another. The assumption in the following is that you are adding such counters to find bugs where more objects of certain classes are kept alive than expected.
To briefly recap some things mentioned already: for real memory leaks, there are certainly valgrind:memcheck and the leak sanitizers. However, for other scenarios without real leaks they do not help (uncleared vectors, map entries with keys never accessed, cycles of shared_ptrs, ...).
But, since this was not mentioned: In the valgrind tool suite there is also massif, which can provide you with the information about all pieces of allocated memory and where they were allocated. However, let's assume that valgrind:massif is also not an option for you, and you truly want instance counts.
For the purpose of occasional bug hunting - if you are open for some hackish solution and if the above options don't work - you might consider the following: Nowadays, many objects on the heap are effectively held by smart pointers. This could be the smart pointer classes from the standard library, or the smart pointer classes of the respective helper libraries you use. The trick is then the following (picking the shared_ptr as an example): You can get instance counters for many classes at once by patching the shared_ptr implementation, namely by adding instance counts to the shared_ptr class. Then, for some class Foo, the counter belonging to shared_ptr<Foo> will give you an indication of the number of instances of class Foo.
Certainly, it is not quite as accurate as adding the counters to the respective classes directly (instances referenced only by raw pointers are not counted), but possibly it is accurate enough for your case. And, certainly, this is not about changing the smart pointer classes permanently - only during the bug hunting. At least, the smart pointer implementations are not too complex, so patching them is simple.
This approach is much simpler than the rest of the solutions here.
Make a variable for the count and make it static. Increase that variable by +1 inside the constructor and decrease it by -1 inside the destructor.
Make sure you initialize the variable (it cannot be initialized inside the header because it's static).
.h
// Pseudo code warning
class MyObject
{
public:
    MyObject();
    ~MyObject();
    static int totalObjects;
};
.cpp
int MyObject::totalObjects = 0;
MyObject::MyObject()
{
    ++totalObjects;
}
MyObject::~MyObject()
{
    --totalObjects;
}
For every new instance you make, the constructor is called and totalObjects automatically grows by 1.

Lazy/multi-stage construction in C++

What's a good existing class/design pattern for multi-stage construction/initialization of an object in C++?
I have a class with some data members which should be initialized at different points in the program's flow, so their initialization has to be delayed. For example, one argument can be read from a file and another from the network.
Currently I am using boost::optional for the delayed construction of the data members, but it's bothering me that optional is semantically different from delayed construction.
What I need is reminiscent of boost::bind and lambda partial function application, and using these libraries I could probably design multi-stage construction - but I prefer using existing, tested classes. (Or maybe there's another multi-stage construction pattern which I am not familiar with.)
The key issue is whether or not you should distinguish completely populated objects from incompletely populated objects at the type level. If you decide not to make a distinction, then just use boost::optional or similar as you are doing: this makes it easy to get coding quickly. OTOH you can't get the compiler to enforce the requirement that a particular function requires a completely populated object; you need to perform run-time checking of fields each time.
Parameter-group Types
If you do distinguish completely populated objects from incompletely populated objects at the type level, you can enforce the requirement that a function be passed a complete object. To do this I would suggest creating a corresponding type XParams for each relevant type X. XParams has boost::optional members and setter functions for each parameter that can be set after initial construction. Then you can force X to have only one (non-copy) constructor, that takes an XParams as its sole argument and checks that each necessary parameter has been set inside that XParams object. (Not sure if this pattern has a name -- anybody like to edit this to fill us in?)
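A minimal sketch of the XParams idea, with hypothetical fields (a real XParams would mirror whatever X actually needs):

#include <boost/optional.hpp>
#include <stdexcept>
#include <string>
#include <utility>

struct XParams {
    boost::optional<std::string> name;
    boost::optional<int> port;

    void setName(std::string n) { name = std::move(n); }
    void setPort(int p) { port = p; }
};

class X {
public:
    // The only (non-copy) constructor: rejects incompletely populated params.
    explicit X(XParams const& p) {
        if (!p.name || !p.port)
            throw std::domain_error("XParams is not completely populated");
        name_ = *p.name;
        port_ = *p.port;
    }
private:
    std::string name_;
    int port_;
};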
"Partial Object" Types
This works wonderfully if you don't really have to do anything with the object before it is completely populated (perhaps other than trivial stuff like get the field values back). If you do have to sometimes treat an incompletely populated X like a "full" X, you can instead make X derive from a type XPartial, which contains all the logic, plus protected virtual methods for performing precondition tests that test whether all necessary fields are populated. Then if X ensures that it can only ever be constructed in a completely-populated state, it can override those protected methods with trivial checks that always return true:
class XPartial {
    optional<string> name_;
public:
    void setName(string x) { name_.reset(x); } // Can add getters and/or ctors
    string makeGreeting(string title) {
        if (checkMakeGreeting_()) { // Is it safe?
            return string("Hello, ") + title + " " + *name_;
        } else {
            throw domain_error("ZOINKS"); // Or similar
        }
    }
    bool isComplete() const { return checkMakeGreeting_(); } // All tests here
protected:
    virtual bool checkMakeGreeting_() const { return name_; } // Populated?
};
class X : public XPartial {
    X(); // Forbid default-construction; or, you could supply a "full" ctor
public:
    explicit X(XPartial const& x) : XPartial(x) { // Avoid implicit conversion
        if (!x.isComplete()) throw domain_error("ZOINKS");
    }
    X& operator=(XPartial const& x) {
        if (!x.isComplete()) throw domain_error("ZOINKS");
        return static_cast<X&>(XPartial::operator=(x));
    }
protected:
    virtual bool checkMakeGreeting_() const { return true; } // No checking needed!
};
Although it might seem the inheritance here is "back to front", doing it this way means that an X can safely be supplied anywhere an XPartial& is asked for, so this approach obeys the Liskov Substitution Principle. This means that a function can use a parameter type of X& to indicate it needs a complete X object, or XPartial& to indicate it can handle partially populated objects -- in which case either an XPartial object or a full X can be passed.
Originally I had isComplete() as protected, but found this didn't work since X's copy ctor and assignment operator must call this function on their XPartial& argument, and they don't have sufficient access. On reflection, it makes more sense to publicly expose this functionality.
I must be missing something here - I do this kind of thing all the time. It's very common to have objects that are big and/or not needed by a class in all circumstances. So create them dynamically!
struct Big {
    char a[1000000];
};
class A {
public:
    A() : big(0) {}
    ~A() { delete big; }
    void f() {
        makebig();
        big->a[42] = 66;
    }
private:
    Big * big;
    void makebig() {
        if ( ! big ) {
            big = new Big;
        }
    }
};
I don't see the need for anything fancier than that, except that makebig() should probably be const (and maybe inline), and the Big pointer should probably be mutable. And of course A must be able to construct Big, which may in other cases mean caching the contained class's constructor parameters. You will also need to decide on a copying/assignment policy - I'd probably forbid both for this kind of class.
I don't know of any patterns to deal with this specific issue. It's a tricky design question, and one somewhat unique to languages like C++. Another issue is that the answer to this question is closely tied to your individual (or corporate) coding style.
I would use pointers for these members, and when they need to be constructed, allocate them at the same time. You can use auto_ptr for these, and check against NULL to see if they are initialized. (I think of pointers as a built-in "optional" type in C/C++/Java; there are other languages where NULL is not a valid pointer.)
One issue as a matter of style is that you may be relying on your constructors to do too much work. When I'm coding OO, I have the constructors do just enough work to get the object in a consistent state. For example, if I have an Image class and I want to read from a file, I could do this:
image = new Image("unicorn.jpeg"); /* I'm not fond of this style */
or, I could do this:
image = new Image(); /* I like this better */
image->read("unicorn.jpeg");
It can get difficult to reason about how a C++ program works if the constructors have a lot of code in them, especially if you ask the question, "what happens if a constructor fails?" This is the main benefit of moving code out of the constructors.
I would have more to say, but I don't know what you're trying to do with delayed construction.
Edit: I remembered that there is a (somewhat perverse) way to call a constructor on an object at any arbitrary time. Here is an example:
class Counter {
public:
    Counter(int &cref) : c(cref) { }
    void incr(int x) { c += x; }
private:
    int &c;
};
void dontTryThisAtHome() {
    int i = 0, j = 0;
    Counter c(i); // Call constructor first time on c
    c.incr(5); // now i = 5
    new(&c) Counter(j); // Call the constructor AGAIN on c
    c.incr(3); // now j = 3
}
Note that doing something as reckless as this might earn you the scorn of your fellow programmers, unless you've got solid reasons for using this technique. This also doesn't delay the constructor, just lets you call it again later.
Using boost.optional looks like a good solution for some use cases. I haven't played much with it so I can't comment much. One thing I keep in mind when dealing with such functionality is whether I can use overloaded constructors instead of default and copy constructors.
When I need such functionality I would just use a pointer to the type of the necessary field like this:
public:
    MyClass() : field_(0) { } // constructor, additional initializers and code omitted
    ~MyClass() {
        if (field_)
            delete field_; // free the constructed object only if initialized
    }
    ...
private:
    ...
    field_type* field_;
next, instead of using the pointer I would access the field through the following method:
private:
    ...
    field_type& field() {
        if (!field_)
            field_ = new field_type(...);
        return *field_;
    }
I have omitted const-access semantics
The easiest way I know is similar to the technique suggested by Dietrich Epp, except it allows you to truly delay the construction of an object until a moment of your choosing.
Basically: reserve memory for the object using malloc instead of new (thereby bypassing the constructor), then call placement new when you truly want to construct the object.
Example:
Object *x = (Object *) malloc(sizeof(Object));
// Use the object member items here. Be careful: no constructors have been called!
// This means you can assign values to ints, structs, etc... but nested objects can wreak havoc!

// Now we want to call the constructor of the object
new(x) Object(params);

// However, you must remember to also manually call the destructor!
x->~Object();
free(x);
// Note: if the malloc and new calls in your development stack allocate from the
// same heap, you can just call delete x instead of the destructor followed by
// free, but the above is the correct way of doing it
Personally, the only time I've ever used this syntax was when I had to use a custom C-based allocator for C++ objects. As Dietrich suggests, you should question whether you really, truly must delay the constructor call. The base constructor should perform the bare minimum to get your object into a serviceable state, whilst other overloaded constructors may perform more work as needed.
I don't know if there's a formal pattern for this. In places where I've seen it, we called it "lazy", "demand" or "on demand".