Suppose I have some container class like the one below:
class Container {
public:
int const & operator[] (int n) const {
return data[n];
}
private:
std::vector<int> data;
}
I need to access its elements from multiple threads, using overloaded operator [] and passing an object of this class to lambda capture by reference:
Container store;
std::thread t_1([&store]() { /* do something here and read from store */ } );
std::thread t_2([&store]() { /* do something here and read from store */ } );
Will there be some slowdowns because of such design? Is it possible to speedup this part somehow?
Since std::vector's data() lies on the heap anyway, you cannot omit the access there. The only faster way would be to keep the elements on the stacks of the two threads (threads have separate stacks but share heap space), but this is not a possibility here. Thus, I see no optimisations for your case, unless you share your whole implementation and by changing the approach, one may come up with more performant implementation.
I would advise against it, though. That would belong to CodeReview, not StackOverflow.
Lastly, I would like to mention thread safety - I do not see any races here and I believe you specifically made sure that the example does not hint that you may encounter any (by only showing access to reading and not writing to shared resources), but it is still a good idea to check for them. If what you are doing is only reading, no data races will occur.
Related
I have a class that holds a shared pointer to a map.
Code snippet:
class SomeClass {
private:
shared_ptr<const map<int, string>> sptr_;
ReaderWriterLock rw_lock_;
public:
SomeClass() : sptr_(make_shared<const map<int, string>>()) {};
void UpdateSharedPointer(shared_ptr<const map<int, string>>&& new_sptr) {
if (!new_sptr) { return; }
rw_lock_.Lock(true /* is_writer */);
sptr_ = move(new_sptr);
rw_lock_.Unlock();
}
void ReadSharedPointer() const {
rw_lock_.Lock(false /* is_writer */);
// Create a copy of the pointer.
shared_ptr<const map<int, string>> local_sptr = sptr_;
rw_lock_.Unlock();
if (!local_sptr) {
return;
}
// Read the map object pointed by the local_sptr and perform some operations.
}
}
Usage pattern: An instance of SomeClass can be accessed by multiple threads. And I am okay working on a stale copy of map inside ReadSharedPointer() method.
The use of shared pointer to a constant map object leads to a less contentious code. If someone needs to update or read the map, the lock is held for only a short while. Instead, if I eliminate the shared pointer, any map update or read will need to hold the rw_lock_ for a larger time.
What I am worried about is that this extra pointer indirection might make the code fragile. For example, my current use case requires only a constant map, but if in future someone updates the map to be a non-const object and forgets to update the locking mechanism associated with the read operation, a race condition might get introduced.
Also, one might argue that eliminating the shared pointer makes the complete workflow simpler to understand. And a shared pointer might not make sense if we are not truly sharing this map but only introducing a shared pointer to reduce contention.
So basically my questions are:
How good of a design is this? What are the pros and cons? Is there any alternative to this?
Within a class method, I'm accessing private attributes - or attributes of a nested class. Moreover, I'm looping over these attributes.
I was wondering what is the most efficient way in terms of time (and memory) between:
copying the attributes and accessing them within the loop
Accessing the attributes within the loop
Or maybe using an iterator over the attribute
I feel my question is related to : Efficiency of accessing a value through a pointer vs storing as temporary value. But in my case, I just need to access a value, not change it.
Example
Given two classes
class ClassA
{
public:
vector<double> GetAVector() { return AVector; }
private:
vector<double> m_AVector;
}
and
class ClassB
{
public:
void MyFunction();
private:
vector<double> m_Vector;
ClassA m_A;
}
I. Should I do:
1.
void ClassB::MyFunction()
{
vector<double> foo;
for(int i=0; i<... ; i++)
{
foo.push_back(SomeFunction(m_Vector[i]));
}
/// do something ...
}
2.
void ClassB::MyFunction()
{
vector<double> foo;
vector<double> VectorCopy = m_Vector;
for(int i=0; i<... ; i++)
{
foo.push_back(SomeFunction(VectorCopy[i]));
}
/// do something ...
}
3.
void ClassB::MyFunction()
{
vector<double> foo;
for(vector<double>::iterator it = m_Vector.begin(); it != m_Vector.end() ; it++)
{
foo.push_back(SomeFunction((*it)));
}
/// do something ...
}
II. What if I'm not looping over m_vector but m_A.GetAVector()?
P.S. : I understood while going through other posts that it's not useful to 'micro'-optimize at first but my question is more related to what really happens and what should be done - as for standards (and coding-style)
You're in luck: you can actually figure out the answer all by yourself, by trying each approach with your compiler and on your operating system, and timing each approach to see how long it takes.
There is no universal answer here, that applies to every imaginable C++ compiler and operating system that exists on the third planet from the sun. Each compiler, and hardware is different, and has different runtime characteristics. Even different versions of the same compiler will often result in different runtime behavior that might affect performance. Not to mention various compilation and optimization options. And since you didn't even specify your compiler and operating system, there's literally no authoritative answer that can be given here.
Although it's true that for some questions of this type it's possible to arrive at the best implementation with a high degree of certainty, for most use cases, this isn't one of them. The only way you can get the answer is to figure it out yourself, by trying each alternative yourself, profiling, and comparing the results.
I can categorically say that 2. is less efficient than 1. Copying to a local copy, and then accessing it like you would the original would only be of potential benefit if accessing a stack variable is quicker than accessing a member one, and it's not, so it's not (if you see what I mean).
Option 3. is trickier, since it depends on the implementation of the iter() method (and end(), which may be called once per loop) versus the implementation of the operator [] method. I could irritate some C++ die-hards and say there's an option 4: ask the Vector for a pointer to the array and use a pointer or array index on that directly. That might just be faster than either!
And as for II, there is a double-indirection there. A good compiler should spot that and cache the result for repeated use - but otherwise it would only be marginally slower than not doing so: again, depending on your compiler.
Without optimizations, option 2 would be slower on every imaginable platform, becasue it will incur copy of the vector, and the access time would be identical for local variable and class member.
With optimization, depending on SomeFunction, performance might be the same or worse for option 2. Same performance would happen if SomeFunction is either visible to compiler to not modify it's argument, or it's signature guarantees that argument will not be modified - in this case compiler can optimize away the copy altogether. Otherwise, the copy will remain.
I wanted to know of how to make my use of the command pattern thread-safe while maintaining performance. I have a simulation where I perform upwards of tens of billions of iterations; performance is critical.
In this simulation, I have a bunch of Moves that perform commands on objects in my simulation. The base class looks like this:
class Move
{
public:
virtual ~Move(){}
// Perform a move.
virtual void Perform(Object& obj) = 0;
// Undo a move.
virtual void Undo() = 0;
};
The reason I have the object passed in on Perform rather than the constructor, as is typical with the Command pattern, is that I cannot afford to instantiate a new Move object every iteration. Rather, a concrete implementation of Move would simply take Object, maintain a pointer to it and it's previous state for when it's needed. Here's an example of a concrete implementation:
class ConcreteMove : public Move
{
std::string _ns;
std::string _prev;
Object* _obj;
ConcreteMove(std::string newstring): _ns(newstring) {}
virtual void Perform(Object& obj) override
{
_obj= &obj;
_prev = obj.GetIdentifier();
obj.SetIdentifier(_ns);
}
virtual void Undo()
{
_obj->SetIdentifier(_prev);
}
};
Unfortunately, what this has cost me is thread-safety. I want to parallelize my loop, where multiple iterators perform moves on a bunch of objects simultaneously. But obviously one instance of ConcreteMove cannot be reused because of how I implemented it.
I considered having Perform return a State object which can be passed into Undo, that way making the implementation thread-safe, since it is independent of the ConcereteMove state. However, the creation and destruction of such an object on each iteration is too costly.
Furthermore, the simulation has a vector of Moves because multiple moves can be performed every iteration stored in a MoveManager class which contains a vector of Move object pointers instantiated by the client. I set it up this way because the constructors of each particular Concrete moves take parameters (see above example).
I considered writing a copy operator for Move and MoveManager such that it can be duplicated amongst the threads, but I don't believe that is a proper answer because then the ownership of the Move objects falls on MoveManager rather than the client (who is only responsible for the first instance). Also, the same would be said for MoveManager and responsibility of maintaining that.
Update: Here's my MoveManager if it matters
class MoveManager
{
private:
std::vector<Move*> _moves;
public:
void PushMove(Move& move)
{
_moves.push_back(&move);
}
void PopMove()
{
_moves.pop_back();
}
// Select a move by index.
Move* SelectMove(int i)
{
return _moves[i];
}
// Get the number of moves.
int GetMoveCount()
{
return (int)_moves.size();
}
};
Clarification: All I need is one collection of Move objects per thread. They are re-used every iteration, where Perform is called on different objects each time.
Does anyone know how to solve this problem efficiently in a thread-safe manner?
Thanks!
What about the notion of a thread ID. Also, why not preconstruct the identifier strings and pass pointers to them?
class ConcreteMove : public Move
{
std::string *_ns;
std::vector<std::string *> _prev;
std::vector<Object *> _obj;
ConcreteMove(unsigned numthreads, std::string *newstring)
: _ns(newstring),
_prev(numthreads),
_obj(numthreads)
{
}
virtual void Perform(unsigned threadid, Object &obj) override
{
_obj[threadid] = &obj;
_prev[threadid] = obj.GetIdentifier();
obj.SetIdentifier(_ns);
}
virtual void Undo(unsigned threadid)
{
_obj[threadid]->SetIdentifier(_prev[threadid]);
}
};
Impossible with stated requirements. Specifically,
Use the command pattern. "the command pattern is a behavioral design pattern in which an object is used to represent and encapsulate all the information needed to call a method at a later time." Thus you're storing data.
You "can't afford" to allocate memory.
You have "billions" of iterations, which means some large static allocation won't suffice.
You want to store data without any place to store it. Thus there is no answer. However, if you're willing to change your requirements, there are undoubtedly many ways to solve your problem (whatever it may be -- I couldn't tell from the description.)
I also can't estimate how many Move objects you need at once. If that number is reasonably low then a specialized allocation scheme might solve part of your problem. Likewise, if most of the Move objects are duplicates, a different specialized allocation scheme might help.
In general what you're asking can't be solved, but relax the requirements and it shouldn't be hard.
Your Move Manager should not contain a vector of pointers, it should be a vector of Move objects
std::vector<Move> _moves;
It seems you will have one Move Manager per thread, so no issue of multi-threading problems, set the vector capacity at max, and then apply perform and other actions on the move in the vector
No new allocation, and you will be reusing the move objects
I would like to know the exact number of instances of certain objects allocated at certain point of execution. Mostly for hunting possible memory leaks(I mostly use RAII, almost no new, but still I could forget .clear() on vector before adding new elements or something similar). Ofc I could have an
atomic<int> cntMyObject;
that I -- in destructor, ++ increase in constructor, cpy constructor(I hope I covered everything :)).
But that is hardcoding for every class. And it is not simple do disable it in "Release" mode.
So is there any simple elegant way that can be easily disabled to count object instances?
Have a "counted object" class that does the proper reference counting in its constructor(s) and destructor, then derive your objects that you want to track from it. You can then use the curiously recurring template pattern to get distinct counts for any object types you wish to track.
// warning: pseudo code
template <class Obj>
class CountedObj
{
public:
CountedObj() {++total_;}
CountedObj(const CountedObj& obj) {++total_;}
~CountedObj() {--total_;}
static size_t OustandingObjects() {return total_;}
private:
static size_t total_;
};
class MyClass : private CountedObj<MyClass>
{};
you can apply this approach
#ifdef DEBUG
class ObjectCount {
static int count;
protected:
ObjectCount() {
count++;
}
public:
void static showCount() {
cout << count;
}
};
int ObjectCount::count = 0;
class Employee : public ObjectCount {
#else
class Employee {
#endif
public:
Employee(){}
Employee(const Employee & emp) {
}
};
at DEBUG mode, invoking of ObjectCount::showCount() method will return count of object(s) created.
Better off to use memory profiling & leak detection tools like Valgrind or Rational Purify.
If you can't and want to implement your own mechanism then,
You should overload the new and delete operators for your class and then implement the memory diagnostic in them.
Have a look at this C++ FAQ answer to know how to do that and what precautions you should take.
This is a sort of working example of something similar: http://www.almostinfinite.com/memtrack.html (just copy the code at the end of the page and put it in Memtrack.h, and then run TrackListMemoryUsage() or one of the other functions to see diagnostics)
It overrides operator new and does some arcane macro stuff to make it 'stamp' each allocation with information that allow it to count how many instances of an object and how much memory they're usingusing. It's not perfect though, the macros they use break down under certain conditions. If you decide to try this out make sure to include it after any standard headers.
Without knowing your code and your requirements, I see 2 reasonable options:
a) Use boost::shared_ptr. It has the atomic reference counts you suggested built in and takes care of your memory management (so that you'd never actually care to look at the count). Its reference count is available through the use_count() member.
b) If the implications of a), like dealing with pointers and having shared_ptrs everywhere, or possible performance overhead, are not acceptable for you, I'd suggest to simply use available tools for memory leak detection (e.g. Valgrind, see above) that'll report your loose objects at program exit. And there's no need to use intrusive helper classes for (anyway debug-only) tracking object counts, that just mess up your code, IMHO.
We used to have the solution of a base class with internal counter and derive from it, but we changed it all into boost::shared_ptr, it keeps a reference counter and it cleans up memory for you. The boost smart pointer family is quite useful:
boost smart pointers
My approach, which outputs leakage count to Debug Output (via the DebugPrint function implemented in our code base, replace that call with your own...)
#include <typeinfo>
#include <string.h>
class CountedObjImpl
{
public:
CountedObjImpl(const char* className) : mClassName(className) {}
~CountedObjImpl()
{
DebugPrint(_T("**##** Leakage count for %hs: %Iu\n"), mClassName.c_str(), mInstanceCount);
}
size_t& GetCounter()
{
return mInstanceCount;
}
private:
size_t mInstanceCount = 0;
std::string mClassName;
};
template <class Obj>
class CountedObj
{
public:
CountedObj() { GetCounter()++; }
CountedObj(const CountedObj& obj) { GetCounter()++; }
~CountedObj() { GetCounter()--; }
static size_t OustandingObjects() { return GetCounter(); }
private:
size_t& GetCounter()
{
static CountedObjImpl mCountedObjImpl(typeid(Obj).name());
return mCountedObjImpl.GetCounter();
}
};
Example usage:
class PostLoadInfoPostLoadCB : public PostLoadCallback, private CountedObj<PostLoadInfoPostLoadCB>
Adding counters to individual classes was discussed in some of the answers. However, it requires to pick the classes to have counted and modify them in one way or the other. The assumption in the following is, you are adding such counters to find bugs where more objects of certain classes are kept alive than expected.
To shortly recap some things mentioned already: For real memory leaks, certainly there is valgrind:memcheck and the leak sanitizers. However, for other scenarios without real leaks they do not help (uncleared vectors, map entries with keys never accessed, cycles of shared_ptrs, ...).
But, since this was not mentioned: In the valgrind tool suite there is also massif, which can provide you with the information about all pieces of allocated memory and where they were allocated. However, let's assume that valgrind:massif is also not an option for you, and you truly want instance counts.
For the purpose of occasional bug hunting - if you are open for some hackish solution and if the above options don't work - you might consider the following: Nowadays, many objects on the heap are effectively held by smart pointers. This could be the smart pointer classes from the standard library, or the smart pointer classes of the respective helper libraries you use. The trick is then the following (picking the shared_ptr as an example): You can get instance counters for many classes at once by patching the shared_ptr implementation, namely by adding instance counts to the shared_ptr class. Then, for some class Foo, the counter belonging to shared_ptr<Foo> will give you an indication of the number of instances of class Foo.
Certainly, it is not quite as accurate as adding the counters to the respective classes directly (instances referenced only by raw pointers are not counted), but possibly it is accurate enough for your case. And, certainly, this is not about changing the smart pointer classes permanently - only during the bug hunting. At least, the smart pointer implementations are not too complex, so patching them is simple.
This approach is much simpler than the rest of the solutions here.
Make a variable for the count and make it static. Increase that variable by +1 inside the constructor and decrease it by -1 inside the destructor.
Make sure you initialize the variable (it cannot be initialized inside the header because its static).
.h
// Pseudo code warning
class MyObject
{
MyObject();
~MyObject();
static int totalObjects;
}
.cpp
int MyObject::totalObjects = 0;
MyObject::MyObject()
{
++totalObjects;
}
MyObject::~MyObject()
{
--totalObjects;
}
For every new instance you make, the constructor is called and totalObjects automatically grows by 1.
I have unexpected assertions faillures in my code using a checked STL implentation.
After some research, I narrowed down the problem to a push_back in a vector called from a different thread than the one in which the vector was created.
The simplest code to reproduce this problem is :
class SomeClass
{
private:
std::vector<int> theVector;
public:
SomeClass ()
{
theVector.push_back(1); // Ok
}
void add()
{
theVector.push_back(1); // Crash
}
};
The only difference is that SomeClass is instanciated from my main thread, and add is called from another thread. However, there is no concurency issue : in the simplest form of code I used for troubleshooting nobody is reading or writing from this vector except the cases I mentioned above.
Tracing into the push_back code, I noticed that some methods from std::vector like count() or size() return garbage, when it's called from the other thred (method "add"), and correct values when called from the creating thread (in the constructor for example)
Should I conclude that std::vector is not usable in a multithreaded environement ? Or is there a solution for this problem ?
EDIT : removed volatile
EDIT 2 : Do you think it's possible that the problem doesn't lie in multithread ? In my test run, add is called only once (verified using a break point). If I remove the push_back from the constructor, I still crashes. So in the end, even with only one call to a vector's method, in a function called once make the assertion fail. Therefore there can't be concurency, or ... ?
std::vector definitely is usable in a multi-threaded environment, provided you don't access the vector from two threads at once. I do it all the time without trouble.
Since vector isn't the problem, you need to look more closely at your synchronization mechanism, as this is most likely the problem.
I noticed that you marked the vector as volatile. Do you expect that making it volatile will provide synchronization? Because it won't. See here for more information.
EDIT: Originally provided wrong link. This is now fixed. Sorry for confusion.
If you can guarantee that nobody is writing to or reading from the vector when you call push_back, there's no reason that it should fail. You could be dealing with higher-level memory corruption. You should verify that "this" points to a genuine instance of SomeClass, check it's other members, etc.
Most STL implementations are not thread safe. You'll need to use thread synchronization (e.g. mutex) to prevent both threads from stomping on each other while accessing the vector. Basically, what you will need to do is create a class that contains the vector and a mutex and accessor functions that protect the vector for both reading and writing operations.
Whether Standard Library supports multithreading is implementation defined. You have to read documentation for your specific compiler.
Additionaly what you can do is to add some log messages as in the following code:
class SomeClass
{
private:
volatile std::vector<int> theVector;
public:
SomeClass ()
{
std::cout << "SomeClass::SomeClass" << std::endl;
theVector.push_back(1); // Ok
}
~SomeClass ()
{
std::cout << "SomeClass::~SomeClass" << std::endl;
}
void add()
{
std::cout << "SomeClass::add" << std::endl;
theVector.push_back(1);
}
};
Make sure that instance of the SomeClass is still exists when you call add function.