Multiple threads access shared resources - c++

I'm currently working on a particle system, which uses one thread in which the particles are first updated, then drawn. The particles are stored in a std::vector. I would like to move the update function to a separate thread to improve the system's performance. However, this means that I encounter problems when the update thread and the draw thread are accessing the std::vector at the same time. My update function will change the values for the position and colour of all particles, and also almost always resize the std::vector.
Single thread approach:
std::vector<Particle> particles;
void tick() //tick would be called from main update loop
{
//slow as must wait for update to draw
updateParticles();
drawParticles();
}
Multithreaded:
std::vector<Particle> particles;
//quicker as no longer need to wait to draw and update
//crashes when both threads access the same data, or update resizes vector
void updateThread()
{
updateParticles();
}
void drawThread()
{
drawParticles();
}
To fix this problem I have investigated using std::mutex; however, in practice, with a large number of particles, the constant locking meant that performance didn't improve. I have also investigated std::atomic; however, neither the particles nor the std::vector are trivially copyable, so I can't use that either.
Multithreaded using mutex:
NOTE: I am using SDL mutex, as far as I am aware, the principles are the same.
SDL_mutex* lock = SDL_CreateMutex();
SDL_cond* canDraw = SDL_CreateCond();
SDL_cond* canUpdate = SDL_CreateCond();
bool updated = false; //true while an update is waiting to be drawn; guarded by lock
std::vector<Particle> particles;
//locking the threads leads to the same problems as before,
//now each thread must wait for the other one
void updateThread()
{
SDL_LockMutex(lock);
while(updated)
{
SDL_CondWait(canUpdate, lock);
}
updateParticles();
updated = true;
SDL_UnlockMutex(lock);
SDL_CondSignal(canDraw);
}
void drawThread()
{
SDL_LockMutex(lock);
while(!updated)
{
SDL_CondWait(canDraw, lock);
}
drawParticles();
updated = false;
SDL_UnlockMutex(lock);
SDL_CondSignal(canUpdate);
}
I am wondering if there are any other ways to implement the multi threaded approach? Essentially preventing the same data from being accessed by both threads at the same time, without having to make each thread wait for the other. I have thought about making a local copy of the vector to draw from, but this seems like it would be inefficient, and may run into the same problems if the update thread changes the vector while it's being copied?

I would use a more granular locking strategy. Instead of storing a particle object in your vector, I would store a pointer to a different object.
struct lockedParticle {
Particle* containedParticle;
SDL_mutex* lockingObject; // created with SDL_CreateMutex()
};
In updateParticles() I would attempt to obtain the individual locking objects using SDL_TryLockMutex() - if I fail to obtain control of the mutex I would add the pointer to this particular lockedParticle instance to another vector, and retry later to update them.
I would follow a similar strategy inside drawParticles(). This relies on the fact that draw order does not matter for particles, which is often the case.

If data consistency is not a concern, you can avoid blocking the whole vector for long periods by encapsulating the vector in a custom class and holding the mutex only around single read/write operations, something like:
struct SharedVector
{
// ...
std::vector<Particle> vec;
void push( const Particle& particle )
{
SDL_LockMutex(lock);
vec.push_back(particle);
SDL_UnlockMutex(lock);
}
};
//...
SharedVector particles;
Then, of course, you need to amend updateParticles() and drawParticles() to use the new type instead of std::vector.
EDIT:
You can avoid creating a new structure by using the mutex directly in updateParticles() and drawParticles(), e.g.
void updateParticles()
{
//... get Particle particle object
SDL_LockMutex(lock);
particles.push_back(particle);
SDL_UnlockMutex(lock);
}
The same should be done for drawParticles() as well.

If the vector is changing all the time, you can use two vectors. drawParticles would have its own copy, and updateParticles would write to another one. Once both functions are done, swap, copy, or move the vector used by updateParticles to the one used by drawParticles. (updateParticles can read from the same vector used by drawParticles to get at the current particle positions, so you shouldn't need to create a complete new copy.) No locking is necessary while the two functions run; the only synchronisation point is the hand-over once both are done.
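As a minimal sketch of the two-vector idea, assuming a simplified Particle with only a position and a velocity (the real type is not shown in the question) and one short-lived update thread per frame: drawParticles only reads the front vector, updateParticles reads the front vector and writes the back vector, and the two are swapped after the join. The names runFrames, front and back are made up for this illustration.

```cpp
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Particle type.
struct Particle {
    float x = 0.0f;
    float vx = 1.0f;
};

// Reads the current frame and builds the next one in a separate vector.
// It may resize 'next' freely because nobody else touches it.
void updateParticles(const std::vector<Particle>& current,
                     std::vector<Particle>& next, float dt) {
    next.clear();
    for (const Particle& p : current) {
        Particle q = p;
        q.x += q.vx * dt;
        next.push_back(q);
    }
}

// Stand-in for rendering: only reads the front buffer.
float drawParticles(const std::vector<Particle>& current) {
    float sum = 0.0f;
    for (const Particle& p : current) sum += p.x;
    return sum;
}

// Runs 'frames' frames and returns the final front buffer.
std::vector<Particle> runFrames(std::vector<Particle> front, int frames, float dt) {
    std::vector<Particle> back;
    for (int i = 0; i < frames; ++i) {
        // Update runs on its own thread while this thread draws.
        std::thread updater([&] { updateParticles(front, back, dt); });
        drawParticles(front);   // concurrent read-only access is safe
        updater.join();         // the single synchronisation point per frame
        std::swap(front, back); // O(1): only the internal pointers are exchanged
    }
    return front;
}
```

Spawning a thread every frame is only for brevity; a long-lived update thread with a per-frame hand-over would follow the same pattern. The cost is one extra vector of particles, and the frame time becomes the slower of update and draw rather than their sum.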

Related

Trying to control multithreaded access to array using std::atomic

I'm trying to control multithreaded access to a vector of data which is fixed in size, so threads will wait until their current position in it has been filled before trying to use it, or will fill it themselves if no-one else has yet. (But ensure no-one is waiting around if their position is already filled, or no-one has done it yet)
However, I am struggling to understand a good way to do this, especially involving std::atomic. I'm just not very familiar with C++ multithreading concepts aside from basic std::thread usage.
Here is a very rough example of the problem:
class myClass
{
struct Data
{
int res1;
};
std::vector<Data*> myData;
int foo(unsigned long position)
{
if (!myData[position])
{
bar(myData[position]);
}
// Do something with the data
return 5 * myData[position]->res1;
}
void bar(Data* &data)
{
data = new Data;
// Do a whole bunch of calculations and so-on here
data->res1 = 42;
}
};
Now imagine foo() is being called from multiple threads, and several threads may (or may not) have the same position at once. If that happens, there's a chance that a thread may (between when the Data was created and when bar() is finished) try to actually use the data.
So, what are the options?
1: Make a std::mutex for every position in myData. What if there are 10,000 elements in myData? That's 10,000 std::mutexes, not great.
2: Put a lock_guard around it like this:
std::mutex myMutex;
{
const std::lock_guard<std::mutex> lock(myMutex);
if (!myData[position])
{
bar(myData[position]);
}
}
While this works, it also means if different threads are working in different positions, they wait needlessly, wasting all of the threading advantage.
3: Use a vector of chars and a spinlock as a poor man's mutex? Here's what that might look like:
static std::vector<char> positionInProgress;
static std::vector<char> positionComplete;
class myClass
{
struct Data
{
int res1;
};
std::vector<Data*> myData;
int foo(unsigned long position)
{
if (positionInProgress[position])
{
while (positionInProgress[position])
{
; // do nothing, just wait until it is done
}
}
else
{
if (!positionComplete[position])
{
// Fill the data and prevent anyone from using it until it is complete
positionInProgress[position] = true;
bar(myData[position]);
positionInProgress[position] = false;
positionComplete[position] = true;
}
}
// Do something with the data
return 5 * myData[position]->res1;
}
void bar(Data* &data)
{
data = new Data;
// Do a whole bunch of calculations and so-on here
data->res1 = 42;
}
};
This seems to work, but none of the test or set operations are atomic, so I have a feeling I'm just getting lucky.
4: What about std::atomic and std::atomic_flag? Well, there are a few problems.
std::atomic_flag doesn't have a way to test without setting in C++11...which makes this kind of difficult.
std::atomic is not movable or copy-constructible, so I cannot make a vector of them (I do not know the number of positions during construction of myClass)
Conclusion:
This is the simplest example that (likely) compiles I can think of that demonstrates my real problem. In reality, myData is a 2-dimensional vector implemented using a special hand-rolled solution, Data itself is a vector of pointers to more complex data types, the data isn't simply returned, etc. This is the best I could come up with.
The biggest problem you're likely to have is that a vector itself is not thread-safe, so you can't do ANY operation that might change the vector (invalidate references to elements of the vector) while another thread might be accessing it, such as resize or push_back. However, if your vector is effectively "fixed" (you set the size prior to ever spawning threads and thereafter only ever access elements using at or operator[] and never ever modify the vector itself), you can get away with using a vector of atomic objects. In this case you could have:
std::vector<std::atomic<Data*>> myData;
and your code to setup and use an element could look like:
if (!myData[position]) {
Data *tmp = new Data;
Data *expected = nullptr; // compare_exchange_strong needs an lvalue
if (!myData[position].compare_exchange_strong(expected, tmp)) {
// some other thread did the setup
delete tmp; } }
myData[position].load()->bar();
Of course you still need to make sure that the operations done on members of Data in bar are themselves thread-safe, as you can get multiple threads calling bar on the same Data instance here.

C++ Threading using 2 Containers

I have the following problem. I use a vector that gets filled up with values from a temperature sensor. This function runs in one thread. Then I have another thread, responsible for publishing all the values into a database, which runs once every second. The publishing thread locks the vector using a mutex, so the function that fills it with values gets blocked. However, while the publishing thread is using the vector, I want to use another vector to save the temperature values so that I don't lose any values while the data is being published. How do I get around this problem? I thought about using a pointer that points to the containers and then switching it to the other container once the first gets locked, to keep saving values, but I don't quite know how.
I tried to add a minimal reproducible example; I hope it explains my situation.
void publish(std::vector<temperature> &inputVector)
{
//this function would publish the values into a database
//via mqtt and also runs in a thread.
}
int main()
{
std::vector<temperature> testVector;
std::vector<temperature> testVector2;
while(1)
{
//I am repeatedly saving values into the vector.
//I want to do this in a thread, but if the vector is locked by a mutex
//I want to switch over to the other vector
testVector.push_back(testSensor.getValue());
}
}
Assuming you are using std::mutex, you can use mutex::try_lock on the producer side. Something like this:
while(1)
{
if (myMutex.try_lock()) {
// locking succeeded - move all queued values and push the new value
std::move(testVector2.begin(), testVector2.end(), std::back_inserter(testVector));
testVector2.clear();
testVector.push_back(testSensor.getValue());
myMutex.unlock();
} else {
// locking failed - queue the value
testVector2.push_back(testSensor.getValue());
}
}
Of course publish() needs to lock the mutex, too.
void publish(std::vector<temperature> &inputVector)
{
std::lock_guard<std::mutex> lock(myMutex);
//this function would publish the values into a database
//via mqtt and also runs in a thread.
}
This seems like the perfect opportunity for an additional (shared) buffer or queue, that's protected by the lock.
main would be essentially as it is now, pushing your new values into the shared buffer.
The other thread would, when it can, lock that buffer and take the new values from it. This should be very fast.
Then, it does not need to lock the shared buffer while doing its database things (which take longer), as it's only working on its own vector during that procedure.
Here's some pseudo-code:
std::mutex pendingTempsMutex;
std::vector<temperature> pendingTemps;
void thread2()
{
std::vector<temperature> temps;
while (1)
{
// Get new temps if we have any
{
std::scoped_lock l(pendingTempsMutex);
temps.swap(pendingTemps);
}
if (!temps.empty())
publish(temps);
}
}
void thread1()
{
while (1)
{
std::scoped_lock l(pendingTempsMutex);
pendingTemps.push_back(testSensor.getValue());
/*
Or, if getValue() blocks:
temperature newValue = testSensor.getValue();
std::scoped_lock l(pendingTempsMutex);
pendingTemps.push_back(newValue);
*/
}
}
Usually you'd use a std::queue for pendingTemps though. I don't think it really matters in this example, because you're always consuming everything in thread 2, but it's more conventional and can be more efficient in some scenarios. It can't lose you much as it's backed by a std::deque. But you can measure/test to see what's best for you.
This solution is pretty much what you already proposed/explored in the question, except that the producer shouldn't be in charge of managing the second vector.
You can improve it by having thread2 wait to be "informed" that there are new values, with a condition variable, otherwise you're going to be doing a lot of busy-waiting. I leave that as an exercise to the reader ;) There should be an example and discussion in your multi-threaded programming book.

C++ multithreading application crashes

I'm programming a simple 3D rendering engine just to get more familiar with C++. Today I took my first steps with multithreading and already have a problem I cannot wrap my head around. When the application starts, it generates a small, Minecraft-like terrain consisting of cubes. They're generated within the main thread.
Now when I want to generate more chunks
void VoxelWorld::generateChunk(glm::vec2 chunkPosition) {
Chunk* generatedChunk = m_worldGenerator->generateChunk(chunkPosition);
generatedChunk->shader = m_chunkShader;
generatedChunk->generateRenderObject();
m_chunks[chunkPosition.x][chunkPosition.y] = generatedChunk;
m_loadedChunks.push_back(glm::vec2(chunkPosition.x, chunkPosition.y));
}
void VoxelWorld::generateChunkThreaded(glm::vec2 chunkPosition) {
std::thread chunkThread(&VoxelWorld::generateChunk, this, chunkPosition);
chunkThread.detach();
}
void VoxelWorld::draw() {
for(glm::vec2& vec : m_loadedChunks){
Transformation* transformation = new Transformation();
transformation->getPosition().setPosition(glm::vec3(CHUNK_WIDTH*vec.x, 0, CHUNK_WIDTH*vec.y));
m_chunks[vec.x][vec.y]->getRenderObject()->draw(transformation);
delete(transformation); //TODO: Find a better way
}
}
I have my member function (everything is non-static) generateChunk() which generates a Chunk and stores it in the VoxelWorld class. I have a 2D std::map<..> m_chunks which stores every chunk and a std::vector<glm::vec2> m_loadedChunks which stores the positions of the generated chunks.
Calling generateChunk() works fine as expected. But when I try generateChunkThreaded() the application crashes! I tried commenting out the last line of generateChunk(), and then it does not crash. That's what confuses me so much! m_loadedChunks is just a regular std::vector. I tried making it public, with no effect. Is there anything obvious I'm missing?
You are accessing m_loadedChunks from several threads without synchronizing it.
You need to lock every use of the shared data. A few tips:
Declare a mutex as a member of the class
std::mutex mtx; // mutex for critical section
Use it to lock via a critical section each time you want to access the elements
std::lock_guard<std::mutex> lock(mtx);
m_chunks[chunkPosition.x][chunkPosition.y] = generatedChunk;
m_loadedChunks.push_back(glm::vec2(chunkPosition.x, chunkPosition.y));
Hope that helps
When you have many threads accessing shared resources, those resources must be either read-only, atomic, or guarded with a mutex lock.
So, for your m_loadedChunks member variable, you would want to have it wrapped in a lock. For example:
class VoxelWorld
{
// your class members and more ...
private:
std::mutex m_loadedChunksMutex;
};
void VoxelWorld::generateChunk(glm::vec2 chunkPosition)
{
Chunk* generatedChunk = m_worldGenerator->generateChunk(chunkPosition);
generatedChunk->shader = m_chunkShader;
generatedChunk->generateRenderObject();
// NOTE: m_chunks is read by draw() as well, so this write needs a guard of its own
m_chunks[chunkPosition.x][chunkPosition.y] = generatedChunk;
{
std::lock_guard<std::mutex> scopedLock(m_loadedChunksMutex);
m_loadedChunks.push_back(glm::vec2(chunkPosition.x, chunkPosition.y));
}
}
The scopedLock blocks until it acquires the mutex, and when it goes out of scope the mutex is released automatically.
Now note that I have a mutex for m_loadedChunks and not a generic mutex covering all variables that may be accessed by threads. This is actually a good practice introduced by Herb Sutter in his "Effective Concurrency" courses and in his talks at CppCon.
So, for whatever shared variables you have, use the above example as one means to solve race issues.

Accessing and modifying automatic variables on another thread's stack

I want to pass some data around threads but want to refrain from using global variables if I can manage it. The way I wrote my thread routine has the user passing in a separate function for each "phase" of a thread's life cycle: For instance this would be a typical usage of spawning a thread:
void init_thread(void *arg) {
graphics_init();
}
void process_msg_thread(message *msg, void *arg) {
if (msg->ID == MESSAGE_DRAW) {
graphics_draw();
}
}
void cleanup_thread(void *arg) {
graphics_cleanup();
}
int main () {
threadCreator factory;
factory.createThread(init_thread, 0, process_msg_thread, 0, cleanup_thread, 0);
// even indexed arguments are the args to be passed into their respective functions
// each of those functions must have a fixed signature so that it can be passed to the factory this way
}
// Behind the scenes: in the newly spawned thread, the first argument given to
// createThread() is called, then a message pumping loop which will call the third
// argument is entered. Upon receiving a special exit message via another function
// of threadCreator, the fifth argument is called.
The most straightforward way to do it is using globals. I'd like to avoid that, though, because it is bad programming practice and generates clutter.
A certain problem arises when I try to refine my example slightly:
void init_thread(void *arg) {
GLuint tex_handle[50]; // suppose I've got 50 textures to deal with.
graphics_init(&tex_handle); // fill up the array with them during graphics init which loads my textures
}
void process_msg_thread(message *msg, void *arg) {
if (msg->ID == MESSAGE_DRAW) { // this message indicates which texture my thread was told to draw
graphics_draw_this_texture(tex_handle[msg->texturehandleindex]); // send back the handle so it knows what to draw
}
}
void cleanup_thread(void *arg) {
graphics_cleanup();
}
I am greatly simplifying the interaction with the graphics system here but you get the point. In this example code tex_handle is an automatic variable, and all its values are lost when init_thread completes, so will not be available when process_msg_thread needs to reference it.
I can fix this by using globals but that means I can't have (for instance) two of these threads simultaneously since they would trample on each other's texture handle list since they use the same one.
I can use thread-local globals but is that a good idea?
I came up with one last idea. I can allocate storage on the heap in my parent thread, and send a pointer to it to the children to mess with. Then I can just free it when the parent thread exits, since I intend for it to clean up its child threads before it exits anyway. So, something like this:
void init_thread(void *arg) {
GLuint *tex_handle = (GLuint*)arg; // my storage space passed as arg
graphics_init(tex_handle);
}
void process_msg_thread(message *msg, void *arg) {
GLuint *tex_handle = (GLuint*)arg; // same thing here
if (msg->ID == MESSAGE_DRAW) {
graphics_draw_this_texture(tex_handle[msg->texturehandleindex]);
}
}
int main () {
threadCreator factory;
GLuint *tex_handle = new GLuint[50];
factory.createThread(init_thread, tex_handle, process_msg_thread, tex_handle, cleanup_thread, 0);
// do stuff, wait etc
...
delete[] tex_handle;
}
This looks more or less safe because my values go on the heap, my main thread allocates it then lets children mess with it as they wish. The children can use the storage freely since the pointer was given to all the functions that need access.
So this got me thinking why not just have it be an automatic variable:
int main () {
threadCreator factory;
GLuint tex_handle[50];
factory.createThread(init_thread, &tex_handle, process_msg_thread, &tex_handle, cleanup_thread, 0);
// do stuff, wait etc
...
} // tex_handle automatically cleaned up at this point
This means the child threads directly access the parent's stack. I wonder if this is kosher.
I found this on the internet: http://software.intel.com/sites/products/documentation/hpc/inspectorxe/en-us/win/ug_docs/olh/common/Problem_Type__Potential_Privacy_Infringement.htm
It seems Intel Inspector XE detects this behavior. So maybe I shouldn't do it? Is it simply a warning of potential privacy infringement, as suggested by the URL, or are there other potential issues that I am not aware of?
P.S. After thinking through all this I realize that maybe this architecture of splitting a thread into a bunch of functions that get called independently wasn't such a great idea. My intention was to remove the complexity of requiring coding up a message handling loop for each thread that gets spawned. I had anticipated possible problems, and if I had a generalized thread implementation that always checked for messages (like my custom one that specifies the thread is to be terminated) then I could guarantee that some future user could not accidentally forget to check for that condition in each and every message loop of theirs.
The problem with my solution to that is that those individual functions are now separate and cannot communicate with each other. They may do so only via globals and thread local globals. I guess thread local globals may be my best option.
P.P.S. This got me thinking about RAII and how the concept of the thread at least as I have ended up representing it has a certain similarity with that of a resource. Maybe I could build an object that represents a thread more naturally than traditional ways... somehow. I think I will go sleep on it.
Put your thread functions into a class. Then they can communicate using instance variables. This requires your thread factory to be changed, but is the cleanest way to solve your problem.
Your idea of using automatic variables will work too, as long as you can guarantee that the function whose stack frame contains the data will never return before your child threads exit. This is not really easy to achieve; even after main() returns, child threads can still run.
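A rough sketch of the class-based suggestion, using std::thread rather than the question's threadCreator (whose interface is not shown). Everything here is an assumption made for illustration: the GraphicsThread name, unsigned in place of GLuint, fake handle values instead of real texture loading, and -1 as the exit message.

```cpp
#include <array>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// One object per thread: the three "phases" become member functions that
// share state through instance variables instead of globals.
class GraphicsThread {
public:
    GraphicsThread() : worker_(&GraphicsThread::run, this) {}
    ~GraphicsThread() { stop(); }
    GraphicsThread(const GraphicsThread&) = delete;
    GraphicsThread& operator=(const GraphicsThread&) = delete;

    // Asks the thread to "draw" the texture at this index (assumed to be 0..49).
    void post(int textureIndex) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            inbox_.push_back(textureIndex);
        }
        wake_.notify_one();
    }

    // Sends the exit message and waits for the thread to finish.
    void stop() {
        if (worker_.joinable()) {
            post(kExit);
            worker_.join();
        }
    }

    unsigned handleSum() const { return handleSum_.load(); }

private:
    enum { kExit = -1 };

    void init() { // stand-in for graphics_init: fills the handle table
        for (std::size_t i = 0; i < handles_.size(); ++i)
            handles_[i] = 100u + static_cast<unsigned>(i);
    }
    void process(int index) { handleSum_ += handles_[index]; } // stand-in for drawing
    void cleanup() { handles_.fill(0); }

    void run() {
        init();
        for (;;) {
            int msg;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return !inbox_.empty(); });
                msg = inbox_.front();
                inbox_.pop_front();
            }
            if (msg == kExit) break;
            process(msg);
        }
        cleanup();
    }

    std::array<unsigned, 50> handles_{}; // lives as long as the object, not a stack frame
    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<int> inbox_;
    std::atomic<unsigned> handleSum_{0};
    std::thread worker_; // declared last so the members above exist before run() starts
};
```

Because the destructor joins the thread, the handle table cannot disappear while the thread is still using it, which is the guarantee the automatic-variable version lacks. Two GraphicsThread objects each get their own table, so they do not trample on each other.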

Is it safe to modify data of pointer in vector from another thread?

Things seem to be working but I'm unsure if this is the best way to go about it.
Basically I have an object which does asynchronous retrieval of data. This object has a vector of pointers which are allocated and de-allocated on the main thread. Using boost functions a process results callback is bound with one of the pointers in this vector. When it fires it will be running on some arbitrary thread and modify the data of the pointer.
Now I have critical sections around the parts that push into the vector and erase from it, in case the async retrieval object receives more requests, but I'm wondering if I need some kind of guard in the callback that modifies the pointed-to data as well.
Hopefully this slimmed down pseudo code makes things more clear:
class CAsyncRetriever
{
// typedefs of boost functions
class DataObject
{
// methods and members
};
public:
// Start single asynch retrieve with completion callback
void Start(SomeArgs)
{
SetupRetrieve(SomeArgs);
LaunchRetrieves();
}
protected:
void SetupRetrieve(SomeArgs)
{
// ...
{ // scope for data lock
boost::lock_guard<boost::mutex> lock(m_dataMutex);
m_inProgress.push_back(SmartPtr<DataObject>(new DataObject));
m_callback = boost::bind(&CAsyncRetriever::ProcessResults, this, _1, m_inProgress.back());
}
// ...
}
void ProcessResults(DataObject* data)
{
// CALLED ON ANOTHER THREAD ... IS THIS SAFE?
data->m_SomeMember.SomeMethod();
data->m_SomeOtherMember = SomeStuff;
}
void Cleanup()
{
// ...
{ // scope for data lock
boost::lock_guard<boost::mutex> lock(m_dataMutex);
while(!m_inProgress.empty() && m_inProgress.front()->IsComplete())
m_inProgress.erase(m_inProgress.begin());
}
// ...
}
private:
std::vector<SmartPtr<DataObject>> m_inProgress;
boost::mutex m_dataMutex;
// other members
};
Edit: This is the actual code for the ProcessResults callback (plus comments for your benefit)
void ProcessResults(CRetrieveResults* pRetrieveResults, CRetData* data)
{
// pRetrieveResults is delayed binding that server passes in when invoking callback in thread pool
// data is raw pointer to ref counted object in vector of main thread (the DataObject* in question)
// if there was an error set the code on the atomic int in object
data->m_nErrorCode.Store_Release(pRetrieveResults->GetErrorCode());
// generic iterator of results bindings for generic storage class item
TPackedDataIterator<GenItem::CBind> dataItr(&pRetrieveResults->m_DataIter);
// namespace function which will iterate results and initialize generic storage
GenericStorage::InitializeItems<GenItem>(&data->m_items, dataItr, pRetrieveResults->m_nTotalResultsFound); // this is potentially time consuming depending on the amount of results and amount of columns that were bound in storage class definition (i.e.about 8 seconds for a million equipment items in release)
// atomic uint32_t that is incremented when kicking off async retrieve
m_nStarted.Decrement(); // this one is done processing
// boost function completion callback bound to interface that requested results
data->m_complete(data->m_items);
}
As it stands, it appears that the Cleanup code can destroy an object for which a callback to ProcessResults is in flight. That's going to cause problems when you deref the pointer in the callback.
My suggestion would be that you extend the semantics of your m_dataMutex to encompass the callback, though if the callback is long-running, or can happen inline within SetupRetrieve (sometimes this does happen - though here you state the callback is on a different thread, in which case you are OK) then things are more complex. Currently m_dataMutex is a bit confused about whether it controls access to the vector, or its contents, or both. With its scope clarified, ProcessResults could then be enhanced to verify validity of the payload within the lock.
No, it isn't safe.
ProcessResults operates on the data structure passed to it through DataObject. It indicates that you have shared state between different threads, and if both threads operate on the data structure concurrently you might have some trouble coming your way.
Updating an aligned pointer is usually atomic in practice on common hardware, but the C++ standard does not guarantee it for a plain pointer. To be sure, use std::atomic<T*> (portable since C++11) or InterlockedExchangePointer on Windows.
The only consideration then would be if one thread is using an obsolete pointer. Does the other thread delete the object pointed to by the original pointer? If so, you have a definite problem.