Best way to copy array(float) to local array(float) - C++

Okay, so I have a structure which is continuously updated in a separate thread.
Now I need some of these variables locally somewhere, without them changing in between.
I first did this to get them locally, which obviously isn't the best method, but it worked.
float MyFloatArray[3];
MyFloatArray[0] = otherThread()->floatArray[0];
MyFloatArray[1] = otherThread()->floatArray[1];
MyFloatArray[2] = otherThread()->floatArray[2];
Now I was wondering if there is a better way to do this.
I already tried the following:
float MyFloatArray = otherThread()->floatArray; // doesn't compile: an array can't be assigned to a single float
float* MyFloatArray = otherThread()->floatArray; // compiles, but it still points at the other thread's array, so the values keep changing (obviously), which shouldn't happen
Since I have a decently big project, it'll be a lot of work to update all of these to std::array<float, 3>.
Is there any alternative? Otherwise I will update all my float arrays to std::array<float, 3>, since it's a lot cleaner, if there is no alternative.

You could simply call std::copy, making sure the copy is guarded by a synchronisation mechanism such as a mutex. For example:
std::mutex m; // otherThread() must lock this mutex when modifying array
{
std::lock_guard<std::mutex> lock(m);
std::copy(otherThread()->floatArray, otherThread()->floatArray + 3, MyFloatArray);
}
or use a copyable type, such as std::array<float, 3> and use assignment. Again, this has to be protected with a synchronisation mechanism:
std::mutex m; // otherThread() must lock this mutex when modifying array
{
std::lock_guard<std::mutex> lock(m);
MyFloatArray = otherThread()->floatArray;
}

What you need is an atomic copy operation. Unfortunately, that doesn't exist for entire structures, so you will have to use a mutex to lock accesses to the structure for the duration of your copy operation (and, in the other thread, for the duration of modifications to the structure).
Then you can either stick with your element-wise assignment, or switch to std::copy; it doesn't really matter. Fundamentally the latter is still going to compile down to an element-wise assignment. No matter what syntax you use, your CPU still has to copy a series of bytes and it cannot do that in a single, atomic operation. But as long as your reads and writes to the structure are protected by a mutex, you'll be fine.
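To make that concrete, here is a minimal self-contained sketch of the pattern both answers describe; SharedData, g_shared, writerUpdate and takeSnapshot are hypothetical names, not from the question:
#include <algorithm>
#include <mutex>

struct SharedData // stand-in for the structure the other thread updates
{
    std::mutex m;
    float floatArray[3] = {0.0f, 0.0f, 0.0f};
};

SharedData g_shared;

void writerUpdate(float a, float b, float c) // runs on the producing thread
{
    std::lock_guard<std::mutex> lock(g_shared.m);
    g_shared.floatArray[0] = a;
    g_shared.floatArray[1] = b;
    g_shared.floatArray[2] = c;
}

void takeSnapshot(float (&local)[3]) // runs on the consuming thread
{
    std::lock_guard<std::mutex> lock(g_shared.m); // same mutex as the writer
    std::copy(g_shared.floatArray, g_shared.floatArray + 3, local);
}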

Related

How to avoid destroying and recreating threads inside loop?

I have a loop that creates and uses two threads. The threads always do the same thing and I'm wondering how they can be reused instead of created and destroyed each iteration. Some other operations are done inside the loop that affect the data the threads process. Here is a simplified example:
const int args1 = foo1();
const int args2 = foo2();
vector<string> myVec = populateVector();
int a = 1;
for(int i = 0; i < 100; i++)
{
auto func = [&](const int arg){
//do stuff involving variables a and arg
foo3(myVec[a]);
};
thread t1(func, args1);
thread t2(func, args2);
t1.join();
t2.join();
a = 2 * a;
}
Is there a way to have t1 and t2 restart? Is there a design pattern I should look into? I ask because adding threads made the program slightly slower when I thought it would be faster.
You can use std::async as suggested in the comments.
What you're also trying to do is a very common use case for a thread pool. A simple header-only implementation that I commonly use is here.
To use this library, create the pool outside of the loop, with the number of threads set at construction. Then enqueue a function for a thread to go off and execute. With this library you'll be getting back a std::future (much like with std::async), and that is what you'd wait on in your loop.
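For the std::async route, a hedged sketch of the loop above; the foo1/foo2/foo3 and populateVector declarations are assumed to match the question's usage. Note that std::async is not guaranteed to reuse threads, but it removes the manual create/join boilerplate:
#include <future>
#include <string>
#include <vector>

// Assumed signatures for the asker's helpers.
int foo1();
int foo2();
void foo3(const std::string&);
std::vector<std::string> populateVector();

void runLoop()
{
    const int args1 = foo1();
    const int args2 = foo2();
    std::vector<std::string> myVec = populateVector();
    int a = 1;
    auto func = [&](int arg) {
        // do stuff involving a and arg
        foo3(myVec[a]);
    };
    for (int i = 0; i < 100; i++) {
        auto f1 = std::async(std::launch::async, func, args1);
        auto f2 = std::async(std::launch::async, func, args2);
        f1.get(); // wait for both tasks before mutating shared state
        f2.get();
        a = 2 * a;
    }
}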
Generally, you'd want to make access to any shared data thread-safe with mutexes (or other means; there are a lot of ways to do this), but under very specific situations you won't need to.
In this case, so long as the vector isn't being increased in size (so it never needs to reallocate), and each item is either only read or only modified by the single thread that owns it, you wouldn't need to worry about synchronization.
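A minimal sketch of that exception: the vector is sized up front so it never reallocates, and each thread writes only its own element:
#include <string>
#include <thread>
#include <vector>

int main()
{
    std::vector<std::string> results(2); // fixed size: no reallocation afterwards
    std::thread t1([&]{ results[0] = "from t1"; }); // each thread touches
    std::thread t2([&]{ results[1] = "from t2"; }); // a distinct element only
    t1.join();
    t2.join();
}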
Though it's just good habit to do the sync anyway: when other people eventually modify the code, they're not going to know your rules and will cause issues.

C++ multithreading application crashes

I'm programming a simple 3D rendering engine just to get more familiar with C++. Today I took my first steps with multithreading and already have a problem I cannot wrap my head around. When the application starts it generates a small, Minecraft-like terrain consisting of cubes. They're generated within the main thread.
Now I want to generate more chunks from a separate thread:
void VoxelWorld::generateChunk(glm::vec2 chunkPosition) {
Chunk* generatedChunk = m_worldGenerator->generateChunk(chunkPosition);
generatedChunk->shader = m_chunkShader;
generatedChunk->generateRenderObject();
m_chunks[chunkPosition.x][chunkPosition.y] = generatedChunk;
m_loadedChunks.push_back(glm::vec2(chunkPosition.x, chunkPosition.y));
}
void VoxelWorld::generateChunkThreaded(glm::vec2 chunkPosition) {
std::thread chunkThread(&VoxelWorld::generateChunk, this, chunkPosition);
chunkThread.detach();
}
void VoxelWorld::draw() {
for(glm::vec2& vec : m_loadedChunks){
Transformation* transformation = new Transformation();
transformation->getPosition().setPosition(glm::vec3(CHUNK_WIDTH*vec.x, 0, CHUNK_WIDTH*vec.y));
m_chunks[vec.x][vec.y]->getRenderObject()->draw(transformation);
delete(transformation); //TODO: Find a better way
}
}
I have my member function (everything is non-static) generateChunk() which generates a Chunk and stores it in the VoxelWorld class. I have a 2D std::map<..> m_chunks which stores every chunk and a std::vector<glm::vec2> m_loadedChunks which stores the positions of the generated chunks.
Calling generateChunk() works fine as expected. But when I try generateChunkThreaded() the application crashes! I tried commenting out the last line of generateChunk(); then it does not crash. That's what confuses me so much! m_loadedChunks is just a regular std::vector. I tried making it public, with no effect. Is there anything obvious I'm missing?
You are accessing m_loadedChunks from several threads without synchronizing it.
You need to lock the usage of shared resources. So, a few tips here.
Declare a mutex as a member of the class
std::mutex mtx; // mutex for critical section
Use it to lock via a critical section each time you want to access the elements
std::lock_guard<std::mutex> lock(mtx);
m_chunks[chunkPosition.x][chunkPosition.y] = generatedChunk;
m_loadedChunks.push_back(glm::vec2(chunkPosition.x, chunkPosition.y));
Hope that helps
When you have many threads accessing shared resources, you either have those resources available as read-only, atomic, or guarded with a mutex lock.
So, for your m_loadedChunks member variable, you would want to have it wrapped in a lock. For example:
class VoxelWorld
{
// your class members and more ...
private:
std::mutex m_loadedChunksMutex;
};
void VoxelWorld::generateChunk(glm::vec2 chunkPosition)
{
Chunk* generatedChunk = m_worldGenerator->generateChunk(chunkPosition);
generatedChunk->shader = m_chunkShader;
generatedChunk->generateRenderObject();
m_chunks[chunkPosition.x][chunkPosition.y] = generatedChunk;
{
std::lock_guard<std::mutex> scopedLock(m_loadedChunksMutex);
m_loadedChunks.push_back(glm::vec2(chunkPosition.x, chunkPosition.y));
}
}
The scopedLock will automatically wait for a lock and when the code goes out of scope, the lock will be released.
Now note that I have a mutex for m_loadedChunks, and not a generic mutex covering all variables that may be accessed by threads. This is actually a good practice introduced by Herb Sutter in his "Effective Concurrency" courses and in his talks at CppCon.
So, for whatever shared variables you have, use the above example as one means to solve race issues.
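For instance, the reading side should take the same lock; a hedged sketch of draw() under that scheme (rendering details elided):
void VoxelWorld::draw()
{
    std::lock_guard<std::mutex> scopedLock(m_loadedChunksMutex);
    for (glm::vec2& vec : m_loadedChunks)
    {
        // ... render each loaded chunk as in the original draw() ...
    }
}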

Can I lock multiple variables simultaneously?

I'm asking a question about multithreading.
Say I have two global vectors,
std::vector<MyClass1*> vec1
and
std::vector<MyClass2*> vec2.
In addition, I have a total of 4 threads which have access to vec1 and vec2. Can I write code as follows?
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void* thread_func(void*)
// this is the function that will be executed by a thread
{
MyClass1* myObj1 = someFunction1();
MyClass2* myObj2 = someFunction2();
// I want to push back vec1, then push back vec2 in an atomic way
pthread_mutex_lock(&mutex);
vec1.push_back(myObj1);
vec2.push_back(myObj2);
pthread_mutex_unlock(&mutex);
return NULL;
}
for(int i=0; i<4; i++)
{
pthread_t tid;
pthread_create(&tid, NULL, thread_func, NULL);
}
What I want to do is that, I want to perform push_back on vec1 followed by push_back on vec2.
I'm a newbie and I have a feeling that one can only lock one variable with a mutex. In other words, one can only put either vec1.push_back(myObj1) or vec2.push_back(myObj2) between the pthread_mutex_lock and pthread_mutex_unlock calls.
I don't know if my code above is correct or not. Can someone correct me if I'm wrong?
Your code is correct. The mutex is the thing being locked, not the variable(s). You lock the mutex to protect a piece of code from being executed by more than one thread, most commonly this is to protect data but in general it's really guarding a section of code.
Yes, you can write it like this, but there are a few techniques you should definitely consider:
Scoped lock pattern, for exception safety and better robustness in general.
This is nicely explained in this answer.
Avoid globals, to let the optimizer work smarter for you. Try to group data into logical classes and implement locking inside their methods. A smaller scope for variables also gives you better extensibility.
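To illustrate both points, here is a minimal sketch (PairStore and pushBoth are hypothetical names, not from the question) that groups the two vectors into one class and does the locking inside its method:
#include <mutex>
#include <vector>

class MyClass1;
class MyClass2;

class PairStore
{
public:
    // Pushing to both vectors under one scoped lock keeps the pair atomic
    // and releases the mutex automatically, even if push_back throws.
    void pushBoth(MyClass1* obj1, MyClass2* obj2)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_vec1.push_back(obj1);
        m_vec2.push_back(obj2);
    }
private:
    std::mutex m_mutex;
    std::vector<MyClass1*> m_vec1;
    std::vector<MyClass2*> m_vec2;
};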

Proper compiler intrinsics for double-checked locking?

When implementing double-checked locking for initialization, what is the proper way to do the memory and/or compiler barriers?
Something like std::call_once isn't what I want; it's way too slow. It's typically just implemented on top of pthread_mutex_lock or EnterCriticalSection, depending on the OS.
In my programs, I often run into initialization cases where the initialization is safe to repeat, as long as exactly one thread gets to set the final pointer. If another thread beats it to setting the final pointer to the singleton object, it deletes what it created and makes use of the other thread's. I also often use this in cases where it doesn't matter which thread "wins" because they all come up with the same result.
Here's an unsafe, overly-contrived example, using Visual C++ intrinsics:
MyClass *GetGlobalMyClass()
{
static MyClass *const UNSET_POINTER = reinterpret_cast<MyClass *>(
static_cast<intptr_t>(-1));
static MyClass *volatile s_object = UNSET_POINTER;
if (s_object == UNSET_POINTER)
{
MyClass *newObject = MyClass::Create();
if (_InterlockedCompareExchangePointer(reinterpret_cast<void *volatile *>(&s_object),
newObject, UNSET_POINTER) != UNSET_POINTER)
{
// Another thread beat us. If Create didn't return null, destroy.
if (newObject)
{
newObject->Destroy(); // calls "delete this;", presumably
}
}
}
return s_object;
}
On a weakly-ordered memory architecture, my understanding is that it's possible that the new value of s_object is visible to other threads before other variables written inside MyClass::Create or MyClass::MyClass are visible. Also, the compiler itself could arrange the code this way in the absence of a compiler barrier (in Visual C++, _WriteBarrier, but _InterlockedCompareExchange acts as a barrier).
Do I need something like a store-fence intrinsic in there in order to ensure that MyClass's variables are visible to all threads before s_object becomes something besides -1?
Fortunately, the rules in C++ are very simple:
If there is a data race, the behaviour is undefined.
In your code the data race is caused by the following read, which conflicts with the write operation in _InterlockedCompareExchangePointer:
if (s_object == UNSET_POINTER)
A thread-safe solution without blocking might look as follows. Note that on x86 a load operation with sequential consistency has basically no overhead compared to a regular load operation. If you care about other architectures, you can also use acquire/release instead of sequential consistency.
static std::atomic<MyClass*> s_object{nullptr};
MyClass* GetGlobalMyClass()
{
MyClass* o = s_object.load(std::memory_order_seq_cst);
if (o == nullptr) {
o = new MyClass{...};
MyClass* expected = nullptr;
if (!s_object.compare_exchange_strong(expected, o, std::memory_order_seq_cst)) {
// Another thread beat us; keep its object and delete ours.
delete o;
o = expected;
}
}
return o;
}
With a proper C++11 implementation, any function-local static variable will be constructed in a thread-safe fashion by the first thread passing through its declaration.
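A minimal sketch of that alternative, assuming the singleton only needs default construction:
MyClass &GetGlobalMyClass()
{
    // C++11 "magic statics": the runtime guarantees this object is
    // constructed exactly once, even if several threads arrive concurrently.
    static MyClass s_object;
    return s_object;
}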

Do I need to use volatile keyword if I declare a variable between mutexes and return it?

Let's say I have the following function.
std::mutex mutex;
int getNumber()
{
mutex.lock();
int size = someVector.size();
mutex.unlock();
return size;
}
Is this a place to use the volatile keyword while declaring size? Will return value optimization or something else break this code if I don't use volatile? The size of someVector can be changed from any of the numerous threads the program has, and it is assumed that only one thread (other than the modifiers) calls getNumber().
No. But beware that the size may not reflect the actual size AFTER the mutex is released.
Edit: If you need to do some work that relies on the size being correct, you will need to wrap that whole task with a mutex.
Assuming the mutex variable is an std::mutex (or something similar meant to guarantee mutual exclusion), the compiler is prevented from performing a lot of optimizations across the lock and unlock calls. So you don't need to worry about return value optimization or some other optimization causing the size() query to be performed outside of the mutex block.
However, as soon as the mutex lock is released, another waiting thread is free to access the vector and possibly mutate it, thus changing the size. Now, the number returned by your function is outdated. As Mats Petersson mentions in his answer, if this is an issue, then the mutex lock needs to be acquired by the caller of getNumber(), and held until the caller is done using the result. This will ensure that the vector's size does not change during the operation.
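A hedged sketch of that caller-side approach, reusing the mutex above with a hypothetical doWorkThatDependsOn():
mutex.lock();
int size = someVector.size();
doWorkThatDependsOn(someVector, size); // hypothetical work; the size stays valid while the lock is held
mutex.unlock();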
Explicitly calling mutex::lock followed by mutex::unlock quickly becomes unfeasible for more complicated functions involving exceptions, multiple return statements etc. A much easier alternative is to use std::lock_guard to acquire the mutex lock.
int getNumber()
{
std::lock_guard<std::mutex> l(mutex); // lock is acquired
int size = someVector.size();
return size;
} // lock is released automatically when l goes out of scope
volatile is a keyword that you use to tell the compiler to actually perform every read and write of the variable, and not to optimize any of them away. Here is an example:
int example_function() {
int a;
volatile int b;
a = 1; // this is ignored because nothing reads it before it is assigned again
a = 2; // same here
a = 3; // this is the last one, so a write takes place
b = 1; // b gets written here, because b is volatile
b = 2; // and again
b = 3; // and again
return a + b;
}
What is the real use of this? I've seen it in delay functions (keep the CPU busy for a bit by making it count up to a number) and in systems where several threads might look at the same variable. It can sometimes help a bit with multi-threaded things, but it isn't really a threading tool and is certainly not a silver bullet.