CreateThread's threadProc race condition


I need to spawn 4 threads that do basically the same thing, each with a different variable. So I call ::CreateThread 4 times, passing the same threadProc and 'this' as the parameter. In threadProc, I then need to pick the right variable to work with. I have a vector of objects, and I push an object into it immediately after each CreateThread call.
// at this point myVec has, say, 2 items
HANDLE hThread = ::CreateThread( NULL, 0, threadProc, (LPVOID)this, 0, NULL );
myVecObj.threadHandle = hThread;
myVec.push_back(myVecObj); // myVec.size() == 3 now
// threadProc is declared static in myClass
DWORD CALLBACK myClass::threadProc(LPVOID lpContext)
{
    myClass *pMyClass = (myClass *)lpContext;
    size_t vecCount = pMyClass->myVec.size(); // Is this 3??
    char * whatINeed = (char*)pMyClass->myVec[vecCount-1].whatINeed;
    // ... work with whatINeed ...
    return 0;
}
My doubt/question is how fast does the threadProc fire - could it beat the call to myVec.push_back()? Is this a race condition that I'm introducing here? I'm trying to make the assumption that when each threadProc starts (they start at different times, not one after the other), I can safely take the last object in the class' vector.

I need to spawn 4 threads, that basically do the same thing but with a different variable each. So I call ::CreateThread 4 times, giving the same threadProc and this as a parameter.
Now in threadProc, I need to pick the right variable to work with.
Why not pass the thread a pointer to the actual object it needs to act on?
I have a vector of objects, and I push into it the object immediately after each of the CreateThread call.
That is the wrong way to handle this. And yes, that is a race condition. Not only for the obvious reason - the thread might start running before the object is pushed - but also because any push into a vector can potentially reallocate the internal array of the vector, which would be very bad for threads that have already obtained a pointer to their data inside the vector. The data would move around in memory behind the thread's back.
To solve this, you need to do one of the following:
- Push all of the objects into the vector first, then start your threads. You can pass a pointer to each vector element to its respective thread. But this works only if you don't modify the vector anymore while any thread is running, for the reason stated above.
- Start the threads in a suspended state first (the CREATE_SUSPENDED creation flag), and then resume them (ResumeThread) after you have pushed all the objects into the vector. This also requires that you don't modify the vector anymore. It also means you will have to pass each thread an index to a vector element, rather than a pointer to the element.
- Get rid of the vector altogether (or at least change it to hold object pointers instead of actual objects). Dynamically allocate your objects using new and pass those pointers to each thread (and optionally to the vector) as needed. Let each thread delete its object before exiting (and optionally remove it from the vector, with proper synchronization).
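A minimal sketch of the first option, using std::thread as a portable stand-in for ::CreateThread (with CreateThread you would pass &items[i] as the lpParameter argument instead of 'this'). WorkItem, threadProc and runAll are hypothetical names for illustration, not the question's actual types. The point is that each thread receives a pointer to its own element and never reads the vector's size.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-thread work item; stands in for the question's myVecObj.
struct WorkItem {
    std::string whatINeed;
    std::size_t result = 0;
};

// Each thread gets a pointer to its own element, so it never has to guess
// which element is "the last one".
void threadProc(WorkItem* item) {
    item->result = item->whatINeed.size();
}

// Option 1: fill the vector completely, THEN start the threads. The items
// vector is not modified again until every thread has been joined, so the
// element pointers handed to the threads stay valid.
std::size_t runAll(const std::vector<std::string>& inputs) {
    std::vector<WorkItem> items;
    for (const auto& s : inputs) items.push_back(WorkItem{s});

    std::vector<std::thread> threads;   // separate vector: pushing here never moves items
    threads.reserve(items.size());
    for (auto& item : items) threads.emplace_back(threadProc, &item);

    for (auto& t : threads) t.join();

    std::size_t total = 0;
    for (const auto& item : items) total += item.result;
    return total;
}
```

For example, runAll({"a", "bb", "ccc", "dddd"}) starts four threads, each measuring its own string, and sums the results after joining them all.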
My doubt/question is how fast does the threadProc fire
That is entirely up to the OS scheduler to decide.
could it beat the call to myVec.push_back()?
Yes, that is a possibility.
Is this a race condition that I'm introducing here?
Yes.
I'm trying to make the assumption
Don't make assumptions!
that when each threadProc starts (they start at different times, not one after the other), I can safely take the last object in the class' vector.
That is not a safe assumption to make.

There is no synchronisation between the modification of myVec, i.e., the myVec.push_back() call, and reading the size of the vector in another thread. I realise that you aren't using standard threads, but applying the C++11 rules, there is a data race and the program has undefined behaviour.
Note that the data race isn't just theoretical: there is a fair chance that you see the modification happen after the read. Creating a thread may not be fast, but some implementations don't actually create OS-level threads; instead they keep a pool of threads around, which are used when apparently spawning a new thread.
In similar contexts I heard the excellent argument "... but it only happens once in a million times!". This particular issue would have happened on the 48 core machine about 10 times per second, assuming the estimate "once in a million" were correct.

Related

Are mutex locks necessary when modifying values?

I have an unordered_map and I'm using mutex locks for emplace, delete and find operations, but I don't use a mutex when modifying the map's elements, because I don't see any point. But I'm curious whether I'm wrong in this case.
Should I use one when modifying element value?
std::unordered_map<std::string, Connection> connections;
// Lock at Try_Emplace
connectionsMapMutex.lock();
auto [element, inserted] = connections.try_emplace(peer);
connectionsMapMutex.unlock();
// No locks here from now
auto& connection = element->second;
// Modifying Element
connection.foo = "bar";

Consider what can happen when you have one thread reading from the map and the other one writing to it:
1. Thread A starts executing the command std::string myLocalStr = element->second.foo;
2. As part of the above, the std::string copy-constructor starts executing: it stores foo's character-buffer pointer in a register, and starts dereferencing it to copy characters from the original string's buffer into myLocalStr's buffer.
3. Just then, thread A's quantum expires, and thread B gains control of the CPU and executes the command connection.foo = "some other string";
4. Thread B's assignment operator causes the std::string to deallocate its character buffer and allocate a new one to hold the new string.
5. Thread A then starts running again, and continues executing the std::string copy-constructor from step 2, but now the pointer it is dereferencing to read characters no longer points at valid data, because thread B deleted the buffer! Poof, undefined behavior is invoked, resulting in a crash (if you're lucky) or insidious data corruption (if you're unlucky, in which case you'll be spending several weeks trying to figure out why your program's data gets randomly corrupted only about once a month).
And note that the above scenario is just on a single-core CPU; on a multicore system there are even more ways for unsynchronized accesses to go wrong, since the CPUs have to co-ordinate their local and shared memory-caches correctly, which they won't know to do if there is no synchronization code included.
To sum up: neither std::unordered_map nor std::string is designed for unsynchronized multithreaded access, and if you try to get away with it you're likely to regret it later on.
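A minimal sketch of the fix implied above: hold the same mutex for the element modification as for the try_emplace, and hand readers a copy taken under the lock. ConnectionTable, setFoo and getFoo are hypothetical names, and Connection is reduced to the single foo member used in the question; std::lock_guard replaces the manual lock()/unlock() pair so the mutex is released even if an exception is thrown.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical Connection type with only the member used in the question.
struct Connection {
    std::string foo;
};

class ConnectionTable {
public:
    // Insert-if-missing AND modify while holding the lock for the whole operation.
    void setFoo(const std::string& peer, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto& connection = connections_.try_emplace(peer).first->second;
        connection.foo = value;   // still under the lock
    }

    // Return a copy so the caller never touches the map's contents unlocked.
    std::string getFoo(const std::string& peer) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = connections_.find(peer);
        return it == connections_.end() ? std::string() : it->second.foo;
    }

private:
    mutable std::mutex mutex_;    // mutable so const readers can lock it
    std::unordered_map<std::string, Connection> connections_;
};
```

Returning a copy rather than a reference is the design choice that matters here: a reference would let the caller read foo after the lock is released, recreating the scenario described above.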

Here's what I would do, if and only if I'm threading and there's a chance other threads are manipulating the list and its contents.
I would create a mutex lock when manipulating the list (which you've done) or when traversing the list.
And if I felt it was necessary to protect an individual item in the list (because you're calling methods on it), I'd give each item its own mutex. Two threads could then modify element A and element B simultaneously, while the per-item locks keep each item safe.
However, it's very rare I've had to be that careful.
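A minimal sketch of the per-item locking idea, under these assumptions: the map holds std::shared_ptr<Item> so an item stays alive even if another thread erases it from the map, the map-wide mutex is held only while finding or inserting an entry, and each Item carries its own mutex for its contents. Item, LockedMap, append and read are hypothetical names for illustration.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical item type: each item carries its own mutex.
struct Item {
    std::mutex m;
    std::string value;
};

class LockedMap {
public:
    // The map-wide mutex guards only the map structure (lookup/insert).
    std::shared_ptr<Item> get(const std::string& key) {
        std::lock_guard<std::mutex> lock(mapMutex_);
        auto& slot = items_[key];
        if (!slot) slot = std::make_shared<Item>();
        return slot;   // shared ownership keeps the item alive outside the lock
    }

    // The per-item mutex guards the item's contents, so different items can
    // be modified by different threads at the same time.
    void append(const std::string& key, const std::string& text) {
        std::shared_ptr<Item> item = get(key);
        std::lock_guard<std::mutex> lock(item->m);
        item->value += text;
    }

    // Note: read() creates an empty item if the key is missing.
    std::string read(const std::string& key) {
        std::shared_ptr<Item> item = get(key);
        std::lock_guard<std::mutex> lock(item->m);
        return item->value;
    }

private:
    std::mutex mapMutex_;
    std::unordered_map<std::string, std::shared_ptr<Item>> items_;
};
```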

What is the purpose of std::thread::swap?

From what I understand, std::thread::swap swaps the thread ids of two threads. It doesn't change thread local storage or affect execution of either of the threads in any way. What is the purpose behind it? How/where/why is it used?

Don't think of it as "swapping the thread IDs of two threads", think of it as swapping two thread variables:
std::thread a(...); // starts thread 1
std::thread b(...); // starts thread 2
a.swap(b); // or std::swap(a, b);
// now variable a points to thread 2
// and variable b points to thread 1
Most variables can be swapped with std::swap, even the ones that can't be copied. I'm not sure why they also created a swap member function for std::thread.
This is useful because you can, for example, sort a vector of threads. The sort algorithm will use swap to rearrange the threads in the vector. Remember that objects can't be moved in C++, only their contents. swap, and move semantics, allow you to move the contents of most objects into different objects. For thread, it moves the "pointer" to the thread.
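A small sketch of what swap actually exchanges: after a.swap(b), each std::thread variable reports the id the other one had, while the two running threads are untouched. The function name swapExchangesThreads is hypothetical, used only to package the check.

```cpp
#include <thread>
#include <utility>

// Returns true if, after swapping, each variable refers to the thread the
// other variable referred to before. The threads themselves are unaffected.
bool swapExchangesThreads() {
    std::thread a([] {});    // thread 1
    std::thread b([] {});    // thread 2
    const std::thread::id idA = a.get_id();
    const std::thread::id idB = b.get_id();

    a.swap(b);               // equivalent to std::swap(a, b)

    const bool exchanged = (a.get_id() == idB) && (b.get_id() == idA);
    a.join();                // joins thread 2
    b.join();                // joins thread 1
    return exchanged;
}
```

A finished but not-yet-joined std::thread still reports its id, which is why the comparison works even if both lambdas have already returned.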

std::thread is internally a referential type, similar in this respect to std::vector. Internally, std::vector also simply refers to a memory buffer stored in dynamic memory.
What you describe is how swapping works with such types. Swapping a vector merely swaps pointers to the buffers (as well as some bookkeeping information) - none of the stored objects are affected by the operation. These pointers are "handles" to a resource. Similarly, swapping threads swaps the internal thread id - which is a handle to the thread resource - without affecting the state of the referred threads.
As for purpose of swapping... It's simply a useful thing to do. It is most notably used in sorting and other re-ordering algorithms. I don't know of a particular reason to re-order thread objects, but neither do I know of a reason why that shouldn't be possible.

What happens to a thread in a vector when function execution ends?

I want to know more about std::thread, and specifically what will happen if I have a vector of threads, and one of the threads finishes executing.
Picture this example:
A vector of threads is created, which all execute the following function:
void function_test(char* flag)
{
while(*flag == 1) { // Do Something
}
}
'char* flag' points to a flag signalling the function to stop execution.
Say, for example, the vector contains 10 threads, which are all executing. Then the flag is set to zero for thread number 3 (the 4th thread in the vector, since vector indices start at zero).
Good practice is to then join the thread.
vector_of_threads[3].join();
How many std::threads will the vector now contain? Can I re-start the finished thread with the same function again, or even a different function?
The reason for my question is that I have a vector of threads, and sometimes they will be required to stop executing, and then execution "falls off the end" of the function.
One solution to restart that thread would (I assume, perhaps incorrectly?) be to erase that element from the vector and then insert a new thread, which will then begin executing. Is this correct, though? When a thread stops, is it still inside the vector? I assume it would be.
Edit
'function_test' is not allowed to modify any other function's flags. The flags are modified only by their own function and the calling function. (For the purposes of this, imagine the flag enables communication between main and the thread.)
Does this fix the data-race problem, or is it still an issue?

It's not specifically what you're asking about, but flag should be atomic<char>* or you have a data race, i.e. undefined behaviour. Also, if it only holds true or false I'd use atomic<bool>* and just test if (*flag).
As for your actual question:
How many std::threads will the vector now contain?
It will contain exactly the same number as it did previously. When the thread stops running it doesn't magically alter the vector to remove an element; it doesn't even know the vector exists! The only change visible in the main thread is that calling vector_of_threads[3].join() will not block and will return immediately, because the thread has already finished, so you don't have to wait to join it. After that join() call, vector_of_threads[3] is no longer "joinable" because it no longer represents a thread of execution (a thread that has finished but has not yet been joined is still joinable).
You could erase the joined std::thread from the vector and insert a new one, but another alternative is to assign another std::thread to it that represents a new thread of execution:
vector_of_threads[3] = std::thread(f, &flags[3]);
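Now vector_of_threads[3] represents a running thread and is "joinable" again. This assignment is only valid because the old thread was joined first: move-assigning to a std::thread that is still joinable calls std::terminate().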
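A sketch combining both points above, assuming std::atomic<bool> flags as suggested instead of char*: stop thread 3, join it, observe that the vector still holds the same number of std::thread objects, then reuse the slot by move-assigning a new thread. restartDemo is a hypothetical wrapper that returns true if all three observations hold.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Loops until its own flag is cleared; std::atomic avoids the data race a
// plain char* flag would have.
void function_test(std::atomic<bool>* flag) {
    while (flag->load()) {
        std::this_thread::yield();   // "Do Something" would go here
    }
}

bool restartDemo() {
    constexpr std::size_t n = 4;
    std::atomic<bool> flags[n];
    for (auto& f : flags) f.store(true);

    std::vector<std::thread> threads;
    for (std::size_t k = 0; k < n; ++k)
        threads.emplace_back(function_test, &flags[k]);

    // Stop and join thread 3: the vector still holds n std::thread objects,
    // but threads[3] no longer represents a thread of execution.
    flags[3].store(false);
    threads[3].join();
    const bool sizeUnchanged = threads.size() == n;
    const bool notJoinable = !threads[3].joinable();

    // Reuse the slot. Allowed only because the old thread was joined.
    flags[3].store(true);
    threads[3] = std::thread(function_test, &flags[3]);
    const bool joinableAgain = threads[3].joinable();

    for (auto& f : flags) f.store(false);
    for (auto& t : threads) t.join();
    return sizeUnchanged && notJoinable && joinableAgain;
}
```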

Preventing variables from going out of scope so they persist for another thread

I have a function that creates a bunch of local variables, then passes their addresses to a function that runs in a separate thread - something like this:
void MyFunction()
{
MyClass a;
AnotherClass b;
...
FinalClass z;
CallFunctionInNewThread(&a,&b,&c,...,&z);
}
Of course, these variables are destroyed when MyFunction returns (so the function running in the thread is now pointing to garbage), so this setup doesn't work. What are my options here? If I allocate the variables on the heap with 'new', I will never get a chance to delete them. If I make them smart pointers or similar, I'd have to make the threaded function accept them as smart pointers, or their reference count will not be increased, so they will still get destroyed immediately. It seems like they kind of want to be member variables of a wrapper class of MyFunction, but there are a few hundred lines and tens of these things, and that would just be crazy messy. Are there any other choices?

Are there any other choices?
Simply copy the data (if it is trivial) or move/swap it (if it is heavy to create); this amounts to transferring ownership from one thread to the other. From the description, the new thread doesn't really seem to need a reference. Bonus: this removes concurrent-access complexity from your program.
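A minimal sketch of the copy/move approach with std::thread, whose constructor copies or moves its arguments into storage owned by the new thread. MyClass and AnotherClass are hypothetical stand-ins for the question's locals, and the out pointer (also hypothetical) must refer to something that outlives the thread.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the question's local objects.
struct MyClass { std::string name; };
struct AnotherClass { std::vector<int> values; };

// Takes its inputs BY VALUE: the thread owns them, so nothing it uses lives
// on MyFunction's stack.
void worker(MyClass a, AnotherClass b, std::size_t* out) {
    std::size_t total = a.name.size();
    for (int v : b.values) total += static_cast<std::size_t>(v);
    *out = total;
}

std::thread MyFunction(std::size_t* out) {
    MyClass a{"abc"};
    AnotherClass b{{1, 2, 3}};
    // std::move hands the heavy contents over instead of copying them;
    // a and b may be destroyed safely as soon as this returns.
    return std::thread(worker, std::move(a), std::move(b), out);
}
```

Returning the std::thread lets the caller decide when to join it.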

One little trick you can do is to pass a semaphore object into the thread function and then wait for that semaphore to be signaled. You do need to check that the thread was created successfully.
The new thread first makes local copies of the values (or references in the case of smart pointers), then signals the semaphore and carries on.
The calling thread can then continue and drop those objects off its stack without interfering with your new thread. It can even delete the semaphore object since it is no longer required by either thread.
It does mean that the calling thread has to wait until the thread is started and has copied its data, but that probably will be a short time. If you are going to the effort of spawning a thread to do any work at all, then this slight delay in the parent thread ought to be acceptable.
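A minimal sketch of the same handshake, swapping in a one-shot std::promise<void>/std::future<void> pair for the semaphore (a plain substitution, not the answer's exact mechanism). The promise is moved into the thread so the caller never has to keep it alive; Params, startWorker and callerWithLocals are hypothetical names.

```cpp
#include <future>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the caller's stack objects.
struct Params {
    std::string name;
    std::vector<int> values;
};

// The worker reads *p only until it has made its own copy, then signals.
std::thread startWorker(const Params* p, int* result) {
    std::promise<void> copied;
    std::future<void> copiedSignal = copied.get_future();

    std::thread worker([p, result, done = std::move(copied)]() mutable {
        Params local = *p;   // copy while the caller is still blocked
        done.set_value();    // from here on the caller may destroy *p
        int sum = 0;
        for (int v : local.values) sum += v;
        *result = static_cast<int>(local.name.size()) + sum;
    });

    copiedSignal.wait();     // block until the worker has its copy
    return worker;
}

std::thread callerWithLocals(int* result) {
    Params a{"abc", {1, 2, 3}};   // destroyed when this function returns
    return startWorker(&a, result);
}
```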

creating more than 1000 threads using pthread_create()

I'm trying to create 1000 threads using the pthread_create() function.
This is the statement I'm using:
for (int i=0 ; i <1000; i++)
{
retValue = pthread_create(&threadId, NULL, simplethreadFunction, NULL);
}
Every time this for-loop runs, does it create a new thread?
This is a simple thing. But I'm unable to understand it.

Every time this for-loop runs, does it create a new thread?
Yes, it does.
This is a simple thing. But I'm unable to understand it.
I will add a few more points:
The first parameter of pthread_create is a pointer to a pthread_t. Basically you are passing an address to this function, which the function uses to store 'something'.
When the function creates a thread, it generates an 'opaque, unique identifier' for that thread and stores it in the pthread_t that your pointer points to, so that you can access it later if required.
If you pass the same pointer all 1000 times, you will only have the identifier of one thread (the last one created) out of all 1000, because each call overwrites the previous value.
You need this unique value if you want to perform further operations on a thread (such as joining it).
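A minimal sketch that keeps every identifier in its own slot, so each thread can be joined afterwards, and stops early if pthread_create reports an error (for example when a resource limit is reached). createAndJoin and g_started are hypothetical names; simplethreadFunction reuses the question's name with the signature pthread_create expects.

```cpp
#include <pthread.h>
#include <atomic>
#include <vector>

static std::atomic<int> g_started{0};

// Thread function with the signature pthread_create expects.
void* simplethreadFunction(void*) {
    g_started.fetch_add(1);
    return nullptr;
}

// Creates up to n threads, each with its own pthread_t slot, then joins them
// all. Returns how many threads were actually created.
int createAndJoin(int n) {
    std::vector<pthread_t> ids(n);
    int created = 0;
    for (int i = 0; i < n; ++i) {
        if (pthread_create(&ids[created], nullptr, simplethreadFunction, nullptr) != 0)
            break;   // nonzero return is an error code, e.g. EAGAIN
        ++created;
    }
    for (int i = 0; i < created; ++i) pthread_join(ids[i], nullptr);
    return created;
}
```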
For details about this function and other thread-related functions, see the pthread_create documentation and the related man pages.
Don't forget to call pthread_exit() in main (or join the threads there); otherwise the whole program, including the created threads, may terminate before all your threads have finished.
Regarding the timing: as far as I can tell, reusing the same pthread_t does not affect how long thread creation takes; it just makes the threads you have created less usable. Also, the time you are measuring is not THE time for creating 1000 threads; it will depend on a lot of other factors, such as the platform and implementation.
