Windows Threads: when should you use InterlockedExchangeAdd()?

The name of this function suggests that something complicated is going on. When exactly does one know that this is the way to go instead of doing something like this:
Preparation
CRITICAL_SECTION cs;
int *p = (int *)malloc(sizeof(int)); // Allocation Site
InitializeCriticalSection(&cs); // HINT for first Write
Thread #1
{
    *p = 1; // First Write
}
Thread #2
{
    EnterCriticalSection(&cs);
    *p = 2; // Second Write
    LeaveCriticalSection(&cs);
}
I have a write that gets done in one thread:
Run()
{
    // some code
    m_bIsTerminated = TRUE;
    // some more code
}
Then, I have a read that gets done in another thread (potentially at the same time):
Terminate()
{
    // some code
    if( m_bIsTerminated )
    {
        m_dwThreadId = 0;
        m_hThread = NULL;
        m_evExit.SetEvent();
        return;
    }
    // even more code
}
What's the best solution to solve this race condition? Are critical sections the way to go or is the use of InterlockedExchangeAdd() more useful?

In your case, there's no race condition. The variable is never reset back to FALSE, is it? It's just a "please die" switch for the thread, right? Then no need for synchronization of any kind.
The InterlockedXXX family of functions makes use of the Intel CPU's atomic instructions (XADD and CMPXCHG). So they're much cheaper than a critical section. And the one you want for thread-safe assignment is InterlockedCompareExchange().
UPD: and mark the variable as volatile.

InterlockedExchangeAdd is used to add a value to an integer as an atomic operation, meaning that you won't have to use a critical section. This also removes the risk of a deadlock if one of your threads throws an exception - you need to make sure that you don't keep any lock of any kind as that would prevent other threads from acquiring that lock.
For your scenario you can definitely use an Interlocked... function, but I would use an event (CreateEvent, SetEvent, WaitForSingleObject), probably because I often find myself needing to wait for more than one object (you can wait for zero seconds in your scenario).
Upd: Using volatile for the variable may work, however it isn't recommended, see: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html and http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/tags/c%2B%2B0x for instance.
If you want to be portable, take a look at boost::thread.
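To make the atomic-add idea concrete, here is a minimal portable sketch. It uses std::atomic<long>::fetch_add from C++11 as a stand-in for InterlockedExchangeAdd (both add to a shared integer atomically and return the previous value). The helper name add_concurrently and its parameters are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads add to one shared counter without a lock.
long add_concurrently(int thread_count, int increments_per_thread)
{
    std::atomic<long> total{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t)
    {
        workers.emplace_back([&total, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i)
                total.fetch_add(1); // atomic read-modify-write, returns the old value
        });
    }
    for (std::thread& w : workers)
        w.join();
    // Every increment is indivisible, so none can be lost.
    assert(total.load() == static_cast<long>(thread_count) * increments_per_thread);
    return total.load();
}
```

With a plain long and ++total the final value could come out lower, because two threads may read the same old value before either writes back.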

Make sure m_bIsTerminated is marked as volatile, and you should be ok. Although it seems pretty weird to me that you'd run // some more code after setting "is terminated" to true. What exactly does that variable indicate?
Your "race condition" is that your various elements of // more code can execute in different orders. Your variable doesn't help that. Is your goal to get them to execute in a deterministic order? If yes, you'd need a condition variable to wait on one thread and set in another. If you just don't want them executing concurrently, a critical section would be fine.

Related

Join threads created recursively

I have a function that basically fetches data from a database, then parses this data and fetches other data on which it depends, and so on...
The function is thus recursive, and I want to use multithreading to do so.
To simplify the problem, I just wrote a dummy program to express the "spirit" of the function:
void DummyFunction(std::vector<std::thread>& threads, int& i)
{
    ++i;
    if (i < 10)
        threads.push_back(std::thread([&]() { DummyFunction(threads, i); }));
}
int main()
{
    std::vector<std::thread> threads;
    int i = 0;
    DummyFunction(threads, i);
    // Coming here, "DummyFunction" is still running and potentially creating new threads
    // Issue is thus we may enter the for loop when we still don't have the actual number of threads created
    for (std::thread& thread : threads)
    {
        thread.join();
    }
}
The issue comes from the need to wait for all the threads to finish running before going any further (hence the for loop to join the threads). But of course, since the "DummyFunction" is still running, new threads can be created and so this way it can't work...
The question is, how can I design such a thing properly (if there is a way...)? Can we actually use multithreading recursively?

If you have C++20 available consider using the new thread that automatically joins on destruction. It goes by the name jthread and will save you all the trouble from having to manually join threads.

Try a thought experiment: add an else clause to your if statement:
if (i < 10)
{
    threads.push_back(std::thread([&]() { DummyFunction(threads, i); }));
}
else
{
    // do something here
}
Once you make that change, a few minutes' worth of thinking will reach the following conclusion: the "do something here" part gets executed exactly once, in one of the execution threads, after all of the execution threads get created.
Now, the solution should be very obvious:
Add a mutex, a condition variable, and a boolean flag. You can either make them global; pass them as additional parameters into DummyFunction, or, better yet: turn your threads vector into its own class containing the vector, the mutex, the condition variable, and the boolean flag, and pass that in recursively instead of just the vector.
main() locks the mutex, clears the boolean flag, and after DummyFunction() returns it waits on the condition variable until the boolean flag is set.
The "do something here" part locks the same mutex, sets the boolean flag, signals the condition variable, and unlocks the mutex.
Once you reach this point, you will also suddenly realize one more thing: as is, you have different execution threads all attempting to push_back something into the same vector. Vectors are not thread-safe, so this is undefined behavior. Therefore, you will also need to implement a separate mutex (or reuse the existing one, this looks eminently possible to me) to also lock the access to the vector.
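The steps above can be sketched roughly as follows. This is only one possible arrangement, not necessarily what the answer had in mind: the struct name SharedState and the function RunAll are hypothetical, the shared counter i is moved into the struct so it is guarded by the same mutex, and the waiting side takes the lock only after the first DummyFunction call has returned, so a plain (non-recursive) mutex is enough.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical bundle of the vector, the mutex, the condition variable and the flag.
struct SharedState
{
    std::mutex mutex;
    std::condition_variable done_cv;
    bool done = false;                // set by the last recursive call
    std::vector<std::thread> threads; // guarded by mutex
    int i = 0;                        // guarded by mutex
};

void DummyFunction(SharedState& state)
{
    std::lock_guard<std::mutex> lock(state.mutex);
    ++state.i;
    if (state.i < 10)
    {
        // The new thread blocks on the mutex until this call releases it.
        state.threads.push_back(std::thread([&state] { DummyFunction(state); }));
    }
    else
    {
        // The "do something here" branch: runs exactly once, after the last spawn.
        state.done = true;
        state.done_cv.notify_one();
    }
}

// Plays the role of main(); returns how many threads were created.
int RunAll()
{
    SharedState state;
    DummyFunction(state);
    {
        std::unique_lock<std::mutex> lock(state.mutex);
        state.done_cv.wait(lock, [&state] { return state.done; });
    }
    // No thread is created once done is set, so the vector no longer changes.
    for (std::thread& t : state.threads)
        t.join();
    assert(state.i == 10);
    return static_cast<int>(state.threads.size());
}
```

Holding the mutex for the whole body serializes the recursion, which is fine for a dummy but would defeat the purpose if the real work were done under the lock; in real code only the counter update and the push_back need to be protected.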

Is mutex mandatory to access extern variable from a different thread?

I am developing an application in Qt/C++. At some point, there are two threads : one is the UI thread and the other one is the background thread. I have to do some operation from the background thread based on the value of an extern variable which is type of bool. I am setting this value by clicking a button on UI.
header.cpp
extern bool globalVar;
mainWindow.cpp
//main ui thread on button click
void setValue(bool val){
    globalVar = val;
}
backgroundThread.cpp
while(1){
    if(globalVar)
        //do some operation
    else
        //do some other operation
}
Here, writing to globalVar happens only when the user clicks the button whereas reading happens continuously.
So my question is :
In a situation like the one above, is mutex mandatory?
If read and write happens at the same time, does this cause the application to crash?
If read and write happens at same time, is globalVar going to have some value other than true or false?
Finally, does the OS provide any kind of locking mechanism to prevent the read/write operation to access a memory location at the same time by a different thread?

The loop
while(1){
    if(globalVar)
        //do some operation
    else
        //do some other operation
}
is busy waiting, which is extremely wasteful. Thus, you're probably better off with some classic synchronization that will wake the background thread (mostly) when there is something to be done. You should consider adapting this example of std::condition_variable.
Say you start with:
#include <thread>
#include <mutex>
#include <condition_variable>
std::mutex m;
std::condition_variable cv;
bool ready = false;
Your worker thread can then be something like this:
void worker_thread()
{
    while(true)
    {
        // Wait until main() sends data
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, []{return ready;});
        ready = false;
        lk.unlock();
        // Do the actual work here, outside the lock
    }
}
The notifying thread should do something like this:
{
    std::lock_guard<std::mutex> lk(m);
    ready = true;
}
cv.notify_one();

Since it is just a single plain bool, I'd say a mutex is overkill; you should just go for an atomic integer instead. Reads and writes of an atomic are indivisible, so no worries there, and on mainstream platforms it will be lock free, which is always better if possible.
If it is something more complex, then by all means go for a mutex.
It won't crash from that alone, but you can get data corruption, which may crash the application.
The system will not manage that stuff for you, you do it manually, just make sure all access to the data goes through the mutex.
Edit:
Since you specify a number of times that you don't want a complex solution, you may opt for simply using a mutex instead of the bool. There is no need to protect the bool with a mutex, since you can use the mutex as a bool, and yes, you could go with an atomic, but that's what the mutex already does (plus some extra functionality in the case of recursive mutexes).
It also matters what is your exact workload, since your example doesn't make a lot of sense in practice. It would be helpful to know what those some operations are.
So in your UI thread you could simply do val ? mutex.lock() : mutex.unlock(), and in your secondary thread you could use if (mutex.tryLock()) { doStuff(); mutex.unlock(); } else { doOtherStuff(); }. Now if the operation in the secondary thread takes too long and you happen to be changing the lock in the main thread, that will block the main thread until the secondary thread unlocks. You could use tryLock(timeout) in the main thread, depending on what you prefer: lock() will block until success, while tryLock(timeout) will prevent blocking but the lock may fail. Also, take care not to unlock from a thread other than the one you locked with, and not to unlock an already unlocked mutex.
Depending on what you are actually doing, maybe an asynchronous event driven approach would be more appropriate. Do you really need that while(1)? How frequently do you perform those operations?

In a situation like the one above, is a mutex necessary?
A mutex is one tool that will work. What you actually need are three things:
a means of ensuring an atomic update (a bool will give you this as it's mandated to be an integral type by the standard)
a means of ensuring that the effects of a write made by one thread are actually visible in the other thread. This may sound counter-intuitive but the C++ memory model is single-threaded and optimisations (software and hardware) do not need to consider cross-thread communication, and...
a means of preventing the compiler (and CPU!!) from re-ordering the reads and writes.
The answer to the implied question is 'yes'. You will need something that does all of these things (see below).
If a read and a write happen at the same time, does this cause the application to crash?
Not when it's a bool, but the program won't behave as you expect. In fact, because the program is now exhibiting undefined behaviour you can no longer reason about its behaviour at all.
If a read and a write happen at the same time, is globalVar going to have some value other than true or false?
Not in this case because it's an intrinsic (atomic) type.
Finally, does the OS provide any kind of locking mechanism to prevent different threads from accessing a memory location at the same time?
Not unless you specify one.
Your options are:
std::atomic<bool>
std::mutex
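std::atomic_thread_fence (std::atomic_signal_fence only orders a thread against a signal handler running on that same thread)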
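As a rough illustration of the std::atomic<bool> option, here is a minimal sketch based on the question's globalVar. The functions backgroundLoop and runDemo are hypothetical stand-ins for the background thread and the UI thread, and the loop stops once the flag is set only so that the demo terminates.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

std::atomic<bool> globalVar{false}; // replaces the plain extern bool

// Stand-in for the UI button handler.
void setValue(bool val)
{
    globalVar.store(val);
}

// Stand-in for the background loop: polls until the flag becomes true.
void backgroundLoop(std::atomic<int>& operations)
{
    while (!globalVar.load())
    {
        ++operations; // "do some other operation"
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    ++operations;     // "do some operation", then stop so the demo ends
}

// Hypothetical driver playing the role of the UI thread.
int runDemo()
{
    std::atomic<int> operations{0};
    std::thread background(backgroundLoop, std::ref(operations));
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    setValue(true);   // the "button click"
    background.join();
    assert(globalVar.load());
    return operations.load();
}
```

The default load() and store() use sequentially consistent ordering, which covers all three requirements listed above: atomicity, visibility and no reordering around the flag.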

Realistically speaking, as long as you use an integer type (not bool), make it volatile, and keep inside of its own cache line by properly aligning its storage, you don't need to do anything special at all.
In a situation like the one above, is a mutex necessary?
Only if you want to keep the value of the variable synchronized with other state.
If a read and a write happen at the same time, does this cause the application to crash?
According to C++ standard, it's undefined behavior. So anything can happen: e.g. your application might not crash, but its state might be subtly corrupted. In real life, though, compilers often offer some sane implementation defined behavior and you're fine unless your platform is really weird. Anything commonplace, like 32 and 64 bit intel, PPC and ARM will be fine.
If a read and a write happen at the same time, is globalVar going to have some value other than true or false?
globalVar can only have these two values, so it makes no sense to speak of any other values unless you're talking about its binary representation. Yes, it could happen that the binary representation is incorrect and not what the compiler would expect. That's why you shouldn't use a bool but a uint8_t instead.
I wouldn't love to see such a flag in a code review, but if a uint8_t flag is the simplest solution to whatever problem you're solving, I say go for it. The if (globalVar) test will treat zero as false, and anything else as true, so temporary "gibberish" is OK and won't have any odd effects in practice. According to the standard, you'll be facing undefined behavior, of course.
Finally, does the OS provide any kind of locking mechanism to prevent different threads from accessing a memory location at the same time?
It's not the OS's job to do that.
Speaking of practice, though: on any reasonable platform, the use of a std::atomic_bool will have no overhead over the use of a naked uint8_t, so just use that and be done.

Does a getter function need a mutex?

I have a class that is accessed from multiple threads. Both of its getter and setter functions are guarded with locks.
Are the locks for the getter functions really needed? If so, why?
class foo {
public:
    void setCount (int count) {
        boost::lock_guard<boost::mutex> lg(mutex_);
        count_ = count;
    }
    int count () {
        boost::lock_guard<boost::mutex> lg(mutex_); // mutex needed?
        return count_;
    }
private:
    boost::mutex mutex_;
    int count_;
};

The only way you can get around having the lock is if you can convince yourself that the system will transfer the guarded variable atomically in all cases. If you can't be sure of that for one reason or another, then you'll need the mutex.
For a simple type like an int, you may be able to convince yourself this is true, depending on architecture, and assuming that it's properly aligned for single-instruction transfer. For any type that's more complicated than this, you're going to have to have the lock.

If you don't have a mutex around the getter, and a thread is reading it while another thread is writing it, you'll get funny results.

Is the mutex really only protecting a single int? It makes a difference -- if it is a more complex datatype you definitely need locking.
But if it is just an int, and you are sure that int is an atomic type (i.e., the processor will not have to do two separate memory reads to load the int into a register), and you have benchmarked the performance and determined you need better performance, then you may consider dropping the lock from both the getter and the setter. If you do that, make sure to qualify the int as volatile. And write a comment explaining why you do not have mutex protection, and under what conditions you would need it if the class changes.
Also, beware that you don't have code like this:
void func(foo &f) {
    int temp = f.count();
    ++temp;
    f.setCount(temp);
}
That is not threadsafe, regardless of whether you use a mutex or not. If you need to do something like that, the mutex protection has to be outside the setter/getter functions.
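One way to read that advice is to give the class an operation that does the whole read-modify-write under a single lock. The sketch below is an assumption-laden variant of the question's foo: it swaps boost::mutex for std::mutex so it needs no extra library, and the increment() member and the incrementFromThreads driver are hypothetical additions.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class foo {
public:
    void setCount(int count) {
        std::lock_guard<std::mutex> lg(mutex_);
        count_ = count;
    }
    int count() const {
        std::lock_guard<std::mutex> lg(mutex_);
        return count_;
    }
    // Hypothetical addition: read, modify and write happen under one lock,
    // so no other thread can slip in between the get and the set.
    int increment() {
        std::lock_guard<std::mutex> lg(mutex_);
        return ++count_;
    }
private:
    mutable std::mutex mutex_; // mutable so the const getter can lock it
    int count_ = 0;
};

// Hypothetical driver: several threads increment the same foo.
int incrementFromThreads(int threadCount, int perThread) {
    foo f;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&f, perThread] {
            for (int i = 0; i < perThread; ++i)
                f.increment();
        });
    }
    for (std::thread& th : threads)
        th.join();
    assert(f.count() == threadCount * perThread);
    return f.count();
}
```

Calling count() followed by setCount() from each thread instead of increment() could lose updates even though both calls are individually locked.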

The synchronization concern is already covered in other answers (specifically David Schwartz's).
There's another concern I don't see addressed, though: this is usually a bad design.
Consider David's example code, assuming we have a correctly-synchronized version of foo
{
    foo j;
    some_func(j);
    while (j.count() == 0)
    {
        // do we still expect (j.count() == 0) here?
        bar();
    }
}
The code suggests that the while condition still holds in the body. That's how single-threaded code works, after all.
But of course, even if we correctly synchronize the implementation of a getter, the setter can still be called from another thread, between our while condition succeeding and the first instruction of the loop body executing.
So, if any logic in the loop body can't depend on the condition being true, what was the point of testing it?
Sometimes it makes perfect sense, such as
while (foo.shouldKeepRunning())
{
    // foo event loop or something
}
where it's OK if our shouldKeepRunning state changes during the loop body, because we only need to test it periodically. However, if you're going to do something with count, you need a longer-lived lock, and an interface to support it:
{
    auto guard = j.lock_guard();
    while (j.count(guard) == 0) // prove to count that we're locked
    {
        // now we _know_ count is zero in the body
        // (but bar should release and re-acquire the lock or that can never change)
        bar(j);
    }
} // guard goes out of scope and unlocks

In your case probably not, if your CPU is 32-bit. However, if count is a complex object or the CPU needs more than one instruction to update its value, then yes.

The lock is necessary to serialize access to a shared resource. In your specific case you might get away with just atomic integer operations, but in general, for larger objects that require more than one bus transaction, you do need locks to guarantee that the reader always sees a consistent object.

It depends on the exact implementation of the object being locked. However, in general you do not want someone modifying (setting?) an object while someone else is in the process of reading (getting?) it. The easiest way to prevent that is to have a reader lock it.
In more complicated setups the lock will be implemented in such a way that any number of folks can read at once, but nobody can write to it while anyone is reading, and nobody can read while a write is going on.

They are really needed.
Imagine if you have an instance of class foo that's completely local to some piece of code. And you have something like this:
{
    foo j;
    some_func(j); // this stashes a reference to j where another thread can find it
    while (j.count() == 0)
        bar();
}
Suppose the optimizer looks carefully at the code to bar and sees that it can't possibly modify j.count_. This allows the optimizer to rewrite the code as follows:
{
    foo j;
    some_func(j); // this stashes a reference to j where another thread can find it
    if (j.count() == 0)
    {
        while (1)
            bar();
    }
}
Clearly this is a disaster. Another thread might call j.setCount(5) and the thread wouldn't exit the loop.
The compiler can prove that bar can't modify the return value of j.count(). If it was required to assume that another thread could modify every memory value it accesses, it could never stash anything in a register ever, which would clearly be an untenable situation.
So, yes, the lock is needed. Alternatively, you need to use some other construct that provides similar guarantees.
Do not ever write code that relies on compilers not being able to make any optimization that they are permitted to make unless you really have no other practical choice. I have seen this cause a lot of pain over the many years I've been programming. Optimizers today can do things that would have been considered absurdly implausible a decade ago and lots of code lasts longer than you expect.

How to guarantee a function will not be entered again if it does not return within a thread?

I don't want the function to be entered simultaneously by multiple threads, neither do I want it to be entered again when it has not returned yet. Is there any approach to achieve my goal? Thank you very much!

Both goals can be achieved with a mutex semaphore.

Blocking the function from being entered by other threads while it's in progress on one thread is pretty straightforward as explained by the other answers. But if you want it to block in the same thread when it's already been entered... well, that's a deadlock.

Use a critical section (InitializeCriticalSection(), EnterCriticalSection(), LeaveCriticalSection() ) and also implement an entry counter. The critical section will guard against reentry from different threads and the entry counter will guard against reentry from the same thread.
To implement an entry counter use a common variable (boolean for your case) and a bracket class. Once you've already entered the critical section (and therefore no other thread will execute the same code in parallel) check the value of the variable. If it states that the function has been entered already - leave (first release the critical section, then leave the function). Otherwise construct your bracket class instance that will change the variable value. So the next time this thread enters the function it will check the variable, see that reentry has happened and leave. The destructor of the bracket class will change the variable to its original value once you leave the function.
It's wise to use bracket classes both for entering the critical section and for changing the entry counter, so that your code is exception safe and all actions are performed in the necessary order regardless of how you leave the function, whether by an exception or by a return statement.
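A minimal sketch of the scheme described above follows. To keep it portable it uses std::recursive_mutex as a stand-in for the Win32 CRITICAL_SECTION (both can be re-acquired by the thread that already owns them), and every name in it (g_lock, g_entered, g_runs, EntryBracket, guardedFunction) is invented for illustration.

```cpp
#include <cassert>
#include <mutex>

std::recursive_mutex g_lock; // stand-in for the CRITICAL_SECTION
bool g_entered = false;      // the "entry counter", only touched while g_lock is held
int g_runs = 0;              // counts how often the body actually ran

// Bracket class: marks the function as entered and restores the flag on exit,
// whether the function leaves by return or by exception.
class EntryBracket {
public:
    explicit EntryBracket(bool& flag) : flag_(flag) { flag_ = true; }
    ~EntryBracket() { flag_ = false; }
    EntryBracket(const EntryBracket&) = delete;
    EntryBracket& operator=(const EntryBracket&) = delete;
private:
    bool& flag_;
};

// Returns false if the call was rejected as a re-entry from the same thread.
bool guardedFunction(int depth)
{
    std::lock_guard<std::recursive_mutex> lock(g_lock); // keeps other threads out
    if (g_entered)
        return false;                // the owning thread came back in: leave at once
    EntryBracket bracket(g_entered); // declared after the lock, so destroyed before it
    ++g_runs;
    if (depth > 0)
    {
        bool nested = guardedFunction(depth - 1); // attempted re-entry
        assert(!nested);
    }
    return true;
}
```

Because the bracket is constructed after the lock guard, it is destroyed first, so the flag is reset while the lock is still held.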

Since you're saying C++ and Windows, have a look at critical sections. You'll likely want to wrap it in a couple of C++ classes, though, for ease of use.
Critical sections try spinlooping for a short duration, if the lock is already taken. For short pieces of code, this can often avoid doing a full blocking wait, and thus the overhead of user<>kernel mode etc.

f will only be called when nobody else is currently running it.
(This is a concept demonstration with only Win32 calls)
void f();
err call_f()
{
    static HANDLE hMutex;
    if( !hMutex )
    {
        hMutex = ::CreateMutex( 0, TRUE, 0 );
    }
    else
    {
        if( WaitForSingleObject( hMutex, 0 ) != WAIT_OBJECT_0 )
            return ERR_ALREADY_RUNNING;
    }
    // calling f here
    f();
    ReleaseMutex( hMutex );
    return S_OK;
}
Beware the minimal checking, the missing cleanup code for the mutex and the race-condition on first enter.

Generally you need to introduce a monitor, e.g. in Java by adding the "synchronized" keyword to your method signature.
(Am I right?)

You can do something like this:
int some_shared_var = 0;
...
for (;some_shared_var != rank;) ;
run_my_function();
some_shared_var++;
rank is your thread number (assuming you have threads with numbers 0 to size-1).
This is only an example. A real implementation will be different. It depends on what library/functions you want to use to parallelize your code (fork, MPI etc). But I hope it gives you some useful thoughts.

C++ Thread question - setting a value to indicate the thread has finished

Is the following safe?
I am new to threading and I want to delegate a time consuming process to a separate thread in my C++ program.
Using the boost libraries I have written code something like this:
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
Where finished_flag is a boolean member of my class. When the thread is finished it sets the value and the main loop of my program checks for a change in that value.
I assume that this is okay because I only ever start one thread, and that thread is the only thing that changes the value (except for when it is initialised before I start the thread)
So is this okay, or am I missing something, and need to use locks and mutexes, etc

You never mentioned the type of finished_flag...
If it's a straight bool, then it might work, but it's certainly bad practice, for several reasons. First, some compilers will cache the reads of the finished_flag variable, since the compiler doesn't always pick up the fact that it's being written to by another thread. You can get around this by declaring the bool volatile, but that's taking us in the wrong direction. Even if reads and writes are happening as you'd expect, there's nothing to stop the OS scheduler from interleaving the two threads half way through a read / write. That might not be such a problem here where you have one read and one write op in separate threads, but it's a good idea to start as you mean to carry on.
If, on the other hand, it's a thread-safe type, like a CEvent in MFC (or the equivalent in boost), then you should be fine. This is the best approach: use thread-safe synchronization objects for inter-thread communication, even for simple flags.

Instead of using a member variable to signal that the thread is done, why not use a condition? You are already using the boost libraries, and condition is part of the thread library.
Check it out. It allows the worker thread to 'signal' that it has finished, and the main thread can check during execution if the condition has been signaled and then do whatever it needs to do with the completed work. There are examples in the link.
As a general case I would never make the assumption that a resource will only be modified by the thread. You might know what it is for, however someone else might not, causing no end of grief as the main thread thinks that the work is done and tries to access data that is not correct! It might even delete it while the worker thread is still using it, causing the app to crash. Using a condition will help with this.
Looking at the thread documentation, you could also call thread.timed_join in the main thread. timed_join will wait for a specified amount of time for the thread to 'join' (join means that the thread has finished).

I don't mean to be presumptive, but it seems like the purpose of your finished_flag variable is to pause the main thread (at some point) until the thread thrd has completed.
The easiest way to do this is to use boost::thread::join
// launch the thread...
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
// ... do other things maybe ...
// wait for the thread to complete
thrd->join();

If you really want to get into the details of communication between threads via shared memory, even declaring a variable volatile won't be enough, even if the compiler does use appropriate access semantics to ensure that it won't get a stale version of data after checking the flag. The CPU can issue reads and writes out of order (x86 usually doesn't, but PPC definitely does), and there is nothing in C++9x that allows the compiler to generate code to order memory accesses appropriately.
Herb Sutter's Effective Concurrency series has an extremely in depth look at how the C++ world intersects the multicore/multiprocessor world.

Having the thread set a flag (or signal an event) before it exits is a race condition. The thread has not necessarily returned to the OS yet, and may still be executing.
For example, consider a program that loads a dynamic library (pseudocode):
lib = loadLibrary("someLibrary");
fun = getFunction("someFunction");
fun();
unloadLibrary(lib);
And let's suppose that this library uses your thread:
void someFunction() {
    volatile bool finished_flag = false;
    thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
    while(!finished_flag) { // ignore the polling loop, it's beside the point
        sleep();
    }
    delete thrd;
}
void myclass::mymethod(volatile bool *finished_flag) {
    // do stuff
    *finished_flag = true;
}
When myclass::mymethod() sets finished_flag to true, myclass::mymethod() hasn't returned yet. At the very least, it still has to execute a "return" instruction of some sort (if not much more: destructors, exception handler management, etc.). If the thread executing myclass::mymethod() gets pre-empted before that point, someFunction() will return to the calling program, and the calling program will unload the library. When the thread executing myclass::mymethod() gets scheduled to run again, the address containing the "return" instruction is no longer valid, and the program crashes.
The solution would be for someFunction() to call thrd->join() before returning. This would ensure that the thread has returned to the OS and is no longer executing.
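A minimal sketch of that fix, assuming C++11 std::thread in place of boost::thread and a std::atomic<bool> in place of the volatile flag; the free function worker is a hypothetical replacement for myclass::mymethod.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical worker: setting the flag is its last statement,
// but the thread is still running until the function has returned.
void worker(std::atomic<bool>& finished)
{
    // do stuff
    finished.store(true);
}

bool someFunction()
{
    std::atomic<bool> finished{false};
    std::thread thrd(worker, std::ref(finished));
    // The flag alone does not prove the thread is gone; join() does.
    thrd.join();
    assert(!thrd.joinable()); // the worker has fully finished executing
    return finished.load();
}
```

Once join() has returned, the flag is only useful for reporting progress, since join() itself already tells the caller that the worker is done.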
