What's the proper way to flag a thread to exit using Boost without C++11 - c++

After reading various answers on why volatile should not be used to flag a running thread to exit (and the suggestions to use boost::atomic<>), I still cannot find an answer on how to do this properly using Boost without C++11.
Should I use boost::mutex?
If so, do I need to lock on my m_stopThread variable both where I set it to true and in my run loop where I check it?
Does a boost::mutex lock make a call into the operating system, or is it lighter, using only memory barrier instructions and the like?

I suppose it is only necessary to issue a write memory barrier after setting the flag and a read memory barrier before testing it. That may be an atomic operation, a mutex access, or anything else.
(I suppose that even entering different mutexes would be OK.)
If you are not in a hurry, you may do nothing at all, because the right barrier instruction will be issued at some point in the future anyway (at least when a hardware interrupt occurs).
Of course, m_stopThread should be declared volatile
(although I may be wrong from the Standard's viewpoint).
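For completeness, here is a minimal sketch of what this can look like with Boost.Atomic (available without C++11 since Boost 1.53); the Worker class and its method names are only illustrative:

#include <boost/atomic.hpp>

class Worker {
public:
    Worker() : m_stopThread(false) {}

    // called from another thread; the release store publishes the flag
    void requestStop() { m_stopThread.store(true, boost::memory_order_release); }

    void run() {
        // the acquire load pairs with the release store above
        while (!m_stopThread.load(boost::memory_order_acquire)) {
            // ... do one unit of work ...
        }
    }

private:
    boost::atomic<bool> m_stopThread;
};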

Related

Is C++ std::atomic compatible with pthreads?

I have two pthread threads, where one is writing a bool value and the other is reading it.
I don't care about portability; it's an x86 architecture. The only thing that concerns me is this: the writing thread sets the bool to true and starts doing its own work, closing a file (this happens once a day, at midnight), while the other thread has read the bool as false and proceeds with its work (writing to that file) at the same time. This scenario is very difficult to reproduce, so I would rather have the best possible theoretical solution.
Can I use std::atomic in case of pthreads?
Can I use std::atomic in case of pthreads?
Yes, that's what std::atomic is for.
It works with std::thread, POSIX threads, and any other kind of threads. Behind the scenes it uses "magical" compiler annotations to prevent certain thread-incompatible optimizations, and processor-specific locking instructions to guarantee that thread-safe code is generated¹.
It makes (almost) no sense to use std::atomic without threads (you could use std::atomic instead of volatile for signal handlers, but there is no advantage in doing so).
The only thing that concerns me ...
The rest of your question makes no sense to me.
¹ When used correctly, which is often a non-trivial thing to do, and which is why you generally should try not to use std::atomic unless you are an expert.
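As a minimal sketch of that answer (the writer/reader split and the file work are stand-ins for the asker's scenario):

#include <atomic>
#include <pthread.h>

std::atomic<bool> done(false);

void* writer(void*) {
    // ... close the file ...
    done.store(true);          // sequentially consistent by default
    return NULL;
}

void* reader(void*) {
    while (!done.load()) {     // never a torn or optimized-away read
        // ... write to the file ...
    }
    return NULL;
}

int main() {
    pthread_t w, r;
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}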

Efficient Memory Barriers

I have a multithreaded application, where each thread has a variable of integer type. These variables are incremented during execution of the program. At certain points in the code, a thread compares its counting variable with those of the other threads.
Since we know that threads running on multiple cores might execute out of order, a thread might not read the expected counter values of the other threads. One way to solve this is to use atomic variables, such as std::atomic<> from C++11. However, performing a memory fence at each increment of the counters would significantly slow down the program.
What I want instead is this: only when a thread is about to read another thread's counter is a memory fence issued, so that the counters of all the threads become visible at that point. How can this be done in C++? I am using Linux and g++.
The C++11 standard library includes support for fences in <atomic> with std::atomic_thread_fence.
Calling this invokes a full fence:
std::atomic_thread_fence(std::memory_order_seq_cst);
If you want to emit only an acquire or only a release fence, you can use std::memory_order_acquire and std::memory_order_release instead.
There are x86 intrinsics that correspond to memory barriers that you can use yourself. The Windows header has a memory barrier macro, so you should be able to find something equivalent for Linux.
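A sketch of that idea: keep the per-thread increments cheap (each counter has a single writer, so a relaxed load and store suffice) and pay for the full fence only on the reading side. The array size and function names here are hypothetical:

#include <atomic>

const int kNumThreads = 8;                 // hypothetical thread count
std::atomic<long> counters[kNumThreads];   // one counter per thread

void tick(int self) {
    // single-writer counter: relaxed load+store compile to plain moves on x86
    counters[self].store(counters[self].load(std::memory_order_relaxed) + 1,
                         std::memory_order_relaxed);
}

long sum_others(int self) {
    std::atomic_thread_fence(std::memory_order_seq_cst);   // full fence, only here
    long sum = 0;
    for (int i = 0; i < kNumThreads; ++i)
        if (i != self)
            sum += counters[i].load(std::memory_order_relaxed);
    return sum;
}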
You can use boost::asio::strand for this exact purpose. Create a handler responsible for reading the counter. That handler can be called from multiple threads. Instead of calling the handler directly, wrap it inside a boost::asio::strand. This ensures the handler cannot be called concurrently by multiple threads.
http://www.boost.org/doc/libs/1_35_0/doc/html/boost_asio/tutorial/tuttimer5.html
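A rough sketch of the strand approach (the handler and object names are made up; see the tutorial linked above for a complete example):

#include <boost/asio.hpp>
#include <boost/bind.hpp>

boost::asio::io_service io;
boost::asio::io_service::strand counter_strand(io);

void read_counters() {
    // safe: the strand guarantees this never runs concurrently with itself
}

void post_read() {
    // any thread may post; execution is serialized through the strand
    counter_strand.post(boost::bind(&read_counters));
}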
I hope I understood the question right.
My suggestion would be to have a collectTimers() function in a higher-level class that can ask each thread for its counter (via a queue/message). This way updating the counters is not delayed, but collecting them is a bit slower.
This only works if you have some kind of communication mechanism between the threads.
And why not have a "control" thread, to which each thread reports its counter increments and asks for the values of the others?
It would make it very efficient and simple. Just a suggestion.
You could try something like the signal-theft limit counter design in Section 4.4.3 of http://mirror.nexcess.net/kernel.org/linux/kernel/people/paulmck/perfbook/perfbook.2011.08.28a.pdf
This kind of design can eliminate the atomic operations from the fastpath (incrementing the per-thread counter). Whether the complexity is worth it is up to you to decide, of course.

Do I need a read-write lock here

I am writing multi-threaded code, and I am not sure whether I need a read-write lock mechanism. Could you please go through the use case and tell me whether I have to use a read-write lock, or whether a normal mutex will do?
Use case:
1) A class has two variables. These are accessed by every thread before doing an operation.
2) When something goes wrong, these variables are updated to reflect the error scenario.
Thus threads reading these variables can take different decisions (including aborting).
Here, in the second point, I need to update the data, and in the first point every thread will use the data. So my question is: do I have to use a write lock while updating the data and a read lock while reading it? (Note: both variables are in memory, just a boolean flag and a string.)
I am confused because both my variables are in memory, so does the OS take care of their integrity? I mean, I can live with one or two threads missing the updated value while some thread is writing the data under the mutex.
Please tell me whether I am right or wrong, and also whether I have to use a read-write lock or whether a normal mutex will do.
Update: I am sorry that I did not give the platform and compiler name. I am on RHEL 5.0, using gcc 4.6, on x86_64. But I do not want my code to be OS-specific, because we are going to port it to Solaris 10 shortly.
First off, ignore the other answers talking about volatile. Volatile is almost useless for multithreaded programming, and any sense of safety it gives is a false one.
Now, whether you need a lock depends on what you're doing with these variables. You will need a memory barrier at least (locks imply one).
So let's give an example:
One flag is an error flag. If zero, you continue, otherwise, you abort.
Another flag is a diagnostic code flag. It gives the precise reason for the error.
In this case, one option would be to do the following:
Read the error flag without a lock, but with read memory barriers after the read.
When an error occurs, take a lock, set the diagnostic code and error flags, then release the lock. If the diagnostic code is already set, release the lock immediately.
The memory barriers are needed, as otherwise the compiler (or CPU!) may choose to cache the same result for every read.
Of course, if the semantics of your two variables are different, the answer may vary. You'll need to be more specific.
Note that the exact mechanism for specifying locks and memory barriers depends on the compiler. C++0x provides a portable mechanism, but few compilers fully implement the C++0x standard yet. Please specify your compiler and OS for a more detailed answer.
As for your output data, you will almost certainly need a lock there. Try to avoid taking these locks too often though, as too much lock contention will kill your performance.
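A sketch of the scheme described above, using pthreads and gcc's full-barrier builtin (all names are hypothetical, and __sync_synchronize() stands in for whatever barrier your compiler provides):

#include <pthread.h>
#include <string>

pthread_mutex_t g_err_mutex = PTHREAD_MUTEX_INITIALIZER;
int g_error = 0;                 // 0 = OK, non-zero = abort
std::string g_diagnostic;        // written once, under the lock

void report_error(int code, const char* what) {
    pthread_mutex_lock(&g_err_mutex);
    if (g_error == 0) {          // first error wins; otherwise just release
        g_diagnostic = what;
        __sync_synchronize();    // write barrier: diagnostic visible before flag
        g_error = code;
    }
    pthread_mutex_unlock(&g_err_mutex);
}

bool should_abort() {
    int err = g_error;           // unlocked read of the flag
    __sync_synchronize();        // read barrier, as described above
    return err != 0;
}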
If they are atomic variables (C1x stdatomic.h or C++0x atomic), then you don't need read/write locks. With earlier C/C++ standards, there's no portable way of using multiple threads at all, so you need to look into how the implementation you are using does things. In most implementations, data types that can be accessed with a single machine instruction are atomic.
Note that with the pre-standard approach (relying on single-instruction accesses), just having an "atomic" variable is not enough -- you probably also need to declare it volatile to guarantee that the compiler does not do things that will cause you to miss updates from other threads. (A real C++0x std::atomic needs no such annotation.)
Thus threads reading these variables can take different decisions (including aborting)
So each thread needs to ensure that it reads the updated data. Also, since the variables are shared, you need to take care of the race condition as well.
So, in short: you need to use read/write locks when reading and writing these shared variables.
See if you can use volatile variables; that could save you from using locks when you read the values (writes, however, should still take the lock). This is applicable only because you said that:
I mean, I can live with one or two threads missing the updated value while some thread is writing the data under the mutex
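A minimal pthreads read-write lock sketch of the suggestion above (variable names are illustrative):

#include <pthread.h>

pthread_rwlock_t g_lock = PTHREAD_RWLOCK_INITIALIZER;
bool g_failed = false;           // the shared boolean flag

bool read_flag() {
    pthread_rwlock_rdlock(&g_lock);   // many readers can hold this at once
    bool v = g_failed;
    pthread_rwlock_unlock(&g_lock);
    return v;
}

void set_flag() {
    pthread_rwlock_wrlock(&g_lock);   // exclusive access for the writer
    g_failed = true;
    pthread_rwlock_unlock(&g_lock);
}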

WaitForSingleObject is not locking, still allowing other threads to change the value in C++

I'm trying to use WaitForSingleObject(fork[leftFork], INFINITE); to lock a variable across multiple threads, but it doesn't seem to lock anything.
I set up HANDLE fork[5] and then use the code below.
while(forks[rightFork] == 0 || forks[leftFork] == 0) Sleep(0);
WaitForSingleObject(fork[leftFork], INFINITE);
forks[leftFork]--;
WaitForSingleObject(fork[rightFork], INFINITE);
forks[rightFork]--;
I have tried WaitForMultipleObjects as well, with the same result. When I create the mutexes I use fork[i] = CreateMutex(NULL, FALSE, NULL);
I was wondering: is each mutex only good for one thread, or do the threads share it?
First of all, you haven't shown enough code for us to be able to help you with any great certainty of correctness. But, having made that proviso, I'm going to try anyway!
Your use of the word fork suggests to me that you are approaching Windows threading from a pthreads background. Windows threads are a little different. Rather confusingly, the most effective in-process mutex object in Windows is not the mutex, it is in fact the critical section.
The interface for the critical section is much simpler to use, it being essentially an acquire function and a corresponding release function. If you are synchronizing within a single process, and you need a simple lock (rather than, say, a semaphore), you should use critical sections rather than mutexes.
In fact, only yesterday here on Stack Overflow, I wrote a more detailed answer to a question which described the standard usage pattern for critical sections. That post has lots of links to the pertinent sections of MSDN documentation.
Having said that, it would appear that all you are trying to do is synchronize the decrementing of an array of integer values. If that is so, then you can do this most simply, in a lock-free manner, with InterlockedIncrement or one of its friends.
You only need a mutex when you are performing cross-process synchronization; indeed, you should only use a mutex when synchronizing across processes, because critical sections perform so much better (i.e. faster). Since you are updating a simple array here, and since there is no obviously visible IPC going on, I can only conclude that this really is in-process.
If I'm wrong and you really are doing cross-process work and require a mutex, then we would need to see more code. For example, I don't see any calls to ReleaseMutex, and I don't know exactly how you are creating your mutexes.
If this doesn't help, please edit your question to include more code, and also a high level overview of what you are trying to achieve.
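For reference, a sketch of the critical section pattern described above, applied to the fork counters (names are guesses at the asker's variables):

#include <windows.h>

CRITICAL_SECTION forksGuard;
int forks[5];

void init() { InitializeCriticalSection(&forksGuard); }

void take_forks(int leftFork, int rightFork) {
    EnterCriticalSection(&forksGuard);   // acquire the lock
    forks[leftFork]--;
    forks[rightFork]--;
    LeaveCriticalSection(&forksGuard);   // always release, or everyone deadlocks
}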

Overhead of pthread mutexes?

I'm trying to make a C++ API (for Linux and Solaris) thread-safe, so that its functions can be called from different threads without breaking internal data structures. In my current approach I'm using pthread mutexes to protect all accesses to member variables. This means that a simple getter function now locks and unlocks a mutex, and I'm worried about the overhead of this, especially as the API will mostly be used in single-threaded apps where any mutex locking seems like pure overhead.
So, I'd like to ask:
do you have any experience with the performance of single-threaded apps that use locking versus those that don't?
how expensive are these lock/unlock calls, compared to, e.g., a simple "return this->isActive" access for a bool member variable?
do you know better ways to protect such variable accesses?
All modern thread implementations can handle an uncontended mutex lock entirely in user space (with just a couple of machine instructions); only when there is contention does the library have to call into the kernel.
Another point to consider is that if an application doesn't explicitly link to the pthread library (because it's a single-threaded application), it will only get dummy pthread functions (which don't do any locking at all); only if the application is multi-threaded (and links to the pthread library) will the full pthread functions be used.
And finally, as others have already pointed out, there is no point in protecting a getter method for something like isActive with a mutex - once the caller gets a chance to look at the return value, the value might already have been changed (as the mutex is only locked inside the getter method).
"A mutex requires an OS context switch. That is fairly expensive. "
This is not true on Linux, where mutexes are implemented using something called futex'es. Acquiring an uncontested (i.e., not already locked) mutex is, as cmeerw points out, a matter of a few simple instructions, and is typically in the area of 25 nanoseconds w/current hardware.
For more info:
Futex
Numbers everybody should know
This is a bit off-topic, but you seem to be new to threading. For one thing, only lock where threads can actually overlap, and try to minimize those places. Also, instead of trying to lock every method, think of what the thread is doing with an object as a whole, make that a single call, and lock around that. Try to push your locks as high up as possible (this again increases efficiency and may help you avoid deadlock). But locks don't compose; you have to, mentally at least, organize your code by where the threads are and where they overlap.
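To illustrate the "make the whole task a single locked call" advice, a hypothetical sketch:

#include <pthread.h>
#include <deque>

class JobQueue {
public:
    JobQueue() { pthread_mutex_init(&m_lock, NULL); }

    // Wrong granularity: a separate locked empty() would be stale by the
    // time the caller acts on it. Right granularity: one call, one lock,
    // check and pop together.
    bool popIfAvailable(int& out) {
        pthread_mutex_lock(&m_lock);
        bool ok = !m_jobs.empty();
        if (ok) {
            out = m_jobs.front();
            m_jobs.pop_front();
        }
        pthread_mutex_unlock(&m_lock);
        return ok;
    }

private:
    pthread_mutex_t m_lock;
    std::deque<int> m_jobs;
};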
I wrote a similar library and didn't have any trouble with lock performance. (I can't tell you exactly how the locks are implemented, so I can't say conclusively that it's not a big deal.)
I'd go for getting it right first (i.e. use locks) then worry about performance. I don't know of a better way; that's what mutexes were built for.
An alternative for single thread clients would be to use the preprocessor to build a non-locked vs locked version of your library. E.g.:
#ifdef BUILD_SINGLE_THREAD
inline void lock() {}                        // no-ops in the single-threaded build
inline void unlock() {}
#else
inline void lock() { doSomethingReal(); }    // e.g. take a pthread mutex
inline void unlock() { doSomethingElseReal(); }
#endif
Of course, that adds an additional build to maintain, as you'd distribute both single and multithread versions.
I can tell you from Windows that a mutex is a kernel object and, as such, incurs a (relatively) significant locking overhead. To get a better-performing lock, when all you need is one that works between threads, use a critical section. This does not work across processes, just between the threads of a single process.
However, Linux is quite a different beast. There, a mutex is implemented using atomic CPU instructions and applies only within a process, so it has performance similar to a Win32 critical section, i.e. it is very fast.
Of course, the fastest locking is not to have any at all, or to use it as little as possible. But if your lib is to be used in a heavily threaded environment, you will want to hold locks for as short a time as possible: lock, do something, unlock, do something else, then lock again is better than holding the lock across the whole task. The cost of locking isn't the time taken to lock, but the time a thread sits around twiddling its thumbs waiting for another thread to release a lock it wants!
A mutex requires an OS context switch. That is fairly expensive. The CPU can still do it hundreds of thousands of times per second without too much trouble, but it is a lot more expensive than not having the mutex there. Putting it on every variable access is probably overkill.
It also probably is not what you want. This kind of brute-force locking tends to lead to deadlocks.
do you know better ways to protect such variable accesses?
Design your application so that as little data as possible is shared. Some sections of code should be synchronized, probably with a mutex, but only those that are actually necessary. And typically not individual variable accesses, but tasks containing groups of variable accesses that must be performed atomically. (perhaps you need to set your is_active flag along with some other modifications. Does it make sense to set that flag and make no further changes to the object?)
I was curious about the expense of using pthread_mutex_lock/unlock.
I had a scenario where I needed either to copy anywhere from 1500-65K bytes without using a mutex, or to use a mutex and do a single write of a pointer to the needed data.
I wrote a short loop to test each:
gettimeofday(&starttime, NULL);
/* copy the data (1500-65K bytes) */
gettimeofday(&endtime, NULL);
timersub(&endtime, &starttime, &timediff);
/* print out the timediff data */
or
gettimeofday(&starttime, NULL);
pthread_mutex_lock(&mutex);
gettimeofday(&endtime, NULL);
pthread_mutex_unlock(&mutex);
timersub(&endtime, &starttime, &timediff);
/* print out the timediff data */
If I was copying fewer than about 4000 bytes, the straight copy operation took less time. If, however, I was copying more than 4000 bytes, it was less costly to do the mutex lock/unlock.
The timing on the mutex lock/unlock ran between 3 and 5 usec, including the time for the gettimeofday call, which itself took about 2 usec.
For member variable access, you should use read/write locks, which have slightly less overhead and allow multiple concurrent reads without blocking.
In many cases you can use atomic builtins, if your compiler provides them (with gcc or icc, __sync_fetch_and_add() and the like), but they are notoriously hard to handle correctly.
If you can guarantee that the access is atomic (for example, on x86 a dword read or write is always atomic if it is aligned, but a read-modify-write is not), you can often avoid locks entirely and use volatile instead, but this is non-portable and requires knowledge of the hardware.
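For instance, a tiny counter sketch using the gcc builtins mentioned above (the names are made up):

static volatile int g_count = 0;

void bump() {
    __sync_fetch_and_add(&g_count, 1);        // atomic RMW with a full barrier
}

int read_count() {
    return __sync_fetch_and_add(&g_count, 0); // atomic read, also a full barrier
}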
Well, a suboptimal but simple approach is to place macros around your mutex lock and unlock calls, then have a compiler/makefile option to enable or disable threading.
Ex.
#ifdef THREAD_ENABLED
/* nothing to do: the real pthread_mutex_lock(x) is used */
#else
#define pthread_mutex_lock(x)   /* expands to nothing */
#define pthread_mutex_unlock(x) /* expands to nothing */
#endif
Then, when compiling, pass gcc -DTHREAD_ENABLED to enable threading.
Again, I would NOT use this method in any large project, but only if you want something fairly simple.