Is it practically safe to write static data from multiple threads - c++

I have some status data that I want to cache from a database. Any of several threads may modify the status data. After the data is modified it will be written to the database. The database writes will always be done in series by the underlying database access layer, which queues database operations in a different process, so I am not concerned about race conditions for those.
Is it a problem to just modify the static data from several threads? In theory it is possible that modifications are implemented as read, modify, write but in practice I can't imagine that this is so.
My data handling class will look something like this:
class StatusCache
{
public:
    static void SetActivityStarted(bool activityStarted)
    { m_activityStarted = activityStarted; WriteToDB(); }

    static void SetActivityComplete(bool activityComplete)
    { m_activityComplete = activityComplete; WriteToDB(); }

    static void SetProcessReady(bool processReady)
    { m_processReady = processReady; WriteToDB(); }

    static void SetProcessPending(bool processPending)
    { m_processPending = processPending; WriteToDB(); }

private:
    static void WriteToDB(); // will write all the class data to the db (multiple requests will happen in series)

    static bool m_activityStarted;
    static bool m_activityComplete;
    static bool m_processReady;
    static bool m_processPending;
};
I don't want to use locks as there are already a couple of locks in this part of the app and adding more will increase the possibility of deadlocks.
It doesn't matter if there is some overlap between 2 threads in the database update, e.g.
thread 1                    thread 2                     activity started in db
SetActivityStarted(true)    SetActivityStarted(false)
m_activityStarted = true
                            m_activityStarted = false
                            WriteToDB()                  false
WriteToDB()                                              false
So the db shows the status that was most recently set by the m_... = x lines. This is OK.
Is this a reasonable approach to use or is there a better way of doing it?
[Edited to state that I only care about the last status - order is unimportant]

No, it's not safe.
The code generated to write m_activityStarted and the others may be atomic, but that is not guaranteed. Also, in your setters you do two things: set a boolean and make a call. That is definitely not atomic.
You're better off synchronizing here using a lock of some sort.
For example, one thread may call the first setter, and before that thread reaches WriteToDB() another thread may call a different setter and enter WriteToDB() first. The statuses may then be written to the DB in the wrong order.
If you're worried about deadlocks then you should revise your whole concurrency strategy.

On multi-CPU machines, there's no guarantee that memory writes will be seen in the correct order by threads running on different CPUs without issuing a synchronisation instruction. It's only when you issue a synchronisation operation, e.g. a mutex lock or unlock, that each thread's view of the data is guaranteed to be consistent.
To be safe, if you want the state shared between your threads, you need to use synchronisation of some form.

You never know exactly how things are implemented at the lower levels. Especially when you start dealing with multiple cores, the various cache levels, pipelined execution, etc. At least not without a lot of work, and implementations change frequently!
If you don't mutex it, eventually you will regret it!
My favorite example involves integers. One particular system wrote its integer values in two writes, i.e. not atomically. Naturally, when the thread was interrupted between those two writes, you got the upper bytes from one set() call and the lower bytes from the other. A classic blunder. But far from the worst that can happen.
Mutexing is trivial.
You mention: I don't want to use locks as there are already a couple of locks in this part of the app and adding more will increase the possibility of deadlocks.
You'll be fine as long as you follow the golden rules:
Don't mix mutex lock orders. E.g. A.lock();B.lock() in one place and B.lock();A.lock(); in another. Use one order or the other!
Lock for the briefest possible time.
Don't try to use one mutex for multiple purposes. Use multiple mutexes.
Whenever possible use recursive or error-checking mutexes.
Use RAII or macros to ensure unlocking.
E.g.:
#define RUN_UNDER_MUTEX_LOCK( MUTEX, STATEMENTS ) \
do { (MUTEX).lock(); STATEMENTS; (MUTEX).unlock(); } while ( false )
class StatusCache
{
public:
    static void SetActivityStarted(bool activityStarted)
    { RUN_UNDER_MUTEX_LOCK( mMutex, mActivityStarted = activityStarted );
      WriteToDB(); }

    static void SetActivityComplete(bool activityComplete)
    { RUN_UNDER_MUTEX_LOCK( mMutex, mActivityComplete = activityComplete );
      WriteToDB(); }

    static void SetProcessReady(bool processReady)
    { RUN_UNDER_MUTEX_LOCK( mMutex, mProcessReady = processReady );
      WriteToDB(); }

    static void SetProcessPending(bool processPending)
    { RUN_UNDER_MUTEX_LOCK( mMutex, mProcessPending = processPending );
      WriteToDB(); }

private:
    static void WriteToDB(); // read data under mMutex.lock()!

    static Mutex mMutex;
    static bool mActivityStarted;
    static bool mActivityComplete;
    static bool mProcessReady;
    static bool mProcessPending;
};

I'm no C++ guy, but I don't think it will be safe to write to it if you don't have some sort of synchronization.

It looks like you have two issues here.
#1 is that your boolean assignment is not necessarily atomic, even though it's one call in your code. So, under the hood, you could have inconsistent state. You could look into using atomic_set(), if your threading/concurrency library supports that.
#2 is synchronization between your reading and writing. From your code sample, it looks like your WriteToDB() function writes out the state of all 4 variables. Where is WriteToDB() serialized? You could have a situation where thread1 starts WriteToDB(), which reads m_activityStarted but doesn't finish writing it to the database, and is then preempted by thread2, which writes m_activityStarted all the way through. Then thread1 resumes and finishes writing its inconsistent state to the database. At the very least, I think you should lock out write access to the static variables while you are doing the read access necessary for the database update.

In theory it is possible that modifications are implemented as read, modify, write but in practice I can't imagine that this is so.
Generally it is so unless you've set up some sort of transactional memory. Variables are generally stored in RAM but modified in hardware registers, so the read isn't just for kicks. The read is necessary to copy the value out of RAM and into a place it can be modified (or even compared to another value). And while the data is being modified in the hardware register, the stale value is still in RAM in case somebody else wants to copy it into another hardware register. And while the modified data is being written back to RAM somebody else may be in the process of copying it into a hardware register.
And in C++ a bool is guaranteed to take at least a byte of space, which means it is actually possible for it to hold a value other than true or false, say due to a race condition where a read happens partway through a write.
On .Net there is some amount of automatic synchronization of static data and static methods. There is no such guarantee in standard C++.
If you're looking at only ints, bools, and (I think) longs, you have some options for atomic reads/writes and addition/subtraction. C++0x (now C++11) provides std::atomic, Intel TBB has atomic operations, and I believe most operating systems also have the hooks needed to accomplish this.

While you may be afraid of deadlocks, I am sure you will be proud to know your code works correctly.
So I would recommend you put in the locks; you may also want to consider semaphores, a more primitive (and perhaps more versatile) type of lock.

You may get away with it with bools, but if the static objects being changed are of types of any great complexity, terrible things will occur. My advice - if you are going to write from multiple threads, always use synchronisation objects, or you will sooner or later get bitten.

This is not a good idea. There are many variables that will affect the timing of different threads.
Without some kind of lock you will not be guaranteed to have the correct last state.
It is possible that two status updates could be written to the database out of order.
As long as the locking code is designed properly, deadlocks should not be an issue with a simple process like this.

As others have pointed out, this is generally a really bad idea (with some caveats).
Just because you don't see a problem on your particular machine when you happen to test it doesn't prove that the algorithm works right. This is especially true for concurrent applications: interleavings can change dramatically, for example, when you switch to a machine with a different number of cores.
Caveat: if all your setters are doing atomic writes and if you don't care about the timing of them, then you may be okay.
Based on what you've said, I'd think that you could just have a dirty flag that's set in the setters. A separate database writing thread would poll the dirty flag every so often and send the updates to the database. If some items need extra atomicity, their setters would need to lock a mutex. The database writing thread must always lock the mutex.

Related

Thread safety among classes with other classes for private variables

I'm writing a game engine (for fun), and have a lot of threads running concurrently. I have a class which holds an instance of another class as a private variable, which in turn holds an instance of a different class as a private variable. My question is: which one of these classes should I strive to make thread-safe?
Do I make all of them thread-safe and have each of them protect its data with a mutex, or do I make just one of them thread-safe and assume that anybody using my code must understand that the underlying classes aren't inherently thread-safe?
Example:
class A {
private:
    B b;
};

class B {
private:
    C c;
};

class C {
    // data
};
I understand I need to keep every class's data from being corrupted by a data race, however I would like to avoid throwing a ton of mutexes on every single method of every class. I'm not sure what the proper convention is.
You almost certainly don't want to try to make every class thread-safe, since doing so would end up being very inefficient (with lots of unnecessary locking and unlocking of mutexes for no benefit) and also prone to deadlocks (the more mutexes you have to lock at once, the more likely you are to have different threads locking sequences of mutexes in a different order, which is the entry condition for a deadlock and therefore your program freezing up on you).
What you want to do instead is figure out which data structures need to be accessed by which thread(s). When designing your data structures, you want to try to design them in such a way that the amount of data shared between threads is as minimal as possible -- if you can reduce it to zero, then you don't need to do any serialization at all! (You probably won't manage that, but if you do a CSP/message-passing design you can get pretty close, in that the only mutexes you ever need to lock are the ones protecting your message-passing queues.)
Keep in mind also that your mutexes are there not just to "protect the data" but also to allow a thread to make a series of changes appear atomic from the viewpoint of the other threads that might access that data. That is, if your thread #1 needs to make changes to objects A, B, and C, and all three of those objects each have their own mutex, which thread #1 locks before modifying the object and then unlocks afterwards, you can still have a race condition, because thread #2 might "see" the update half-completed (i.e. thread #2 might examine the objects after you've updated A but before you've updated B and C). Therefore you usually need to push your mutexes up to a level where they cover all the objects you might need to change in one go -- in the ABC example case, that means you might want to have a single mutex that is used to serialize access to A, B, and C.
One way to approach it would be to start with just a single global mutex for your entire program -- any time any thread needs to read or write any data structure that is accessible to other threads, that is the mutex it locks (and unlocks afterwards). That design probably won't be very efficient (since threads might spend a lot of time waiting for the mutex), but it will definitely not suffer from deadlock problems.
Then, once you have that working, you could look to see if that single mutex is actually a noticeable performance bottleneck for you -- if not, you're done, ship your program :) OTOH if it is a bottleneck, you can then analyze which of your data structures are logically independent from each other, and split your global mutex into two mutexes -- one to serialize access to subset A of the data structures, and another one to serialize access to subset B. (Note that the subsets don't need to be of equal size -- subset B might contain just one particular data structure that is critical to performance.)
Repeat as necessary until either you're happy with performance, or your program starts to get too complicated or buggy (in which case you might want to dial the mutex granularity back again a bit in order to regain your sanity).

Efficient way to have a thread wait for a value to change in memory?

For some silly reason, there's a piece of hardware on my (GNU/Linux) machine that can only communicate a certain occurrence by writing a value to memory. Assume that by some magic, the area of memory the hardware writes to is visible to a process I'm running. Now, I want to have a thread within that process keep track of that value, and as soon as possible after it has changed - execute some code. However, it is more important to me that the thread not waste CPU time than for it to absolutely minimize the response delay. So - no busy-waiting on a volatile...
How should I best do this (using modern C++)?
Notes:
I don't mind a solution involving atomics, or synchronization mechanisms (in fact, that would perhaps be preferable) - as long as you bear in mind that the hardware doesn't support atomic operations on host memory - it performs a plain write.
The value the hardware writes can be whatever I like, as can the initial value in the memory location it writes to.
I used C++11 since it's the popular tag for Modern C++, but really, C++14 is great and C++17 is ok. On the other hand, even a C-based solution will do.
So, the naive thing to do would be non-busy sleeping, e.g.:
volatile int32_t* special_location = get_special_location();
auto polling_interval_in_usec = perform_tradeoff_between_accuracy_and_cpu_load();
auto polling_interval = std::chrono::microseconds(polling_interval_in_usec);

while (should_continue_polling()) {
    if (*special_location == HardwareIsDone) {
        do_stuff();
        return;
    }
    std::this_thread::sleep_for(polling_interval);
}
This is usually done via std::condition_variable.
... as long as you bear in mind that the hardware doesn't support atomic operations on host memory - it performs a plain write.
Implementations of std::atomic may fall back to mutexes in such cases.
UPD - Possible implementation details: assuming you have some data structure in a form of:
struct MyData {
    std::mutex mutex;
    std::condition_variable cv;
    some_user_type value;
};
and you have access to it from several processes. The writer process overwrites value and notifies cv via notify_one; the reader process waits on cv in a manner somewhat similar to a busy wait, but the thread yields for the duration of the wait. Everything else I could add is already present in the referred examples.

Shared Variables in C++11

So I took an OS class last semester and we had a concurrency/threading project. It was an airport sim that landed planes / had them take off into the direction that the wind was blowing. We had to do it in Java. So now that finals are over and I'm bored, I'm trying to do it in C++11. In Java I used a synchronized variable for the wind (0 - 360) in main and passed it to the 3 threads I was using. My question is: Can you do that in C++11? It's a basic reader/writer, one thread writes/updates the wind, the other 2 (takeoff/land) read.
I got it working by having a global wind variable in my "threads.cpp" implementation file. But is there a way to pass a variable to as many threads as I want and have all of them keep up with it? Or is it actually better for me to just use the global variable and not pass anything? (Why/why not?) I was looking at std::ref() but that didn't work.
EDIT: I'm already using mutex and lock_guard. I'm just trying to figure out how to pass and keep a variable up to date in all threads. Right now it only updates in the write thread.
You can use a std::mutex with std::lock_guard to synchronize access to the shared data. Or if the shared data fits in an integer, you can use std::atomic<int> without locking.
If you want to avoid global variables, simply pass the address of the shared state to the thread functions when you launch them. For example:
void thread_entry1(std::atomic<int>* val) {}
void thread_entry2(std::atomic<int>* val) {}
std::atomic<int> shared_value;
std::thread t1(thread_entry1, &shared_value);
std::thread t2(thread_entry2, &shared_value);
Using std::mutex and std::lock_guard mimics what a Java synchronized variable does (only in Java this happens implicitly without you knowing; in C++ you do it explicitly).
However, having one producer (there is just one direction of wind) and otherwise only consumers, it suffices to write to e.g. a std::atomic<int> variable with relaxed ordering, and to read from that variable in each consumer, again with relaxed ordering. Unless you have the requirement that the global view of all airplanes is consistent (but then you would have to run a lockstep simulation, which makes threading nonsensical), there is no need for synchronization; you only have to make sure that any value that any airplane reads at any time is eventually correct and that no garbled intermediate results can occur. In other words, you need an atomic update.
Relaxed memory ordering is sufficient too, since if all you read is one value, you do not need any happens-before guarantees.
An atomic update (or rather, atomic write) is at least an order of magnitude faster than taking a mutex, if not more. Atomic reads and writes with relaxed ordering are indeed plain normal reads and writes on many (most) mainstream architectures.
The variable need not be global; you can just as well keep it in the scope of the main thread's simulation loop and pass a reference (or pointer) to the threads.
You might want to create, say, the wind object on the heap with new through a std::shared_ptr. Pass this pointer to all interested threads and use a std::mutex and std::lock_guard to change it.

Is it ok to read a shared boolean flag without locking it when another thread may set it (at most once)?

I would like my thread to shut down more gracefully, so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread, so I have a worker with a method to gracefully stop it using a critical section Monitor (equivalent to a C# lock, I believe):
DrawingThread.h
class DrawingThread {
    bool stopRequested;
    Runtime::Monitor CSMonitor;
    CPInfo *pPInfo;
    //More..
};
DrawingThread.cpp
void DrawingThread::Run() {
    if (!stopRequested) {
        //Time consuming call#1
    }
    if (!stopRequested) {
        CSMonitor.Enter();
        pPInfo = new CPInfo(/**/);
        //Not time consuming but pPInfo must either be null or constructed.
        CSMonitor.Exit();
    }
    if (!stopRequested) {
        pPInfo->foobar(/**/); //Time consuming and can be signalled
    }
    if (!stopRequested) {
        //One more optional but time consuming call.
    }
}

void DrawingThread::RequestStop() {
    CSMonitor.Enter();
    stopRequested = true;
    if (pPInfo) pPInfo->RequestStop();
    CSMonitor.Exit();
}
I understand that (at least on Windows) Monitor/locks are the least expensive thread synchronization primitive, but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and only set once to true when stop is requested (if it is requested before the task completes).
My tutors advised protecting even bools because reads/writes may not be atomic. I think this one-shot flag is the exception that proves the rule?
It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.
The reason synchronization is always needed is that the processors may hold the shared data in a cache line. A processor has no reason to refresh that value with one possibly changed by a different thread if there is no synchronization. Worse yet, if there is no synchronization it may write back the wrong value if something stored close to the value is changed and synchronized.
Boolean assignment is atomic. That's not the problem.
The problem is that a thread may not see changes to a variable made by a different thread due to either compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value instead of the actual updated value).
The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.
The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.
If you're using C++11, benign races are undefined behavior, even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior and make optimizations based on that. For instance, your compiler could decide to read the flag once and cache the value, because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.
Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.
The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.
No, you have to protect every access, since modern compilers and CPUs reorder code without your multithreading tasks in mind. The read access from different threads might work, but it doesn't have to.

Lockless reader/writer

I have some data that is both read and updated by multiple threads. Both reads and writes must be atomic. I was thinking of doing it like this:
// Values must be read and updated atomically
struct SValues
{
    double a;
    double b;
    double c;
    double d;
};

class Test
{
public:
    Test()
    {
        m_pValues = &m_values;
    }

    SValues* LockAndGet()
    {
        // Spin forever until we get ownership of the pointer
        while (true)
        {
            SValues* pValues = (SValues*)::InterlockedExchange((long*)&m_pValues, 0xffffffff);
            if (pValues != (SValues*)0xffffffff)
            {
                return pValues;
            }
        }
    }

    void Unlock(SValues* pValues)
    {
        // Return the pointer so other threads can lock it
        ::InterlockedExchange((long*)&m_pValues, (long)pValues);
    }

private:
    SValues* m_pValues;
    SValues m_values;
};

void TestFunc()
{
    Test test;
    SValues* pValues = test.LockAndGet();
    // Update or read values
    test.Unlock(pValues);
}
The data is protected by stealing the pointer to it for every read and write, which should make it thread-safe, but it requires two interlocked instructions for every access. There will be plenty of both reads and writes, and I cannot tell in advance whether there will be more reads or more writes.
Can it be done more efficiently than this? This also locks when reading, but since it's quite possible to have more writes than reads there is no point in optimizing for reading, unless it does not inflict a penalty on writing.
I was thinking of reads acquiring the pointer without an interlocked instruction (along with a sequence number), copying the data, and then having a way of telling if the sequence number had changed, in which case it should retry. This would require some memory barriers, though, and I don't know whether or not it could improve the speed.
----- EDIT -----
Thanks all, great comments! I haven't actually run this code, but I will try to compare the current method with a critical section later today (if I get the time). I'm still looking for an optimal solution, so I will get back to the more advanced comments later. Thanks again!
What you have written is essentially a spinlock. If you're going to do that, then you might as well just use a mutex, such as boost::mutex. If you really want a spinlock, use a system-provided one, or one from a library rather than writing your own.
Other possibilities include doing some form of copy-on-write. Store the data structure by pointer, and just read the pointer (atomically) on the read side. On the write side then create a new instance (copying the old data as necessary) and atomically swap the pointer. If the write does need the old value and there is more than one writer then you will either need to do a compare-exchange loop to ensure that the value hasn't changed since you read it (beware ABA issues), or a mutex for the writers. If you do this then you need to be careful how you manage memory --- you need some way to reclaim instances of the data when no threads are referencing it (but not before).
There are several ways to resolve this, specifically without mutexes or locking mechanisms. The problem is that I'm not sure what the constraints on your system are.
Remember that atomic operations are something that often gets moved around by compilers in C++.
Generally I would solve the issue like this:
Build a multiple-producer-single-consumer arrangement by giving each writing thread its own single-producer-single-consumer queue. Each thread writes into its own queue, and a single consumer thread gathers the produced data and stores it in a single-writer-multiple-reader data store. The implementation for this is a lot of work and is only recommended if you are doing a time-critical application and have the time to put into this solution. (A minimal sketch of such a per-thread queue follows the links below.)
There are more things to read up about this, since the implementation is platform specific:
Atomic etc operations on windows/xbox360:
http://msdn.microsoft.com/en-us/library/ee418650(VS.85).aspx
The multithreaded single-producer-single-consumer without locks:
http://www.codeproject.com/KB/threads/LockFree.aspx#heading0005
What "volatile" really is and can be used for:
http://www.drdobbs.com/cpp/212701484
Herb Sutter has written a good article that reminds you of the dangers of writing this kind of code:
http://www.drdobbs.com/cpp/210600279;jsessionid=ZSUN3G3VXJM0BQE1GHRSKHWATMY32JVN?pgno=2