Are non-mutating operations inherently thread safe (C++)? - c++

This is probably a dumb question, but consider the following pseudo-code:
#include <map>
#include <mutex>
#include <string>

struct Person {
    std::string name;
};

class Registry {
public:
    const std::string& name(int id) const { return _people.at(id).name; }
    void name(int id, const std::string& name) {
        std::lock_guard<std::mutex> lock(_mutex);   // [[scoped mutex]]
        _people[id].name = name;
    }
private:
    std::map<int, Person> _people;
    std::mutex _mutex;
};
In this simple example, assume Registry is a singleton that will be accessed by multiple threads. I'm locking during an operation that mutates the data, but not during non-mutating access.
Is this thread safe, or should I also lock during the read operation? I'm preventing multiple threads from trying to modify the data at the same time, but I don't know what would happen if a thread was trying to read at the same time another was writing.

If any thread can modify the data, then you need to lock for all access.
Otherwise, one of your "reading" threads could access the data when it is in an indeterminate state. Modifying a map, for example, requires manipulating several pointers. Your reading thread could access the map while some - but not all - of the map has been adjusted.
If you can guarantee that the data is not being modified, multiple reads from multiple threads do not need to be locked; however, that introduces a fragile scenario that you would have to keep a close eye on.

It's not thread safe to be reading the data while it is being modified. It is perfectly safe to have multiple threads reading the data at once.
This difference is what reader-writer locks are for; they will allow any number of readers but when a writer tries to lock the resource new readers will no longer be allowed and the writer will block until all the current readers are done. Then the writer will proceed and once it's done all the readers will be allowed access again.
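For illustration, here is a minimal sketch (not the original poster's code) of how the Registry from the question could use a reader-writer lock via C++17 std::shared_mutex, reusing the Person struct from the question; note the read accessor returns by value so the result cannot dangle once the lock is released:
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

struct Person {
    std::string name;
};

class Registry {
public:
    std::string name(int id) const {
        std::shared_lock lock(_mutex);      // any number of readers may hold this at once
        return _people.at(id).name;         // copy out while the lock is held
    }
    void name(int id, const std::string& name) {
        std::unique_lock lock(_mutex);      // exclusive: waits for all readers to finish
        _people[id].name = name;
    }
private:
    mutable std::shared_mutex _mutex;
    std::map<int, Person> _people;
};
Whether this beats a plain std::mutex depends on how long readers hold the lock; for a lookup this cheap, a plain mutex is often just as fast.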
The reason it's not safe to read data during modification is that the data can be, or can appear to be, in an inconsistent state (e.g., the object may temporarily not fulfill its invariants). If the reader reads it at that point then it's just like there's a bug in the program failing to keep the data consistent.
// example
int array[10];
int size = 0;

int& top() {
    return array[size - 1];
}

void insert(int value) {
    size++;
    top() = value;
}
Any number of threads can call top() at the same time, but if one thread is running insert() then a problem occurs when the lines get interleaved like this:
// thread calling insert        // thread calling top
size++;
                                return array[size-1];
array[size-1] = value;
The reading thread gets garbage.
Of course this is just one possible way things can go wrong. In general you can't even assume the program will behave as though lines of code on different threads will just interleave. In order to make that assumption valid the language simply tells you that you can't have data races (i.e., what we've been talking about; multiple threads accessing a (non-atomic) object with at least one thread modifying the object)*.
* And for completeness; that all atomic accesses use a sequentially consistent memory ordering. This doesn't matter for you since you're not doing low level work directly with atomic objects.

Is this thread safe, or should I also lock during the read operation?
It is not thread-safe.
Per Paragraph 1.10/4 of the C++11 Standard:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
Moreover, per Paragraph 1.10/21:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [...]

It's not thread safe.
The reading operation could be walking through the map (it's a tree) to find the requested object, while a writing operation suddenly adds or removes something from the map (or worse, the very node the traversal is currently at).
If you're lucky you'll get an exception; otherwise it will just be undefined behavior while your map is in an inconsistent state.

I don't know what would happen if a thread was trying to read at the same time another was writing.
Nobody knows. Chaos would ensue.
Multiple threads can share a read-only resource, but as soon as anyone wants to write it, it becomes unsafe for everyone to access it in any way until the writing is done.
Why?
Writes are not atomic. They happen over multiple clock cycles. A process attempting to read an object as it's written may find a half-modified version, temporary garbage.
So
Lock your reads if they are expected to be concurrent with your writes.

Absolutely not safe!
If you are changing Person::name from "John Smith" to "James Watt", you may well read back a value of "Jame Smith" or "James mith". Or possibly even something completely different, because "change this value to that" may not just copy the new data into the existing place, but entirely replace it with a newly allocated piece of memory that contains some completely undefined content [including something that isn't a valid string].

Related

Are mutex locks necessary when modifying values?

I have an unordered_map and I'm using mutex locks for the emplace, delete, and find operations, but I don't use a mutex when modifying the map's elements, because I don't see any point. But I'm curious whether I'm wrong in this case.
Should I use one when modifying element value?
std::unordered_map<std::string, Connection> connections;
// Lock at Try_Emplace
connectionsMapMutex.lock();
auto [element, inserted] = connections.try_emplace(peer);
connectionsMapMutex.unlock();
// No locks here from now
auto& connection = element->second;
// Modifying Element
connection.foo = "bar";
Consider what can happen when you have one thread reading from the map and the other one writing to it:
1. Thread A starts executing the command string myLocalStr = element->second.foo;
2. As part of the above, the std::string copy-constructor starts executing: it stores foo's character-buffer-pointer into a register, and starts dereferencing it to copy out characters from the original string's buffer to myLocalStr's buffer.
3. Just then, thread A's quantum expires, and thread B gains control of the CPU and executes the command connection.foo = "some other string";
4. Thread B's assignment operator causes the std::string to deallocate its character buffer and allocate a new one to hold the new string.
5. Thread A then starts running again, and continues executing the std::string copy-constructor from step 2, but now the pointer it is dereferencing to read in characters is no longer pointing at valid data, because thread B deleted the buffer! Poof, Undefined Behavior is invoked, resulting in a crash (if you're lucky) or insidious data corruption (if you're unlucky, in which case you'll be spending several weeks trying to figure out why your program's data gets randomly corrupted only about once a month).
And note that the above scenario is just on a single-core CPU; on a multicore system there are even more ways for unsynchronized accesses to go wrong, since the CPUs have to co-ordinate their local and shared memory-caches correctly, which they won't know to do if there is no synchronization code included.
To sum up: Neither std::unordered_map nor std::string are designed for unsynchronized multithreaded access, and if you try to get away with it you're likely to regret it later on.
Here's what I would do, if and only if I'm threading and there's a chance other threads are manipulating the list and its contents.
I would create a mutex lock when manipulating the list (which you've done) or when traversing the list.
And if I felt it was necessary to protect an individual item in the list (you're calling methods on it), I'd give each one a distinct mutex. You could change element A and element B simultaneously and it's fine, but by using the local locks for each item, each is safe.
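A minimal sketch of that per-item mutex idea (not the original poster's code; the Connection layout and the setFoo helper are assumptions for the example):
#include <mutex>
#include <string>
#include <unordered_map>

// Assumed layout: each element carries its own mutex.
struct Connection {
    std::mutex mtx;                        // protects this element's fields
    std::string foo;
};

std::unordered_map<std::string, Connection> connections;
std::mutex connectionsMapMutex;            // protects the map structure itself

void setFoo(const std::string& peer, const std::string& value) {
    Connection* conn = nullptr;
    {
        std::lock_guard<std::mutex> mapLock(connectionsMapMutex);   // guard the insert/find
        conn = &connections.try_emplace(peer).first->second;
    }
    // unordered_map never invalidates element references on insert, so holding
    // conn here is fine as long as no other thread erases this entry.
    std::lock_guard<std::mutex> itemLock(conn->mtx);                // guard the element itself
    conn->foo = value;
}
The trade-off is one mutex per element, which is only worth it if contention on a single map-wide lock is actually a problem.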
However, it's very rare I've had to be that careful.

Is it safe to read(only) the same non-atomic variable in different threads without locking?

I have a class method in which I want to read the same non-atomic class member simultaneously in different threads. The method is const, so it doesn't write to the member being read. Is it safe to not bother about any locks in this case?
Edit:
I should have given an example:
#include <string>
#include <thread>

class SomeClass
{
public:
    void someMethod() const;
    //...
private:
    std::string someMember_; // might be changed by some methods
    //...
};

void SomeClass::someMethod() const
{
    std::jthread thr1([this](){ /* read someMember_ here */ });
    std::jthread thr2([this](){ /* read someMember_ here */ });
    //...
}
It is safe, provided you can guarantee that the data being read won't change during the period that multiple threads have unsynchronized access to it.
(Note that this period includes not just the brief moments in time when threads are calling your const-method, but the entire region of time when a thread could call your const-method. So for example in many programs this no-modifications-allowed period might begin at the moment when the first child-thread is spawned, and end at the moment the last child-thread has exited and been join()'d)
Given that:
Yes, the value can be changed by other methods, but not during the execution of the method with multiple threads.
It is not inherently safe to blindly read non-atomic values that have potentially been modified by some other thread, even if you know with 100% certainty that no modification will be happening concurrently with the reads.
There has to be synchronization between the threads before a modification is guaranteed to be seen by the reading threads.
The potential problems go beyond reading an out-of-date value. If the compiler can establish that there is no synchronization happening between two subsequent reads of the same memory location, it's perfectly allowed to only read it once and cache the result.
There are various ways for the synchronization to be done, and which one is best in a given scenario will depend on the context.
See https://en.cppreference.com/w/cpp/language/memory_model for more details.
However, once synchronization has been established, any number of threads can concurrently read from the same memory location.
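As one concrete illustration (a sketch, not taken from the question): performing all writes before the reader threads are created is enough, because the completion of a thread's constructor synchronizes-with the start of the thread function:
#include <string>
#include <thread>
#include <vector>

std::string shared;   // non-atomic, but frozen before any reader exists

int main() {
    shared = "written before any reader thread is spawned";

    std::vector<std::jthread> readers;
    for (int i = 0; i < 4; ++i) {
        // Creating the thread synchronizes-with the start of the lambda,
        // so every reader is guaranteed to see the write above.
        readers.emplace_back([] { std::string copy = shared; (void)copy; });
    }
}   // std::jthread joins automatically
Equivalently, join()-ing the writing thread before the readers start, or locking and unlocking a common mutex around the write and the first read, establishes the same happens-before edge.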
I have a class method in which I want to read the same non-atomic class member simultaneously in different threads. The method is const, so it doesn't write to the member being read. Is it safe to not bother about any locks in this case?
Yes, it is 100% safe. I am assuming that the only issue is that multiple threads are reading and that the code would be safe if a single thread were reading. That additional threads are reading the same data has no effect on whether the reads are safe or not.
A data race can only occur between a read and a modification. So if one read from one thread wouldn't race with any modifications, additional reads can't either.

C++ member update visibility inside a critical section when not atomic

I stumbled across the following Code Review StackExchange post and decided to read it for practice. In the code, there is the following:
Note: I am not looking for a code review; this is just a copy-paste of the code from the link so you can focus on the issue at hand without the other code interfering. I am not interested in implementing a 'smart pointer', just in understanding the memory model:
// Copied from the link provided (all inside a class)
unsigned int count;
mutex m_Mutx;

void deref()
{
    m_Mutx.lock();
    count--;
    m_Mutx.unlock();
    if (count == 0)
    {
        delete rawObj;
        count = 0;
    }
}
Seeing this makes me immediately think: what if two threads enter when count == 1 and neither sees the other's update? Can both end up seeing count as zero and double-delete? And is it possible for two threads to cause count to become -1, so that the deletion never happens?
The mutex will make sure only one thread at a time enters the critical section; however, does this guarantee that all threads will see the update properly? What does the C++ memory model tell me, so I can say whether this is a race condition or not?
I looked at the Memory model cppreference page and the std::memory_order cppreference page, but the latter seems to deal with a parameter for atomics. I didn't find the answer I was looking for, or maybe I misread it. Can anyone tell me whether what I said is wrong or right, and whether or not this code is safe?
For correcting the code if it is broken:
Is the correct answer for this to turn count into an atomic member? Or does this work and after releasing the lock on the mutex, all the threads see the value?
I'm also curious if this would be considered the correct answer:
Note: I am not looking for a code review and trying to see if this kind of solution would solve the issue with respect to the C++ memory model.
#include <atomic>
#include <mutex>

struct ClassNameHere {
    int* rawObj;
    std::atomic<unsigned int> count;
    std::mutex mutex;
    // ...
    void deref()
    {
        std::scoped_lock lock{mutex};
        count--;
        if (count == 0)
            delete rawObj;
    }
};
"what if two threads enter when count == 1" -- if that happens, something else is fishy. The idea behind smart pointers is that the refcount is bound to an object's lifetime (scope). The decrement happens when the object (via stack unrolling) is destroyed. If two threads trigger that, the refcount can not possibly be just 1 unless another bug is present.
However, what could happen is that two threads enter this code when count = 2. In that case, the decrement operation is locked by the mutex, so it can never reach negative values. Again, this assumes non-buggy code elsewhere. Since all this does is to delete the object (and then redundantly set count to zero), nothing bad can happen.
What can happen is a double delete though. If two threads at count = 2 decrement the count, they could both see the count = 0 afterwards. Just determine whether to delete the object inside the mutex as a simple fix. Store that info in a local variable and handle accordingly after releasing the mutex.
Concerning your third question, turning the count into an atomic is not going to fix things magically. Also, the point behind atomics is that you don't need a mutex, because locking a mutex is an expensive operation. With atomics, you can combine operations like decrement and check for zero, which is similar to the fix proposed above. Atomics are typically slower than "normal" integers. They are still faster than a mutex though.
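For illustration, a sketch of the combined decrement-and-check that atomics make possible (not the original code; this is essentially what a reference-count release looks like):
#include <atomic>

struct RefCounted {
    int* rawObj = nullptr;
    std::atomic<unsigned int> count{1};

    void deref()
    {
        // fetch_sub returns the previous value, so exactly one thread observes 1
        // and therefore exactly one thread performs the delete.
        if (count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete rawObj;
        }
    }
};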
In both cases there's a data race. Thread 1 decrements the counter to 1, and just before the if statement a thread switch occurs. Thread 2 decrements the counter to 0 and then deletes the object. Thread 1 resumes, sees that count is 0, and deletes the object again.
Move the unlock() to the end of the function. Or, better, use std::lock_guard to do the locking; its destructor will unlock the mutex even if the delete call throws an exception.
If two threads potentially* enter deref() concurrently, then, regardless of the previous or previously expected value of count, a data race occurs, and your entire program, even the parts that you would expect to be chronologically prior, has undefined behavior as stated in the C++ standard in [intro.multithread/20] (N4659):
Two actions are potentially concurrent if
(20.1) they are performed by different threads, or
(20.2) they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is
not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
The potentially concurrent actions in this case, of course, are the read of count outside of the locked section, and the write of count within it.
*) That is, if current inputs allow it.
UPDATE 1: The section you reference, describing atomic memory order, explains how atomic operations synchronize with each other and with other synchronization primitives (such as mutexes and memory barriers). In other words, it describes how atomics can be used for synchronization so that some operations aren't data races. It does not apply here. The standard takes a conservative approach here: Unless other parts of the standard explicitly make clear that two conflicting accesses are not concurrent, you have a data race, and hence UB (where conflicting means same memory location, and at least one of them isn't read-only).
Your lock prevents the operation count-- from getting into a mess when performed concurrently in different threads. It does not guarantee, however, that the values of count are synchronized, so repeated reads outside a single critical section bear the risk of a data race.
You could rewrite it as follows:
void deref()
{
    bool isLast;
    m_Mutx.lock();
    --count;
    isLast = (count == 0);
    m_Mutx.unlock();
    if (isLast) {
        delete rawObj;
    }
}
The lock makes sure that access to count is synchronized and always in a valid state. That valid state is carried over to the non-critical section through a local variable (no race condition there), so the critical section can be kept rather short.
A simpler version would be to synchronize the complete function body; this might become a disadvantage if you want to do more elaborate things than just delete rawObj:
void deref()
{
    std::lock_guard<std::mutex> lock(m_Mutx);
    if (! --count) {
        delete rawObj;
    }
}
BTW: std::atomic alone will not solve this issue, since it synchronizes only each single access, not the whole "transaction". Therefore your scoped_lock is necessary, and - as it then spans the complete function - the std::atomic becomes superfluous.

C++ constructor memory synchronization

Assume that I have code like:
void InitializeComplexClass(ComplexClass* c);

class Foo {
public:
    Foo() {
        i = 0;
        InitializeComplexClass(&c);
    }
private:
    ComplexClass c;
    int i;
};
If I now do something like Foo f; and hand a pointer to f over to another thread, what guarantees do I have that any stores done by InitializeComplexClass() will be visible to the CPU executing the other thread that accesses f? What about the store writing zero into i? Would I have to add a mutex to the class, take a writer lock on it in the constructor and take corresponding reader locks in any methods that access the member?
Update: Assume I hand a pointer over to a bunch of other threads once the constructor has returned. I'm not assuming that the code is running on x86, but could be instead running on something like PowerPC, which has a lot of freedom to do memory reordering. I'm essentially interested in what sorts of memory barriers the compiler has to inject into the code when the constructor returns.
In order for the other thread to be able to know about your new object, you have to hand over the object / signal the other thread somehow. For signaling a thread you write to memory. Both x86 and x64 perform all memory writes in order; the CPU does not reorder these operations with regard to each other. This is called "Total Store Ordering", so the CPU write queue works like "first in, first out".
Given that you create an object first and then pass it on to another thread, these changes to memory will also occur in order, and the other thread will always see them in the same order. By the time the other thread learns about the new object, the contents of this object are guaranteed to be available to that thread even earlier (if the thread only somehow knew where to look).
In conclusion, you do not have to synchronise anything this time. Handing over the object after it has been initialised is all the synchronisation you need.
Update: On non-TSO architectures you do not have this TSO guarantee, so you need to synchronise. Use the MemoryBarrier() macro (or any interlocked operation), or some synchronisation API. Signalling the other thread through a corresponding API also causes synchronisation; otherwise it would not be a synchronisation API.
x86 and x64 CPUs may reorder writes past reads, but that is not relevant here. Just for better understanding - writes can be ordered after reads since writes to memory go through a write queue and flushing that queue may take some time. On the other hand, the read cache is always consistent with the latest updates from other processors (that have gone through their own write queues).
This topic has been made unbelievably confusing for so many, but in the end there are only a couple of things an x86-x64 programmer has to be worried about:
- First, the existence of the write queue (and one should not at all be worried about the read cache!).
- Secondly, concurrent writing and reading of the same variable in different threads when the variable is not of atomic length, which may cause data tearing, and for which you would need synchronisation mechanisms.
- And finally, concurrent updates to the same variable from multiple threads, for which we have interlocked operations, or again synchronisation mechanisms.
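For a portable C++11 illustration of that hand-over (a sketch with assumed names, rather than the Windows-specific MemoryBarrier()): publish the fully constructed object through a release store, and read it with an acquire load.
#include <atomic>
#include <thread>

struct Foo { int i = 0; /* ComplexClass c; ... */ };

std::atomic<Foo*> published{nullptr};

void producer() {
    Foo* f = new Foo;                               // constructor finishes first
    published.store(f, std::memory_order_release);  // all earlier writes become visible...
}

void consumer() {
    Foo* f = nullptr;
    while ((f = published.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();                  // ...to whichever thread acquires the pointer
    }
    int seen = f->i;                                // safe: reads the fully initialised object
    (void)seen;
    delete f;
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
On x86 the release store compiles to an ordinary store, so it costs nothing extra there; on weaker architectures such as PowerPC it emits the necessary barrier.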
If you do:
Foo f;
// HERE: InitializeComplexClass() and the "i" member init are guaranteed to be completed
passToOtherThread(&f);
/* From this point, you cannot guarantee the state/members
   of 'f' since another thread can modify it */
If you're passing an instance pointer to another thread, you need to implement guards in order for both threads to interact with the same instance. If you ONLY plan to use the instance on the other thread, you do not need to implement guards. However, do not pass a stack pointer like in your example, pass a new instance like this:
passToOtherThread(new Foo());
And make sure to delete it when you are done with it.
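One sketch of that hand-off with the ownership (and the delete) made explicit, assuming a hypothetical worker function:
#include <memory>
#include <thread>

struct Foo { int i = 0; /* ... */ };

void worker(std::unique_ptr<Foo> foo) {
    foo->i = 42;   // this thread is the sole owner: no guards needed,
}                  // and the object is destroyed when foo goes out of scope

int main() {
    std::thread t(worker, std::make_unique<Foo>());
    t.join();
}
std::thread's constructor moves the unique_ptr into the new thread, so ownership transfers without any extra synchronization.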

Lockless reader/writer

I have some data that is both read and updated by multiple threads. Both reads and writes must be atomic. I was thinking of doing it like this:
#include <windows.h>   // for ::InterlockedExchange

// Values must be read and updated atomically
struct SValues
{
    double a;
    double b;
    double c;
    double d;
};

class Test
{
public:
    Test()
    {
        m_pValues = &m_values;
    }

    SValues* LockAndGet()
    {
        // Spin forever until we get ownership of the pointer
        while (true)
        {
            SValues* pValues = (SValues*)::InterlockedExchange((long*)&m_pValues, 0xffffffff);
            if (pValues != (SValues*)0xffffffff)
            {
                return pValues;
            }
        }
    }

    void Unlock(SValues* pValues)
    {
        // Return the pointer so other threads can lock it
        ::InterlockedExchange((long*)&m_pValues, (long)pValues);
    }

private:
    SValues* m_pValues;
    SValues m_values;
};

void TestFunc()
{
    Test test;
    SValues* pValues = test.LockAndGet();
    // Update or read values
    test.Unlock(pValues);
}
The data is protected by stealing the pointer to it for every read and write, which should make it thread-safe, but it requires two interlocked instructions for every access. There will be plenty of both reads and writes, and I cannot tell in advance whether there will be more reads or more writes.
Can it be done more efficiently than this? This also locks when reading, but since it's quite possible to have more writes than reads, there is no point in optimizing for reading unless it does not inflict a penalty on writing.
I was thinking of reads acquiring the pointer without an interlocked instruction (along with a sequence number), copying the data, and then having a way of telling if the sequence number had changed, in which case it should retry. This would require some memory barriers, though, and I don't know whether or not it could improve the speed.
----- EDIT -----
Thanks all, great comments! I haven't actually run this code, but I will try to compare the current method with a critical section later today (if I get the time). I'm still looking for an optimal solution, so I will get back to the more advanced comments later. Thanks again!
What you have written is essentially a spinlock. If you're going to do that, then you might as well just use a mutex, such as boost::mutex. If you really want a spinlock, use a system-provided one, or one from a library rather than writing your own.
Other possibilities include doing some form of copy-on-write. Store the data structure by pointer, and just read the pointer (atomically) on the read side. On the write side then create a new instance (copying the old data as necessary) and atomically swap the pointer. If the write does need the old value and there is more than one writer then you will either need to do a compare-exchange loop to ensure that the value hasn't changed since you read it (beware ABA issues), or a mutex for the writers. If you do this then you need to be careful how you manage memory --- you need some way to reclaim instances of the data when no threads are referencing it (but not before).
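A hedged sketch of that copy-on-write approach, using shared_ptr so that reclaiming the old data is handled automatically (names like Store and SetA are made up for the example; this uses the std::atomic_load/std::atomic_store overloads for shared_ptr, and in C++20 a std::atomic<std::shared_ptr<SValues>> member would serve the same purpose):
#include <memory>
#include <mutex>

struct SValues { double a, b, c, d; };

class Store {
public:
    Store() : m_values(std::make_shared<SValues>()) {}

    // Readers: grab a snapshot; the old object stays alive while any reader still holds it.
    std::shared_ptr<const SValues> Read() const {
        return std::atomic_load(&m_values);
    }

    // Writers: copy the current data, modify the copy, then atomically swap it in.
    void SetA(double a) {
        std::lock_guard<std::mutex> lock(m_writeMutex);   // serialises writers (avoids lost updates)
        auto copy = std::make_shared<SValues>(*std::atomic_load(&m_values));
        copy->a = a;
        std::atomic_store(&m_values, std::shared_ptr<SValues>(std::move(copy)));
    }

private:
    std::shared_ptr<SValues> m_values;
    mutable std::mutex m_writeMutex;
};
Readers never block and never see a half-written SValues; writers pay the cost of the copy plus a mutex among themselves.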
There are several ways to resolve this, specifically without mutexes or locking mechanisms. The problem is that I'm not sure what the constraints on your system are.
Remember that atomic operations are something that often get moved around by the compilers in C++.
Generally I would solve the issue like this:
Multiple-producer-single-consumer, by having one single-producer-single-consumer queue per writing thread. Each thread writes into its own queue. A single consumer thread gathers the produced data and stores it in a single-consumer-multiple-reader data store. The implementation for this is a lot of work and only recommended if you are writing a time-critical application and have the time to put into this solution.
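For illustration only, a minimal sketch of the single-producer-single-consumer building block mentioned above, as a bounded ring buffer with C++11 atomics (names are made up for the example):
#include <array>
#include <atomic>
#include <cstddef>

// One producer thread calls push(), one consumer thread calls pop(). No locks.
template <typename T, std::size_t N>
class SpscQueue {
public:
    bool push(const T& value) {
        const std::size_t tail = m_tail.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == m_head.load(std::memory_order_acquire))
            return false;                                   // queue is full
        m_buffer[tail] = value;
        m_tail.store(next, std::memory_order_release);      // publish the new element
        return true;
    }

    bool pop(T& out) {
        const std::size_t head = m_head.load(std::memory_order_relaxed);
        if (head == m_tail.load(std::memory_order_acquire))
            return false;                                   // queue is empty
        out = m_buffer[head];
        m_head.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::array<T, N> m_buffer{};
    std::atomic<std::size_t> m_head{0};
    std::atomic<std::size_t> m_tail{0};
};
A real implementation would also need to deal with cache-line padding of the indices and with what to do when push() reports the queue full.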
There is more to read up on about this, since the implementation is platform-specific:
Atomic etc operations on windows/xbox360:
http://msdn.microsoft.com/en-us/library/ee418650(VS.85).aspx
The multithreaded single-producer-single-consumer without locks:
http://www.codeproject.com/KB/threads/LockFree.aspx#heading0005
What "volatile" really is and can be used for:
http://www.drdobbs.com/cpp/212701484
Herb Sutter has written a good article that reminds you of the dangers of writing this kind of code:
http://www.drdobbs.com/cpp/210600279;jsessionid=ZSUN3G3VXJM0BQE1GHRSKHWATMY32JVN?pgno=2