Thread safety in C++

Thread safety in C++ - c++

I have question regarding thread safety as below ( I have only two threads in which one of the threads only read from the map, the other threads would be writing and reading as shown):
//Thread 2: the reading and writing thread
unordered_map<int, unordered_map<classA*>*>testMap;
//need lock because writing to the map?
testMap[1] = new unordered_map<int, classA*>;
//do not need lock because only reading and the other thread is only reading?
unordered_map<classA*>* ptr = testMap[1];
//need lock because writing?
(*ptr)[1] = new classA;
//do not need lock because only reading and the other thread is only reading?
classA* ptr2 = (*ptr)[1];
//din't modify the map, but modify the data pointed by the pointer stored by the map, do I need lock?
ptr2->field1 = 5;
ptr2->field2 = 6;
//end of reading and writing thread
What is the correct way to lock to unordered_map? Also, should I use a single lock or multiple locks?
Thanks.

If your map is the only shared resource, a single mutex is sufficient.
You need to lock the writing in the first thread, and lock the reading in the second one. If you lock the map only when writing on it, the second thread could read it while you are writing in it.
You dont need a lock in the last example regarding the pointers, since you dont deal with any data stored in the map.
Edit : in fact, it depends on what your are doing with the pointers and in which thread you do it.
You should read this great article : http://herbsutter.com/2010/09/24/effective-concurrency-know-when-to-use-an-active-object-instead-of-a-mutex/

You need to lock both, reading and writing. If you do not lock reading then a write can occur while you are reading and you may access the map in an inconsistent state.
What would be best in your situation would be a reader-writer-lock. Such a lock allows multiple readers to read at the same time but only one writer at the same time and no readers while a writer writes and vice versa.

Couple of things:
You should consider smart pointers to store in your map.
What you are doing is potentially quite dangerous (i.e. you may not be modifying the main map), but you are modifying what's stored there and if you do this outside of a lock, the end result could be anything - let's say that thread one has also read the same pointer and starts iterating whilst thread two is writing the instance of classA - what happens then?
I would have a lock around the main map, and then another lock for each payload map. Any operations on either map should require to obtain the lock at the correct level. I'd also be careful not to return iterators outside of the class that manages the lock, so basically you should implement all the methods you'd need within the class.

Related

If only one thread modifies an std::vector, does that same thread need to use a lock when it reads from the vector?

As I understand a data race can only occur with a std::vector when one or more threads are modifying a vector. If all threads are simply reading that is thread-safe.
Now assume I have 2 threads. One which can only read from the vector, but the other reads and modifies the size of the vector. I have a lock to remain thread-safe to the vector.
When the read-only thread reads from the vector it must use the lock since it does not know what the other thread is doing at the same time.
However, the read and write thread knows that the read-only thread is read-only and thus when it is reading the other thread can only be reading or not. This means that a data-race is impossible in that case and it does not have to use the lock.
The read and write thread must still use the lock when modifying the vector.
Is that right?

You are correct. If your single writer thread is in read mode, then a lock does not need to happen because the other thread is only a reader. Readers can't conflict with each other. The read only thread will still need to acquire the lock on every read, and the writing thread will need to lock when it writes, but it doesn't need to lock to read.

Should I use different mutexes for different objects?

I am new to threading . Correct me if I am wrong that mutex locks the access to a shared data structure so that it cannot be used by other threads until it is unlocked . So, lets consider that there are 2 or more shared data structures . So , should I make different mutex objects for different data structures ? If no ,then how std::mutex will know which object it should lock ? What If I have to lock more than 1 objects at the same time ?

There are several points in your question that can be made more precise. Perhaps clearing this will solve things for you.
To begin with, a mutex, by itself, does not lock access to anything. It is basically something that your code can lock and unlock, and some "magic" ensures that only one thread can lock it at a time.
If, by convention, you decide that any code accessing some data structure foo will first begin by locking a mutex foo_mutex, then it will have the effect of protecting this data structure.
So, having said that, regarding your questions:
It depends on whether the two data structures need to be accessed together or not (e.g., can updating one without the other leave the system in an inconsistent state). If so, you should lock them with a single mutex. If not, you can improve parallelism by using two.
The mutex does not lock anything. It is you who decide by convention whether you can access 1, 2, or a million data structures, while holding it.

If you always needs to access both structures then it could be considered as a single resource so only a single lock is needed.
If you sometimes, even just once, need to access one of the structures independently then they can no longer be considered a single resource and you might need two locks. Of course, a single lock could still be sufficient, but then that lock would lock both resources at once, prohibiting other threads from accessing any of the structures.

Mutex does not "know" anything other than about itself. The lock is performed on mutex itself.
If there are two objects (or pieces of code) that need synchronized access (but can be accessed at the same time) then you have the liberty to use just one mutex for both or one for each. If you use one mutex they will not be accessed at the same time from two different threads.
If it cannot happen that access to one object is required while accessing the other object then you can use two mutexes, one for each. But if it can happen that one object must be accessed while the thread already holds another mutex then care must be taken that code never can reach a deadlock, where two threads hold one mutex each, and both at the same time wait that the other mutex is released.

c++ map contaiter under multithread - iterator->seconds change under read mutex

Is changing address of pointer always atomic operation? is it safe to change pointer = NULL to some value(address) under multithread application?
more info:
There is std::map<int, SomePointer*. I access map with find() with locked mutex for READING. I get iterator to some element and then I want to change iterator->second (in another words SomePointer*)
So, while mutex of map locked for reading, map will not changed at this time. Only other readers will have access to the map. So, for me it's fine if other readers will get old value NULL or NEW ADDRESS... but, of course, I fear that some thread will access to some average state and get corrupted address. So, is it safe to change iterator->second under read mutex?

No, changing the value of anything that isn't explicitly made thread safe is not thread safe.
If you wish to perform possibly-concurrent writes and reads to/from a pointer without a data race or a lock, use a std::atomic.

C++ Vector is this thread-safe? multithreading

So i have multiple threads accessing this function to retrieve database information, is it thread safe?
vector<vector<string> > Database::query(const char* query)
{
pthread_rwlock_wrlock(&mylock); //Write-lock
...
vector<vector<string> > results;
results.push...
pthread_rwlock_unlock(&mylock); //Write-lock
return results;
}
for editors -> sometimes 'fixing' > > to >> is not a good idea but thanks for the rest.

Since results is a local variable, it is in itself safe to use without locks, since there will be a unique copy per thread (it is on the stack, the contents of the vector dynamically allocated in some way, etc). So as long as your database is thread safe, you don't need any locks at all. If the DB is not threadsafe, you need to protect that, of course.
As noted in the other answer, if, for some reason, for example the creation of a string causes a throw bad_alloc;, you need to deal with the fallout of that, and make sure the lock is unlocked (unless you really wish to deadlock all other threads!)

Generally speaking, multiple threads can hold "read" locks. Only one thread can hold "write" lock. And no "read" locks might be held while there is a "write" lock.
It means that while mylock is held locked inside query method, no-one else can have it locked for either read or write, so it is thread-safe. You can read more about readers-writer lock here. Whether you need that mutex locked in there or not is another question.
The code is not exception-safe, however. You must employ RAII in order to unlock a mutex automatically, including on stack unwinding.

It is thread safe because results is created as a local variable, so only one thread will ever access it any instance of results within this method.
If you need a thread-safe vector for some other reason, see this answer on Threadsafe Vector Class for C++.

How to synchronize access to many objects

I have a thread pool with some threads (e.g. as many as number of cores) that work on many objects, say thousands of objects. Normally I would give each object a mutex to protect access to its internals, lock it when I'm doing work, then release it. When two threads would try to access the same object, one of the threads has to wait.
Now I want to save some resources and be scalable, as there may be thousands of objects, and still only a hand full of threads. I'm thinking about a class design where the thread has some sort of mutex or lock object, and assigns the lock to the object when the object should be accessed. This would save resources, as I only have as much lock objects as I have threads.
Now comes the programming part, where I want to transfer this design into code, but don't know quite where to start. I'm programming in C++ and want to use Boost classes where possible, but self written classes that handle these special requirements are ok. How would I implement this?
My first idea was to have a boost::mutex object per thread, and each object has a boost::shared_ptr that initially is unset (or NULL). Now when I want to access the object, I lock it by creating a scoped_lock object and assign it to the shared_ptr. When the shared_ptr is already set, I wait on the present lock. This idea sounds like a heap full of race conditions, so I sort of abandoned it. Is there another way to accomplish this design? A completely different way?
Edit:
The above description is a bit abstract, so let me add a specific example. Imagine a virtual world with many objects (think > 100.000). Users moving in the world could move through the world and modify objects (e.g. shoot arrows at monsters). When only using one thread, I'm good with a work queue where modifications to objects are queued. I want a more scalable design, though. If 128 core processors are available, I want to use all 128, so use that number of threads, each with work queues. One solution would be to use spatial separation, e.g. use a lock for an area. This could reduce number of locks used, but I'm more interested if there's a design which saves as much locks as possible.

You could use a mutex pool instead of allocating one mutex per resource or one mutex per thread. As mutexes are requested, first check the object in question. If it already has a mutex tagged to it, block on that mutex. If not, assign a mutex to that object and signal it, taking the mutex out of the pool. Once the mutex is unsignaled, clear the slot and return the mutex to the pool.

Without knowing it, what you were looking for is Software Transactional Memory (STM).
STM systems manage with the needed locks internally to ensure the ACI properties (Atomic,Consistent,Isolated). This is a research activity. You can find a lot of STM libraries; in particular I'm working on Boost.STM (The library is not yet for beta test, and the documentation is not really up to date, but you can play with). There are also some compilers that are introducing TM in (as Intel, IBM, and SUN compilers). You can get the draft specification from here
The idea is to identify the critical regions as follows
transaction {
// transactional block
}
and let the STM system to manage with the needed locks as far as it ensures the ACI properties.
The Boost.STM approach let you write things like
int inc_and_ret(stm::object<int>& i) {
BOOST_STM_TRANSACTION {
return ++i;
} BOOST_STM_END_TRANSACTION
}
You can see the couple BOOST_STM_TRANSACTION/BOOST_STM_END_TRANSACTION as a way to determine a scoped implicit lock.
The cost of this pseudo transparency is of 4 meta-data bytes for each stm::object.
Even if this is far from your initial design I really think is what was behind your goal and initial design.

I doubt there's any clean way to accomplish your design. The problem that assigning the mutex to the object looks like it'll modify the contents of the object -- so you need a mutex to protect the object from several threads trying to assign mutexes to it at once, so to keep your first mutex assignment safe, you'd need another mutex to protect the first one.
Personally, I think what you're trying to cure probably isn't a problem in the first place. Before I spent much time on trying to fix it, I'd do a bit of testing to see what (if anything) you lose by simply including a Mutex in each object and being done with it. I doubt you'll need to go any further than that.
If you need to do more than that I'd think of having a thread-safe pool of objects, and anytime a thread wants to operate on an object, it has to obtain ownership from that pool. The call to obtain ownership would release any object currently owned by the requesting thread (to avoid deadlocks), and then give it ownership of the requested object (blocking if the object is currently owned by another thread). The object pool manager would probably operate in a thread by itself, automatically serializing all access to the pool management, so the pool management code could avoid having to lock access to the variables telling it who currently owns what object and such.

Personally, here's what I would do. You have a number of objects, all probably have a key of some sort, say names. So take the following list of people's names:
Bill Clinton
Bill Cosby
John Doe
Abraham Lincoln
Jon Stewart
So now you would create a number of lists: one per letter of the alphabet, say. Bill and Bill would go in one list, John, Jon Abraham all by themselves.
Each list would be assigned to a specific thread - access would have to go through that thread (you would have to marshall operations to an object onto that thread - a great use of functors). Then you only have two places to lock:
thread() {
loop {
scoped_lock lock(list.mutex);
list.objectAccess();
}
}
list_add() {
scoped_lock lock(list.mutex);
list.add(..);
}
Keep the locks to a minimum, and if you're still doing a lot of locking, you can optimise the number of iterations you perform on the objects in your lists from 1 to 5, to minimize the amount of time spent acquiring locks. If your data set grows or is keyed by number, you can do any amount of segregating data to keep the locking to a minimum.

It sounds to me like you need a work queue. If the lock on the work queue became a bottle neck you could switch it around so that each thread had its own work queue then some sort of scheduler would give the incoming object to the thread with the least amount of work to do. The next level up from that is work stealing where threads that have run out of work look at the work queues of other threads.(See Intel's thread building blocks library.)

If I follow you correctly ....
struct table_entry {
void * pObject; // substitute with your object
sem_t sem; // init to empty
int nPenders; // init to zero
};
struct table_entry * table;
object_lock (void * pObject) {
goto label; // yes it is an evil goto
do {
pEntry->nPenders++;
unlock (mutex);
sem_wait (sem);
label:
lock (mutex);
found = search (table, pObject, &pEntry);
} while (found);
add_object_to_table (table, pObject);
unlock (mutex);
}
object_unlock (void * pObject) {
lock (mutex);
pEntry = remove (table, pObject); // assuming it is in the table
if (nPenders != 0) {
nPenders--;
sem_post (pEntry->sem);
}
unlock (mutex);
}
The above should work, but it does have some potential drawbacks such as ...
A possible bottleneck in the search.
Thread starvation. There is no guarantee that any given thread will get out of the do-while loop in object_lock().
However, depending upon your setup, these potential draw-backs might not matter.
Hope this helps.

We here have an interest in a similar model. A solution we have considered is to have a global (or shared) lock but used in the following manner:
A flag that can be atomically set on the object. If you set the flag you then own the object.
You perform your action then reset the variable and signal (broadcast) a condition variable.
If the acquire failed you wait on the condition variable. When it is broadcast you check its state to see if it is available.
It does appear though that we need to lock the mutex each time we change the value of this variable. So there is a lot of locking and unlocking but you do not need to keep the lock for any long period.
With a "shared" lock you have one lock applying to multiple items. You would use some kind of "hash" function to determine which mutex/condition variable applies to this particular entry.

Answer the following question under the #JohnDibling's post.
did you implement this solution ? I've a similar problem and I would like to know how you solved to release the mutex back to the pool. I mean, how do you know, when you release the mutex, that it can be safely put back in queue if you do not know if another thread is holding it ?
by #LeonardoBernardini
I'm currently trying to solve the same kind of problem. My approach is create your own mutex struct (call it counterMutex) with a counter field and the real resource mutex field. So every time you try to lock the counterMutex, first you increment the counter then lock the underlying mutex. When you're done with it, you decrement the coutner and unlock the mutex, after that check the counter to see if it's zero which means no other thread is trying to acquire the lock . If so put the counterMutex back to the pool. Is there a race condition when manipulating the counter? you may ask. The answer is NO. Remember you have a global mutex to ensure that only one thread can access the coutnerMutex at one time.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js