Multiple threads performing writes? - C++

I was hoping someone could advise on how multiple threads can write to a common container (e.g. a map) using Boost and C++, in a scenario where some threads might share the same key.
The map might be of type std::map, with different threads accessing the mapped objects to modify different data members. Would each thread, upon hitting the unique_lock, wait for the current thread to finish before proceeding?
Would it be as simple as each thread entering a critical section, as in this example:
//somewhere within the code
boost::shared_mutex mutex;
void modifyMap(const std::string& key, const unsigned int dataX,
               const unsigned int dataY)
{
    // would each thread wait for exclusive access?
    boost::unique_lock<boost::shared_mutex> lock(mutex);
    // i now have exclusive access, no race conditions
    auto it = m_map.find(key);
    if (it != m_map.end()) {
        it->second.setDataX(dataX);
        it->second.setDataY(dataY);
    }
}
thanks in advance

You should create a thread-safe implementation of a data structure. It can be either lock-based (for example, implemented using mutexes) or lock-free (using atomic operations and memory orderings, which are supported in C++11 and Boost).
I can briefly describe the lock-based approach. For example, say you want to design a thread-safe linked list. If your threads only perform read operations, everything is safe. If a thread writes to the list, however, it needs the previous and next node pointers (in a doubly linked list, both of those nodes must be updated to point to the inserted node), and while it modifies them another thread might read inconsistent pointer data. So you need a lock on the two nodes between which you want to insert your new node. This creates serialization (other threads wait for the mutex to be unlocked) and reduces the potential for concurrency.
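A minimal sketch of that node-level locking, assuming a sorted singly linked list of ints (the class `SortedList` and its methods are hypothetical, not from any library). Each node carries its own `std::mutex`; a thread walks the list holding at most two node locks at once (lock coupling, or "hand-over-hand" locking) and splices the new node in while it still holds the predecessor's lock. Because every thread acquires locks in list order, inserts in different parts of the list can proceed in parallel without deadlocking.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sorted list with one mutex per node (insert-only, no removal).
class SortedList {
    struct Node {
        int value = 0;
        std::unique_ptr<Node> next;
        std::mutex m;
    };
    Node head_;  // sentinel: every real node always has a predecessor to lock

public:
    ~SortedList() {
        // Unlink iteratively to avoid deep recursion in unique_ptr destructors.
        auto p = std::move(head_.next);
        while (p) p = std::move(p->next);
    }

    void insert(int value) {
        auto node = std::make_unique<Node>();
        node->value = value;
        Node* prev = &head_;
        std::unique_lock<std::mutex> prev_lock(prev->m);
        for (;;) {
            Node* cur = prev->next.get();
            if (cur == nullptr) break;                  // reached the tail
            std::unique_lock<std::mutex> cur_lock(cur->m);
            if (cur->value >= value) break;             // new node goes between prev and cur
            prev_lock = std::move(cur_lock);            // hand over: unlock prev, keep cur
            prev = cur;
        }
        // prev is still locked; the splice only modifies prev->next.
        node->next = std::move(prev->next);
        prev->next = std::move(node);
    }

    std::vector<int> to_vector() {
        std::vector<int> out;
        Node* prev = &head_;
        std::unique_lock<std::mutex> prev_lock(prev->m);
        while (Node* cur = prev->next.get()) {
            std::unique_lock<std::mutex> cur_lock(cur->m);
            out.push_back(cur->value);
            prev_lock = std::move(cur_lock);
            prev = cur;
        }
        return out;
    }
};

// Usage: four threads insert interleaved values concurrently.
std::vector<int> demo_sorted_list() {
    SortedList list;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&list, t] {
            for (int i = t; i < 100; i += 4) list.insert(i);
        });
    for (auto& th : threads) th.join();
    return list.to_vector();
}
```

A real list would also need removal, which requires locking both the predecessor and the node being unlinked; that is where the "two nodes" rule from the text becomes essential.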
The full example with a lookup table is available in the book "C++ Concurrency in Action: Practical Multithreading" by Anthony Williams, page 171, listing 6.11. The book is a good starting point for multithreaded programming with the latest C++ standard, as its author also designed both the boost::thread and C++11 thread libraries.
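A minimal sketch in the spirit of such a lookup table (not a reproduction of the book's listing), assuming C++17 for `std::shared_mutex`. The table has a fixed number of buckets, each with its own reader-writer lock, so threads working on keys in different buckets never wait for each other. The class `BucketedTable` and its members `get_or` and `update` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hash table with one reader-writer lock per bucket (C++17).
template <typename K, typename V, typename Hash = std::hash<K>>
class BucketedTable {
    struct Bucket {
        std::list<std::pair<K, V>> items;
        mutable std::shared_mutex m;
    };
    std::vector<Bucket> buckets_;  // fixed size, never rehashed, so no table-wide lock
    Hash hash_;

    Bucket& bucket_for(const K& key) { return buckets_[hash_(key) % buckets_.size()]; }
    const Bucket& bucket_for(const K& key) const { return buckets_[hash_(key) % buckets_.size()]; }

public:
    explicit BucketedTable(std::size_t n_buckets = 19) : buckets_(n_buckets) {}

    // Shared lock on one bucket; returns a copy so nothing escapes the lock.
    V get_or(const K& key, const V& fallback) const {
        const Bucket& b = bucket_for(key);
        std::shared_lock<std::shared_mutex> lock(b.m);
        for (const auto& kv : b.items)
            if (kv.first == key) return kv.second;
        return fallback;
    }

    // Exclusive lock on one bucket for a read-modify-write of a single entry.
    template <typename F>
    void update(const K& key, const V& initial, F f) {
        Bucket& b = bucket_for(key);
        std::unique_lock<std::shared_mutex> lock(b.m);
        for (auto& kv : b.items)
            if (kv.first == key) { f(kv.second); return; }
        b.items.emplace_back(key, initial);
        f(b.items.back().second);
    }
};

// Usage: four threads increment counters for 10 keys; each key ends at 4 * 100.
int demo_bucketed_table() {
    BucketedTable<int, int> table;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&table] {
            for (int i = 0; i < 1000; ++i)
                table.update(i % 10, 0, [](int& v) { ++v; });
        });
    for (auto& th : threads) th.join();
    return table.get_or(3, -1);
}
```

This also answers the "lock only if two threads hit the same index" idea from the related question below: contention is limited to threads whose keys hash to the same bucket.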
Update: to make your example work for reads and writes (if you need more operations, you need to protect them too), you're better off using boost::shared_mutex, which allows multiple-reader/single-writer access: if one thread wants to write, it acquires an exclusive lock and all other threads have to wait. Here's some code:
#include <boost/thread.hpp>
#include <string>

template <typename mapType>
class threadSafeMap {
    boost::shared_mutex map_mutex;
    mapType m_map;   // held by value: no manual new/delete needed
public:
    void modifyMap(const std::string& key, const unsigned int dataX,
                   const unsigned int dataY)
    {
        // C++11 has std::lock_guard; std::shared_mutex only arrived in C++17
        // acquire exclusive access - other threads wait
        boost::lock_guard<boost::shared_mutex> lck(map_mutex);
        typename mapType::iterator it = m_map.find(key);
        if (it != m_map.end()) {
            it->second.setDataX(dataX);
            it->second.setDataY(dataY);
        }
    }
    typename mapType::mapped_type getValueByKey(const std::string& key)
    {
        // std::shared_lock is available from C++14
        // acquire shared access - other readers can proceed; a writer
        // has to wait until every shared lock has been released
        boost::shared_lock<boost::shared_mutex> lck(map_mutex);
        return m_map.at(key); // copy made under the lock; throws std::out_of_range if absent
    }
};
The lock objects are destroyed, and the mutex is unlocked, when they go out of scope. The mapType template parameter can be replaced by your map type.
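For comparison, a minimal sketch of the same wrapper using only the standard library, assuming C++17 (`std::shared_mutex`). The `Data` struct with `x`/`y` fields is a hypothetical stand-in for the question's value type; writers take `std::unique_lock`, readers take `std::shared_lock`, and reads return a copy so no reference outlives the lock.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>

struct Data {          // hypothetical stand-in for the question's value type
    unsigned x = 0;
    unsigned y = 0;
};

class ThreadSafeDataMap {
    mutable std::shared_mutex mutex_;
    std::map<std::string, Data> map_;
public:
    // Exclusive lock: one writer at a time and no readers meanwhile.
    void modify(const std::string& key, unsigned x, unsigned y) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        Data& d = map_[key];   // unlike the Boost version, inserts the key if missing
        d.x = x;
        d.y = y;
    }
    // Shared lock: any number of readers may hold it at the same time.
    Data get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = map_.find(key);
        return it != map_.end() ? it->second : Data{};
    }
};

// Usage: a writer updates both fields together; a reader must never see a half-written pair.
bool demo_consistent_reads() {
    ThreadSafeDataMap m;
    m.modify("k", 0, 0);
    bool consistent = true;   // written only by the reader thread
    std::thread writer([&m] {
        for (unsigned i = 1; i <= 10000; ++i) m.modify("k", i, 2 * i);
    });
    std::thread reader([&m, &consistent] {
        for (int i = 0; i < 10000; ++i) {
            Data d = m.get("k");
            if (d.y != 2 * d.x) consistent = false;
        }
    });
    writer.join();
    reader.join();
    Data last = m.get("k");
    return consistent && last.x == 10000 && last.y == 20000;
}
```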

Related

preventing data races in shared hash table

I'm sorry if this is a duplicate, but as much as I search I only find solutions that don't apply:
So I have a hash table, and I want multiple threads to be simultaneously reading and writing to the table. But how do I prevent data races when:
threads writing to the same hash as another
threads writing to a hash being read
edit:
If possible, because this hash table needs to be extremely fast (it's accessed extremely frequently), is there a way to lock two racing threads only if they are accessing the same index of the hash table?
I have answered variations of this question before. Please read my previous answer regarding this topic.
Many people have tried to implement thread safe collection classes (lists, hash tables, maps, sets, queues, etc... ) and failed. Or worse, failed, didn't know it, but shipped it anyway.
A naive way to build a thread-safe hash table is to start with an existing hash table implementation and add a mutex to all public methods. You could imagine a hypothetical implementation like this:
// **THIS IS BAD**
template<typename K, typename V>
class ThreadSafeMap
{
private:
std::map<K,V> _map;
std::mutex _mutex;
public:
void insert(const K& k, const V& v)
{
std::lock_guard lck(_mutex);
_map[k] = v;
}
const V& at(const K& key)
{
std::lock_guard lck(_mutex);
return _map.at(key);
}
// other methods not shown - but are essentially a repeat of locking a mutex
// before accessing the underlying data structure
};
In the above example, std::lock_guard locks the mutex when the lck variable is instantiated, and lock_guard's destructor releases the mutex when lck goes out of scope.
And to a certain extent, it is thread safe. But once you start to use the above data structure in complex ways, it breaks down.
Transactions on hash tables are often multi-step operations. For example, an entire application transaction on the table might be to look up a record and, upon successfully returning it, change some member on what the record points to.
So imagine we had used the above class across different threads like the following:
ThreadSafeMap<std::string, Item> g_map;
// thread 1
Item& item = g_map.at(key);
item.value++;
// thread 2
Item& item = g_map.at(key);
item.value--;
// thread 3
g_map.erase(key);
g_map[key] = newItem;
It's easy to think the above operations are thread safe because the hash table itself is thread safe. But they are not. Thread 1 and thread 2 are both trying to access the same item outside of the lock. Thread 3 is even trying to replace that record that might be referenced by the other two threads. There's a lot of undefined behavior here.
The solution? Stick with a single threaded hash table implementation and use the mutex at the application/transaction level. Better:
std::unordered_map<std::string, Item> g_map;
std::mutex g_mutex;
// thread 1
{
std::lock_guard lck(g_mutex);
Item& item = g_map.at(key);
item.value++;
}
// thread 2
{
std::lock_guard lck(g_mutex);
Item& item = g_map.at(key);
item.value--;
}
// thread 3
{
std::lock_guard lck(g_mutex);
g_map.erase(key);
g_map[key] = newItem;
}
Bottom line: don't just stick mutexes and locks on your low-level data structures and proclaim them thread safe. Use mutexes and locks at the level at which the caller performs its set of operations on the hash table.
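A runnable sketch of the transaction-level approach, assuming a hypothetical `Item` with an `int value` member. It mirrors the three threads above, each holding the one mutex for its whole multi-step transaction; the third thread reinserts a copy of the current item (instead of a `newItem`) purely so the final count is deterministic.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>

struct Item { int value = 0; };   // hypothetical record type

// The map itself is a plain single-threaded container; the mutex guards whole transactions.
int run_transactions() {
    std::unordered_map<std::string, Item> map;
    std::mutex m;
    const std::string key = "widget";
    map[key] = Item{};

    auto increment = [&] {                    // like thread 1
        for (int i = 0; i < 1000; ++i) {
            std::lock_guard<std::mutex> lck(m);
            Item& item = map.at(key);         // reference is only used while locked
            item.value++;
        }
    };
    auto decrement = [&] {                    // like thread 2
        for (int i = 0; i < 1000; ++i) {
            std::lock_guard<std::mutex> lck(m);
            Item& item = map.at(key);
            item.value--;
        }
    };
    auto replace = [&] {                      // like thread 3
        for (int i = 0; i < 1000; ++i) {
            std::lock_guard<std::mutex> lck(m);
            Item copy = map.at(key);
            map.erase(key);
            map[key] = copy;                  // erase + reinsert look atomic to the others
        }
    };

    std::thread t1(increment), t2(increment), t3(decrement), t4(replace);
    t1.join(); t2.join(); t3.join(); t4.join();
    return map.at(key).value;                 // 2000 increments - 1000 decrements
}
```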
The most reliable and appropriate way to avoid data races is to serialize access to the hash table using a mutex; i.e. each thread needs to acquire the mutex before performing any operations (read or write) on the hash table, and release the mutex after it is done.
What you're probably looking for, though, is to implement a lock-free hash table, but ensuring correct multithreaded behavior without locks is extremely difficult, and if you were at the technical level required to implement such a thing, you wouldn't need to ask about it on Stack Overflow. So I strongly suggest that you either stick with the serialized-access approach (which works fine for 99% of the software out there, and is possible to implement correctly without in-depth knowledge of the CPU, cache architecture, RAM, OS, scheduler, optimizer, C++ language spec, etc.) or, if you must use a lock-free data structure, that you find a premade one from a reputable source rather than trying to roll your own. In fact, even if you want to roll your own, you should start by looking through the source code of working examples, to get an idea of what they are doing and why they are doing it.
So you need basic thread synchronization? You must use a mutex with lock_guard, or some other thread-synchronization mechanism, in both the read and the write functions. cppreference.com has documentation for the standard library.

Any proper way to achieve locks in a situation like this?

I have an array of objects that I want to operate on in threads, but I also want to be able to access at times. This feels like a hacky way to achieve my goal, but is there a better way to do something like this?:
The basic goal is to have two locks: one that lets all the individual threads work concurrently while blocking access to the array until they have all finished, and one that shuts out the threads to ensure that no objects are touched by other threads while a function runs.
atomic<int> inThreadCount;
atomic<int> arrayLock;
class MyObj; // forward declaration, so the map can refer to it
map<string, MyObj*> myMap;
mutex mu1;
class MyObj{
mutex mu2;
int myInt;
public:
void update(bool shouldLowerCount){
mu2.lock();
myInt++;
if (shouldLowerCount)
inThreadCount--;
mu2.unlock();
}
};
//Some operation that requires all threads to finish first
//and doesn't allow threads to access the objects while running
void GetSnapshot(){
mu1.lock();
arrayLock++;
while (inThreadCount > 0)
Sleep(0);
map<string, MyObj *>::iterator it = myMap.begin();
auto t = time(nullptr);
auto tm = *localtime(&t);
cout << put_time(&tm, "%d-%m-%Y %H-%M-%S") << endl;
for( ; it != myMap.end(); ++it){
cout << it->first << ":" << it->second->counter << endl;
}
arrayLock--;
mu1.unlock();
}
void updateObject(MyObj* myObj){
while (arrayLock > 0)
Sleep(0);
inThreadCount++;
// note: the discarded std::future blocks in its destructor until update() returns
async(std::launch::async, &MyObj::update, myObj, true);
}
PS, I realize that there is a tiny window of opportunity for error between Sleep() and arrayLock/inThreadCount++. That's part of the problem I want to solve!
I think you're asking for a shared mutex.
A shared mutex (or read-write mutex) allows many threads to lock an object in parallel while also allowing one thread at a time to lock it exclusively.
In simple terms, if a thread requests shared access it is granted unless a thread holds the object exclusively. A thread is granted exclusivity when the object isn't held (shared or exclusively) by any other thread.
Its common use is read-write exclusivity: shared access to read and exclusive access to write. That's valid because a data race can only occur when two or more threads access the same data and at least one of them is a write operation. Multiple readers do not cause a data race.
There are normally overheads in implementing a shared lock as opposed to an exclusive lock, and the model normally only helps where there are 'many' readers that read 'frequently' and write operations are 'infrequent'. What 'many', 'frequent' and 'infrequent' mean depends on the platform and problem at hand.
That's exactly what a shared mutex is for.
C++17 supports that out of the box with std::shared_mutex (and C++14 has std::shared_timed_mutex), but I noticed the question is tagged C++11.
Some implementations have offered that for some time (it's a classic locking strategy).
Or you can try boost::shared_mutex.
NB: One of the challenges in a shared lock is to avoid starving the writer.
If there are many readers that read frequently, it can be easy for the writer to get 'locked out' indefinitely and never progress (or progress very slowly).
A good shared lock will provide some guarantee of the writer eventually getting a turn. That may be absolute priority (no new readers allowed to start once a writer starts waiting).
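A minimal sketch of how a shared mutex maps onto the question's two-lock goal, assuming C++17 `std::shared_mutex` (with Boost, `boost::shared_mutex` plays the same role). Note the roles are inverted compared with read/write: the worker threads take the lock *shared*, so they all run concurrently, while the snapshot takes it *exclusively*, so it waits for in-flight updates and blocks new ones until it finishes. This replaces the `arrayLock`/`inThreadCount` atomics and the `Sleep(0)` loops, closing the window the question mentions. The names `g_arrayMutex`, `g_map`, `getSnapshot` and `demo_snapshot` are hypothetical; `MyObj` is simplified.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

class MyObj {
    std::mutex mu;        // two workers may still update the same object concurrently
    int myInt = 0;
public:
    void update() { std::lock_guard<std::mutex> lock(mu); ++myInt; }
    int value()   { std::lock_guard<std::mutex> lock(mu); return myInt; }
};

std::shared_mutex g_arrayMutex;                        // replaces arrayLock + inThreadCount
std::map<std::string, std::unique_ptr<MyObj>> g_map;   // filled before any thread starts

// Worker side: shared lock, so any number of updates run in parallel.
void updateObject(MyObj& obj) {
    std::shared_lock<std::shared_mutex> lock(g_arrayMutex);
    obj.update();
}

// Snapshot side: exclusive lock, so no update is in progress while it runs.
std::map<std::string, int> getSnapshot() {
    std::unique_lock<std::shared_mutex> lock(g_arrayMutex);
    std::map<std::string, int> snapshot;
    for (auto& kv : g_map) snapshot[kv.first] = kv.second->value();
    return snapshot;
}

// Usage: each worker updates "a" then "b", so any consistent snapshot has a >= b.
bool demo_snapshot() {
    g_map["a"] = std::make_unique<MyObj>();
    g_map["b"] = std::make_unique<MyObj>();
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([] {
            for (int i = 0; i < 500; ++i) {
                updateObject(*g_map.at("a"));
                updateObject(*g_map.at("b"));
            }
        });
    std::map<std::string, int> mid = getSnapshot();    // taken while workers are running
    for (auto& w : workers) w.join();
    std::map<std::string, int> fin = getSnapshot();
    bool consistent = mid["a"] >= mid["b"] && mid["a"] - mid["b"] <= 4;
    return consistent && fin["a"] == 2000 && fin["b"] == 2000;
}
```

Whether a waiting `getSnapshot` can be starved by a steady stream of workers depends on the implementation's fairness, as the note on writer starvation above points out.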

How to Create Thread-Safe Buffers / POD?

My problem is quite common I suppose, but it drives me crazy:
I have a multi-threaded application with 5 threads. 4 of these threads do their job, like network communication and local file system access, and then all write their output to a data structure of this form:
struct Buffer {
std::vector<std::string> lines;
bool has_been_modified;
};
The 5th thread prints these buffer/structures to the screen:
Buffer buf1, buf2, buf3, buf4;
...
if ( buf1.has_been_modified ||
buf2.has_been_modified ||
buf3.has_been_modified ||
buf4.has_been_modified )
{
redraw_screen_from_buffers();
}
How do I protect the buffers from being overwritten while they are either being read from or written to?
I can't find a proper solution, although I think this has to be a quite common problem.
Thanks.
You should use a mutex. The mutex class is std::mutex. With C++11 you can use std::lock_guard<std::mutex> to encapsulate the mutex using RAII. So you would change your Buffer struct to
struct Buffer {
std::vector<std::string> lines;
bool has_been_modified;
std::mutex mutex;
};
and whenever you read or write to the buffer or has_been_modified you would do
std::lock_guard<std::mutex> lockGuard(buf1.mutex); //Do this for each buffer you want to access
... //Access buffer here
and the mutex will be automatically released by the lock_guard when it is destroyed.
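A minimal runnable sketch of this approach, assuming the question's `Buffer` layout. The helpers `appendLine`, `takeIfModified` and `demo_buffers` are hypothetical. The display side copies the lines and clears the flag in one locked step, then would redraw from the copy, so the workers are only blocked for the duration of the copy rather than the whole redraw.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct Buffer {
    std::vector<std::string> lines;
    bool has_been_modified = false;
    std::mutex mutex;          // guards both members above
};

// Worker side: modify the lines and set the flag under the same lock.
void appendLine(Buffer& buf, const std::string& line) {
    std::lock_guard<std::mutex> lockGuard(buf.mutex);
    buf.lines.push_back(line);
    buf.has_been_modified = true;
}

// Display side: if modified, copy the lines out and reset the flag atomically.
bool takeIfModified(Buffer& buf, std::vector<std::string>& out) {
    std::lock_guard<std::mutex> lockGuard(buf.mutex);
    if (!buf.has_been_modified) return false;
    out = buf.lines;
    buf.has_been_modified = false;
    return true;
}

// Usage: four workers fill four buffers; the "display" then collects the changes.
std::size_t demo_buffers() {
    Buffer bufs[4];
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([&bufs, t] {
            for (int i = 0; i < 100; ++i)
                appendLine(bufs[t], "line " + std::to_string(i));
        });
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    std::vector<std::string> copy;
    for (auto& b : bufs)
        if (takeIfModified(b, copy)) total += copy.size();  // redraw would use `copy`
    return total;
}
```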
You can read more about mutexes here.
You can use a mutex (or mutexes) around the buffers to ensure that they're not modified by multiple threads at the same time.
// Mutex shared between the multiple threads
std::mutex g_BufferMutex;
void redraw_screen_from_buffers()
{
std::lock_guard<std::mutex> bufferLockGuard(g_BufferMutex);
//redraw here after mutex has been locked.
}
Then your buffer modification code would have to lock the same mutex when the buffers are being modified.
void updateBuffer()
{
std::lock_guard<std::mutex> bufferLockGuard(g_BufferMutex);
// update here after mutex has been locked
}
This contains some mutex examples.
What it appears you want to accomplish is to have multiple threads/workers and one observer. The latter needs to do its job only when all workers are done or have signaled. If this is the case, then check the code in this SO Q&A: std::condition_variable - Wait for several threads to notify observer
Mutexes are a very nice thing for avoiding data races, and I'm sure the answer posted by @Phantom will satisfy most people. However, one should know that this approach does not scale to large systems.
By locking, you are synchronizing your threads. Since only one thread at a time can access the container, one thread writing to it will cause the others to wait for it to finish. That may be fine for you, but it causes a serious performance bottleneck when high performance is needed.
The best solution would be to use a more complex lock-free structure. Unfortunately, I don't think there is any standard lock-free structure in the STL. One example of a lock-free queue is available here
Using such a structure, your 4 worker threads would be able to enqueue messages to the container while the 5th one dequeues them, without any data races.
More on lock-free data structures can be found here!
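A minimal sketch of that worker/observer pattern using `std::condition_variable`, assuming the observer should run only after all N workers have reported in. The names `WorkBarrier`, `arrive`, `wait_all` and `demo_observer` are hypothetical; in C++20, `std::latch` covers the same one-shot case.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// One-shot "all workers done" signal built from a mutex and a condition variable.
class WorkBarrier {
    std::mutex m_;
    std::condition_variable cv_;
    int remaining_;
public:
    explicit WorkBarrier(int workers) : remaining_(workers) {}

    void arrive() {                       // called once by each worker
        {
            std::lock_guard<std::mutex> lock(m_);
            --remaining_;
        }
        cv_.notify_one();                 // only the single observer waits
    }

    void wait_all() {                     // called by the observer
        std::unique_lock<std::mutex> lock(m_);
        // The predicate guards against spurious wakeups and early notifications.
        cv_.wait(lock, [this] { return remaining_ == 0; });
    }
};

// Usage: each worker fills its own slot, then the observer sums all slots.
int demo_observer() {
    const int n = 4;
    WorkBarrier done(n);
    std::vector<int> results(n, 0);       // each worker writes only its own slot
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&, i] {
            results[i] = (i + 1) * 10;
            done.arrive();
        });
    done.wait_all();                      // returns only after all four arrive() calls
    int sum = 0;
    for (int r : results) sum += r;       // safe: the writes happened before arrive()
    for (auto& w : workers) w.join();
    return sum;
}
```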

Lock std::map C++

I have a problem with using std::map in my multithreaded application. I need to lock the map object when a thread is writing to it. In parallel, other threads that read this object should stop until the writing process is finished.
Sample code:
std::map<int, int> ClientTable;
int write_function() //<-- only one thread uses this function
{
while(true)
{
//lock ClientTable
ClientTable.insert(std::pair<int, int>(1, 2)); // random values
//unlock ClientTable
//thread sleeps for 2 secs
}
}
int read_function() //<--- many thread uses this function
{
while(true)
{
int test = ClientTable[2]; // just for test
}
}
How to lock this std::map object and correctly synchronise this threads?
Looks like what you need is a typical read-write lock, allowing any number of readers but a single writer. You can have a look at boost's shared_mutex.
Additional usage examples can be found here: Example for boost shared_mutex (multiple reads/one write)?
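A sketch of the question's writer/readers using a reader-writer lock, assuming C++17 `std::shared_mutex` (Boost's `shared_mutex` with `boost::unique_lock`/`boost::shared_lock` is analogous). The name `ClientTableMutex` is hypothetical, the loops are bounded and the 2-second sleep omitted so the example terminates. The readers use `find` rather than `ClientTable[2]`: `operator[]` inserts a default element when the key is missing, so it is a write and would race even under a shared lock.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <utility>
#include <vector>

std::map<int, int> ClientTable;
std::shared_mutex ClientTableMutex;       // hypothetical name for the map's lock

void write_function() {                   // only one thread uses this function
    for (int i = 0; i < 1000; ++i) {
        std::unique_lock<std::shared_mutex> lock(ClientTableMutex);   // exclusive
        ClientTable.insert(std::make_pair(i, 2 * i));
    }
}

int read_function() {                     // many threads use this function
    int hits = 0;
    for (int i = 0; i < 1000; ++i) {
        std::shared_lock<std::shared_mutex> lock(ClientTableMutex);   // shared
        auto it = ClientTable.find(2);    // find() never modifies the map
        if (it != ClientTable.end() && it->second == 4) ++hits;
    }
    return hits;
}

// Usage: one writer and three readers run concurrently.
std::size_t demo_client_table() {
    std::thread writer(write_function);
    std::vector<std::thread> readers;
    for (int i = 0; i < 3; ++i) readers.emplace_back([] { read_function(); });
    writer.join();
    for (auto& r : readers) r.join();
    return ClientTable.size();
}
```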
Well, since a std::map doesn't have a builtin lock, you would have to use a separate lock that protects the map. If the map is all that's protected by that lock you could subclass map to add the lock there along with "lock" and "unlock" calls, but if the lock will be used for other items as well then it's probably not the best idea to do that.
As for "how to correctly synchronize" -- that's very specific to the application at hand. However, for the example given, you have to insert the lock/unlock calls around the read operation as well.
If you have read/write locks, this might also be a good application for one of these.

Implementing thread-safe arrays

I want to implement an array-like data structure that allows multiple threads to modify/insert items simultaneously. How can I achieve this with good performance? I implemented a wrapper class around std::vector and used critical sections for synchronizing threads. Please have a look at my code below. Each time a thread wants to work on the internal data, it may have to wait for other threads. Hence, I think its performance is NOT good. :( Any ideas?
class parallelArray{
private:
std::vector<int> data;
zLock dataLock; // my predefined class for synchronizing
public:
void insert(int val){
dataLock.lock();
data.push_back(val);
dataLock.unlock();
}
void modify(unsigned int index, int newVal){
dataLock.lock();
data[index]=newVal; // assuming that the index is valid
dataLock.unlock();
}
};
Take a look at shared_mutex in the Boost library. It allows you to have multiple readers but only one writer.
http://www.boost.org/doc/libs/1_47_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_types.shared_mutex
The best way is to use a fast reader-writer lock. You take a shared lock for read-only access and an exclusive lock for write access, so read-only accesses can run simultaneously.
The user-mode Win32 API provides Slim Reader/Writer (SRW) locks in Vista and later.
Before Vista you have to implement reader-writer lock functionality yourself, which is a fairly simple task: you can do it with one critical section, one event and one enum/int value. A good implementation would require more effort, though; I would use a hand-crafted linked list of local (stack-allocated) structures to implement a fair waiting queue.
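A minimal portable sketch of such a hand-rolled reader-writer lock, assuming standard C++11 primitives in place of the Win32 ones: `std::mutex` for the critical section, `std::condition_variable` for the event, plus a reader count and a writer flag. The class `SimpleRWLock` and the `get`/`size` accessors are hypothetical, and the lock is deliberately simple: it has no fair waiting queue, so a steady stream of readers can starve a writer. In real code, prefer SRW locks, `std::shared_mutex` (C++17) or `boost::shared_mutex`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class SimpleRWLock {
    std::mutex m_;                  // plays the role of the critical section
    std::condition_variable cv_;    // plays the role of the event
    int readers_ = 0;               // threads currently holding shared access
    bool writer_ = false;           // true while a thread holds exclusive access
public:
    void lock_shared() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !writer_; });
        ++readers_;
    }
    void unlock_shared() {
        std::lock_guard<std::mutex> lk(m_);
        if (--readers_ == 0) cv_.notify_all();   // a waiting writer may proceed
    }
    void lock() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !writer_ && readers_ == 0; });
        writer_ = true;
    }
    void unlock() {
        std::lock_guard<std::mutex> lk(m_);
        writer_ = false;
        cv_.notify_all();           // wake waiting readers and writers
    }
};

// The question's wrapper, with writers locking exclusively and readers shared.
class parallelArray {
    std::vector<int> data;
    SimpleRWLock dataLock;
public:
    void insert(int val) {
        std::lock_guard<SimpleRWLock> g(dataLock);   // lock()/unlock() make it BasicLockable
        data.push_back(val);
    }
    int get(unsigned index) {                        // assuming the index is valid
        dataLock.lock_shared();
        int v = data[index];
        dataLock.unlock_shared();
        return v;
    }
    std::size_t size() {
        dataLock.lock_shared();
        std::size_t n = data.size();
        dataLock.unlock_shared();
        return n;
    }
};

// Usage: four inserting threads plus one concurrent reader.
long demo_rwlock() {
    parallelArray arr;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&arr] { for (int i = 1; i <= 250; ++i) arr.insert(i); });
    threads.emplace_back([&arr] {
        for (int i = 0; i < 1000; ++i) {
            std::size_t n = arr.size();          // only grows, so n - 1 stays valid
            if (n > 0) (void)arr.get(static_cast<unsigned>(n - 1));
        }
    });
    for (auto& th : threads) th.join();
    long sum = 0;
    for (unsigned i = 0; i < arr.size(); ++i) sum += arr.get(i);
    return sum;
}
```

Note that in the question both `insert` and `modify` are writes, so a reader-writer lock only pays off once there are read operations to run in parallel.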