Access pthread shared std::map without data race - c++

My scenario has one main thread and tens of worker threads. The worker threads process incoming messages from different ports.
What I want is for the main and worker threads to share the same map: the worker threads save data into the map (each in a different bucket), and the main thread greps the map content periodically.
The code goes like this:
struct cStruct
{
std::map<std::string, std::string> map1;
pthread_mutex_t mutex1;
pthread_mutex_t mutex2;
};
int main(){
struct cStruct cStruct1;
while (condition){
pthread_t th;
int th_rc=pthread_create(&th,NULL,&task,(void *) &cStruct1);
}
}
void* task(void* arg){
    struct cStruct cs = * (struct cStruct*) arg;
    while (coming data){
        if (main_thread_work){
            pthread_cond_wait(&count_cond, &cs.mutex1);
        }
        pthread_mutex_lock(&cs.mutex1);
        // add a new bucket to the map
        cs.map1[thread_identifier]=processed_value;
        pthread_mutex_unlock(&cs.mutex1);
    }
}
void* main_thread_task(void* arg){
sleep(sleep_time);
main_thread_work = true;
pthread_mutex_lock(&cs.mutex1);
// main_thread reads the std::map
main_thread_work = false;
pthread_cond_broadcast(&count_cond);
pthread_mutex_unlock(&cs.mutex1);
}
My questions are:
When the map size changes, I should use a lock to protect the map.
But when only the value for an existing key is updated, can I let different threads modify the map concurrently? (Assume no two threads access the same bucket at the same time.)
For the main thread's grep of the map, I thought of using a condition wait to hold all the worker threads while the main thread is reading the map content, and then calling pthread_cond_broadcast to wake them up. The problem is that if a worker thread is updating the map when the main thread starts its work, there will be a data race.
Please share some ideas to help me improve my design.
Edit 1:
Added main_thread_task().
What I want to avoid is a worker thread reaching pthread_cond_wait "after" pthread_cond_broadcast, which would break the logic.
So I set main_thread_work to false before the main thread broadcasts to the worker threads.

while (coming data){
if (main_thread_work){
pthread_cond_wait(&count_cond, &cs.mutex1);
}
pthread_mutex_lock(&cs.mutex1);
This clearly can't be right. You can't check main_thread_work unless you hold the lock that protects it. How can the call to pthread_cond_wait release a lock it doesn't hold?!
This should be something like:
void* task(void* arg){
    // Use a pointer: copying the struct would give this thread its own
    // private map and mutexes instead of the shared ones
    struct cStruct* cs = (struct cStruct*) arg;
    // Acquire the lock so we can check shared state
    pthread_mutex_lock(&cs->mutex1);
    // Wait until the shared state is what we need it to be
    while (main_thread_work)
        pthread_cond_wait(&count_cond, &cs->mutex1);
    // Do whatever it is we're supposed to do when the
    // shared state is in this state
    cs->map1[thread_identifier]=processed_value;
    // release the lock
    pthread_mutex_unlock(&cs->mutex1);
    return NULL;
}

You should lock the mutex on every access to the map (in your case), not only when adding a new 'bucket'. std::map does not invalidate existing iterators or references on insertion, but an insert rewrites the tree's internal links. If T1 looks up or writes a value through the map while T2 inserts a new bucket, T1 can traverse a tree that is in the middle of being modified, which is a data race.
Regarding pthread_cond_wait: it may do the job if the only thing the other threads do is modify the map. If they also perform other calculations or process non-shared data, it is better to use the same mutex only to protect access to the map, and let the other threads carry on with work that is unrelated to the shared map.
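A minimal sketch of that second suggestion, assuming a single mutex that guards every map access and a main thread that copies the map under the lock and then inspects the copy. The names SharedMap, put and snapshot are hypothetical and not from the question's code.

```cpp
#include <pthread.h>
#include <map>
#include <string>

// Hypothetical wrapper: one mutex guards every access to the map.
struct SharedMap {
    std::map<std::string, std::string> data;
    pthread_mutex_t lock;
    SharedMap() { pthread_mutex_init(&lock, nullptr); }
    ~SharedMap() { pthread_mutex_destroy(&lock); }
};

// Worker side: hold the lock only for the map update itself.
void put(SharedMap& m, const std::string& key, const std::string& value) {
    pthread_mutex_lock(&m.lock);
    m.data[key] = value;
    pthread_mutex_unlock(&m.lock);
}

// Main-thread side: copy the map under the lock, then read the copy
// without holding the lock, so workers are blocked only during the copy.
std::map<std::string, std::string> snapshot(SharedMap& m) {
    pthread_mutex_lock(&m.lock);
    std::map<std::string, std::string> copy = m.data;
    pthread_mutex_unlock(&m.lock);
    return copy;
}
```

Each worker would call put() with its own key, and the main thread would call snapshot() periodically. No condition variable is needed in this variant, at the cost of copying the map on every read.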

Related

An odd use of a condition variable with a local mutex

Poring through the legacy code of an old and large project, I found an odd method of creating a thread-safe queue, something like this:
template < typename _Msg>
class WaitQue: public QWaitCondition
{
public:
typedef _Msg DataType;
void wakeOne(const DataType& msg)
{
QMutexLocker lock_(&mx);
que.push(msg);
QWaitCondition::wakeOne();
}
void wait(DataType& msg)
{
/// wait if empty.
{
QMutex wx; // WHAT?
QMutexLocker cvlock_(&wx);
if (que.empty())
QWaitCondition::wait(&wx);
}
{
QMutexLocker _wlock(&mx);
msg = que.front();
que.pop();
}
}
unsigned long size() {
QMutexLocker lock_(&mx);
return que.size();
}
private:
std::queue<DataType> que;
QMutex mx;
};
wakeOne is used from threads as a kind of "posting" function, and wait is called from other threads and waits indefinitely until a message appears in the queue. In some cases the roles of the threads reverse at different stages, using separate queues.
Is it even legal to use a QMutex this way, by creating a local one? I kind of understand why someone might do that to dodge a deadlock while reading the size of que, but how does it even work? Is there a simpler and more idiomatic way to achieve this behavior?
It's legal to create a local mutex for a condition variable, but it normally makes no sense.
As you've worked out, in this case it is wrong. You should be using the member mutex:
void wait(DataType& msg)
{
QMutexLocker cvlock_(&mx);
while (que.empty())
QWaitCondition::wait(&mx);
msg = que.front();
que.pop();
}
Notice also that you must use while instead of if around the call to QWaitCondition::wait. One reason is possible spurious wakeups (the Qt docs aren't clear here). More importantly, the wake-up and the subsequent reacquisition of the mutex are not a single atomic operation, so you must recheck whether the queue is empty. It could be this last case that previously caused your deadlocks/UB.
Consider an empty queue and a caller (thread 1) that enters wait and blocks in QWaitCondition::wait. Then thread 2 comes along, adds an item to the queue and calls wakeOne. Thread 1 is woken up and tries to reacquire the mutex. However, thread 3 enters your implementation of wait, takes the mutex before thread 1, sees the queue isn't empty, processes the single item and moves on, releasing the mutex. Then thread 1, which has been woken up, finally acquires the mutex, returns from QWaitCondition::wait and tries to process... an empty queue. Yikes.
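For comparison, here is a rough sketch of the same pattern using the standard library instead of Qt (std::mutex and std::condition_variable, assuming C++11 or later). The class name BlockingQueue and its method names are invented for illustration. The predicate overload of wait() is the standard-library way of writing the while loop described above.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Hypothetical standard-C++ counterpart of WaitQue: one member mutex
// protects the queue and is the same mutex the condition variable waits on.
template <typename T>
class BlockingQueue {
public:
    void push(const T& msg) {
        {
            std::lock_guard<std::mutex> lock(mx_);
            que_.push(msg);
        }
        cv_.notify_one();  // wake one waiting consumer, if any
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mx_);
        // Equivalent to: while (que_.empty()) cv_.wait(lock);
        cv_.wait(lock, [this] { return !que_.empty(); });
        T msg = que_.front();
        que_.pop();
        return msg;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mx_);
        return que_.size();
    }

private:
    mutable std::mutex mx_;
    std::condition_variable cv_;
    std::queue<T> que_;
};
```

Because the emptiness check and the pop happen under the same lock, a consumer that loses the race to another thread goes back to waiting instead of reading from an empty queue.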

Thread about socket communication

I want to write a function so that, when it receives a buffer from the socket, the rest of the program freezes until my function has finished. I tried the following:
Function Listen
void Listen(can* _c) {
while (true)
{
std::lock_guard<std::mutex>guard(_c->connection->mutex);
thread t(&connect_tcp::Recv_data,_c->connection,_c->s,ref(_c->response),_c->signals);
if (t.joinable())
t.join();
}
}
Function dataset_browseCan
void dataset_browseCan(can* _c) {
thread org_th(Listen, _c); // I call thread here
org_th.detach();
dataset_browse(_c->cotp, _c->mms_obj, _c->connection, _c->response, _c->list, _c->size_encoder, _c->s);
dataset_signals_browse(_c->cotp, _c->mms_obj, _c->connection, _c->response, _c->list, _c->size_encoder, _c->s);
Sleep(800);
_c->signals = new Signals[_c->response.real_signals_and_values.size()];
}
Function Recv Data
void connect_tcp::Recv_data(SOCKET s,mms_response &response,Signals *signals) {
LinkedList** list = new LinkedList * [1000];
uint8_t* buffer = new uint8_t [10000];
Sleep(800);
/*std::lock_guard<std::mutex>guard(mutex);*/
thread j(recv,s, (char*)buffer, 10000, 0);
j.join();
/*this->mutex.unlock();*/
decode_bytes(response,buffer, list,signals);
}
I tried a mutex and this_thread::sleep_for(), but every time my main function keeps running.
Is it possible to make the program freeze?
You use threads in order to allow things to keep running while something else is happening, so wanting to "stop main" seems counter-intuitive.
However, if you want to share data between threads (e.g. between the thread that runs main and a background thread) then you need to use some form of synchronization. One way to do that is to use a std::mutex. If you lock the mutex before every access, and unlock it afterwards (using std::lock_guard or std::unique_lock) then it will prevent another thread from locking the same mutex while you are accessing the data.
If you need to prevent concurrent access for a long time, then you should not hold a mutex for the whole time. Either consider whether threads are the best solution to your problem, or use a mutex-protected flag to indicate whether the data is ready, and then either poll or use std::condition_variable or similar to wait until the flag is set.
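The last suggestion (a mutex-protected flag plus std::condition_variable) could look roughly like the sketch below. All names here (ReadySignal, publish, wait_for_payload, run_demo) are hypothetical, and the int payload merely stands in for the decoded socket data.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical shared state: the flag and the data are guarded by one mutex.
struct ReadySignal {
    std::mutex mx;
    std::condition_variable cv;
    bool ready = false;
    int payload = 0;
};

// Background thread: store the result, set the flag, then notify.
void publish(ReadySignal& s, int value) {
    {
        std::lock_guard<std::mutex> lock(s.mx);
        s.payload = value;
        s.ready = true;
    }
    s.cv.notify_all();
}

// Waiting thread: blocks (without polling) until the flag is set.
int wait_for_payload(ReadySignal& s) {
    std::unique_lock<std::mutex> lock(s.mx);
    s.cv.wait(lock, [&s] { return s.ready; });
    return s.payload;
}

// Starts a producer thread and blocks the caller until its data is ready.
int run_demo() {
    ReadySignal s;
    std::thread producer(publish, std::ref(s), 42);
    int got = wait_for_payload(s);
    producer.join();
    return got;
}
```

The mutex is held only while the flag and data are touched, never for the whole duration of the background work, which is the point made above about not holding a mutex for a long time.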

How to avoid freezing other threads when one thread locks a big map

How do I avoid freezing other threads that try to access the same map while it is locked by the current thread? See the code below:
//pseudo code
std::map<string, CSomeClass* > gBigMap;
void AccessMapForWriting(string aString){
pthread_mutex_lock(&MapLock);
CSomeClass* obj = gBigMap[aString];
if (obj){
gBigMap.erase(aString);
delete obj;
obj = NULL;
}
pthread_mutex_unlock(&MapLock);
}
void AccessMapForReading(string aString){
pthread_mutex_lock(&MapLock);
CSomeClass* obj = gBigMap[aString];
//below code consumes much time
//sometimes it even sleeps for milliseconds
if (obj){
obj->TimeConsumingOperation();
}
pthread_mutex_unlock(&MapLock);
}
//other threads will also call
//the same function -- AccessMap
void *OtherThreadFunc(void *){
//call AccessMap here
}
Consider using a read-write lock instead: pthread_rwlock_t.
There are some details here
It says
"Using a normal mutex, when a thread obtains the mutex all other
threads are forced to block until that mutex is released by the owner.
What about the situation where the vast majority of threads are simply
reading the data? If this is the case then we should not care if there
is 1 or up to N readers in the critical section at the same time. In
fact the only time we would normally care about exclusive ownership is
when a writer needs access to the code section."
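A small sketch of how pthread_rwlock_t might be applied to a map, assuming int values for brevity. The struct RwMap and the helper functions are hypothetical names, not part of the question's code.

```cpp
#include <pthread.h>
#include <map>
#include <string>

// Hypothetical wrapper: many readers may hold the lock at once,
// a writer holds it exclusively.
struct RwMap {
    std::map<std::string, int> data;
    pthread_rwlock_t lock;
    RwMap() { pthread_rwlock_init(&lock, nullptr); }
    ~RwMap() { pthread_rwlock_destroy(&lock); }
};

// Reader: shared lock. find() is used because operator[] would insert
// a missing key and therefore modify the map.
bool read_value(RwMap& m, const std::string& key, int& out) {
    pthread_rwlock_rdlock(&m.lock);
    std::map<std::string, int>::const_iterator it = m.data.find(key);
    bool found = (it != m.data.end());
    if (found) out = it->second;
    pthread_rwlock_unlock(&m.lock);
    return found;
}

// Writer: exclusive lock for insert or update.
void write_value(RwMap& m, const std::string& key, int value) {
    pthread_rwlock_wrlock(&m.lock);
    m.data[key] = value;
    pthread_rwlock_unlock(&m.lock);
}

// Writer: exclusive lock for erase. Returns true if the key existed.
bool erase_value(RwMap& m, const std::string& key) {
    pthread_rwlock_wrlock(&m.lock);
    bool erased = (m.data.erase(key) == 1);
    pthread_rwlock_unlock(&m.lock);
    return erased;
}
```

Readers no longer block each other, but a writer still has to wait for every reader to finish, so a slow operation performed while holding the read lock will still delay writers.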
You have a std::string as the key. Can you break that key down into a short suffix (possibly just a single character) and a remainder? In that case, you might implement this data structure as 256 maps with 256 locks, one per possible value of the suffix character. That means that most of the time there is no lock contention, because the suffixes differ and therefore so do the locks.
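One way this sharding idea might be sketched, assuming the last character of the key picks the shard and using std::mutex rather than pthread mutexes for brevity. The class name ShardedMap and its methods are hypothetical.

```cpp
#include <array>
#include <map>
#include <mutex>
#include <string>

// Hypothetical sharded map: 256 independent maps, each with its own lock.
class ShardedMap {
public:
    void set(const std::string& key, int value) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mx);
        s.data[key] = value;
    }

    bool get(const std::string& key, int& out) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mx);
        std::map<std::string, int>::const_iterator it = s.data.find(key);
        if (it == s.data.end()) return false;
        out = it->second;
        return true;
    }

private:
    struct Shard {
        std::mutex mx;
        std::map<std::string, int> data;
    };

    // The last character of the key selects the shard (empty key: shard 0).
    Shard& shard_for(const std::string& key) {
        unsigned char last =
            key.empty() ? 0 : static_cast<unsigned char>(key.back());
        return shards_[last];
    }

    std::array<Shard, 256> shards_;
};
```

Two threads contend only when their keys end in the same character. Whether that spreads the load evenly depends on the actual key distribution, which is an assumption to check against real data.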

Is it possible to use a mutex to lock only one element of a data structure?

Is it possible to use a mutex to lock only one element of a data structure?
e.g.
boost::mutex m_mutex;
map<int, int> myMap;
// initialize myMap so that it has 10 elements
// then in thread 1
{
    boost::unique_lock<boost::mutex> lock(m_mutex);
    myMap[1] = 5; // write map[1]
}
// in thread 2
{
    boost::unique_lock<boost::mutex> lock(m_mutex);
    myMap[2] = 4; // write map[2]
}
My question:
While thread 1 is writing map[1], can thread 2 write map[2] at the same time?
Does the lock cover the whole map data structure or only one element, e.g. map[1] or map[2]?
Thanks
If you can guarantee that nobody is modifying the container itself (via insert and erase etc.), then as long as each thread accesses a different element of the container, you should be fine.
If you need per-element locking, you could modify the element type to something that offers synchronized access. (Worst case a pair of a mutex and the original value.)
You need a different mutex for every element of the map. You can do this with a map of mutex or adding a mutex to the mapped type (in your case it is int, so you can't do it without creating a new class like SharedInt)
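A sketch of the "mutex inside the mapped type" approach, using std::mutex for brevity. It assumes the map is fully populated before any worker thread starts and is never resized afterwards. SharedInt follows the name suggested above, while ElementMap, init_elements, set_element and get_element are hypothetical.

```cpp
#include <map>
#include <mutex>

// Each element carries its own mutex.
struct SharedInt {
    std::mutex mx;
    int value = 0;
};

typedef std::map<int, SharedInt> ElementMap;

// Must run before any worker thread starts: it changes the map's structure.
void init_elements(ElementMap& m, int count) {
    for (int i = 0; i < count; ++i)
        m[i];  // default-constructs the SharedInt in place
}

// Safe to call concurrently once the map is populated: at() never inserts
// (it throws std::out_of_range for a missing key), and only the element's
// own mutex is taken.
void set_element(ElementMap& m, int key, int v) {
    SharedInt& e = m.at(key);
    std::lock_guard<std::mutex> lock(e.mx);
    e.value = v;
}

int get_element(ElementMap& m, int key) {
    SharedInt& e = m.at(key);
    std::lock_guard<std::mutex> lock(e.mx);
    return e.value;
}
```

With this layout, a thread writing element 1 and a thread writing element 2 never wait for each other. Any later insert or erase would still need a lock around the whole map.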
Mutexes lock executable regions, not objects. I always think in terms of locking every code region that reads or modifies shared objects. If an object is locked within one region but is also accessible from another, unsynchronized code region, you are not safe (of course). In your case, I'd lock access to the entire object, since a thread can be preempted in the middle of an insertion into or a read from the container, which makes data corruption likely.
A mutex is all about discipline. One thread can call Write and another thread can call Write1. The C++ runtime will assume that is intentional, but in most cases it is not what the programmer intended. In summary, as long as all threads/methods follow the discipline (understand the critical section and respect it), there will be consistency.
int i=0;
void Write()
{
    //Lock
    i++;
    //Unlock
}
void Write1()
{
    i++; // no lock: races with Write()
}

Threading using pthread

Let's say I have an array of 5 threads:
//main thread
pthread_t t[5];
pthread_mutex_t mutex[5];
queue<int> q[5];
for(int i = 0; i < 5; i++){
    pthread_create(&t[i], NULL, worker, NULL);
}
for(int i = 0; i < 5; i++){
    pthread_mutex_lock(&mutex[i]);
    q[i].push(i);
    pthread_mutex_unlock(&mutex[i]);
}
void* worker(void* arg){
    pthread_mutex_lock(&mutex[?]);
}
I am confused with the mutex_lock here. My question is:
How could I let the worker know which mutex to lock?
When I access the mutex through mutex[i], do I need another lock since the child thread might be accessing the mutex array as well?
Thanks.
You need to be clear which threads are sharing which queues. The code you've written suggests each worker thread works on a specific queue, but the main thread (that spawns the workers) will be pushing back new values onto those queues. If that's what you want, then what you've done is basically correct, and you can let the worker threads know the array index of the mutex they're to lock/unlock by casting it to void* and passing it as the argument to pthread_create, which will in turn be passed as a void* to the worker function. You do not need any additional layer of locking around the mutex array - it is entirely safe to access specific elements independently, though if it were say a vector that was being resized at run-time, then you would need that extra level of locking.
Associate the mutex with the queue by creating a new struct:
typedef struct {
    pthread_mutex_t mutex;
    queue<int> q;
} safe_queue;
safe_queue queue_pool[5];
// Pass &queue_pool[i] as the last argument to pthread_create
void* worker(void* arg){
    safe_queue* sq = (safe_queue*) arg;
    pthread_mutex_lock(&sq->mutex);
}
That last argument to the pthread_create is handed over to the thread when it's called, so you can just pass a value to the specific thread.
Since you want both a specific mutex and a specific queue, you're better off passing in the value of i directly.
for(int i = 0; i < 5; i++){
    pthread_create(&t[i], NULL, worker, (void*)(intptr_t)i);
}
void *worker (void *pvI) {
    // Go through intptr_t (from <stdint.h>): casting a 64-bit pointer
    // straight to int is an error in C++.
    int idx = (int)(intptr_t)pvI;
    // Use mutex[idx] and q[idx].
}
However, if you want to do it this way, I'd go for a single queue and mutex.
That's because the act of putting something on the queue is almost certainly going to be much faster than processing an item on the queue (otherwise you wouldn't need threads at all).
If you have multiple queues, the main thread has to figure out somehow which are the underutilised threads so it can select the best queue. If you have one queue and one mutex to protect it, the threads will self-organise for efficiency. Those threads that do long jobs won't try to get something from the queue. Those doing short jobs will come back sooner.
I should mention that mutexes on their own are not a good solution for this producer/consumer model. You can't have a thread lock the mutex then wait indefinitely on the queue since that will prevent the main thread putting anything on the queue.
So that means your worker threads will be constantly polling the queues looking for work.
If you use a mutex combined with a condition variable, it will be a lot more efficient. That's because the threads are signalled by the main thread when work is available rather than constantly grabbing the mutex, checking for work, then releasing the mutex.
The basic outline will be, for the main thread:
initialise
while not finished:
    await work
    lock mutex
    put work on queue
    signal condvar
    unlock mutex
terminate
and, for the worker threads:
initialise
while not finished:
    lock mutex
    while queue is empty:
        wait on condvar
    get work from queue
    unlock mutex
    do work
terminate
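The two outlines above could be fleshed out roughly as follows, with a single queue, one mutex and one condition variable. This is only a sketch under some assumptions: the "work" is doubling an int, the finished flag is how the main thread tells the workers to stop, and the names WorkQueue, worker and run_pool are hypothetical.

```cpp
#include <pthread.h>
#include <queue>
#include <vector>

// Hypothetical shared state: everything below is protected by mutex.
struct WorkQueue {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    std::queue<int> items;
    bool finished;
    long total;  // sum of all results
    WorkQueue() : finished(false), total(0) {
        pthread_mutex_init(&mutex, nullptr);
        pthread_cond_init(&cond, nullptr);
    }
    ~WorkQueue() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }
};

void* worker(void* arg) {
    WorkQueue* wq = static_cast<WorkQueue*>(arg);
    for (;;) {
        pthread_mutex_lock(&wq->mutex);
        // Sleep until there is work or the main thread says we are done.
        while (wq->items.empty() && !wq->finished)
            pthread_cond_wait(&wq->cond, &wq->mutex);
        if (wq->items.empty()) {  // finished and queue drained
            pthread_mutex_unlock(&wq->mutex);
            return nullptr;
        }
        int item = wq->items.front();
        wq->items.pop();
        pthread_mutex_unlock(&wq->mutex);

        int result = item * 2;  // "do work" happens outside the lock

        pthread_mutex_lock(&wq->mutex);
        wq->total += result;
        pthread_mutex_unlock(&wq->mutex);
    }
}

// Main thread: start workers, queue jobs 1..jobs, then shut down.
long run_pool(int workers, int jobs) {
    WorkQueue wq;
    std::vector<pthread_t> threads(workers);
    for (pthread_t& t : threads)
        pthread_create(&t, nullptr, worker, &wq);
    for (int i = 1; i <= jobs; ++i) {
        pthread_mutex_lock(&wq.mutex);
        wq.items.push(i);
        pthread_cond_signal(&wq.cond);
        pthread_mutex_unlock(&wq.mutex);
    }
    pthread_mutex_lock(&wq.mutex);
    wq.finished = true;
    pthread_cond_broadcast(&wq.cond);  // wake every idle worker so it can exit
    pthread_mutex_unlock(&wq.mutex);
    for (pthread_t& t : threads)
        pthread_join(t, nullptr);
    return wq.total;
}
```

For example, run_pool(5, 100) queues the numbers 1 to 100 for five workers and returns the sum of the doubled values. Idle workers sleep inside pthread_cond_wait instead of polling the queue.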
Don't pass a NULL pointer as arg to the thread. Instead use a pointer to an object that defines what the thread has to do.
How could I let the worker know which mutex to lock?
Pass the number as the last parameter to pthread_create()
for(int i = 0; i < 5; i++)
{
    pthread_create(&t[i], NULL, worker, reinterpret_cast<void*>(static_cast<intptr_t>(i)));
}
Then you can get the value like this:
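void* worker(void* arg)
{
    // Convert via intptr_t (<cstdint>): reinterpret_cast<int> from a
    // 64-bit pointer does not compile.
    int index = static_cast<int>(reinterpret_cast<intptr_t>(arg));
    pthread_mutex_lock(&mutex[index]);
}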
When I access the mutex through mutex[i], do I need another lock since the child thread might be accessing the mutex array as well?
No, because the array variable mutex itself is never modified. Each element of the array is only touched through the pthread_mutex_*() functions, which are safe to call concurrently.
A slightly better design would be:
//main thread
struct ThreadData
{
    pthread_mutex_t mutex;
    queue<int> q; // renamed: a member called "queue" would clash with the type name
};
pthread_t t[5];
ThreadData d[5];
for(int i = 0; i < 5; i++)
{
    pthread_mutex_init(&d[i].mutex, NULL);
    pthread_create(&t[i], NULL, worker, &d[i]); // Pass a pointer to ThreadData
}
void* worker(void* arg)
{
    // Retrieve the ThreadData object.
    ThreadData* d = static_cast<ThreadData*>(arg);
    pthread_mutex_lock(&d->mutex);
    <STUFF>
}