C++ Access to vector from multiple threads - c++

In my program I've some threads running. Each thread gets a pointer to some object (in my program - vector). And each thread modifies the vector.
And sometimes my program fails with a segm-fault. I thought it occurred because thread A begins doing something with the vector while thread B hasn't finished operating with it? Can it be true?
How am I supposed to fix it? Thread synchronization? Or maybe make a flag VectorIsInUse and set this flag to true while operating with it?

vector, like all STL containers, is not thread-safe. You have to explicitly manage the synchronization yourself. A std::mutex or boost::mutex could be use to synchronize access to the vector.
Do not use a flag as this is not thread-safe:
Thread A checks value of isInUse flag and it is false
Thread A is suspended
Thread B checks value of isInUse flag and it is false
Thread B sets isInUse to true
Thread B is suspended
Thread A is resumed
Thread A still thinks isInUse is false and sets it true
Thread A and Thread B now both have access to the vector
Note that each thread will have to lock the vector for the entire time it needs to use it. This includes modifying the vector and using the vector's iterators as iterators can become invalidated if the element they refer to is erase() or the vector undergoes an internal reallocation. For example do not:
mtx.lock();
std::vector<std::string>::iterator i = the_vector.begin();
mtx.unlock();
// 'i' can become invalid if the `vector` is modified.

If you want a container that is safe to use from many threads, you need to use a container that is explicitly designed for the purpose. The interface of the Standard containers is not designed for concurrent mutation or any kind of concurrency, and you cannot just throw a lock at the problem.
You need something like TBB or PPL which has concurrent_vector in it.

That's why pretty much every class library that offers threads also has synchronization primitives such as mutexes/locks. You need to setup one of these, and aquire/release the lock around every operation on the shared item (read AND write operations, since you need to prevent reads from occuring during a write too, not just preventing multiple writes happening concurrently).

Related

Multiple vector writers without locking

I have a few threads writing in a vector. It's possible that different threads try to write the same byte. There is no reads. Can I use only an atomic_fecth_or(), like in the example, so the vector will become thread safe? It compiled with GCC without errors or warnings.
std::vector<std::atomic<uint8_t>> MapVis(1024*1024);
void threador()
{
...
std::atomic_fetch_or(&MapVis[i], testor1);
}
It compiled with GCC without errors or warnings
That doesn't mean anything because compilers don't perform that sort of concurrency analysis. There are dedicated static analysis tools that may do this with varying levels of success.
Can I use only an atomic_fetch_or ...
you certainly can, and it will be safe at the level of each individual std::atomic<uint8_t>.
... the vector will become thread safe?
it's not sufficient that each element is accessed safely. You specifically need to avoid any operation that invalidates iterators (swap, resize, insert, push_back etc.).
I'd hesitate to say vector is thread-safe in this context - but you're limiting yourself to a thread-safe subset of its interface, so it will work correctly.
Note that as VTT suggests, keeping a separate partial vector per thread is better if possible. Partly because it's easier to prove correct, and partly because it avoids false sharing between cores.
Yes this is guaranteed to be thread safe due to atomic opperations being guaranteed of:
Isolation from interrupts, signals, concurrent processes and threads
Thus when you access an element of MapVis atomically you're guaranteed that any other process writing to it has already completed. And that your process will not be interrupted till it finishes writing.
The concern if you were using non-atomic variables would be that:
Thread A fetches the value of MapVis[i]
Thread B fetches the value of MapVis[i]
Thread A writes the ored value to MapVis[i]
Thread B writes the ored value to MapVis[i]
As you can see Thread B needed to wait until Thread A had finished writing otherwise it's just going to stomp Thread A's changes to MapVis[i]. With atomic variables the fetch and write cannot be interrupted by concurrent threads. Meaning that Thread B couldn't interrupt Thread A's read-write operations.

Safely Destroying a Thread Pool

Consider the following implementation of a trivial thread pool written in C++14.
threadpool.h
threadpool.cpp
Observe that each thread is sleeping until it's been notified to awaken -- or some spurious wake up call -- and the following predicate evaluates to true:
std::unique_lock<mutex> lock(this->instance_mutex_);
this->cond_handle_task_.wait(lock, [this] {
return (this->destroy_ || !this->tasks_.empty());
});
Furthermore, observe that a ThreadPool object uses the data member destroy_ to determine if its being destroyed -- the destructor has been called. Toggling this data member to true will notify each worker thread that it's time to finish its current task and any of the other queued tasks then synchronize with the thread that's destroying this object; in addition to prohibiting the enqueue member function.
For your convenience, the implementation of the destructor is below:
ThreadPool::~ThreadPool() {
{
std::lock_guard<mutex> lock(this->instance_mutex_); // this line.
this->destroy_ = true;
}
this->cond_handle_task_.notify_all();
for (auto &worker : this->workers_) {
worker.join();
}
}
Q: I do not understand why it's necessary to lock the object's mutex while toggling destroy_ to true in the destructor. Furthermore, is it only necessary for setting its value or is it also necessary for accessing its value?
BQ: Can this thread pool implementation be improved or optimized while maintaining it's original purpose; a thread pool that can pool N amount of threads and distribute tasks to them to be executed concurrently?
This thread pool implementation is forked from Jakob Progsch's C++11 thread pool repository with a thorough code step through to understand the purpose behind its implementation and some subjective style changes.
I am introducing myself to concurrent programming and there is still much to learn -- I am a novice concurrent programmer as it stands right now. If my questions are not worded correctly then please make the appropriate correction(s) in your provided answer. Moreover, if the answer can be geared towards a client who is being introduced to concurrent programming for the first time then that would be best -- for myself and any other novices as well.
If the owning thread of the ThreadPool object is the only thread that atomically writes to the destroy_ variable, and the worker threads only atomically read from the destroy_ variable, then no, a mutex is not needed to protect the destroy_ variable in the ThreadPool destructor. Typically a mutex is necessary when an atomic set of operations must take place that can't be accomplished through a single atomic instruction on a platform, (i.e., operations beyond an atomic swap, etc.). That being said, the author of the thread pool may be trying to force some type of acquire semantics on the destroy_ variable without restoring to atomic operations (i.e. a memory fence operation), and/or the setting of the flag itself is not considered an atomic operation (platform dependent)... Some other options include declaring the variable as volatile to prevent it from being cached, etc. You can see this thread for more info.
Without some sort of synchronization operation in place, the worst case scenario could end up with a worker that won't complete due to the destroy_ variable being cached on a thread. On platforms with weaker memory ordering models, that's always a possibility if you allowed a benign memory race condition to exist ...
C++ defines a data race as multiple threads potentially accessing an object simultaneously with at least one of those accesses being a write. Programs with data races have undefined behavior. If you were to write to destroy in your destructor without holding the mutex, your program would have undefined behavior and we cannot predict what would happen.
If you were to read destroy elsewhere without holding the mutex, that read could potentially happen while the destructor is writing to it which is also a data race.

Can an std::map be moved while iterating through it?

When I'm iterating through an std::map, is there a possibility that by for example adding an element to the map in another thread, the objects in it will be removed causing the iteration to be corrupt? (As the iterator will be pointing to a non-existing variable as it's moved)
In theory when you add an element to an std::map, all the iterators in that map should stay valid. But the problem is that the operations are not atomic. If the OS suspends the inserting thread in the middle of the operation and gives control back to the iterating thread, the state of std::map might be invalid.
You need to synchronize access to the map via mutex or something similar. Alternatively you could use concurrency friendly collection from TBB or another similar library. TBB provides concurrent_unordered_map and concurrent_hash_map.
STL containers aren't thread safe. No guarantees at all. So you need to synchronize access to any standard container if they are used by different threads.
Yes--if another thread may be modifying the vector, you'll need to use something like a mutex to assure that only one thread has access to the vector at any given time.
With a map, the effects of a modification are much more limited -- rather than potentially moving the entire contents of a vector, a modification only affects an individual node in the map. Nonetheless, if one thread deletes a node just as another thread is trying to read that node, bad things will happen, so you still need a mutex to assure that only one thread is operating on the map at any given time.

C++ Vector is this thread-safe? multithreading

So i have multiple threads accessing this function to retrieve database information, is it thread safe?
vector<vector<string> > Database::query(const char* query)
{
pthread_rwlock_wrlock(&mylock); //Write-lock
...
vector<vector<string> > results;
results.push...
pthread_rwlock_unlock(&mylock); //Write-lock
return results;
}
for editors -> sometimes 'fixing' > > to >> is not a good idea but thanks for the rest.
Since results is a local variable, it is in itself safe to use without locks, since there will be a unique copy per thread (it is on the stack, the contents of the vector dynamically allocated in some way, etc). So as long as your database is thread safe, you don't need any locks at all. If the DB is not threadsafe, you need to protect that, of course.
As noted in the other answer, if, for some reason, for example the creation of a string causes a throw bad_alloc;, you need to deal with the fallout of that, and make sure the lock is unlocked (unless you really wish to deadlock all other threads!)
Generally speaking, multiple threads can hold "read" locks. Only one thread can hold "write" lock. And no "read" locks might be held while there is a "write" lock.
It means that while mylock is held locked inside query method, no-one else can have it locked for either read or write, so it is thread-safe. You can read more about readers-writer lock here. Whether you need that mutex locked in there or not is another question.
The code is not exception-safe, however. You must employ RAII in order to unlock a mutex automatically, including on stack unwinding.
It is thread safe because results is created as a local variable, so only one thread will ever access it any instance of results within this method.
If you need a thread-safe vector for some other reason, see this answer on Threadsafe Vector Class for C++.

Thread safety for STL queue

I am using a queue to communicate between threads. I have one reader and multiple writer threads. My question is do I need to lock the queue every time when I use push/front/pop from the queue for the reader? Can I do something like the following:
//reader threads
getLock();
get the number of elements from the queue
releaseLock();
int i = 0;
while( i < numOfElements){
queue.front();
queue.pop();
i++
}
The idea is that I want to reduce the granularity of the locked code and since the writer thread would only write to the back of the queue and there is only a single reader thread. As long as I get the number of elements, then I could get the elements from the queue OR do I need to enclose the front() and pop() in the lock as well?
As others have already mentioned, standard containers are not required to guarantee thread safety so what you're asking for cannot be implemented portably. You can reduce the time your reader thread is locking the writers out by using 2 queues and a queue pointer that indicates the queue that is currently in use by the writers.
Each writer would:
Acquire lock
Push element(s) into the queue currently pointed to by the queue pointer
Release lock
The reader can then do the following:
Acquire lock
Switch queue pointer to point to the second queue
Release lock
Process elements from the first queue
Any type that doesn't explicitly state its thread-safety guarantees should always be controlled by a mutex. That said, your implementation's stdlib may allow some variation of this — but you can't know for all implementations of std::queue.
As std::queue wraps another container (it's a container adapter), you need to look at the underlying container, which defaults to deque.
You may find it easier, better, or more portable to write your own container adapter that makes the guarantees you need. I don't know of anything that does this exactly for a queue in Boost.
I haven't looked at C++0x enough to know if it has any solution for this out-of-the-box, but that could be another option.
This is absolutely implementation-dependent. The C++ standard makes no mention about threads or thread safety, so whether or not this will work depends on how your implementation handles queue elements.
In your case, the reader is actually popping the queue, which is considered a write operation. I doubt any of the common implementations actually guarantee thread-safety in this case, when multiple threads simultaneously write to a container. At least VC++ does not:
For reads to the same object, the object is thread safe for reading when no writers on other threads.
For writes to the same object, the object is thread safe for writing from one thread when no readers on other threads.
Sometimes you can resolve a lot of concurrency headache by avoiding sharing state or resources among threads. If you have multiple threads that access a container concurrently in order to push in their work then try to have them work on dedicated containers. At specific points you then collect the containers' elements onto the central container in a non-concurrent manner.
If you can avoid sharing state or resources among threads then you have no problem running threads concurrently. Threads then need not worry about each other, because they are completely isolated and bear no effect whatsoever on each other.
Your hunch is correct: Even though you cannot count on STD queue to be thread safe, a queue should be thread safe by design.
A nice explanation of why that is the case and a standard implementation of thread safe, lock free queues in C++ is given by van Dooren