Concurrency issue with std::map insert/erase - c++

I'm writing a threaded application that will process a list of resources and may or may not place a resulting item in a container (std::map) for each resource.
The processing of resources takes place in multiple threads.
The result container will be traversed, and each item acted upon by a separate thread which takes an item, updates a MySQL database (using the mysqlcppconn API), then removes the item from the container and continues.
For simplicity's sake, here's an overview of the logic:
queueWorker() - thread
    getResourcesList() - seeds the global queue
databaseWorker() - thread
    commitProcessedResources() - commits results to a database every n seconds
processResources() - thread x <# of processor cores>
    processResource()
        queueResultItem()
And the pseudo-implementation to show what I'm doing.
/* not the actual structs, but just for simplicity's sake */
struct queue_item_t {
    int id;
    string hash;
    string text;
};

struct result_item_t {
    string hash; // hexadecimal sha1 digest
    int state;
};

std::map< string, queue_item_t > queue;
std::map< string, result_item_t > results;
bool processResource (queue_item_t *item)
{
    result_item_t result;

    if (some_stuff_that_doesnt_apply_to_all_resources)
    {
        result.hash = item->hash;
        result.state = 1;

        /* PROBLEM IS HERE */
        queueResultItem(result);
        return true;
    }

    return false;
}
void commitProcessedResources ()
{
    pthread_mutex_lock(&resultQueueMutex);

    // this can take a while, since the MySQL work below is slow
    for (std::map< string, result_item_t >::iterator it = results.begin(); it != results.end();)
    {
        // do mysql stuff that takes a while
        results.erase(it++);
    }

    pthread_mutex_unlock(&resultQueueMutex);
}
void queueResultItem (result_item_t result)
{
    pthread_mutex_lock(&resultQueueMutex);
    results.insert(make_pair(result.hash, result));
    pthread_mutex_unlock(&resultQueueMutex);
}
As indicated in processResource(), the problem is that while commitProcessedResources() is running and holding resultQueueMutex, every call to queueResultItem() will block trying to lock the same mutex and won't return until the commit is done, which might take a while.
Since there is, obviously, a limited number of threads running, as soon as all of them are blocked in queueResultItem(), no more work gets done until the mutex is released and available to queueResultItem() again.
So, my question is: how do I best go about implementing this? Is there a specific kind of standard container that can be inserted into and deleted from simultaneously, or is there something else that I just don't know about?
It is not strictly necessary that each queue item has its own unique key, as is the case here with the std::map, but I would prefer it, since several resources can produce the same result and I would prefer to send only unique results to the database, even though it uses INSERT IGNORE to ignore any duplicates.
I'm fairly new to C++ so I've no idea what to look for on Google, unfortunately. :(

You do not have to hold the lock for the queue all the time during processing in commitProcessedResources(). You can instead swap the queue with an empty one:
void commitProcessedResources ()
{
    std::map< string, result_item_t > queue2;

    pthread_mutex_lock(&resultQueueMutex);
    // XXX Do a quick swap.
    queue2.swap(results);
    pthread_mutex_unlock(&resultQueueMutex);

    // this can take a while, but the mutex is no longer held here
    for (std::map< string, result_item_t >::iterator it = queue2.begin();
         it != queue2.end(); ++it)
    {
        // do mysql stuff that takes a while
        // XXX You do not need to erase anything:
        //results.erase(it++);
        // queue2 is simply destroyed when this function returns.
    }
}

You will need to use synchronization methods (i.e. the mutex) to make this work properly. However, the goal of parallel programming is to minimize the critical section (i.e. the amount of code which is executed while you hold the lock).
That said, if your MySQL queries can be run in parallel without synchronization (i.e. multiple calls won't conflict with each other), take them out of the critical section. This will greatly reduce overhead. For instance, a simple refactor as follows could do the trick:
void commitProcessedResources ()
{
    // MOVING THIS LOCK
    pthread_mutex_lock(&resultQueueMutex);
    std::map<string, result_item_t>::iterator end = results.end();
    std::map<string, result_item_t>::iterator begin = results.begin();
    pthread_mutex_unlock(&resultQueueMutex);

    // the loop below can take a while, so only lock around the map operations
    for (std::map< string, result_item_t >::iterator it = begin; it != end;)
    {
        // do mysql stuff that takes a while
        pthread_mutex_lock(&resultQueueMutex); // Is this the only place we need it?
        // This is a MUCH smaller critical section
        results.erase(it++);
        pthread_mutex_unlock(&resultQueueMutex); // Unlock or everything will block until end of loop
    }
    // MOVED UNLOCK
}
This will give you concurrent "real-time" access to the data across multiple threads. That is, as every write finishes, the map is updated and can be read elsewhere with current information.

Up through C++03, the standard didn't define anything about threading or thread safety at all (and since you're using pthreads, I'm guessing that's pretty much what you're using).
As such, it's up to you to do locking on your shared map, to ensure that only one thread tries to access the map at any given time. Without that, you're likely to corrupt its internal data structure, so the map is no longer valid at all.
Alternatively (and I'd generally prefer this) you could have your multiple threads just put their data into a thread-safe queue, and have a single thread that gets data from that queue and puts it into the map. Since only that one thread touches the map, you no longer have to lock it when it's in use.
There are a few reasonable possibilities for dealing with the delay while you flush the map to the disk. Probably the simplest is to have the same thread read from the queue, insert into the map, and periodically flush the map to disk. In this case, the incoming data just sits in the queue while the map is being flushed to disk. This keeps access to the map simple -- since only one thread ever touches it directly, it can use the map without any locking.
Another would be to have two maps. At any given time, the thread that flushes to disk gets one map, and the thread that retrieves from the queue and inserts into the map gets the other. When the flushing thread needs to do its thing, it just swaps the roles of the two. Personally, I think I prefer the first though -- eliminating all the locking around the map has a great deal of appeal, at least to me.
Yet another variant that would maintain that simplicity would be for the queue->map thread to create a map, fill it, and when it's full enough (i.e., after the appropriate length of time) stuff it into another queue, then repeat from the start (i.e., create a new map, etc.). The flushing thread retrieves a map from its incoming queue, flushes it to disk, and destroys it. Though this adds a bit of overhead creating and destroying maps, you're not doing it often enough to care a lot. You still keep single-threaded access to any map at any time, and still keep all the database access segregated from everything else.
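As a rough sketch of that first approach (my illustration, not part of the answer; the queue, its mutex/condition variable and the flushToDatabase() helper are placeholder names, and result_item_t is the struct from the question), the single thread that owns the map might look like this. The worker threads would push pairs onto incoming under queueMutex and then notify queueCond:
#include <chrono>
#include <condition_variable>
#include <deque>
#include <map>
#include <mutex>
#include <string>
#include <utility>

void flushToDatabase(const std::map<std::string, result_item_t>& batch); // the slow MySQL work

std::mutex queueMutex;
std::condition_variable queueCond;
std::deque< std::pair<std::string, result_item_t> > incoming; // worker threads push here

void mapOwnerThread()
{
    // only this thread ever touches the map, so no locking is needed around it
    std::map<std::string, result_item_t> pending;
    std::chrono::steady_clock::time_point nextFlush =
        std::chrono::steady_clock::now() + std::chrono::seconds(5);

    for (;;)
    {
        std::unique_lock<std::mutex> lock(queueMutex);
        queueCond.wait_until(lock, nextFlush, [] { return !incoming.empty(); });
        while (!incoming.empty())
        {
            pending.insert(incoming.front());
            incoming.pop_front();
        }
        lock.unlock();

        if (std::chrono::steady_clock::now() >= nextFlush)
        {
            flushToDatabase(pending); // slow, but no locks are held here
            pending.clear();
            nextFlush = std::chrono::steady_clock::now() + std::chrono::seconds(5);
        }
    }
}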

Related

Where can we use std::barrier over std::latch?

I recently heard about new C++ standard features, which are:
std::latch
std::barrier
I cannot figure out in which situations they are applicable and useful over one another.
If someone could give an example of how to use each of them wisely, it would be really helpful.
Very short answer
They're really aimed at quite different goals:
Barriers are useful when you have a bunch of threads and you want to synchronise across all of them at once, for example to do something that operates on all of their data at once.
Latches are useful if you have a bunch of work items and you want to know when they've all been handled, and aren't necessarily interested in which thread(s) handled them.
Much longer answer
Barriers and latches are often used when you have a pool of worker threads that do some processing and a queue of work items that is shared between them. It's not the only situation where they're used, but it is a very common one and does help illustrate the differences. Here's some example code that would set up some threads like this:
const size_t worker_count = 7; // or whatever
std::vector<std::thread> workers;
std::vector<Proc> procs(worker_count);
Queue<std::function<void(Proc&)>> queue;

for (size_t i = 0; i < worker_count; ++i) {
    workers.push_back(std::thread(
        [p = &procs[i], &queue]() {
            while (auto fn = queue.pop_back()) {
                fn(*p);
            }
        }
    ));
}
There are two types that I have assumed exist in that example:
Proc: a type specific to your application that contains data and logic necessary to process work items. A reference to one is passed to each callback function that's run in the thread pool.
Queue: a thread-safe blocking queue. There is nothing like this in the C++ standard library (somewhat surprisingly) but there are a lot of open-source libraries containing them e.g. Folly MPMCQueue or moodycamel::ConcurrentQueue, or you can build a less fancy one yourself with std::mutex, std::condition_variable and std::deque (there are many examples of how to do this if you Google for them).
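For reference, here is a minimal sketch of such a queue built from std::mutex, std::condition_variable and std::deque (my illustration, not part of the answer). It only provides the push_back/pop_back operations used in these examples; pop_back returns a default-constructed item (an empty, falsy std::function) once the queue has been closed and drained, which is what lets the worker loop above terminate.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class Queue {
public:
    void push_back(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available; returns T{} (falsy for std::function)
    // once close() has been called and the queue has been drained.
    T pop_back() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty())
            return T{};
        T item = std::move(items_.back());
        items_.pop_back();
        return item;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<T> items_;
    bool closed_ = false;
};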
Latch
A latch is often used to wait until some work items you push onto the queue have all finished, typically so you can inspect the result.
std::vector<WorkItem> work = get_work();
std::latch latch(work.size());
for (WorkItem& work_item : work) {
    queue.push_back([&work_item, &latch](Proc& proc) {
        proc.do_work(work_item);
        latch.count_down();
    });
}
latch.wait();
// Inspect the completed work
How this works:
The threads will - eventually - pop the work items off of the queue, possibly with multiple threads in the pool handling different work items at the same time.
As each work item is finished, latch.count_down() is called, effectively decrementing an internal counter that started at work.size().
When all work items have finished, that counter reaches zero, at which point latch.wait() returns and the producer thread knows that the work items have all been processed.
Notes:
The latch count is the number of work items that will be processed, not the number of worker threads.
The count_down() method could be called zero times, one time, or multiple times on each thread, and that number could be different for different threads. For example, even if you push 7 work items and have 7 threads in the pool, it might be that all 7 items are processed by the same thread (rather than one per thread), and that's fine.
Other unrelated work items could be interleaved with these ones (e.g. because they were pushed onto the queue by other producer threads) and again that's fine.
In principle, it's possible that latch.wait() won't be called until after all of the worker threads have already finished processing all of the work items. (This is the sort of odd condition you need to look out for when writing threaded code.) But that's OK, it's not a race condition: latch.wait() will just immediately return in that case.
An alternative to using a latch is that there's another queue, in addition to the one shown here, that contains the result of the work items. The thread pool callback pushes results on to that queue while the producer thread pops results off of it. Basically, it goes in the opposite direction to the queue in this code. That's a perfectly valid strategy too, in fact if anything it's more common, but there are other situations where the latch is more useful.
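As a rough illustration of that results-queue variant (my sketch, not part of the answer; it reuses the hypothetical Queue type from above and assumes do_work() returns some default-constructible Result type):
Queue<Result> result_queue;

std::vector<WorkItem> work = get_work();
for (WorkItem& work_item : work) {
    queue.push_back([&work_item, &result_queue](Proc& proc) {
        result_queue.push_back(proc.do_work(work_item));
    });
}

// The producer pops exactly as many results as it pushed work items.
for (size_t i = 0; i < work.size(); ++i) {
    Result result = result_queue.pop_back();
    // Inspect the result
}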
Barrier
A barrier is often used to make all threads wait simultaneously so that the data associated with all of the threads can be operated on simultaneously.
using Fn = std::function<void()>;

Fn completionFn = [&procs]() {
    // Do something with the whole vector of Proc objects
};

auto barrier = std::make_shared<std::barrier<Fn>>(worker_count, completionFn);

auto workerFn = [barrier](Proc&) {
    barrier->arrive_and_wait();
};
for (size_t i = 0; i < worker_count; ++i) {
    queue.push_back(workerFn);
}
How this works:
All of the worker threads will pop one of these workerFn items off of the queue and call barrier->arrive_and_wait().
Once all of them are waiting, one of them will call completionFn() while the others continue to wait.
Once that function completes, they will all return from arrive_and_wait() and be free to pop other, unrelated, work items from the queue.
Notes:
Here the barrier count is the number of worker threads.
It is guaranteed that each thread will pop precisely one workerFn off of the queue and handle it. Once a thread has popped one off of the queue, it will wait in barrier->arrive_and_wait() until all the other copies of workerFn have been popped off by other threads, so there is no chance of it popping another one off.
I used a shared pointer to the barrier so that it will be destroyed automatically once all the work items are done. This wasn't an issue with the latch because there we could just make it a local variable in the producer thread function, because it waits until the worker threads have used the latch (it calls latch.wait()). Here the producer thread doesn't wait for the barrier so we need to manage the memory in a different way.
If you did want the original producer thread to wait until the barrier has finished, that's fine: it can call arrive_and_wait() too, but you will obviously need to pass worker_count + 1 to the barrier's constructor. (And then you wouldn't need to use a shared pointer for the barrier.)
If other work items are being pushed onto the queue at the same time, that's fine too, although it will potentially waste time as some threads will just be sitting there waiting for the barrier to be acquired while other threads are distracted by other work before they acquire the barrier.
!!! DANGER !!!
The last bullet point about other work being pushed onto the queue being "fine" is only the case if that other work doesn't also use a barrier! If you have two different producer threads putting work items with a barrier on to the same queue and those items are interleaved, then some threads will wait on one barrier and others on the other one, and neither will ever reach the required wait count - DEADLOCK. One way to avoid this is to only ever use barriers like this from a single thread, or even to only ever use one barrier in your whole program (this sounds extreme but is actually quite a common strategy, as barriers are often used for one-time initialisation on startup). Another option, if the thread queue you're using supports it, is to atomically push all work items for the barrier onto the queue at once so they're never interleaved with any other work items. (This won't work with the moodycamel queue, which supports pushing multiple items at once but doesn't guarantee that they won't be interleaved with items pushed on by other threads.)
Barrier without completion function
At the point when you asked this question, the proposed experimental API didn't support completion functions. Even the current API at least allows not using them, so I thought I should show an example of how barriers can be used like that too.
auto barrier = std::make_shared<std::barrier<>>(worker_count);

auto workerMainFn = [&procs, barrier](Proc&) {
    barrier->arrive_and_wait();
    // Do something with the whole vector of Proc objects
    barrier->arrive_and_wait();
};

auto workerOtherFn = [barrier](Proc&) {
    barrier->arrive_and_wait(); // Wait for work to start
    barrier->arrive_and_wait(); // Wait for work to finish
};

queue.push_back(std::move(workerMainFn));
for (size_t i = 0; i < worker_count - 1; ++i) {
    queue.push_back(workerOtherFn);
}
How this works:
The key idea is to wait for the barrier twice in each thread, and do the work in between. The first waits have the same purpose as the previous example: they ensure any earlier work items in the queue are finished before starting this work. The second waits ensure that any later items in the queue don't start until this work has finished.
Notes:
The notes are mostly the same as the previous barrier example, but here are some differences:
One difference is that, because the barrier is not tied to the specific completion function, it's more likely that you can share it between multiple uses, like we did in the latch example, avoiding the use of a shared pointer.
This example makes it look like using a barrier without a completion function is much more fiddly, but that's just because this situation isn't well suited to them. Sometimes, all you need is to reach the barrier. For example, whereas we initialised a queue before the threads started, maybe you have a queue for each thread but initialised in the threads' run functions. In that case, maybe the barrier just signifies that the queues have been initialised and are ready for other threads to pass messages to each other. In that case, you can use a barrier with no completion function without needing to wait on it twice like this.
You could actually use a latch for this, calling count_down() and then wait() in place of arrive_and_wait(). But using a barrier makes more sense, both because calling the combined function is a little simpler and because using a barrier communicates your intention better to future readers of the code.
In any case, the "DANGER" warning from before still applies.

should i always lock the global data in multi-thread programming, why or why not?

I'm new to multi-thread programming (actually, I'm not completely new to multi-threading, but I always use global data for the reading and writing threads; I think it makes my code ugly and slow, and I'm eager to improve my skills),
and I'm now developing a forwarder server using C++. To simplify the question, suppose there are only two threads, a receiving-thread and a sending-thread, and, the stupid design as usual, I have a global std::list for saving data :(
The receiving-thread reads raw data from the server and writes it into the global std::list.
The sending-thread reads the global std::list and sends the data to several clients.
I use pthread_mutex_lock to sync the global std::list.
The problem is that the performance of the forwarder server is poor: the global list is locked while the receiving-thread is writing, but at that moment my sending-thread wants to read, so it must wait, and I think this waiting is useless.
What should I do? I know that globals are bad, but without a global, how can I sync these two threads?
I'll keep searching on SO and Google.
Any suggestions, guides, technologies or books will be appreciated. Thanks!
EDIT
For any suggestion, I want to know why or why not; please give me the reasoning. Thanks a lot.
Notes:
Please provide more complete examples: http://sscce.org/
Answers:
Yes, you should synchronize access to shared data.
NOTE: this makes assumptions about the std::list implementation - which may or may not apply to your case - but since these assumptions are valid for some implementations, you cannot assume your implementation is thread safe without an explicit guarantee.
Consider the snippet:
std::list<data> g_list;

void thread1()
{
    while( /*input ok*/ )
    {
        /*read input*/
        g_list.push_back( /*something*/ );
    }
}

void thread2()
{
    while( /*something*/ )
    {
        /*pop from list*/
        data x = g_list.front();
        g_list.pop_front();
    }
}
Say, for example, the list has 1 element in it.
std::list::push_back() must do:
allocate space (many CPU instructions)
copy data into new space (many CPU instructions)
update previous element (if it exists) to point to new element
set std::list::_size
std::list::pop_front() must do:
free space
update next element to not have previous element
set std::list::_size
Now say thread 1 calls push_back(): after checking that there is an element (the check on size), it goes on to update that element - but before it gets the chance, thread 2 could be running pop_front() and be busy freeing the memory for the first element - which could then cause thread 1 to hit a segmentation fault, or even corrupt memory. Similarly, the updates to the size could interleave so that push_back's update wins over pop_front's, and then you have a size of 2 when you only have 1 element.
Do not use pthread_* in C++ unless you really know what you're doing - use std::thread (C++11) or boost::thread - or wrap pthread_* in a class yourself - because if you don't account for exceptions you will end up with deadlocks.
You cannot get past some form of synchronization in this specific example - but you could optimize synchronization
Don't copy the data itself into and out of the std::list - copy a pointer to the data into and out of the list
Only lock while you're actually accessing the std::list - but don't make this mistake:
{
    // lock
    size_t i = g_list.size();
    // unlock
    if ( i )
    {
        // lock
        // work with g_list ...
        // unlock
    }
}
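To make the contrast explicit, here is a sketch (mine, not part of the original answer) where the check and the pop happen under a single lock; g_list_mutex is a hypothetical pthread_mutex_t guarding the list, and data is assumed to be cheap to copy:
pthread_mutex_lock(&g_list_mutex);
bool have_item = !g_list.empty();
data x;
if ( have_item )
{
    x = g_list.front();
    g_list.pop_front();
}
pthread_mutex_unlock(&g_list_mutex);

if ( have_item )
{
    // work with x - the slow part happens outside the lock
}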
A more appropriate pattern here would be a message queue - you can implement one with a mutex, a list and a condition variable. Here are some implementations you can look at:
http://pocoproject.org/docs/Poco.Notification.html
http://gnodebian.blogspot.com.es/2013/07/a-thread-safe-asynchronous-queue-in-c11.html
http://docs.wxwidgets.org/trunk/classwx_message_queue_3_01_t_01_4.html
google for more
There is also the option of atomic containers, look at:
http://calumgrant.net/atomic/ - not sure if this is backed by actual atomic storage (as opposed to just using synchronization behind an interface)
google for more
You could also go for an asynchronous approach with boost::asio - though your case should be quite fast if done right.

Multithreading with a multimap

Environment: Windows 7.0, C++, multithreading
I have created a new worker thread to receive data on a socket and add it into a static multimap instance.
code snippet:
// remember mymultimap is a static member
static std::multimap<string,string> mymultimap;

EnterCriticalSection(&m_criticalsection);
mymultimap.insert( std::make_pair("aaa", "bbb") );
LeaveCriticalSection(&m_criticalsection);
At the same time my main thread is reading the same static multimap:
code snippet:
EnterCriticalSection(&m_criticalsection);
std::multimap<string,string>::iterator it = mymultimap.begin();
for( ; it != mymultimap.end(); it++)
{
    std::string firstName = (*it).first;
    std::string secondName = (*it).second;
}
LeaveCriticalSection(&m_criticalsection);
As the main and worker threads are continuously doing reads and writes, this hampers my application's performance.
Also the instance of multimap contains huge data (more than 10,000 records).
How can I hold the lock on the multimap for a minimal amount of time?
EnterCriticalSection(&m_criticalsection);
///minimal lock time for Map ???
LeaveCriticalSection(&m_criticalsection);
Please help me improve my application's performance.
As it stands your question leaves too much room for discussion: we don't know how the values stored in your multimap are actually used.
If:
the order enforced in that data structure is important,
you need to keep the values in the multimap even after they have been read,
you need to go through all the entries each time you read,
then you are pretty much stuck as to how we can optimize the use of that structure.
On the other hand, if you can relax one of these requirements somehow, then you may have possibilities to optimize things a bit, for instance by using a message queue instead of the map directly for communication between both threads.
Message queues are a standard way to implement efficient communication between threads, and for one to one setup, there are even lockless solutions.
Update: thinking about it, sharing that type of structure across threads is not a good idea, whatever use you make of it. It is better to regroup all accesses to the multimap within one single thread, and thus have items generated by other threads passed on to the thread managing it through a queue. This completely decouples the work of generating the items from their storage and use. In your case, the producer thread will lose less time storing the data, which leaves it more time to handle the socket stream.
So, for that solution, you need a queue<std::pair<key,value> >, say std::queue, to be handed to both threads at their initialization, or alternatively a static instance like the multimap one. Then simply replace the multimap::insert in the first thread with a queue::push_back of a make_pair(key, value), and symmetrically in the consumer thread, first pop_front all the pending pairs in the queue, inserting them into the map at the same time, then implement your processing of the map, whatever it is.
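A rough sketch of that change (mine, not part of the answer; it uses std::deque rather than std::queue so the whole batch can be swapped out under the lock, and reuses the CRITICAL_SECTION from the question):
static std::deque< std::pair<string,string> > pending; // handoff queue shared by both threads

// worker thread - replaces mymultimap.insert(...)
EnterCriticalSection(&m_criticalsection);
pending.push_back( std::make_pair("aaa", "bbb") );
LeaveCriticalSection(&m_criticalsection);

// main thread - drain the queue under the lock, then use the multimap without it
std::deque< std::pair<string,string> > batch;
EnterCriticalSection(&m_criticalsection);
batch.swap(pending);
LeaveCriticalSection(&m_criticalsection);

for (std::deque< std::pair<string,string> >::iterator it = batch.begin(); it != batch.end(); ++it)
    mymultimap.insert(*it);
// ... read and process mymultimap here; only the main thread touches it now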
Note:
Please be aware that if you are using a multimap, you might end up with multiple values for the same key: the call to find will return an iterator, and you might well have to check the next entries of the multimap to make sure you get all the values with the same key.

Lists and multithreaded environments

I'm fairly new to the C++ standard library and have been using standard library lists for a specific multithreaded implementation. I noticed that there might be a trick to using lists that I have not seen in any tutorial/blog/forum posts and, though it seems obvious to me, does not seem to be considered by anyone. So maybe I'm too new and possibly missing something, so hopefully someone smarter than me can validate what I am trying to achieve or explain what I am doing wrong.
So we know that in general standard library containers are not thread safe - but this seems like a guiding statement more than a rule. With lists it seems that there is a level of tolerance for thread safety. Let me explain: we know that list iterators do not get invalidated if we add to or delete from the list. The only iterator that gets invalidated is the one pointing to the deleted item - which you can fix with the following line of code:
it = myList.erase(it)
So now lets say we have two threads and call them thread 1 and thread 2.
Thread 1's responsibility is to add to the list. It treats it as a queue, so it uses the std::list::push_back() function call.
Thread 2's responsibility is to process the data stored in the list as a queue and then after processing it will remove elements from the list.
It's guaranteed that Thread 2 will not remove elements in the list that were just added during its processing, and Thread 1 guarantees that it will queue up the necessary data well ahead of Thread 2's processing. However, keep in mind that elements can be added during Thread 2's processing.
So it seems that this is a reasonable use of lists in this multithreaded environment without the use of locks for data protection. The reason why I say it's reasonable is because, essentially, Thread 2 will only process data up to "now", which it can do by retrieving the current end iterator, as shown by the following pseudocode:
Thread 2 {
    iter = myList.begin();

    lock();
    iterEnd = myList.end(); // lock data temporarily in order to get the
                            // current last element in the list
    unlock();

    // perform necessary processing
    while (iter != iterEnd) {
        // process data
        // ...
        // remove element
        iter = myList.erase(iter);
    }
}
Thread 2 uses a lock for a very short amount of time, just to know where to stop processing, but for the most part Thread 1 and Thread 2 don't require any other locking. In addition, Thread 2 could possibly avoid locking altogether if its requirement to know the current last element is flexible.
Does anyone see anything wrong with my suggestion?
Thanks!
Your program is racy. As an example of one obvious data race: std::list is more than just a collection of doubly-linked nodes. It also has, for example, a data member that stores the number of nodes in the list (it need not be a single data member, but it has to store the count somewhere).
Both of your threads will modify this data member concurrently. Because there is no synchronization of those modifications, your program is racy.
Instances of the Standard Library containers cannot be mutated from multiple threads concurrently without external synchronization.
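For illustration (my sketch, not part of this answer), the minimal external synchronization for the design in the question looks something like this; Data, myList and myListMutex are placeholders:
#include <list>
#include <mutex>

struct Data { /* ... */ };

std::list<Data> myList;
std::mutex myListMutex;

// Thread 1: producer
void produce(const Data& d) {
    std::lock_guard<std::mutex> lock(myListMutex);
    myList.push_back(d);
}

// Thread 2: consumer - processes whatever is present "now", then stops
void consumeAvailable() {
    for (;;) {
        Data d;
        {
            std::lock_guard<std::mutex> lock(myListMutex);
            if (myList.empty())
                break;
            d = myList.front();
            myList.pop_front();
        }
        // process d here, outside the lock
    }
}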

Concurrency with a producer/consumer (sort of) and STL

I have the following situation: I have two threads
thread1, which is a worker thread that runs an algorithm as long as its input list size is > 0
thread2, which is asynchronous (user driven) and can add elements to the input list to be processed
Now, thread1 loop does something similar to the following
list input_list
list buffer_list

if (input_list.size() == 0)
    sleep

while (input_list.size() > 0) {
    for (item in input_list) {
        process(item);
        possibly add items to buffer_list
    }
    input_list = buffer_list (or copy it)
    buffer_list = new list (or empty it)
    sleep X ms (100 < X < 500, still have to decide)
}
Now thread2 will just add elements to buffer_list (which will be processed in the next pass of the algorithm) and possibly wake thread1 if it was stopped.
I'm trying to understand which multithreading issues can occur in this situation, assuming that I'm programming it in C++ with the aid of the STL (no assumptions about thread-safety of the implementation), and I of course have access to the standard library (like mutexes).
I would like to avoid any possible delay in thread2, since it's bound to the user interface and delays there would be noticeable. I was thinking about using 3 lists to avoid synchronization issues, but I'm not really sure so far. I'm also unsure whether there is a safer container within the STL for this specific situation. I don't want to just place a mutex around everything and lose so much performance.
Any advice would be very appreciated, thanks!
EDIT:
This is what I have managed so far; I'm wondering if it's thread safe and efficient enough:
std::set<Item> *extBuffer, *innBuffer, *actBuffer;

void thread1Function()
{
    actBuffer->clear();

    sem_wait(&mutex);
    if (!extBuffer->empty())
        std::swap(actBuffer, extBuffer);
    sem_post(&mutex);

    if (!innBuffer->empty())
    {
        if (actBuffer->empty())
            std::swap(innBuffer, actBuffer);
        else if (!innBuffer->empty())
            actBuffer->insert(innBuffer->begin(), innBuffer->end());
    }

    if (!actBuffer->empty())
    {
        std::set<Item>::iterator it;
        for (it = actBuffer->begin(); it != actBuffer->end(); ++it)
        {
            // process
            // possibly innBuffer->insert(...)
        }
    }
}

void thread2Add(Item item)
{
    sem_wait(&mutex);
    extBuffer->insert(item);
    sem_post(&mutex);
}
Probably I should open another question
If you are worried about thread2 being blocked for a long time because thread1 is holding on to the lock, then make sure that thread1 guarantees to only take the lock for a really short time.
This can be easily accomplished if you have two instances of buffer list. So your attempt is already in the right direction.
Each buffer is pointed to with a pointer. One pointer you use to insert items into the list (thread2) and the other pointer is used to process the items in the other list (thread1). The insert operation of thread2 must be surrounded by a lock.
If thread1 is done processing all the items, it only has to swap the pointers (e.g. with std::swap); this is a very quick operation which must be surrounded by a lock - but only the swap operation. The actual processing of the items is lock-free.
This solution has the following advantages:
The lock in thread1 is always very short, so the amount of time that it may block thread2 is minimal
No constant dynamic allocation of buffers, which is faster and less likely to cause memory leak bugs.
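A minimal sketch of that double-buffer idea (mine, not part of the answer; Item, process() and bufMutex are placeholders):
#include <list>
#include <mutex>
#include <utility>

struct Item { /* ... */ };
void process(const Item&); // the actual per-item work

std::mutex bufMutex;
std::list<Item> bufferA, bufferB;
std::list<Item> *produceBuf = &bufferA; // thread2 appends here, under the lock
std::list<Item> *consumeBuf = &bufferB; // thread1 drains this one, without the lock

void thread2Add(const Item& item) // called from the UI thread; only a brief lock
{
    std::lock_guard<std::mutex> lock(bufMutex);
    produceBuf->push_back(item);
}

void thread1Pass() // one pass of the worker loop
{
    {
        std::lock_guard<std::mutex> lock(bufMutex); // the only lock thread1 takes
        std::swap(produceBuf, consumeBuf);
    }
    for (std::list<Item>::iterator it = consumeBuf->begin(); it != consumeBuf->end(); ++it)
        process(*it); // long-running work, done lock-free
    consumeBuf->clear();
}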
You just need a mutex around inserting, removing, and accessing the size of the container. You could develop a class to encapsulate the container, and that class would own the mutex. This keeps things simple, and the class handles the mechanics of using the mutex. If you limit what is exposed through the interface and keep the functions small (just calling the container's function, encapsulated in the mutex), they will return relatively quickly. You should need only one list in that case.
Depending on the system, if you have semaphores available, you may want to check whether they are more efficient and use them instead of the mutex. The same concept applies, just in a different manner.
You may also want to look into the concept of lock guards so that, if one of the threads dies, you do not end up with a deadlock condition.