Multithreading with a multimap - c++

Environment: Windows 7, C++, multithreading
I have created a worker thread to receive data on a socket and add it to a static multimap instance.
code snippet:
// remember, mymultimap is static
static std::multimap<std::string, std::string> mymultimap;

EnterCriticalSection(&m_criticalsection);
mymultimap.insert(std::make_pair("aaa", "bbb"));
LeaveCriticalSection(&m_criticalsection);
At the same time, my main thread is reading the same static multimap:
code snippet:
EnterCriticalSection(&m_criticalsection);
std::multimap<std::string, std::string>::iterator it = mymultimap.begin();
for (; it != mymultimap.end(); ++it)
{
    std::string firstName = it->first;
    std::string secondName = it->second;
}
LeaveCriticalSection(&m_criticalsection);
As the main and worker threads are continuously reading and writing, this hampers my application's performance.
Also, the multimap instance contains a lot of data (more than 10,000 records).
How can I hold the lock on the multimap for a minimal time?
EnterCriticalSection(&m_criticalsection);
///minimal lock time for Map ???
LeaveCriticalSection(&m_criticalsection);
Please help me improve my application's performance.

As it stands your question leaves too much room for discussion: we don't know how the values stored in your multimap are actually used.
If:
the order enforced in that data structure is important,
you need to keep the values in the multimap even after they have been read,
you need to go through all the entries each time you read,
then you are pretty much stuck as to how we can optimize the use of that structure.
On the other hand, if you can relax one of these requirements somehow, then you may have possibilities to optimize things a bit, for instance by using a message queue instead of the map directly for communication between both threads.
Message queues are a standard way to implement efficient communication between threads, and for one to one setup, there are even lockless solutions.
Update: thinking about it, sharing that type of structure across threads is not a good idea, whatever use you make of it. It is better to group all accesses to the multimap within one single thread, and thus have items generated by other threads passed to the managing thread through a queue. This completely decouples the work of generating the items from their storage and use. In your case, the producer thread will lose less time storing the data, which leaves it more time to handle the socket stream.
So, for that solution, you need a queue<std::pair<key, value> >, say a std::queue, handed to both threads at their initialization, or alternatively a static instance like the multimap one. Then simply replace the multimap::insert in the first thread with a queue::push of a make_pair(key, value); symmetrically, in the consumer thread, first pop all the pending pairs off the queue, inserting them into the map as you go, then run whatever processing you do on the map.
Note:
Please be aware that since you are using a multimap, you might end up with multiple values for the same key: a call to find will only return an iterator to the first match, and you may well have to check the following entries of the multimap to make sure you get all the values with the same key. equal_range does that check for you, as sketched below.

Related

Efficiently iterating through a map while inserting on other thread

I have an std::map < std::string, std::string > which is having values added to it at irregular intervals from one thread (but frequently and needs to be very fast), and occasionally having groups of entries removed.
I need from a different thread to dump a snapshot of the map as text to a debug log on command from a user.
Clearly it's not thread-safe to just iterate through the map outputting the debug information while it could be updated, so I'm currently taking a read lock (mutex) before dumping the data and a write lock for every insert or delete. This works fine, but I can't really lock the map for this long; it delays the processing of incoming updates too much.
I don't believe I can lock and unlock the debug-dump thread for each item, as modifying the map from the other thread can invalidate the iterator, I believe.
Is there any way I can do this safely without having to take out a read lock on the whole data structure while I write it out, so that new values can still be inserted quickly? I realise I won't be able to get a guaranteed consistent view of the data if values can be added and removed while I'm iterating through it, but as long as it's safe that's understood.
If there is no way to use a map for this, can anyone suggest any other data structure I could use?
edit: I'm hoping for a solution that means I don't need to take out an expensive lock when adding an item.
There are 2 solutions I can see at this moment:
(Easy, but might still take too long): copy the map (or assign it to another container) while locked, then dump the local copy to the debug log while not locked (sketched after this list)
(Some more work): delegate the updates of the map to another thread via a queue. If the other thread is the one that dumps to the debug log, then you don't need the locks anymore. This way the fast threads are only locked while accessing the queue.

Lists and multithreaded environments

I'm fairly new to the C++ standard library and have been using standard library lists for a specific multithreaded implementation. I noticed that there might be a trick to using lists that I have not seen in any tutorial/blog/forum post, and though it seems obvious to me, it does not seem to be considered by anyone. So maybe I'm too new and possibly missing something, and hopefully someone smarter than me can validate what I am trying to achieve or explain what I am doing wrong.
So we know that in general standard library containers are not thread-safe, but this seems like a guiding statement more than a rule. With lists it seems that there is a level of tolerance for thread safety. Let me explain: we know that list iterators do not get invalidated when we add to or delete from the list. The only iterator that gets invalidated is the one to the deleted item, which you can fix with the following line of code:
it = myList.erase(it);
So now lets say we have two threads and call them thread 1 and thread 2.
Thread 1's responsibility is to add to the list. It treats it as a queue, so it uses the std::list::push_back() function call.
Thread 2's responsibility is to process the data stored in the list as a queue and then after processing it will remove elements from the list.
It's guaranteed that Thread 2 will not remove elements that were just added during its processing, and Thread 1 guarantees that it will queue up the necessary data well ahead of Thread 2's processing. However, keep in mind that elements can be added during Thread 2's processing.
So it seems that this is a reasonable use of lists in this multithreaded environment without the use of locks for data protection. The reason I say it's reasonable is that, essentially, Thread 2 will only process data up to now, in that it can retrieve the current end iterator, as shown by the following pseudocode:
Thread 2 {
    iter = myList.begin();

    lock();
    iterEnd = myList.end(); // lock data temporarily in order to get the current
                            // last element in the list
    unlock();

    // perform necessary processing
    while (iter != iterEnd) {
        // process data
        // ...
        // remove element
        iter = myList.erase(iter);
    }
}
Thread 2 uses a lock for a very short amount of time just to know where to stop processing, but for the most part Thread 1 and Thread 2 don't require any other locking. In addition, Thread 2 could possibly avoid locking entirely if its requirement to know the current last element is flexible.
Does anyone see anything wrong with my suggestion?
Thanks!
Your program is racy. As an example of one obvious data race: std::list is more than just a collection of doubly-linked nodes. It also has, for example, a data member that stores the number of nodes in the list (it need not be a single data member, but it has to store the count somewhere).
Both of your threads will modify this data member concurrently. Because there is no synchronization of those modifications, your program has a data race.
Instances of the Standard Library containers cannot be mutated from multiple threads concurrently without external synchronization.

Concurrency issue with std::map insert/erase

I'm writing a threaded application that will process a list of resources and may or may not place a resulting item in a container (std::map) for each resource.
The processing of resources takes place in multiple threads.
The result container will be traversed, and each item acted upon, by a separate thread which takes an item, updates a MySQL database (using the mysqlcppconn API), then removes the item from the container and continues.
For simplicities sake, here's the overview of the logic:
queueWorker() - thread
    getResourcesList() - seeds the global queue
databaseWorker() - thread
    commitProcessedResources() - commits results to a database every n seconds
processResources() - thread x <# of processor cores>
    processResource()
        queueResultItem()
And the pseudo-implementation to show what I'm doing.
/* not the actual structs, just for simplicity's sake */
struct queue_item_t {
    int id;
    string hash;
    string text;
};

struct result_item_t {
    string hash; // hexadecimal sha1 digest
    int state;
};
std::map< string, queue_item_t > queue;
std::map< string, result_item_t > results;
bool processResource (queue_item_t *item)
{
    result_item_t result;
    if (some_stuff_that_doesnt_apply_to_all_resources)
    {
        result.hash = item->hash;
        result.state = 1;
        /* PROBLEM IS HERE */
        queueResultItem(result);
    }
}
void commitProcessedResources ()
{
    pthread_mutex_lock(&resultQueueMutex);
    // this can take a while
    for (std::map< string, result_item_t >::iterator it = results.begin();
         it != results.end();)
    {
        // do mysql stuff that takes a while
        results.erase(it++);
    }
    pthread_mutex_unlock(&resultQueueMutex);
}
void queueResultItem (result_item_t result)
{
    pthread_mutex_lock(&resultQueueMutex);
    results.insert(make_pair(result.hash, result));
    pthread_mutex_unlock(&resultQueueMutex);
}
As indicated in processResource(), the problem is that while commitProcessedResources() is running and resultQueueMutex is locked, any call to queueResultItem() will block trying to lock the same mutex, and therefore will wait until commitProcessedResources() is done, which might take a while.
Since there is, obviously, a limited number of threads running, as soon as all of them are blocked in queueResultItem(), no more work will be done until the mutex is released and usable again.
So, my question is how I best go about implementing this? Is there a specific kind of standard container that can be inserted into and deleted from simultaneously or does there exist something that I just don't know of?
It is not strictly necessary that each queue item has its own unique key, as is the case here with the std::map, but I would prefer it, since several resources can produce the same result and I would prefer to send only unique results to the database, even if it does use INSERT IGNORE to ignore any duplicates.
I'm fairly new to C++ so I've no idea what to look for on Google, unfortunately. :(
You do not have to hold the lock for the queue during the whole processing in commitProcessedResources(). You can instead swap the queue with an empty one:
void commitProcessedResources ()
{
    std::map< string, result_item_t > queue2;

    pthread_mutex_lock(&resultQueueMutex);
    // XXX Do a quick swap.
    queue2.swap (results);
    pthread_mutex_unlock(&resultQueueMutex);

    // this can take a while
    for (std::map< string, result_item_t >::iterator it = queue2.begin();
         it != queue2.end(); ++it)
    {
        // do mysql stuff that takes a while
        // XXX You do not need to erase; queue2 is destroyed when we return.
        //results.erase(it++);
    }
}
You will need to use synchronization methods (i.e. the mutex) to make this work properly. However, the goal of parallel programming is to minimize the critical section (i.e. the amount of code which is executed while you hold the lock).
That said, if your MySQL queries can be run in parallel without synchronization (i.e. multiple calls won't conflict with each other), take them out of the critical section. This will greatly reduce overhead. For instance, a simple refactor like the following could do the trick:
void commitProcessedResources ()
{
    // MOVING THIS LOCK
    // this can take a while
    pthread_mutex_lock(&resultQueueMutex);
    std::map<string, result_item_t>::iterator end = results.end();
    std::map<string, result_item_t>::iterator begin = results.begin();
    pthread_mutex_unlock(&resultQueueMutex);

    for (std::map< string, result_item_t >::iterator it = begin; it != end;)
    {
        // do mysql stuff that takes a while
        pthread_mutex_lock(&resultQueueMutex); // Is this the only place we need it?
        // This is a MUCH smaller critical section
        results.erase(it++);
        pthread_mutex_unlock(&resultQueueMutex); // Unlock or everything will block until end of loop
    }
    // MOVED UNLOCK
}
This will give you concurrent "real-time" access to the data across multiple threads. That is, as every write finishes, the map is updated and can be read elsewhere with current information.
Up through C++03, the standard didn't define anything about threading or thread safety at all (and since you're using pthreads, I'm guessing that's pretty much what you're using).
As such, it's up to you to do locking on your shared map, to ensure that only one thread tries to access the map at any given time. Without that, you're likely to corrupt its internal data structure, so the map is no longer valid at all.
Alternatively (and I'd generally prefer this) you could have your multiple threads just put their data into a thread-safe queue, and have a single thread that gets data from that queue and puts it into the map. Since the map is then single-threaded, you no longer have to lock it when it's in use.
There are a few reasonable possibilities for dealing with the delay while you flush the map to the disk. Probably the simplest is to have the same thread read from the queue, insert into the map, and periodically flush the map to disk. In this case, the incoming data just sits in the queue while the map is being flushed to disk. This keeps access to the map simple -- since only one thread ever touches it directly, it can use the map without any locking.
Another would be to have two maps. At any given time, the thread that flushes to disk gets one map, and the thread that retrieves from the queue and inserts into the map gets the other. When the flushing thread needs to do its thing, it just swaps the roles of the two. Personally, I think I prefer the first though; eliminating all the locking around the map has a great deal of appeal, at least to me.
Yet another variant that would maintain that simplicity would be for the queue->map thread to create a map, fill it, and when it's full enough (i.e., after the appropriate length of time) stuff it into another queue, then repeat from the start (i.e., create a new map, etc.). The flushing thread retrieves a map from its incoming queue, flushes it to disk, and destroys it. Though this adds a bit of overhead creating and destroying maps, you're not doing it often enough to care a lot. You still keep single-threaded access to any map at any time, and still keep all the database access segregated from everything else. A sketch of the first suggestion follows.

Concurrency with a producer/consumer (sort of) and STL

I have the following situation: I have two threads
thread1, which is a worker thread that executes an algorithm as long as its input list size is > 0
thread2, which is asynchronous (user driven) and can add elements to the input list to be processed
Now, thread1's loop does something similar to the following:
list input_list
list buffer_list

if (input_list.size() == 0)
    sleep

while (input_list.size() > 0) {
    for (item in input_list) {
        process(item);
        possibly add items to buffer_list
    }
    input_list = buffer_list (or copy it)
    buffer_list = new list (or empty it)
    sleep X ms (100 < X < 500, still have to decide)
}
Now, thread2 will just add elements to buffer_list (to be handled in the next pass of the algorithm) and possibly wake thread1 if it was stopped.
I'm trying to understand which multithreading issues can occur in this situation, assuming that I'm programming it in C++ with the aid of the STL (no assumptions about the thread-safety of the implementation), and that I have access to the standard library (e.g. mutexes).
I would like to avoid any possible delay in thread2, since it's bound to the user interface and delays there would be noticeable. I was thinking about using 3 lists to avoid synchronization issues, but I'm not really sure so far. I'm also unsure whether there is a safer container within the STL for this specific situation. I don't want to just place a mutex around everything and lose so much performance.
Any advice would be very appreciated, thanks!
EDIT:
This is what I managed so far; I'm wondering if it's thread-safe and efficient enough:
std::set<Item> *extBuffer, *innBuffer, *actBuffer;

void thread1Function()
{
    actBuffer->clear();

    sem_wait(&mutex);
    if (!extBuffer->empty())
        std::swap(actBuffer, extBuffer);
    sem_post(&mutex);

    if (!innBuffer->empty())
    {
        if (actBuffer->empty())
            std::swap(innBuffer, actBuffer);
        else
            actBuffer->insert(innBuffer->begin(), innBuffer->end());
    }

    if (!actBuffer->empty())
    {
        std::set<Item>::iterator it;
        for (it = actBuffer->begin(); it != actBuffer->end(); ++it)
        {
            // process
            // possibly innBuffer->insert(...)
        }
    }
}

void thread2Add(Item item)
{
    sem_wait(&mutex);
    extBuffer->insert(item);
    sem_post(&mutex);
}
Probably I should open another question
If you are worried about thread2 being blocked for a long time because thread1 is holding the lock, then make sure that thread1 guarantees to take the lock only for a really short time.
This can easily be accomplished if you have two instances of the buffer list, so your attempt is already in the right direction.
Each buffer is pointed to by a pointer. One pointer is used to insert items into the list (thread2), and the other is used to process the items in the other list (thread1). The insert operation in thread2 must be surrounded by a lock.
When thread1 is done processing all the items, it only has to swap the pointers (e.g. with std::swap); this is a very quick operation which must be surrounded by a lock, but only the swap operation. The actual processing of the items is lock-free. A sketch follows after the list below.
This solution has the following advantages:
The lock in thread1 is always very short, so the amount of time that it may block thread2 is minimal
No constant dynamic allocation of buffers, which is faster and less likely to cause memory leak bugs.
You just need a mutex around inserting, removing, and accessing the size of the container. You could develop a class that encapsulates the container together with the mutex; this keeps things simple, and the class handles the mechanics of using the mutex. If you limit what is exposed through the class's functions/interface and keep the functions small (just the container call wrapped in the mutex), they will return relatively quickly. In that case you should need only one list; a sketch follows below.
Depending on the system, if you have semaphores available, you may want to check whether they are more efficient and use them instead of the mutex. The same concept applies, just in a different manner.
You may also want to look into the concept of lock guards, so that if one of the threads dies while holding the lock you do not end up with a deadlock condition.

c++ multithread optimization

In my code I have 2-4 threads performing Monte Carlo simulations. Each of them runs a number of experiments, and they all collect the results into an STL vector.
My question is this: suppose each thread runs 1000 experiments sequentially. Is it better to store the results into the shared vector one at a time, or every once in a while? If they wait until they have some consistent amount of data, writing into the vector will take longer, so I'm not sure whether the second solution is necessarily better than the first one.
PS: each experiment is a numerical computation, so there are no I/O operations.
Thanks
If you are going to wait until all the results are computed before you use any of the results, preallocate space for 4,000 results in the vector and have each thread write into one range of elements in the vector. No locking is required because no two threads access the same element in the vector.
If you want to use the results as they are computed, use some sort of a concurrent queue data structure instead of a vector.
If you're only putting 2000 to 4000 elements in the vector I doubt it would make much of a difference either way.
Do whatever is most natural for the algorithm. If that doesn't work well enough look into doing it the other way.
After thinking about it for a bit, it might serve both purposes (simplicity and speed) to have each thread store results in a local vector, then copy the contents of the local vector into the 'global' vector (protected by a lock) when the thread is done. Of course, that works only as long as whatever is waiting for the results can wait until a thread is fully finished before getting an update. A sketch follows below.
A singly linked list may be a better choice than a vector here.
If there is only one thread reading and one thread writing to a FIFO, you don't need any synchronization. The trick is to keep at least one 'dummy' element always in the list; the FIFO is empty if head == tail. The head and tail pointers can be manipulated for push and pop such that there is no need for synchronization.
Using this, you can make several queues that don't need any synchronization. If new/delete is taking time, you can also keep queues of reusable elements to recycle them. If your queues do run empty, just a sleep()/wakeup() mechanism is needed.
The trick is to create a lot of queues like this, including a queue to recycle objects, and you'll not need any thread synchronization stuff. But remember: exactly one reader and exactly one writer, no more, no less. Best of luck.