I'm fairly new to the C++ standard library and have been using std::list in a specific multithreaded implementation. I noticed what might be a trick with lists that I have not seen in any tutorial, blog, or forum post; though it seems obvious to me, nobody else seems to consider it. So maybe I'm too new and missing something, and hopefully someone smarter than me can validate what I am trying to achieve or explain what I am doing wrong.
We know that, in general, standard library containers are not thread safe - but this seems like a guiding statement more than a hard rule. With lists there seems to be a level of tolerance for thread safety. Let me explain: iterators into a list are not invalidated when we add to or delete from the list. The only iterator that gets invalidated is the one pointing at the deleted element, which you can fix with the following line of code:
it = myList.erase(it);
So now let's say we have two threads, call them thread 1 and thread 2.
Thread 1's responsibility is to add to the list. It treats it as a queue, so it uses the std::list::push_back() function call.
Thread 2's responsibility is to process the data stored in the list as a queue and then after processing it will remove elements from the list.
It's guaranteed that Thread 2 will not remove elements that were added during its current processing pass, and Thread 1 guarantees that it will queue up the necessary data well ahead of Thread 2's processing. However, keep in mind that elements can be added while Thread 2 is processing.
So it seems that this is a reasonable use of lists in a multithreaded environment without locks for data protection. The reason I say it's reasonable is that Thread 2 will only process data up to "now": it can retrieve the current end iterator, as shown by the following pseudocode:
Thread 2 {
    iter = myList.begin();

    lock();
    iterEnd = myList.end(); // lock the data temporarily just to get the current
                            // last element in the list
    unlock();

    // perform the necessary processing
    while (iter != iterEnd) {
        // process data
        // ...
        // remove the element
        iter = myList.erase(iter);
    }
}
Thread 2 holds the lock only very briefly, just to know where to stop processing; for the most part Thread 1 and Thread 2 require no other locking. In addition, Thread 2 could possibly avoid the lock entirely if its notion of "the current last element" can be flexible.
Does anyone see anything wrong with my suggestion?
Thanks!
Your program is racy. As one obvious example of a data race: std::list is more than just a collection of doubly-linked nodes. It also has, for example, a data member that stores the number of nodes in the list (it need not be a single data member, but the count has to be stored somewhere).
Both of your threads will modify this data member concurrently. Because there is no synchronization of those modifications, your program is racy.
Instances of the Standard Library containers cannot be mutated from multiple threads concurrently without external synchronization.
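To make the intended pattern safe, every access to the shared list has to go through external synchronization. A minimal sketch, assuming C++11 (the names are illustrative, not the questioner's code):

#include <list>
#include <mutex>

std::list<int> sharedList;
std::mutex listMutex;

void producer(int value) // Thread 1's role
{
    std::lock_guard<std::mutex> guard(listMutex);
    sharedList.push_back(value); // mutates node links and the stored count
}

void consumeAll() // Thread 2's role
{
    std::lock_guard<std::mutex> guard(listMutex);
    for (std::list<int>::iterator it = sharedList.begin(); it != sharedList.end(); )
    {
        // ... process *it ...
        it = sharedList.erase(it); // erase returns the next valid iterator
    }
}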
Related
I'm new to multithreaded programming (actually, I'm not completely new to multithreading, but I have always used global data shared between the reading and writing threads; I think it makes my code ugly and slow, and I'm eager to improve my skills),
and I'm now developing a forwarder server in C++. To simplify the question, suppose there are only two threads, a receiving thread and a sending thread, and, the usual naive design, I have a global std::list for saving data :(
The receiving thread reads raw data from the server and writes it into the global std::list.
The sending thread reads the global std::list and sends the data to several clients.
I use pthread_mutex_lock to synchronize access to the global std::list.
The problem is that the performance of the forwarder server is poor: the global list is locked while the receiving thread is writing, but at that moment the sending thread wants to read, so it must wait, and I think this waiting is wasteful.
What should I do? I know global state is bad, but without it, how can I synchronize these two threads?
I'll keep searching on SO and Google.
Any suggestions, guides, technologies, or books would be appreciated. Thanks!
EDIT
For any suggestion, I'd like to know why or why not - please give me the reason. Thanks a lot.
Notes:
Please provide more complete examples: http://sscce.org/
Answers:
Yes, you should synchronize access to shared data.
NOTE: this makes assumptions about the std::list implementation, which may or may not apply to your case - but since these assumptions are valid for some implementations, you cannot assume your implementation is thread safe without an explicit guarantee.
Consider the snippet:
struct data { /* payload */ };

std::list<data> g_list;

void thread1()
{
    while( /*input ok*/ )
    {
        /*read input*/
        g_list.push_back( /*something*/ );
    }
}

void thread2()
{
    while( /*something*/ )
    {
        /*pop from list*/
        data x = g_list.front();
        g_list.pop_front();
    }
}
Say, for example, the list has 1 element in it.
std::list::push_back() must do:
allocate space (many CPU instructions)
copy the data into the new space (many CPU instructions)
update the previous element (if it exists) to point to the new element
update the stored size (e.g. an internal _size member)
std::list::pop_front() must do:
free the space
update the next element to no longer point back at a previous element
update the stored size
Now say thread 1 calls push_back(). After checking that an element exists (a check on the size), it continues on to update that element - but before it gets the chance, thread 2 could already be running pop_front() and be busy freeing the memory of that first element. Thread 1 would then be writing to freed memory, which could cause a segmentation fault or even memory corruption. Similarly, the two updates to the size can interleave so that push_back's write wins over pop_front's, and then you have a size of 2 when there is only 1 element.
Do not use pthread_* directly in C++ unless you really know what you're doing - use std::thread (C++11) or boost::thread - or wrap pthread_* in a class yourself - because if you don't account for exceptions you will end up with deadlocks.
You cannot avoid some form of synchronization in this specific example - but you can optimize the synchronization:
Don't copy the data itself into and out of the std::list - copy a pointer to the data into and out of the list
Only lock while you're actually accessing the std::list - but don't make this mistake:
{
    // lock
    size_t i = g_list.size();
    // unlock

    // RACE: another thread may empty the list right here
    if ( i )
    {
        // lock
        // work with g_list ...
        // unlock
    }
}
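A race-free version of that check does the emptiness test and the pop under a single lock. A hedged sketch, using pointers as suggested above (data is the placeholder type from the earlier snippet):

#include <list>
#include <mutex>

struct data;                     // placeholder type, as in the snippet above
std::list<data*> g_list;         // store pointers, not copies of the data
std::mutex g_list_mutex;

data* try_pop()
{
    std::lock_guard<std::mutex> guard(g_list_mutex);
    if (g_list.empty())          // check and pop under the same lock,
        return nullptr;          // so no other thread can slip in between
    data* x = g_list.front();
    g_list.pop_front();
    return x;
}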
A more appropriate pattern here would be a message queue - you can implement one with a mutex, a list, and a condition variable (a minimal sketch follows the links below). Here are some implementations you can look at:
http://pocoproject.org/docs/Poco.Notification.html
http://gnodebian.blogspot.com.es/2013/07/a-thread-safe-asynchronous-queue-in-c11.html
http://docs.wxwidgets.org/trunk/classwx_message_queue_3_01_t_01_4.html
google for more
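As a point of reference, a minimal sketch of such a queue (illustrative only, assuming C++11):

#include <condition_variable>
#include <list>
#include <mutex>

template <typename T>
class MessageQueue
{
public:
    void push(T item)
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_items.push_back(std::move(item));
        }
        m_cond.notify_one();                 // wake one waiting consumer
    }

    T pop()                                  // blocks until an item arrives
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return !m_items.empty(); });
        T item = std::move(m_items.front());
        m_items.pop_front();
        return item;
    }

private:
    std::list<T> m_items;
    std::mutex m_mutex;
    std::condition_variable m_cond;
};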
There is also the option of atomic containers, look at:
http://calumgrant.net/atomic/ - not sure if this is backed by actual atomic storage (as opposed to just using synchronization behind an interface)
google for more
You could also go for an asynchronous approach with boost::asio - though your case should be quite fast if done right.
Environment: Windows 7, C++, multithreading
I have created a new worker thread to receive data on a socket and add it into a static multimap instance.
code snippet:
// remember, mymultimap is static
static std::multimap<std::string, std::string> mymultimap;

EnterCriticalSection(&m_criticalsection);
mymultimap.insert( std::make_pair("aaa", "bbb") );
LeaveCriticalSection(&m_criticalsection);
At same time my main thread is reading same static multimap :
code snippet:
EnterCriticalSection(&m_criticalsection);
std::multimap<std::string, std::string>::iterator it = mymultimap.begin();
for( ; it != mymultimap.end(); it++)
{
std::string firstName = (*it).first;
std::string secondName = (*it).second;
}
LeaveCriticalSection(&m_criticalsection);
As the main and worker threads are continuously reading and writing, this hampers my application's performance.
Also, the multimap instance contains a huge amount of data (more than 10,000 records).
How can I hold the lock for a minimal amount of time when working with the multimap?
EnterCriticalSection(&m_criticalsection);
///minimal lock time for Map ???
LeaveCriticalSection(&m_criticalsection);
Please help me improve my application's performance.
As it stands your question leaves too much room for discussion: we don't know how the values stored in your multimap are actually used.
If:
the order enforced in that data structure is important,
you need to keep the values in the multimap even after they have been read,
you need to go through all the entries each time you read,
then you are pretty much stuck as to how we can optimize the use of that structure.
On the other hand, if you can relax one of these requirements somehow, then you may have possibilities to optimize things a bit, for instance by using a message queue instead of the map directly for communication between both threads.
Message queues are a standard way to implement efficient communication between threads, and for one to one setup, there are even lockless solutions.
Update: thinking about it, sharing that type of structure across threads is not a good idea, whatever use you make of it. It is better to group all accesses to the multimap within one single thread, and thus have items generated by other threads passed to the managing thread through a queue. This completely decouples the work of generating the items from their storage and use. In your case, the producer thread will lose less time storing the data, which leaves it more time to handle the socket stream.
So, for that solution, you need a queue of std::pair<key, value> - say a std::queue - handed to both threads at their initialization, or alternatively a static instance like the multimap one. Then simply replace the multimap::insert in the first thread with a queue::push of make_pair(key, value); symmetrically, in the consumer thread, first pop all the pending pairs from the queue, inserting them into the map as you go, then run whatever processing of the map you need.
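A sketch of that hand-off, reusing the question's critical section (the function names and the drain strategy are illustrative, not a drop-in implementation):

#include <windows.h>
#include <map>
#include <queue>
#include <string>
#include <utility>

static CRITICAL_SECTION m_criticalsection;  // initialized elsewhere, as in the question
static std::queue<std::pair<std::string, std::string> > pendingItems;
static std::multimap<std::string, std::string> mymultimap; // now touched by the main thread only

void workerReceived(const std::string& key, const std::string& value)
{
    EnterCriticalSection(&m_criticalsection);
    pendingItems.push(std::make_pair(key, value)); // very short critical section
    LeaveCriticalSection(&m_criticalsection);
}

void mainThreadDrain()
{
    std::queue<std::pair<std::string, std::string> > local;

    EnterCriticalSection(&m_criticalsection);
    std::swap(local, pendingItems);              // grab everything at once
    LeaveCriticalSection(&m_criticalsection);

    while (!local.empty())                       // insert without holding the lock
    {
        mymultimap.insert(local.front());
        local.pop();
    }
    // ... iterate mymultimap as before, no locking needed ...
}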
Note:
Please be aware that with a multimap you might end up with multiple values for the same key: a call to find will return an iterator to just one of them, and you may well have to check the following entries of the multimap to make sure you get all the values with that key.
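For completeness, a small sketch of retrieving every value stored under one key with equal_range, assuming C++11 (printAllValuesFor is an illustrative helper, not part of the question's code):

#include <iostream>
#include <map>
#include <string>

void printAllValuesFor(const std::multimap<std::string, std::string>& m,
                       const std::string& key)
{
    // equal_range returns the whole run of entries sharing the key
    auto range = m.equal_range(key);
    for (auto it = range.first; it != range.second; ++it)
        std::cout << it->second << '\n';
}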
I have a queue with elements which need to be processed. I want to process these elements in parallel. There will be some sections in the processing of each element which need to be synchronized. At any point in time there can be at most num_threads running threads.
I'll provide a template to give you an idea of what I want to achieve.
queue q

process_element(e)
{
    lock()
    some synchronized area
    // a matrix access is performed here, so a spin lock would do
    unlock()
    ...
    unsynchronized area
    ...
    if( condition )
    {
        new_element = generate_new_element()
        q.push(new_element) // synchronized access to the queue
    }
}

process_queue()
{
    while( elements in q ) // "algorithm is finished" condition
    {
        e = get_elem_from_queue(q) // synchronized access to the queue
        process_element(e)
    }
}
I can use
pthreads
OpenMP
Intel Threading Building Blocks (TBB)
Top problems I have
Make sure that at any point in time I have max num_threads running threads
Lightweight synchronization methods to use on queue
My plan is to use the Intel TBB concurrent_queue for the queue container. But then, will I be able to use pthreads functions (mutexes, condition variables)? Let's assume this works (it should). Then, how can I use pthreads to have at most num_threads running at one point in time? I was thinking of creating the threads once and then, after one element is processed, accessing the queue to get the next element. However, it is more complicated than that, because an empty queue is no guarantee that the algorithm is finished.
My question
Before I start implementing, I'd like to know if there is an easy way to use Intel TBB or pthreads to obtain the behaviour I want - more precisely, processing elements from a queue in parallel.
Note: I have tried to use tasks but with no success.
First off, pthreads gives you portability which is hard to walk away from. The following appear to be true from your question - let us know if these aren't true because the answer will then change:
1) You are running the code on one or more multi-core processors
2) You want to have no more than num_threads threads because of (1)
Assuming the above to be true, the following approach might work well for you:
Create num_threads pthreads using pthread_create
Optionally, bind each thread to a different core
q.push(new_element) atomically adds new_element to the queue. pthread_mutex_lock and pthread_mutex_unlock can help you here. Examples here: http://pages.cs.wisc.edu/~travitch/pthreads_primer.html
Use pthread mutexes for dequeueing elements as well
Termination is tricky - one way to do this is to add a TERMINATE element to the queue, which upon dequeueing, causes the dequeuer to queue up another TERMINATE element (for the next dequeuer) and then terminate. You will end up with one extra TERMINATE element in the queue, which you can remove by having a named thread dequeue it after all the threads are done.
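A hedged sketch of that poison-pill scheme with pthreads (Element, the isTerminate flag, and the queue wiring are illustrative, not a complete program):

#include <pthread.h>
#include <deque>

struct Element { bool isTerminate; int payload; };

static std::deque<Element> g_queue;
static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_cond  = PTHREAD_COND_INITIALIZER;

static void enqueue(Element e)
{
    pthread_mutex_lock(&g_mutex);
    g_queue.push_back(e);
    pthread_mutex_unlock(&g_mutex);
    pthread_cond_signal(&g_cond);
}

static Element dequeue()
{
    pthread_mutex_lock(&g_mutex);
    while (g_queue.empty())
        pthread_cond_wait(&g_cond, &g_mutex); // sleep until work arrives
    Element e = g_queue.front();
    g_queue.pop_front();
    pthread_mutex_unlock(&g_mutex);
    return e;
}

static void* worker(void*)
{
    for (;;)
    {
        Element e = dequeue();
        if (e.isTerminate)
        {
            enqueue(e);  // pass the pill on for the next dequeuer
            return 0;    // then terminate this thread
        }
        // ... process e.payload ...
    }
}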
Depending on how often you add/remove elements from the queue, you may want to use something lighter weight than pthread_mutex_... to enqueue/dequeue elements. This is where you might want to use a more machine-specific construct.
TBB is compatible with other threading packages.
TBB also emphasizes scalability. So when you port your program from a dual core to a quad core you do not have to adjust it. With data-parallel programming, program performance increases (scales) as you add processors.
Cilk Plus is also another runtime that provides good results.
www.cilkplus.org
Since pthreads is a low-level threading library, you have to decide how much control you need in your application; it offers flexibility, but at a high cost in terms of programmer effort, debugging time, and maintenance.
My recommendation is to look at tbb::parallel_do. It was designed to process elements from a container in parallel, even if the container itself is not concurrent; i.e. parallel_do works with a std::queue correctly without any user synchronization (of course you would still need to protect your matrix access inside process_element()). Moreover, with parallel_do you can add more work on the fly, which looks like what you need, as process_element() creates and adds new elements to the work queue (the only caution is that the newly added work will be processed immediately, unlike putting it in a queue, which would postpone processing until after all "older" items). Also, you don't have to worry about termination: parallel_do completes automatically as soon as all initial queue items and all items created on the fly have been processed.
However, if, besides the computation itself, the work queue can be concurrently fed from another source (e.g. from an I/O processing thread), then parallel_do is not suitable. In this case, it might make sense to look at parallel_pipeline or, better, the TBB flow graph.
Lastly, an application can control the number of active threads with TBB, though it's not a recommended approach.
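For illustration, a hedged sketch of the parallel_do shape, based on the classic TBB API (oneTBB later renamed it parallel_for_each); Item, the work, and the condition are placeholders for the question's real types and logic:

#include <deque>
#include "tbb/parallel_do.h"

struct Item { int value; };

struct ProcessBody
{
    void operator()(Item& item, tbb::parallel_do_feeder<Item>& feeder) const
    {
        // ... process item; the matrix access still needs its own lock ...
        bool condition = (item.value > 0);  // stands in for the real test
        if (condition)
            feeder.add(Item());             // add new work on the fly
    }
};

void process_all(std::deque<Item>& initial)  // any container with input iterators
{
    tbb::parallel_do(initial.begin(), initial.end(), ProcessBody());
}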
I have the following situation: I have two threads
thread1, which is a worker thread that executes an algorithm as long as its input list size is > 0
thread2, which is asynchronous (user driven) and can add elements to the input list to be processed
Now, thread1's loop does something similar to the following:
list input_list
list buffer_list

if (input_list.size() == 0)
    sleep

while (input_list.size() > 0) {
    for (item in input_list) {
        process(item);
        possibly add items to buffer_list
    }
    input_list = buffer_list (or copy it)
    buffer_list = new list (or empty it)
    sleep X ms (100 < X < 500, still have to decide)
}
Now thread2 will just add elements to buffer_list (which becomes the input of the next pass of the algorithm) and possibly wake thread1 up if it was sleeping.
I'm trying to understand which multithreading issues can occur in this situation, assuming that I'm programming it in C++ with the aid of the STL (no assumptions about the thread-safety of the implementation), and that I of course have access to the standard library (e.g. mutexes).
I would like to avoid any possible delay in thread2, since it is bound to the user interface and delays there would be noticeable. I was thinking about using 3 lists to avoid synchronization issues, but I'm not really sure so far. I'm also unsure whether there is a safer container in the STL for this specific situation. I don't want to just place a mutex around everything and lose a lot of performance.
Any advice would be very appreciated, thanks!
EDIT:
This is what I have managed so far; I am wondering whether it is thread safe and efficient enough:
std::set<Item> *extBuffer, *innBuffer, *actBuffer;
sem_t mutex; // binary semaphore used as a mutex

void thread1Function()
{
    actBuffer->clear();

    sem_wait(&mutex);
    if (!extBuffer->empty())
        std::swap(actBuffer, extBuffer);
    sem_post(&mutex);

    if (!innBuffer->empty())
    {
        if (actBuffer->empty())
            std::swap(innBuffer, actBuffer);
        else
        {
            actBuffer->insert(innBuffer->begin(), innBuffer->end());
            innBuffer->clear(); // otherwise these items would be picked up again next pass
        }
    }

    if (!actBuffer->empty())
    {
        std::set<Item>::iterator it;
        for (it = actBuffer->begin(); it != actBuffer->end(); ++it)
        {
            // process
            // possibly innBuffer->insert(...)
        }
    }
}

void thread2Add(Item item)
{
    sem_wait(&mutex);
    extBuffer->insert(item);
    sem_post(&mutex);
}
Probably I should open another question
If you are worried about thread2 being blocked for a long time because thread1 is holding the lock, then make sure that thread1 guarantees to take the lock only for a very short time.
This can be easily accomplished if you have two instances of buffer list. So your attempt is already in the right direction.
Each buffer is pointed to with a pointer. One pointer you use to insert items into the list (thread2) and the other pointer is used to process the items in the other list (thread1). The insert operation of thread2 must be surrounded by a lock.
If thread1 is done processing all the items, it only has to swap the pointers (e.g. with std::swap); this is a very quick operation, and it must be surrounded by a lock - but only the swap operation. The actual processing of the items is lock-free (a sketch follows the list of advantages below).
This solution has the following advantages:
The lock in thread1 is always very short, so the amount of time that it may block thread2 is minimal
No constant dynamic allocation of buffers, which is faster and less likely to cause memory leak bugs.
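A minimal sketch of that pointer-swap pattern, assuming C++11 (Item and the names are placeholders):

#include <list>
#include <mutex>
#include <utility>

struct Item { /* ... */ };

std::list<Item>* writeBuffer = new std::list<Item>; // thread2 inserts here
std::list<Item>* readBuffer  = new std::list<Item>; // thread1 processes here
std::mutex bufferMutex;

void thread2Insert(const Item& item)
{
    std::lock_guard<std::mutex> guard(bufferMutex);
    writeBuffer->push_back(item);            // the only work done under the lock
}

void thread1Process()
{
    {
        std::lock_guard<std::mutex> guard(bufferMutex);
        std::swap(writeBuffer, readBuffer);  // very short critical section
    }
    for (Item& item : *readBuffer)
    {
        // ... process item, lock-free ...
    }
    readBuffer->clear();
}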
You just need a mutex around inserting, removing, and accessing the size of the container. You could develop a class that encapsulates the container and owns the mutex; this keeps things simple, and the class handles the mechanics of locking. If you limit what the interface exposes and keep the functions small (just the container calls wrapped in the mutex), they will return relatively quickly. In that case you should need only one list.
Depending on the system, if semaphores are available, you may want to check whether they are more efficient and use them instead of the mutex. The same concept applies, just in a different manner.
You may also want to look into the concept of lock guards, so that if one of the threads dies you do not end up with a deadlock condition.
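A guard here just means RAII locking, so the mutex is released even if the code in between throws or returns early. A tiny sketch, assuming C++11:

#include <mutex>

std::mutex containerMutex;

void safeOperation()
{
    std::lock_guard<std::mutex> guard(containerMutex); // locks on construction
    // ... touch the container; an exception here still releases the lock ...
}   // unlocked automatically when guard goes out of scope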
After posting my solution to my own problem regarding memory issues, nusi suggested that my solution lacks locking.
The following pseudo code vaguely represents my solution in a very simple way.
std::map<int, MyType1> myMap;
void firstFunctionRunFromThread1()
{
MyType1 mt1;
mt1.Test = "Test 1";
myMap[0] = mt1;
}
void onlyFunctionRunFromThread2()
{
MyType1 &mt1 = myMap[0];
std::cout << mt1.Test << std::endl; // Prints "Test 1"
mt1.Test = "Test 2";
}
void secondFunctionFromThread1()
{
MyType1 mt1 = myMap[0];
std::cout << mt1.Test << std::endl; // Prints "Test 2"
}
I'm not sure at all how to go about implementing locking, and I'm not even sure why I should do it (note the actual solution is much more complex). Could someone please explain how and why I should implement locking in this scenario?
One function (i.e. one thread) modifies the map and two read it. Therefore a read could be interrupted by a write, or vice versa; in both cases the map can end up corrupted. You need locks.
Actually, it's not even just locking that is the issue...
If you really want thread two to ALWAYS print "Test 1", then you need a condition variable.
The reason is that there is a race condition. Regardless of whether or not you create thread 1 before thread 2, it is possible that thread 2's code can execute before thread 1, and so the map will not be initialized properly. To ensure that no one reads from the map until it has been initialized you need to use a condition variable that thread 1 modifies.
You also should use a lock with the map, as others have mentioned, because you want threads to access the map as though they are the only ones using it, and the map needs to be in a consistent state.
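Putting both pieces together, a hedged sketch of the condition-variable hand-off for this exact scenario, assuming C++11 (mapMutex, mapReady, and the initialized flag are illustrative names; MyType1 is assumed to look like the question's type):

#include <condition_variable>
#include <iostream>
#include <map>
#include <mutex>
#include <string>

struct MyType1 { std::string Test; };

std::map<int, MyType1> myMap;
std::mutex mapMutex;
std::condition_variable mapReady;
bool initialized = false;

void firstFunctionRunFromThread1()
{
    {
        std::lock_guard<std::mutex> guard(mapMutex);
        MyType1 mt1;
        mt1.Test = "Test 1";
        myMap[0] = mt1;
        initialized = true;
    }
    mapReady.notify_all();                           // let thread 2 proceed
}

void onlyFunctionRunFromThread2()
{
    std::unique_lock<std::mutex> lock(mapMutex);
    mapReady.wait(lock, [] { return initialized; }); // blocks until thread 1 wrote
    std::cout << myMap[0].Test << std::endl;         // now guaranteed "Test 1"
    myMap[0].Test = "Test 2";
}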
Here is a conceptual example to help you think about it:
Suppose you have a linked list that 2 threads are accessing. In thread 1, you ask to remove the first element from the list (at the head of the list), In thread 2, you try to read the second element of the list.
Suppose that the delete method is implemented in the following way: make a temporary ptr to point at the second element in the list, make the head point at null, then make the head the temporary ptr...
What if the following sequence of events occurs:
- T1 removes the head's next ptr to the second element
- T2 tries to read the second element, BUT there is no second element because the head's next ptr was modified
- T1 completes removing the head and sets the 2nd element as the head
The read by T2 failed because T1 didn't use a lock to make the delete from the linked list atomic!
That is a contrived example, and isn't necessarily how you would even implement the delete operation; however, it shows why locking is necessary: it is necessary so that operations performed on data are atomic. You do not want other threads using something that is in an inconsistent state.
Hope this helps.
In general, threads might be running on different CPUs/cores, with different memory caches. They might be running on the same core, with one interrupting ("pre-empting") the other. This has two consequences:
1) You have no way of knowing whether one thread will be interrupted by another in the middle of doing something. So in your example, there's no way to be sure that thread1 won't try to read the string value before thread2 has written it, or even that when thread1 reads it, it is in a "consistent state". If it is not in a consistent state, then using it might do anything.
2) When you write to memory in one thread, there is no telling if or when code running in another thread will see that change. The change might sit in the cache of the writer thread and not get flushed to main memory. It might get flushed to main memory but not make it into the cache of the reader thread. Part of the change might make it through, and part of it not.
In general, without locks (or other synchronization mechanisms such as semaphores) you have no way of saying whether something that happens in thread A will occur "before" or "after" something that happens in thread B. You also have no way of saying whether or when changes made in thread A will be "visible" in thread B.
Correct use of locking ensures that all changes are flushed through the caches, so that code sees memory in the state you think it should see. It also allows you to control whether particular bits of code can run simultaneously and/or interrupt each other.
In this case, looking at your code above, the minimum locking you need is to have a synchronisation primitive which is released/posted by the second thread (the writer) after it has written the string, and acquired/waited on by the first thread (the reader) before using that string. This would then guarantee that the first thread sees any changes made by the second thread.
That's assuming the second thread isn't started until after firstFunctionRunFromThread1 has been called. If that might not be the case, then you need the same deal with thread1 writing and thread2 reading.
The simplest way to actually do this is to have a mutex which "protects" your data. You decide what data you're protecting, and any code which reads or writes the data must be holding the mutex while it does so. So first you lock, then read and/or write the data, then unlock. This ensures consistent state, but on its own it does not ensure that thread2 will get a chance to do anything at all in between thread1's two different functions.
Any kind of message-passing mechanism will also include the necessary memory barriers, so if you send a message from the writer thread to the reader thread, meaning "I've finished writing, you can read now", then that will be true.
There can be more efficient ways of doing certain things, if those prove too slow.
The whole idea is to prevent the program from going into an indeterminate/unsafe state due to multiple threads accessing the same resource(s) and/or updating/modifying the resource so that the subsequent state becomes undefined. Read up on Mutexes and Locking (with examples).
The set of instructions created as a result of compiling your code can be interleaved in any order. This can yield unpredictable and undesired results. For example, if thread1 runs before thread2 is selected to run, your output may look like:
Test 1
Test 1
Worse yet, one thread may get pre-empted in the middle of an assignment, if assignment is not an atomic operation. In this context, think of "atomic" as the smallest unit of work which cannot be split further.
The way to create a logically atomic set of instructions - even if they yield multiple machine-code instructions in reality - is to use a lock or mutex. Mutex stands for "mutual exclusion" because that's exactly what it does: it ensures exclusive access to certain objects or critical sections of code.
One of the major challenges in multithreaded programming is identifying critical sections. In this case, you have two: where you assign to myMap, and where you modify myMap[0]. Since you don't want to read myMap before writing to it, reading is also a critical section.
The simplest answer is: you have to lock whenever you access a shared resource that is not atomic. In your case, myMap is the shared resource, so you have to lock all reading and writing operations on it.