I am quite new to parallel programming. Right now I have a problem and am trying to solve it with TBB.
To simplify the problem, imagine that there are several people (tasks) picking up balls and putting them into a container (a concurrent_vector) according to the hash value of the number on each ball, and we need to make sure no ball is lost. Each ball is represented as a linked list (this is the reason to use concurrent_vector instead of concurrent_hash_map: I need random access). When the container is almost full (there is a threshold and a condition to detect this), one person moves all the balls from the current container to a larger container. For correctness, while he is moving balls to the other container, all the other people must stop adding balls and wait until he finishes. Since moving balls around takes a lot of time, it would be better if all the other people paused their current task and helped move the balls.
How should I design this for better efficiency: should I use a mutex, a spin_mutex, or a condition variable?
Right now, because I am using concurrent_vector, modifying the container's contents is done in parallel. Do I need to lock the whole vector for the moving procedure?
Also, I have a question about the TBB mutexes: what does it mean that they are not reentrant?
Reentrant means that a function, foo, can be interrupted while it's running, say by a signal, and you can then call foo again, in the signal handler, before the first call completes.
Imagine a call that locks a mutex before starting the function. This call wouldn't be re-entrant because if you tried to call it again you'd block forever trying to acquire the mutex. However, a function like returning the sum of two parameters, x and y, could be reentrant if x and y were both saved on the stack so each call had its own stack storage.
Compare reentrant to thread-safe. A thread-safe function takes care not to have a problem if two calls happen at the same time, say by locking or by using atomic operations. A reentrant function guarantees that it can safely be called again even while an earlier call is interrupted partway through. Note that a reentrant function isn't necessarily thread-safe.
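As an illustrative sketch (the names here are made up, not from the question): the first function below is thread-safe but not reentrant, because re-entering it on the same thread, say from a signal handler, would deadlock on the non-recursive mutex; the second is reentrant because it touches only its own stack.

#include <mutex>

std::mutex m;

int sum_locked(int x, int y) {
    // Thread-safe: concurrent calls serialize on the mutex.
    // Not reentrant: re-entering on the same thread deadlocks here.
    std::lock_guard<std::mutex> lk(m);
    return x + y;
}

int sum(int x, int y) {
    // Reentrant: uses only stack storage, safe to interrupt and re-enter.
    return x + y;
}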
Say I have a function whose prototype looks like this, belonging to class container_class:
std::vector<int> container_class::func(int param);
The function may or may not enter an infinite loop on certain inputs; it is impossible to tell in advance which inputs will succeed and which will cause an infinite loop. The function is in a library whose source I do not have and cannot modify (this is a bug and will be fixed in the next release in a few months, but for now I need a way to work around it), so solutions that modify the function or class will not work.
I've tried isolating the function using std::async and std::future, and using a while loop to constantly check the state of the thread:
#include <chrono>
#include <future>
#include <thread>
using namespace std::chrono_literals; // needed for the 0ms literal

container_class c; // note: "container_class c();" would declare a function
auto start = std::chrono::steady_clock::now();
auto future = std::async(&container_class::func, &c, 2);
while (future.wait_for(0ms) != std::future_status::ready) {
    if (std::chrono::steady_clock::now() - start > 1000ms) {
        //forcibly terminate future
    }
    std::this_thread::sleep_for(2s);
}
This code has many problems. One is that I can't forcibly terminate the std::future object (and the thread that it represents).
At the far extreme, if I can't find any other solution, I can isolate the function in its own executable, run it, and then check its state and terminate it appropriately. However, I would rather not do this.
How can I accomplish this? Is there a better way than what I'm doing right now?
You are out of luck, sorry.
First off, C++ doesn't even guarantee there will be a dedicated thread for the async execution. Although it would be extremely hard (probably impossible) to implement all std::async guarantees in a single thread, there is no direct prohibition of that, and there is certainly no guarantee of one thread per async call. Because of that, there is no way to cancel an async execution.
Second, there is no such way even at the lowest level of the thread implementation. While pthread_cancel exists, it won't protect you from infinite loops that never visit cancellation points, for example.
You cannot arbitrarily kill a thread in POSIX, and the C++ thread model is based on it. A process really can't act as a scheduler of its own threads, and while that is sometimes a pain, it is what it is.
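For what it's worth, the process-isolation workaround the question mentions can look roughly like this on POSIX; this is only a minimal sketch, and run_with_timeout, the 100 ms polling interval, and the way the child would report a result are illustrative assumptions:

#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

bool run_with_timeout(unsigned seconds) {
    pid_t pid = fork();
    if (pid == 0) {                 // child: run the possibly-looping call
        container_class c;
        c.func(2);                  // a real version would ship the result back via IPC
        _exit(0);
    }
    for (unsigned i = 0; i < seconds * 10; ++i) {
        int status;
        if (waitpid(pid, &status, WNOHANG) == pid)
            return true;            // child finished in time
        usleep(100 * 1000);         // poll every 100 ms
    }
    kill(pid, SIGKILL);             // timed out: kill the whole process
    waitpid(pid, nullptr, 0);       // reap the zombie
    return false;
}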
So I took an OS class last semester and we had a concurrency/threading project. It was an airport sim that landed planes / had them take off in the direction the wind was blowing. We had to do it in Java. So now that finals are over and I'm bored, I'm trying to do it in C++11. In Java I used a synchronized variable for the wind (0 - 360) in main and passed it to the 3 threads I was using. My question is: Can you do that in C++11? It's a basic reader/writer, one thread writes/updates the wind, the other 2 (takeoff/land) read.
I got it working by having a global wind variable in my "threads.cpp" implementation file. But is there a way to pass a variable to as many threads as I want and have all of them keep up with it? Or is it actually better to just use the global variable and not pass anything? (Why/why not?) I was looking at std::ref() but that didn't work.
EDIT: I'm already using mutex and lock_guard. I'm just trying to figure out how to pass a variable and keep it up to date in all threads. Right now it only updates in the write thread.
You can use a std::mutex with std::lock_guard to synchronize access to the shared data. Or if the shared data fits in an integer, you can use std::atomic<int> without locking.
If you want to avoid global variables, simply pass the address of the shared state to the thread functions when you launch them. For example:
#include <atomic>
#include <thread>

void thread_entry1(std::atomic<int>* val) { /* read/write *val */ }
void thread_entry2(std::atomic<int>* val) { /* read/write *val */ }

std::atomic<int> shared_value{0};
std::thread t1(thread_entry1, &shared_value);
std::thread t2(thread_entry2, &shared_value);
// ... and later: t1.join(); t2.join();
Using std::mutex and std::lock_guard mimics what a Java synchronized variable does (only in Java this happens implicitly without you seeing it; in C++ you do it explicitly).
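A minimal sketch of that mutex approach (the variable and function names are just for illustration):

#include <mutex>

std::mutex wind_mutex;
int wind = 0; // shared state, guarded by wind_mutex

void set_wind(int degrees) {
    std::lock_guard<std::mutex> lk(wind_mutex); // unlocks at scope exit
    wind = degrees;
}

int get_wind() {
    std::lock_guard<std::mutex> lk(wind_mutex);
    return wind;
}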
However, having one producer (there is just one wind direction) and otherwise only consumers, it suffices to write to e.g. a std::atomic<int> variable with relaxed ordering, and to read from that variable in each consumer, again with relaxed ordering. Unless you have the requirement that the global view of all airplanes is consistent (but then you would have to run a lockstep simulation, which makes threading nonsensical), there is no need for further synchronization; you only have to make sure that any value any airplane reads at any time is eventually correct and that no garbled intermediate results can occur. In other words, you need an atomic update.
Relaxed memory ordering is sufficient too, since if all you read is one value, you do not need any happens-before guarantees.
An atomic update (or rather, an atomic write) is at least an order of magnitude, if not more, faster than taking a lock. Atomic reads and writes with relaxed ordering are indeed plain normal reads and writes on many (most) mainstream architectures.
The variable need not be global; you can just as well keep it in the scope of the main thread's simulation loop and pass a reference (or pointer) to the threads.
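A minimal sketch of that approach; the names (wind_writer, runway_reader) are illustrative, not from the original project:

#include <atomic>
#include <thread>

void wind_writer(std::atomic<int>* wind) {
    wind->store(270, std::memory_order_relaxed); // new direction, 0-360
}

void runway_reader(const std::atomic<int>* wind) {
    int dir = wind->load(std::memory_order_relaxed); // eventually sees updates
    (void)dir; // choose takeoff/landing direction from dir
}

int main() {
    std::atomic<int> wind{0}; // lives in main's scope, not a global
    std::thread w(wind_writer, &wind);
    std::thread r(runway_reader, &wind);
    w.join();
    r.join();
}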
You might want to create, say, the wind object on the heap through a std::shared_ptr. Pass this pointer to all interested threads and use a std::mutex with std::lock_guard to change it.
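A sketch of that variant, with the Wind type and its field invented for illustration:

#include <memory>
#include <mutex>
#include <thread>

struct Wind {
    std::mutex m;
    int degrees = 0; // guarded by m
};

void writer(std::shared_ptr<Wind> w) {
    std::lock_guard<std::mutex> lk(w->m);
    w->degrees = 180; // update under the lock
}

int main() {
    auto wind = std::make_shared<Wind>(); // shared ownership across threads
    std::thread t(writer, wind);          // the copied shared_ptr keeps Wind alive
    t.join();
}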
I am currently running into some design problems regarding concurrent programming in C++
and I was wondering if you could help me out:
Assume that some function func operates on some object obj. It is necessary during these operations to hold a lock (which might be a member variable of obj). Now assume that
func calls a subfunction func_2 while it holds the lock. Now func_2 operates on an object which is already locked. However, what if I also want to call func_2 from somewhere else without holding the lock? Should func_2 lock obj or should it not? I see 3 possibilities:
1. I could pass a bool to func_2 indicating whether or not locking is required. This seems to introduce a lot of boilerplate code though.
2. I could use a recursive lock and just always lock obj in func_2. Recursive locks seem to be problematic though, see here.
3. I could assume that every caller of func_2 already holds the lock. I would have to document this and perhaps enforce it (at least in debug mode). Is it reasonable for functions to make assumptions about which locks are or are not held by the calling thread? More generally, how do I decide from a design perspective which functions should lock obj and which should assume that it is already locked? (Obviously a function that assumes certain locks are held can only call functions which make at least equally strong assumptions, but apart from that?)
My question is the following: Which one of these approaches is used in practice and why?
Thanks in advance
hfhc2
1. Passing an indicator whether to lock or not:
You give the lock choice to the caller. This is error prone:
the caller might not make the right choice
the caller needs to know implementation details about your object, thus breaking the principle of encapsulation
the caller needs access to the mutex
if you have several objects, you eventually create the conditions for a deadlock
2. Recursive lock:
You already highlighted the issue.
3. Pass locking responsibility to caller:
Among the different alternatives that you propose, this seems the most consistent. Contrary to 1, you don't give the caller a choice; you pass complete responsibility for locking. It's part of the contract for using func_2.
You could even assert that a lock is held on the object, to prevent mistakes (although the check would be limited, because you would not necessarily be in a position to verify who owns the lock).
4. Reconsider your design:
If you need to ensure in func_2 that the object is locked, it means that you have a critical section therein that you must protect. Chances are that both functions need the lock because they perform some lower-level operations on obj and need to prevent data races on an unstable state of the object.
I'd strongly advise looking at whether it would be feasible to extract these lower-level routines from both func and func_2 and encapsulate them in simpler primitive functions on obj. This approach could also lead to locking for shorter sequences, thus increasing the opportunity for real concurrency.
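As an illustrative sketch of approach 3 (the class body and names are invented): func_2 can take a reference to the caller's lock as proof that the lock is held, and assert on it in debug builds:

#include <cassert>
#include <mutex>

class Obj {
    std::mutex m;
    int state = 0;
public:
    void func() {
        std::unique_lock<std::mutex> lk(m);
        func_2(lk);                  // the signature forces callers to hold a lock
    }
    void func_2(std::unique_lock<std::mutex>& lk) {
        assert(lk.owns_lock());      // limited check: verifies a lock is held,
                                     // not that it is this object's mutex
        ++state;                     // the critical section
    }
};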
Ok, just as another follow-up. I recently read the API documentation of glib, in particular the section about message-passing queues. I found that most functions operating on these queues come in two variants, named function and function_unlocked. The idea is that if a programmer wants to execute a single operation, like popping from the queue, this can be done with g_async_queue_pop(); the function automatically takes care of locking and unlocking the queue. However, if the programmer wants to, for instance, pop two elements without interruption, the following sequence may be used:
GAsyncQueue *queue = g_async_queue_new();
// ...
g_async_queue_lock(queue);
g_async_queue_pop_unlocked(queue);
g_async_queue_pop_unlocked(queue);
g_async_queue_unlock(queue);
This resembles my third approach. It is also the case that assumptions regarding the state of certain locks are made; they are required by the API and are documented.
I've been trying to optimize a sorting algorithm (quicksort) with threads. I know it is already quite good in the std::sort() implementation, but I'm trying to beat it with optimizations on my computer, and learn about threads at the same time.
So, my question is, how do I use threads with my recursive quicksort function?
Here's the function (with the not-important-to-the-question stuff removed):
template <typename T>
void quicksort(T arr[], const int &size, const int &beginning, const int &end)
{
// Algorithm here
thread t1(quicksort, arr, size, beginning, slow - 1);
thread t2(quicksort, arr, size, slow + 1, end);
}
If I was wrong and you do end up needing more of the code, let me know and I'll update it.
I'm using Visual Studio 2012, and as of right now, the error states:
error C2661: 'std::thread::thread' : no overloaded function takes 5 arguments
I've also tried calling ref(arr), etc. on each of the parameters, but I got the same error.
EDIT:
After trying the solution by @mfontanini, I can compile with no errors, but on running, I get:
Debug Error!
Program: ...sktop\VisualStudio\Projects\SpeedTester\Debug\SpeedTester.exe
R6010
- abort() has been called
(Press Retry to debug the application)
Repeated over and over again. Eventually, it exits with code 3.
You need to explicitly indicate the T template parameter:
thread t1(&quicksort<T>, arr, size, beginning, slow - 1);
Otherwise the compiler sees that you're referring to a function template, but not to which specific specialization; it can't deduce T out of nowhere.
Your main problem probably is that you need to join() the thread(s) you spawn. If the thread objects are destroyed without a prior join() or detach(), the implementation calls std::terminate().
You don't want detach(), as you need to know that all partial sorts are finished for the overall sort to be complete, so joining is the right thing to do.
Additionally, there are a few more things you could improve (a sketch combining them follows the list):
You should not pass around ints by reference. Pass-by-value is more efficient for simple scalar types, and referencing local variables from other threads is generally not a good idea (unless you have a good reason and a protocol for it).
You start far too many threads. After partitioning you need two threads for the two sub-sorts, but you have three: the current thread also continues to run, so you should create just one new thread and do the other sub-sort in the current thread. (And join() the other part when done.)
You should not keep creating new threads when the partitions get small. It may generally be a good idea to have a cutoff size for your quicksort and use something non-recursive (like insertion sort) for smaller sizes, as the recursion overhead becomes higher than the algorithm complexity benefit. A similar cut-off is even more important for concurrent sorting: the overhead of a thread is much higher than a simple recursive call and with small (and nearby) partitions, the threads will start to hit the same cache lines frequently, slowing things down even more.
It is generally not a good idea to create threads without limit. That will eventually run into platform limits. You might want to restrict the count of threads to use (using an atomic counter) or use something like std::async with default launch policy to avoid launching more threads than the platform can handle.
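Putting the points above together, a minimal sketch might look like the following; the cutoff value of 1000 is arbitrary, and partition() is only a stand-in declaration for the partitioning step the question elided:

#include <algorithm>
#include <thread>

template <typename T>
int partition(T arr[], int beginning, int end); // assumed helper from the question

template <typename T>
void quicksort(T arr[], int beginning, int end) // indices passed by value now
{
    if (end - beginning < 1000) {                    // serial cutoff for small ranges
        std::sort(arr + beginning, arr + end + 1);   // end is inclusive here
        return;
    }
    int slow = partition(arr, beginning, end);
    std::thread t(&quicksort<T>, arr, beginning, slow - 1); // one new thread
    quicksort(arr, slow + 1, end);                   // other half in this thread
    t.join();                                        // join before returning
}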
Starting with pthreads, I cannot understand what is the business with pthread_key_t and pthread_once_t?
Would someone explain in simple terms with examples, if possible?
thanks
pthread_key_t is for creating thread-local storage: each thread gets its own copy of a data variable, instead of all threads sharing a global (or function-static, class-static) variable. The TLS is indexed by a key. See pthread_getspecific et al. for more details.
pthread_once_t is a control for executing a function only once with pthread_once. Suppose you have to call an initialization routine, but you must only call that routine once. Furthermore, the point at which you must call it is after you've already started up multiple threads. One way to do this would be to use pthread_once(), which guarantees that your routine will only be called once, no matter how many threads try to call it at once, so long as you use the same control variable in each call. It's often easier to use pthread_once() than it is to use other alternatives.
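A minimal sketch tying the two together (thread_fn and the 16-byte allocation are arbitrary illustrations):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_key_t key;
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    // Runs exactly once, no matter how many threads reach pthread_once.
    pthread_key_create(&key, free); // free() destroys each thread's value
}

static void* thread_fn(void* arg) {
    pthread_once(&once, make_key);        // safe to call from every thread
    pthread_setspecific(key, malloc(16)); // this thread's private value
    printf("thread %p has TLS value %p\n", arg, pthread_getspecific(key));
    return NULL;
}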
No, it can't be explained in layman terms. Laymen cannot successfully program with pthreads in C++. It takes a specialist known as a "computer programmer" :-)
pthread_once_t is a little bit of storage which pthread_once must access in order to ensure that it does what it says on the tin. Each once-control will allow an init routine to be called once, and once only, no matter how many times it is called from how many threads, possibly concurrently. Normally you use a different once-control for each object you're planning to initialise on demand in a thread-safe way. You can think of it in effect as an integer which is accessed atomically as a flag indicating whether a thread has been selected to do the init. But since pthread_once is blocking, I guess there's allowed to be a bit more to it than that, if the implementation can cram in a synchronisation primitive too (the only time I ever implemented pthread_once, I couldn't, so the once-control took any of 3 states: start, initialising, finished. But then I couldn't change the kernel. Unusual situation).
pthread_key_t is like an index for accessing thread-local storage. You can think of each thread as having a map from keys to values. When you add a new entry to TLS, pthread_key_create chooses a key for it and writes that key into the location you specify. You then use that key from any thread, whenever you want to set or retrieve the value of that TLS item for the current thread. The reason TLS gives you a key instead of letting you choose one is so that unrelated libraries can use TLS without having to co-operate to avoid both using the same key and trashing each other's TLS data. The pthread library might for example keep a global counter, and assign key 0 the first time pthread_key_create is called, 1 the second, and so on.
Wow, the other answers here are way too verbose.
pthread_once_t stores state for pthread_once(). Calling pthread_once(&s, fn) calls fn and sets the value pointed to by s to record the fact that it has been executed. All subsequent calls to pthread_once() with the same s are no-ops. The name should become obvious now.
pthread_once_t should be initialized to PTHREAD_ONCE_INIT.
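For example (init_fn here stands for whatever one-time setup routine you need):

static pthread_once_t once = PTHREAD_ONCE_INIT; // mandatory static initializer
pthread_once(&once, init_fn);                   // init_fn runs exactly once,
                                                // however many threads call this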