I've been trying to optimize a sorting algorithm (quicksort) with threads. I know std::sort() is already quite good, but I'm trying to beat it on my computer, and learn about threads at the same time.
So, my question is, how do I use threads with my recursive quicksort function?
Here's the function (with the not-important-to-the-question stuff removed):
template <typename T>
void quicksort(T arr[], const int &size, const int &beginning, const int &end)
{
    // Algorithm here
    thread t1(quicksort, arr, size, beginning, slow - 1);
    thread t2(quicksort, arr, size, slow + 1, end);
}
If I was wrong and you do end up needing more of the code, let me know and I'll update it.
I'm using Visual Studio 2012, and as of right now, the error states:
error C2661: 'std::thread::thread' : no overloaded function takes 5 arguments
I've also tried calling ref(arr), etc. on each of the parameters, but I got the same error.
EDIT:
After trying the solution by @mfontanini I can compile with no errors, but on running, I get:
Debug Error!
Program: ...sktop\VisualStudio\Projects\SpeedTester\Debug\SpeedTester.exe
R6010
- abort() has been called
(Press Retry to debug the application)
Repeated over and over again. Eventually, it exits with code 3.
You need to explicitly indicate which is the T template parameter:
thread t1(&quicksort<T>, arr, size, beginning, slow - 1);
Otherwise the compiler sees that you're referring to a function template, but not to which specific specialization; it can't deduce T out of nowhere.
Your main problem is probably that you need to join() the thread(s) you spawn. If the thread objects are destroyed without a prior join() or detach(), the implementation calls std::terminate().
You don't want detach(), as you need to know that all partial sorts are finished for the overall sort to be complete, so joining is the right thing to do.
Additionally there are a few more things you could improve:
You should not pass ints around by reference. Pass-by-value is more efficient for simple scalar types, and referencing local variables from other threads is generally not a good idea (unless you have a good reason and a protocol for it).
You start far too many threads. After partitioning you need two threads for the two sub-sorts, but you have three: the current thread also continues to run, so you should create just one new thread and do the other sub-sort in the current thread. (And join() the other part when done.)
You should not keep creating new threads once the partitions get small. It is generally a good idea to have a cutoff size for quicksort and to use something non-recursive (like insertion sort) below it, as the recursion overhead starts to outweigh the algorithmic benefit. A similar cutoff is even more important for concurrent sorting: the overhead of a thread is much higher than that of a simple recursive call, and with small (and nearby) partitions the threads will frequently hit the same cache lines, slowing things down even more.
It is generally not a good idea to create threads without limit. That will eventually run into platform limits. You might want to restrict the count of threads to use (using an atomic counter) or use something like std::async with default launch policy to avoid launching more threads than the platform can handle.
Related
I want to test some object's function for thread safety in a race condition. In order to test this I would like to call a function simultaneously from two (or more) different threads. How can I write code that guarantees that the function calls will occur at the same time, or at least close enough that it will have the desired effect?
The best you can do is hammer heavily at the code and check all the little signs you may get of an issue. If there's a race-condition, you should be able to write code that will eventually trigger it. Consider:
#include <thread>
#include <cassert>

int x = 0;

void foo()
{
    while (true)
    {
        x = x + 1;
        x = x - 1;
        assert(x == 0);
    }
}

int main()
{
    std::thread t(foo);
    std::thread t2(foo);
    t.join();
    t2.join();
}
Everywhere I test it, it asserts pretty quickly. I could then add critical sections until the assert is gone.
In fact, there's no guarantee that it ever will assert, but I've used this technique repeatedly on large-scale production code. You may just need to hammer at your code for a long while to be sure.
Have a struct with a field that is an array of zeroed integers, probably 300-500 kB long. Then, from two threads, copy two other structs (one filled with 1s, another with 2s) into it, and check the result from the main thread afterwards by reading an atomic variable's value (to be sure the undefined-behaviour region has finished).
This should have a high chance of exposing undefined behaviour: you may see mixed 1s and 2s (and even 0s?) in the array, which tells you it happened.
But when you delete all the control machinery such as the atomics, the new shape of the code can be another kind of undefined behaviour and behave differently.
A great way to do this is by inserting well-timed sleep calls. You can use this, for example, to force combinations of events in an order you want to test (Thread 1 does something, then Thread 2 does something, then Thread 1 does something else). A downside is that you have to have an idea of where to put the sleep calls. After doing this for a little while you should start to get a feel for it, but some good intuition helps in the beginning.
You may be able to conditionally call sleep or hit a breakpoint from a specific thread if you can get a handle to the thread id.
Also, I'm pretty sure that Visual Studio and (I think) GDB allow you to freeze some threads and/or run specific ones.
Say I have a function whose prototype looks like this, belonging to class container_class:
std::vector<int> container_class::func(int param);
The function may or may not cause an infinite loop on certain inputs; it is impossible to tell which inputs will cause a success and which will cause an infinite loop. The function is in a library I do not have the source of and cannot modify (this is a bug and will be fixed in the next release in a few months, but for now I need a workaround), so solutions that modify the function or class will not work.
I've tried isolating the function using std::async and std::future, and using a while loop to constantly check the state of the thread:
container_class c; // note: `container_class c();` would declare a function (most vexing parse)
long start = get_current_time(); // get the current time in ms
auto future = std::async(&container_class::func, &c, 2);
while (future.wait_for(std::chrono::milliseconds(0)) != std::future_status::ready) {
    if (get_current_time() - start > 1000) {
        // forcibly terminate future
    }
    sleep(2);
}
This code has many problems. One is that I can't forcibly terminate the std::future object (and the thread that it represents).
At the far extreme, if I can't find any other solution, I can isolate the function in its own executable, run it, and then check its state and terminate it appropriately. However, I would rather not do this.
How can I accomplish this? Is there a better way than what I'm doing right now?
You are out of luck, sorry.
First off, C++ doesn't even guarantee there will be a thread for the future's execution. Although it would be extremely hard (probably impossible) to implement all the std::async guarantees in a single thread, there is no direct prohibition of it, and there is certainly no guarantee of one thread per async call. Because of that, there is no way to cancel the async execution.
Second, there is no such way even at the lowest level of the thread implementation. While pthread_cancel exists, it won't protect you from, for example, infinite loops that never visit cancellation points.
You cannot arbitrarily kill a thread in POSIX, and the C++ thread model is based on it. A process really can't act as a scheduler of its own threads, and while that is sometimes a pain, it is what it is.
I am sorry if this was asked before, but I didn't find anything related to it. This is for my understanding; it's not homework.
I want to execute a function only for some amount of time. How do I do that? For example,
int main()
{
    ....
    ....
    func();
    .....
    .....
}

void func()
{
    ......
    ......
}
Here, my main function calls another function. I want that function to execute only for a minute. In that function, I will be getting some data from the user, and if the user doesn't enter the data, I don't want to be stuck in that function forever. So, irrespective of whether the function has completed by that time or not, I want to come back to the main function and execute the next operation.
Is there any way to do it ? I am on windows 7 and I am using VS-2013.
Under windows, the options are limited.
The simplest option would be for func() to explicitly and periodically check how long it has been executing (e.g. store its start time and periodically check the amount of time elapsed since then), and return if it has gone on longer than you wish.
It is possible (C++11 or later) to execute the function within another thread, and for main() to signal that thread when the required time period has elapsed. That is best done cooperatively. For example, main() sets a flag, the thread function checks that flag and exits when required to. Such a flag is usually best protected by a critical section or mutex.
An extremely unsafe way under Windows is for main() to forcibly terminate the thread. That is unsafe, as it can leave the program (and, in the worst cases, the operating system itself) in an unreliable state (e.g. if the terminated thread is in the middle of allocating memory, executing certain kernel functions, or manipulating the global state of a shared DLL).
If you want better/safer options, you will need a real-time operating system with strict memory and timing partitioning. To date, I have yet to encounter substantiated documentation of those characteristics for any variant of Windows or unix (not even the real-time variants). There are a couple of unix-like systems (e.g. LynxOS) with variants that have such properties.
I think a part of your requirement can be met using multithreading and a loop with a stopwatch.
Create a new thread.
Start a stopwatch.
Start a loop with one minute as the condition for the loop.
During each iteration check if the user has entered the input and process.
When one minute is over, the loop quits.
I'm not sure about the feasibility of this idea; I'm just sharing it. I don't know much about C++, but in Node.js your requirement can be achieved using 'events'. Maybe such things exist in C++ too.
I have an expensive function that needs to be executed 1000 times. Execution takes between 5 seconds and 10 minutes, so the run time varies greatly.
I'd like to have multiple threads working on it. My current implementation divides these 1000 calls into 4 batches of 250 calls and spawns 4 threads. However, if one thread has a "bad day", it takes much longer to finish than the other 3 threads.
Hence I'd like to make a new call to the function whenever a thread has finished a previous call, until all 1000 calls have been made.
I think a thread pool would work, but if possible I'd like a simple method (= as little additional code as possible). A task-based design also goes in this direction (I think). Is there an easy solution for this?
Initialize a semaphore with 1000 units. Have each of the 4 threads loop around a semaphore wait() and the work function.
All the threads will then work on the function until it has been executed 1000 times. Even if three of the threads get stuck and take ages, the fourth will handle the other 997 calls.
[Edit]
Meh.. apparently, the standard C++11 library does not include semaphores. A semaphore is, however, a basic OS synchronization primitive, and so should be easy enough to call, e.g. via POSIX.
You can use one of the reference implementations of Executors and then call the function via:
#include <experimental/thread_pool>

using std::experimental::post;
using std::experimental::thread_pool;

thread_pool pool_{1};

void do_big_task(int n) // n = number of calls to make (1000 in the question)
{
    for (int i = 0; i < n; ++i)
    {
        post(pool_, [=]
        {
            // do your work here;
        });
    }
}
Executors are proposed for standardization, so I thought I would get in early.
Or if you want to try another flavour of executors then there is a more recent implementation with a slightly different syntax.
Given that you have already been able to segment the calls into separate entities, one approach is to use std::packaged_task (with its associated std::future) to wrap each function call, and place the tasks in a queue of some sort. Each thread can then pick up packaged tasks and process them.
You will need to lock the queue for concurrent access; there may be some bottlenecking here, but compared to the concern that a thread can have "a bad day", this should be minimal. This is effectively a thread pool, but it allows you some control over the execution of the tasks.
Another alternative is to use std::async and specify its launch policy as std::launch::async. The disadvantage is that you do not control the thread creation itself, so you are dependent on how efficiently your standard library manages the threads vs. how many cores you have.
Either approach would work; the key is to measure the performance of the approaches over a reasonable sample size. Measure both time and resource use (threads, and keeping the cores busy). Most OSes include ways of measuring the resource usage of a process.
I am quite new to parallel programming. Right now I have a problem and am trying to solve it with TBB.
To simplify the problem, imagine several people (tasks) picking up balls and putting them into a container (concurrent_vector) according to the hash value of the number on the ball; we need to make sure no ball is lost. Each ball is represented as a linked list (this is the reason to use concurrent_vector instead of concurrent_hashmap: I need random access). If the container is almost full (there is a threshold and a condition to judge it), one person will move all the balls from the current container to a larger container. For correctness, while he is moving balls to the other container, all the other people should stop adding balls and wait until he finishes. Since moving the balls around takes a lot of time, it would be better if all the other people stopped their current task and helped move balls.
How should I design this for better efficiency: should I use a mutex, a spin_mutex, or a condition variable?
Because I am currently using concurrent_vector, modifying the container's contents is done in parallel. Do I need to lock the whole vector for the moving procedure?
I also have a question about the TBB mutexes: what does "no reentrance" mean?
Reentrant means that a function, foo, can be interrupted while it's running (say, by a signal) and you can then call foo again, in the signal handler, before the first call completes.
Imagine a call that locks a mutex before starting the function. This call wouldn't be reentrant, because if you tried to call it again you'd block forever trying to acquire the mutex. However, a function like returning the sum of two parameters, x and y, could be reentrant if x and y were both saved on the stack, so each call had its own stack storage.
Compare reentrant to thread-safe. A thread-safe function takes care not to have a problem if two calls happen at the same time, say by locking or using atomic operations. A reentrant function guarantees that it can be called again even if the initial call is interrupted. Note that a reentrant function isn't necessarily thread-safe.