I am trying to write a simple task class. It is a wrapper around std::future that holds its state (not_started, running, completed), can start processing a given job on demand, and can repeatedly return the result of its processing.
I can also offer some global functions for working with these tasks. But I am a little bit stuck writing the function size_t wait_any(std::vector<task<T>>& tasks). This function is given a vector of tasks and should return the index of the first completed task. If several tasks are already completed at the beginning, one of them must be returned (but this is not the problem).
A simple implementation using active waiting follows:
template <typename T>
size_t wait_any(std::vector<task<T>>& tasks) {
if (tasks.empty()) throw std::runtime_error("Waiting for empty vector of tasks!"); // needs <stdexcept>; std::exception has no string constructor in standard C++
for (auto i = tasks.begin(); i != tasks.end(); ++i) {
(*i).try_start();
}
while (true) {
for (size_t i = 0; i != tasks.size(); ++i) {
if (tasks[i].is_completed()) return i;
}
}
}
I would prefer to wait passively for any completion. The std::this_thread::yield function is available, but I would rather not use it. As mentioned in the documentation:
The exact behavior of this function depends on the implementation, in particular on the mechanics of the OS scheduler in use and the state of the system.
It seems that I should use std::condition_variable and std::mutex to get the whole thing working. There are a lot of examples showing the use of these things, but I do not understand them well, and I have not found a solution for this particular problem.
I would guess that I should create a std::condition_variable (just cv further) in the wait_any function. Then this cv (pointer) should be registered to all tasks from given vector. Once any of the tasks is completed (I can handle the moment when a task is done) it should call std::condition_variable::notify_one for all cv's registered in this task. These notified cv's should be also removed from all tasks which are holding them.
Now, I do not know how to use mutexes. I probably need to prevent multiple calls of notification and many other problems.
Any help appreciated!
I was thinking that since you only need one notification, you can use std::call_once to set the task_id which you require.
A naive way to go about it would be:
#include <atomic>
#include <chrono>
#include <iostream>
#include <mutex>
#include <vector>
#include <thread>
std::once_flag lala;
std::atomic_int winner( -1 );
void silly_task( int task_id )
{
//do nothing
std::call_once ( lala, [&]()
{
std::cout << "thread " << task_id << " wins" << std::endl;
winner = task_id;
} );
}
int main(){
std::vector<std::thread> vt;
for ( int i=0; i < 10 ; i ++ )
{
vt.push_back( std::thread( &silly_task, i) );
}
while ( winner == -1 )
{
std::this_thread::sleep_for(std::chrono::seconds(1));
}
for ( int i=0; i < 10 ; i ++ )
{
vt[i].join();
}
return 0;
} // end main
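The call_once example above still polls with sleep_for. A passive wait, as the question asks for, could be sketched with one shared mutex and condition variable. The names first_finished and run_and_wait_any are hypothetical and not part of the question's task class, and plain std::function jobs stand in for task<T>. This is a minimal sketch, not a drop-in wait_any:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: records the index of the first job to finish.
class first_finished {
public:
    // Called by a worker when job `index` completes; only the first call wins.
    void report(std::size_t index) {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (done_) return;
            done_ = true;
            winner_ = index;
        }
        cv_.notify_all();
    }
    // Blocks without polling until some job has reported, then returns its index.
    std::size_t wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return done_; });  // predicate handles spurious wakeups
        return winner_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
    std::size_t winner_ = 0;
};

// Runs every job on its own thread and returns the index of the first to finish.
// All threads are joined before returning, so this sketch also waits for slow jobs.
std::size_t run_and_wait_any(const std::vector<std::function<void()>>& jobs) {
    assert(!jobs.empty());
    first_finished signal;
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        threads.emplace_back([&signal, &jobs, i] {
            jobs[i]();
            signal.report(i);
        });
    }
    const std::size_t winner = signal.wait();
    for (auto& t : threads) t.join();
    return winner;
}
```

The mutex protects the done flag and the winner index, so several jobs finishing at once cannot overwrite each other, and the predicate passed to wait() covers the case where a job finishes before the waiter starts waiting.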
I'm trying to implement some algorithm using threads that must be synchronized at some moment. More or less the sequence for each thread should be:
1. Try to find a solution with current settings.
2. Synchronize solution with other threads.
3. If any of the threads found a solution, end work.
4. (empty - to be inline with example below)
5. Modify parameters for algorithm and jump to 1.
Here is a toy example with the algorithm replaced by random number generation: all threads should end if at least one of them finds 0.
#include <condition_variable>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
const int numOfThreads = 8;
std::condition_variable cv1, cv2;
std::mutex m1, m2;
int lockCnt1 = 0;
int lockCnt2 = 0;
int solutionCnt = 0;
void workerThread()
{
while(true) {
// 1. do some important work
int r = rand() % 1000;
// 2. synchronize and get results from all threads
{
std::unique_lock<std::mutex> l1(m1);
++lockCnt1;
if (r == 0) ++solutionCnt; // gather solutions
if (lockCnt1 == numOfThreads) {
// last thread ends here
lockCnt2 = 0;
cv1.notify_all();
}
else {
cv1.wait(l1, [&] { return lockCnt1 == numOfThreads; });
}
}
// 3. if solution found then quit all threads
if (solutionCnt > 0) return;
// 4. if not, then set lockCnt1 to 0 to have section 2. working again
{
std::unique_lock<std::mutex> l2(m2);
++lockCnt2;
if (lockCnt2 == numOfThreads) {
// last thread ends here
lockCnt1 = 0;
cv2.notify_all();
}
else {
cv2.wait(l2, [&] { return lockCnt2 == numOfThreads; });
}
}
// 5. Setup new algorithm parameters and repeat.
}
}
int main()
{
srand(time(NULL));
std::vector<std::thread> v;
for (int i = 0; i < numOfThreads ; ++i) v.emplace_back(std::thread(workerThread));
for (int i = 0; i < numOfThreads ; ++i) v[i].join();
return 0;
}
My questions are about sections 2 and 4 of the code above.
A) Section 2 synchronizes all threads and gathers solutions (if found), all through the lockCnt1 variable. Compared to a single use of condition_variable, I found it hard to reset lockCnt1 to zero safely so that section 2 can be reused in the next iteration. That is why I introduced section 4. Is there a better way to do this (without introducing section 4)?
B) It seems that all examples show condition_variable in a 'producer-consumer' scenario. Is there a better way to synchronize all threads when all of them are 'producers'?
Edit: Just to be clear, I did not describe the algorithm details since they are not important here. What matters is that each loop iteration yields either all of its solutions or none; mixing solutions from different iterations is not allowed. The described sequence of execution must be followed, and the question is how to achieve such synchronization between threads.
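A) You could simply not reset lockCnt1 to 0 and keep incrementing it instead. The condition lockCnt1 == numOfThreads then changes to lockCnt1 % numOfThreads == 0, and you can drop block #4. Be careful, though: a fast thread can re-enter section 2 and increment the counter before slower threads have re-checked the predicate, so it is safer for waiters to compare against a per-round target or a generation number than against the raw modulo. You could also use std::experimental::barrier (standardized as std::barrier in C++20) to get the threads to meet.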
B) I would suggest using std::atomic for solutionCnt and then you can drop all other counters, the mutex and the condition variable. Just atomically increase it by one in the thread that found solution and then return. In all threads after every iteration check if the value is bigger than zero. If it is, then return. The advantage is that the threads do not have to meet regularly, but can try to solve it at their own pace.
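For point A, a reusable meeting point without a second counter could be sketched with a generation number, so that a fast thread starting the next round cannot confuse threads still leaving the previous one. The class name cyclic_barrier and the demo function rounds_stay_in_step are hypothetical; the code only assumes C++11:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier; std::barrier in C++20 covers the same need.
class cyclic_barrier {
public:
    explicit cyclic_barrier(std::size_t count)
        : threshold_(count), waiting_(0), generation_(0) {
        assert(count > 0);
    }
    // Blocks until `count` threads have arrived for the current round.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m_);
        const std::size_t my_generation = generation_;
        if (++waiting_ == threshold_) {
            waiting_ = 0;    // reset for the next round
            ++generation_;   // releases everyone waiting on this round
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Demo: every thread adds 1 per round; after the barrier each thread must see
// exactly round * num_threads. Returns false if any thread saw another total.
bool rounds_stay_in_step(int num_threads, int rounds) {
    cyclic_barrier barrier(static_cast<std::size_t>(num_threads));
    std::atomic<int> total{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int r = 1; r <= rounds; ++r) {
                ++total;                    // the "work" of this round
                barrier.arrive_and_wait();  // everyone has contributed
                if (total.load() != r * num_threads) ok = false;
                barrier.arrive_and_wait();  // everyone has read the total
            }
        });
    }
    for (auto& th : threads) th.join();
    return ok;
}
```

The second arrive_and_wait() in the demo plays the role of the original section 4, but it reuses the same barrier object instead of a second mutex, counter and condition variable.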
Out of curiosity, I tried to solve your problem using std::async. For every attempt to find a solution, we call async. Once all parallel attempts have finished, we process feedback, adjust parameters, and repeat. An important difference with your implementation is that feedback is processed in the calling (main) thread. If processing feedback takes too long — or if we don't want to block the main thread at all — then the code in main() can be adjusted to also call std::async.
The code is supposed to be quite efficient, provided that the implementation of async uses a thread pool (e. g. Microsoft's implementation does that).
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <future>
#include <iostream>
#include <vector>
const int numOfThreads = 8;
struct Parameters{};
struct Feedback {
int result;
};
Feedback doTheWork(const Parameters &){
// do the work and provide result and feedback for future runs
return Feedback{rand() % 1000};
}
bool isSolution(const Feedback &f){
return f.result == 0;
}
// Runs doTheWork in parallel. Number of parallel tasks is same as size of params vector
std::vector<Feedback> findSolutions(const std::vector<Parameters> &params){
// 1. Run async tasks to find solutions. Normally threads are not created each time but re-used from a pool
std::vector<std::future<Feedback>> futures;
for (auto &p: params){
futures.push_back(std::async(std::launch::async,
[&p](){ return doTheWork(p); }));
}
// 2. Synchronize: wait for all tasks
std::vector<Feedback> feedback(futures.size());
for (auto nofRunning = futures.size(), iFuture = size_t{0}; nofRunning > 0; ){
// Check if the task has finished (future is invalid if we already handled it during an earlier iteration)
auto &future = futures[iFuture];
if (future.valid() && future.wait_for(std::chrono::milliseconds(1)) != std::future_status::timeout){
// Collect feedback for next attempt
// Alternatively, we could already check if solution has been found and cancel other tasks [if our algorithm supports cancellation]
feedback[iFuture] = future.get(); // get() already returns a temporary, so std::move is not needed
--nofRunning;
}
if (++iFuture == futures.size())
iFuture = 0;
}
return feedback;
}
int main()
{
srand(time(NULL));
std::vector<Parameters> params(numOfThreads);
// 0. Set initial parameter values here
// If we don't want to block the main thread while the algorithm is running, we can use std::async here too
while (true){
auto feedbackVector = findSolutions(params);
auto itSolution = std::find_if(std::begin(feedbackVector), std::end(feedbackVector), isSolution);
// 3. If any of the threads has found a solution, we stop
if (itSolution != feedbackVector.end())
break;
// 5. Use feedback to re-configure parameters for next iteration
}
return 0;
}
I'm making a parallel password cracker for an assignment. When I launch more than one thread, the cracking time gets longer the more threads I add. What is the problem here?
Secondly, what resource sharing techniques can I use for optimal performance? I'm required to use either mutexes, atomic operations or barriers while also using semaphores, condition variables or channels. Mutexes seem to slow my program down quite drastically.
Here is an example of my code for context:
std::mutex mtx;
std::condition_variable cv;
void run()
{
std::unique_lock<std::mutex> lck(mtx);
ready = true;
cv.notify_all();
}
crack()
{
std::lock_guard<std::mutex> lk(mtx);
...do cracking stuff
}
main()
{
....
std::thread *t = new std::thread[uiThreadCount];
for(int i = 0; i < uiThreadCount; i++)
{
t[i] = std::thread(crack, params);
}
run();
for(int i = 0; i < uiThreadCount; i++)
{
t[i].join();
}
}
When writing multi-threaded code, it's generally a good idea to share as few resources as possible, so you can avoid having to synchronize using a mutex or an atomic.
There are a lot of different ways to do password cracking, so I'll give a slightly simpler example. Let's say you have a hash function, and a hash, and you're trying to guess what input produces the hash (this is basically how a password would get cracked).
We can write the cracker like this. It'll take the hash function and the password hash, check a range of values, and invoke the callback function if it found a match.
auto cracker = [](auto passwdHash, auto hashFunc, auto min, auto max, auto callback) {
for(auto i = min; i < max; i++) {
auto output = hashFunc(i);
if(output == passwdHash) {
callback(i);
}
}
};
Now, we can write a parallel version. This version only has to synchronize when it finds a match, which is pretty rare.
auto parallel_cracker = [](auto passwdHash, auto hashFunc, auto min, auto max, int num_threads) {
// Get a vector of threads
std::vector<std::thread> threads;
threads.reserve(num_threads);
// Make a vector of all the matches it discovered
using input_t = decltype(min);
std::vector<input_t> matches;
std::mutex match_lock;
// Whenever a match is found, this function gets called
auto callback = [&](input_t match) {
std::unique_lock<std::mutex> _lock(match_lock);
std::cout << "Found match: " << match << '\n';
matches.push_back(match);
};
for(int i = 0; i < num_threads; i++) {
auto sub_min = min + ((max - min) * i) / num_threads;
auto sub_max = min + ((max - min) * (i + 1)) / num_threads;
threads.emplace_back(cracker, passwdHash, hashFunc, sub_min, sub_max, callback); // store the thread, not a match
}
// Join all the threads
for(auto& thread : threads) {
thread.join();
}
return matches;
};
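The parallel version above keeps scanning after a match has been found. If only one match is needed, an atomic flag lets the other threads stop early, which also fits the assignment's requirement to use atomic operations. The following is a minimal sketch under that assumption; crack_range and its parameters are hypothetical names, and inputs are taken to be 64-bit integers:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Searches [min, max) for an input whose hash equals `target`.
// Returns a matching input, or `max` if no input in the range matches.
std::uint64_t crack_range(std::uint64_t target,
                          const std::function<std::uint64_t(std::uint64_t)>& hash,
                          std::uint64_t min, std::uint64_t max,
                          unsigned num_threads) {
    assert(num_threads > 0 && min <= max);
    std::atomic<bool> found{false};
    std::uint64_t match = max;  // written by at most one thread, read after join

    auto worker = [&](std::uint64_t lo, std::uint64_t hi) {
        for (std::uint64_t i = lo; i < hi; ++i) {
            // Cheap check so the other threads stop soon after a match is found.
            if (found.load(std::memory_order_relaxed)) return;
            if (hash(i) == target) {
                // exchange() returns the old value, so only the first finder stores.
                if (!found.exchange(true)) match = i;
                return;
            }
        }
    };

    std::vector<std::thread> threads;
    const std::uint64_t span = max - min;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back(worker, min + span * t / num_threads,
                             min + span * (t + 1) / num_threads);
    }
    for (auto& th : threads) th.join();
    return match;
}
```

No mutex is involved: each thread works on its own sub-range, and the only shared state is the flag. The hash callable must be safe to call from several threads at once.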
Yes, that is not surprising given the way it's written: by locking a mutex at the beginning of your thread function (crack), you effectively make the threads run sequentially.
I understand you want to achieve a "synchronous start" of the threads (judging by the condition variable cv), but you don't use it properly. Without a call to one of its wait methods, cv.notify_all() is useless: it does not do what you intended, and your threads simply run sequentially.
Calling wait() on the std::condition_variable in crack() is imperative: it releases mtx (which you just acquired through the guard lk) and blocks the thread until cv.notify_all() is called. After the call returns, the thread holds mtx again and the other threads are kept waiting on it, so if you really want "parallel" execution, you then need to unlock mtx.
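Here is how your crack function should look:
crack()
{
std::unique_lock<std::mutex> lk(mtx);
cv.wait(lk, [] { return ready; }); // the predicate guards against spurious and missed wakeups
lk.unlock();
...do cracking stuff
}
By the way, keep the ready flag that run() sets: it is what the wait predicate checks. Without it, a thread that reaches cv.wait() after run() has already called notify_all() would block forever.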
"I'm required to use either mutexes, atomic operations or barriers while also using semaphores, conditional variables or channels"
Different tools and techniques are good for different things; the question is too general.
Assume I have a function double someRandomFunction(int n) that takes an integer and returns a double. It is random in the sense that it tries random things to come up with a solution, so even when you run it with the same arguments, it sometimes takes 10 seconds to finish and other times 40 seconds.
The function double someRandomFunction(int n) is itself a wrapper around a black-box function. So someRandomFunction takes a while to complete, but I have no control over the main loop of the black box, hence I can't really check a flag variable within the thread, as the heavy computation happens inside the black-box function.
I would like to start 10 threads calling that function, and I am interested in the result of whichever thread finishes first. I don't care which one it is; I only need one result from these threads.
I found the following code:
std::vector<boost::future<double>> futures;
for (...) {
auto fut = boost::async([]() { return someRandomFunction(2); });
futures.push_back(std::move(fut));
}
for (...) {
auto res = boost::wait_for_any(futures.begin(), futures.end());
std::this_thread::yield();
std::cout << res->get() << std::endl;
}
This is the closest to what I am looking for, but I still can't see how to make my program terminate the other threads as soon as one thread returns a solution.
I would like to wait for one to finish and then carry on with the result of that one thread to continue my program execution (i.e., I don't want to terminate my program after I obtain that single result, but I would like to use it for the remaining program execution.).
Again, I want to start up 10 threads calling the someRandomFunction and then wait for one thread to finish first, get the result of that thread and stop all the other threads even though they didn't finish their work.
If the data structure supplied to the black-box has some obvious start and end values, one way to make it finish early could be to change the end value while it's computing. It could of course cause all sorts of trouble if you've misunderstood how the black-box must work with the data, but if you are reasonably sure, it can work.
main spawns 100 outer threads that each spawn one inner thread that calls the blackbox. The inner thread receives the blackbox result and notifies all waiting threads that it's done. The outer thread waits for any inner thread to get done and then modifies the data for its own blackbox to trick it to finish.
No polling (except for the spurious wakeup loops) and no detached threads.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <vector>
#include <chrono>
// a work package for one black-box
struct data_for_back_box {
int start_here;
int end_here;
};
double blackbox(data_for_back_box* data) {
// time consuming work here:
for(auto v=data->start_here; v<data->end_here; ++v) {
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
// just a debug
if(data->end_here==0) std::cout << "I was tricked into exiting early\n";
return data->end_here;
}
// synchronizing stuff and result
std::condition_variable cv;
std::mutex mtx;
bool done=false;
double result;
// a wrapper around the real blackbox
void inner(data_for_back_box* data) {
double r = blackbox(data);
// take the lock before touching the shared flag and result
std::unique_lock<std::mutex> lock(mtx);
if(done) return; // someone has already finished, skip this result
// notify everyone that we're done
result = r;
done=true;
cv.notify_all();
}
// context setup and wait for any inner wrapper
// to signal "done"
void outer(int n) {
data_for_back_box data{0, 100+n*n};
std::thread work(inner, &data);
{
std::unique_lock<std::mutex> lock(mtx);
while( !done ) cv.wait(lock);
}
// corrupt data for blackbox:
data.end_here = 0;
// wait for this threads blackbox to finish
work.join();
}
int main() {
std::vector<std::thread> ths;
// spawn 100 worker threads
for(int i=0; i<100; ++i) {
ths.emplace_back(outer, i);
}
double saved_result;
{
std::unique_lock<std::mutex> lock(mtx);
while( !done ) cv.wait(lock);
saved_result = result;
} // release lock
// join all threads
std::cout << "got result, joining:\n";
for(auto& th : ths) {
th.join();
}
std::cout << "result: " << saved_result << "\n";
}
With the new C++17 standard, I wonder if there is a good way to run a process with a fixed number of threads until a batch of jobs is finished.
Can you tell me how I can achieve the desired functionality of this code:
std::vector<std::future<std::string>> futureStore;
const int batchSize = 1000;
const int maxNumParallelThreads = 10;
int threadsTerminated = 0;
while(threadsTerminated < batchSize)
{
const int& threadsRunning = futureStore.size();
while(threadsRunning < maxNumParallelThreads)
{
futureStore.emplace_back(std::async(someFunction));
}
for(std::future<std::string>& readyFuture: std::when_any(futureStore.begin(), futureStore.end()))
{
auto retVal = readyFuture.get();
// (possibly do something with the ret val)
threadsTerminated++;
}
}
I read that there used to be a std::when_any function, but it was a feature that did not make it into the standard library.
Is there any support for this functionality (not necessarily for std::future) in the current standard library? Is there a way to easily implement it, or do I have to resort to something like this?
This does not seem to me to be the ideal approach:
All your main thread does is wait for the other threads to finish, polling the results of your futures. That more or less wastes this thread...
I don't know how far std::async re-uses thread infrastructure, so you risk creating entirely new threads each time... (Apart from that, you might not create any threads at all if you do not specify std::launch::async explicitly.)
I personally would prefer another approach:
Create all the threads you want to use at once.
Let each thread run a loop, repeatedly calling someFunction(), until you have reached the number of desired tasks.
The implementation might look similar to this example:
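#include <chrono>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>
const int BatchSize = 20;
int tasksStarted = 0;
std::mutex mutex;
std::vector<std::string> results;
std::string someFunction()
{
puts("worker started"); fflush(stdout);
std::this_thread::sleep_for(std::chrono::seconds(2)); // portable stand-in for sleep(2)
puts("worker done"); fflush(stdout);
return "";
}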
void runner()
{
{
std::lock_guard<std::mutex> lk(mutex);
if(tasksStarted >= BatchSize)
return;
++tasksStarted;
}
for(;;)
{
std::string s = someFunction();
{
std::lock_guard<std::mutex> lk(mutex);
results.push_back(s);
if(tasksStarted >= BatchSize)
break;
++tasksStarted;
}
}
}
int main(int argc, char* argv[])
{
const int MaxNumParallelThreads = 4;
std::thread threads[MaxNumParallelThreads - 1]; // main thread is one, too!
for(int i = 0; i < MaxNumParallelThreads - 1; ++i)
{
threads[i] = std::thread(&runner);
}
runner();
for(int i = 0; i < MaxNumParallelThreads - 1; ++i)
{
threads[i].join();
}
// use results...
return 0;
}
This way, you do not create a new thread for each task, but keep the same threads running until all tasks are done.
If the tasks are not all alike as in the above example, you might create a base class Task with a pure virtual function (e.g. "execute" or "operator ()") and create subclasses with the required implementation (and holding any necessary data).
You could then place the instances into a std::vector or std::list (well, we won't iterate, so a list might be appropriate here...) as pointers (otherwise, you get object slicing!) and let each thread remove one of the tasks when it has finished its previous one (do not forget to protect against race conditions!) and execute it. As soon as no more tasks are left, return...
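The task-queue idea described above could be sketched as follows. The names Task, AddTask and run_all are hypothetical, a std::deque of std::unique_ptr is used as the container, and the mutex only guards removal from the queue, not the execution of the task itself:

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Base class for heterogeneous tasks; stored by pointer to avoid slicing.
struct Task {
    virtual ~Task() = default;
    virtual void execute() = 0;
};

// Hypothetical example task: adds a value to a shared counter.
struct AddTask : Task {
    AddTask(std::atomic<int>& total, int value) : total_(total), value_(value) {}
    void execute() override { total_ += value_; }
    std::atomic<int>& total_;
    int value_;
};

// Runs all queued tasks on `num_threads` threads (the calling thread is one of them).
void run_all(std::deque<std::unique_ptr<Task>> tasks, unsigned num_threads) {
    assert(num_threads > 0);
    std::mutex queue_mutex;
    auto worker = [&] {
        for (;;) {
            std::unique_ptr<Task> task;
            {
                // Hold the lock only while taking a task off the queue.
                std::lock_guard<std::mutex> lock(queue_mutex);
                if (tasks.empty()) return;
                task = std::move(tasks.front());
                tasks.pop_front();
            }
            task->execute();
        }
    };
    std::vector<std::thread> threads;
    for (unsigned i = 1; i < num_threads; ++i) threads.emplace_back(worker);
    worker();  // the calling thread works too
    for (auto& t : threads) t.join();
}
```

A call such as run_all(std::move(tasks), 4) then processes the whole queue with four threads and returns once the queue is empty and every worker has been joined.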
If you don't care about the exact number of threads, the simplest solution would be:
std::vector<std::future<std::string>> futureStore(
batchSize
);
std::generate(futureStore.begin(), futureStore.end(), [](){return std::async(someTask);});
for(auto& future : futureStore) {
std::string value = future.get();
doWork(value);
}
In my experience, std::async reuses threads once a certain number of threads has been spawned; it will not spawn 1000 threads. Also, you will not gain much of a performance boost (if any) from using a thread pool. I did measurements in the past, and the overall runtime was nearly identical.
The only reason I use thread pools now is to avoid the delay of creating threads inside the computation loop. If you have timing constraints, you may miss deadlines when using std::async for the first time, since it creates the threads on the first calls.
There is a good thread pool library for these applications. Have a look here:
https://github.com/vit-vit/ctpl
#include <ctpl.h>
const unsigned int numberOfThreads = 10;
const unsigned int batchSize = 1000;
ctpl::thread_pool pool(numberOfThreads /* ten threads in the pool */);
std::vector<std::future<std::string>> futureStore(
batchSize
);
std::generate(futureStore.begin(), futureStore.end(), [](){ return pool.push(someTask);});
for(auto& future : futureStore) {
std::string value = future.get();
doWork(value);
}
I want to run a function and, if it has not finished after n milliseconds, stop it and start another one. Something like this code:
void run()
{
//do something that doesn't have while(1)
}
int main()
{
run();
if(runFunctionDidntFinishInSeconds(10))
{
endPrintFunction();
backupPlan();
}
return 0;
}
I searched and found the boost::thread timed_join function. Here's my code:
void run()
{
int a;
for (int i = 0; i < 2000; i++)
cout << i << endl;
}
int main()
{
boost::thread t(&run); // construct the thread object directly; new would yield a pointer
if (t.timed_join(boost::posix_time::microseconds(10000))){
cout << "done" << endl;
}
else{
cout << endl << "not done" << endl;
}
system("pause");
return 0;
}
But it doesn't stop thread 't' from running. I considered terminating the thread, but that is not a good option.
I want the function to finish at the exact time I tell it to.
The system gets input every 16 ms and I want to process it; if the processing takes more than about 13 ms, I want to abandon it and run a backup plan. I also want this to be abstracted away from the people who write the processing method, and putting a while loop at the top of it would add delay.
What should I do?
At the very least, I think I need to be able to reset the processing thread so that it can do its work again!
I think you are looking for something like std::future.
http://en.cppreference.com/w/cpp/thread/future/wait_for
You can start the function in another thread and wait until the function returns or has a timeout.
For your example:
std::future< void > future = std::async( std::launch::async, print );
auto status = future.wait_for( std::chrono::seconds( 10 ) );
if ( status == std::future_status::deferred )
{
std::cout << "deferred\n";
}
else if ( status == std::future_status::timeout )
{
std::cout << "timeout\n";
}
else if ( status == std::future_status::ready )
{
std::cout << "ready!\n";
}
However, this does not cause the worker thread to end (and the future returned by std::async blocks in its destructor until the task has finished). To stop it early, pass in a flag at startup so the worker thread can clean up and exit safely on its own.
void run(const std::atomic_bool& cancelled)
{
int a;
for (int i = 0; i < 2000; i++)
{
cout << i << endl;
if (cancelled)
return;
}
}
std::atomic_bool cancellation_token{false}; // brace-init: copy-initializing an atomic is ill-formed before C++17
std::future< void > future = std::async( std::launch::async,
run,
std::ref(cancellation_token) );
auto status = future.wait_for( std::chrono::seconds( 10 ) );
if ( status == std::future_status::deferred )
{
std::cout << "deferred\n";
}
else if ( status == std::future_status::timeout )
{
std::cout << "timeout\n";
cancellation_token = true;
}
else if ( status == std::future_status::ready )
{
std::cout << "ready!\n";
}
I want it to be abstracted from the ones who write the processing method.
Standard C++ does not have a way to forcibly interrupt the control flow of a function from outside of that function's call graph (a function it calls can throw, but someone can't throw for them).
OS-specific thread systems have ways to terminate a thread. However, this leaves the program potentially in an undefined state, as the destructors for any stack variables have not been called. And since you didn't know where it was in that processing when you killed it, you can't effectively clean up after it. Even a C program cannot guarantee that an arbitrary function can be terminated; it would have to be one which did not dynamically allocate memory or other resources that have to be cleaned up.
You can compensate for this by coding your function very carefully. But that requires that the person who wrote that function to code it very carefully. And thus, there isn't an abstraction, since the person writing the function has to know what the rules are and is required to follow them.
So the only solution that works requires cooperation. The function must either be written in such a way that it can safely be stopped via those OS-dependent features, or it must be written to periodically check some value and stop itself.
Here are two and 3/4 approaches.
The first requires that the code you want to halt cooperates. It either polls some variable while it runs, or it periodically calls a function that could throw an exception to halt execution. Boost's interruptible threads follow the latter model.
The second requires you to launch a new process, marshal your data over to the function, and use IPC to get the information back. If the function doesn't return in time, you kill the child process.
The third "half" involves rewriting the code in a different language, or using C++ as a scripting language. You run the code in an interpreter that does the first or second solution for you.
Now, a practical alternative (a 1/4 solution) is to make sure the function is purely functional, run it in a separate thread with a semi-reliable abort message (like the first one), and discard its return value if it takes too long. This doesn't do what you want, but is far easier.
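The "1/4 solution" could be sketched like this: the work runs on a detached thread, the caller waits up to a deadline, and a late result is simply dropped. The name run_with_deadline and the int result type are hypothetical choices for illustration. The sketch assumes the work function does not throw and does not touch state that disappears after the caller moves on:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Runs `work` on a detached thread. Returns true and fills `out` if it finishes
// within `deadline`. Otherwise sets the abort flag and returns false; the late
// result is discarded when the worker eventually finishes.
bool run_with_deadline(std::function<int(const std::atomic<bool>&)> work,
                       std::chrono::milliseconds deadline, int& out) {
    assert(work);
    struct State {
        std::atomic<bool> abort{false};
        std::promise<int> result;
    };
    // Shared ownership keeps the state alive for as long as the worker needs it.
    auto state = std::make_shared<State>();
    std::future<int> fut = state->result.get_future();
    std::thread([state, work] {
        state->result.set_value(work(state->abort));
    }).detach();
    if (fut.wait_for(deadline) == std::future_status::ready) {
        out = fut.get();
        return true;
    }
    state->abort = true;  // ask the worker to stop; it may or may not notice
    return false;
}
```

Unlike a future obtained from std::async, a future obtained from a std::promise does not block in its destructor, which is what allows the caller to move on to the backup plan right after the deadline.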
There's a way using atomics as semaphores, but this will emit full-blown memory barriers and thus decrease performance because of the load on every iteration:
#include <iostream>
#include <thread>
#include <chrono>
#include <atomic>
std::atomic<bool> printFinished { false };
std::atomic<bool> shouldPrintRun { true };
void print()
{
while (shouldPrintRun.load() /* && your normal stop condition*/)
{
//work..
}
printFinished.store(true);
}
int main()
{
std::thread t(print);
std::this_thread::sleep_for(std::chrono::seconds(10));
if (!printFinished.load())
{
shouldPrintRun.store(false);
t.join();
std::cout << "help!";
}
return 0;
}
If you don't want the function that runs on another thread to check whether it needs to stop, then terminating that thread is the only option.
A possible solution is to split the lengthy function into small, short incremental functions, each of which continues the task from where it left off the last time it was called. The code below, which can be run in a thread, works like a time slicer and can be terminated at will.
void Process()
{
bool flag = true;
while (running)
{
std::chrono::high_resolution_clock::time_point time1 = std::chrono::high_resolution_clock::now();
std::chrono::milliseconds span(16);
while ( (std::chrono::high_resolution_clock::now() - time1 ) < span)
{
flag ? incremental_function1() : incremental_function2();
if (!running) return;
}
flag = (!flag);
}
}