Run different threads simultaneously without waiting for other threads to complete - C++

I want to use a C++ library that runs several threads simultaneously, so that no thread has to wait for a preceding thread to complete and the code inside each thread runs at the same time. The code to be run in each thread looks like this sample:
while (condition) {            // the condition is always true: an infinite loop
    sleep(random_time_sec);    // sleep for a random number of seconds
    // rest of the code is here
}
This infinite while loop runs in each thread. I want the loops in all threads to run simultaneously, rather than getting stuck until the first thread completes. How do I achieve that? Sample code would be appreciated. I have already tried std::future with std::async, but I get the same behavior as a plain std::thread followed by join().

The issue you are encountering is caused by the rather silly definition of std::async (in my opinion): it doesn't have to execute your code asynchronously, but can instead run it when you attempt to get the result from the std::future it returns.
No matter. If you pass std::launch::async as the first argument of your call, you force it to run asynchronously. You can then save the future in a container, and if you retire futures from this container regularly, you can run as many threads as the system will let you.
Here's an example:
#include <iostream>
#include <thread>
#include <future>
#include <chrono>
#include <vector>
#include <mutex>
using future_store = std::vector<std::future<void>>;
void retireCompletedThreads(future_store &threadList)
{
for (auto i = threadList.begin(); i != threadList.end(); /* ++i */)
{
if (i->wait_for(std::chrono::seconds(0)) == std::future_status::ready)
{
i->get();
i = threadList.erase(i);
}
else
{
++i;
}
}
}
void waitForAllThreads(future_store &threadList)
{
for (auto& f : threadList)
{
f.get();
}
}
std::mutex coutMutex;
int main(int argc, char* argv[])
{
future_store threadList;
// No infinite loop here, but you can if you want.
// You do need to limit the number of threads you create in some way though,
// for example, only create new threads if threadList.size() < 20.
for (auto i = 0; i < 20; ++i)
{
auto f = std::async(std::launch::async,
[i]() {
{
std::lock_guard<std::mutex> l(coutMutex);
std::cout << "Thread " << i << " started" << std::endl;
}
std::this_thread::sleep_for(std::chrono::seconds(1));
{
std::lock_guard<std::mutex> l(coutMutex);
std::cout << "Thread " << i << " completed" << std::endl;
}
});
threadList.push_back(std::move(f));
// Existing threads need to be checked for completion every so often
retireCompletedThreads(threadList);
}
waitForAllThreads(threadList);
}

Related

Unable to create real multithreading in C++ (using for loop)

I am new to C++ and I am trying to create multiple threads using a for loop. Here is the code:
#include <iostream>
#include <thread>
class Threader{
public:
int foo(int z){
std::cout << "Calling this function with value :" << z << std::endl;
return 0;
}
};
int main()
{
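Threader threader;
Threader *m = &threader; // point at a real object; calling through an uninitialized pointer is undefined behavior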
std::cout << "Hello world!" << std::endl;
std::thread t1;
for(int i = 0; i < 5; i++){
std::thread t1(&Threader::foo, m, i);
t1.join();
}
return 0;
}
This is the output
As you can see, the function is invoked through a thread 5 times, but I have to call t1.join() inside the for loop. Without the join, the program fails in the very first iteration, as shown here.
But if I use join(), the threads are created and executed sequentially, because join() waits for each thread to complete. In Java I could easily achieve actual multithreading by creating threads in a loop using Runnable.
How can I create 5 threads that truly run in parallel in C++?

How to correctly wait for condition variable timeout

I'm working on a simple cancellation mechanism, but I have found a problem with waiting for a timeout on a condition variable.
Let's consider the sample program from:
https://www.cplusplus.com/reference/condition_variable/condition_variable/wait_for/
It looks like this sample is broken. If someone provided the data very fast, the program would go into an infinite loop. To visualize it, I made a small modification to the sample program:
#include <iostream> // std::cout
#include <thread> // std::thread
#include <chrono> // std::chrono::seconds
#include <mutex> // std::mutex, std::unique_lock
#include <condition_variable> // std::condition_variable, std::cv_status
using namespace std::chrono_literals;
std::condition_variable cv;
int value = -1;
void compute() {
value = 0;
cv.notify_one();
}
int main()
{
std::thread th(compute);
std::this_thread::sleep_for(1s);
std::mutex mtx;
std::unique_lock<std::mutex> lck(mtx);
while (cv.wait_for(lck, std::chrono::seconds(1)) == std::cv_status::timeout) {
std::cout << '.' << std::endl;
}
std::cout << "You entered: " << value << '\n';
th.join();
return 0;
}
As I can't type that fast, I just set the value to 0 and call notify_one.
On the main thread I simulate a simple delay with sleep_for(1s).
As a result, the program never sees the notify_one and loops infinitely.
The output is: .....
My question is: how do I implement this correctly?
I would also like to know whether the wait was ended by the timeout.
If the notify happens before the wait then it indeed gets "lost".
Most uses of CVs also require a flag of some sort, which should be checked in the predicate. You already have this flag - value. Just use this as a predicate:
EDIT: Removed wrong code.
Note that as a separate matter you should protect the writing to value with your mutex or you're likely to hit UB. Which means you need to make your mutex global along with the CV/Flag.
Better way:
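// std::chrono::now() does not exist; take the time from a concrete clock.
auto endTime = std::chrono::steady_clock::now() + std::chrono::seconds(1);
// flag is the shared state (value in the question); lck must hold the mutex that guards it.
while (flag != 0)
{
    auto res = cv.wait_until(lck, endTime);
    if (res == std::cv_status::timeout)
    {
        // Add Timeout logic here
        break;
    }
}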

Using boost to turn single thread to multi thread

I'm trying to turn code from single-threaded to multi-threaded (for example, create 6 threads instead of 1) while making sure they all start and finish without any interference from each other. What would be a way to do this? Could I just use a for loop that creates threads while i < 6, and add a mutex with lock() and unlock()?
#include <iostream>
#include <boost/thread.hpp>
#include <boost/date_time.hpp>
void workerFunc()
{
boost::posix_time::seconds workTime(3);
std::cout << "Worker: running" << std::endl;
// Pretend to do something useful...
boost::this_thread::sleep(workTime);
std::cout << "Worker: finished" << std::endl;
}
int main(int argc, char* argv[])
{
std::cout << "main: startup" << std::endl;
boost::thread workerThread(workerFunc);
std::cout << "main: waiting for thread" << std::endl;
workerThread.join();
std::cout << "main: done" << std::endl;
system("pause");
return 0;
}
Yes, it's certainly possible. Since you don't want any interference between them, give each thread its own data to work with, so that you do not need to synchronize access to that data with a std::mutex or by making it std::atomic. To further minimize interference between threads, align the data according to std::hardware_destructive_interference_size.
You can use boost::thread::hardware_concurrency() to get the number of hardware threads available on the current system so that you don't have to hardcode the number of threads to run.
Passing references to the thread can be done using std::ref (or else the thread will get a ref to a copy of the data).
Here I create a std::list of threads and a std::vector of data to work on.
#include <cstdint> // std::int64_t
#include <iostream>
#include <functional> // std::ref
#include <list>
#include <new> // std::hardware_destructive_interference_size
#include <vector>
#include <boost/thread.hpp>
unsigned hardware_concurrency() {
unsigned rv = boost::thread::hardware_concurrency();
if(rv == 0) rv = 1; // fallback if hardware_concurrency returned 0
return rv;
}
// if you don't have hardware_destructive_interference_size, use something like this
// instead:
//struct alignas(64) data {
struct alignas(std::hardware_destructive_interference_size) data {
std::int64_t x;
};
void workerFunc(data& d) {
// work on the supplied data
for(int i = 0; i < 1024*1024-1; ++i) d.x -= i;
for(int i = 0; i < 1024*1024*1024-1; ++i) d.x += i;
}
int main() {
std::cout << "main: startup" << std::endl;
size_t number_of_threads = hardware_concurrency();
std::list<boost::thread> threads;
std::vector<data> dataset(number_of_threads);
// create the threads
for(size_t idx = 0; idx < number_of_threads; ++idx)
threads.emplace_back(workerFunc, std::ref(dataset[idx]));
std::cout << "main: waiting for threads" << std::endl;
// join all threads
for(auto& th : threads) th.join();
// display results
for(const data& d : dataset) std::cout << d.x << "\n";
std::cout << "main: done" << std::endl;
}
If you are using C++11 (or later), I suggest using std::thread instead.
Starting and stopping a bunch of Boost threads
std::vector<boost::thread> threads;
for (int i = 0; i < numberOfThreads; ++i) {
boost::thread t(workerFunc);
threads.push_back(std::move(t));
}
for (auto& t : threads) {
t.join();
}
Keep in mind that join() doesn't terminate the threads, it only waits until they are finished.
Synchronization
Mutexes are required if multiple threads access the same data and at least one of them is writing the data. You can use a mutex to ensure that only one thread at a time enters the critical sections of the code. Example:
#include <mutex>
#include <queue>
std::queue<int> q;
std::mutex q_mu;
void workerFunc1() {
    // ...
    int foo = 42; // placeholder for whatever the producer computed
    {
        std::lock_guard<std::mutex> guard(q_mu);
        q.push(foo);
    } // lock guard goes out of scope and automatically unlocks q_mu
    // ...
}
void workerFunc2() {
    // ...
    int foo = 0;
    {
        std::lock_guard<std::mutex> guard(q_mu);
        if (!q.empty()) {
            foo = q.front(); // std::queue::pop() returns void, so read the front first
            q.pop();
        }
    } // lock guard goes out of scope and automatically unlocks q_mu
    // ...
}
This prevents undefined behavior like reading an item from the queue that hasn't been written completely. Be careful - data races can crash your program or corrupt your data. I frequently use tools like Thread Sanitizer or Helgrind to make sure I didn't miss anything. If you only want to pass results back into the main program but don't need to share data between your threads, you might want to consider using std::promise and std::future.
Yes, spawning new threads can be done with a simple loop. You will have to keep a few things in mind though:
If threads operate on shared data, that data needs to be protected with mutexes, atomics or some other mechanism to avoid data races and undefined behaviour (bear in mind that, according to the standard, even primitive types such as int must be wrapped in an atomic or guarded by a mutex when they are shared this way).
You have to make sure that you eventually call either join() or detach() on every spawned thread before its object goes out of scope; for std::thread, destroying an object that is still joinable calls std::terminate and aborts the program.
It's best to do some computation on the main thread while waiting for the worker threads, so that this time is used instead of wasted.
You generally want to spawn one thread fewer than the total number of threads you want, as the program starts running with one thread by default (the main thread).

thread is determined during compile or runtime?

I was just wondering: when I create a pool of threads in code
and then compile the code,
does the compiled code contain a copy for each thread?
And if I use a function-like macro and pass it to the threads,
is this macro expanded at compile time (which is what I think) or at run time?
And if it is at compile time, why does the following code need a mutex:
#include <boost/asio.hpp>
#include <boost/thread.hpp>
#include <boost/date_time.hpp>
#include <iostream>
namespace asio = boost::asio;
#define PRINT_ARGS(msg) do {\
boost::lock_guard<boost::mutex> lg(mtx); \
std::cout << '[' << boost::this_thread::get_id() \
<< "] " << msg << std::endl; \
} while (0)
int main() {
asio::io_service service;
boost::mutex mtx;
for (int i = 0; i < 20; ++i) {
service.post([i, &mtx]() {
PRINT_ARGS("Handler[" << i << "]");
boost::this_thread::sleep(
boost::posix_time::seconds(1));
});
}
boost::thread_group pool;
for (int i = 0; i < 4; ++i) {
pool.create_thread([&service]() { service.run(); });
}
pool.join_all();
}
Here the lock_guard makes the cout statement a critical section, although only the main thread posts to the io_service.
The threads running the tasks then work on an already built queue of already created lambda functions, which makes me think there is no need for a mutex.
Is this thinking right?
Here I simulate the macro expansion during compilation:
#include <boost/asio.hpp>
#include <boost/thread.hpp>
#include <boost/date_time.hpp>
#include <iostream>
namespace asio = boost::asio;
#define PRINT_ARGS(msg) do {\
boost::lock_guard<boost::mutex> lg(mtx); \
std::cout << '[' << boost::this_thread::get_id() \
<< "] " << msg << std::endl; \
} while (0)
int main() {
asio::io_service service;
boost::mutex mtx;
for (int i = 0; i < 20; ++i) {
service.post([i, &mtx]() {
//PRINT_ARGS("Handler[" << i << "]"); // after expansion this becomes:
do {
boost::lock_guard<boost::mutex> lg(mtx);
std::cout << '[' << boost::this_thread::get_id()
<< "] " << "Handler[" << i << "]" << std::endl;
} while (0);
boost::this_thread::sleep(
boost::posix_time::seconds(1));
});
}
boost::thread_group pool;
for (int i = 0; i < 4; ++i) {
pool.create_thread([&service]() { service.run(); });
}
pool.join_all();
}
and then the program will run in the following order:
1- main thread: creates the io_service instance
2- main thread: creates the mutex instance
3- main thread: runs the for loop 20 times; each time it posts a task (the lambda function). The book this code comes from describes post as adding the function object to an internal queue in the io_service.
So my question is: does the main thread add 20 lambda function objects to the queue, each with its own value of i?
Then, when the 4 new threads start, each is given the thread function run, which according to the same book removes the function objects from the queue one by one and executes them.
In this case:
thread 1: removes lambda 1 and executes it with its own code, as a separate instance with a unique i
thread 2: removes lambda 2 and executes it with its own code, as a separate instance with a unique i
thread 3: removes lambda 3 and executes it with its own code, as a separate instance with a unique i
thread 4: removes lambda 4 and executes it with its own code, as a separate instance with a unique i
then thread 1 takes lambda 5, and so on.
This is based on my understanding that the queue holds 20 function objects (the lambdas, maybe wrapped in some sort of wrapper), so each thread takes a separate object and therefore needs no mutex (20 separate pieces of assembly code after compilation).
But if the tasks in the queue are just references to the same single piece of code (executed again for each iteration of the for loop), then a mutex is needed to prevent 2 threads from accessing the critical code at the same time.
Which of these scenarios does this code correspond to?
Macros are always expanded at compile time but the compiler has only very rudimentary knowledge of threads (mostly regarding being able to say that certain variables are thread local).
Code is only going to exist once in either the on-disk image or in-memory copy that actually gets run.
Locking the mutex in PRINT_ARGS ensures that each operation's message is printed in its entirety without getting interrupted by another thread. (Otherwise an operation might start to print its message, get interrupted by another operation on a different thread that prints its own message, and only then print the remainder of the first message.)

How to limit the number of running instances in C++

I have a c++ class that allocates a lot of memory. It does this by calling a third-party library that is designed to crash if it cannot allocate the memory, and sometimes my application creates several instances of my class in parallel threads. With too many threads I have a crash.
My best idea for a solution is to make sure that there are never, say, more than three instances running at the same time. (Is this a good idea?)
And my current best idea for implementing that is to use a boost mutex. Something along the lines of the following pseudo-code,
MyClass::MyClass(){
my_thread_number = -1; //this is a class variable
while (my_thread_number == -1)
for (int i=0; i < MAX_PROCESSES; i++)
if(try_lock a mutex named i){
my_thread_number = i;
break;
}
//Now I know that my thread has mutex number i and it is allowed to run
}
MyClass::~MyClass(){
release mutex named my_thread_number
}
As you can see, I am not quite sure of the exact syntax for mutexes here. So, summing up, my questions are:
Am I on the right track when I want to solve my memory error by limiting the number of threads?
If yes, Should I do it with mutexes or by other means?
If yes, Is my algorithm sound?
Is there a nice example somewhere of how to use try_lock with boost mutexes?
Edit: I realized I am talking about threads, not processes.
Edit: I am involved in building an application that can run on both linux and Windows...
UPDATE My other answer addresses scheduling resources among threads (after the question was clarified).
It shows both a semaphore approach to coordinate work among (many) workers, and a thread_pool to limit workers in the first place and queue the work.
On Linux (and perhaps other OSes?) you can use a lock file idiom (but it's not supported with some file systems and old kernels).
I would suggest using interprocess synchronisation objects.
E.g., using a Boost Interprocess named semaphore:
#include <boost/interprocess/sync/named_semaphore.hpp>
#include <boost/thread.hpp>
#include <cassert>
#include <iostream>
int main()
{
using namespace boost::interprocess;
named_semaphore sem(open_or_create, "ffed38bd-f0fc-4f79-8838-5301c328268c", 0ul);
if (sem.try_wait())
{
std::cout << "Oops, second instance\n";
}
else
{
sem.post();
// feign hard work for 30s
boost::this_thread::sleep_for(boost::chrono::seconds(30));
if (sem.try_wait())
{
sem.remove("ffed38bd-f0fc-4f79-8838-5301c328268c");
}
}
}
If you start one copy in the background, new copies will "refuse" to start ("Oops, second instance") for about 30s.
I have a feeling it might be easier to reverse the logic here. Mmm. Lemme try.
some time passes
Hehe. That was more tricky than I thought.
The thing is, you want to make sure that the lock doesn't remain when your application is interrupted or killed. In the interest of sharing the techniques for portably handling the signals:
#include <boost/interprocess/sync/named_semaphore.hpp>
#include <boost/thread.hpp>
#include <cassert>
#include <iostream>
#include <boost/asio.hpp>
#define MAX_PROCESS_INSTANCES 3
boost::interprocess::named_semaphore sem(
boost::interprocess::open_or_create,
"4de7ddfe-2bd5-428f-b74d-080970f980be",
MAX_PROCESS_INSTANCES);
// to handle signals:
boost::asio::io_service service;
boost::asio::signal_set sig(service);
int main()
{
if (sem.try_wait())
{
sig.add(SIGINT);
sig.add(SIGTERM);
sig.add(SIGABRT);
sig.async_wait([](boost::system::error_code,int sig){
std::cerr << "Exiting with signal " << sig << "...\n";
sem.post();
});
boost::thread sig_listener([&] { service.run(); });
boost::this_thread::sleep_for(boost::chrono::seconds(3));
service.post([&] { sig.cancel(); });
sig_listener.join();
}
else
{
std::cout << "More than " << MAX_PROCESS_INSTANCES << " instances not allowed\n";
}
}
There's a lot that could be explained there. Let me know if you're interested.
NOTE It should be quite obvious that if kill -9 is used on your application (forced termination) then all bets are off and you'll have to either remove the named semaphore object or explicitly unlock it (post()).
Here's a testrun on my system:
sehe@desktop:/tmp$ (for a in {1..6}; do ./test& done; time wait)
More than 3 instances not allowed
More than 3 instances not allowed
More than 3 instances not allowed
Exiting with signal 0...
Exiting with signal 0...
Exiting with signal 0...
real 0m3.005s
user 0m0.013s
sys 0m0.012s
Here's a simplistic way to implement your own 'semaphore' (the standard library only gained std::counting_semaphore in C++20, and I don't think Boost.Thread has one). This chooses a 'cooperative' approach in which workers wait for each other:
#include <boost/thread.hpp>
#include <boost/phoenix.hpp>
using namespace boost;
using namespace boost::phoenix::arg_names;
void the_work(int id)
{
static int running = 0;
std::cout << "worker " << id << " entered (" << running << " running)\n";
static mutex mx;
static condition_variable cv;
// synchronize here, waiting until we can begin work
{
unique_lock<mutex> lk(mx);
cv.wait(lk, phoenix::cref(running) < 3);
running += 1;
}
std::cout << "worker " << id << " start work\n";
this_thread::sleep_for(chrono::seconds(2));
std::cout << "worker " << id << " done\n";
// signal one other worker, if waiting
{
lock_guard<mutex> lk(mx);
running -= 1;
cv.notify_one();
}
}
int main()
{
thread_group pool;
for (int i = 0; i < 10; ++i)
pool.create_thread(bind(the_work, i));
pool.join_all();
}
Now, I'd say it's probably better to have a dedicated pool of n workers taking their work from a queue in turns:
#include <boost/thread.hpp>
#include <boost/phoenix.hpp>
#include <boost/optional.hpp>
using namespace boost;
using namespace boost::phoenix::arg_names;
class thread_pool
{
private:
mutex mx;
condition_variable cv;
typedef function<void()> job_t;
std::deque<job_t> _queue;
thread_group pool;
boost::atomic_bool shutdown;
static void worker_thread(thread_pool& q)
{
while (auto job = q.dequeue())
(*job)();
}
public:
thread_pool() : shutdown(false) {
for (unsigned i = 0; i < boost::thread::hardware_concurrency(); ++i)
pool.create_thread(bind(worker_thread, ref(*this)));
}
void enqueue(job_t job)
{
lock_guard<mutex> lk(mx);
_queue.push_back(std::move(job));
cv.notify_one();
}
optional<job_t> dequeue()
{
unique_lock<mutex> lk(mx);
namespace phx = boost::phoenix;
cv.wait(lk, phx::ref(shutdown) || !phx::empty(phx::ref(_queue)));
if (_queue.empty())
return none;
auto job = std::move(_queue.front());
_queue.pop_front();
return std::move(job);
}
~thread_pool()
{
shutdown = true;
{
lock_guard<mutex> lk(mx);
cv.notify_all();
}
pool.join_all();
}
};
void the_work(int id)
{
std::cout << "worker " << id << " entered\n";
// no more synchronization; the pool size determines max concurrency
std::cout << "worker " << id << " start work\n";
this_thread::sleep_for(chrono::seconds(2));
std::cout << "worker " << id << " done\n";
}
int main()
{
thread_pool pool; // uses 1 thread per core
for (int i = 0; i < 10; ++i)
pool.enqueue(bind(the_work, i));
}
PS. You can use C++11 lambdas instead of boost::phoenix there if you prefer.