Boost, create thread pool before io_service.post - c++

I successfully tested an example using boost io_service:
for (x = 0; x < loops; x++)
{
    // Add work to ioService.
    for (i = 0; i < number_of_threads; i++)
    {
        ioService.post(boost::bind(worker_task, data, pre_data[i]));
    }
    // Now that the ioService has work, use a pool of threads to service it.
    for (i = 0; i < number_of_threads; i++)
    {
        threadpool.create_thread(boost::bind(
            &boost::asio::io_service::run, &ioService));
    }
    // threads in the threadpool will be completed and can be joined.
    threadpool.join_all();
}
This loops several times, and it takes a while because the threads are created anew on every iteration.
Is there a way to create all the needed threads once,
and then post the work for each thread inside the loop?
After posting the work, I need to wait until all threads have finished it.
Something like this:
// start/create threads
for (i = 0; i < number_of_threads; i++)
{
    threadpool.create_thread(boost::bind(
        &boost::asio::io_service::run, &ioService));
}
for (x = 0; x < loops; x++)
{
    // Add work to ioService.
    for (i = 0; i < number_of_threads; i++)
    {
        ioService.post(boost::bind(worker_task, data, pre_data[i]));
    }
    // threads in the threadpool will be completed and can be joined.
    threadpool.join_all();
}

The problem here is that your worker threads will finish immediately after creation, since there is no work to be done. io_service::run() will just return right away, so unless you manage to sneak in one of the post-calls before all worker threads have had an opportunity to call run(), they will all finish right away.
Two ways to fix this:
Use a barrier to stop the workers from calling run() right away. Only unblock them once the work has been posted.
Use an io_service::work object to prevent run() from returning. You can destroy the work object once you have posted everything (and must do so before attempting to join the workers again).
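To illustrate why unguarded workers exit and how a work object prevents it, here is a minimal sketch that uses only the C++ standard library instead of Boost. MiniService, add_guard(), remove_guard() and run_demo() are hypothetical names, not Boost API: add_guard()/remove_guard() play the role of constructing and destroying an io_service::work object, and run() returns, like io_service::run(), only when no handlers are queued and no guard is held.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for io_service (not Boost's API): run() executes
// queued handlers and returns once the queue is empty and no guard is held.
class MiniService {
public:
    void post(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(f));
        }
        cv_.notify_one();
    }

    void run() {
        for (;;) {
            std::function<void()> f;
            {
                std::unique_lock<std::mutex> lock(m_);
                // Sleep while the queue is empty but a guard keeps run() alive.
                cv_.wait(lock, [this] { return !q_.empty() || guards_ == 0; });
                if (q_.empty()) return;  // no work and no guard left
                f = std::move(q_.front());
                q_.pop();
            }
            f();  // run the handler outside the lock
        }
    }

    // Analogue of constructing / destroying an io_service::work object.
    void add_guard() {
        std::lock_guard<std::mutex> lock(m_);
        ++guards_;
    }
    void remove_guard() {
        {
            std::lock_guard<std::mutex> lock(m_);
            --guards_;
        }
        cv_.notify_all();  // wake idle workers so they notice the guard is gone
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    int guards_ = 0;
};

// Creates the workers first, posts afterwards, and returns how many tasks ran.
int run_demo(int workers, int tasks) {
    MiniService service;
    std::mutex count_mutex;
    int done = 0;

    service.add_guard();  // without this, workers could return before post()
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back([&service] { service.run(); });

    for (int i = 0; i < tasks; ++i)
        service.post([&] {
            std::lock_guard<std::mutex> lock(count_mutex);
            ++done;
        });

    service.remove_guard();  // must happen before joining, or join never returns
    for (auto& t : pool) t.join();
    return done;
}
```

run_demo(4, 8) creates four workers before any task is posted and should return 8. Without the add_guard() call, some or all workers could return before the first post(), which is exactly the behaviour described above.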

The loop wasn't really useful.
Here is a better example showing how it works.
I get the data in a callback:
void worker_task(uint8_t * data, uint32_t len)
{
    uint32_t pos = 0;
    while (pos < len)
    {
        pos += process_data(data + pos);
    }
}
void callback_f(uint8_t *data, uint32_t len)
{
    //split data into parts
    uint32_t number_of_data_per_thread = len / number_of_threads;
    // Add work to ioService.
    uint32_t x = 0;
    for (i = 0; i < number_of_threads; i++)
    {
        ioService.post(boost::bind(worker_task, data + x, number_of_data_per_thread));
        x += number_of_data_per_thread;
    }
    // Now that the ioService has work, use a pool of threads to service it.
    for (i = 0; i < number_of_threads; i++)
    {
        threadpool.create_thread(boost::bind(
            &boost::asio::io_service::run, &ioService));
    }
    // threads in the threadpool will be completed and can be joined.
    threadpool.join_all();
}
The host application (a media stream) calls this callback at a high rate. If the incoming len is large enough, the thread pool pays off, because the processing time exceeds the time needed to create and start the threads.
If len is small, the advantage of the thread pool is lost, because creating and starting the threads takes longer than processing the data.
My question now is whether it is possible to have the threads already running and waiting for data: when the callback is called, push the data to the threads and wait for them to finish. The number of threads is constant (the CPU count).
Because this is a callback from a host application, the data pointer is only valid while inside the callback function. This is why I have to wait until all threads have finished their work.
Each thread can start working as soon as it gets its data, even before the other threads have started. There is no synchronization problem, because every thread has its own region of the data.
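One way to get this behaviour is to keep the threads alive in a pool and have the callback wait on a completion counter instead of joining threads. The sketch below uses only the standard library; ChunkPool, run_all() and the byte-inverting process_data() are hypothetical stand-ins, not part of Boost or the host application. It also gives the last chunk the remainder that len / number_of_threads discards, which the callback above would otherwise leave unprocessed.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Persistent pool: threads are created once and sleep until jobs arrive.
// run_all() enqueues a batch and blocks until every pending job has finished.
class ChunkPool {
public:
    explicit ChunkPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            threads_.emplace_back([this] { loop(); });
    }
    ~ChunkPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        work_cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void run_all(std::vector<std::function<void()>> jobs) {
        std::unique_lock<std::mutex> lock(m_);
        pending_ += jobs.size();
        for (auto& job : jobs) q_.push(std::move(job));
        work_cv_.notify_all();
        done_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            work_cv_.wait(lock, [this] { return stop_ || !q_.empty(); });
            if (q_.empty()) return;  // stop requested and nothing left to do
            std::function<void()> job = std::move(q_.front());
            q_.pop();
            lock.unlock();
            job();  // process this chunk without holding the lock
            lock.lock();
            if (--pending_ == 0) done_cv_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable work_cv_, done_cv_;
    std::queue<std::function<void()>> q_;
    std::size_t pending_ = 0;
    bool stop_ = false;
    std::vector<std::thread> threads_;
};

// Hypothetical stand-in for process_data(): inverts one byte, consumes one byte.
std::uint32_t process_data(std::uint8_t* p) {
    *p = static_cast<std::uint8_t>(~*p);
    return 1;
}

void worker_task(std::uint8_t* data, std::uint32_t len) {
    std::uint32_t pos = 0;
    while (pos < len) pos += process_data(data + pos);
}

// Splits the buffer (nthreads must be >= 1) and returns only after every
// chunk is done, so data is never touched after the callback returns.
void callback_f(ChunkPool& pool, unsigned nthreads, std::uint8_t* data, std::uint32_t len) {
    std::vector<std::function<void()>> jobs;
    std::uint32_t per = len / nthreads;
    for (unsigned i = 0; i < nthreads; ++i) {
        std::uint32_t off = i * per;
        // The last chunk also takes the remainder that len / nthreads drops.
        std::uint32_t n = (i + 1 == nthreads) ? len - off : per;
        jobs.push_back([=] { worker_task(data + off, n); });
    }
    pool.run_all(std::move(jobs));
}
```

The pool is constructed once (for example with the CPU count) and reused for every callback, so thread start-up is paid only once. Because run_all() blocks until its pending count drops back to zero, the callback returns only after all chunks have been processed.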

Related

Pthread synchronization with barrier

I am trying to synchronize a function I am parallelizing with pthreads.
The issue is that I get a deadlock because a thread exits the function while the other threads are still waiting for that thread to reach the barrier. I am unsure whether the pthread_barrier structure takes care of this. Here is an example:
static pthread_barrier_t barrier;
static void* foo(void* arg) {
    // beg and end: this thread's iteration range (not shown in the question)
    for (int i = beg; i < end; i++) {
        if (i > 0) {
            pthread_barrier_wait(&barrier);
        }
    }
    return NULL;
}
int main() {
    // create pthread barrier
    pthread_barrier_init(&barrier, NULL, NUM_THREADS);
    // create thread handles
    //...
    // create threads
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_create(&thread_handles[i], NULL, &foo, (void*)(intptr_t) i);
    }
    // join the threads
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(thread_handles[i], NULL);
    }
}
Here is a solution I tried for foo, but it didn't work (note NUM_THREADS_COPY is a copy of the NUM_THREADS constant, and is decremented whenever a thread reaches the end of the function):
static void* foo(void* arg) {
    for (int i = beg; i < end; i++) {
        if (i > 0) {
            pthread_barrier_wait(&barrier);
        }
    }
    pthread_barrier_init(&barrier, NULL, --NUM_THREADS_COPY);
    return NULL;
}
Is there a solution to updating the number of threads to wait in a barrier for when a thread exits a function?
You need to decide how many threads it will take to pass the barrier before any threads arrive at it. Undefined behavior results from re-initializing the barrier while there are threads waiting at it. Among the plausible manifestations are that some of the waiting threads are prematurely released or that some of the waiting threads never get released, but those are by no means the only unwanted things that could happen. In any case ...
Is there a solution to updating the number of threads to wait in a barrier for when a thread exits a function?
... no, pthreads barriers do not support that.
Since a barrier seems not to be flexible enough for your needs, you probably want to fall back to the general-purpose thread synchronization object: a condition variable (used together with a mutex and some kind of shared variable).
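A minimal sketch of that condition-variable approach, assuming C++11 or later: a hypothetical FlexBarrier class whose participant count can shrink. A thread that leaves its loop calls arrive_and_drop() instead of re-initializing the barrier; if all remaining participants are already waiting, that call completes the phase. (C++20's std::barrier has an arrive_and_drop() member with a similar purpose.) The generation counter lets each waiter tell its own phase's release apart from a later one.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical barrier whose participant count can shrink.
class FlexBarrier {
public:
    explicit FlexBarrier(std::size_t count) : expected_(count) {}

    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        const std::size_t my_generation = generation_;
        if (++arrived_ == expected_) {
            release();  // last thread of this phase releases everyone
        } else {
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

    // Permanently removes the calling thread; if everyone still taking part
    // is already waiting, this completes the current phase.
    void arrive_and_drop() {
        std::lock_guard<std::mutex> lock(m_);
        --expected_;
        if (arrived_ > 0 && arrived_ == expected_) release();
    }

private:
    void release() {  // caller must hold m_
        arrived_ = 0;
        ++generation_;
        cv_.notify_all();
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::size_t expected_;
    std::size_t arrived_ = 0;
    std::size_t generation_ = 0;
};

// Thread t performs t + 1 rounds, so threads leave at different times,
// much like threads with different beg/end ranges. Returns the total rounds.
int run_uneven_rounds(int nthreads) {
    FlexBarrier barrier(nthreads);
    std::mutex m;
    int total = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([&, t] {
            for (int round = 0; round <= t; ++round) {
                {
                    std::lock_guard<std::mutex> lock(m);
                    ++total;
                }
                barrier.wait();
            }
            barrier.arrive_and_drop();  // stop counting this thread
        });
    }
    for (auto& th : threads) th.join();
    return total;
}
```

run_uneven_rounds(4) should return 1 + 2 + 3 + 4 = 10 without deadlocking, because each finished thread lowers the count the others are waiting for.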

Best approach to Independently Timing each Tick of a For Loop in another Thread

Suppose I have a client thread and a server thread. The client thread must perform an expensive for loop operation which is prone to hanging. Thus, the server has to independently determine whether each tick of the for loop has exceeded the max time. The context behind this is that the server will time out the client if it takes too long to complete a tick.
My initial idea below is to have two for loops in the client and server thread. The server thread will have a condition variable that waits for 1 second. If the client does not notify the condition variable in 1 second every tick, the server will time it out:
Server
bool success;
for (int i = 0; i < 10; i++) {
    std::unique_lock<std::mutex> lock(CLIENT_MUTEX);
    success = CLIENT_CV.wait_for(lock, std::chrono::seconds(1)) == std::cv_status::no_timeout;
    if (!success) {
        std::cout << "timed out during tick " << i << std::endl;
        break;
    }
}
Client
for (int i = 0; i < 10; i++) {
    std::unique_lock<std::mutex> lock(CLIENT_MUTEX);
    //do work
    CLIENT_CV.notify_one();
}
However my implementation attempt is unreliable and times out at random times given the same work for the client. How can I improve the design to make it more reliable?
Side Note:
A simple solution to this would be for the server to time the entire for loop instead of each tick. However, if the loop hangs on tick 1 of 10 and the timer waits for 10 seconds, the client is only informed after 10 seconds. If the server instead imposes a 1-second timeout on each tick (10 x 1 s = 10 s in total), the client is informed of the timeout without having to wait the full 10 seconds.
Edit.
This whole client/server/timeout analogy is simply to put the question into context. I'm purely interested in the best way to time the for loop from a different thread.
One way of doing this might be:
Shared vars:
std::vector<std::chrono::time_point<std::chrono::high_resolution_clock>> ledger;
std::mutex ledger_mtx;
Client:
for (int i = 0; i < 10; i++) {
    {
        std::scoped_lock lock(ledger_mtx);
        ledger.push_back(std::chrono::high_resolution_clock::now());
    }
    // Do work
}
{
    std::scoped_lock lock(ledger_mtx);
    ledger.push_back(std::chrono::high_resolution_clock::now());
}
Server:
using namespace std::chrono_literals; // for 1s
size_t id = 0;
std::this_thread::sleep_for(1s); // Some time so that the initial write to ledger is made
while (true) {
    std::chrono::time_point<std::chrono::high_resolution_clock> last_tick;
    {
        std::scoped_lock lock(ledger_mtx);
        if (ledger.size() == id) { /* Do something if the thread hangs */ }
        id = ledger.size();
        last_tick = ledger.back(); // assumes the client has written at least one entry
    }
    if (id == 11) break;
    std::this_thread::sleep_for(1s - (std::chrono::high_resolution_clock::now() - last_tick));
}
This way you can time the thread while monitoring it from the outside. Is it the best way? Probably not, but it does give you the times you need.
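For comparison, the condition-variable design can also be made reliable. The question's version likely loses notifications: notify_one() has no effect unless the server is already blocked in wait_for(), and because the client holds CLIENT_MUTEX for the whole tick, the server often cannot get back to waiting in time. In this sketch (TickMonitor, client_loop(), server_watch() and run_watch() are hypothetical names, and sleep_for stands in for the work), the client works outside the lock and publishes a tick count, and the server waits with a predicate, so early notifications and spurious wakeups are harmless.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Shared state between the client (worker) and the server (watchdog).
struct TickMonitor {
    std::mutex m;
    std::condition_variable cv;
    int ticks_done = 0;  // number of completed loop iterations
};

// Client: does the work without holding the lock, then publishes progress.
void client_loop(TickMonitor& mon, int ticks, std::chrono::milliseconds work_time) {
    for (int i = 0; i < ticks; ++i) {
        std::this_thread::sleep_for(work_time);  // stand-in for "do work"
        {
            std::lock_guard<std::mutex> lock(mon.m);
            ++mon.ticks_done;
        }
        mon.cv.notify_one();
    }
}

// Server: returns the index of the tick that timed out, or -1 if all ticks finished in time.
int server_watch(TickMonitor& mon, int ticks, std::chrono::milliseconds limit) {
    std::unique_lock<std::mutex> lock(mon.m);
    for (int i = 0; i < ticks; ++i) {
        // The predicate is checked under the lock, so a notification sent before
        // the server started waiting is not lost.
        if (!mon.cv.wait_for(lock, limit, [&] { return mon.ticks_done > i; }))
            return i;
    }
    return -1;
}

int run_watch(int ticks, std::chrono::milliseconds work, std::chrono::milliseconds limit) {
    TickMonitor mon;
    std::thread client(client_loop, std::ref(mon), ticks, work);
    int result = server_watch(mon, ticks, limit);
    client.join();
    return result;
}
```

run_watch(5, 10 ms, 500 ms) should return -1, while run_watch(3, 300 ms, 50 ms) should return 0. After a timeout this sketch simply stops watching; a real server would also need some way to tell the client to stop.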

How to run a function on a separate thread, if a thread is available

How can I run a function on a separate thread if a thread is available, assuming that I always want k threads running at the same time at any point?
Here's some pseudo-code:
For i = 1 to N
    IF numberOfRunningThreads < k
        // run foo() on another thread
    ELSE
        // run foo()
In summary, once a thread is finished it notifies the other threads that there's a thread available that any of the other threads can use. I hope the description was clear.
My personal approach: just create the k threads and let them call foo repeatedly. You need a counter, protected against race conditions, that is decremented each time before foo is called by any thread. As soon as the desired number of calls has been performed, the threads will exit one after the other (incomplete/pseudo code):
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

void foo(); // the function to be called n times (defined elsewhere)

unsigned int global_counter = 0;
std::mutex global_counter_mutex; // protects global_counter

void fooRunner()
{
    for (;;)
    {
        {
            std::lock_guard<std::mutex> g(global_counter_mutex);
            if (global_counter == 0)
                break;
            --global_counter;
        }
        foo();
    }
}

void runThreads(unsigned int n, unsigned int k) // expects k >= 1
{
    global_counter = n;
    std::vector<std::thread> threads(std::min(n, k - 1));
    // k - 1: current thread can be reused, too...
    // (provided it has no other tasks to perform)
    for (auto& t : threads)
    {
        t = std::thread(&fooRunner);
    }
    fooRunner();
    for (auto& t : threads)
    {
        t.join();
    }
}
If you have data to pass to the foo function, instead of a counter you could use e.g. a FIFO or LIFO queue, whichever is most appropriate for the given use case. Threads then exit as soon as the buffer becomes empty; you'd have to prevent the buffer from running empty prematurely, though, e.g. by prefilling all the data to be processed before starting the threads.
A variant might be a combination of both: exiting if the global counter reaches 0, and otherwise waiting for the queue to receive new data (e.g. via a condition variable), with the main thread continuously filling the queue while the threads are already running...
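The prefilled-queue version could look like this minimal sketch. run_queue() is a hypothetical name, and squaring each item and adding it to a total stands in for foo(item); as in the counter version, the calling thread acts as one of the k runners.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Each runner pops items until the prefilled queue is empty, then exits.
long long run_queue(const std::vector<int>& items, unsigned k) {
    std::mutex m;
    std::queue<int> q;
    for (int v : items) q.push(v);  // prefill before any thread starts
    long long total = 0;            // guarded by m

    auto runner = [&] {
        for (;;) {
            int item;
            {
                std::lock_guard<std::mutex> lock(m);
                if (q.empty()) return;  // buffer drained: this thread exits
                item = q.front();
                q.pop();
            }
            long long result = static_cast<long long>(item) * item;  // stand-in for foo(item)
            std::lock_guard<std::mutex> lock(m);
            total += result;
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 1; i < k; ++i) threads.emplace_back(runner);  // k - 1 extra threads
    runner();                                                       // current thread helps too
    for (auto& t : threads) t.join();
    return total;
}
```

With items {1, 2, 3, 4} and k = 3, run_queue() should return 30 (1 + 4 + 9 + 16), regardless of which thread processes which item.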
You can use std::thread (from <thread>) and locks to do what you want, but it seems to me that your code could simply be parallelized with OpenMP like this:
#include <omp.h>

#pragma omp parallel num_threads(k)
#pragma omp for
for (unsigned i = 0; i < N; ++i)
{
    auto t_id = omp_get_thread_num();
    if (t_id < k)
        foo();
    else
        other_foo();
}

Best way to wakeup multiple thread using pthread

I have created 4 threads with pthread_create. I want them to start running at the very same time, so I added sem_wait(&sem) at the very beginning of the thread procedure. In the main thread I could use something like this, but I don't think it is a good solution:
for (int i = 0; i < 4; i++)
{
    sem_post(&sem);
}
I googled and found pthread_cond_t. However, pthread_cond_broadcast can only wake up threads that are currently waiting. Even if I put pthread_cond_wait at the very beginning of the procedure, it is still not guaranteed that pthread_cond_wait is called before pthread_cond_broadcast (in main thread).
To avoid this, I would have to add a lot of extra code to enforce the order of the wait and broadcast calls, which is also not smart.
So, is there a simple way to 'line-up' all threads (make them start to run at the same time)?
There seems to be a sem_post_multiple, but it is a Win32 extension in pthreads. I am using Linux (Android), however.
You are looking for a barrier:
pthread_barrier_t
You initialize it with the number of threads (n) and then call pthread_barrier_wait() in every thread. This call blocks until n threads have reached the barrier.
example:
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
pthread_barrier_t bar;

void* thread_start(void* arg) {
    (void)arg;
    pthread_barrier_wait(&bar); // blocks until all NUM_THREADS threads arrive
    //...
    return NULL;
}

int main(void) {
    pthread_barrier_init(&bar, NULL, NUM_THREADS);
    pthread_t thread[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_create(thread + i, NULL, &thread_start, NULL);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(thread[i], NULL);
    }
    pthread_barrier_destroy(&bar);
    return 0;
}
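As an alternative to the barrier, the ordering problem mentioned in the question goes away if the condition variable is paired with a flag: a thread that arrives after the broadcast sees the flag already set and never blocks. Below is a minimal C++ sketch of such a one-shot start gate (StartGate and run_gate_demo() are hypothetical names); the same pattern works with pthread_cond_wait() called in a while loop that tests the flag.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// One-shot start gate: wait() blocks until open() has been called. Because
// waiters test open_, it does not matter whether a thread reaches wait()
// before or after open() runs.
class StartGate {
public:
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return open_; });
    }
    void open() {
        {
            std::lock_guard<std::mutex> lock(m_);
            open_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool open_ = false;
};

// Starts nthreads threads that all block on the gate; returns how many got through.
int run_gate_demo(int nthreads) {
    StartGate gate;
    std::mutex m;
    int started = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([&] {
            gate.wait();
            std::lock_guard<std::mutex> lock(m);
            ++started;
        });
    gate.open();  // some threads may not have reached wait() yet; they still pass
    for (auto& t : threads) t.join();
    return started;
}
```

run_gate_demo(4) should return 4 even if some threads reach wait() only after open() has already been called.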

Activate threads from the slowest or from the faster?

I have an application on Linux on an i7 using boost::thread, and currently I have about 10 threads (across the 8 cores) that run simultaneously doing image processing on images of sizes of approximately 240 x 180 to 960 x 720, so naturally the smaller images finish faster than the larger images. At some point in the future I may need to bump up the number of threads, so there will definitely be many more threads than cores.
So, is there a rule of thumb to decide in which order to start and wait for threads: fastest first to get the small tasks out of the way, or slowest first so they start sooner and therefore finish sooner? Or is my synchronisation code wonky?
For reference, my code is approximately this, where the lower-numbered threads are slower:
Globals
static boost::mutex mutexes[MAX_THREADS];
static boost::condition_variable_any conditions[MAX_THREADS];
Main thread
// Wake threads
for (int thread = 0; thread < MAX_THREADS; thread++)
{
    mutexes[thread].unlock();
    conditions[thread].notify_all();
}
// Wait for them to sleep again
for (int thread = 0; thread < MAX_THREADS; thread++)
{
    boost::unique_lock<boost::mutex> lock(mutexes[thread]);
}
Processing thread
static void threadFunction(int threadIdx)
{
    while (true)
    {
        boost::unique_lock<boost::mutex> lock(mutexes[threadIdx]);
        // DO PROCESSING
        conditions[threadIdx].notify_all();
        conditions[threadIdx].wait(lock);
    }
}
Thanks to the hints from the commenters and much Googling, I've completely reworked my code, which seems to be slightly faster without the mutexes.
Globals
// None now
Main thread
boost::asio::io_service ioService;
boost::thread_group threadpool;
{
    boost::asio::io_service::work work(ioService);
    for (size_t i = 0; i < boost::thread::hardware_concurrency(); i++)
        threadpool.create_thread(boost::bind(&boost::asio::io_service::run, &ioService));
    for (int thread = 0; thread < MAX_THREADS; thread++)
        ioService.post(std::bind(threadFunction, thread));
} // work is destroyed here, so run() returns once all posted tasks are done
threadpool.join_all();
Processing thread
static void threadFunction(int threadIdx)
{
    // DO PROCESSING
}
(I've made this a Community Wiki as it's not really my answer.)