Serial code execution in a multi-threaded program in C++

The question: Is it possible to guarantee that code executes in only one thread at a time in a multi-threaded program (or something that approximates this)?
Specifically: I have a controller M (which is a thread) and threads A, B and C. I would like M to be able to decide which thread is allowed to run. When that thread has finished (either finally or temporarily), control transfers back to M.
Why: Ideally I want A, B and C to execute their code in their own threads while the others are not running. This would let each thread keep its instruction pointer and stack while it is paused, and resume where it left off when the controller hands control back.
What I'm doing now: I've written some code which can actually do this, but I don't like it.
In pseudo-C:
//Controller M
//do some stuff
UnlockMutex(mutex);
do{}while(lockval==0);
LockMutex(mutex);
//continue with other stuff
//Thread A
//The controller currently has the mutex - will release it at UnlockMutex
LockMutex(mutex);
lockval=1;
//do stuff
UnlockMutex(mutex);
The reason why
do{}while(lockval==0);
is required is that when the mutex is unlocked, both A and M will continue. This hack ensures that A won't unlock the mutex before M can lock it again, which would allow A to retake the lock a second time and run again (it should only run once).
The do-while seems like overkill, but it does the job. So my question is: is there a better way?

Assuming you're running on Windows, you might try looking at Fibers. (See e.g. http://developer.amd.com/Pages/1031200677.aspx or just google "windows fibers".)
I suspect you're really looking for coroutines.
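To illustrate the coroutine suggestion on Linux, where Windows fibers are not available, here is a minimal sketch using the glibc ucontext calls getcontext, makecontext and swapcontext. The names controller_ctx, task_ctx, task_a, run_controller and the trace array are made up for this example. Note also that these calls are no longer part of current POSIX, so treat this as an illustration of the idea (one flow of control at a time, each with its own stack and resume point) rather than a portable implementation.

```c
#include <assert.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t controller_ctx;    /* saved state of the controller (M) */
static ucontext_t task_ctx;          /* saved state of the task (A) */
static char task_stack[STACK_SIZE];  /* private stack for the task */
static int trace[8];                 /* order of execution: 0 = M, 1 and 2 = A */
static int trace_len = 0;

/* Task A: does one slice of work, yields to M, and later resumes here. */
static void task_a(void) {
    trace[trace_len++] = 1;                   /* first slice of work */
    swapcontext(&task_ctx, &controller_ctx);  /* pause: give control back to M */
    trace[trace_len++] = 2;                   /* resumed exactly where it paused */
    /* returning from the function switches to uc_link, i.e. back to M */
}

/* Controller M: decides when A runs. Returns the number of recorded steps. */
static int run_controller(void) {
    trace_len = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &controller_ctx;       /* where to go when task_a returns */
    makecontext(&task_ctx, task_a, 0);

    trace[trace_len++] = 0;
    swapcontext(&controller_ctx, &task_ctx);  /* run A until it yields */
    trace[trace_len++] = 0;
    swapcontext(&controller_ctx, &task_ctx);  /* resume A until it finishes */
    trace[trace_len++] = 0;
    return trace_len;
}
```

Calling run_controller() leaves 0, 1, 0, 2, 0 in trace: the controller and the task strictly alternate, and no mutex is needed because only one of them is ever executing.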

Look up "CriticalSection" in Win32.
C++11 offers a related facility, "lock_guard".
How do I make a critical section with Boost?
http://en.cppreference.com/w/cpp/thread/lock_guard
Your code
do{}while(lockval==0);
is a busy-wait and will burn CPU time for as long as it spins.

I presume you are coding C++ under Linux and using the pthread API.
Here is some code; it is not very robust, but it is a good starting point. I hope it is useful to you.
Build the executable with "g++ test_controller_thread.cpp -pthread -o test_controller_thread".
// 3 threads: one controller and two workers (worker1 and worker2).
// Only one thread can proceed at any time.
// We use one pthread_mutex_t and two pthread_cond_t to guarantee this.
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t g_controller_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t g_worker_cond = PTHREAD_COND_INITIALIZER;
void* controller_func(void *arg) {
    printf("entering the controller thread.\n");
    // limit the max time the controller can run
    int max_run_time = 5;
    int run_time = 0;
    pthread_mutex_lock(&g_mutex);
    while (run_time++ < max_run_time) {
        printf("controller is waiting.\n");
        // releases g_mutex while blocked, re-acquires it before returning
        pthread_cond_wait(&g_controller_cond, &g_mutex);
        printf("controller is woken up.\n");
        pthread_cond_signal(&g_worker_cond);
        printf("signal worker to wake up.\n");
    }
    pthread_mutex_unlock(&g_mutex);
    return NULL;
}
void* worker_func(void *arg) {
    int work_id = *(int*)arg;
    printf("worker %d start.\n", work_id);
    pthread_mutex_lock(&g_mutex);
    while (1) {
        printf("worker %d is waiting for controller.\n", work_id);
        pthread_cond_wait(&g_worker_cond, &g_mutex);
        printf("worker %d is working.\n", work_id);
        pthread_cond_signal(&g_controller_cond);
        printf("worker %d signal the controller.\n", work_id);
    }
    // not reached: the workers are cancelled from main()
    pthread_mutex_unlock(&g_mutex);
    return NULL;
}
int main() {
    pthread_t controller_thread, worker_thread_1, worker_thread_2;
    int worker_id_1 = 1;
    int worker_id_2 = 2;
    pthread_create(&controller_thread, NULL, controller_func, NULL);
    pthread_create(&worker_thread_1, NULL, worker_func, &worker_id_1);
    pthread_create(&worker_thread_2, NULL, worker_func, &worker_id_2);
    // give all threads time to reach their pthread_cond_wait
    sleep(1);
    printf("\nsignal the controller to start all the process.\n\n");
    pthread_cond_signal(&g_controller_cond);
    pthread_join(controller_thread, NULL);
    pthread_cancel(worker_thread_1);
    pthread_cancel(worker_thread_2);
    return 0;
}
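The program above waits on its condition variables without checking a predicate, so a signal sent before the other side is actually waiting is lost (hence the sleep(1) in main), and the controller cannot choose which worker runs. A possible refinement, offered only as a sketch, keeps an explicit turn variable under the mutex. The names turn, stop, work_done, run_one and run_schedule are invented for this example and are not part of the answer above.

```c
#include <assert.h>
#include <pthread.h>

#define CONTROLLER 0
#define NUM_WORKERS 3
#define ROUNDS 4

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int turn = CONTROLLER;          /* 0 = controller, 1..N = worker id */
static int stop = 0;                   /* set by the controller to end the workers */
static int work_done[NUM_WORKERS + 1]; /* steps executed by each worker */

static void *worker(void *arg) {
    int id = *(int *)arg;
    pthread_mutex_lock(&mtx);
    for (;;) {
        while (turn != id && !stop)        /* the predicate guards against lost */
            pthread_cond_wait(&cv, &mtx);  /* and spurious wakeups */
        if (stop)
            break;
        pthread_mutex_unlock(&mtx);
        work_done[id]++;                   /* "do stuff": nobody else runs now */
        pthread_mutex_lock(&mtx);
        assert(turn == id);                /* the turn is still ours */
        turn = CONTROLLER;                 /* hand control back to M */
        pthread_cond_broadcast(&cv);
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Controller side: let worker `id` run one step, block until it hands back. */
static void run_one(int id) {
    pthread_mutex_lock(&mtx);
    turn = id;
    pthread_cond_broadcast(&cv);
    while (turn != CONTROLLER)
        pthread_cond_wait(&cv, &mtx);
    pthread_mutex_unlock(&mtx);
}

/* Runs every worker ROUNDS times, one at a time; returns total steps done. */
static int run_schedule(void) {
    pthread_t th[NUM_WORKERS];
    int ids[NUM_WORKERS];
    int total = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i + 1;
        pthread_create(&th[i], NULL, worker, &ids[i]);
    }
    for (int r = 0; r < ROUNDS; r++)
        for (int id = 1; id <= NUM_WORKERS; id++)
            run_one(id);
    pthread_mutex_lock(&mtx);
    stop = 1;                              /* tell all workers to leave */
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(th[i], NULL);
        total += work_done[i + 1];
    }
    return total;
}
```

Because every wait re-checks turn, it does not matter whether a thread starts waiting before or after the other side signals, so no startup sleep is needed. run_schedule() is meant to be called once per process, since stop is never reset.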

Related

Port program that uses CreateEvent and WaitForMultipleObjects to Linux

I need to port a multiprocess application that uses the Windows API functions SetEvent, CreateEvent and WaitForMultipleObjects to Linux. I have found many threads concerning this issue, but none of them provided a reasonable solution for my problem.
I have an application that forks into three processes and manages the thread worker pool of one process via these events.
I considered several solutions to this issue. One was to create FIFO special files on Linux using mkfifo and use a select statement to wake the threads. The problem is that this solution behaves differently from WaitForMultipleObjects. For example, if 10 threads of the worker pool wait for the event and I call SetEvent five times, exactly five worker threads wake up and do the work. With the FIFO variant on Linux, every thread that is in the select statement waiting for data on the FIFO would wake up. The best way to describe this is that the Windows API works like a global semaphore with a count of one.
I also thought about using pthreads and condition variables to recreate this and sharing the variables via shared memory (shm_open and mmap), but I run into the same issue here!
What would be a reasonable way to recreate this behaviour on Linux? I found some solutions for doing this inside a single process, but what about doing it between multiple processes?
Any ideas are appreciated (Note: I do not expect a full implementation, I just need some more ideas to get myself started with this problem).

You could use a semaphore (sem_init); semaphores work in shared memory. There are also named semaphores (sem_open) if you want to initialize them from different processes. If you need to exchange messages with the workers, e.g. to pass the actual tasks to them, then one way to do this is to use POSIX message queues. They are named and work between processes. Here's a short example. Note that only the first worker thread actually creates the message queue; the others open the existing one with its existing attributes. Also, the queue may persist until it is explicitly removed using mq_unlink, which I skipped here for simplicity.
Receiver with worker threads:
// Link with -lrt -pthread
#include <fcntl.h>
#include <mqueue.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
void *receiver_thread(void *param) {
    // mq_flags, mq_maxmsg, mq_msgsize, mq_curmsgs
    struct mq_attr mq_attrs = { 0, 10, 254, 0 };
    mqd_t mq = mq_open("/myqueue", O_RDONLY | O_CREAT, 00644, &mq_attrs);
    if(mq == (mqd_t)-1) {
        perror("mq_open");
        return NULL;
    }
    char msg_buf[255];
    unsigned prio;
    while(1) {
        // blocks until a message arrives
        ssize_t msg_len = mq_receive(mq, msg_buf, sizeof(msg_buf), &prio);
        if(msg_len < 0) {
            perror("mq_receive");
            break;
        }
        msg_buf[msg_len] = 0;
        printf("[%lu] Received: %s\n", (unsigned long)pthread_self(), msg_buf);
        sleep(2);
    }
    mq_close(mq);
    return NULL;
}
int main() {
    pthread_t workers[5];
    for(int i=0; i<5; i++) {
        pthread_create(&workers[i], NULL, &receiver_thread, NULL);
    }
    // keep the process alive until a key is pressed
    getchar();
}
Sender:
#include <fcntl.h>
#include <stdio.h>
#include <mqueue.h>
#include <unistd.h>
int main() {
    mqd_t mq = mq_open("/myqueue", O_WRONLY);
    if(mq == (mqd_t)-1) {
        perror("mq_open");
        return 1; // the queue must already exist, so start the receiver first
    }
    char msg_buf[255];
    for(int i=0; i<255; i++) {
        int msg_len = snprintf(msg_buf, sizeof(msg_buf), "Message #%d", i);
        if(mq_send(mq, msg_buf, msg_len, 0) < 0) {
            perror("mq_send");
            break;
        }
        sleep(1);
    }
    mq_close(mq);
    return 0;
}
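The semaphore suggestion at the start of this answer can be sketched as follows, as a rough illustration rather than a drop-in replacement for the Windows events. An unnamed, process-shared semaphore lives in memory obtained with mmap, several forked workers block in sem_wait, and each sem_post releases exactly one of them. The struct shared_t, its fields event and done, and the function wake_some are hypothetical names for this example, and it assumes posts is not larger than workers.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <semaphore.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct {
    sem_t event;  /* one token per "SetEvent" call */
    sem_t done;   /* each woken worker posts this once */
} shared_t;

/* Forks `workers` children that block on the semaphore, posts it `posts`
 * times and returns how many children were actually woken. */
static int wake_some(int workers, int posts) {
    pid_t pids[16];
    int woken = 0;
    shared_t *sh = (shared_t *)mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    assert(sh != MAP_FAILED);
    assert(workers <= 16 && posts <= workers);
    sem_init(&sh->event, 1, 0);  /* second argument 1 = shared between processes */
    sem_init(&sh->done, 1, 0);

    for (int i = 0; i < workers; i++) {
        pids[i] = fork();
        assert(pids[i] >= 0);
        if (pids[i] == 0) {          /* worker process */
            sem_wait(&sh->event);    /* sleeps until a token is available */
            sem_post(&sh->done);     /* report: "I was woken and did my work" */
            _exit(0);
        }
    }

    for (int i = 0; i < posts; i++)
        sem_post(&sh->event);        /* each post releases exactly one waiter */
    for (int i = 0; i < posts; i++) {
        sem_wait(&sh->done);
        woken++;
    }
    /* All tokens are consumed, so nobody else can have reported in. */
    while (sem_trywait(&sh->done) == 0)
        woken++;

    for (int i = 0; i < workers; i++) {  /* stop the still-blocked workers */
        kill(pids[i], SIGTERM);
        waitpid(pids[i], NULL, 0);
    }
    sem_destroy(&sh->event);
    sem_destroy(&sh->done);
    munmap(sh, sizeof *sh);
    return woken;
}
```

With ten workers and five posts, wake_some(10, 5) reports five woken workers while the other five stay blocked, which is the behaviour the question asks for. For processes that are not related by fork, the same idea should work with shm_open or a named semaphore from sem_open instead of the anonymous mapping.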

Logging with asl layout on mac OS-X multi-threaded project

I'd like to convert all the log messages in my multi-threaded project to use the Apple System Log facility (asl).
According to the asl manual - https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man3/asl_get.3.html
When logging from multiple threads, each thread must open a separate client handle using asl_open.
For that reason, I've defined an asl client per thread to be used in all my log commands. However, I'm facing some major difficulties in binding an asl client to each asl_log command.
1. What if some of my asl log commands reside in code that is common to more than one thread? Which asl client should I use for such a message?
2. Even in thread-unique code, one has to consistently choose the same asl client in all log functions within a single thread's code scope (this is not always easy to determine in complex projects).
Is there an easier way to adapt my project's logging messages to use asl?
I'm thinking of something like binding an asl client to a thread.
Thanks

OK, so the best solution I've found so far is to create a global asl client variable that is thread-specific.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>
#include <asl.h>
#define NUMTHREADS 4
pthread_key_t glob_var_key;
void print_func(void) // take the thread-specific value and use it as this thread's aslclient
{
    asl_log(*((aslclient*) pthread_getspecific(glob_var_key)), NULL, ASL_LEVEL_NOTICE, "blablabla");
}
void* thread_func(void *arg)
{
    aslclient *p = malloc(sizeof(aslclient));
    // added tid to message format to distinguish between messages
    uint64_t tid;
    pthread_threadid_np(NULL, &tid);
    char tid_str[21]; // up to 20 digits for a 64-bit value plus the terminating NUL
    snprintf(tid_str, sizeof(tid_str), "%llu", (unsigned long long)tid);
    *p = asl_open(tid_str, "Facility", ASL_OPT_STDERR);
    pthread_setspecific(glob_var_key, p);
    print_func();
    sleep(1); // enable ctx switch
    print_func();
    pthread_setspecific(glob_var_key, NULL);
    asl_close(*p); // release the client handle opened above
    free(p);
    pthread_exit(NULL);
}
int main(void)
{
    pthread_t threads[NUMTHREADS];
    int i;
    pthread_key_create(&glob_var_key, NULL);
    for (i=0; i < NUMTHREADS; i++)
        pthread_create(&threads[i], NULL, thread_func, NULL);
    for (i=0; i < NUMTHREADS; i++)
        pthread_join(threads[i], NULL);
}
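One possible variation on the code above, sketched here without asl so that it runs anywhere pthreads is available: create the per-thread handle lazily the first time a thread logs, and register a destructor with pthread_key_create so the handle is released automatically when the thread exits. That way shared code never has to decide which client to use. The type log_handle and the functions my_log, log_msg, close_handle and run_threads are stand-ins invented for this example. With asl, the handle would hold the aslclient and the destructor would be the place to call asl_close.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread logging handle; stands in for an aslclient. */
typedef struct { int messages; } log_handle;

static pthread_key_t log_key;
static pthread_once_t log_key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int handles_closed = 0;   /* how many destructors have run */
static int total_messages = 0;   /* messages counted when handles are closed */

/* Runs automatically at thread exit for every non-NULL value of log_key. */
static void close_handle(void *p) {
    log_handle *h = (log_handle *)p;
    pthread_mutex_lock(&stats_lock);
    handles_closed++;
    total_messages += h->messages;
    pthread_mutex_unlock(&stats_lock);
    free(h);
}

static void make_key(void) {
    pthread_key_create(&log_key, close_handle);
}

/* Returns the calling thread's handle, creating it on first use. */
static log_handle *my_log(void) {
    pthread_once(&log_key_once, make_key);   /* the key is created exactly once */
    log_handle *h = (log_handle *)pthread_getspecific(log_key);
    if (h == NULL) {
        h = (log_handle *)calloc(1, sizeof *h);
        assert(h != NULL);
        pthread_setspecific(log_key, h);
    }
    return h;
}

/* Can be called from code shared by any number of threads. */
static void log_msg(const char *text) {
    (void)text;              /* a real version would hand text to the logger */
    my_log()->messages++;
}

static void *thread_body(void *arg) {
    (void)arg;
    log_msg("first");
    log_msg("second");
    return NULL;             /* close_handle runs as the thread exits */
}

/* Starts n threads (n <= 8), waits for them, returns the handles closed. */
static int run_threads(int n) {
    pthread_t th[8];
    assert(n <= 8);
    for (int i = 0; i < n; i++)
        pthread_create(&th[i], NULL, thread_body, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(th[i], NULL);
    return handles_closed;
}
```

Here no thread has to remember to clean up: with four threads that each log twice, four handles are closed and eight messages are counted by the time the joins return.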

Detached pthreads and memory leak

Can somebody please explain to me why this simple code leaks memory?
I believe that since the pthreads are created in the detached state, their resources should be released immediately after they terminate, but that is not the case.
My environment is Qt5.2.
#include <QCoreApplication>
#include <windows.h>
#include <pthread.h>
#include <stdio.h>
void *threadFunc( void *arg )
{
    printf("#");
    pthread_exit(NULL);
}
int main()
{
    pthread_t thread;
    pthread_attr_t attr;
    while(1)
    {
        printf("\nStarting threads...\n");
        for(int idx=0;idx<100;idx++)
        {
            pthread_attr_init(&attr);
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
            pthread_create( &thread, &attr, &threadFunc, NULL);
            pthread_attr_destroy ( &attr );
        }
        printf("\nSleeping 10 seconds...\n");
        Sleep(10000);
    }
}
UPDATE:
I discovered that if I add a slight delay of 5 milliseconds inside the for loop the leak is WAY slower:
for(int idx=0;idx<100;idx++)
{
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create( &thread, &attr, &threadFunc, NULL);
pthread_attr_destroy ( &attr );
Sleep(5); /// <--- 5 MILLISECONDS DELAY ///
}
This is freaking me out, could somebody please tell me what is happening? How can this slight delay produce such a significant change (or alter the behavior in any way)?
Any advice would be greatly appreciated.
Thanks.
UPDATE2:
This leak was observed on Windows platforms (W7 and XP); no leak was observed on Linux platforms (thank you @MichaelGoren)

I checked the program with slight modifications on Windows using Cygwin, and memory consumption was steady. So it must be a Qt issue; the pthread library on Cygwin works fine without leaking.
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
void *threadFunc( void *arg )
{
printf("#");
pthread_exit(NULL);
}
int main()
{
pthread_t thread;
pthread_attr_t attr;
int idx;
while(1)
{
printf("\nStarting threads...\n");
for(idx=0;idx<100;idx++)
{
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create( &thread, &attr, &threadFunc, NULL);
pthread_attr_destroy ( &attr );
}
printf("\nSleeping 10 seconds...\n");
//Sleep(10000);
sleep(10);
}
}

Compiler optimizations or the OS itself can decide to do loop unrolling. That is, your for loop has a constant bound (100 here). Since there is no explicit synchronization to prevent it, a newly created, detached thread can die and have its thread ID reassigned to another new thread before its creator returns from pthread_create() due to this unrolling. The next iteration has already started before the thread was actually destroyed.
This also explains why your added slight delay causes fewer issues: one iteration takes longer, so the thread functions can actually finish in more cases and the threads are actually terminated most of the time.
A possible fix would be to disable compiler optimizations, or to add synchronization: at the end of the code you check whether the thread still exists and, if it does, wait for its function to finish.
A trickier way would be to use mutexes: you let the thread claim a resource at creation, and by definition of PTHREAD_CREATE_DETACHED this resource is automatically released when the thread exits, so you can use try_lock to test whether the thread has actually finished. Note that I haven't tested this approach, so I'm not sure whether PTHREAD_CREATE_DETACHED actually works according to its definition...
Concept:
pthread_mutex_t mutex;
void *threadFunc( void *arg )
{
printf("#");
pthread_mutex_lock(&mutex);
pthread_exit(NULL);
}
for(int idx=0;idx<100;idx++)
{
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create( &thread, &attr, &threadFunc, NULL);
pthread_attr_destroy ( &attr );
pthread_mutex_lock(&mutex); //will block until "destroy" has released the mutex
pthread_mutex_unlock(&mutex);
}
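As an alternative sketch of the "add synchronization" idea, here is a version that uses a counting semaphore instead of a mutex. An ordinary (non-robust) POSIX mutex is not unlocked just because the thread that holds it exits, so the concept above would need extra work. The creator takes a slot before each pthread_create and every thread gives its slot back as its last action, which bounds how many detached threads can be alive in their body at once. MAX_IN_FLIGHT, slots, peak, finished and spawn_detached are names chosen for this example only.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define MAX_IN_FLIGHT 8

static sem_t slots;   /* free "thread slots" */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_flight = 0, peak = 0, finished = 0;

static void *thread_func(void *arg) {
    (void)arg;
    pthread_mutex_lock(&count_lock);
    in_flight++;
    if (in_flight > peak)
        peak = in_flight;
    pthread_mutex_unlock(&count_lock);

    /* ... the real work of the thread would go here ... */

    pthread_mutex_lock(&count_lock);
    in_flight--;
    finished++;
    pthread_mutex_unlock(&count_lock);
    sem_post(&slots);   /* last action: give the slot back */
    return NULL;
}

/* Creates `total` detached threads, never more than MAX_IN_FLIGHT at a
 * time, and returns how many of them ran to completion. */
static int spawn_detached(int total) {
    pthread_attr_t attr;
    pthread_t thread;
    sem_init(&slots, 0, MAX_IN_FLIGHT);
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    for (int i = 0; i < total; i++) {
        sem_wait(&slots);   /* blocks while MAX_IN_FLIGHT threads hold slots */
        if (pthread_create(&thread, &attr, thread_func, NULL) != 0)
            sem_post(&slots);   /* creation failed: return the slot */
    }
    pthread_attr_destroy(&attr);
    /* Drain: once every slot is back, every thread has finished its body. */
    for (int i = 0; i < MAX_IN_FLIGHT; i++)
        sem_wait(&slots);
    return finished;
}
```

A thread still exists for a brief moment after its sem_post, so this limits the backlog of unfinished threads rather than proving that each one has been fully destroyed. Whether it changes the memory growth seen on Windows is something that would have to be measured.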

The delay can induce a large change in behavior because it gives the thread time to exit! Of course how your pthread library is implemented is also a factor here. I suspect it is using a 'free list' optimization.
If you create 1000 threads all at once, then the library allocates memory for them all before any significant number of those threads can exit.
If as in your second code sample you let the previous thread run and probably exit before you start a new thread, then your thread library can reuse that thread's allocated memory or data structures which it now knows are no longer needed and it is now probably holding in a free list just in case someone creates a thread again and it can efficiently recycle the memory.

It has nothing to do with compiler optimisations. Code is fine. Problem could be
a) Windows itself.
b) Qt implementation of pthread_create() with detached attributes
Checking for (a): Try to create many fast detached threads using Windows _beginthreadex directly and see if you get the same picture. Note: call CloseHandle(thread_handle) as soon as _beginthreadex returns to make the thread detached.
Checking for (b): Trace which function Qt uses to create threads. If it is _beginthread then there is your answer. If it is _beginthreadex, then Qt is doing the right thing and you need to check whether Qt closes the thread handle immediately. If it does not, then that is the cause.
cheers
UPDATE 2
Qt5.2.0 does not provide a pthreads API and is unlikely to be responsible for the observed leak.
I wrapped native windows api to see how the code runs without pthread library. You can include this fragment right after includes:
#include <process.h>
#include <stdlib.h>
#include <errno.h>
#define PTHREAD_CREATE_JOINABLE 0
#define PTHREAD_CREATE_DETACHED 1
typedef struct { int detachstate; } pthread_attr_t;
typedef HANDLE pthread_t;
_declspec(noreturn) void pthread_exit(void *retval)
{
static_assert(sizeof(unsigned) == sizeof(void*), "Modify code");
_endthreadex((unsigned)retval);
}
int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate)
{
attr->detachstate = detachstate;
return 0;
}
int pthread_attr_init(pthread_attr_t *attr)
{
attr->detachstate = PTHREAD_CREATE_JOINABLE;
return 0;
}
int pthread_attr_destroy(pthread_attr_t *attr)
{
(void)attr;
return 0;
}
typedef struct {
void *(*start_routine)(void *arg);
void *arg;
} winapi_caller_args;
unsigned __stdcall winapi_caller(void *arglist)
{
winapi_caller_args *list = (winapi_caller_args *)arglist;
void *(*start_routine)(void *arg) = list->start_routine;
void *arg = list->arg;
free(list);
static_assert(sizeof(unsigned) == sizeof(void*), "Modify code");
return (unsigned)start_routine(arg);
}
int pthread_create( pthread_t *thread, pthread_attr_t *attr,
void *(*start_routine)(void *), void *arg)
{
winapi_caller_args *list;
list = (winapi_caller_args *)malloc(sizeof *list);
if (list == NULL)
return EAGAIN;
list->start_routine = start_routine;
list->arg = arg;
*thread = (HANDLE)_beginthreadex(NULL, 0, winapi_caller, list, 0, NULL);
if (*thread == 0) {
free(list);
return errno;
}
if (attr->detachstate == PTHREAD_CREATE_DETACHED)
CloseHandle(*thread);
return 0;
}
With the Sleep() line commented out it works OK without leaks. Run time = 1hr approx.
If the code with the Sleep line commented out calls the Pthreads-win32 2.9.1 library (prebuilt for MSVC), then the program stops spawning new threads and stops responding after 5..10 minutes.
Test environment: XP Home, MSVC 2010 Express, Qt5.2.0 qmake etc.

You forgot to join your threads (even if they have already finished).
Correct code should be:
pthread_t arr[100];
for(int idx=0;idx<100;idx++)
{
    pthread_attr_init(&attr);
    // threads must be joinable (the default) for pthread_join to be valid
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_create( &arr[idx], &attr, &threadFunc, NULL);
    pthread_attr_destroy ( &attr );
}
Sleep(2000);
for(int idx=0;idx<100;idx++)
{
    pthread_join(arr[idx], NULL);
}
Note from man page:
Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread". Avoid doing this, since each zombie thread consumes some system resources, and when enough zombie threads have accumulated, it will no longer be possible to create new threads (or processes).

using libev with multiple threads

I want to use libev with multiple threads to handle TCP connections. What I want is:
The main thread listens for incoming connections, accepts them, and forwards each connection to a worker thread.
I have a pool of worker threads. The number of threads depends on the number of CPUs. Each worker thread has its own event loop and watches its TCP sockets for readability and writability.
I looked into the libev documentation and I know this can be done with libev, but I can't find any example of how to do it.
Does anyone have an example?
I think I have to use the ev_loop_new() API for the worker threads and ev_default_loop() for the main thread?
Regards

The following code can be extended to multiple threads
//This program is demo for using pthreads with libev.
//Try using Timeout values as large as 1.0 and as small as 0.000001
//and notice the difference in the output
//(c) 2009 debuguo
//(c) 2013 enthusiasticgeek for stack overflow
//Free to distribute and improve the code. Leave credits intact
#include <ev.h>
#include <stdio.h> // for puts
#include <stdlib.h>
#include <pthread.h>
pthread_mutex_t lock;
double timeout = 0.00001;
ev_timer timeout_watcher;
int timeout_count = 0;
ev_async async_watcher;
int async_count = 0;
struct ev_loop* loop2;
void* loop2thread(void* args)
{
printf("Inside loop 2"); // Here one could initiate another timeout watcher
ev_loop(loop2, 0); // similar to the main loop - call it say timeout_cb1
return NULL;
}
static void async_cb (EV_P_ ev_async *w, int revents)
{
//puts ("async ready");
pthread_mutex_lock(&lock); //Don't forget locking
++async_count;
printf("async = %d, timeout = %d \n", async_count, timeout_count);
pthread_mutex_unlock(&lock); //Don't forget unlocking
}
static void timeout_cb (EV_P_ ev_timer *w, int revents) // Timer callback function
{
//puts ("timeout");
if (!ev_async_pending(&async_watcher)) { //only send if no earlier ev_async_send is still waiting to be handled by loop2
ev_async_send(loop2, &async_watcher); //Sends/signals/activates the given ev_async watcher, that is, feeds an EV_ASYNC event on the watcher into the event loop.
}
pthread_mutex_lock(&lock); //Don't forget locking
++timeout_count;
pthread_mutex_unlock(&lock); //Don't forget unlocking
w->repeat = timeout;
ev_timer_again(loop, &timeout_watcher); //Start the timer again.
}
int main (int argc, char** argv)
{
if (argc < 2) {
puts("Timeout value missing.\n./demo <timeout>");
return -1;
}
timeout = atof(argv[1]);
struct ev_loop *loop = EV_DEFAULT; //or ev_default_loop (0);
//Initialize pthread
pthread_mutex_init(&lock, NULL);
pthread_t thread;
// This loop sits in the pthread
loop2 = ev_loop_new(0);
//The ev_async watcher is what lets this thread wake up loop2, which runs in the other thread.
//ev_async_send is the call that is safe to make from a different thread.
ev_async_init(&async_watcher, async_cb);
ev_async_start(loop2, &async_watcher);
pthread_create(&thread, NULL, loop2thread, NULL);
ev_timer_init (&timeout_watcher, timeout_cb, timeout, 0.); // Non repeating timer. The timer starts repeating in the timeout callback function
ev_timer_start (loop, &timeout_watcher);
// now wait for events to arrive
ev_loop(loop, 0);
//Wait on threads for execution
pthread_join(thread, NULL);
pthread_mutex_destroy(&lock);
return 0;
}

Using libev within different threads at the same time is fine as long as each of them runs its own loop[1].
The C++ wrapper in libev (ev++.h) always uses the default loop instead of letting you specify which one you want to use. You should use the C header (ev.h) instead, which allows you to specify which loop to use (e.g. ev_io_start takes a pointer to an ev_loop but ev::io::start doesn't).
You can signal another thread's ev_loop safely through ev_async.
[1]http://doc.dvgu.ru/devel/ev.html#threads_and_coroutines

Thread - synchronizing and sleeping thread refuses to wake up (LINUX)

I'm developing an application for openSUSE 12.1.
This application has a main thread and two other threads running instances of the same function. I'm trying to use pthread_barrier to synchronize all the threads, but I'm having some problems:
When I put the derived threads to sleep, they never wake up for some reason.
When I remove the sleep from the other threads (which sends CPU usage sky-high), at some point all the threads reach pthread_barrier_wait() but none of them continues execution after that.
Here's some pseudo code to illustrate what I'm doing.
pthread_barrier_t barrier;
int main(void)
{
pthread_barrier_init(&barrier, NULL , 3);
pthread_create(&thread_id1, NULL,&thread_func, (void*) &params1);
pthread_create(&thread_id2, NULL,&thread_func, (void*) &params2);
while(1)
{
doSomeWork();
nanosleep(&t1, &t2);
pthread_barrier_wait(&barrier);
doSomeMoreWork();
}
}
void *thread_func(void *params)
{
init_thread(params);
while(1)
{
nanosleep(&t1, &t2);
doAnotherWork();
pthread_barrier_wait(&barrier);
}
}

I don't think it has to do with the barrier as you've presented it in the pseudocode. I'm assuming that your glibc is approximately the same as the one on my machine. I compiled roughly your pseudocode and it runs as I expect: the threads do some work, the main thread does some work, they all reach the barrier and then loop.
Can you say more about any other synchronization methods you use, or about what the work functions do?
This is the example program I'm using:
#include <pthread.h>
#include <stdio.h>
#include <time.h>
struct timespec req = {1,0}; //{.tv_sec = 1, .tv_nsec = 0};
struct timespec rem = {0,0}; //{.tv_sec = 0, .tv_nsec = 0};
pthread_barrier_t barrier;
void *thread_func(void *params) {
long int name;
name = (long int)params;
while(1) {
printf("This is thread %ld\n", name);
nanosleep(&req, &rem);
pthread_barrier_wait(&barrier);
printf("More work from %ld\n", name);
}
}
int main(void)
{
pthread_t th1, th2;
pthread_barrier_init(&barrier, NULL , 3);
pthread_create(&th1, NULL, &thread_func, (void*)1);
pthread_create(&th2, NULL, &thread_func, (void*)2);
while(1) {
nanosleep(&req, &rem);
printf("This is the parent\n\n");
pthread_barrier_wait(&barrier);
}
return 0;
}

I would suggest using condition variables to synchronize the threads.
Here is a website that explains how to do it; I hope it helps.
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
