Pthread program runs slower as thread count increases - c++

I'm a beginner in parallel programming and I tried to write a parallel program with the pthread library. I ran the program on an 8-processor computer. The problem is that when I increase NumProcs, each thread slows down even though each thread's task stays the same. Can someone help me figure out what is happening?
#include <pthread.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#define MAX_NUMP 16
using namespace std;
int NumProcs;
pthread_mutex_t SyncLock; /* mutex */
pthread_cond_t SyncCV; /* condition variable */
int SyncCount; /* number of processors at the barrier so far */
pthread_mutex_t ThreadLock; /* mutex */
// used only in solaris. use clock_gettime in linux
//hrtime_t StartTime;
//hrtime_t EndTime;
struct timespec StartTime;
struct timespec EndTime;
void Barrier()
{
int ret;
pthread_mutex_lock(&SyncLock); /* Get the thread lock */
SyncCount++;
if(SyncCount == NumProcs) {
ret = pthread_cond_broadcast(&SyncCV);
assert(ret == 0);
} else {
/* loop to guard against spurious wakeups */
while(SyncCount < NumProcs) {
ret = pthread_cond_wait(&SyncCV, &SyncLock);
assert(ret == 0);
}
}
pthread_mutex_unlock(&SyncLock);
}
/* The function which is called once the thread is allocated */
void* ThreadLoop(void* tmp)
{
/* each thread has a private version of local variables */
long threadId = (long) tmp;
int ret;
clock_t startTime, endTime; /* clock() returns clock_t */
int count=0;
/* ********************** Thread Synchronization*********************** */
Barrier();
/* ********************** Execute Job ********************************* */
startTime = clock();
for(int i=0;i<65536;i++)
for(int j=0;j<1024;j++)
count++;
endTime = clock();
printf("threadid:%ld, time:%ld\n", threadId, (long)(endTime - startTime));
return NULL;
}
int main(int argc, char** argv)
{
pthread_t* threads;
pthread_attr_t attr;
int ret;
int dx;
if(argc != 2) {
fprintf(stderr, "USAGE: %s <numProcessors>\n", argv[0]);
exit(-1);
}
assert(argc == 2);
NumProcs = atoi(argv[1]);
assert(NumProcs > 0 && NumProcs <= MAX_NUMP);
/* Initialize array of thread structures */
threads = (pthread_t *) malloc(sizeof(pthread_t) * NumProcs);
assert(threads != NULL);
/* Initialize thread attribute */
pthread_attr_init(&attr);
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM); // sys manages contention
/* Initialize mutexs */
ret = pthread_mutex_init(&SyncLock, NULL);
assert(ret == 0);
ret = pthread_mutex_init(&ThreadLock, NULL);
assert(ret == 0);
/* Init condition variable */
ret = pthread_cond_init(&SyncCV, NULL);
assert(ret == 0);
SyncCount = 0;
/* get high resolution timer, timer is expressed in nanoseconds, relative
* to some arbitrary time.. so to get delta time must call gethrtime at
* the end of operation and subtract the two times.
*/
//StartTime = gethrtime();
ret = clock_gettime(CLOCK_MONOTONIC, &StartTime);
for(dx=0; dx < NumProcs; dx++) {
/* ************************************************************
* pthread_create takes 4 parameters
* p1: threads(output)
* p2: thread attribute
* p3: start routine, where new thread begins
* p4: arguments to the thread
* ************************************************************ */
ret = pthread_create(&threads[dx], &attr, ThreadLoop, (void*)(long) dx);
assert(ret == 0);
}
/* Wait for each of the threads to terminate */
for(dx=0; dx < NumProcs; dx++) {
ret = pthread_join(threads[dx], NULL);
assert(ret == 0);
}
//EndTime = gethrtime();
ret = clock_gettime(CLOCK_MONOTONIC, &EndTime);
printf("Time = %ld nanoseconds\n", EndTime.tv_nsec - StartTime.tv_nsec);
pthread_mutex_destroy(&ThreadLock);
pthread_mutex_destroy(&SyncLock);
pthread_cond_destroy(&SyncCV);
pthread_attr_destroy(&attr);
return 0;
}

Your observation is expected.
The main factors that usually impact this situation (worker spinning on local computation) are:
The ratio nb_threads / nb_available_machine_cores
The affinity of each thread
The optimal scenario here is when you have a ratio of 1 and each thread has a unique affinity with one of the cores.
The idea is to maximize each core throughput. You can do that by having one and only one thread running on each core. If you increase the number of threads (ratio > 1), several threads will share the same core, forcing the kernel (through the task scheduler) to switch between the execution of each of them. This is what you were observing.
Each time the kernel has to perform such a switch, you pay for a context switch. This can become a noticeable overhead.
Note:
You can use pthread_setaffinity_np to set the affinity of your threads.
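For example, a minimal Linux-specific sketch using pthread_setaffinity_np (non-portable; the helper name and the core count of 8 are assumptions matching the machine described in the question):
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single core (Linux only). */
static int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Each worker could call pin_to_core(threadId % 8) at the top of ThreadLoop(). */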

If you are running this in release mode (the -O3 compiler flag), then there are two things wrong with ThreadLoop():
1) There is never any external usage of the 'count' result, so the compiler will omit computing it because it has no visible effect.
2) Even if there were external usage of 'count', the compiler would compute the result at compile time and simply emit the value directly.
You can see all this if you disassemble the binary.
You can declare 'volatile int count' to bypass both problems, or you can compile with the -O1 compiler flag, or do both.
The loop should scale pretty linearly with the number of threads because there is no memory contention. By the way, you should increase the loop iterations because I think the duration could be close to the noise floor...
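A minimal sketch of that change inside ThreadLoop() (same iteration counts as the original; volatile keeps the compiler from folding the loop away or precomputing the result):
volatile int count = 0;   /* volatile: the stores must actually happen */
startTime = clock();
for (int i = 0; i < 65536; i++)
    for (int j = 0; j < 1024; j++)
        count++;
endTime = clock();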

Related

ESP32: Attaching an interrupt directly to system time

Currently I'm setting a separate hardware timer to the system time periodically to trigger timed interrupts. It's working fine, but for elegance's sake I wondered if it was possible to attach an interrupt directly to the system time.
The events are pretty fast: one every 260 microseconds.
The ESP32 has a few clocks used for system time. The default full-power clock is an 80 MHz clock called APB_CLK, but even the slow RTC clock has 6.6667 μs resolution. (Documentation here: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/system_time.html)
I have a GPS module that I use to update the system time periodically using adjtime(3). The advantage of that is that it gradually adjusts the system time monotonically. Also, system time calls are thread-safe.
I'm using the Arduino IDE, so my knowledge of accessing registers and interrupts directly is poor. Here's a semi-boiled-down version of what I'm doing: bit-banging a synchronized digital signal, rotating through 160-bit pages that are prepped on the other core. It's not all of my code, so something unimportant might be missing:
#define CPU_SPEED 40
hw_timer_t* timer = NULL;
PageData pages[2];
PageData* timerCurrentPage = &pages[0];
PageData* loopCurrentPage = &pages[1];
TaskHandle_t prepTaskHandle;
volatile int bitCount = 0;
void IRAM_ATTR onTimer() {
int level = timerCurrentPage->data[bitCount];
dac_output_voltage(DAC_CHANNEL_1, level?high:low);
bitCount++;
if(bitCount<160) {
timerAlarmWrite(timer, (timerCurrentPage->startTick+timerCurrentPage->ticksPerPage*bitCount), false);
} else {
if(timerCurrentPage == &pages[0]) timerCurrentPage = &pages[1];
else timerCurrentPage = &pages[0];
bitCount = 0;
timerAlarmWrite(timer, (timerCurrentPage->startTick), false);
vTaskResume(prepTaskHandle);
}
}
uint64_t nowTick() {
timeval timeStruct;
gettimeofday(&timeStruct, NULL);
uint64_t result = (uint64_t)timeStruct.tv_sec*1000000UL + (uint64_t)timeStruct.tv_usec;
return result;
}
void gpsUpdate(uint64_t micros) {
int64_t now = nowTick();
int64_t offset = micros - now;
timeval adjustStruct = {0,offset};
adjtime(&adjustStruct,NULL);
}
void setup() {
setCpuFrequencyMhz(CPU_SPEED);
timer = timerBegin(0, CPU_SPEED, true);
timerWrite(timer, nowTick());
timerAttachInterrupt(timer, &onTimer, true);
setPage(&pages[0]);
xTaskCreatePinnedToCore(
prepLoop, /* Task function. */
"Prep Task", /* name of task. */
10000, /* Stack size of task */
NULL, /* parameter of the task */
1, /* priority of the task */
&prepTaskHandle, /* Task handle to keep track of created task */
1); /* pin task to core 1 */
timerAlarmWrite(timer, (timerCurrentPage->startTick), false);
}
//On Core 1
void prepLoop(void* param) {
while(1) {
vTaskSuspend(NULL); //prepTaskHandle
timerWrite(timer, nowTick());
if(loopCurrentPage == &pages[0]) loopCurrentPage = &pages[1];
else loopCurrentPage = &pages[0];
setPage(loopCurrentPage);
}
}

Problem with multi-threading and waiting on events

I have a problem with my code:
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <windows.h>
#include <string.h>
#include <math.h>
HANDLE event;
HANDLE mutex;
int runner = 0;
DWORD WINAPI thread_fun(LPVOID lpParam) {
int* data = (int*)lpParam;
for (int j = 0; j < 4; j++) { //this loop necessary in order to reproduce the issue
if ((data[2] + 1) == data[0]) { // if it is last thread
while (1) {
WaitForSingleObject(mutex, INFINITE);
if (runner == data[0] - 1) { // if all other thread reach event break
ReleaseMutex(mutex);
break;
}
printf("Run:%d\n", runner);
ReleaseMutex(mutex);
Sleep(10);
}
printf("Check Done:<<%d>>\n", data[2]);
runner = 0;
PulseEvent(event); // let all other threads continue
}
else { // if it is not last thread
WaitForSingleObject(mutex, INFINITE);
runner++;
ReleaseMutex(mutex);
printf("Wait:<<%d>>\n", data[2]);
WaitForSingleObject(event, INFINITE); // wait till all other threads reach this stage
printf("Exit:<<%d>>\n", data[2]);
}
}
return 0;
}
int main()
{
event = CreateEvent(NULL, TRUE, FALSE, NULL);
mutex = CreateMutex(NULL, FALSE, NULL);
SetEvent(event);
int data[3] = {2,8}; //0 amount of threads //1 amount of numbers
HANDLE t[10000];
int ThreadData[1000][3];
for (int i = 0; i < data[0]; i++) {
memcpy(ThreadData[i], data, sizeof(int) * 2); // copy amount of threads and amount of numbers to the threads data
ThreadData[i][2] = i; // creat threads id
LPVOID ThreadsData = (LPVOID)(&ThreadData[i]);
t[i] = CreateThread(0, 0, thread_fun, ThreadsData, 0, NULL);
if (t[i] == NULL)return 0;
}
while (1) {
DWORD res = WaitForMultipleObjects(data[0], t, true, 1000);
if (res != WAIT_TIMEOUT) break;
}
for (int i = 0; i < data[0]; i++)CloseHandle(t[i]); // close all threads
CloseHandle(event); // close event
CloseHandle(mutex); //close mutex
printf("Done");
}
The main idea is to wait until all threads except one reach the event and wait there; meanwhile, the last thread must release them from waiting.
But the code doesn't work reliably. One time in ten it ends correctly; nine times it just gets stuck in while(1). In different runs, the printf inside the loop (printf("Run:%d\n", runner);) prints different runner counts (0 and 3).
What can be the problem?
As we found out in the comments section, the problem was that although the event was created in the initial state of being non-signalled
event = CreateEvent(NULL, TRUE, FALSE, NULL);
it was being set to the signalled state immediately afterwards:
SetEvent(event);
Due to this, at least on the first iteration of the loop, when j == 0, the first worker thread wouldn't wait for the second worker thread, which caused a race condition.
Also, the following issues with your code are worth mentioning (although these issues were not the reason for your problem):
According to the Microsoft documentation on PulseEvent, that function should not be used, as it can be unreliable and is mainly provided for backward-compatibility. According to the documentation, you should use condition variables instead.
In your function thread_fun, the last thread locks and releases the mutex in a loop. This can be bad, because mutexes are not guaranteed to be fair, and it is possible that the other threads will never be able to acquire the mutex. Although this possibility is mitigated by your calling Sleep(10); once in every loop iteration, it is still not the ideal solution. A better solution would be to use a condition variable, so that the thread only checks for changes to the variable runner when another thread actually signals a possible change. Such a solution would also be better for performance reasons.
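To illustrate that suggestion, here is a minimal one-shot sketch of the same rendezvous using a Win32 CONDITION_VARIABLE with a CRITICAL_SECTION. The names (runnerLock, runnersReady, ArriveAndWait, ReleaseAll) are hypothetical, not taken from the code above, and reusing this across the outer j loop would additionally need a generation counter:
#include <windows.h>

CRITICAL_SECTION   runnerLock;      /* call InitializeCriticalSection(&runnerLock) at startup */
CONDITION_VARIABLE runnersReady;    /* call InitializeConditionVariable(&runnersReady) at startup */
int  runnerCount = 0;               /* hypothetical: how many worker threads have arrived */
BOOL released    = FALSE;           /* hypothetical: set once by the last thread */

/* Non-last threads: announce arrival, then sleep until released. */
void ArriveAndWait(void)
{
    EnterCriticalSection(&runnerLock);
    runnerCount++;
    WakeAllConditionVariable(&runnersReady);     /* let the last thread re-check the count */
    while (!released)                            /* loop guards against spurious wakeups */
        SleepConditionVariableCS(&runnersReady, &runnerLock, INFINITE);
    LeaveCriticalSection(&runnerLock);
}

/* Last thread: sleep until everyone else has arrived, then release them all. */
void ReleaseAll(int totalThreads)
{
    EnterCriticalSection(&runnerLock);
    while (runnerCount != totalThreads - 1)
        SleepConditionVariableCS(&runnersReady, &runnerLock, INFINITE);
    released = TRUE;
    WakeAllConditionVariable(&runnersReady);
    LeaveCriticalSection(&runnerLock);
}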

multithread list shared performance

I am developing an application that reads data from a named pipe on Windows 7 at around 800 Mbps. I have to develop it with several threads since the FIFO at the other side of the pipe overflows if I am not able to read at the given speed. The performance, though, is really pitiful and I cannot understand why. I have already read several things and tried to split the memory to avoid false sharing.
At the beginning I was thinking it could be a problem with contiguous memory positions, but the memory sections are queued in a list and the main thread no longer uses them after queuing them. The amounts of memory are huge, so I don't think they lie on the same pages or so.
This is the threaded function:
void splitMessage(){
char* bufferMSEO;
char* bufferMDO;
std::list<struct msgBufferStr*> localBufferList;
while(1)
{
long bytesProcessed = 0;
{
std::unique_lock<std::mutex> lk(bufferMutex);
while(bufferList.empty())
{
// Wait until the map has data
listReady.wait(lk);
}
//Extract the data from the list and copy to the local list
localBufferList.splice(localBufferList.end(),bufferList);
//Unlock the mutex and notify
// Manual unlocking is done before notifying, to avoid waking up
// the waiting thread only to block again (see notify_one for details)
lk.unlock();
//listReady.notify_one();
}
for(auto nextBuffer = localBufferList.begin(); nextBuffer != localBufferList.end(); nextBuffer++)
{
//nextBuffer = it->second();
bufferMDO = (*nextBuffer)->MDO;
bufferMSEO = (*nextBuffer)->MSEO;
bytesProcessed += (*nextBuffer)->size;
//Process the data Stream
for(int k=0; k<(*nextBuffer)->size; k++)
{
}
//localBufferList.remove(*nextBuffer);
free(bufferMDO);
free(bufferMSEO);
free(*nextBuffer);
}
localBufferList.clear();
}
}
And here the thread that reads the data and queue them:
DWORD WINAPI InstanceThread(LPVOID lpvParam)
// This routine is a thread processing function to read from and reply to a client
// via the open pipe connection passed from the main loop. Note this allows
// the main loop to continue executing, potentially creating more threads of
// of this procedure to run concurrently, depending on the number of incoming
// client connections.
{
HANDLE hHeap = GetProcessHeap();
TCHAR* pchRequest = (TCHAR*)HeapAlloc(hHeap, 0, BUFSIZE*sizeof(TCHAR));
DWORD cbBytesRead = 0, cbReplyBytes = 0, cbWritten = 0;
BOOL fSuccess = FALSE;
HANDLE hPipe = NULL;
double totalRxData = 0;
char* bufferPnt;
char* bufferMDO;
char* bufferMSEO;
char* destPnt;
// Do some extra error checking since the app will keep running even if this
// thread fails.
if (lpvParam == NULL)
{
printf( "\nERROR - Pipe Server Failure:\n");
printf( " InstanceThread got an unexpected NULL value in lpvParam.\n");
printf( " InstanceThread exitting.\n");
if (pchRequest != NULL) HeapFree(hHeap, 0, pchRequest);
return (DWORD)-1;
}
if (pchRequest == NULL)
{
printf( "\nERROR - Pipe Server Failure:\n");
printf( " InstanceThread got an unexpected NULL heap allocation.\n");
printf( " InstanceThread exitting.\n");
return (DWORD)-1;
}
// Print verbose messages. In production code, this should be for debugging only.
printf("InstanceThread created, receiving and processing messages.\n");
// The thread's parameter is a handle to a pipe object instance.
hPipe = (HANDLE) lpvParam;
try
{
msgSplitter = std::thread(&splitMessage);
//msgSplitter.detach();
}
catch(...)
{
_tprintf(TEXT("CreateThread failed, GLE=%d.\n"), GetLastError());
return -1;
}
while (1)
{
struct msgBufferStr *newBuffer = (struct msgBufferStr* )malloc(sizeof(struct msgBufferStr));
// Read client requests from the pipe. This simplistic code only allows messages
// up to BUFSIZE characters in length.
fSuccess = ReadFile(
hPipe, // handle to pipe
pchRequest, // buffer to receive data
BUFSIZE*sizeof(TCHAR), // size of buffer
&cbBytesRead, // number of bytes read
NULL); // not overlapped I/O
if (!fSuccess || cbBytesRead == 0)
{
if (GetLastError() == ERROR_BROKEN_PIPE)
{
_tprintf(TEXT("InstanceThread: client disconnected.\n"), GetLastError());
break;
}
else if (GetLastError() == ERROR_MORE_DATA)
{
}
else
{
_tprintf(TEXT("InstanceThread ReadFile failed, GLE=%d.\n"), GetLastError());
}
}
//timeStart = omp_get_wtime();
bufferPnt = (char*)pchRequest;
totalRxData += ((double)cbBytesRead)/1000000;
bufferMDO = (char*) malloc(cbBytesRead);
bufferMSEO = (char*) malloc(cbBytesRead/3);
destPnt = bufferMDO;
//#pragma omp parallel for
for(int i = 0; i < cbBytesRead/12; i++)
{
msgCounter++;
if(*(bufferPnt + (i * 12)) == 0) continue;
if(*(bufferPnt + (i * 12)) == 8)
{
errorCounter++;
continue;
}
//Use 64 bits variables in order to make less operations
unsigned long long *sourceAddrLong = (unsigned long long*) (bufferPnt + (i * 12));
unsigned long long *destPntLong = (unsigned long long*) (destPnt + (i * 8));
//Copy the data bytes from source to destination
*destPntLong = *sourceAddrLong;
//Copy and prepare the MSEO lines for the data processing
bufferMSEO[i*4]=(bufferPnt[(i * 12) + 8] & 0x03);
bufferMSEO[i*4 + 1]=(bufferPnt[(i * 12) + 8] & 0x0C) >> 2;
bufferMSEO[i*4 + 2]=(bufferPnt[(i * 12) + 8] & 0x30) >> 4;
bufferMSEO[i*4 + 3]=(bufferPnt[(i * 12) + 8] & 0xC0) >> 6;
}
newBuffer->size = cbBytesRead/3;
newBuffer->MDO = bufferMDO;
newBuffer->MSEO = bufferMSEO;
{
//lock the mutex
std::lock_guard<std::mutex> lk(bufferMutex);
//add data to the list
bufferList.push_back(newBuffer);
} // bufferMutex is automatically released when lk goes out of scope
//Notify
listReady.notify_one();
}
// Flush the pipe to allow the client to read the pipe's contents
// before disconnecting. Then disconnect the pipe, and close the
// handle to this pipe instance.
FlushFileBuffers(hPipe);
DisconnectNamedPipe(hPipe);
CloseHandle(hPipe);
HeapFree(hHeap, 0, pchRequest);
//Show memory leak isues
_CrtDumpMemoryLeaks();
//TODO: Join thread
printf("InstanceThread exitting.\n");
return 1;
}
The thing that really blows my mind is that if I leave it like this, the splitMessage thread takes minutes to process the data even though the first thread finished reading the data long ago. I mean, the read thread reads something like 1.5 GB of information in seconds and then waits for more data from the pipe. That data is then processed by the split thread (the only one really "doing" something) over almost a minute or more. Moreover, the CPU is less than 20% used. (It is an i7 laptop with 16 GB RAM and 8 cores!)
On the other hand, if I just comment out the for loop in the processing thread:
for(int k=0; k<(*nextBuffer)->size; k++)
Then the data are read slowly and the FIFO on the other side of the pipe overflows. With 8 processors at more than 2 GHz it should be fast enough to go through the buffers without many problems, shouldn't it? I think it has to be a memory access issue, or that the scheduler is somehow sending the thread to sleep, but I cannot figure out why! Another possibility is that the iteration through the linked list with the iterator is not optimal.
Any help would be great because I have been trying to understand this for a couple of days; I have made several changes in the code, tried to simplify it as much as possible, and I am going crazy :).
best regards,
Manuel

mutex / what is the mutex data being locked?

#include <pthread.h>
#include <time.h>
#include "errors.h"
typedef struct alarm_tag {
struct alarm_tag *link;
int seconds;
time_t time; /* seconds from EPOCH */
char message[64];
} alarm_t;
pthread_mutex_t alarm_mutex = PTHREAD_MUTEX_INITIALIZER;
alarm_t *alarm_list = NULL;
void *alarm_thread (void *arg)
{
alarm_t *alarm;
int sleep_time;
time_t now;
int status;
while (1) {
status = pthread_mutex_lock (&alarm_mutex);
if (status != 0)
err_abort (status, "Lock mutex");
alarm = alarm_list;
/*
* If the alarm list is empty, wait for one second. This
* allows the main thread to run, and read another
* command. If the list is not empty, remove the first
* item. Compute the number of seconds to wait -- if the
* result is less than 0 (the time has passed), then set
* the sleep_time to 0.
*/
if (alarm == NULL)
sleep_time = 1;
else {
alarm_list = alarm->link;
now = time (NULL);
if (alarm->time <= now)
sleep_time = 0;
else
sleep_time = alarm->time - now;
#ifdef DEBUG
printf ("[waiting: %d(%d)\"%s\"]\n", alarm->time,
sleep_time, alarm->message);
#endif
}
/*
* Unlock the mutex before waiting, so that the main
* thread can lock it to insert a new alarm request. If
* the sleep_time is 0, then call sched_yield, giving
* the main thread a chance to run if it has been
* readied by user input, without delaying the message
* if there's no input.
*/
status = pthread_mutex_unlock (&alarm_mutex);
if (status != 0)
err_abort (status, "Unlock mutex");
if (sleep_time > 0)
sleep (sleep_time);
else
sched_yield ();
/*
* If a timer expired, print the message and free the
* structure.
*/
if (alarm != NULL) {
printf ("(%d) %s\n", alarm->seconds, alarm->message);
free (alarm);
}
}
}
int main (int argc, char *argv[])
{
int status;
char line[128];
alarm_t *alarm, **last, *next;
pthread_t thread;
status = pthread_create (
&thread, NULL, alarm_thread, NULL);
if (status != 0)
err_abort (status, "Create alarm thread");
while (1) {
printf ("alarm> ");
if (fgets (line, sizeof (line), stdin) == NULL) exit (0);
if (strlen (line) <= 1) continue;
alarm = (alarm_t*)malloc (sizeof (alarm_t));
if (alarm == NULL)
errno_abort ("Allocate alarm");
/*
* Parse input line into seconds (%d) and a message
* (%64[^\n]), consisting of up to 64 characters
* separated from the seconds by whitespace.
*/
if (sscanf (line, "%d %64[^\n]",
&alarm->seconds, alarm->message) < 2) {
fprintf (stderr, "Bad command\n");
free (alarm);
} else {
status = pthread_mutex_lock (&alarm_mutex);
if (status != 0)
err_abort (status, "Lock mutex");
alarm->time = time (NULL) + alarm->seconds;
/*
* Insert the new alarm into the list of alarms,
* sorted by expiration time.
*/
last = &alarm_list;
next = *last;
while (next != NULL) {
if (next->time >= alarm->time) {
alarm->link = next;
*last = alarm;
break;
}
last = &next->link;
next = next->link;
}
/*
* If we reached the end of the list, insert the new
* alarm there. ("next" is NULL, and "last" points
* to the link field of the last item, or to the
* list header).
*/
if (next == NULL) {
*last = alarm;
alarm->link = NULL;
}
#ifdef DEBUG
printf ("[list: ");
for (next = alarm_list; next != NULL; next = next->link)
printf ("%d(%d)[\"%s\"] ", next->time,
next->time - time (NULL), next->message);
printf ("]\n");
#endif
status = pthread_mutex_unlock (&alarm_mutex);
if (status != 0)
err_abort (status, "Unlock mutex");
}
}
}
Hi, this is my code. Can anyone tell me, since the mutex is not declared inside the struct, what data is actually being changed when the mutex locks and unlocks? Can someone enlighten me?
Where is the set of data that is being protected by the mutex?
The mutex object is alarm_mutex. The data "protected" by it doesn't have to be explicitly mentioned in the code; that is, there doesn't need to be a semantic connection. A mutex is a low-level threading primitive, and as such the user needs to build their own logic around it. In your case, that one place in memory is used to block other parts of your code, those accessing the actual data, from interfering.
Think about it this way: std::atomic<int> x; expresses the atomicity of operations on it. int x; mutex m; requires every piece of code accessing x to properly use m to ensure the correctness of the program. This low-level access is what we're looking at in your example.
pthread_mutex_t alarm_mutex = PTHREAD_MUTEX_INITIALIZER; creates a shared mutex object, used for locking/unlocking.
pthread_mutex_lock locks the mutex as soon as it is available. It becomes unavailable for all other threads after this line is executed.
pthread_mutex_unlock unlocks the mutex, making it available again for other threads (so a pthread_mutex_lock blocked in another thread can now acquire it).
The mutex doesn't know what it is protecting. It is the programmer's job to know that and only change the data that it is protecting while the mutex is locked.
In this specific case it seems that the alarm list (alarm_list) is the data being protected.
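To make that convention concrete, here is a minimal sketch (not taken from the alarm program) where the only thing tying the mutex to the data is that every access agrees to lock it first:
#include <pthread.h>

pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
long shared_count = 0;   /* the "protected" data: protected purely by convention */

void increment(void)
{
    pthread_mutex_lock(&count_mutex);    /* every reader and writer locks first... */
    shared_count++;                      /* ...so this update cannot interleave badly */
    pthread_mutex_unlock(&count_mutex);
}

long read_count(void)
{
    long value;
    pthread_mutex_lock(&count_mutex);
    value = shared_count;
    pthread_mutex_unlock(&count_mutex);
    return value;
}
Nothing stops a third function from touching shared_count without the lock; the compiler won't complain, the program is simply wrong. That is exactly the sense in which the mutex "protects" the alarm list in your program.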

Thread - synchronizing and sleeping thread refuses to wake up (LINUX)

I'm developing an application for openSUSE 12.1.
This application has a main thread and two other threads running instances of the same function. I'm trying to use pthread_barrier to synchronize all threads, but I'm having some problems:
When I put the spawned threads to sleep, they never wake up for some reason.
(In the case where I remove the sleep from the other threads, CPU usage goes through the roof.) At some point all the threads reach pthread_barrier_wait(), but none of them continues execution after that.
Here's some pseudo code trying to illustrate what I'm doing.
pthread_barrier_t barrier;
int main(void)
{
pthread_barrier_init(&barrier, NULL , 3);
pthread_create(&thread_id1, NULL,&thread_func, (void*) &params1);
pthread_create(&thread_id2v, NULL,&thread_func, (void*) &params2);
while(1)
{
doSomeWork();
nanosleep(&t1, &t2);
pthread_barrier_wait(&barrier);
doSomeMoreWork();
}
}
void *thread_func(void *params)
{
init_thread(params);
while(1)
{
nanosleep(&t1, &t2);
doAnotherWork();
pthread_barrier_wait(&barrier);
}
}
I don't think it has to do with the barrier as you've presented it in the pseudocode. I'm assuming that your glibc is approximately the same as on my machine. I compiled roughly your pseudocode and it runs as I expect: the threads do some work, the main thread does some work, they all reach the barrier, and then they loop.
Can you say more about any other synchronization methods you use, or about what the work functions do?
This is the example program I'm using:
#include <pthread.h>
#include <stdio.h>
#include <time.h>
struct timespec req = {1,0}; //{.tv_sec = 1, .tv_nsec = 0};
struct timespec rem = {0,0}; //{.tv_sec = 0, .tv_nsec = 0};
pthread_barrier_t barrier;
void *thread_func(void *params) {
long int name;
name = (long int)params;
while(1) {
printf("This is thread %ld\n", name);
nanosleep(&req, &rem);
pthread_barrier_wait(&barrier);
printf("More work from %ld\n", name);
}
}
int main(void)
{
pthread_t th1, th2;
pthread_barrier_init(&barrier, NULL , 3);
pthread_create(&th1, NULL, &thread_func, (void*)1);
pthread_create(&th2, NULL, &thread_func, (void*)2);
while(1) {
nanosleep(&req, &rem);
printf("This is the parent\n\n");
pthread_barrier_wait(&barrier);
}
return 0;
}
I would suggest using condition variables to synchronize the threads.
Here is a website about how to do it; I hope it helps.
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
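For illustration, here is a minimal sketch of such a condition-variable rendezvous in pthreads (a hand-rolled reusable barrier; NTHREADS, the variable names, and the function name are assumptions, not part of the code above):
#include <pthread.h>

#define NTHREADS 3                      /* assumed: main thread + two workers */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  all_arrived = PTHREAD_COND_INITIALIZER;
static int arrived = 0;
static unsigned generation = 0;         /* lets the barrier be reused safely */

/* Each thread calls this where it would otherwise call pthread_barrier_wait(). */
void rendezvous(void)
{
    pthread_mutex_lock(&lock);
    unsigned my_gen = generation;
    if (++arrived == NTHREADS) {
        arrived = 0;                    /* last thread resets for the next round */
        generation++;
        pthread_cond_broadcast(&all_arrived);
    } else {
        while (my_gen == generation)    /* loop guards against spurious wakeups */
            pthread_cond_wait(&all_arrived, &lock);
    }
    pthread_mutex_unlock(&lock);
}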