Problem with multi-threading and waiting on events - c++

I have a problem with my code:
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <stdio.h>
#include <windows.h>
#include <string.h>
#include <math.h>
HANDLE event;
HANDLE mutex;
int runner = 0;
DWORD WINAPI thread_fun(LPVOID lpParam) {
int* data = (int*)lpParam;
for (int j = 0; j < 4; j++) { //this loop necessary in order to reproduce the issue
if ((data[2] + 1) == data[0]) { // if it is last thread
while (1) {
WaitForSingleObject(mutex, INFINITE);
if (runner == data[0] - 1) { // if all other thread reach event break
ReleaseMutex(mutex);
break;
}
printf("Run:%d\n", runner);
ReleaseMutex(mutex);
Sleep(10);
}
printf("Check Done:<<%d>>\n", data[2]);
runner = 0;
PulseEvent(event); // let all other threads continue
}
else { // if it is not last thread
WaitForSingleObject(mutex, INFINITE);
runner++;
ReleaseMutex(mutex);
printf("Wait:<<%d>>\n", data[2]);
WaitForSingleObject(event, INFINITE); // wait till all other threads reach this stage
printf("Exit:<<%d>>\n", data[2]);
}
}
return 0;
}
int main()
{
event = CreateEvent(NULL, TRUE, FALSE, NULL);
mutex = CreateMutex(NULL, FALSE, NULL);
SetEvent(event);
int data[3] = {2,8}; //0 amount of threads //1 amount of numbers
HANDLE t[10000];
int ThreadData[1000][3];
for (int i = 0; i < data[0]; i++) {
memcpy(ThreadData[i], data, sizeof(int) * 2); // copy amount of threads and amount of numbers to the threads data
ThreadData[i][2] = i; // create thread id
LPVOID ThreadsData = (LPVOID)(&ThreadData[i]);
t[i] = CreateThread(0, 0, thread_fun, ThreadsData, 0, NULL);
if (t[i] == NULL)return 0;
}
while (1) {
DWORD res = WaitForMultipleObjects(data[0], t, true, 1000);
if (res != WAIT_TIMEOUT) break;
}
for (int i = 0; i < data[0]; i++)CloseHandle(t[i]); // close all threads
CloseHandle(event); // close event
CloseHandle(mutex); //close mutex
printf("Done");
}
The main idea is that all threads except one reach the event and wait there, while the last thread releases them from waiting.
But the code doesn't work reliably: about 1 time in 10 it ends correctly, and the other 9 times it just gets stuck in while(1). Across runs, the printf in that loop (printf("Run:%d\n", runner);) prints different values of runner (0 and 3).
What can be the problem?

As we found out in the comments section, the problem was that although the event was created in the non-signalled state
event = CreateEvent(NULL, TRUE, FALSE, NULL);
it was being set to the signalled state immediately afterwards:
SetEvent(event);
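Because the event is manual-reset and was left signalled, WaitForSingleObject(event, INFINITE) in the non-last thread returned immediately, at least until the last thread's first PulseEvent reset it. So on the first iteration of the loop, when j == 0, the first worker thread didn't wait for the second worker thread, which caused a race condition: it could loop around and increment runner more than once. Once runner overshoots data[0] - 1, the last thread's check runner == data[0] - 1 can never become true, so it spins in while(1) forever, which is consistent with the Run: value of 3 you saw.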
Also, the following issues with your code are worth mentioning (although these issues were not the reason for your problem):
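According to the Microsoft documentation on PulseEvent, that function should not be used: it is unreliable and is provided mainly for backward compatibility. The documentation recommends condition variables instead. Note also that a pulse is simply lost if no thread is waiting at the moment it is issued, which in this code can happen if the last thread calls PulseEvent before the other thread has reached WaitForSingleObject.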
In your function thread_fun, the last thread is locking and releasing the mutex in a loop. This can be bad, because mutexes are not guaranteed to be fair and it is possible that this will cause other threads to never be able to acquire the mutex. Although this possibility is mitigated by you calling Sleep(10); once in every loop iteration, it is still not the ideal solution. A better solution would be to use a condition variable, so that the thread only checks for changes of the variable runner when another thread actually signals a possible change. Such a solution would also be better for performance reasons.
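A minimal sketch of the condition-variable approach, written in portable C++11 (std::mutex and std::condition_variable) rather than Win32 so it can be tried anywhere; on Windows, a CRITICAL_SECTION with a CONDITION_VARIABLE (SleepConditionVariableCS / WakeAllConditionVariable) plays the same role. The class Rendezvous and the helper run_rounds are hypothetical names. A generation counter replaces PulseEvent: waiters sleep until the generation changes, so a release cannot be missed by a thread that is slow to start waiting, the last thread sleeps instead of polling runner, and the counter is reset under the lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: N-1 "waiter" threads block until one "releaser"
// thread has seen all of them arrive, then it lets them all go at once.
class Rendezvous {
public:
    explicit Rendezvous(std::size_t waiters) : waiters_(waiters) {}

    // Called by each non-last thread.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m_);
        const std::size_t my_generation = generation_;
        ++arrived_;
        cv_.notify_all();  // the releaser may be sleeping until this count is reached
        // The predicate makes this immune to spurious and lost wake-ups.
        cv_.wait(lock, [&] { return generation_ != my_generation; });
    }

    // Called by the last thread: sleeps (no polling) until everyone arrived.
    void wait_for_all_then_release() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return arrived_ == waiters_; });
        arrived_ = 0;   // reset under the lock, unlike `runner = 0` in the question
        ++generation_;  // opens the gate for this round only
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const std::size_t waiters_;
    std::size_t arrived_ = 0;
    std::size_t generation_ = 0;
};

// Mirrors the question's structure: `threads` threads, `rounds` loop iterations,
// and the thread with the highest id plays the "last thread" role.
// Returns the total number of times a waiter was released.
std::size_t run_rounds(std::size_t threads, int rounds) {
    assert(threads >= 2);
    Rendezvous gate(threads - 1);
    std::mutex count_m;
    std::size_t released = 0;
    std::vector<std::thread> pool;
    for (std::size_t id = 0; id < threads; ++id) {
        pool.emplace_back([&, id] {
            for (int j = 0; j < rounds; ++j) {
                if (id + 1 == threads) {
                    gate.wait_for_all_then_release();
                } else {
                    gate.arrive_and_wait();
                    std::lock_guard<std::mutex> g(count_m);
                    ++released;
                }
            }
        });
    }
    for (auto& t : pool) t.join();
    return released;
}
```

With 2 threads and 4 rounds (the question's setup), run_rounds(2, 4) releases the single waiter 4 times and always terminates. On C++20, std::barrier in <barrier> provides a similar ready-made synchronization point.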

Related

pthread_cond_wait sometimes will not receive the signal

I have a weird problem with pthread_cond_wait and pthread_cond_signal. I have arranged a series of threads. They are all in sleep state when started. A wake up function will signal these threads, do some work, and wait for the results.
In the setup below, td is the thread data (containing the mutexes and the condition variable), and th is an array of thread IDs:
for (size_t i = 0; i < NUM_THREADS; i++) {
pthread_cond_init(&td[i].cond, NULL);
pthread_mutex_init(&td[i].cond_mutex, NULL);
pthread_mutex_init(&td[i].work_mutex, NULL);
pthread_mutex_lock(&td[i].cond_mutex);
pthread_mutex_lock(&td[i].work_mutex);
pthread_create(&th[i], NULL, thread_worker, (void *)&td[i]);
}
Thread worker is like this:
void*
thread_worker(void* data)
{
THREAD_DATA *td = (THREAD_DATA *)data;
while (1) {
pthread_cond_wait(&td->cond, &td->cond_mutex); // marker
// do work ...
pthread_mutex_unlock(&td->work_mutex);
}
pthread_exit(NULL);
}
This job function is supposed to wake up all the threads, do the job, and wait for them to finish:
void
job()
{
for (size_t i = 0; i < NUM_THREADS; i++) {
pthread_cond_signal(&td[i].cond);
}
for (size_t i = 0; i < NUM_THREADS; i++) {
pthread_mutex_lock(&td[i].work_mutex); // block until the work is done
}
}
In some rare situations (maybe 1 run in 1000), the setup above freezes. When that happens, the pthread_cond_wait at the 'marker' line in thread_worker is never woken by pthread_cond_signal; it just keeps waiting. It's very rare, but it happens from time to time. I've added numerous log messages, and I verified that pthread_cond_wait is always called before pthread_cond_signal. What am I doing wrong here?
There is nothing there that forces the pthread_cond_wait() to be called before the pthread_cond_signal(). Despite what you say about logging, it's entirely possible for the logged lines to be out-of-sequence with what really happened.
You aren't using mutexes and condition variables correctly: mutexes should only be unlocked by the same thread that locked them, and condition variables should be paired with a test over some shared state (called a predicate). The shared state is supposed to be protected by the mutex that is passed to pthread_cond_wait().
For example, your example can be reworked to correctly use mutexes and condition variables. First, add an int work_status to the THREAD_DATA structure, where 0 indicates that the thread is waiting for work, 1 indicates that work is available and 2 indicates that the work is complete.
You don't appear to need two mutexes in each THREAD_DATA, and you don't want to lock the mutex in the main thread when you're setting it up:
for (size_t i = 0; i < NUM_THREADS; i++) {
pthread_cond_init(&td[i].cond, NULL);
pthread_mutex_init(&td[i].cond_mutex, NULL);
td[i].work_status = 0;
pthread_create(&th[i], NULL, thread_worker, (void *)&td[i]);
}
Have the threads wait on work_status using the condition variable:
void*
thread_worker(void* data)
{
THREAD_DATA *td = (THREAD_DATA *)data;
while (1) {
/* Wait for work to be available */
pthread_mutex_lock(&td->cond_mutex);
while (td->work_status != 1)
pthread_cond_wait(&td->cond, &td->cond_mutex);
pthread_mutex_unlock(&td->cond_mutex);
// do work ...
/* Tell main thread that the work has finished */
pthread_mutex_lock(&td->cond_mutex);
td->work_status = 2;
pthread_cond_signal(&td->cond);
pthread_mutex_unlock(&td->cond_mutex);
}
pthread_exit(NULL);
}
...and set and wait on work_status as appropriate in job():
void
job()
{
/* Tell threads that work is available */
for (size_t i = 0; i < NUM_THREADS; i++) {
pthread_mutex_lock(&td[i].cond_mutex);
td[i].work_status = 1;
pthread_cond_signal(&td[i].cond);
pthread_mutex_unlock(&td[i].cond_mutex);
}
/* Wait for threads to signal work complete */
for (size_t i = 0; i < NUM_THREADS; i++) {
pthread_mutex_lock(&td[i].cond_mutex);
while (td[i].work_status != 2)
pthread_cond_wait(&td[i].cond, &td[i].cond_mutex);
pthread_mutex_unlock(&td[i].cond_mutex);
}
}
A checklist:
1) Do you lock the mutex td->cond_mutex before waiting on the condition variable? Otherwise, the behaviour is undefined.
2) Do you check the predicate after pthread_cond_wait() returns? Typical usage is
while(!flag) pthread_cond_wait(&cv, &mutex); //waits on flag
which is not what you have. This protects against spurious wake-ups and also ensures the predicate hasn't changed in the meantime.
3) pthread_cond_signal() is guaranteed to wake up at least one thread, if any are waiting. You may want to use pthread_cond_broadcast() if multiple threads wait on the same condition variable.
4) If no thread is waiting on a condition variable, pthread_cond_signal() and pthread_cond_broadcast() have no effect: the signal is not remembered.
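To make points 2) and 4) concrete, here is a small C++11 sketch; std::condition_variable behaves like the pthread primitives for this purpose, and the names WorkSlot, post_work, wait_for_work and early_signal_is_not_lost are hypothetical. It deliberately signals before the waiting thread even exists. Because the waiter tests the predicate first, it never blocks; a bare wait with no predicate could sleep forever, which is the same kind of hang as in the question.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Shared state: the predicate lives next to the mutex that protects it.
struct WorkSlot {
    std::mutex m;
    std::condition_variable cv;
    bool work_available = false;  // the predicate
};

// Producer side: change the predicate under the lock, then signal.
void post_work(WorkSlot& s) {
    {
        std::lock_guard<std::mutex> lock(s.m);
        s.work_available = true;
    }
    s.cv.notify_one();
}

// Consumer side: the loop re-checks the predicate, so a notify that happened
// before we started waiting is not lost (we never block at all), and a
// spurious wake-up just goes back to sleep.
void wait_for_work(WorkSlot& s) {
    std::unique_lock<std::mutex> lock(s.m);
    while (!s.work_available)
        s.cv.wait(lock);
    assert(s.work_available);  // the loop only exits once the predicate holds
    s.work_available = false;  // consume the work item
}

// Signals BEFORE the waiter thread is created; with a bare cv.wait()
// and no predicate this ordering could block forever.
bool early_signal_is_not_lost() {
    WorkSlot s;
    post_work(s);
    std::thread waiter([&] { wait_for_work(s); });
    waiter.join();  // returns because the predicate was already true
    return !s.work_available;
}
```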

Waiting until another process locks and then unlocks a Win32 mutex

I am trying to tell when a producer process accesses a shared Windows mutex. After this happens, I need to lock that same mutex and process the associated data. Is there a built-in way in Windows to do this, short of a ridiculous loop?
I know the result of this is doable through creating a custom Windows event in the producer process, but I want to avoid changing this programs code as much as possible.
What I believe will work (in a ridiculously inefficient way) would be this (NOTE: this is not my real code, I know there are like 10 different things very wrong with this; I want to avoid doing anything like this):
#include <Windows.h>
int main() {
HANDLE h = CreateMutex(NULL, 0, "name");
if(!h) return -1;
int locked = 0;
while(true) {
if(locked) {
//can assume it wont be locked longer than a second, but even if it does should work fine
if(WaitForSingleObject(h, 1000) == WAIT_OBJECT_0) {
// do processing...
locked = 0;
ReleaseMutex(h);
}
// oh god this is ugly, and wastes so much CPU...
} else if(!(locked = WaitForSingleObject(h, 0) == WAIT_TIMEOUT)) {
ReleaseMutex(h);
}
}
return 0;
}
If there is an easier way in C++, that works too: my real code is C++. This example was just easier to write in C.
You will not be able to avoid changing the producer if efficient sharing is needed. Your design is fundamentally flawed for that.
A producer needs to be able to signal a consumer when data is ready to be consumed, and to make sure it does not alter the data while it is busy being consumed. You cannot do that with a single mutex alone.
The best way is to have the producer set an event when data is ready, and have the consumer reset the event when the data has been consumed. Use the mutex only to sync access to the data, not to signal the data's readiness.
#include <Windows.h>
int main()
{
HANDLE readyEvent = CreateEvent(NULL, TRUE, FALSE, "ready");
if (!readyEvent) return -1;
HANDLE mutex = CreateMutex(NULL, FALSE, "name");
if (!mutex) return -1;
while(true)
{
if (WaitForSingleObject(readyEvent, 1000) == WAIT_OBJECT_0)
{
if (WaitForSingleObject(mutex, 1000) == WAIT_OBJECT_0)
{
// process as needed...
ResetEvent(readyEvent);
ReleaseMutex(mutex);
}
}
}
return 0;
}
If you can't change the producer to use an event, then at least add a flag to the data itself. The producer can lock the mutex, update the data and flag, and unlock the mutex. Consumers will then have to periodically lock the mutex, check the flag and read the new data if the flag is set, reset the flag, and unlock the mutex.
#include <Windows.h>
int main()
{
HANDLE mutex = CreateMutex(NULL, FALSE, "name");
if (!mutex) return -1;
while(true)
{
if (WaitForSingleObject(mutex, 1000) == WAIT_OBJECT_0)
{
if (ready) // 'ready' is the flag stored in the shared data itself
{
// process as needed...
ready = false;
}
ReleaseMutex(mutex);
}
}
return 0;
}
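A portable sketch of the flag-in-the-data protocol, with two threads standing in for the two processes and a std::mutex standing in for the named Win32 mutex (both are simplifications for illustration; SharedBlock, produce, try_consume and poll_for_value are hypothetical names). The key point is that the payload and the ready flag are only ever read or written together while the lock is held.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Stand-in for the shared data block. In the real setup this would live in
// shared memory and be guarded by the named mutex.
struct SharedBlock {
    std::mutex m;
    int payload = 0;
    bool ready = false;  // the flag stored with the data
};

// Producer: update data and flag together while holding the lock.
void produce(SharedBlock& b, int value) {
    std::lock_guard<std::mutex> lock(b.m);
    b.payload = value;
    b.ready = true;
}

// Consumer: copy out only when the flag is set, and reset it under the
// same lock. Returns true if a new value was consumed.
bool try_consume(SharedBlock& b, int& out) {
    std::lock_guard<std::mutex> lock(b.m);
    if (!b.ready) return false;
    out = b.payload;
    b.ready = false;
    return true;
}

// Periodic polling, like the consumer loop above but with a short sleep
// between checks instead of a 1000 ms wait on the mutex.
bool poll_for_value(SharedBlock& b, int& out,
                    std::chrono::milliseconds period, int max_polls) {
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; ++i) {
        if (try_consume(b, out)) return true;
        std::this_thread::sleep_for(period);
    }
    return false;
}
```

One limitation of this design: if the producer writes twice before the consumer polls, the first value is overwritten. If every value matters, the producer would also have to wait for ready to become false before writing, which is where an event (or a condition variable) is the better tool.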
So either way, your logic will have to be tweaked in both the producer and consumer.
Otherwise, if you can't change the producer at all, you have no choice but to change only the consumer, so that it simply checks the data for changes periodically:
#include <Windows.h>
int main()
{
HANDLE mutex = CreateMutex(NULL, 0, "name");
if (!mutex) return -1;
while(true)
{
if (WaitForSingleObject(mutex, 1000) == WAIT_OBJECT_0)
{
// check data for changes
// process new data as needed
// cache results for next time...
ReleaseMutex(mutex);
}
}
return 0;
}
Tricky. I'm going to answer the underlying question: when is the memory written?
This can be observed with a four-step solution:
1) Inject a DLL into the watched process.
2) Add a vectored exception handler (AddVectoredExceptionHandler) for STATUS_GUARD_PAGE_VIOLATION.
3) Set the guard page bit (PAGE_GUARD, via VirtualProtect) on the 2 MB memory range (finding it could be a challenge).
4) From the vectored exception handler, inform your process and re-establish the guard bit (it's one-shot).
You may need only a single guard page if the image is always fully rewritten.

Windows creates events on thread shutdown

I am attempting to add handle leak detection to the unit test framework in my code (Windows 7, x64, VS2010).
I basically call GetProcessHandleCount() before and after each unit test.
This works fine except when threads are created/destroyed as part of the test.
It seems that Windows occasionally creates 1-3 events on thread shutdown. Running the same test in a loop does not keep increasing the event count (e.g. running the test 5000 times in a loop only results in 1-3 extra events being created).
I do not create events manually in my own code.
It seems that this is similar to this problem:
boost::thread causing small event handle leak?
but I am doing manual thread creation/shutdown.
I followed this code:
http://blogs.technet.com/b/yongrhee/archive/2011/12/19/how-to-troubleshoot-a-handle-leak.aspx
And got this callstack from WinDbg:
Outstanding handles opened since the previous snapshot:
--------------------------------------
Handle = 0x0000000000000108 - OPEN
Thread ID = 0x00000000000030dc, Process ID = 0x0000000000000c90
0x000000007715173a: ntdll!NtCreateEvent+0x000000000000000a
0x0000000077133f26: ntdll!RtlpCreateCriticalSectionSem+0x0000000000000026
0x0000000077133ee3: ntdll!RtlpWaitOnCriticalSection+0x000000000000014e
0x000000007714e40b: ntdll!RtlEnterCriticalSection+0x00000000000000d1
0x0000000077146ad2: ntdll!LdrShutdownThread+0x0000000000000072
0x0000000077146978: ntdll!RtlExitUserThread+0x0000000000000038
0x0000000076ef59f5: kernel32!BaseThreadInitThunk+0x0000000000000015
0x000000007712c541: ntdll!RtlUserThreadStart+0x000000000000001d
--------------------------------------
As you can see, this is an event created on the thread shutdown.
Is there a better way of doing this handle leak detection in unit tests? My only current options are:
Forget trying to do this handle leak detection
Spin up some dummy tasks to attempt to create these spurious events.
Allow a small tolerance for leaks and run each test hundreds of times (so real leaks show up as a large number)
Get the handle count excluding events (difficult amount of code)
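A sketch of the tolerance option in C++. The handle counter is passed in as a function so the logic can be tested anywhere; on Windows it would wrap GetProcessHandleCount(GetCurrentProcess(), &count). The name leaks_handles, the single warm-up run and the tolerance value are assumptions for illustration, not a feature of any test framework. The idea: a real per-test leak grows with the iteration count, while the few events that appear to be created lazily by the loader's critical section (RtlpCreateCriticalSectionSem in the stack above) stay bounded.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper: run `test` many times and report a leak only if the
// handle count grows by more than `tolerance` over all iterations.
bool leaks_handles(const std::function<std::uint32_t()>& handle_count,
                   const std::function<void()>& test,
                   int iterations, std::uint32_t tolerance) {
    assert(iterations > 0);
    test();  // warm-up run: let one-time lazy allocations happen first
    const std::uint32_t before = handle_count();
    for (int i = 0; i < iterations; ++i) test();
    const std::uint32_t after = handle_count();
    return after > before && after - before > tolerance;
}
```

A test that leaks one handle per run shows up as a difference of roughly `iterations`, far above a tolerance of 3, while a bounded number of stray events stays under it.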
I have also tried switching to using std::thread in VS2013, but it seems that it creates a lot of background threads and handles when used. (makes the count difference much worse)
Here is a self contained example where 99+% of the time (on my computer) an event is created behind the scenes. (handle count is different). Putting the startup/shutdown code in a loop indicates it does not directly leak, but accumulates the occasional events:
#include <stdio.h>
#include <Windows.h>
#include <process.h>
#define THREADCOUNT 3
static HANDLE s_semCommand, s_semRender;
static unsigned __stdcall ExecutiveThread(void *)
{
WaitForSingleObject(s_semCommand, INFINITE);
ReleaseSemaphore(s_semRender, THREADCOUNT - 1, NULL);
return 0;
}
static unsigned __stdcall WorkerThread(void *)
{
WaitForSingleObject(s_semRender, INFINITE);
return 0;
}
int main(int argc, char* argv[])
{
DWORD oldHandleCount = 0;
GetProcessHandleCount(GetCurrentProcess(), &oldHandleCount);
s_semCommand = CreateSemaphoreA(NULL, 0, 0xFFFF, NULL);
s_semRender = CreateSemaphoreA(NULL, 0, 0xFFFF, NULL);
// Spool threads up
HANDLE threads[THREADCOUNT];
for (int i = 0; i < THREADCOUNT; i++)
{
threads[i] = (HANDLE)_beginthreadex(NULL, 4096, (i==0) ? ExecutiveThread : WorkerThread, NULL, 0, NULL);
}
// Signal shutdown - Wait for threads and close semaphores
ReleaseSemaphore(s_semCommand, 1, NULL);
for (int i = 0; i < THREADCOUNT; i++)
{
WaitForSingleObject(threads[i], INFINITE);
CloseHandle(threads[i]);
}
CloseHandle(s_semCommand);
CloseHandle(s_semRender);
DWORD newHandleCount = 0;
GetProcessHandleCount(GetCurrentProcess(), &newHandleCount);
printf("Handle %lu -> %lu\n", oldHandleCount, newHandleCount);
return 0;
}

c++ parallel programming bug

I am trying to do some parallel programming. I have been following a guide and I have this code:
int main()
{
CPUs = GetNumCPUs();
HANDLE *threads = new HANDLE[CPUs];
queues = new queue<functionPointer>[CPUs];
DWORD threadID = 0; // CreateThread expects an LPDWORD for the thread ID
DWORD_PTR threadCore = (DWORD_PTR)1 << 0;
threads[0] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)loop, (LPVOID)&queues, 0, &threadID);
SetThreadAffinityMask(threads[0], threadCore);
for (DWORD_PTR i = 1; i < CPUs; i++)
{
threadID = (DWORD)i;
threadCore = (DWORD_PTR)1 << i; // shift a DWORD_PTR, not an int, so masks above bit 31 work on x64
threads[i] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)Coroutine, (LPVOID)&queues[i], 0, &threadID);
SetThreadAffinityMask(threads[i], threadCore);
wprintf(L"Creating Thread %llu (0x%p) Assigning to CPU 0x%llx\r\n", (unsigned long long)i, threads[i], (unsigned long long)threadCore);
}
while(true) Sleep(1000);
}
The function the threads run just adds 1 to a variable. I have seen that this code is not faster than the code without threads. I think I did something wrong and it is not actually running on multiple cores. What is it?
Here is the guide: http://www.dreamincode.net/forums/topic/52380-multi-threading-on-multi-processors/
Adding 1 to a variable was just an example. I have a very complicated program that takes 8-9 seconds to finish. That's why I need multiprocessing.
If you are not running your code on a multi-processor/multi-core system then you will not see any performance gain.
If you are but your threads are just doing simple processing (adding 1 to a variable?) it may cost more processor cycles to spawn/shutdown the thread than it does for the thread to do its work. In which case you'd be better off doing all the work in a single thread.
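A sketch of splitting genuinely CPU-bound work across cores, using std::thread (C++11) instead of CreateThread; work_item, sum_sequential and sum_parallel are hypothetical stand-ins for the real computation. Each thread gets one large contiguous chunk and accumulates into a local variable, so thread start-up cost is paid once per core and the threads don't contend on a shared counter (if they all incremented one shared variable, it would need a lock or an atomic and the work would largely serialize).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-item work: some CPU-bound mixing.
static std::uint64_t work_item(std::uint64_t i) {
    std::uint64_t x = i;
    for (int k = 0; k < 100; ++k)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x & 0xFF;
}

std::uint64_t sum_sequential(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) total += work_item(i);
    return total;
}

// One large chunk per hardware thread; each thread writes its own slot once.
std::uint64_t sum_parallel(std::uint64_t n) {
    const unsigned threads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::uint64_t> partial(threads, 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            const std::uint64_t begin = n * t / threads;
            const std::uint64_t end = n * (t + 1) / threads;
            std::uint64_t local = 0;  // no sharing inside the hot loop
            for (std::uint64_t i = begin; i < end; ++i) local += work_item(i);
            partial[t] = local;
        });
    }
    for (auto& th : pool) th.join();
    std::uint64_t total = 0;
    for (auto p : partial) total += p;
    return total;
}
```

Both functions return the same total; whether the parallel one is faster depends on the per-item cost. With trivial work such as adding 1, thread creation and joining dominate.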

When is it more appropriate to use a pthread barrier instead of a condition wait and broadcast?

I am coding a telemetry system in C++ and have been having some difficulty syncing certain threads with the standard pthread_cond_timedwait and pthread_cond_broadcast.
The problem was that I needed some way for the function that was doing the broadcasting to know if another thread acted on the broadcast.
After some hearty searching I decided I might try using a barrier for the two threads instead. However, I still wanted the timeout functionality of the pthread_cond_timedwait.
Here is basically what I came up with: (However it feels excessive)
Listen Function: Checks for a period of milliseconds to see if an event is currently being triggered.
bool listen(uint8_t eventID, int timeout)
{
int waitCount = 0;
while(waitCount <= timeout)
{
globalEventID = eventID;
if(getUpdateFlag(eventID) == true)
{
pthread_barrier_wait(&barEvent);
return true;
}
threadSleep(); //blocks for 1 millisecond
++waitCount;
}
return false;
}
Trigger Function: Triggers an event for a period of milliseconds by setting an update flag for the triggering period
bool trigger(uint8_t eventID, int timeout)
{
int waitCount = 0;
while(waitCount <= timeout)
{
setUpdateFlag(eventID, true); //Sets the update flag to true
if(globalEventID == eventID)
{
pthread_barrier_wait(&barEvent);
return true;
}
threadSleep(); //blocks for 1 millisecond
++waitCount;
}
setUpdateFlag(eventID, false);
return false;
}
My questions: Is there another way to share information with the broadcaster, or are barriers really the only efficient way? Also, is there another way of getting timeout functionality with barriers?
Based on your described problem:
Specifically, I am trying to let thread1 know that the message it is
waiting for has been parsed and stored in a global list by thread2,
and that thread2 can continue parsing and storing because thread1 will
now copy that message from the list ensuring that thread2 can
overwrite that message with a new version and not disrupt the
operations of thread1.
It sounds like your problem can be solved by having both threads alternately wait on the condition variable. E.g. in thread 1:
pthread_mutex_lock(&mutex);
while (!message_present)
pthread_cond_wait(&cond, &mutex);
copy_message();
message_present = 0;
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);
process_message();
and in thread 2:
parse_message();
pthread_mutex_lock(&mutex);
while (message_present)
pthread_cond_wait(&cond, &mutex);
store_message();
message_present = 1;
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);
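A C++11 sketch of the alternating hand-off above, with the timeout the question asked for. std::condition_variable::wait_for with a predicate is the C++ counterpart of calling pthread_cond_timedwait in a while loop: it returns false if the predicate is still false when the time runs out, so no barrier is needed. Mailbox and transfer are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// One-slot mailbox: `present_` plays the role of message_present.
class Mailbox {
public:
    // Thread 2: wait until the slot is empty, then store the message.
    void put(std::string msg) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return !present_; });
        slot_ = std::move(msg);
        present_ = true;
        cv_.notify_all();
    }

    // Thread 1: wait up to `timeout` for a message. Returns false on timeout.
    bool take_for(std::string& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [&] { return present_; }))
            return false;
        out = std::move(slot_);
        present_ = false;
        cv_.notify_all();  // lets put() store the next message
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::string slot_;
    bool present_ = false;
};

// Usage sketch: one producer and one consumer pass `count` messages in order.
std::vector<std::string> transfer(int count) {
    Mailbox box;
    std::vector<std::string> received;
    std::thread producer([&] {
        for (int i = 0; i < count; ++i) box.put("msg " + std::to_string(i));
    });
    std::string msg;
    while (static_cast<int>(received.size()) < count) {
        if (box.take_for(msg, std::chrono::milliseconds(100)))
            received.push_back(msg);
        // On timeout a real caller could give up or report; here it retries.
    }
    producer.join();
    return received;
}
```

Because put() only returns after the slot has been emptied by take_for(), the producer knows its previous message was picked up, which is the "did anyone act on the broadcast" information the question was after.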