I have a multithreaded application that I am writing using Boost Thread locking.
In this case, there is one writer and multiple readers. As I have it now, the writer seems to wait for all the readers to complete before it can write again.
What I want is to give the writer priority, so that if it wants to write again, it does so no matter what, and the readers work around it.
For example:
Now:
Writer;
reader1;
reader2;
reader3;
reader4;
What I would like is:
Writer;
reader1;
reader2;
Writer(if ready);
reader3;
reader4;
Is this possible? My code is replicated below:
#include <boost/thread.hpp>
#include <boost/thread/shared_mutex.hpp>
#include <opencv2/opencv.hpp>
#include <atomic>
#include <chrono>
#include <iostream>
#include <sstream>
#include <thread>
#include <vector>
typedef boost::shared_mutex Lock;
typedef boost::unique_lock< Lock > WriteLock;
typedef boost::shared_lock< Lock > ReadLock;
Lock frameLock;
cv::Mat currentFrame;
std::atomic<bool> frameOk{ false }; // becomes true once the writer has produced a frame
void writer()
{
while (true)
{
cv::Mat frame;
cv::Mat src = cv::imread("C:\\grace_17.0001.jpg");
cv::resize(src, frame, cv::Size(src.cols / 4, src.rows / 4));
int64 t0 = cv::getTickCount();
WriteLock w_lock(frameLock);
frame.copyTo(currentFrame);
frameOk = true; // tells readers we have at least one frame
w_lock.unlock(); // unlock through the lock object, not the mutex, so the destructor doesn't unlock twice
int64 t1 = cv::getTickCount();
double secs = (t1 - t0) / cv::getTickFrequency();
std::cout << "wait time WRITE: " << secs * 1000 << std::endl;
}
}
void readerTwo(int wait)
{
while (true)
{
if (frameOk) // if first frame is written
{
cv::Mat readframe; // not static: each reader thread needs its own buffer
int64 t0 = cv::getTickCount();
//gets frame
ReadLock r_lockz(frameLock);
currentFrame.copyTo(readframe);
r_lockz.unlock();
std::cout << "READ: " << std::to_string(wait)<< std::endl;
cv::imshow(std::to_string(wait), readframe);
cv::waitKey(1);
std::this_thread::sleep_for(std::chrono::milliseconds(20));
}
}
}
int main()
{
const int readerthreadcount = 50;
std::vector<boost::thread*> readerthread;
boost::thread* wThread = new boost::thread(writer);
for (int i = 0; i<readerthreadcount; i++) {
std::ostringstream id; id << "reader" << i + 1; // thread label (currently unused)
readerthread.push_back(new boost::thread(readerTwo, (i)));
}
wThread->join(); delete wThread;
for (int i = 0; i<readerthreadcount; i++) {
readerthread[i]->join(); delete readerthread[i];
}
}
Thank you.
Writer starvation is a typical problem with reader/writer locks.
Reader/writer locks unfortunately have to be tuned per-algorithm and per-architecture. (At least until something smarter is developed.)
Is this possible?
Yes, it's possible, with condition variables. Keep a count waitingWriters alongside readerCount, both protected by one mutex:
- When a writer comes in, it acquires the mutex, increments waitingWriters, then waits on the condition readerCount == 0.
- When a reader thread ends, it acquires the mutex, decrements readerCount, and signals the writer condition if readerCount == 0.
- When a reader thread comes in, it acquires the mutex. If waitingWriters == 0, it increments readerCount and releases the mutex. Otherwise it waits on the condition waitingWriters == 0.
- When a writer thread finishes, it acquires the mutex. If waitingWriters == 0, it signals the reader condition. Otherwise, it signals the next writer.
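Here is a minimal sketch of that scheme (C++11; the class and member names are mine, for illustration only):
#include <condition_variable>
#include <mutex>
class write_priority_lock
{
public:
    void lock_read()
    {
        std::unique_lock<std::mutex> lk(m_);
        // readers queue up behind any waiting or active writer
        cvReaders_.wait(lk, [this]{ return waitingWriters_ == 0 && !writerActive_; });
        ++readerCount_;
    }
    void unlock_read()
    {
        std::lock_guard<std::mutex> lk(m_);
        if (--readerCount_ == 0)
            cvWriters_.notify_one();   // last reader out wakes one writer
    }
    void lock_write()
    {
        std::unique_lock<std::mutex> lk(m_);
        ++waitingWriters_;             // registering here blocks new readers
        cvWriters_.wait(lk, [this]{ return readerCount_ == 0 && !writerActive_; });
        --waitingWriters_;
        writerActive_ = true;
    }
    void unlock_write()
    {
        std::lock_guard<std::mutex> lk(m_);
        writerActive_ = false;
        if (waitingWriters_ > 0)
            cvWriters_.notify_one();   // a writer is waiting: it goes first
        else
            cvReaders_.notify_all();   // no writer pending: release the readers
    }
private:
    std::mutex m_;
    std::condition_variable cvReaders_, cvWriters_;
    int readerCount_ = 0;
    int waitingWriters_ = 0;
    bool writerActive_ = false;
};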
Note that this algorithm I just gave you:
- now prioritizes writes over reads; it is the other extreme, in that reads can be starved instead of writes;
- only uses one mutex (not a reader mutex and a writer mutex);
- wouldn't be suitable for quick reads, i.e. reads whose read operation is shorter than one scheduling timeslice; for those you would want to use spinlocks (check out the Big Reader).
The tuning depends on many factors, the most important of which are the ratio of reader vs writer threads and how long the critical sections are.
Here is my highly efficient write-prioritized shared mutex. In the optimal case, it needs only one atomic exchange for locking and unlocking, in contrast to other implementations which need two atomic exchanges.
#pragma once
#include <cstdint>
#include <cassert>
#include <thread>
#include <new>
#include <atomic>
#include "semaphore.h"
static_assert(std::atomic<std::uint64_t>::is_always_lock_free, "std::uint64_t must be lock-free");
class alignas(std::hardware_constructive_interference_size) wprio_shared_mutex
{
public:
wprio_shared_mutex();
wprio_shared_mutex( wprio_shared_mutex const & ) = delete;
~wprio_shared_mutex();
void lock_shared();
void unlock_shared();
void shared_to_write();
void lock_writer();
void write_to_shared();
void unlock_writer();
bool we_are_writer();
private:
std::atomic<std::uint64_t> m_atomic; // bits 0 - 20: readers
                                     // bits 21 - 41: waiting readers
                                     // bits 42 - 62: waiting writers
                                     // bit 63: writer-flag
std::thread::id m_writerId;
std::uint32_t m_writerRecursionCount;
semaphore m_releaseReadersSem,
m_releaseWriterSem;
static unsigned const WAITING_READERS_BASE = 21,
WAITING_WRITERS_BASE = 42,
WRITER_FLAG_BASE = 63;
static std::uint64_t const MASK21 = 0x1FFFFFu;
static std::uint64_t const READERS_MASK = MASK21,
WAITING_READERS_MASK = MASK21 << WAITING_READERS_BASE,
WAITING_WRITERS_MASK = MASK21 << WAITING_WRITERS_BASE,
WRITER_FLAG_MASK = (std::uint64_t)1 << WRITER_FLAG_BASE;
static std::uint64_t const READER_VALUE = (std::uint64_t)1,
WAITING_READERS_VALUE = (std::uint64_t)1 << WAITING_READERS_BASE,
WAITING_WRITERS_VALUE = (std::uint64_t)1 << WAITING_WRITERS_BASE;
static bool check( std::uint64_t flags );
};
inline
bool wprio_shared_mutex::check( std::uint64_t flags )
{
unsigned readers = (unsigned)(flags & MASK21),
waitingReaders = (unsigned)((flags >> WAITING_READERS_BASE) & MASK21),
waitingWriters = (unsigned)((flags >> WAITING_WRITERS_BASE) & MASK21),
writerFlag = (unsigned)((flags >> WRITER_FLAG_BASE) & 1);
if( readers && (waitingReaders || writerFlag) )
return false;
if( waitingReaders && (readers || !writerFlag) )
return false;
if( waitingWriters && !(writerFlag || readers) )
return false;
if( writerFlag && readers )
return false;
return true;
}
wprio_shared_mutex::wprio_shared_mutex()
{
m_atomic.store( 0, std::memory_order_relaxed );
}
wprio_shared_mutex::~wprio_shared_mutex()
{
assert(m_atomic == 0);
}
void wprio_shared_mutex::lock_shared()
{
using namespace std;
for( uint64_t cmp = m_atomic.load( std::memory_order_relaxed ); ; )
{
assert(check( cmp ));
if( (cmp & WRITER_FLAG_MASK) == 0 )
[[likely]]
{
if( m_atomic.compare_exchange_weak( cmp, cmp + READER_VALUE, memory_order_acquire, memory_order_relaxed ) )
[[likely]]
return;
}
else
if( m_atomic.compare_exchange_weak( cmp, cmp + WAITING_READERS_VALUE, memory_order_relaxed, memory_order_relaxed ) )
[[likely]]
{
m_releaseReadersSem.forced_wait();
return;
}
}
}
void wprio_shared_mutex::unlock_shared()
{
using namespace std;
for( uint64_t cmp = m_atomic.load( std::memory_order_relaxed ); ; )
{
assert(check( cmp ));
assert((cmp & READERS_MASK) >= READER_VALUE);
if( (cmp & READERS_MASK) != READER_VALUE || (cmp & WAITING_WRITERS_MASK) == 0 )
[[likely]]
{
if( m_atomic.compare_exchange_weak( cmp, cmp - READER_VALUE, memory_order_relaxed, memory_order_relaxed ) )
[[likely]]
return;
}
else
{
assert(!(cmp & WRITER_FLAG_MASK));
if( m_atomic.compare_exchange_weak( cmp, (cmp - READER_VALUE - WAITING_WRITERS_VALUE) | WRITER_FLAG_MASK, memory_order_relaxed, memory_order_relaxed ) )
[[likely]]
{
m_releaseWriterSem.forced_release( 1 );
return;
}
}
}
}
void wprio_shared_mutex::shared_to_write()
{
using namespace std;
for( uint64_t cmp = m_atomic.load( std::memory_order_relaxed ); ; )
{
assert(check( cmp ));
assert((cmp & READERS_MASK) >= READER_VALUE);
if( (cmp & READERS_MASK) == READER_VALUE )
[[likely]]
{
assert(!(cmp & WRITER_FLAG_MASK));
if( m_atomic.compare_exchange_weak( cmp, (cmp - READER_VALUE) | WRITER_FLAG_MASK, memory_order_acquire, memory_order_relaxed ) )
[[likely]]
{
m_writerId = this_thread::get_id();
m_writerRecursionCount = 0;
return;
}
}
else
{
assert((cmp & READERS_MASK) > READER_VALUE);
if( m_atomic.compare_exchange_weak( cmp, cmp - READER_VALUE + WAITING_WRITERS_VALUE, memory_order_relaxed, memory_order_relaxed ) )
[[likely]]
{
m_releaseWriterSem.forced_wait();
m_writerId = this_thread::get_id();
m_writerRecursionCount = 0;
return;
}
}
}
}
void wprio_shared_mutex::lock_writer()
{
using namespace std;
uint64_t cmp = m_atomic.load( std::memory_order_acquire );
if( (cmp & WRITER_FLAG_MASK) && m_writerId == this_thread::get_id() )
{
++m_writerRecursionCount;
return;
}
for( ; ; )
{
assert(check( cmp ));
if( (cmp & (WRITER_FLAG_MASK | READERS_MASK)) == 0 )
[[likely]]
{
if( m_atomic.compare_exchange_weak( cmp, cmp | WRITER_FLAG_MASK, memory_order_acquire, memory_order_relaxed ) )
[[likely]]
{
m_writerId = this_thread::get_id();
m_writerRecursionCount = 0;
return;
}
}
else
if( m_atomic.compare_exchange_weak( cmp, cmp + WAITING_WRITERS_VALUE, memory_order_relaxed, memory_order_relaxed ) )
[[likely]]
{
m_releaseWriterSem.forced_wait();
m_writerId = this_thread::get_id();
m_writerRecursionCount = 0;
return;
}
}
}
void wprio_shared_mutex::unlock_writer()
{
using namespace std;
uint64_t cmp = m_atomic.load( std::memory_order_relaxed );
if( (cmp & WRITER_FLAG_MASK) && m_writerRecursionCount && m_writerId == this_thread::get_id() )
{
--m_writerRecursionCount;
return;
}
m_writerId = thread::id();
for( ; ; )
{
assert(cmp & WRITER_FLAG_MASK && !(cmp & READERS_MASK));
assert(check( cmp ));
if( (cmp & WAITING_WRITERS_MASK) != 0 )
[[unlikely]]
if( m_atomic.compare_exchange_weak( cmp, cmp - WAITING_WRITERS_VALUE, memory_order_release, memory_order_relaxed ) )
[[likely]]
{
m_releaseWriterSem.forced_release( 1 );
return;
}
else
continue;
if( (cmp & WAITING_READERS_MASK) != 0 )
[[unlikely]]
{
uint64_t wakeups = (cmp & WAITING_READERS_MASK) >> WAITING_READERS_BASE;
if( m_atomic.compare_exchange_weak( cmp, (cmp & ~WRITER_FLAG_MASK) - (cmp & WAITING_READERS_MASK) + wakeups, memory_order_release, memory_order_relaxed ) )
[[likely]]
{
m_releaseReadersSem.forced_release( (unsigned)wakeups );
return;
}
else
continue;
}
if( m_atomic.compare_exchange_weak( cmp, 0, memory_order_release, memory_order_relaxed ) )
[[likely]]
return;
}
}
bool wprio_shared_mutex::we_are_writer()
{
return (m_atomic.load( std::memory_order_relaxed ) & WRITER_FLAG_MASK) && m_writerId == std::this_thread::get_id();
}
The algorithm allows readers to continue, but as soon as a writer registers for writing, further readers are enqueued and the current readers are waited for until they finish; and this is all done through a single 64-bit atomic value!
The code allows reader- as well as writer-recursion. But when you hold the mutex as a reader multiple times, you shouldn't call shared_to_write(); you'll get a deadlock. Recursion comes naturally for shared reading and has no extra overhead, but for writing there's an additional recursion counter as well as a thread::id.
I'm not going to include my semaphore class here, as it should be self-explanatory. It has forced_wait and forced_release; these are two functions which retry the wait or release until it succeeds.
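For reference, a minimal stand-in with the same interface could look like this (this is not the author's class; it assumes C++20's counting_semaphore, whose operations don't fail, so the "forced" retry loop degenerates to a plain call):
#include <semaphore>
class semaphore
{
public:
    void forced_wait()
    {
        m_sem.acquire();          // blocks until a count is available
    }
    void forced_release( unsigned n )
    {
        m_sem.release( n );       // wakes up to n waiters
    }
private:
    std::counting_semaphore<> m_sem{ 0 };
};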
The [[likely]] and [[unlikely]] tags are C++20 optimization hints; you can remove them with earlier compilers. The we_are_writer method checks whether the current thread has write ownership. This could be used, for example, for debugging purposes with assert().
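For example (my illustration, not part of the original code):
void update_shared_data( wprio_shared_mutex &mtx )
{
    assert(mtx.we_are_writer());  // caller must already hold the write-lock
    // ... modify the data guarded by mtx ...
}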
The shared mutex is aligned to cache lines through the alignas() specifier. The whole object itself may be larger than a cache line because of the two semaphores at its end, but the data for the short locking path is in the header, which fits into one cache line. It shouldn't hurt if the semaphores at the end of the object don't fit into the same cache line, since sleepy locking is slow anyway.
The object is neither copyable nor movable, because the semaphore might not be either. This can happen, for example, because POSIX semaphores rely on a non-copyable sem_t datatype, which may be embedded directly in a C++ semaphore class and thereby make it non-copyable and non-movable.
I have multiple buffers shared with multiple reader/writer threads, and different writers change the data in different ways.
For example, Writer1 merely appends new data, while Writer2 extends the size of the buffer (reallocating memory and moving the data).
If I use a single mutex to sync all accesses to the data, performance may suffer, because most readers just need to read a single buffer, and most writers just need to write a little piece of data to a single buffer.
If I prepare one mutex for each buffer, the locking/unlocking relationships between threads become more complicated.
Now I want to confirm one thing:
If a writer changes the data while holding only a shared_lock on the mutex, will other threads holding a unique_lock or shared_lock on the same mutex see dirty data?
I coded the following experimental program, and it appears to run without errors, but I still dare not use it in production.
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <shared_mutex>
#include <thread>
#include <semaphore.h>
using namespace std;
using namespace std::chrono;
atomic_bool g_abShouldRun = true;
sem_t g_semDoIt1;
sem_t g_semDone1;
sem_t g_semDoIt2;
sem_t g_semDone2;
shared_mutex g_mutex;
int g_iX = 3, g_iY = 9, g_iR1 = 1, g_iR2 = 3;
void writer() {
std::srand( 8 );
while( g_abShouldRun ) {
sem_wait( &g_semDoIt1 );
while( rand() % 8 != 0 )
;
{
shared_lock<shared_mutex> lk( g_mutex );
g_iX *= 2;
g_iY *= 2;
}
sem_post( &g_semDone1 );
};
};
void reader() {
std::srand( 8 );
while( g_abShouldRun ) {
sem_wait( &g_semDoIt2 );
while( rand() % 8 != 0 )
;
{
unique_lock<shared_mutex> lk( g_mutex );
g_iR1 = g_iX;
g_iR2 = g_iY;
}
sem_post( &g_semDone2 );
};
};
int main( int argc, char** argv ) {
int iLasting = 10, iError = 0;
if( argc > 1 )
iLasting = atoi( argv[1] );
steady_clock::time_point tpEnd = steady_clock::now() + seconds( iLasting );
if( sem_init( &g_semDoIt1, 0, 0 ) || sem_init( &g_semDone1, 0, 0 ) ||
sem_init( &g_semDoIt2, 0, 0 ) || sem_init( &g_semDone2, 0, 0 ) ) {
cerr << "Failed to create semaphores." << endl;
return EXIT_FAILURE;
}
thread thd1( writer );
thread thd2( reader );
while( steady_clock::now() < tpEnd ) {
sem_post( &g_semDoIt1 );
sem_post( &g_semDoIt2 );
sem_wait( &g_semDone1 );
sem_wait( &g_semDone2 );
if( g_iR1 * 3 != g_iR2 )
++iError;
}
g_abShouldRun = false;
sem_post( &g_semDoIt1 );
sem_post( &g_semDoIt2 );
thd1.join();
thd2.join();
sem_destroy( &g_semDoIt1 );
sem_destroy( &g_semDoIt2 );
sem_destroy( &g_semDone1 );
sem_destroy( &g_semDone2 );
cout << "Error:" << iError << endl;
return EXIT_SUCCESS;
};
The following problems jump out at a quick look:
- change the code to use unique_lock when writing;
- change the code to use shared_lock when reading (see the sketch below);
- do not modify other shared global variables while "only reading" -- that is effectively writing, just in a different place;
- how many { shared_mutex, function using unique_lock, function using shared_lock } tuples you'll use with multiple threads and multiple buffers is something you'll have to figure out yourself -- but it will be between 1 and the number of buffers.
To answer the question directly: a writer that modifies data while holding only a shared_lock races with concurrent readers and writers; that is a data race and undefined behavior, even if a test run happens to look clean.
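A minimal sketch of the corrected locking discipline (illustrative; relative to the question's code, only the lock types are swapped and results go to caller-owned variables):
#include <shared_mutex>
std::shared_mutex g_mutex;
int g_iX = 3, g_iY = 9;
void writer_step()
{
    std::unique_lock<std::shared_mutex> lk( g_mutex );  // exclusive: we modify shared data
    g_iX *= 2;
    g_iY *= 2;
}
void reader_step( int &r1, int &r2 )
{
    std::shared_lock<std::shared_mutex> lk( g_mutex );  // shared: we only read
    r1 = g_iX;  // write results to caller-owned variables,
    r2 = g_iY;  // not to shared globals
}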
As an educational exercise I'm implementing a thread pool using condition variables. A controller thread creates a pool of threads that wait on a signal (an atomic variable being set to a value above zero). When signaled the threads wake, perform their work, and when the last thread is done it signals the main thread to awaken. The controller thread blocks until the last thread is complete. The pool is then available for subsequent re-use.
Every now and then I was getting a timeout on the controller thread waiting for the worker to signal completion (likely because of a race condition when decrementing the active work counter), so in an attempt to solidify the pool I replaced the "wait(lck)" form of the condition variable's wait method with "wait(lck, predicate)". Since doing this, the thread pool behaves as if it permits the active work counter to be decremented below 0 (which is the condition for reawakening the controller thread) - I have a race condition. I've read countless articles on atomic variables, synchronisation, memory ordering, and spurious and lost wakeups on Stack Overflow and various other sites, have incorporated what I've learnt to the best of my ability, and still cannot for the life of me work out why the predicated wait just does not work. The counter should only ever be as high as the number of threads in the pool (say, 8) and as low as zero. I've started losing faith in myself - it just shouldn't be this hard to do something fundamentally simple. There is clearly something else I need to learn here :)
Suspecting of course that there was a race condition, I ensured that the two variables that drive the awakening and termination of the pool are both atomic, and that both are only ever changed while protected with a unique_lock. Specifically, I made sure that when a request to the pool was launched, the lock was acquired, the active thread counter was changed from 0 to 8, the mutex was unlocked, and only then was "notify_all" called. The controller thread would only be awakened with the active thread count at zero, once the last worker thread decremented it that far and called "notify_one".
In the worker thread, the condition variable waits and wakes only when the active thread count is greater than zero, unlocks the mutex, executes in parallel the work preassigned to the processor when the pool was created, re-acquires the mutex, and atomically decrements the active thread count. Then, while still supposedly protected by the lock, it tests whether it is the last thread still active and, if so, unlocks the mutex again and calls "notify_one" to awaken the controller.
The problem is that the active thread counter repeatedly goes below zero after only one or two iterations. If I test the active thread count at the start of a new workload, I can find it down around -6. It is as if the pool was allowed to reawaken the controller thread before the work was completed.
Given that the thread counter and terminate flag are both atomic variables, are only ever modified under the protection of the same mutex, and I am using sequential memory ordering for all updates, I just cannot see how this is happening and I'm lost.
#include <stdafx.h>
#include <Windows.h>
#include <iostream>
#include <thread>
using std::thread;
#include <mutex>
using std::mutex;
using std::unique_lock;
#include <condition_variable>
using std::condition_variable;
#include <atomic>
using std::atomic;
#include <chrono>
#include <vector>
using std::vector;
class IWorkerThreadProcessor
{
public:
virtual void Process(int) = 0;
};
class MyProcessor : public IWorkerThreadProcessor
{
int index_ = 0;
public:
MyProcessor(int index)
{
index_ = index;
}
void Process(int threadindex)
{
for (int i = 0; i < 5000000; i++);
std::cout << '(' << index_ << ':' << threadindex << ") ";
}
};
#define MsgBox(x) do{ MessageBox(NULL, x, L"", MB_OK ); }while(false)
class ThreadPool
{
private:
atomic<unsigned int> invocations_ = 0;
//This goes negative when using the wait_for with predicate
atomic<int> threadsActive_ = 0;
atomic<bool> terminateFlag_ = false;
vector<std::thread> threads_;
atomic<unsigned int> poolSize_ = 0;
mutex mtxWorker_;
condition_variable cvSignalWork_;
condition_variable cvSignalComplete_;
public:
~ThreadPool()
{
TerminateThreads();
}
void Init(std::vector<IWorkerThreadProcessor*>& processors)
{
unique_lock<mutex> lck2(mtxWorker_);
threadsActive_ = 0;
terminateFlag_ = false;
poolSize_ = processors.size();
for (int i = 0; i < poolSize_; ++i)
threads_.push_back(thread(&ThreadPool::launchMethod, this, processors[i], i));
}
void ProcessWorkload(std::chrono::milliseconds timeout)
{
//Only used to see how many invocations I was getting through before experiencing the issue - sadly it's only one or two
invocations_++;
try
{
unique_lock<mutex> lck(mtxWorker_);
//!!!!!! If I use the predicated wait this break will fire !!!!!!
if (threadsActive_.load() != 0)
__debugbreak();
threadsActive_.store(poolSize_);
lck.unlock();
cvSignalWork_.notify_all();
lck.lock();
if (!cvSignalComplete_.wait_for(
lck,
timeout,
[this] { return threadsActive_.load() == 0; })
)
{
//As you can tell this has taken me through a journey trying to characterise the issue...
if (threadsActive_ > 0)
MsgBox(L"Thread pool timed out with still active threads");
else if (threadsActive_ == 0)
MsgBox(L"Thread pool timed out with zero active threads");
else
MsgBox(L"Thread pool timed out with negative active threads");
}
}
catch (const std::exception& e)
{
__debugbreak();
}
}
void launchMethod(IWorkerThreadProcessor* processor, int threadIndex)
{
do
{
unique_lock<mutex> lck(mtxWorker_);
//!!!!!! If I use this predicated wait I see the failure !!!!!!
cvSignalWork_.wait(
lck,
[this] {
return
threadsActive_.load() > 0 ||
terminateFlag_.load();
});
//!!!!!!!! Does not cause the failure but obviously will not handle
//spurious wake-ups !!!!!!!!!!
//cvSignalWork_.wait(lck);
if (terminateFlag_.load())
return;
//Unlock to parallelise the work load
lck.unlock();
processor->Process(threadIndex);
//Re-lock to decrement the work count
lck.lock();
//This returns the value before the subtraction so theoretically if the previous value was 1 then we're the last thread going and we can now signal the controller thread to wake. This is the only place that the decrement happens so I don't know how it could possibly go negative
if (threadsActive_.fetch_sub(1, std::memory_order_seq_cst) == 1)
{
lck.unlock();
cvSignalComplete_.notify_one();
}
else
lck.unlock();
} while (true);
}
void TerminateThreads()
{
try
{
unique_lock<mutex> lck(mtxWorker_);
if (!terminateFlag_)
{
terminateFlag_ = true;
lck.unlock();
cvSignalWork_.notify_all();
for (int i = 0; i < threads_.size(); i++)
threads_[i].join();
}
}
catch (const std::exception& e)
{
__debugbreak();
}
}
};
int main()
{
std::vector<IWorkerThreadProcessor*> processors;
for (int i = 0; i < 8; i++)
processors.push_back(new MyProcessor(i));
std::cout << "Instantiating thread pool\n";
auto pool = new ThreadPool;
std::cout << "Initialisting thread pool\n";
pool->Init(processors);
std::cout << "Thread pool initialised\n";
for (int i = 0; i < 200; i++)
{
std::cout << "Workload " << i << "\n";
pool->ProcessWorkload(std::chrono::milliseconds(500));
std::cout << "Workload " << i << " complete." << "\n";
}
for (auto a : processors)
delete a;
delete pool;
return 0;
}
An instrumented variant of the pool, which I used while trying to characterise the issue:
class ThreadPool
{
private:
atomic<unsigned int> invocations_ = 0;
std::atomic<unsigned int> awakenings_ = 0;
std::atomic<unsigned int> startedWorkloads_ = 0;
std::atomic<unsigned int> completedWorkloads_ = 0;
atomic<bool> terminate_ = false;
atomic<bool> stillFiring_ = false;
vector<std::thread> threads_;
atomic<unsigned int> poolSize_ = 0;
mutex mtx_;
condition_variable cvSignalWork_;
condition_variable cvSignalComplete_;
public:
~ThreadPool()
{
TerminateThreads();
}
void Init(std::vector<IWorkerThreadProcessor*>& processors)
{
unique_lock<mutex> lck2(mtx_);
//threadsActive_ = 0;
terminate_ = false;
poolSize_ = processors.size();
for (int i = 0; i < poolSize_; ++i)
threads_.push_back(thread(&ThreadPool::launchMethod, this, processors[i], i));
awakenings_ = 0;
completedWorkloads_ = 0;
startedWorkloads_ = 0;
invokations_ = 0;
}
void ProcessWorkload(std::chrono::milliseconds timeout)
{
try
{
unique_lock<mutex> lck(mtx_);
invokations_++;
if (startedWorkloads_ != 0)
__debugbreak();
if (completedWorkloads_ != 0)
__debugbreak();
if (awakenings_ != 0)
__debugbreak();
if (stillFiring_)
__debugbreak();
stillFiring_ = true;
lck.unlock();
cvSignalWork_.notify_all();
lck.lock();
if (!cvSignalComplete_.wait_for(
lck,
timeout,
//[this] { return this->threadsActive_.load() == 0; })
[this] { return completedWorkloads_ == poolSize_ && !stillFiring_; })
)
{
if (completedWorkloads_ < poolSize_)
{
if (startedWorkloads_ < poolSize_)
MsgBox(L"Thread pool timed out with some threads unstarted");
else if (startedWorkloads_ == poolSize_)
MsgBox(L"Thread pool timed out with all threads started but not all completed");
}
else
__debugbreak();
}
if (completedWorkloads_ != poolSize_)
__debugbreak();
if (awakenings_ != poolSize_)
__debugbreak();
awakenings_ = 0;
completedWorkloads_ = 0;
startedWorkloads_ = 0;
}
catch (const std::exception& e)
{
__debugbreak();
}
}
void launchMethod(IWorkerThreadProcessor* processor, int threadIndex)
{
do
{
unique_lock<mutex> lck(mtx_);
cvSignalWork_.wait(
lck,
[this] {
return
(stillFiring_ && (startedWorkloads_ < poolSize_)) ||
terminate_;
});
awakenings_++;
if (startedWorkloads_ == 0 && terminate_)
return;
if (stillFiring_ && startedWorkloads_ < poolSize_) //guard against spurious wakeup
{
startedWorkloads_++;
if (startedWorkloads_ == poolSize_)
stillFiring_ = false;
lck.unlock();
processor->Process(threadIndex);
lck.lock();
completedWorkloads_++;
if (completedWorkloads_ == poolSize_)
{
lck.unlock();
cvSignalComplete_.notify_one();
}
else
lck.unlock();
}
else
lck.unlock();
} while (true);
}
void TerminateThreads()
{
try
{
unique_lock<mutex> lck(mtx_);
if (!terminate_) //Don't attempt to double-terminate
{
terminate_ = true;
lck.unlock();
cvSignalWork_.notify_all();
for (int i = 0; i < threads_.size(); i++)
threads_[i].join();
}
}
catch (const std::exception& e)
{
__debugbreak();
}
}
};
I'm not certain if the following helps solve the problem, but I think the error is as shown below. This:
if (!cvSignalComplete_.wait_for(
lck,
timeout,
[this] { return threadsActive_.load() == 0; })
)
should be replaced by
if (!cvSignalComplete_.wait_for(
lck,
timeout,
[&] { return threadsActive_.load() == 0; })
)
Looks like the lambda is not accessing the instantiated member of the class. Here is some reference to back up my case; look at the Lambda Capture section of this page.
Edit:
Another place you are using wait for with lambdas.
cvSignalWork_.wait(
lck,
[this] {
return
threadsActive_.load() > 0 ||
terminateFlag_.load();
});
Maybe modify all the lambdas and then see if it works?
The reason I'm looking at the lambdas is that the behavior seems like a case similar to a spurious wakeup. Hope it helps.
I'm working on a school lab where we are instructed to create a recursive mutex lock for a counting program. I've written some code (which doesn't work), but I think that is mostly because I do not understand the real idea behind using a recursive mutex lock. Could anyone elaborate on what a recursive mutex lock should do/look like?
General note: I'm not asking for an answer, just some clarification as to what a recursive mutex lock should do.
Also, if anyone is curious, here is the code required for this. The code that I am editing/implementing is recmutex.c.
recmutex.h
#include <pthread.h>
/*
* The recursive_mutex structure.
*/
struct recursive_mutex {
pthread_cond_t cond;
pthread_mutex_t mutex; //a non-recursive pthread mutex
pthread_t owner;
unsigned int count;
unsigned int wait_count;
};
typedef struct recursive_mutex recursive_mutex_t;
/* Initialize the recursive mutex object.
*Return a non-zero integer if errors occur.
*/
int recursive_mutex_init (recursive_mutex_t *mu);
/* Destroy the recursive mutex object.
*Return a non-zero integer if errors occur.
*/
int recursive_mutex_destroy (recursive_mutex_t *mu);
/* The recursive mutex object referenced by mu shall be
locked by calling pthread_mutex_lock(). When a thread
successfully acquires a mutex for the first time,
the lock count shall be set to one and successfully return.
Every time a thread relocks this mutex, the lock count
shall be incremented by one and return success immediately.
And any other calling thread can only wait on the conditional
variable until being waked up. Return a non-zero integer if errors occur.
*/
int recursive_mutex_lock (recursive_mutex_t *mu);
/* The recursive_mutex_unlock() function shall release the
recursive mutex object referenced by mu. Each time the owner
thread unlocks the mutex, the lock count shall be decremented by one.
When the lock count reaches zero, the mutex shall become available
for other threads to acquire. If a thread attempts to unlock a
mutex that it has not locked or a mutex which is unlocked,
an error shall be returned. Return a non-zero integer if errors occur.
*/
int recursive_mutex_unlock (recursive_mutex_t *mu);
recmutex.c: contains the functions for the recursive mutex
#include <stdio.h>
#include <pthread.h>
#include <errno.h>
#include "recmutex.h"
int recursive_mutex_init (recursive_mutex_t *mu){
int err;
err = pthread_mutex_init(&mu->mutex, NULL);
if(err != 0){
perror("pthread_mutex_init");
return -1;
}else{
return 0;
}
return 0;
}
int recursive_mutex_destroy (recursive_mutex_t *mu){
int err;
err = pthread_mutex_destroy(&mu->mutex);
if(err != 0){
perror("pthread_mutex_destroy");
return -1;
}else{
return 1;
}
return 0;
}
int recursive_mutex_lock (recursive_mutex_t *mu){
if(mutex_lock_count == 0){
pthread_mutex_lock(&mu->mutex);
mu->count++;
mu->owner = pthread_self();
printf("%s", mu->owner);
return 0;
}else if(mutex_lock_count > 0){
pthread_mutex_lock(&mu->mutex);
mu->count++;
mu->owner = pthread_self();
return 0;
}else{
perror("Counter decremented incorrectly");
return -1;
}
}
int recursive_mutex_unlock (recursive_mutex_t *mu){
if(mutex_lock_count <= 0){
printf("Nothing to unlock");
return -1;
}else{
mutex_lock_count--;
pthread_mutex_unlock(&mu->mutex);
return 0;
}
}
count_recursive.cc: The counting program mentioned above. Uses the recmutex functions.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <unistd.h>
#include <assert.h>
#include <string.h>
#include "recmutex.h"
//argument structure for the thread
typedef struct _arg_{
int n1;
int n2;
int ntimes;
}Arg;
int count; //global counter
recursive_mutex_t mutex; //the recursive mutex
void do_inc(int n){
int ret;
if(n == 0){
return;
}else{
int c;
ret = recursive_mutex_lock(&mutex);
assert(ret == 0);
c = count;
c = c + 1;
count = c;
do_inc(n - 1);
ret = recursive_mutex_unlock(&mutex);
assert(ret == 0);
}
}
/* Counter increment function. It will increase the counter by n1 * n2 * ntimes. */
void inc(void *arg){
Arg * a = (Arg *)arg;
for(int i = 0; i < a->n1; i++){
for(int j = 0; j < a->n2; j++){
do_inc(a->ntimes);
}
}
}
int isPositiveInteger (const char * s)
{
if (s == NULL || *s == '\0' || isspace(*s))
return 0;
char * p;
int ret = strtol (s, &p, 10);
if(*p == '\0' && ret > 0)
return 1;
else
return 0;
}
int test1(char **argv){
printf("==========================Test 1===========================\n");
int ret;
//Get the arguments from the command line.
int num_threads = atoi(argv[1]); //The number of threads to be created.
int n1 = atoi(argv[2]); //The outer loop count of the inc function.
int n2 = atoi(argv[3]); //The inner loop count of the inc function.
int ntimes = atoi(argv[4]); //The number of increments to be performed in the do_inc function.
pthread_t *th_pool = new pthread_t[num_threads];
pthread_attr_t attr;
pthread_attr_init( &attr );
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
ret = recursive_mutex_init(&mutex);
assert(ret == 0);
printf("Start Test. Final count should be %d\n", num_threads * n1 * n2 * ntimes );
// Create threads
for(int i = 0; i < num_threads; i++){
Arg *arg = (Arg *)malloc(sizeof(Arg));
arg->n1 = n1;
arg->n2 = n2;
arg->ntimes = ntimes;
ret = pthread_create(&(th_pool[i]), &attr, (void * (*)(void *)) inc, (void *)arg);
assert(ret == 0);
}
// Wait until threads are done
for(int i = 0; i < num_threads; i++){
ret = pthread_join(th_pool[i], NULL);
assert(ret == 0);
}
if ( count != num_threads * n1 * n2 * ntimes) {
printf("\n****** Error. Final count is %d\n", count );
printf("****** It should be %d\n", num_threads * n1 * n2 * ntimes );
}
else {
printf("\n>>>>>> O.K. Final count is %d\n", count );
}
ret = recursive_mutex_destroy(&mutex);
assert(ret == 0);
delete [] th_pool;
return 0;
}
int foo(){
int ret;
printf("Function foo\n");
ret = recursive_mutex_unlock(&mutex);
assert(ret != 0);
return ret;
}
//test a thread call unlock without actually holding it.
int test2(){
int ret;
printf("\n==========================Test 2==========================\n");
pthread_t th;
pthread_attr_t attr;
pthread_attr_init( &attr );
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
ret = recursive_mutex_init(&mutex);
ret = pthread_create(&th, &attr, (void * (*)(void *))foo, NULL);
printf("Waiting for thread to finish\n");
ret = pthread_join(th, NULL);
assert(ret == 0);
return 0;
}
int main( int argc, char ** argv )
{
int ret;
count = 0;
if( argc != 5 ) {
printf("You must enter 4 arguments. \nUsage: ./count_recursive num_threads n1 n2 ntimes\n");
return -1;
}
if(isPositiveInteger(argv[1]) != 1 || isPositiveInteger(argv[2]) != 1 || isPositiveInteger(argv[3]) != 1 || isPositiveInteger(argv[4]) != 1 ){
printf("All the 4 arguments must be positive integers\n");
return -1;
}
test1(argv);
test2();
return 0;
}
The idea of a recursive mutex is that it can be successfully relocked by the thread that is currently holding the lock. For example:
if I had some mutexes like this (this is pseudocode):
mutex l;
recursive_mutex r;
In a single thread if I did this:
l.lock();
l.lock(); // this would hang the thread.
but
r.lock();
r.lock();
r.lock(); // this would all pass though with no issue.
In implementing a recursive mutex you need to track which thread id has locked it; if it is already locked and the owner matches the current thread id, return success.
The point of a recursive mutex is to let you write this:
recursive_mutex_t rmutex;
void foo(...) {
recursive_lock_lock(&rmutex);
...
recursive_lock_unlock(&rmutex);
}
void bar(...) {
recursive_lock_lock(&rmutex);
...
foo(...);
...
recursive_lock_unlock(&rmutex);
}
void baz(...) {
...
foo(...);
...
}
The function foo() needs the mutex to be locked, but you want to be able to call it either from bar(), where the same mutex is already locked, or from baz(), where the mutex is not locked. If you used an ordinary mutex, the thread would self-deadlock when foo() is called from bar(), because an ordinary mutex's lock() does not return until the mutex is unlocked, and there is no other thread that would unlock it.
Your recursive_mutex_lock() needs to distinguish these cases; (1) The mutex is not locked, (2) the mutex is already locked, but the calling thread is the owner, and (3) the mutex is already locked by some other thread.
Case (3) needs to block the calling thread until the owner completely unlocks the mutex; at that point it converts to case (1). Here's a hint: handle case (3) with a condition variable. That is to say, when the calling thread is not the owner, it should call pthread_cond_wait(...).
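As a rough sketch of that hint (assuming mu->cond gets initialized in recursive_mutex_init and mu->count starts at zero; error handling omitted):
int recursive_mutex_lock (recursive_mutex_t *mu)
{
    pthread_mutex_lock(&mu->mutex);
    if (mu->count > 0 && pthread_equal(mu->owner, pthread_self())) {
        mu->count++;                      /* case (2): the owner relocks */
    } else {
        mu->wait_count++;
        while (mu->count > 0)             /* case (3): another thread owns it */
            pthread_cond_wait(&mu->cond, &mu->mutex);
        mu->wait_count--;
        mu->owner = pthread_self();       /* case (1): first acquisition */
        mu->count = 1;
    }
    pthread_mutex_unlock(&mu->mutex);
    return 0;
}
The matching unlock would decrement mu->count under mu->mutex and, when it reaches zero, signal mu->cond so a waiter can take ownership.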
I have been testing std::condition_variable recently and found that it behaves quite differently from pthread_cond_t. I would like to know whether anything in my test is wrong, or whether std::condition_variable really is that different from pthread_cond_t.
The pthread_cond_t source is the following, compiled with gcc 4.4.6:
pthread_cond_t condA = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int ProcessRow = 0 ;
#define LOOPCNT 10
void *producer()
{
int idx ;
for(idx=0;idx<LOOPCNT;idx++)
{
//pthread_mutex_lock(&mutex);
__sync_add_and_fetch(&ProcessRow,1) ;
pthread_cond_signal(&condA);
printf("sending signal...(%d)\n",ProcessRow) ;
//pthread_mutex_unlock(&mutex);
}
printf("I am out ... \n") ;
}
void *consumer()
{
int icnt = 0 ;
while(1)
{
pthread_mutex_lock(&mutex);
while (ProcessRow <= 0)
pthread_cond_wait(&condA, &mutex);
pthread_mutex_unlock(&mutex); // I originally forgot this unlock, which made the test fail
__sync_sub_and_fetch(&ProcessRow,1) ;
++icnt ;
printf("receving=(%d)\n",ProcessRow) ;
usleep(10000) ;
}
printf("(%d)\n",ProcessRow) ;
}
The output :
sending signal...(1)
sending signal...(2)
sending signal...(3)
sending signal...(4)
sending signal...(5)
sending signal...(6)
sending signal...(7)
sending signal...(8)
sending signal...(9)
sending signal...(10)
I am out ...
receving=(9)
It looks like the consumer thread blocks in pthread_cond_wait, so "receving" is only printed once!
And the following test is for std::condition_variable.
The following binsem.hpp comes from
https://gist.github.com/yohhoy/2156481
with a little modification, compiled with g++ 4.8.1:
class binsem {
public:
explicit binsem(int init_count = count_max)
: count_(init_count) {}
// P-operation / acquire
void wait()
{
std::unique_lock<std::mutex> lk(m_);
cv_.wait(lk, [this]{ return 0 < count_; });
--count_;
}
bool try_wait()
{
std::lock_guard<std::mutex> lk(m_);
if (0 < count_)
{
--count_;
return true;
} else
{
return false;
}
}
// V-operation / release
void signal()
{
std::lock_guard<std::mutex> lk(m_);
//if (count_ < count_max) // I commented this out
//{ // I commented this out
++count_;
cv_.notify_one();
//} // I commented this out
}
// Lockable requirements
void lock() { wait(); }
bool try_lock() { return try_wait(); }
void unlock() { signal(); }
private:
static const int count_max = 1;
int count_;
std::mutex m_;
std::condition_variable cv_;
};
and my source :
#define LOOPCNT 10
binsem sem2( 0 ); // assumed declaration (not shown in the original): starts at 0 so lock() blocks until signaled
atomic<int> ProcessRow ;
void f4()
{
for(int i=0;i<LOOPCNT;i++)
{
sem2.unlock() ;
++ProcessRow ;
}
cout << "i am out" << endl ;
}
void f5()
{
int icnt = 0 ;
std::chrono::milliseconds sleepDuration(1000);
while(1)
{
sem2.lock() ;
++icnt ;
std::this_thread::sleep_for(sleepDuration);
cout << ProcessRow << "in f5 " << endl ;
--ProcessRow ;
if(icnt >= LOOPCNT)
break ;
}
printf("(%d)\n",icnt) ;
}
The output :
i am out
10in f5
9in f5
8in f5
7in f5
6in f5
5in f5
4in f5
3in f5
2in f5
1in f5
(10)
It looks like the signal only has an effect if pthread_cond_wait is already waiting; otherwise the signal is lost!
And for std::condition_variable, it looks like wait() will wake up as many times as notify_one() was called; if you call notify_one() 10 seconds before calling wait(), wait() still gets that notification. Quite different from pthread_cond_t!
Am I missing something in this test? Or do std::condition_variable and pthread_cond_t really act as the tests show?
Edit:
I think the following shows the behavior more simply. Sorry, I forgot the unlock, which made the earlier test fail; the two actually have the same behavior!
int main()
{
//pthread_mutex_lock(&mutex);
++ProcessRow ;
pthread_cond_signal(&condA);
//pthread_mutex_unlock(&mutex);
printf("sending signal...\n") ;
sleep(10) ;
pthread_mutex_lock(&mutex);
while (ProcessRow <= 0)
pthread_cond_wait(&condA, &mutex);
pthread_mutex_unlock(&mutex);
printf("wait pass through\n") ;
}
This shows:
sending signal...
wait pass through
And for std::condition_variable
int main()
{
sem2.unlock() ;
std::chrono::milliseconds sleepDuration(10000);
cout << "going sleep" << endl ;
std::this_thread::sleep_for(sleepDuration);
sem2.lock() ;
cout << "lock pass through " << endl ;
}
This shows:
going sleep
lock pass through
So it was my fault that the test was wrong and caused a deadlock! Thanks for all the great advice!
In your pthread code, you never unlock the mutex; the consumer() function deadlocks on the second iteration. Also, the outer while loop should break out when some condition is satisfied. I suggest it break out when icnt reaches LOOPCNT, which roughly matches how you break the loop in f5().
void *consumer(void *x)
{
int icnt = 0 ;
while(1)
{
pthread_mutex_lock(&mutex);
while (ProcessRow <= 0)
pthread_cond_wait(&condA, &mutex);
__sync_sub_and_fetch(&ProcessRow,1) ;
++icnt ;
printf("receving=(%d) icnt=(%d)\n",ProcessRow, icnt) ;
pthread_mutex_unlock(&mutex);
if (icnt == LOOPCNT) break;
usleep(10000) ;
}
printf("(%d)\n",ProcessRow) ;
}
Your std::thread version of the code doesn't closely match the pthread version at all, so I don't think you can compare their executions this way. Instead of mimicking a semaphore, I think it is better to use the std::condition_variable exactly as you use it in the pthread version of the code. That way you can really compare apples to apples.
std::condition_variable condA;
std::mutex mutex;
volatile int ProcessRow = 0 ;
#define LOOPCNT 10
void producer()
{
int idx ;
for(idx=0;idx<LOOPCNT;idx++)
{
std::unique_lock<std::mutex> lock(mutex);
__sync_add_and_fetch(&ProcessRow,1) ;
condA.notify_one();
printf("sending signal...(%d)\n",ProcessRow) ;
}
printf("I am out ... \n") ;
}
void consumer()
{
int icnt = 0 ;
while(icnt < LOOPCNT)
{
if(icnt > 0) usleep(10000);
std::unique_lock<std::mutex> lock(mutex);
while (ProcessRow <= 0)
condA.wait(lock);
__sync_sub_and_fetch(&ProcessRow,1) ;
++icnt ;
printf("receving=(%d) icnt=(%d)\n",ProcessRow, icnt) ;
}
printf("(%d)\n",ProcessRow) ;
}
Both pthread_cond_t and std::condition_variable work the same way. They are stateless and a signal can only get "lost" if no thread is blocked, in which case no signal is needed because there is no thread that needs one.
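You can see that statelessness in the canonical usage pattern: the state lives in the predicate, not in the condition variable, so a notification that happens before the wait is not lost. A small illustrative snippet:
#include <condition_variable>
#include <mutex>
std::mutex m;
std::condition_variable cv;
int pending = 0;                 // the predicate's state
void produce_one()
{
    { std::lock_guard<std::mutex> lk(m); ++pending; }  // record the event first
    cv.notify_one();                                   // then wake a waiter, if any
}
void consume_one()
{
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, []{ return pending > 0; });  // returns at once if already notified
    --pending;
}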
I learned how to send additional parameters to a thread in a related post, but now I would like to know if I can get the data (processed by the thread) back into the calling function!
I am writing a program in which I need a thread that continuously puts user input into a string variable. The problem is that I don't know how to get the string data back to main(), where it is displayed (graphically). I would prefer that getting the user input and displaying the string be done independently, since they need to loop at different rates: say, 30 fps for user input and 16 fps for display.
I hope I am clear.
Here is an ideal problematic situation(but not one that i need a solution to):
typedef struct
{
int a,b;
}ThreadData;
int avg(void* data)
{
ThreadData* tdata=(ThreadData*)data;
int processed_average=(tdata->a+tdata->b)/2.0;
//this is what i want to send back to the main()
return 0;
}
void main()
{
int a=10,b=20;
ThreadData myThreadData = { a, b };
SDL_Thread* mythread=SDL_CreateThread(avg,&myThreadData);
cout<<"The average of a and b is "; //i dont know what to put here!
}
Forgive me for any syntax errors in my demo.
As a conclusive question:
How do I get the current contents of a string that is continuously updated by a thread (in a loop) back into main(), which contains another loop that continuously updates the screen (graphically) with the latest contents of that string?
A decent pattern for inter-thread communication is a message queue; you can implement one with a mutex, a list, and a condition variable, or use an off-the-shelf variant. Here are some implementations you can look at:
http://pocoproject.org/docs/Poco.NotificationQueue.html
http://gnodebian.blogspot.com.es/2013/07/a-thread-safe-asynchronous-queue-in-c11.html
http://docs.wxwidgets.org/trunk/classwx_message_queue_3_01_t_01_4.html
http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/reference/containers_overview/concurrent_queue_cls.htm
You would then have the thread push data onto the queue - and in main pop data from the queue.
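A minimal sketch of such a queue (C++11, with illustrative names):
#include <condition_variable>
#include <mutex>
#include <queue>
template <typename T>
class MessageQueue
{
public:
    void push( T value )
    {
        {
            std::lock_guard<std::mutex> lk( m_ );
            q_.push( std::move( value ) );
        }
        cv_.notify_one();        // notify after releasing the lock
    }
    T pop()                      // blocks until an item is available
    {
        std::unique_lock<std::mutex> lk( m_ );
        cv_.wait( lk, [this]{ return !q_.empty(); } );
        T value = std::move( q_.front() );
        q_.pop();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};
The input thread would push each new piece of user input, and the render loop in main would pop at its own rate (a non-blocking try_pop variant is handy for a render loop that must not stall).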
Edit 1: in response to the OP's edit.
If you have a single string that has to be edited by the thread and then rendered by main, it is best to just use std::string, protect all access to it with a mutex, and use a condition variable to signal the main thread when the string changes. I'll try to write some sample code for you in a minute.
Edit 2: Sample code as promised:
#include <SDL/SDL.h>
#include <SDL/SDL_thread.h>
#include <iostream>
#include <sstream>
#include <stdexcept>
class SdlMutex
{
public:
SdlMutex()
{
mutex = SDL_CreateMutex();
if ( !mutex ) throw std::runtime_error( "SDL_CreateMutex == NULL" );
}
~SdlMutex()
{
SDL_DestroyMutex( mutex );
}
void lock()
{
if( SDL_mutexP( mutex ) == -1 ) throw std::runtime_error( "SDL_mutexP == -1" );
// Note:
// -1 does not mean it was already locked - it means there was an error in locking -
// if it was locked it will just block - see SDL_mutexP(3)
}
void unlock()
{
if ( SDL_mutexV( mutex ) == -1 ) throw std::runtime_error( "SDL_mutexV == -1" );
}
SDL_mutex* underlying()
{
return mutex;
}
private:
SDL_mutex* mutex;
};
class SdlScopedLock
{
public:
SdlScopedLock( SdlMutex& mutex )
:
mutex( mutex )
{
mutex.lock();
}
~SdlScopedLock()
{
try
{
this->unlock();
}
catch( const std::exception& e )
{
// Destructors should never throw ...
std::cerr << "SdlScopedLock::~SdlScopedLock - caught : " << e.what() << std::endl;
}
}
void unlock()
{
mutex.unlock();
}
private:
SdlMutex& mutex;
};
class ThreadData
{
public:
ThreadData()
:
dataReady( false ),
done( false )
{
condition = SDL_CreateCond();
}
~ThreadData()
{
SDL_DestroyCond( condition );
}
// Using stringstream so I can just shift on integers...
std::stringstream data;
bool dataReady;
bool done;
SdlMutex mutex;
SDL_cond* condition;
};
int threadFunction( void* data )
{
try
{
ThreadData* threadData = static_cast< ThreadData* >( data );
for ( size_t i = 0; i < 100; i++ )
{
{
SdlScopedLock lock( threadData->mutex );
// Everything in this scope is now syncronized with the mutex
if ( i != 0 ) threadData->data << ", ";
threadData->data << i;
threadData->dataReady = true;
} // threadData->mutex is automatically unlocked here
// Its important to note that condition should be signaled after mutex is unlocked
if ( SDL_CondSignal( threadData->condition ) == -1 ) throw std::runtime_error( "Failed to signal" );
}
{
SdlScopedLock lock( threadData->mutex );
threadData->done = true;
}
if ( SDL_CondSignal( threadData->condition ) == -1 ) throw std::runtime_error( "Failed to signal" );
return 0;
}
catch( const std::exception& e )
{
std::cerr << "Caught : " << e.what() << std::endl;
return 1;
}
}
int main()
{
ThreadData threadData;
SDL_Thread* thread = SDL_CreateThread( threadFunction, &threadData );
while ( true )
{
SdlScopedLock lock( threadData.mutex );
while ( threadData.dataReady == false && threadData.done == false )
{
// NOTE: must call condition wait with mutex already locked
if ( SDL_CondWait( threadData.condition, threadData.mutex.underlying() ) == -1 ) throw std::runtime_error( "Failed to wait" );
}
// once dataReady == true or threadData.done == true we get here
std::cout << "Got data = " << threadData.data.str() << std::endl;
threadData.data.str( "" );
threadData.dataReady = false;
if ( threadData.done )
{
std::cout << "child done - ending" << std::endl;
break;
}
}
int status = 99;
SDL_WaitThread( thread, &status );
std::cerr << "Thread completed with : " << status << std::endl;
}
Edit 3: And then the cage comes down...
You should probably not use SDL thread support in C++, or at least wrap it in RAII classes. For example, in the above code, if an exception is thrown you should ensure the mutex is unlocked. I will update the sample with RAII, but there are many better options than the SDL thread helpers. (NOTE: Edit 4 adds RAII, so now the mutex is unlocked when an exception is thrown.)
Edit 4: The code is now safer - still make sure you do error checks - and basically: don't use SDL threads in C++; use boost::thread or std::thread.
I think you want SDL_WaitThread.
void SDL_WaitThread(SDL_Thread *thread, int *status);
The return code for the thread function is placed in the area pointed
to by status, if status is not NULL.
Have your avg function return the average.
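A minimal sketch of that approach (SDL 1.2-style API, ignoring SDL's platform-specific main wrapping; note this only works for small integer results, since the value travels through the thread's exit status):
#include <SDL/SDL.h>
#include <SDL/SDL_thread.h>
#include <iostream>
typedef struct
{
    int a, b;
} ThreadData;
int avg(void* data)
{
    ThreadData* tdata = (ThreadData*)data;
    return (tdata->a + tdata->b) / 2;   // the return value becomes the thread's status
}
int main()
{
    ThreadData myThreadData = { 10, 20 };
    SDL_Thread* mythread = SDL_CreateThread(avg, &myThreadData);
    int average = 0;
    SDL_WaitThread(mythread, &average); // blocks until avg returns
    std::cout << "The average of a and b is " << average << std::endl;
    return 0;
}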