C++ app performance varies when using threads

I have a C++ app with 2 threads. The app displays a gauge on the screen, with an indicator that rotates based on an angle received via UDP socket. My problem is that the indicator should rotate at a constant rate, but at times it behaves as if time slows down, at other times it fast-forwards to catch up, and intermittently it pauses.
Each frame, the display (main) thread takes a locked copy of the angle written by the UDP thread, and the UDP thread takes the same lock when writing the shared variable. I use a Windows CRITICAL_SECTION object to guard this communication between threads. The UDP packet is received at approximately the same rate as the display update. I am using Windows 7, 64-bit, with a 4-core processor.
I am using a separate Python app to broadcast the UDP packet. I use the Python function time.sleep to keep the broadcast at a constant rate.
Why does the application slow down?
Why does the application seem to fast-forward instead of snapping to the latest angle?
What is the proper fix?
EDIT: I am not 100% sure all angle values are remembered when the app seems to 'fast forward'. The app is snapping to some value (not sure if it is the 'latest') at times.
EDIT 2: per request, some code.
void App::udp_update(DWORD thread_id)
{
Packet p;
_socket.recv(p); // edit: blocks until transmission is received
{
Locker lock(_cs);
_packet = p;
}
}
void App::main_update()
{
float angle_copy = 0.f;
{
Locker lock(_cs);
angle_copy = _packet.angle;
}
draw(angle_copy); // edit: blocks until monitor refreshes
}
Thread.h
class CS
{
private:
friend class Locker; // Locker is declared further down, so the elaborated form is required
CRITICAL_SECTION _handle;
void _lock();
void _unlock();
// not implemented by design
CS(CS&);
CS& operator=(const CS&);
public:
CS();
~CS();
};
class Locker
{
private:
CS& _cs;
// not implemented by design
Locker();
Locker(const Locker&);
Locker& operator=(const Locker&);
public:
Locker(CS& c)
: _cs(c)
{
_cs._lock();
}
~Locker()
{
_cs._unlock();
}
};
class Win32ThreadPolicy
{
public:
typedef Functor<void,TYPELIST_1(DWORD)> Callback;
private:
Callback _callback;
//SECURITY_DESCRIPTOR _sec_descr;
//SECURITY_ATTRIBUTES _sec_attrib;
HANDLE _handle;
//DWORD _exitValue;
#ifdef USE_BEGIN_API
unsigned int _id;
#else // USE_BEGIN_API
DWORD _id;
#endif // USE_BEGIN_API
/*volatile*/ bool _is_joined;
#ifdef USE_BEGIN_API
static unsigned int WINAPI ThreadProc( void* lpParameter );
#else // USE_BEGIN_API
static DWORD WINAPI ThreadProc( LPVOID lpParameter );
#endif // USE_BEGIN_API
DWORD _run();
void _join();
// not implemented by design
Win32ThreadPolicy();
Win32ThreadPolicy(const Win32ThreadPolicy&);
Win32ThreadPolicy& operator=(const Win32ThreadPolicy&);
public:
Win32ThreadPolicy(Callback& func);
~Win32ThreadPolicy();
void Spawn();
void Join();
};
/// helps to manage parallel operations.
/// attempts to mimic the C++11 std::thread interface, but also passes the thread ID.
class Thread
{
public:
typedef Functor<void,TYPELIST_1(DWORD)> Callback;
typedef Win32ThreadPolicy PlatformPolicy;
private:
PlatformPolicy _platform;
/// not implemented by design
Thread();
Thread(const Thread&);
Thread& operator=(const Thread&);
public:
/// begins parallel execution of the parameter, func.
/// \param func, the function object to be executed.
Thread(Callback& func)
: _platform(func)
{
_platform.Spawn();
}
/// stops parallel execution and joins with main thread.
~Thread()
{
_platform.Join();
}
};
Thread.cpp
#include "Thread.h"
void CS::_lock()
{
::EnterCriticalSection( &_handle );
}
void CS::_unlock()
{
::LeaveCriticalSection( &_handle );
}
CS::CS()
: _handle()
{
::memset( &_handle, 0, sizeof(CRITICAL_SECTION) );
::InitializeCriticalSection( &_handle );
}
CS::~CS()
{
::DeleteCriticalSection( &_handle );
}
Win32ThreadPolicy::Win32ThreadPolicy(Callback& func)
: _handle(NULL)
//, _sec_descr()
//, _sec_attrib()
, _id(0)
, _is_joined(true)
, _callback(func)
{
}
void Win32ThreadPolicy::Spawn()
{
// for an example of managing descriptors, see:
// http://msdn.microsoft.com/en-us/library/windows/desktop/aa446595%28v=vs.85%29.aspx
//BOOL success_descr = ::InitializeSecurityDescriptor( &_sec_descr, SECURITY_DESCRIPTOR_REVISION );
//TODO: do we want to start with CREATE_SUSPENDED ?
// TODO: wrap this with exception handling
#ifdef USE_BEGIN_API
// http://msdn.microsoft.com/en-us/library/kdzttdcb%28v=vs.100%29.aspx
_handle = (HANDLE) _beginthreadex( NULL, 0, &Win32ThreadPolicy::ThreadProc, this, 0, &_id );
#else // USE_BEGIN_API
_handle = ::CreateThread( NULL, 0, &Win32ThreadPolicy::ThreadProc, this, 0, &_id );
#endif // USE_BEGIN_API
}
void Win32ThreadPolicy::_join()
{
// signal that the thread should complete
_is_joined = true;
// maybe ::WFSO is not the best solution.
// "Except that WaitForSingleObject and its big brother WaitForMultipleObjects are dangerous.
// The basic problem is that these calls can cause deadlocks,
// if you ever call them from a thread that has its own message loop and windows."
// http://marc.durdin.net/2012/08/waitforsingleobject-why-you-should-never-use-it/
//
// He advises to use MsgWaitForMultipleObjects instead:
// http://msdn.microsoft.com/en-us/library/windows/desktop/ms684242%28v=vs.85%29.aspx
DWORD result = ::WaitForSingleObject( _handle, INFINITE );
// _handle must have THREAD_QUERY_INFORMATION security access enabled to use the following:
//DWORD exitCode = 0;
//BOOL success = ::GetExitCodeThread( _handle, &_exitValue );
}
Win32ThreadPolicy::~Win32ThreadPolicy()
{
}
void Win32ThreadPolicy::Join()
{
if( !_is_joined )
{
_join();
}
// this example shows that it is correct to pass the handle returned by CreateThread
// http://msdn.microsoft.com/en-us/library/windows/desktop/ms682516%28v=vs.85%29.aspx
::CloseHandle( _handle );
_handle = NULL;
}
DWORD Win32ThreadPolicy::_run()
{
// TODO: do we need to make sure _id has been assigned?
while( !_is_joined )
{
_callback(_id);
::Sleep(0);
}
// TODO: what should we return?
return 0;
}
#ifdef USE_BEGIN_API
unsigned int WINAPI Win32ThreadPolicy::ThreadProc( void* lpParameter )
#else // USE_BEGIN_API
DWORD WINAPI Win32ThreadPolicy::ThreadProc( LPVOID lpParameter )
#endif // USE_BEGIN_API
{
Win32ThreadPolicy* tptr = static_cast<Win32ThreadPolicy*>( lpParameter );
tptr->_is_joined = false;
// when this function (ThreadProc) returns, ::ExitThread is used to terminate the thread with an "implicit" call.
// http://msdn.microsoft.com/en-us/library/windows/desktop/ms682453%28v=vs.85%29.aspx
return tptr->_run();
}

I know this is a bit in the assumption space, but:
The rate you are talking about is assumed to be the same in "server" and "client", but it is only set by a sleep on the sending side. That sleep controls how often packets are sent, which is not necessarily the rate at which they are actually transmitted and consumed: the OS can schedule your processes very unevenly in time.
This can mean that when the server gets more CPU time, it fills an OS buffer with packets, while the client gets less processor time and consumes them at a lower rate (so the meter slows down). Then, when the client gets more time than the server, it quickly consumes the queued packets, while the update thread still does some waiting. This doesn't necessarily produce a "snap", because the critical section serializes the packet update, so you probably don't consume many packets from the OS buffer between two display updates (you may get a "snap to", but with a small step). I am basing this on the fact that I see no actual sleeping in your receive or update methods (the only sleep is on the server side).
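To make the backlog effect concrete, here is a small simulation, assuming the OS receive buffer behaves like a FIFO queue of datagrams. SimulatedSocketBuffer, take_one and drain_latest are hypothetical helpers, not Winsock calls: reading one datagram per call replays a backlog one stale angle at a time, while draining the queue and keeping only the last value jumps straight to the newest angle.

```cpp
#include <deque>
#include <optional>

// Hypothetical stand-in for the OS UDP receive buffer: datagrams queue up FIFO.
struct SimulatedSocketBuffer {
    std::deque<float> pending;

    // Non-blocking receive: returns nothing when the buffer is empty.
    std::optional<float> try_recv() {
        if (pending.empty()) return std::nullopt;
        float angle = pending.front();
        pending.pop_front();
        return angle;
    }
};

// One datagram per call, like one recv() per udp_update():
// if the reader fell behind, the next value is the oldest queued (stale) angle.
float take_one(SimulatedSocketBuffer& buf, float current) {
    if (auto a = buf.try_recv()) return *a;
    return current;
}

// Alternative: drain everything that is queued and keep only the newest angle.
float drain_latest(SimulatedSocketBuffer& buf, float current) {
    while (auto a = buf.try_recv()) current = *a;
    return current;
}
```

On a real Winsock socket, one way to drain is to switch the socket to non-blocking mode (for example with ioctlsocket and FIONBIO) and keep calling recv until it fails with WSAEWOULDBLOCK, then publish only the last packet under the lock. Whether that is the right fix depends on whether dropping intermediate angles is acceptable for the gauge.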

Related

Passing a function to wxWidgets thread-pool with inter-thread communication

I am a hobby programmer learning C++ and multi-threading, and getting started on my first thread-pool attempt.
I use Code::Blocks 20.3, wxWidgets 3.1.4, and MinGW 17.1 on a Windows 10 Pro computer.
I have tried several thread-pool examples, but all blocked the GUI.
I found an example shown in https://wiki.wxwidgets.org/Inter-Thread_and_Inter-Process_communication
that uses detached threads in a pool. This should not block the GUI.
I have "restructured" the single-file example to work in a test project (gui, app, main, thread-pool modules).
I placed the classes in their own file, and moved the "main" part to the Main.cpp in my test project and replaced the gui code with a separate class file.
The standard example works as expected.
In the example, strings are passed to the thread-pool and other strings back to the main thread.
I have been searching for a way for the main thread's AddToQueue() to pass an arbitrary function, e.g. aTask() (returning void, or returning something to the main thread), to be executed in the thread-pool. My search was not successful :-(.
=== Simple Task to be executed in a thread ===
std::vector<wxString> wxThreadCom2Frame::aTask(wxString wsSomeString, int x)
{
std::vector<wxString> vTest{};
for(int i = 0; i < x; i++)
{
wxString wsTest{};
wsTest << wsSomeString << " [" << i << "]";
vTest.push_back(wsTest);
}
return vTest;
}
=== Or as alternative, pass the vector by reference
aTask(_T("Just some text"), 5, &vTest); // to be queued with AddJob
===
void wxThreadCom2Frame::aTask(wxString wsSomeString, int x, std::vector<wxString> *vTest)
{
for(int i = 0; i < x; i++)
{
wxString wsTest{};
wsTest << wsSomeString << " [" << i << "]\n";
vTest->push_back(wsTest);
}
}
===
I hope someone can help me understand how to do this.
This is the first step towards what I would actually like to achieve.
An 'extraction' function returns a structure of 20 tags from a music file (mp3, flac, etc).
The main 'collecting' function will call the 'extraction' function for each file (up to 7000) in a list and place it in the queue of the thread-pool.
The 'collecting' function returns a vector of structures to the main thread for further processing.
Regards, Ruud.
=== ThreadCom.cpp ===
/////////////////////////////////////////////////////////////////////////////
// https://wiki.wxwidgets.org/Inter-Thread_and_Inter-Process_communication //
/////////////////////////////////////////////////////////////////////////////
// Standard
#include <stdlib.h>
#include <assert.h>
#include <map>
#include <list>
// wxWidgets
#include <wx/frame.h>
#include <wx/thread.h>
#include <wx/menu.h>
#include <wx/app.h>
class tJOB
{
public:
enum tCOMMANDS // list of commands that are currently implemented
{
eID_THREAD_EXIT=wxID_EXIT, // thread should exit or wants to exit
eID_THREAD_NULL=wxID_HIGHEST+1, // dummy command
eID_THREAD_STARTED, // worker thread has started OK
eID_THREAD_JOB = ID_THREAD_JOB, // process normal job
eID_THREAD_JOBERR = ID_THREAD_JOBERR // process erroneous job after which thread likes to exit
}; // enum tCOMMANDS
tJOB() : m_cmd(eID_THREAD_NULL) {}
tJOB(tCOMMANDS cmd, const wxString& arg) : m_cmd(cmd), m_Arg(arg) {}
tCOMMANDS m_cmd; wxString m_Arg;
}; // class tJOB
class QUEUE
{
public:
enum tPRIORITY { eHIGHEST, eHIGHER, eNORMAL, eBELOW_NORMAL, eLOW, eIDLE }; // priority classes
QUEUE(wxEvtHandler* pParent) : m_pParent(pParent) {}
void AddJob(const tJOB& job, const tPRIORITY& priority=eNORMAL) // push a job with given priority class onto the FIFO
{
wxMutexLocker lock(m_MutexQueue); // lock the queue
m_Jobs.insert(std::make_pair(priority, job)); // insert the prioritized entry into the multimap
m_QueueCount.Post(); // new job has arrived: increment semaphore counter
} // void AddJob(const tJOB& job, const tPRIORITY& priority=eNORMAL)
tJOB Pop()
{
tJOB element;
m_QueueCount.Wait(); // wait for semaphore (=queue count to become positive)
m_MutexQueue.Lock(); // lock queue
element=(m_Jobs.begin())->second; // get the first entry from queue (higher priority classes come first)
m_Jobs.erase(m_Jobs.begin()); // erase it
m_MutexQueue.Unlock(); // unlock queue
return element; // return job entry
} // tJOB Pop()
void Report(const tJOB::tCOMMANDS& cmd, const wxString& sArg=wxEmptyString, int iArg=0) // report back to parent
{
wxCommandEvent evt(wxEVT_THREAD, cmd); // create command event object
evt.SetString(sArg); // associate string with it
evt.SetInt(iArg);
m_pParent->AddPendingEvent(evt); // and add it to parent's event queue
} // void Report(const tJOB::tCOMMANDS& cmd, const wxString& arg=wxEmptyString)
size_t Stacksize() // helper function to return no of pending jobs
{
wxMutexLocker lock(m_MutexQueue); // lock queue until the size has been read
return m_Jobs.size();
}
private:
wxEvtHandler* m_pParent;
std::multimap<tPRIORITY, tJOB> m_Jobs; // multimap to reflect prioritization: values with lower keys come first, newer values with same key are appended
wxMutex m_MutexQueue; // protects queue access
wxSemaphore m_QueueCount; // semaphore count reflects number of queued jobs
};
class WorkerThread : public wxThread
{
public:
WorkerThread(QUEUE* pQueue, int id=0) : m_pQueue(pQueue), m_ID(id) { assert(pQueue); wxThread::Create(); }
private:
QUEUE* m_pQueue;
int m_ID;
virtual wxThread::ExitCode Entry()
{
Sleep(1000); // sleep a while to simulate some time-consuming init procedure
tJOB::tCOMMANDS iErr;
m_pQueue->Report(tJOB::eID_THREAD_STARTED, wxEmptyString, m_ID); // tell main thread that worker thread has successfully started
try { while(true) OnJob(); } // this is the main loop: process jobs until a job handler throws
catch(tJOB::tCOMMANDS& i) { m_pQueue->Report(iErr=i, wxEmptyString, m_ID); } // catch return value from error condition
return (wxThread::ExitCode)iErr; // and return exit code
} // virtual wxThread::ExitCode Entry()
virtual void OnJob()
{
tJOB job=m_pQueue->Pop(); // pop a job from the queue. this will block the worker thread if queue is empty
switch(job.m_cmd)
{
case tJOB::eID_THREAD_EXIT: // thread should exit
Sleep(1000); // wait a while
throw tJOB::eID_THREAD_EXIT; // confirm exit command
case tJOB::eID_THREAD_JOB: // process a standard job
Sleep(2000);
m_pQueue->Report(tJOB::eID_THREAD_JOB, wxString::Format(wxT("Job #%s done."), job.m_Arg.c_str()), m_ID); // report successful completion
break;
case tJOB::eID_THREAD_JOBERR: // process a job that terminates with an error
m_pQueue->Report(tJOB::eID_THREAD_JOB, wxString::Format(wxT("Job #%s erroneous."), job.m_Arg.c_str()), m_ID);
Sleep(1000);
throw tJOB::eID_THREAD_EXIT; // report exit of worker thread
break;
case tJOB::eID_THREAD_NULL: // dummy command
default:
break; // default
} // switch(job.m_cmd)
} // virtual void OnJob()
}; // class WorkerThread : public wxThread
=== partial wxThreadCom2Main.cpp ===
void wxThreadCom2Frame::AddToQueue( wxCommandEvent& event )
{
int iJob=rand();
m_pQueue->AddJob(tJOB((tJOB::tCOMMANDS)event.GetId(), wxString::Format(wxT("%u"), iJob)));
SetStatusText(wxString::Format(wxT("Job #%i started."), iJob)); // just set the status text
}
void wxThreadCom2Frame::OnThread(wxCommandEvent& event) // handler for thread notifications
{
switch(event.GetId())
{
case tJOB::eID_THREAD_JOB:
// Get the returned vector and do something with it
SetStatusText(wxString::Format(wxT("[%i]: %s"), event.GetInt(), event.GetString().c_str())); // progress display
break;
case tJOB::eID_THREAD_EXIT:
SetStatusText(wxString::Format(wxT("[%i]: Stopped."), event.GetInt()));
m_Threads.remove(event.GetInt()); // thread has exited: remove thread ID from list
if(m_Threads.empty()) { EnableControls(false); } // disable some menu items if no more threads
break;
case tJOB::eID_THREAD_STARTED:
SetStatusText(wxString::Format(wxT("[%i]: Ready."), event.GetInt()));
EnableControls(true); // at least one thread successfully started: enable controls
break;
default:
event.Skip();
}
}
void wxThreadCom2Frame::EnableControls(bool bEnable) // en/dis-able Stop, Add Job, Add JobErr
{
wxMenu* pMenu=GetMenuBar()->GetMenu(0);
static const int MENUIDS[]={/*ID_START_THREAD, */ID_THREAD_EXIT, ID_THREAD_JOB, ID_THREAD_JOBERR};
for(unsigned int i=0; i<WXSIZEOF(MENUIDS); pMenu->Enable(MENUIDS[i++], bEnable));
}
===
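The example above only queues tJOB command ids plus a string. One way to queue an arbitrary function such as aTask() is to store type-erased std::function<void()> jobs and hand each result back through a std::future. This is a sketch in standard C++11, not wxWidgets API; CallablePool and Submit are hypothetical names, and std::string stands in for wxString so the block compiles without wxWidgets.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pool that queues arbitrary callables instead of command ids.
class CallablePool {
public:
    explicit CallablePool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~CallablePool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // workers finish queued jobs first
    }
    // Queue any callable taking no arguments; the future carries its result (or exception).
    template <class F>
    auto Submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only, std::function needs a copyable callable: share it.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> fut = task->get_future();
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return fut;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stopping_ || !jobs_.empty(); });
                if (stopping_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

Arguments are bound by wrapping the call in a lambda, e.g. `pool.Submit([=] { return aTask(text, 5); })`. In the GUI, calling get() on the main thread would block it until the job finishes, so in the wxWidgets frame you would more likely have the job post its result back (for example with wxEvtHandler::CallAfter, or through the existing Report()/AddPendingEvent path) instead of waiting on the future.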

Wait until a variable becomes zero

I'm writing a multithreaded program that can execute some tasks in separate threads.
Some operations must be waited for before my program finishes executing. I've written a simple guard for such "important" operations:
class CPendingOperationGuard final
{
public:
CPendingOperationGuard()
{
InterlockedIncrementAcquire( &m_ullCounter );
}
~CPendingOperationGuard()
{
InterlockedDecrementAcquire( &m_ullCounter );
}
static bool WaitForAll( DWORD dwTimeOut )
{
// Here is a topic of my question
// Return false on timeout
// Return true if wait was successful
}
private:
static volatile ULONGLONG m_ullCounter;
};
Usage is simple:
void ImportantTask()
{
CPendingOperationGuard guard;
// Do work
}
// ...
void StopExecution()
{
if(!CPendingOperationGuard::WaitForAll( 30000 )) {
// Handle error
}
}
The question is: how can I efficiently wait until m_ullCounter becomes zero, or until a timeout expires?
I have two ideas:
The first is to run this function in a separate thread and wait on it with WaitForSingleObject( hThread, dwTimeout ):
DWORD WINAPI WaitWorker( LPVOID )
{
while(InterlockedCompareExchangeRelease( &m_ullCounter, 0, 0 ))
;
}
But it will "eat" almost 100% of CPU time - bad idea.
The second idea is to let other threads run:
DWORD WINAPI WaitWorker( LPVOID )
{
while(InterlockedCompareExchangeRelease( &m_ullCounter, 0, 0 ))
Sleep( 0 );
}
But that switches execution into kernel mode and back, which is too expensive for my task. Also a bad idea.
The question is:
How can I wait with almost zero overhead until my variable becomes zero, ideally without a separate thread? The main requirement is that the wait can be stopped by a timeout.
Maybe someone can suggest a completely different approach for my task: waiting for all registered operations (like WinAPI's thread pools, whose API has, for instance, WaitForThreadpoolWaitCallbacks to wait for ALL registered tasks).
PS: it is not possible to rewrite my code with ThreadPool API :(
Have a look at the WaitOnAddress() and WakeByAddressSingle()/WakeByAddressAll() functions introduced in Windows 8.
For example:
class CPendingOperationGuard final
{
public:
CPendingOperationGuard()
{
InterlockedIncrementAcquire(&m_ullCounter);
WakeByAddressAll(&m_ullCounter);
}
~CPendingOperationGuard()
{
InterlockedDecrementAcquire(&m_ullCounter);
WakeByAddressAll(&m_ullCounter);
}
static bool WaitForAll( DWORD dwTimeOut )
{
ULONGLONG Captured, Now, Deadline = GetTickCount64() + dwTimeOut;
DWORD TimeRemaining;
do
{
Captured = InterlockedExchangeAdd64((LONG64 volatile *)&m_ullCounter, 0);
if (Captured == 0) return true;
Now = GetTickCount64();
if (Now >= Deadline) return false;
TimeRemaining = static_cast<DWORD>(Deadline - Now);
}
while (WaitOnAddress(&m_ullCounter, &Captured, sizeof(ULONGLONG), TimeRemaining));
return false;
}
private:
static volatile ULONGLONG m_ullCounter;
};
Raymond Chen wrote a series of blog articles about these functions:
WaitOnAddress lets you create a synchronization object out of any data variable, even a byte
Implementing a critical section in terms of WaitOnAddress
Spurious wakes, race conditions, and bogus FIFO claims: A peek behind the curtain of WaitOnAddress
Extending our critical section based on WaitOnAddress to support timeouts
Comparing WaitOnAddress with futexes (futexi? futexen?)
Creating a semaphore from WaitOnAddress
Creating a semaphore with a maximum count from WaitOnAddress
Creating a manual-reset event from WaitOnAddress
Creating an automatic-reset event from WaitOnAddress
A helper template function to wait for WaitOnAddress in a loop
For this task you need something like run-down protection instead of CPendingOperationGuard.
Before beginning an operation, call ExAcquireRundownProtection, and begin the operation only if it returns TRUE. At the end you must call ExReleaseRundownProtection.
So the pattern is:
if (ExAcquireRundownProtection(&RunRef)) {
do_operation();
ExReleaseRundownProtection(&RunRef);
}
When you want to stop and wait until all active do_operation() calls have finished, call ExWaitForRundownProtectionRelease (instead of WaitWorker).
After ExWaitForRundownProtectionRelease is called, ExAcquireRundownProtection returns FALSE, so no new operations start. ExWaitForRundownProtectionRelease does not return until every previously acquired run-down protection has been released with ExReleaseRundownProtection, i.e. until all current operations (if any) have completed.
Unfortunately, the system implements this API only in kernel mode, and there is no user-mode equivalent. However, it is not hard to implement the idea yourself.
Here is my example:
enum RundownState {
v_complete = 0, v_init = 0x80000000
};
template<typename T>
class RundownProtection
{
LONG _Value;
public:
_NODISCARD BOOL IsRundownBegin()
{
return 0 <= _Value;
}
_NODISCARD BOOL AcquireRP()
{
LONG Value, NewValue;
if (0 > (Value = _Value))
{
do
{
NewValue = InterlockedCompareExchangeNoFence(&_Value, Value + 1, Value);
if (NewValue == Value) return TRUE;
} while (0 > (Value = NewValue));
}
return FALSE;
}
void ReleaseRP()
{
if (InterlockedDecrement(&_Value) == v_complete)
{
static_cast<T*>(this)->RundownCompleted();
}
}
void Rundown_l()
{
InterlockedBitTestAndResetNoFence(&_Value, 31);
}
void Rundown()
{
if (AcquireRP())
{
Rundown_l();
ReleaseRP();
}
}
RundownProtection(RundownState Value = v_init) : _Value(Value)
{
}
void Init()
{
_Value = v_init;
}
};
///////////////////////////////////////////////////////////////
class OperationGuard : public RundownProtection<OperationGuard>
{
friend RundownProtection<OperationGuard>;
HANDLE _hEvent;
void RundownCompleted()
{
SetEvent(_hEvent);
}
public:
OperationGuard() : _hEvent(0) {}
~OperationGuard()
{
if (_hEvent)
{
CloseHandle(_hEvent);
}
}
ULONG WaitComplete(ULONG dwMilliseconds = INFINITE)
{
return WaitForSingleObject(_hEvent, dwMilliseconds);
}
ULONG Init()
{
return (_hEvent = CreateEvent(0, 0, 0, 0)) ? NOERROR : GetLastError();
}
} g_guard;
//////////////////////////////////////////////
ULONG CALLBACK PendingOperationThread(void*)
{
while (g_guard.AcquireRP())
{
Sleep(1000);// do operation
g_guard.ReleaseRP();
}
return 0;
}
void demo()
{
if (g_guard.Init() == NOERROR)
{
if (HANDLE hThread = CreateThread(0, 0, PendingOperationThread, 0, 0, 0))
{
CloseHandle(hThread);
}
MessageBoxW(0, 0, L"UI Thread", MB_ICONINFORMATION|MB_OK);
g_guard.Rundown();
g_guard.WaitComplete();
}
}
Why simply waiting until m_ullCounter becomes zero is not enough
If we read 0 from m_ullCounter, this only means that there is no active operation at that moment; a pending operation can still begin right after we see m_ullCounter == 0. We could use a special flag (say bool g_bQuit), set it, and have each operation check the flag before beginning and not begin if it is true. But even this is not enough.
Naive code:
//worker thread
if (!g_bQuit) // (1)
{
// MessageBoxW(0, 0, L"simulate delay", MB_ICONWARNING);
InterlockedIncrement(&g_ullCounter); // (4)
// do operation
InterlockedDecrement(&g_ullCounter); // (5)
}
// here we wait for all operation done
g_bQuit = true; // (2)
// wait on g_ullCounter == 0, how - not important
while (g_ullCounter) continue; // (3)
1. The worker thread checks the g_bQuit flag (1); it is still false, so the operation begins.
2. The worker thread is swapped out (the MessageBox simulates this).
3. We set g_bQuit = true; (2)
4. We check/wait for g_ullCounter == 0; it is 0, so we exit (3).
5. The worker thread wakes up (returns from the MessageBox) and increments g_ullCounter (4).
The problem is that the operation can use resources that we have already started destroying after seeing g_ullCounter == 0.
This happens because checking the quit flag (g_bQuit) and then incrementing the counter is not atomic; there can be a gap between them.
For a correct solution we need atomic access to flag and counter together, and that is exactly what run-down protection does. Flag and counter share a single LONG (32-bit) variable, because we can access it atomically: 31 bits are used for the counter and 1 bit for the quit flag. The Windows implementation uses bit 0 for the flag (1 means quit) and bits [1..31] for the counter. I use bits [0..30] for the counter and bit 31 for the flag (0 means quit). Look for
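The same flag+counter idea can be sketched portably with std::atomic, which also provides the timeout that the original WaitForAll() asked for. Assumptions: C++11, a std::condition_variable replaces the Win32 event, and PortableRundown with its members are hypothetical names. The layout follows the scheme above (bits 0..30 counter, bit 31 flag, cleared to mean quit); a single compare-exchange checks the flag and increments the counter together, which closes the gap between steps (1) and (4).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical portable run-down protection.
// Bits 0..30: number of active operations. Bit 31: set while running, cleared = quit.
class PortableRundown {
    static constexpr std::uint32_t kActive = 0x80000000u;
    std::atomic<std::uint32_t> state_{kActive};
    std::mutex m_;
    std::condition_variable cv_;

public:
    // Checks the flag and increments the counter in one atomic step.
    bool acquire() {
        std::uint32_t v = state_.load(std::memory_order_relaxed);
        while (v & kActive) {
            if (state_.compare_exchange_weak(v, v + 1, std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return true;
            // on failure v holds the current value; the loop re-checks the flag
        }
        return false;  // run-down has begun: do not start the operation
    }

    void release() {
        const std::uint32_t prev = state_.fetch_sub(1, std::memory_order_release);
        assert((prev & ~kActive) != 0 && "release() without matching acquire()");
        // prev == 1: flag already cleared and this was the last active operation.
        if (prev == 1) {
            std::lock_guard<std::mutex> lk(m_);
            cv_.notify_all();
        }
    }

    // Clears the flag (no new acquire() succeeds), then waits for the counter to reach 0.
    // Returns false on timeout.
    bool wait_for_rundown(std::chrono::milliseconds timeout) {
        state_.fetch_and(~kActive, std::memory_order_acq_rel);
        std::unique_lock<std::mutex> lk(m_);
        return cv_.wait_for(lk, timeout, [this] {
            return state_.load(std::memory_order_acquire) == 0;
        });
    }
};
```

Taking the mutex before notifying prevents a lost wake-up between the waiter's check and its wait. The object must outlive every thread that may still call release(), since the last releaser touches the mutex after the waiter may already have seen the counter reach zero.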

libuv - Limiting callback rate of idle event without blocking thread without multithreading

I'm using libsourcey which uses libuv as its underlying I/O networking layer.
Everything is set up and seems to run (I haven't tested anything yet, since I'm only prototyping and experimenting). However, alongside the application loop (the one that comes with libsourcey, which relies on libuv's loop), I also need to call an "idle function". As it is now, the idle callback is called on every loop iteration, which is very CPU-intensive. I need a way to limit the call rate of the uv_idle_cb without blocking the calling thread, which is the same thread the application uses to process I/O data (not sure about this last statement; correct me if I'm mistaken).
The idle function will manage several different aspects of the application, and it needs to run only x times per second. Also, everything needs to run on the same thread (I'm planning to upgrade the network infrastructure of an older application that runs entirely single-threaded).
This is the code I have so far. It also includes a test where I sleep the thread within the callback, but that blocks everything, so even the second idle callback I set up runs at the same rate as the first one.
struct TCPServers
{
CTCPManager<scy::net::SSLSocket> ssl;
};
int counter = 0;
void idle_cb(uv_idle_t *handle)
{
printf("Idle callback %d TID %d\n", counter, std::this_thread::get_id());
counter++;
std::this_thread::sleep_for(std::chrono::milliseconds(1000 / 25));
}
int counter2 = 0;
void idle_cb2(uv_idle_t *handle)
{
printf("Idle callback2 %d TID %d\n", counter2, std::this_thread::get_id());
counter2++;
std::this_thread::sleep_for(std::chrono::milliseconds(1000 / 50));
}
class CApplication : public scy::Application
{
public:
CApplication() : scy::Application(), m_uvIdleCallback(nullptr), m_uvIdleCallback2(nullptr), m_bUseSSL(false)
{}
void start()
{
run();
if (m_uvIdleCallback)
uv_idle_start(&m_uvIdle, m_uvIdleCallback);
if (m_uvIdleCallback2)
uv_idle_start(&m_uvIdle2, m_uvIdleCallback2);
}
void stop()
{
scy::Application::stop();
uv_idle_stop(&m_uvIdle);
if (m_bUseSSL)
scy::net::SSLManager::instance().shutdown();
}
void bindIdleEvent(uv_idle_cb cb)
{
m_uvIdleCallback = cb;
uv_idle_init(loop, &m_uvIdle);
}
void bindIdleEvent2(uv_idle_cb cb)
{
m_uvIdleCallback2 = cb;
uv_idle_init(loop, &m_uvIdle2);
}
void initSSL(const std::string& privateKeyFile = "", const std::string& certificateFile = "")
{
scy::net::SSLManager::instance().initNoVerifyServer(privateKeyFile, certificateFile);
m_bUseSSL = true;
}
private:
uv_idle_t m_uvIdle;
uv_idle_t m_uvIdle2;
uv_idle_cb m_uvIdleCallback;
uv_idle_cb m_uvIdleCallback2;
bool m_bUseSSL;
};
int main()
{
CApplication app;
app.bindIdleEvent(idle_cb);
app.bindIdleEvent2(idle_cb2);
app.initSSL();
app.start();
TCPServers srvs;
srvs.ssl.start("127.0.0.1", 9000);
app.waitForShutdown([&](void*) {
srvs.ssl.shutdown();
});
app.stop();
system("PAUSE");
return 0;
}
Thanks in advance if anyone can help out.
Solved the problem by using uv_timer_t and uv_timer_cb (I hadn't dug into libuv's docs yet). CPU usage went down drastically and nothing gets blocked.

EnterCriticalSection Deadlocking

I found some code that claimed to be able to make a thread sleep for an accurate amount of time. Testing the code out, it seems to work great, however it always deadlocks after a short amount of time.
Here is the original code. I put prints before entering and leaving the critical section, and saw that sometimes it leaves or enters twice in a row. It seems to deadlock at the EnterCriticalSection call within the Wait function.
Is there a way I can modify this code to retain its functionality while not deadlocking?
//----------------------------------------------------------------
class PreciseTimer
{
public:
PreciseTimer() : mRes(0), toLeave(false), stopCounter(-1)
{
InitializeCriticalSection(&crit);
mRes = timeSetEvent(1, 0, &TimerProc, (DWORD_PTR)this,
TIME_PERIODIC);
}
virtual ~PreciseTimer()
{
mRes = timeKillEvent(mRes);
DeleteCriticalSection(&crit);
}
///////////////////////////////////////////////////////////////
// Function name : Wait
// Description : Waits for the required duration of msecs.
// : Timer resolution is precisely 1 msec
// Return type : void :
// Argument : int timeout : timeout in msecs
///////////////////////////////////////////////////////////////
void Wait(int timeout)
{
if ( timeout )
{
stopCounter = timeout;
toLeave = true;
// this will do the actual delay - timer callback shares
// same crit section
EnterCriticalSection(&crit);
LeaveCriticalSection(&crit);
}
}
///////////////////////////////////////////////////////////////
// Function name : TimerProc
// Description : Timer callback procedure that is called
// : every 1msec
// : by high resolution media timers
// Return type : void CALLBACK :
// Argument : UINT uiID :
// Argument : UINT uiMsg :
// Argument : DWORD dwUser :
// Argument : DWORD dw1 :
// Argument : DWORD dw2 :
///////////////////////////////////////////////////////////////
static void CALLBACK TimerProc(UINT uiID, UINT uiMsg, DWORD_PTR
dwUser, DWORD_PTR dw1, DWORD_PTR dw2)
{
static volatile bool entered = false;
PreciseTimer* pThis = (PreciseTimer*)dwUser;
if ( pThis )
{
if ( !entered && !pThis->toLeave ) // block section as
// soon as we can
{
entered = true;
EnterCriticalSection(&pThis->crit);
}
else if ( pThis->toLeave && pThis->stopCounter == 0 )
// leave section
// when counter
// has expired
{
pThis->toLeave = false;
entered = false;
LeaveCriticalSection(&pThis->crit);
}
else if ( pThis->stopCounter > 0 ) // if counter is set
// to anything, then
// continue to drop
// it...
--pThis->stopCounter;
}
}
private:
MMRESULT mRes;
CRITICAL_SECTION crit;
volatile bool toLeave;
volatile int stopCounter;
};
A deadlock in EnterCriticalSection() usually means that another thread called EnterCriticalSection() but never called LeaveCriticalSection().
As shown, this code is not very thread-safe (and timeSetEvent() is a threaded timer). If multiple PreciseTimer timers are running at the same time, they use the same TimerProc() callback and thus share the same static entered variable without protecting it from concurrent access. And if multiple threads call Wait() on the same PreciseTimer object at the same time, they step over each other's use of the stopCounter and toLeave members, which are also not protected from concurrent access. Even a single thread calling Wait() on a single PreciseTimer is not safe, since TimerProc() runs in its own thread and stopCounter is not adequately protected.
This code is full of race conditions.

Attempting asynchronous I/O with Win32 threads

I'm writing serial port software for Windows. To improve performance, I'm trying to convert the routines to use asynchronous I/O. I have the code up and working fairly well, but I'm a semi-beginner at this, and I would like to improve the program's performance further. During stress tests of the program (i.e. bursting data to/from the port as fast as possible at a high baud rate), the CPU load gets quite high.
If anyone out there has experience from asynchronous I/O and multi-threading in Windows, I'd be grateful if you could take a look at my program. I have two main concerns:
Is the asynchronous I/O implemented correctly? I found a fairly reliable source on the net suggesting that you can pass user data to the callback functions by implementing your own OVERLAPPED struct with your own data at the end. This seems to work just fine, but it looks a bit "hackish" to me. Also, the program's performance didn't improve much when I converted from synchronous/polled to asynchronous/callback I/O, which makes me suspect I'm doing something wrong.
Is it sane to use an STL std::deque for the FIFO data buffers? As the program is currently written, I only allow 1 byte of data to be received at a time, before it must be processed. I don't know how much data I will receive; it could be endless amounts. I assume this 1-byte-at-a-time approach will cause sluggish behaviour behind the scenes when deque has to allocate memory. And I don't trust deque to be thread-safe either (should I?).
If using STL deque isn't sane, are there any suggestions for a better data type to use? Static array-based circular ring buffer?
Any other feedback on the code is most welcome as well.
The serial routines are implemented so that I have a parent class called "Comport", which handles everything serial I/O related. From this class I inherit another class called "ThreadedComport", which is a multi-threaded version.
ThreadedComport class (relevant parts of it)
class ThreadedComport : public Comport
{
private:
HANDLE _hthread_port; /* thread handle */
HANDLE _hmutex_port; /* COM port access */
HANDLE _hmutex_send; /* send buffer access */
HANDLE _hmutex_rec; /* rec buffer access */
deque<uint8> _send_buf;
deque<uint8> _rec_buf;
uint16 _data_sent;
uint16 _data_received;
HANDLE _hevent_kill_thread;
HANDLE _hevent_open;
HANDLE _hevent_close;
HANDLE _hevent_write_done;
HANDLE _hevent_read_done;
HANDLE _hevent_ext_send; /* notifies external thread */
HANDLE _hevent_ext_receive; /* notifies external thread */
typedef struct
{
OVERLAPPED overlapped;
ThreadedComport* caller; /* add user data to struct */
} OVERLAPPED_overlap;
OVERLAPPED_overlap _send_overlapped;
OVERLAPPED_overlap _rec_overlapped;
uint8* _write_data;
uint8 _read_data;
DWORD _bytes_read;
static DWORD WINAPI _tranceiver_thread (LPVOID param);
void _send_data (void);
void _receive_data (void);
DWORD _wait_for_io (void);
static void WINAPI _send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
static void WINAPI _receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
};
The main thread routine created through CreateThread():
DWORD WINAPI ThreadedComport::_tranceiver_thread (LPVOID param)
{
ThreadedComport* caller = (ThreadedComport*) param;
HANDLE handle_array [3] =
{
caller->_hevent_kill_thread, /* WAIT_OBJECT_0 */
caller->_hevent_open, /* WAIT_OBJECT_1 */
caller->_hevent_close /* WAIT_OBJECT_2 */
};
DWORD result;
do
{
/* wait for anything to happen */
result = WaitForMultipleObjects(3,
handle_array,
false, /* don't wait for all */
INFINITE);
if(result == WAIT_OBJECT_1 ) /* open? */
{
do /* while port is open, work */
{
caller->_send_data();
caller->_receive_data();
result = caller->_wait_for_io(); /* will wait for the same 3 as in handle_array above,
plus all read/write specific events */
} while (result != WAIT_OBJECT_0 && /* while not kill thread */
result != WAIT_OBJECT_2); /* while not close port */
}
else if(result == WAIT_OBJECT_2) /* close? */
{
; /* do nothing */
}
} while (result != WAIT_OBJECT_0); /* kill thread? */
return 0;
}
which in turn calls the following three functions:
void ThreadedComport::_send_data (void)
{
uint32 send_buf_size;
if(_send_buf.size() != 0) // anything to send?
{
WaitForSingleObject(_hmutex_port, INFINITE);
if(_is_open) // double-check port
{
bool result;
WaitForSingleObject(_hmutex_send, INFINITE);
_data_sent = 0;
send_buf_size = _send_buf.size();
if(send_buf_size > (uint32)_MAX_MESSAGE_LENGTH)
{
send_buf_size = _MAX_MESSAGE_LENGTH;
}
_write_data = new uint8 [send_buf_size];
for(uint32 i=0; i<send_buf_size; i++)
{
_write_data[i] = _send_buf.front();
_send_buf.pop_front();
}
_send_buf.clear();
ReleaseMutex(_hmutex_send);
result = WriteFileEx (_hcom, // handle to output file
(void*)_write_data, // pointer to input buffer
send_buf_size, // number of bytes to write
(LPOVERLAPPED)&_send_overlapped, // pointer to async. i/o data
(LPOVERLAPPED_COMPLETION_ROUTINE )&_send_callback);
SleepEx(INFINITE, true); // Allow callback to come
if(result == false)
{
// error handling here
}
} // if(_is_open)
ReleaseMutex(_hmutex_port);
}
else /* nothing to send */
{
SetEvent(_hevent_write_done); // Skip write
}
}
void ThreadedComport::_receive_data (void)
{
WaitForSingleObject(_hmutex_port, INFINITE);
if(_is_open)
{
BOOL result;
_bytes_read = 0;
result = ReadFileEx (_hcom, // handle to the COM port
(void*)&_read_data, // buffer that receives the data
1, // number of bytes to read
(OVERLAPPED*)&_rec_overlapped, // pointer to async. i/o data
(LPOVERLAPPED_COMPLETION_ROUTINE )&_receive_callback);
SleepEx(INFINITE, true); // Allow callback to come
if(result == FALSE)
{
DWORD last_error = GetLastError();
if(last_error == ERROR_OPERATION_ABORTED) // disconnected ?
{
close(); // close the port
}
}
}
ReleaseMutex(_hmutex_port);
}
DWORD ThreadedComport::_wait_for_io (void)
{
DWORD result;
bool is_write_done = false;
bool is_read_done = false;
HANDLE handle_array [5] =
{
_hevent_kill_thread,
_hevent_open,
_hevent_close,
_hevent_write_done,
_hevent_read_done
};
do /* COM port message pump running until sending / receiving is done */
{
result = WaitForMultipleObjects(5,
handle_array,
false, /* don't wait for all */
INFINITE);
if(result <= WAIT_OBJECT_2)
{
break; /* abort */
}
else if(result == WAIT_OBJECT_3) /* write done */
{
is_write_done = true;
SetEvent(_hevent_ext_send);
}
else if(result == WAIT_OBJECT_4) /* read done */
{
is_read_done = true;
if(_bytes_read > 0)
{
uint32 errors = 0;
WaitForSingleObject(_hmutex_rec, INFINITE);
_rec_buf.push_back((uint8)_read_data);
_data_received += _bytes_read;
while((uint16)_rec_buf.size() > _MAX_MESSAGE_LENGTH)
{
_rec_buf.pop_front();
}
ReleaseMutex(_hmutex_rec);
_bytes_read = 0;
ClearCommError(_hcom, &errors, NULL);
SetEvent(_hevent_ext_receive);
}
}
} while(!is_write_done || !is_read_done);
return result;
}
Asynchronous I/O callback functions:
void WINAPI ThreadedComport::_send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
{
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
if(dwErrorCode == 0) // no errors
{
if(dwNumberOfBytesTransfered > 0)
{
_this->_data_sent = dwNumberOfBytesTransfered;
}
}
delete [] _this->_write_data; /* always clean this up */
SetEvent(lpOverlapped->hEvent);
}
void WINAPI ThreadedComport::_receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
{
if(dwErrorCode == 0) // no errors
{
if(dwNumberOfBytesTransfered > 0)
{
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
_this->_bytes_read = dwNumberOfBytesTransfered;
}
}
SetEvent(lpOverlapped->hEvent);
}
The first question is simple. The method is not hackish; you own the OVERLAPPED memory and everything that follows it. This is best described by Raymond Chen: http://blogs.msdn.com/b/oldnewthing/archive/2010/12/17/10106259.aspx
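To show why the trick is sound, here is a portable sketch of the same idiom. FakeOverlapped is a hypothetical stand-in for the Win32 OVERLAPPED struct (so the sketch compiles without windows.h), and complete_io plays the role of the system invoking a completion routine with only the base pointer. Because the base struct is the first member of a standard-layout struct, the routine can cast the pointer it receives back to the containing struct, which is exactly what the question's OVERLAPPED_overlap relies on. (Windows' CONTAINING_RECORD macro generalizes this to members that are not first.)

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for OVERLAPPED; only the layout idea matters here.
struct FakeOverlapped {
    std::uintptr_t internal = 0;
    void* hEvent = nullptr;
};

class Port;  // owner object the completion routine needs to reach

// Same shape as OVERLAPPED_overlap: base struct first, user data after it.
struct OverlappedWithOwner {
    FakeOverlapped overlapped;  // must stay the first member
    Port* owner;
};
static_assert(std::is_standard_layout<OverlappedWithOwner>::value,
              "first-member cast requires a standard-layout struct");

class Port {
public:
    int completed = 0;
};

// Plays the role of a completion routine: it only receives the base pointer.
void on_complete(FakeOverlapped* base) {
    auto* ext = reinterpret_cast<OverlappedWithOwner*>(base);
    ext->owner->completed += 1;
}

// Plays the role of the OS: it knows nothing about the extra fields.
void complete_io(FakeOverlapped* base, void (*routine)(FakeOverlapped*)) {
    routine(base);
}

int run_overlapped_demo() {
    Port port;
    OverlappedWithOwner ov{};
    ov.owner = &port;
    complete_io(&ov.overlapped, &on_complete);
    return port.completed;
}
```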
You would only expect a performance improvement if you have better things to do while waiting for the I/O to complete. If all you do is call SleepEx(), you'll only see the CPU usage go down. The clue is in the name "overlapped": it allows you to overlap computation and I/O.
std::deque<unsigned char> can handle FIFO data without big problems. It allocates its storage in fixed-size blocks rather than one element at a time (the block size is implementation-specific), so pushing bytes one by one does not mean one allocation per byte.
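A rough portable analogy, using std::async in place of overlapped I/O and a hypothetical slow_read() that just sleeps: the gain comes from doing useful work between starting the request and collecting its result, not from the asynchronous call itself.

```cpp
#include <chrono>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical slow device read: sleeps, then returns some bytes.
std::vector<unsigned char> slow_read() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return {1, 2, 3, 4};
}

// Starts the "I/O", does unrelated work while it is in flight,
// and only then waits for the result.
long overlapped_style() {
    auto pending = std::async(std::launch::async, slow_read);

    // Useful work that overlaps the read (here: summing 1..1000).
    std::vector<long> work(1000);
    std::iota(work.begin(), work.end(), 1L);
    long computed = std::accumulate(work.begin(), work.end(), 0L);

    std::vector<unsigned char> data = pending.get();  // block only now
    long received = std::accumulate(data.begin(), data.end(), 0L);
    return computed + received;
}
```

If nothing sat between std::async() and get(), the call would be effectively synchronous, which is the situation the SleepEx(INFINITE, true) calls in the question create.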
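A small sketch of std::deque<unsigned char> used as a byte FIFO, assuming data arrives and leaves in blocks: appending a whole received block with insert() and draining up to a limit into a contiguous buffer (what WriteFile() needs) avoids per-byte front()/pop_front() loops. The helper names append_block and drain_up_to are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

using ByteFifo = std::deque<unsigned char>;

// Appends a block of received bytes to the back of the FIFO in one call.
void append_block(ByteFifo& fifo, const unsigned char* data, std::size_t n) {
    fifo.insert(fifo.end(), data, data + n);
}

// Moves at most max_bytes from the front of the FIFO into a contiguous
// vector; any remaining bytes stay queued for the next call.
std::vector<unsigned char> drain_up_to(ByteFifo& fifo, std::size_t max_bytes) {
    std::size_t n = std::min(max_bytes, fifo.size());
    auto last = fifo.begin() + static_cast<std::ptrdiff_t>(n);
    std::vector<unsigned char> out(fifo.begin(), last);
    fifo.erase(fifo.begin(), last);
    return out;
}
```

This is not thread-safe on its own; if producer and consumer run on different threads, both calls still need to happen under a lock.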
[edit]
I've looked into your code a bit further, and it seems the code is needlessly complex. For starters, one of the main benefits of asynchronous I/O is that you don't need all that thread stuff. Threads allow you to use more cores, but you're dealing with a slow I/O device. Even a single core is sufficient, if it doesn't spend all its time waiting. And that's precisely what overlapped I/O is for. You just dedicate one thread to all I/O work for the port. Since it's the only thread, it doesn't need a mutex to access that port.
OTOH, you would want a mutex around the deque<uint8> objects since the producer/consumer threads aren't the same as the comport thread.
I don't see any reason to use asynchronous I/O in a project like this. Asynchronous I/O is good when you're handling a large number of handles (sockets, ports, files) or have work to do while waiting for data, but as far as I can tell, you're only dealing with a single port and not doing any work in between.
Also, just for the sake of knowledge, you would normally use an I/O completion port to handle your asynchronous I/O. I'm not sure if there are any situations where using an I/O completion port has a negative impact on performance.
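But yes, your asynchronous I/O usage looks okay. Implementing your own OVERLAPPED struct does look like a hack, but it is correct and a common idiom. The only built-in alternative is the hEvent member, which ReadFileEx() and WriteFileEx() ignore and leave free for the application, but your code already uses it to signal the done events.
Boost also has a circular buffer implementation (boost::circular_buffer), but it is not safe for concurrent modification either. The same goes for the standard library containers: concurrent reads are fine, but as soon as one thread writes, every access needs a lock.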
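Since the question also asks about a static array-based ring buffer, here is a minimal sketch of one: a fixed-capacity byte ring guarded by a std::mutex, so a producer thread and a consumer thread can share it. The class name LockedByteRing is hypothetical, and the choice to overwrite the oldest byte when full is an assumption that mirrors how the question trims _rec_buf to _MAX_MESSAGE_LENGTH.

```cpp
#include <array>
#include <cstddef>
#include <mutex>

// Hypothetical fixed-capacity byte ring. All access goes through one mutex,
// so it may be shared by a producer thread and a consumer thread.
template <std::size_t Capacity>
class LockedByteRing {
public:
    // Appends a byte; when full, the oldest byte is overwritten.
    void push(unsigned char b) {
        std::lock_guard<std::mutex> lock(mutex_);
        buf_[(head_ + size_) % Capacity] = b;
        if (size_ < Capacity) {
            ++size_;
        } else {
            head_ = (head_ + 1) % Capacity;  // the oldest byte was dropped
        }
    }

    // Copies up to max_bytes into out, removes them, and returns the count.
    std::size_t pop(unsigned char* out, std::size_t max_bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t n = size_ < max_bytes ? size_ : max_bytes;
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = buf_[(head_ + i) % Capacity];
        }
        head_ = (head_ + n) % Capacity;
        size_ -= n;
        return n;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return size_;
    }

private:
    std::array<unsigned char, Capacity> buf_{};  // storage, never reallocated
    std::size_t head_ = 0;                       // index of the oldest byte
    std::size_t size_ = 0;                       // number of stored bytes
    std::mutex mutex_;
};
```

No memory is allocated after construction, which is the main attraction over a growing container when the maximum backlog is known.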
I think that your code has a suboptimal design.
I suspect you are sharing too many data structures with too many threads. I think you should put all handling of the serial device I/O for one port into a single thread and put a synchronized command/data queue between the I/O thread and all client threads. Have the I/O thread watch for commands/data in the queue.
You seem to be allocating and freeing a buffer for each send. Avoid that. If you keep all the I/O in a single thread, you can reuse a single buffer. Since you are limiting the message size anyway, you can pre-allocate one buffer that is big enough.
Putting the bytes that you want to send into a std::deque is suboptimal, because you have to copy them into a contiguous memory block for WriteFile(). Instead, if you use some sort of command/data queue between the one I/O thread and the other threads, the client threads can provide a contiguous chunk of memory at once.
Reading 1 byte at a time seems silly, too. Unless it does not work for serial devices, you could give ReadFileEx() a large enough buffer; the completion routine is told how many bytes were actually read. It should not block, AFAIK, unless of course I am wrong.
You are waiting for the overlapped I/O to finish with a SleepEx() call. What is the point of overlapped I/O if you just end up being synchronous?
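A minimal sketch of that structure, with hypothetical names throughout and a plain function object standing in for the real WriteFile()/WriteFileEx() call: client threads push commands into one synchronized queue, and a single I/O thread pops and executes them. Because only that thread ever touches the port, the port itself needs no mutex; the queue is the only shared object.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical command sent from client threads to the I/O thread.
struct Command {
    enum class Kind { Write, Stop } kind;
    std::vector<std::uint8_t> data;  // contiguous payload for Write
};

class CommandQueue {
public:
    void push(Command cmd) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(cmd));
        }
        cv_.notify_one();
    }

    // Blocks until a command is available, then removes and returns it.
    Command pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        Command cmd = std::move(queue_.front());
        queue_.pop_front();
        return cmd;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Command> queue_;
};

// The single I/O thread: the only code that would touch the port handle.
// `write_port` stands in for the real write call.
void io_thread_main(
    CommandQueue& queue,
    const std::function<void(const std::vector<std::uint8_t>&)>& write_port) {
    for (;;) {
        Command cmd = queue.pop();
        if (cmd.kind == Command::Kind::Stop) {
            return;
        }
        write_port(cmd.data);
    }
}
```

Received data can flow back the same way, through a second queue owned by the consumer side.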
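For serial handles, how soon a multi-byte ReadFile()/ReadFileEx() completes is governed by the port's COMMTIMEOUTS settings rather than by the buffer size. A Windows-only sketch, assuming hcom is an already-open COM port handle (the function name is hypothetical): with ReadIntervalTimeout and ReadTotalTimeoutMultiplier set to MAXDWORD and a nonzero ReadTotalTimeoutConstant, Microsoft documents that a read returns immediately with whatever is already buffered, or waits for the first byte (up to the constant) and then returns, instead of waiting for the whole buffer to fill.

```cpp
#include <windows.h>

// Hypothetical helper: reads on `hcom` complete as soon as at least one byte
// is available (or after ~100 ms with no data), not when the buffer is full.
bool configure_batch_reads(HANDLE hcom) {
    COMMTIMEOUTS timeouts = {};
    timeouts.ReadIntervalTimeout = MAXDWORD;
    timeouts.ReadTotalTimeoutMultiplier = MAXDWORD;
    timeouts.ReadTotalTimeoutConstant = 100;   // ms to wait for the first byte
    timeouts.WriteTotalTimeoutMultiplier = 0;  // no write timeout
    timeouts.WriteTotalTimeoutConstant = 0;
    return SetCommTimeouts(hcom, &timeouts) != FALSE;
}
```

With this in place, ReadFileEx() can be handed a buffer of, say, 256 bytes, and the completion routine's dwNumberOfBytesTransfered reports how many actually arrived (possibly 0 on timeout).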