Attempting asynchronous I/O with Win32 threads - c++

I'm writing serial port software for Windows. To improve performance, I'm trying to convert the routines to use asynchronous I/O. I have the code up and working fairly well, but I'm a semi-beginner at this, and I would like to improve the performance of the program further. During stress tests of the program (i.e. bursting data to and from the port as fast as possible at a high baud rate), the CPU load gets quite high.
If anyone out there has experience from asynchronous I/O and multi-threading in Windows, I'd be grateful if you could take a look at my program. I have two main concerns:
Is the asynchronous I/O implemented correctly? I found a fairly reliable source on the net suggesting that you can pass user data to the callback functions by implementing your own OVERLAPPED struct with your own data at the end. This seems to be working just fine, but it does look a bit "hackish" to me. Also, the program's performance didn't improve all that much when I converted from synchronous/polled to asynchronous/callback, making me suspect I'm doing something wrong.
Is it sane to use STL std::deque for the FIFO data buffers? As the program is currently written, I only allow 1 byte of data to be received at a time before it must be processed. Because I don't know how much data I will receive (it could be an endless stream), I assume this 1-byte-at-a-time approach will cause sluggish behaviour behind the scenes whenever deque has to allocate memory. And I don't trust deque to be thread-safe either (should I?).
If using STL deque isn't sane, are there any suggestions for a better data type to use? Static array-based circular ring buffer?
Any other feedback on the code is most welcome as well.
The serial routines are implemented so that I have a parent class called "Comport", which handles everything serial I/O related. From this class I inherit another class called "ThreadedComport", which is a multi-threaded version.
ThreadedComport class (relevant parts of it)
class ThreadedComport : public Comport
{
private:
HANDLE _hthread_port; /* thread handle */
HANDLE _hmutex_port; /* COM port access */
HANDLE _hmutex_send; /* send buffer access */
HANDLE _hmutex_rec; /* rec buffer access */
deque<uint8> _send_buf;
deque<uint8> _rec_buf;
uint16 _data_sent;
uint16 _data_received;
HANDLE _hevent_kill_thread;
HANDLE _hevent_open;
HANDLE _hevent_close;
HANDLE _hevent_write_done;
HANDLE _hevent_read_done;
HANDLE _hevent_ext_send; /* notifies external thread */
HANDLE _hevent_ext_receive; /* notifies external thread */
typedef struct
{
OVERLAPPED overlapped;
ThreadedComport* caller; /* add user data to struct */
} OVERLAPPED_overlap;
OVERLAPPED_overlap _send_overlapped;
OVERLAPPED_overlap _rec_overlapped;
uint8* _write_data;
uint8 _read_data;
DWORD _bytes_read;
static DWORD WINAPI _tranceiver_thread (LPVOID param);
void _send_data (void);
void _receive_data (void);
DWORD _wait_for_io (void);
static void WINAPI _send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
static void WINAPI _receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
};
The main thread routine created through CreateThread():
DWORD WINAPI ThreadedComport::_tranceiver_thread (LPVOID param)
{
ThreadedComport* caller = (ThreadedComport*) param;
HANDLE handle_array [3] =
{
caller->_hevent_kill_thread, /* WAIT_OBJECT_0 */
caller->_hevent_open, /* WAIT_OBJECT_1 */
caller->_hevent_close /* WAIT_OBJECT_2 */
};
DWORD result;
do
{
/* wait for anything to happen */
result = WaitForMultipleObjects(3,
handle_array,
false, /* don't wait for all */
INFINITE);
if(result == WAIT_OBJECT_1 ) /* open? */
{
do /* while port is open, work */
{
caller->_send_data();
caller->_receive_data();
result = caller->_wait_for_io(); /* will wait for the same 3 as in handle_array above,
plus all read/write specific events */
} while (result != WAIT_OBJECT_0 && /* while not kill thread */
result != WAIT_OBJECT_2); /* while not close port */
}
else if(result == WAIT_OBJECT_2) /* close? */
{
; /* do nothing */
}
} while (result != WAIT_OBJECT_0); /* kill thread? */
return 0;
}
which in turn calls the following three functions:
void ThreadedComport::_send_data (void)
{
uint32 send_buf_size;
if(_send_buf.size() != 0) // anything to send?
{
WaitForSingleObject(_hmutex_port, INFINITE);
if(_is_open) // double-check port
{
bool result;
WaitForSingleObject(_hmutex_send, INFINITE);
_data_sent = 0;
send_buf_size = _send_buf.size();
if(send_buf_size > (uint32)_MAX_MESSAGE_LENGTH)
{
send_buf_size = _MAX_MESSAGE_LENGTH;
}
_write_data = new uint8 [send_buf_size];
for(uint32 i=0; i<send_buf_size; i++)
{
_write_data[i] = _send_buf.front();
_send_buf.pop_front();
}
_send_buf.clear();
ReleaseMutex(_hmutex_send);
result = WriteFileEx (_hcom, // handle to output file
(void*)_write_data, // pointer to input buffer
send_buf_size, // number of bytes to write
(LPOVERLAPPED)&_send_overlapped, // pointer to async. i/o data
(LPOVERLAPPED_COMPLETION_ROUTINE )&_send_callback);
SleepEx(INFINITE, true); // Allow callback to come
if(result == false)
{
// error handling here
}
} // if(_is_open)
ReleaseMutex(_hmutex_port);
}
else /* nothing to send */
{
SetEvent(_hevent_write_done); // Skip write
}
}
void ThreadedComport::_receive_data (void)
{
WaitForSingleObject(_hmutex_port, INFINITE);
if(_is_open)
{
BOOL result;
_bytes_read = 0;
result = ReadFileEx (_hcom, // handle to the COM port
(void*)&_read_data, // pointer to receive buffer
1, // number of bytes to read
(OVERLAPPED*)&_rec_overlapped, // pointer to async. i/o data
(LPOVERLAPPED_COMPLETION_ROUTINE )&_receive_callback);
SleepEx(INFINITE, true); // Allow callback to come
if(result == FALSE)
{
DWORD last_error = GetLastError();
if(last_error == ERROR_OPERATION_ABORTED) // disconnected ?
{
close(); // close the port
}
}
}
ReleaseMutex(_hmutex_port);
}
DWORD ThreadedComport::_wait_for_io (void)
{
DWORD result;
bool is_write_done = false;
bool is_read_done = false;
HANDLE handle_array [5] =
{
_hevent_kill_thread,
_hevent_open,
_hevent_close,
_hevent_write_done,
_hevent_read_done
};
do /* COM port message pump running until sending / receiving is done */
{
result = WaitForMultipleObjects(5,
handle_array,
false, /* don't wait for all */
INFINITE);
if(result <= WAIT_OBJECT_2)
{
break; /* abort */
}
else if(result == WAIT_OBJECT_3) /* write done */
{
is_write_done = true;
SetEvent(_hevent_ext_send);
}
else if(result == WAIT_OBJECT_4) /* read done */
{
is_read_done = true;
if(_bytes_read > 0)
{
uint32 errors = 0;
WaitForSingleObject(_hmutex_rec, INFINITE);
_rec_buf.push_back((uint8)_read_data);
_data_received += _bytes_read;
while((uint16)_rec_buf.size() > _MAX_MESSAGE_LENGTH)
{
_rec_buf.pop_front();
}
ReleaseMutex(_hmutex_rec);
_bytes_read = 0;
ClearCommError(_hcom, &errors, NULL);
SetEvent(_hevent_ext_receive);
}
}
} while(!is_write_done || !is_read_done);
return result;
}
Asynchronous I/O callback functions:
void WINAPI ThreadedComport::_send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
{
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
if(dwErrorCode == 0) // no errors
{
if(dwNumberOfBytesTransfered > 0)
{
_this->_data_sent = dwNumberOfBytesTransfered;
}
}
delete [] _this->_write_data; /* always clean this up */
SetEvent(lpOverlapped->hEvent);
}
void WINAPI ThreadedComport::_receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
{
if(dwErrorCode == 0) // no errors
{
if(dwNumberOfBytesTransfered > 0)
{
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
_this->_bytes_read = dwNumberOfBytesTransfered;
}
}
SetEvent(lpOverlapped->hEvent);
}

The first question is simple. The method is not hackish; you own the OVERLAPPED memory and everything that follows it. This is best described by Raymond Chen: http://blogs.msdn.com/b/oldnewthing/archive/2010/12/17/10106259.aspx
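To see why the trick is legitimate rather than hackish, here is a small portable sketch. It uses a stand-in struct, FakeOverlapped, instead of the real Win32 OVERLAPPED so it compiles anywhere; the names OverlappedWithCaller, Port and caller_from are illustrative. Casting the OVERLAPPED pointer back to the wrapper is valid because a standard-layout struct and its first member are pointer-interconvertible, and the static_asserts make that assumption explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in for the Win32 OVERLAPPED struct so this sketch builds anywhere.
struct FakeOverlapped {
    unsigned long internal;
    void* hEvent;
};

// Hypothetical object that the completion routine needs to get back to.
class Port {
public:
    int id = 7;
};

// The question's pattern: the system-visible struct first, user data after it.
struct OverlappedWithCaller {
    FakeOverlapped overlapped;  // must stay the first member
    Port* caller;
};

// The cast in caller_from() is only valid for a standard-layout struct whose
// first member is the OVERLAPPED; these checks make that assumption explicit.
static_assert(std::is_standard_layout<OverlappedWithCaller>::value,
              "the cast needs a standard-layout struct");
static_assert(offsetof(OverlappedWithCaller, overlapped) == 0,
              "the OVERLAPPED must come first");

// What a completion routine does with the pointer the system hands back.
Port* caller_from(FakeOverlapped* ov) {
    return reinterpret_cast<OverlappedWithCaller*>(ov)->caller;
}
```

On Windows, the CONTAINING_RECORD macro does the same job without requiring the OVERLAPPED to be the first member.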
You would only expect a performance improvement if you have better things to do while waiting for the I/O to complete. If all you do is call SleepEx, you'll only see the CPU% go down. The clue is in the name "overlapped": it allows you to overlap calculations and I/O.
std::deque<unsigned char> can handle FIFO data without big problems. It allocates storage in fixed-size blocks rather than per element (the block size is chosen by the library implementation), so pushing one byte at a time does not mean one heap allocation per byte.
[edit]
I've looked into your code a bit further, and it seems the code is needlessly complex. For starters, one of the main benefits of asynchronous I/O is that you don't need all that thread stuff. Threads allow you to use more cores, but you're dealing with a slow I/O device. Even a single core is sufficient, if it doesn't spend all its time waiting. And that's precisely what overlapped I/O is for. You just dedicate one thread to all I/O work for the port. Since it's the only thread, it doesn't need a mutex to access that port.
OTOH, you would want a mutex around the deque<uint8> objects since the producer/consumer threads aren't the same as the comport thread.

I don't see any reason for using asynchronous I/O in a project like this. Asynchronous I/O is good when you're handling a large number of sockets or have work to do while waiting for data, but as far as I can tell, you're only dealing with a single port and not doing any work in between.
Also, just for the sake of knowledge, you would normally use an I/O completion port to handle your asynchronous I/O. I'm not sure if there are any situations where using an I/O completion port has a negative impact on performance.
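But yes, your asynchronous I/O usage looks okay. Implementing your own OVERLAPPED struct does look like a hack, but it is correct, and it is the usual way to associate your own data with the completion. (With ReadFileEx/WriteFileEx the hEvent member is also free for your own use, but your code already uses it for an event.)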
Boost also has a circular buffer implementation (boost::circular_buffer), but like the standard library containers it is not thread safe: concurrent modification needs external locking.
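If you want the static array-based ring buffer the question asks about, a minimal sketch in standard C++ could look like this. RingBuffer and its push/pop interface are illustrative names, not Boost's API; a single std::mutex guards every operation, so one producer thread and one consumer thread can share it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Fixed-capacity byte FIFO backed by a static array, guarded by one mutex.
template <std::size_t Capacity>
class RingBuffer {
public:
    // Copies up to n bytes in and returns how many fit.
    std::size_t push(const std::uint8_t* data, std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t written = 0;
        while (written < n && count_ < Capacity) {
            buf_[(head_ + count_) % Capacity] = data[written++];
            ++count_;
        }
        return written;
    }

    // Copies up to n bytes out (oldest first) and returns how many were available.
    std::size_t pop(std::uint8_t* out, std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t taken = 0;
        while (taken < n && count_ > 0) {
            out[taken++] = buf_[head_];
            head_ = (head_ + 1) % Capacity;
            --count_;
        }
        return taken;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    mutable std::mutex mutex_;
    std::array<std::uint8_t, Capacity> buf_{};
    std::size_t head_ = 0;   // index of the oldest byte
    std::size_t count_ = 0;  // number of bytes stored
};
```

When the buffer is full, push() reports how many bytes fit instead of overwriting old data; the question's receive path discards the oldest bytes instead, which here would mean advancing head_ when full.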

I think that your code has a suboptimal design.
You are sharing too many data structures with too many threads, I guess. I think that you should put all handling of the serial device IO for one port into a single thread and put a synchronized command/data queue between the IO thread and all client threads. Have the IO thread watch out for commands/data in the queue.
You seem to be allocating and freeing a buffer for every send. Avoid that. If you keep all the IO in a single thread, you can reuse a single buffer. Since you are limiting the size of the message anyway, you can just pre-allocate one buffer that is big enough.
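A minimal sketch of the reuse idea, assuming the send queue stays a std::deque and the caller already holds the send mutex. kMaxMessageLength stands in for the question's _MAX_MESSAGE_LENGTH, and fill_send_buffer is a hypothetical helper. Unlike the question's _send_data, which calls _send_buf.clear() after copying and so throws away anything beyond the limit, this leaves the remainder queued for the next write.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Stand-in value for the question's _MAX_MESSAGE_LENGTH.
constexpr std::size_t kMaxMessageLength = 8;

// One buffer owned by the I/O thread and reused for every write instead of
// new[]/delete[]. It must stay untouched until the overlapped write completes.
using SendBuffer = std::array<std::uint8_t, kMaxMessageLength>;

// Moves up to kMaxMessageLength bytes from the front of the queue into buf and
// returns the count. Bytes beyond the limit stay queued for the next write.
std::size_t fill_send_buffer(std::deque<std::uint8_t>& queue, SendBuffer& buf) {
    const std::size_t n = std::min(queue.size(), buf.size());
    const auto end = queue.begin() + static_cast<std::ptrdiff_t>(n);
    std::copy(queue.begin(), end, buf.begin());
    queue.erase(queue.begin(), end);
    return n;
}
```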
Putting the bytes that you want to send into a std::deque is suboptimal. You have to serialize them into a contiguous memory block for WriteFile(). Instead, if you use some sort of command/data queue between one IO thread and the other threads, you can have the client threads provide the contiguous chunk of memory at once.
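One way to build such a queue, as a sketch in standard C++: each client pushes a complete message as a std::vector, so the I/O thread can pass msg.data() and msg.size() straight to WriteFileEx without copying byte by byte. MessageQueue is a hypothetical name. Because the comport thread in the question waits in WaitForMultipleObjects/SleepEx rather than on a condition variable, it would most likely use try_pop() and be woken by a Win32 event that clients set after pushing; that wiring is an assumption and is not shown.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Thread-safe queue of whole messages, each one a contiguous byte block.
class MessageQueue {
public:
    // Called by client threads.
    void push(std::vector<std::uint8_t> msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(msg));
        }
        cv_.notify_one();
    }

    // Blocks until a message is available, then removes and returns it.
    std::vector<std::uint8_t> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        std::vector<std::uint8_t> msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }

    // Non-blocking variant for a thread that also waits on other events.
    bool try_pop(std::vector<std::uint8_t>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::vector<std::uint8_t>> queue_;
};
```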
Reading 1 byte at a time seems silly, too. Unless it does not work for serial devices, you could provide a large enough buffer to ReadFileEx(); the completion routine is told how many bytes were actually read. It should not block, AFAIK, unless of course I am wrong.
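For serial ports, whether a larger ReadFileEx request completes early depends on the port's COMMTIMEOUTS: according to the SetCommTimeouts documentation, setting ReadIntervalTimeout and ReadTotalTimeoutMultiplier to MAXDWORD and ReadTotalTimeoutConstant to a value between 0 and MAXDWORD (exclusive) makes a read return as soon as at least one byte is available, or time out after ReadTotalTimeoutConstant milliseconds. Once a read completes with n bytes, the whole chunk can go into the receive FIFO in one call. The helper below is a hypothetical sketch of that step; it keeps the question's rule of dropping the oldest bytes beyond a maximum length and assumes the caller holds the receive mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Appends one completed read (n bytes) to the receive FIFO in a single call,
// then drops the oldest bytes if the FIFO now exceeds max_len, mirroring the
// trimming loop in the question's _wait_for_io.
void append_received(std::deque<std::uint8_t>& fifo, const std::uint8_t* data,
                     std::size_t n, std::size_t max_len) {
    fifo.insert(fifo.end(), data, data + n);
    if (fifo.size() > max_len) {
        fifo.erase(fifo.begin(),
                   fifo.begin() + static_cast<std::ptrdiff_t>(fifo.size() - max_len));
    }
}
```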
You are waiting for the overlapped IO to finish by calling SleepEx(). What is the point of overlapped IO if you just end up being synchronous?

Related

Is there any way to run two tasks asynchronously in one thread?

I'm working on a software product that runs intensive operations on the main thread. Running them on a separate thread is not supported by design and won't be changed.
At the same time we need to handle mouse movements coming from the UI. In one case the mouse cursor freezes because the main thread is busy with computations.
Seems a good case for introducing asynchronous operation: run computations asynchronously in a separate thread while main thread is still handling mouse movements. But as I said before it is not supported in the current design.
Recently I came across an idea to run two tasks asynchronously in one thread. Meaning that thread context is switched between two tasks and each task is partially executed for a quantum of time until each of them gets finished.
Is this possible in C++? The version of the language (11 or 14) does not matter.
The software uses WinApi and standard message queue to receive mouse events.
I looked at Microsoft PPL, but as far as I understand, the library does not help in this case.
Thanks everyone for help.
What you are looking for is cooperative multi-tasking. This is possible on a single thread. You can take a look at coroutines, e.g. in boost or the standard library (since C++20).
You can also roll your own, stripped-down version. The key ingredients are:
Each task needs to store its context (e.g. parameters) itself
Each task needs a way to suspend and resume operations. It decides on its own when to suspend.
You might need some form of scheduler that keeps track of all the tasks and runs them frequently. You might want to design it so that the GUI main loop calls into your scheduler, which runs for approximately 30-50 ms at most by passing the available time budget to each of the tasks it keeps track of.
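The scheduler from the last point could look roughly like this. It is a sketch under the assumption that each task is written as a callable that does one small slice of work per call and keeps its own state (here in lambda captures); Task, Scheduler and run_for are hypothetical names. The GUI loop would call run_for() with its time budget, for example whenever the message queue is empty.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// One slice of work per call; returns true while the task has work left.
using Task = std::function<bool()>;

class Scheduler {
public:
    void add(Task t) { tasks_.push_back(std::move(t)); }

    // Round-robins over the tasks until all are finished or the budget is
    // used up, then returns so the caller can go back to pumping messages.
    void run_for(std::chrono::milliseconds budget) {
        const auto deadline = std::chrono::steady_clock::now() + budget;
        while (!tasks_.empty() && std::chrono::steady_clock::now() < deadline) {
            for (std::size_t i = 0; i < tasks_.size();) {
                if (tasks_[i]()) {
                    ++i;  // more work later
                } else {
                    tasks_.erase(tasks_.begin() + static_cast<std::ptrdiff_t>(i));  // done
                }
            }
        }
    }

    bool idle() const { return tasks_.empty(); }

private:
    std::vector<Task> tasks_;
};
```

The budget is checked only between rounds, so each slice should be short compared with the budget.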
This is quite feasible if threads are not an option at all.
Boost.Coroutine, Boost.Context, and Boost.Asio all support single thread concurrency at some level or another. Coroutines are cooperative, reentrant, interruptible, resumable functions. Context is user land context switching. Asio executors can schedule many different tasks to run on one thread. For your case, I think you can take your pick as to what you're comfortable putting into your application.
EDIT
Boost.Fiber implements mini thread-like "fibers" on top of the Context library.
Here is how I would implement my own run-to-completion cooperative multitasking:
enum class eStep
{
START,
STEP1,
STEP2,
DONE
};
struct sLongFuncContext
{
//whatever is meaningful to carry from one step to the next
};
eStep long_func_split_in_steps(eStep aStep,sLongFuncContext &aContext)
{
eStep next = eStep::DONE; // defined result on every path
switch (aStep)
{
case eStep::START:
// execute first part of func, save context
next = eStep::STEP1;
break;
case eStep::STEP1:
// execute 2nd part of func, save context
next = eStep::STEP2;
break;
case eStep::STEP2:
// execute last part of func
next = eStep::DONE;
break;
// repeat for as many steps as needed
case eStep::DONE:
break;
}
return next;
}
int main()
{
eStep step = eStep::START;
sLongFuncContext context;
while (step != eStep::DONE)
{
// do a part of the long function
step = long_func_split_in_steps(step,context);
// handle mouse events
// ...
}
return 0;
}
Since you are targeting Windows but don't have access to C++20 coroutines (because of an old compiler), you can use WinAPI fibers, which are like heavyweight coroutines.
It's documented here:
Fibers (Win32 apps)
And here is an example of using them; it copies a file by switching back and forth between a read fiber and a write fiber:
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
VOID
__stdcall
ReadFiberFunc(LPVOID lpParameter);
VOID
__stdcall
WriteFiberFunc(LPVOID lpParameter);
void DisplayFiberInfo(void);
typedef struct
{
DWORD dwParameter; // DWORD parameter to fiber (unused)
DWORD dwFiberResultCode; // GetLastError() result code
HANDLE hFile; // handle to operate on
DWORD dwBytesProcessed; // number of bytes processed
} FIBERDATASTRUCT, *PFIBERDATASTRUCT, *LPFIBERDATASTRUCT;
#define RTN_OK 0
#define RTN_USAGE 1
#define RTN_ERROR 13
#define BUFFER_SIZE 32768 // read/write buffer size
#define FIBER_COUNT 3 // max fibers (including primary)
#define PRIMARY_FIBER 0 // array index to primary fiber
#define READ_FIBER 1 // array index to read fiber
#define WRITE_FIBER 2 // array index to write fiber
LPVOID g_lpFiber[FIBER_COUNT];
LPBYTE g_lpBuffer;
DWORD g_dwBytesRead;
int __cdecl _tmain(int argc, TCHAR *argv[])
{
LPFIBERDATASTRUCT fs;
if (argc != 3)
{
printf("Usage: %s <SourceFile> <DestinationFile>\n", argv[0]);
return RTN_USAGE;
}
//
// Allocate storage for our fiber data structures
//
fs = (LPFIBERDATASTRUCT) HeapAlloc(
GetProcessHeap(), 0,
sizeof(FIBERDATASTRUCT) * FIBER_COUNT);
if (fs == NULL)
{
printf("HeapAlloc error (%d)\n", GetLastError());
return RTN_ERROR;
}
//
// Allocate storage for the read/write buffer
//
g_lpBuffer = (LPBYTE)HeapAlloc(GetProcessHeap(), 0, BUFFER_SIZE);
if (g_lpBuffer == NULL)
{
printf("HeapAlloc error (%d)\n", GetLastError());
return RTN_ERROR;
}
//
// Open the source file
//
fs[READ_FIBER].hFile = CreateFile(
argv[1],
GENERIC_READ,
FILE_SHARE_READ,
NULL,
OPEN_EXISTING,
FILE_FLAG_SEQUENTIAL_SCAN,
NULL
);
if (fs[READ_FIBER].hFile == INVALID_HANDLE_VALUE)
{
printf("CreateFile error (%d)\n", GetLastError());
return RTN_ERROR;
}
//
// Open the destination file
//
fs[WRITE_FIBER].hFile = CreateFile(
argv[2],
GENERIC_WRITE,
0,
NULL,
CREATE_NEW,
FILE_FLAG_SEQUENTIAL_SCAN,
NULL
);
if (fs[WRITE_FIBER].hFile == INVALID_HANDLE_VALUE)
{
printf("CreateFile error (%d)\n", GetLastError());
return RTN_ERROR;
}
//
// Convert thread to a fiber, to allow scheduling other fibers
//
g_lpFiber[PRIMARY_FIBER]=ConvertThreadToFiber(&fs[PRIMARY_FIBER]);
if (g_lpFiber[PRIMARY_FIBER] == NULL)
{
printf("ConvertThreadToFiber error (%d)\n", GetLastError());
return RTN_ERROR;
}
//
// Initialize the primary fiber data structure. We don't use
// the primary fiber data structure for anything in this sample.
//
fs[PRIMARY_FIBER].dwParameter = 0;
fs[PRIMARY_FIBER].dwFiberResultCode = 0;
fs[PRIMARY_FIBER].hFile = INVALID_HANDLE_VALUE;
//
// Create the Read fiber
//
g_lpFiber[READ_FIBER]=CreateFiber(0,ReadFiberFunc,&fs[READ_FIBER]);
if (g_lpFiber[READ_FIBER] == NULL)
{
printf("CreateFiber error (%d)\n", GetLastError());
return RTN_ERROR;
}
fs[READ_FIBER].dwParameter = 0x12345678;
//
// Create the Write fiber
//
g_lpFiber[WRITE_FIBER]=CreateFiber(0,WriteFiberFunc,&fs[WRITE_FIBER]);
if (g_lpFiber[WRITE_FIBER] == NULL)
{
printf("CreateFiber error (%d)\n", GetLastError());
return RTN_ERROR;
}
fs[WRITE_FIBER].dwParameter = 0x54545454;
//
// Switch to the read fiber
//
SwitchToFiber(g_lpFiber[READ_FIBER]);
//
// We have been scheduled again. Display results from the
// read/write fibers
//
printf("ReadFiber: result code is %lu, %lu bytes processed\n",
fs[READ_FIBER].dwFiberResultCode, fs[READ_FIBER].dwBytesProcessed);
printf("WriteFiber: result code is %lu, %lu bytes processed\n",
fs[WRITE_FIBER].dwFiberResultCode, fs[WRITE_FIBER].dwBytesProcessed);
//
// Delete the fibers
//
DeleteFiber(g_lpFiber[READ_FIBER]);
DeleteFiber(g_lpFiber[WRITE_FIBER]);
//
// Close handles
//
CloseHandle(fs[READ_FIBER].hFile);
CloseHandle(fs[WRITE_FIBER].hFile);
//
// Free allocated memory
//
HeapFree(GetProcessHeap(), 0, g_lpBuffer);
HeapFree(GetProcessHeap(), 0, fs);
return RTN_OK;
}
VOID
__stdcall
ReadFiberFunc(
LPVOID lpParameter
)
{
LPFIBERDATASTRUCT fds = (LPFIBERDATASTRUCT)lpParameter;
//
// If this fiber was passed NULL for fiber data, just return,
// causing the current thread to exit
//
if (fds == NULL)
{
printf("Passed NULL fiber data; exiting current thread.\n");
return;
}
//
// Display some information pertaining to the current fiber
//
DisplayFiberInfo();
fds->dwBytesProcessed = 0;
while (1)
{
//
// Read data from file specified in the READ_FIBER structure
//
if (!ReadFile(fds->hFile, g_lpBuffer, BUFFER_SIZE,
&g_dwBytesRead, NULL))
{
break;
}
//
// if we reached EOF, break
//
if (g_dwBytesRead == 0) break;
//
// Update number of bytes processed in the fiber data structure
//
fds->dwBytesProcessed += g_dwBytesRead;
//
// Switch to the write fiber
//
SwitchToFiber(g_lpFiber[WRITE_FIBER]);
} // while
//
// Update the fiber result code
//
fds->dwFiberResultCode = GetLastError();
//
// Switch back to the primary fiber
//
SwitchToFiber(g_lpFiber[PRIMARY_FIBER]);
}
VOID
__stdcall
WriteFiberFunc(
LPVOID lpParameter
)
{
LPFIBERDATASTRUCT fds = (LPFIBERDATASTRUCT)lpParameter;
DWORD dwBytesWritten;
//
// If this fiber was passed NULL for fiber data, just return,
// causing the current thread to exit
//
if (fds == NULL)
{
printf("Passed NULL fiber data; exiting current thread.\n");
return;
}
//
// Display some information pertaining to the current fiber
//
DisplayFiberInfo();
//
// Assume all writes succeeded. If a write fails, the fiber
// result code will be updated to reflect the reason for failure
//
fds->dwBytesProcessed = 0;
fds->dwFiberResultCode = ERROR_SUCCESS;
while (1)
{
//
// Write data to the file specified in the WRITE_FIBER structure
//
if (!WriteFile(fds->hFile, g_lpBuffer, g_dwBytesRead,
&dwBytesWritten, NULL))
{
//
// If an error occurred writing, break
//
break;
}
//
// Update number of bytes processed in the fiber data structure
//
fds->dwBytesProcessed += dwBytesWritten;
//
// Switch back to the read fiber
//
SwitchToFiber(g_lpFiber[READ_FIBER]);
} // while
//
// If an error occurred, update the fiber result code...
//
fds->dwFiberResultCode = GetLastError();
//
// ...and switch to the primary fiber
//
SwitchToFiber(g_lpFiber[PRIMARY_FIBER]);
}
void
DisplayFiberInfo(
void
)
{
LPFIBERDATASTRUCT fds = (LPFIBERDATASTRUCT)GetFiberData();
LPVOID lpCurrentFiber = GetCurrentFiber();
//
// Determine which fiber is executing, based on the fiber address
//
if (lpCurrentFiber == g_lpFiber[READ_FIBER])
printf("Read fiber entered");
else
{
if (lpCurrentFiber == g_lpFiber[WRITE_FIBER])
printf("Write fiber entered");
else
{
if (lpCurrentFiber == g_lpFiber[PRIMARY_FIBER])
printf("Primary fiber entered");
else
printf("Unknown fiber entered");
}
}
//
// Display dwParameter from the current fiber data structure
//
printf(" (dwParameter is 0x%lx)\n", fds->dwParameter);
}
Given that you are using winapi and UI so you already have message processing I would suggest that you break up the problematic operation into more steps and use custom messages. Have each step in the problematic operation post the message that triggers the next step. Since this is something windows already handles (dealing with messages) it should fit much more neatly into what you already have than trying to use coroutines or windows fibers.
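The message-chaining idea can be shown without a window: in a real program each step would call PostMessage with a custom message such as WM_APP + 1, and the window procedure would run the next step when it arrives. This sketch replaces the Win32 queue with a std::deque stand-in (App, Msg and pump are hypothetical names) so the interleaving is visible: a mouse message that arrives while the operation is running is handled between two steps instead of after the whole operation.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Stand-in for the Win32 message queue; in a real program these would be
// PostMessage(hwnd, WM_APP + 1, ...) and GetMessage/DispatchMessage.
enum class Msg { MouseMove, Step };

struct App {
    std::deque<Msg> queue;         // pending messages, FIFO like the real queue
    std::vector<std::string> log;  // what was handled, in order
    int step = 0;
    static constexpr int kSteps = 3;

    void post(Msg m) { queue.push_back(m); }

    void handle(Msg m) {
        if (m == Msg::MouseMove) {
            log.push_back("mouse");
        } else {  // one slice of the long operation
            log.push_back("step" + std::to_string(step));
            if (++step < kSteps) post(Msg::Step);  // chain the next slice
        }
    }

    // The message loop: one message at a time, so input that arrives while
    // the operation is running gets handled between its steps.
    void pump() {
        while (!queue.empty()) {
            Msg m = queue.front();
            queue.pop_front();
            handle(m);
        }
    }
};
```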
This will slow down overall processing of the problematic operation somewhat but will keep the UI responsive.
However I would also seriously consider abandoning the single-threaded approach. If your problematic operation simply takes input and produces an output shoving that operation onto a separate thread and dealing with the result when it comes (again via a posted message) is often a very reasonable solution.

Creating a dispatch queue / thread handler in C++ with pipes: FIFOs overfilling

Threads are resource-heavy to create and use, so often a pool of threads will be reused for asynchronous tasks. A task is packaged up, and then "posted" to a broker that will enqueue the task on the next available thread.
This is the idea behind dispatch queues (i.e. Apple's Grand Central Dispatch), and thread handlers (Android's Looper mechanism).
Right now, I'm trying to roll my own. In fact, I'm plugging a gap in Android whereby there is an API for posting tasks in Java, but not in the native NDK. However, I'm keeping this question platform independent where I can.
Pipes are the ideal choice for my scenario. I can easily poll the file descriptor of the read-end of a pipe(2) on my worker thread, and enqueue tasks from any other thread by writing to the write-end. Here's what that looks like:
int taskRead, taskWrite;
void setup() {
// Create the pipe
int taskPipe[2];
::pipe(taskPipe);
taskRead = taskPipe[0];
taskWrite = taskPipe[1];
// Set up a routine that is called when taskRead reports new data
function_that_polls_file_descriptor(taskRead, []() {
// Read the callback data
std::function<void(void)>* taskPtr;
::read(taskRead, &taskPtr, sizeof(taskPtr));
// Run the task - this is unsafe! See below.
(*taskPtr)();
// Clean up
delete taskPtr;
});
}
void post(const std::function<void(void)>& task) {
// Copy the function onto the heap
auto* taskPtr = new std::function<void(void)>(task);
// Write the pointer to the pipe - this may block if the FIFO is full!
::write(taskWrite, &taskPtr, sizeof(taskPtr));
}
This code puts a std::function on the heap, and passes the pointer to the pipe. The function_that_polls_file_descriptor then calls the provided expression to read the pipe and execute the function. Note that there are no safety checks in this example.
This works great 99% of the time, but there is one major drawback. Pipes have a limited size, and if the pipe is filled, then calls to post() will hang. This in itself is not unsafe, until a call to post() is made within a task.
auto evil = []() {
// Post a new task back onto the queue
post({});
// Not enough new tasks, let's make more!
for (int i = 0; i < 3; i++) {
post({});
}
// Now for each time this task is posted, 4 more tasks will be added to the queue.
};
post(evil);
post(evil);
...
If this happens, then the worker thread will be blocked, waiting to write to the pipe. But the pipe's FIFO is full, and the worker thread is not reading anything from it, so the entire system is in deadlock.
What can be done to ensure that calls to post() emanating from the worker thread always succeed, allowing the worker to continue processing the queue when it is full?
Thanks to all the comments and other answers in this post, I now have a working solution to this problem.
The trick I've employed is to prioritise worker threads by checking which thread is calling post(). Here is the rough algorithm:
pipe ← NON-BLOCKING-PIPE()
overflow ← Ø
POST(task)
success ← WRITE(task, pipe)
IF NOT success THEN
IF THREAD-IS-WORKER() THEN
overflow ← overflow ∪ {task}
ELSE
WAIT(pipe)
POST(task)
Then on the worker thread:
LOOP FOREVER
task ← READ(pipe)
RUN(task)
FOR EACH overtask ∈ overflow
RUN(overtask)
overflow ← Ø
The wait is performed with pselect(2), adapted from Sigismondo's answer.
Here's the algorithm implemented in my original code example that will work for a single worker thread (although I haven't tested it after copy-paste). It can be extended to work for a thread pool by having a separate overflow queue for each thread.
#include <cerrno>
#include <fcntl.h>
#include <functional>
#include <queue>
#include <sys/select.h>
#include <thread>
#include <unistd.h>
int taskRead, taskWrite;
// These variables are only allowed to be modified by the worker thread
std::thread::id workerId;
std::queue<std::function<void(void)>*> overflow;
bool overflowInUse;
// Must be called on the worker thread, since it records that thread's ID
void setup() {
int taskPipe[2];
::pipe(taskPipe);
taskRead = taskPipe[0];
taskWrite = taskPipe[1];
// Make the write end non-blocking to check pipe overflows manually
::fcntl(taskWrite, F_SETFL, ::fcntl(taskWrite, F_GETFL, 0) | O_NONBLOCK);
// Save the ID of this worker thread to compare later
workerId = std::this_thread::get_id();
overflowInUse = false;
function_that_polls_file_descriptor(taskRead, []() {
// Read the callback data
std::function<void(void)>* taskPtr;
::read(taskRead, &taskPtr, sizeof(taskPtr));
// Run the task
(*taskPtr)();
delete taskPtr;
// Run any tasks that were posted to the overflow
while (!overflow.empty()) {
taskPtr = overflow.front();
overflow.pop();
(*taskPtr)();
delete taskPtr;
}
// Release the overflow mechanism if applicable
overflowInUse = false;
});
}
// Writes one task pointer to the pipe. If the pipe is full and blocking is
// allowed, waits with pselect() until the write end is writable, then retries.
bool writeTask(std::function<void(void)>* taskPtr, bool blocking = true) {
for (;;) {
ssize_t rc = ::write(taskWrite, &taskPtr, sizeof(taskPtr));
if (rc == (ssize_t)sizeof(taskPtr)) {
return true;
}
int err = errno;
if (rc < 0 && err == EINTR) {
continue; // interrupted by a signal: just retry
}
if (rc < 0 && (err == EAGAIN || err == EWOULDBLOCK) && blocking) {
fd_set fds;
FD_ZERO(&fds);
FD_SET(taskWrite, &fds);
// nfds must be the highest descriptor in the sets plus one
::pselect(taskWrite + 1, nullptr, &fds, nullptr, nullptr, nullptr);
continue; // try again
}
// Otherwise report failure to the caller
return false;
}
}
void post(const std::function<void(void)>& task) {
auto* taskPtr = new std::function<void(void)>(task);
if (std::this_thread::get_id() == workerId) {
// The worker thread gets 1st-class treatment.
// It won't be blocked if the pipe is full, instead
// using an overflow queue until the overflow has been cleared.
if (!overflowInUse) {
bool success = writeTask(taskPtr, false);
if (!success) {
overflow.push(taskPtr);
overflowInUse = true;
}
} else {
overflow.push(taskPtr);
}
} else if (!writeTask(taskPtr)) {
delete taskPtr; // unrecoverable write error: don't leak the task
}
}
Make the pipe write file descriptor non-blocking, so that write fails with EAGAIN when the pipe is full.
One improvement is to increase the pipe buffer size.
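On Linux, both suggestions can be applied to the question's pipe with fcntl(): O_NONBLOCK on the write end, and F_SETPIPE_SZ to enlarge the buffer. This is a Linux-specific sketch (F_SETPIPE_SZ does not exist on other POSIX systems), and prepare_task_pipe is a hypothetical helper. Since each task is one pointer, the pipe holds roughly capacity / sizeof(void*) pending tasks, so a larger buffer postpones the deadlock rather than removing it.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // F_SETPIPE_SZ / F_GETPIPE_SZ are Linux-specific
#endif
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Puts the write end of the task pipe into non-blocking mode, so a full pipe
// makes write() fail with EAGAIN instead of hanging, and asks the kernel for a
// larger pipe buffer. Returns the resulting capacity in bytes, or -1 on error.
// Unprivileged processes are capped by /proc/sys/fs/pipe-max-size.
int prepare_task_pipe(int write_fd, int wanted_bytes) {
    int flags = fcntl(write_fd, F_GETFL, 0);
    if (flags == -1 || fcntl(write_fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    if (fcntl(write_fd, F_SETPIPE_SZ, wanted_bytes) == -1)
        return -1;
    return fcntl(write_fd, F_GETPIPE_SZ);
}
```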
Another is to use a UNIX socket/socketpair and increase the socket buffer size.
Yet another solution is to use a UNIX datagram socket which many worker threads can read from, but only one gets the next datagram. In other words, you can use a datagram socket as a thread dispatcher.
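A sketch of the datagram variant, assuming Linux or another system with AF_UNIX SOCK_DGRAM socketpairs and MSG_DONTWAIT: each task pointer travels as one datagram, so every recv() returns exactly one pointer even when several worker threads read the same socket. send_task and receive_task are hypothetical names, and a single reader is used here for brevity.

```cpp
#include <cassert>
#include <functional>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

using TaskFn = std::function<void()>;

// Sends one heap-allocated task as one datagram without blocking. On failure
// (e.g. EAGAIN because the socket buffer is full) the task is freed and false
// is returned, so the caller can retry later.
bool send_task(int fd, TaskFn* task) {
    ssize_t n = send(fd, &task, sizeof(task), MSG_DONTWAIT);
    if (n != static_cast<ssize_t>(sizeof(task))) {
        delete task;
        return false;
    }
    return true;
}

// Receives exactly one task pointer; with SOCK_DGRAM each recv() returns one
// whole datagram, so concurrent readers never see half a pointer.
TaskFn* receive_task(int fd) {
    TaskFn* task = nullptr;
    ssize_t n = recv(fd, &task, sizeof(task), 0);
    return n == static_cast<ssize_t>(sizeof(task)) ? task : nullptr;
}
```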
You can use the good old select() to determine whether the file descriptors are ready to be used for writing:
The file descriptors in writefds will be watched to see if
space is available for write (though a large write may still block).
Since you are writing a pointer, your write() cannot be classified as large at all.
Clearly you must be ready to handle the fact that a post may fail, and then be ready to retry it later... otherwise you will be facing indefinitely growing pipes, until your system breaks again.
More or less (not tested):
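#include <functional>
#include <sys/select.h>
#include <unistd.h>
bool post(const std::function<void(void)>& task) {
fd_set wfds;
struct timeval tv;
FD_ZERO(&wfds);
FD_SET(taskWrite, &wfds);
// Don't wait at all
tv.tv_sec = 0;
tv.tv_usec = 0;
// nfds must be the highest descriptor in the sets plus one
int retval = select(taskWrite + 1, NULL, &wfds, NULL, &tv);
// select() returns 0 when no FDs are ready
if (retval == -1) {
// handle error condition
return false;
}
if (retval == 0) {
// pipe is full: the caller must retry later
return false;
}
// Copy the function onto the heap only once we know it can be posted,
// so a failed post does not leak it
auto* taskPtr = new std::function<void(void)>(task);
// A pointer-sized write to a writable pipe should not block (unless
// another thread fills the pipe in between)
if (::write(taskWrite, &taskPtr, sizeof(taskPtr)) != (ssize_t)sizeof(taskPtr)) {
delete taskPtr;
return false;
}
return true;
}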
If you only target Android/Linux, a pipe is not the state of the art; using an event file descriptor (eventfd) together with epoll is the way to go.
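A minimal Linux-only sketch of that approach: the tasks live in a mutex-protected std::deque, and an eventfd registered with epoll is only used to wake the worker. Because nothing but a 64-bit counter goes through the kernel, post() never blocks on a full buffer, which also removes the deadlock when a task posts more tasks. TaskQueue and run_once are hypothetical names, and error handling is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <utility>

class TaskQueue {
public:
    TaskQueue()
        : efd_(eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC)), epfd_(epoll_create1(EPOLL_CLOEXEC)) {
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = efd_;
        epoll_ctl(epfd_, EPOLL_CTL_ADD, efd_, &ev);
    }
    ~TaskQueue() {
        close(epfd_);
        close(efd_);
    }
    TaskQueue(const TaskQueue&) = delete;
    TaskQueue& operator=(const TaskQueue&) = delete;

    // Safe from any thread, including from inside a running task.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        std::uint64_t one = 1;
        ssize_t rc = write(efd_, &one, sizeof(one));  // bump counter: wake worker
        (void)rc;
    }

    // Worker side: waits up to timeout_ms for a wake-up, then runs every task
    // queued at that moment. Returns the number of tasks run.
    int run_once(int timeout_ms) {
        epoll_event ev{};
        if (epoll_wait(epfd_, &ev, 1, timeout_ms) <= 0) return 0;
        std::uint64_t count = 0;
        ssize_t rc = read(efd_, &count, sizeof(count));  // resets the counter
        (void)rc;
        std::deque<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(tasks_);
        }
        for (auto& task : batch) task();  // new posts land in tasks_, not batch
        return static_cast<int>(batch.size());
    }

private:
    int efd_;
    int epfd_;
    std::mutex mutex_;
    std::deque<std::function<void()>> tasks_;
};
```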

Exit an infinite looping thread elegantly

I keep running into this problem of trying to run a thread with the following properties:
runs in an infinite loop, checking some external resource, e.g. data from the network or a device,
gets updates from its resource promptly,
exits promptly when asked to,
uses the CPU efficiently.
First approach
One solution I have seen for this is something like the following:
void class::run()
{
while(!exit_flag)
{
if (resource_ready)
use_resource();
}
}
This satisfies points 1, 2 and 3, but being a busy waiting loop, uses 100% CPU.
Second approach
A potential fix for this is to put a sleep statement in:
void class::run()
{
while(!exit_flag)
{
if (resource_ready)
use_resource();
else
sleep(a_short_while);
}
}
We now don't hammer the CPU, so we address 1 and 4, but we could wait up to a_short_while unnecessarily when the resource is ready or we are asked to quit.
Third approach
A third option is to do a blocking read on the resource:
void class::run()
{
while(!exit_flag)
{
obtain_resource();
use_resource();
}
}
This will satisfy 1, 2, and 4 elegantly, but now we can't ask the thread to quit if the resource does not become available.
Question
The best approach seems to be the second one, with a short sleep, so long as the tradeoff between CPU usage and responsiveness can be achieved.
However, this still seems suboptimal, and inelegant to me. This seems like it would be a common problem to solve. Is there a more elegant way to solve it? Is there an approach which can address all four of those requirements?
This depends on the specifics of the resources the thread is accessing, but basically, to do it efficiently with minimal latency, the resources need to provide an API for doing an interruptible blocking wait.
On POSIX systems, you can use the select(2) or poll(2) system calls to do that, if the resources you're using are files or file descriptors (including sockets). To allow the wait to be preempted, you also create a dummy pipe which you can write to.
For example, here's how you might wait for a file descriptor or socket to become ready or for the code to be interrupted:
#include <algorithm>
#include <atomic>
#include <sys/select.h>
#include <unistd.h>
// Dummy pipe used for sending interrupt message: [0] = read end, [1] = write end
int interrupt_pipe[2];
std::atomic<bool> should_exit{false};
void class::run()
{
// Set up the interrupt pipe (ideally in the constructor, so interrupt()
// can never run before the pipe exists)
if (pipe(interrupt_pipe) != 0)
; // Handle error
int fd = ...; // File descriptor or socket etc.
while (!should_exit)
{
// Set up a file descriptor set with fd and the read end of the dummy
// pipe in it
fd_set fds;
FD_ZERO(&fds);
FD_SET(fd, &fds);
FD_SET(interrupt_pipe[0], &fds);
int maxfd = std::max(fd, interrupt_pipe[0]);
// Wait until one of the file descriptors is ready to be read
int num_ready = select(maxfd + 1, &fds, NULL, NULL, NULL);
if (num_ready == -1)
; // Handle error (EINTR just means "try again")
if (FD_ISSET(fd, &fds))
{
// fd can now be read/recv'ed from without blocking
read(fd, ...);
}
}
}
void class::interrupt()
{
should_exit = true;
// Send a dummy message to the write end of the pipe to wake up select()
char msg = 0;
write(interrupt_pipe[1], &msg, 1);
}
class::~class()
{
// Clean up pipe etc.
close(interrupt_pipe[0]);
close(interrupt_pipe[1]);
}
If you're on Windows, select() still works for sockets, but only for sockets, so you should use WaitForMultipleObjects to wait on a resource handle and an event handle. For example:
// Event used for sending interrupt message
HANDLE interrupt_event;
int should_exit = 0;
void class::run()
{
// Set up the interrupt event as an auto-reset event
interrupt_event = CreateEvent(NULL, FALSE, FALSE, NULL);
if (interrupt_event == NULL)
; // Handle error
HANDLE resource = ...; // File or resource handle etc.
while (!should_exit)
{
// Wait until one of the handles becomes signaled
HANDLE handles[2] = {resource, interrupt_event};
DWORD which_ready = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
if (which_ready == WAIT_FAILED)
; // Handle error
else if (which_ready == WAIT_OBJECT_0)
{
// resource can now be read from without blocking
ReadFile(resource, ...);
}
}
}
void class::interrupt()
{
// Signal the event to wake up the waiting thread
should_exit = 1;
SetEvent(interrupt_event);
}
class::~class()
{
// Clean up event etc.
CloseHandle(interrupt_event);
}
You get an efficient solution if your obtain_resource() function supports a timeout value:
while(!exit_flag)
{
obtain_resource_with_timeout(a_short_while);
if (resource_ready)
use_resource();
}
This effectively combines the sleep() with the obtain_resource() call.
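A portable way to get the same effect in C++ is std::condition_variable::wait_for: the wait returns as soon as data arrives or a stop is requested, so the timeout becomes a safety net rather than a polling interval. This is a minimal sketch assuming the producer lives in the same process; ResourceQueue, pop_with_timeout and consume_until_stopped are hypothetical names, and any items still queued when the stop arrives are simply dropped.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical resource queue: producers push items, one consumer waits on them.
class ResourceQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push_back(item);
        }
        cv_.notify_one();
    }

    // Wakes a waiting consumer immediately instead of after its timeout.
    void request_stop() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
    }

    bool stopped() const {
        std::lock_guard<std::mutex> lock(m_);
        return stop_;
    }

    // obtain_resource_with_timeout(): waits up to `timeout` for an item.
    // Returns std::nullopt on timeout or once a stop has been requested.
    std::optional<int> pop_with_timeout(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        // The predicate handles spurious wake-ups and notifications sent early.
        cv_.wait_for(lock, timeout, [this] { return stop_ || !items_.empty(); });
        if (stop_ || items_.empty())
            return std::nullopt;
        int item = items_.front();
        items_.pop_front();
        return item;
    }

private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> items_;
    bool stop_ = false;
};

// The loop from the answer, with exit_flag replaced by q.stopped().
int consume_until_stopped(ResourceQueue& q) {
    int sum = 0;
    while (!q.stopped()) {
        if (std::optional<int> item = q.pop_with_timeout(std::chrono::milliseconds(100)))
            sum += *item; // use_resource()
    }
    return sum;
}
```

Because request_stop() notifies the condition variable, a consumer blocked in pop_with_timeout() does not sit out the rest of a_short_while, which addresses the latency objection to the second approach.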
Check out the man page for nanosleep:
If the nanosleep() function returns because it has been interrupted by a signal, the function returns a value of -1 and sets errno to indicate the interruption.
In other words, you can interrupt sleeping threads by sending a signal (the sleep man page says something similar). This means you can use your second approach and send a signal to wake the thread immediately if it's sleeping.
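A POSIX-only sketch of this idea, assuming Linux/glibc and SIGUSR1 as the wake-up signal (both choices made for illustration). The handler does nothing; its only job is to make nanosleep() fail with EINTR. A signal that arrives just before the thread enters nanosleep() is lost, so the stopping side keeps re-sending it until the worker confirms it has left its loop; that race is the main weakness of this approach compared with a self-pipe or a condition variable. sleeping_worker and stop_sleeping_worker_ms are hypothetical names.

```cpp
#include <atomic>
#include <cerrno>
#include <chrono>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <time.h>

static std::atomic<bool> exit_flag{false};
static std::atomic<bool> worker_done{false};

// Does nothing: delivering the signal is what makes nanosleep() return early.
extern "C" void wake_handler(int) {}

// The "second approach" loop with a deliberately long a_short_while (10 s).
void sleeping_worker() {
    while (!exit_flag.load()) {
        // if (resource_ready) use_resource();  (omitted in this sketch)
        timespec ts;
        ts.tv_sec = 10;
        ts.tv_nsec = 0;
        if (nanosleep(&ts, nullptr) == -1 && errno == EINTR) {
            // Interrupted by the signal: loop around and re-check exit_flag.
        }
    }
    worker_done = true;
}

// Starts the worker, asks it to quit, and returns how long shutdown took in ms.
long long stop_sleeping_worker_ms() {
    struct sigaction sa {};
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; // nanosleep() is never restarted, even with SA_RESTART
    sigaction(SIGUSR1, &sa, nullptr);

    exit_flag = false;
    worker_done = false;
    std::thread t(sleeping_worker);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));

    auto start = std::chrono::steady_clock::now();
    exit_flag = true;
    // A signal sent just before the worker enters nanosleep() is lost,
    // so keep re-sending until the worker reports that it has finished.
    while (!worker_done.load()) {
        pthread_kill(t.native_handle(), SIGUSR1);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    t.join();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now() - start).count();
}
```

Without the signal, shutdown would take up to the full 10 seconds; with it, the worker leaves its loop almost immediately.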
Use the Gang of Four Observer Pattern:
http://home.comcast.net/~codewrangler/tech_info/patterns_code.html#Observer
Callback, don't block.
Self-Pipe trick can be used here.
http://cr.yp.to/docs/selfpipe.html
Assuming that you are reading the data from a file descriptor:
Create a pipe and select() for readability on the read end of the pipe as well as on the resource you are interested in.
Then when data comes on resource, the thread wakes up and does the processing. Else it sleeps.
To terminate the thread send it a signal and in signal handler, write something on the pipe (I would say something which will never come from the resource you are interested in, something like NULL for illustrating the point). The select call returns and thread on reading the input knows that it got the poison pill and it is time to exit and calls pthread_exit().
EDIT: A better way is simply to notice that data arrived on the pipe and exit, rather than checking the value that came through it.
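A runnable POSIX sketch of the self-pipe trick, following the EDIT's rule that any byte on the pipe means "exit". For simplicity the stopping thread writes to the pipe directly rather than from a signal handler; writing from a handler, as described above, also works because write() is async-signal-safe. select_worker and interrupt_demo are hypothetical names, and a second pipe stands in for the real resource.

```cpp
#include <algorithm>
#include <cerrno>
#include <chrono>
#include <sys/select.h>
#include <thread>
#include <unistd.h>

// Waits on a data fd and on the read end of a self-pipe.
// Returns the number of data bytes consumed before it was told to exit.
int select_worker(int data_fd, int wake_fd) {
    int consumed = 0;
    for (;;) {
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(data_fd, &fds);
        FD_SET(wake_fd, &fds);
        if (select(std::max(data_fd, wake_fd) + 1, &fds, nullptr, nullptr, nullptr) == -1) {
            if (errno == EINTR)
                continue;     // interrupted by a signal: just wait again
            return consumed;  // real error
        }
        if (FD_ISSET(wake_fd, &fds))
            return consumed;  // anything on the self-pipe means "exit"
        if (FD_ISSET(data_fd, &fds)) {
            char buf[64];
            ssize_t n = read(data_fd, buf, sizeof buf);
            if (n <= 0)
                return consumed;              // EOF or error on the resource
            consumed += static_cast<int>(n);  // use_resource() would go here
        }
    }
}

// Starts a worker that blocks in select() with no data, then wakes it via the pipe.
int interrupt_demo() {
    int data[2], wake[2];
    if (pipe(data) != 0)
        return -1;
    if (pipe(wake) != 0) {
        close(data[0]);
        close(data[1]);
        return -1;
    }
    int result = -1;
    std::thread t([&] { result = select_worker(data[0], wake[0]); });
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    char pill = 0;
    ssize_t written = write(wake[1], &pill, 1);  // the "poison pill"
    t.join();
    close(data[0]);
    close(data[1]);
    close(wake[0]);
    close(wake[1]);
    return written == 1 ? result : -1;
}
```

The thread uses no CPU while blocked in select(), and the pipe keeps the wake-up byte even if it is written before the worker reaches select(), so no wake-up is lost.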
The Win32 API uses more or less this approach:
someThreadLoop( ... )
{
MSG msg;
int retVal;
while( (retVal = ::GetMessage( &msg, TaskContext::winHandle_, 0, 0 )) > 0 )
{
::TranslateMessage( &msg );
::DispatchMessage( &msg );
}
}
GetMessage blocks until a message is received, so the thread uses no CPU while it waits. When WM_QUIT is received it returns 0 (it returns -1 on error, which the > 0 test also treats as a reason to stop), so the thread function exits gracefully. This is a variant of the producer/consumer pattern mentioned elsewhere.
You can use any variant of a producer/consumer, and the pattern is often similar. One could argue that one would want to split the responsibility concerning quitting and obtaining of a resource, but OTOH quitting could depend on obtaining a resource too (or could be regarded as one of the resources - but a special one). I would at least abstract the producer consumer pattern and have various implementations thereof.
Therefore:
AbstractConsumer:
void AbstractConsumer::threadHandler()
{
do
{
try
{
process( dequeNextCommand() );
}
catch( const base_except& ex )
{
log( ex );
if( ex.isCritical() ){ throw; }
//else we don't want loop to exit...
}
catch( const std::exception& ex )
{
log( ex );
throw;
}
}
while( !terminated() );
}
virtual void /*AbstractConsumer::*/process( std::unique_ptr<Command>&& command ) = 0;
//Note:
// Either may or may not block until resource arrives, but typically blocks on
// a queue that is signalled as soon as a resource is available.
virtual std::unique_ptr<Command> /*AbstractConsumer::*/dequeNextCommand() = 0;
virtual bool /*AbstractConsumer::*/terminated() const = 0;
I usually encapsulate a command to execute a function in the context of the consumer, but the pattern in the consumer is always the same.
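To make the abstract pattern concrete, here is one possible implementation, assuming a mutex/condition-variable queue and a simplified Command that just wraps a std::function (both hypothetical, as is QueueConsumer). A null command plays the role of WM_QUIT: the consumer blocks in dequeNextCommand() without using CPU, and posting the quit marker ends the loop after all earlier commands have run. The logging and exception policy of the original are left out.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical command: a function to run in the consumer's thread.
struct Command {
    std::function<void()> fn;
};

class AbstractConsumer {
public:
    virtual ~AbstractConsumer() = default;
    // Same loop as above, without the logging/exception policy.
    void threadHandler() {
        do {
            process(dequeNextCommand());
        } while (!terminated());
    }

protected:
    virtual void process(std::unique_ptr<Command>&& command) = 0;
    virtual std::unique_ptr<Command> dequeNextCommand() = 0;
    virtual bool terminated() const = 0;
};

// One possible implementation: a blocking queue where nullptr means "quit".
class QueueConsumer : public AbstractConsumer {
public:
    void post(std::unique_ptr<Command> command) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push_back(std::move(command));
        }
        cv_.notify_one();
    }
    void postQuit() { post(nullptr); }

protected:
    void process(std::unique_ptr<Command>&& command) override {
        if (!command) {
            quit_ = true;  // only touched by the consumer thread
            return;
        }
        command->fn();
    }
    std::unique_ptr<Command> dequeNextCommand() override {
        std::unique_lock<std::mutex> lock(m_);
        // Blocks without using CPU until a command (or the quit marker) arrives.
        cv_.wait(lock, [this] { return !queue_.empty(); });
        std::unique_ptr<Command> next = std::move(queue_.front());
        queue_.pop_front();
        return next;
    }
    bool terminated() const override { return quit_; }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::unique_ptr<Command>> queue_;
    bool quit_ = false;
};
```

In a real program threadHandler() would run on its own std::thread while other threads call post() and postQuit(), both of which are safe to call from any thread.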
Any (well, at least most) of the approaches mentioned above do the following: a thread is created, then it blocks waiting for the resource, then it's deleted.
If you're worried about efficiency, this is not the best approach when waiting for I/O. On Windows at least, each additional thread reserves around 1 MB of address space for its user-mode stack by default, plus some kernel memory. What if you have many such resources? Having many waiting threads will also increase context switches and slow down your program. What if the resource takes longer to become available and many requests are made? You may end up with tons of waiting threads.
Now, the solution to this (again, on Windows, but I'm sure there is something similar on other OSes) is to use a thread pool (the one provided by Windows). On Windows this will not only create a limited number of threads, it will also be able to detect when a thread is waiting for I/O, steal the thread from there and reuse it for other operations while waiting.
See http://msdn.microsoft.com/en-us/library/windows/desktop/ms686766(v=vs.85).aspx
Also, for more fine-grained control while still being able to give up the thread when waiting for I/O, see I/O completion ports (I think the thread pool uses them internally anyway): http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx

How to check if WriteFile function is done

I want to check if the WriteFile function is done writing to the UART so that I can call ReadFile on the same ComDev without causing an exception.
It seems the WriteFile function can return before writing is done.
BOOL WriteCommBlock(HANDLE * pComDev, char *pBuffer , int BytesToWrite)
{
while(fComPortInUse){}
fComPortInUse = 1;
BOOL bWriteStat = 0;
DWORD BytesWritten = 0;
COMSTAT ComStat = {0};
OVERLAPPED osWrite = {0,0,0};
if(WriteFile(*pComDev,pBuffer,BytesToWrite,&BytesWritten,&osWrite) == FALSE)
{
short Errorcode = GetLastError();
if( Errorcode != ERROR_IO_PENDING )
short breakpoint = 5; // Error
Sleep(1000); // complete write operation TBD
fComPortInUse = 0;
return (FALSE);
}
fComPortInUse = 0;
return (TRUE);
}
I used Sleep(1000) as a workaround, but how can I wait for the appropriate amount of time?
You can create an event, store it in your OVERLAPPED structure and wait for it to be signalled. Like this (untested):
BOOL WriteCommBlock(HANDLE * pComDev, char *pBuffer , int BytesToWrite)
{
while(fComPortInUse){}
fComPortInUse = 1;
BOOL bWriteStat = FALSE;
DWORD BytesWritten = 0;
OVERLAPPED osWrite = {0};
// Manual-reset event, initially non-signaled; it is signaled when the write completes
HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
if (hEvent != NULL)
{
osWrite.hEvent = hEvent;
if(WriteFile(*pComDev,pBuffer,BytesToWrite,&BytesWritten,&osWrite))
{
bWriteStat = TRUE; // completed immediately
}
else if (GetLastError() == ERROR_IO_PENDING)
{
// Block until the write has finished, then fetch its final status
WaitForSingleObject(hEvent, INFINITE);
bWriteStat = GetOverlappedResult(*pComDev, &osWrite, &BytesWritten, FALSE);
}
// else: WriteFile failed immediately; bWriteStat stays FALSE
CloseHandle(hEvent);
}
fComPortInUse = 0;
return bWriteStat;
}
Note that, depending on what else you are trying to do, simply calling WaitForSingleObject() might not be the best idea. Nor might an INFINITE timeout.
Your problem is the incorrect use of overlapped I/O, regardless of the UART or whatever the underlying device is.
The easiest (though not necessarily the most optimal) way to fix your code is to use an event to handle the I/O completion.
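// ...
OVERLAPPED osWrite = {0};
// CreateEvent(attributes, manualReset, initialState, name): manual-reset, non-signaled
osWrite.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
if(WriteFile(*pComDev,pBuffer,BytesToWrite,&BytesWritten,&osWrite) == FALSE)
{
if (GetLastError() == ERROR_IO_PENDING)
{
// Wait for completion, then collect the final byte count/status
WaitForSingleObject(osWrite.hEvent, INFINITE);
GetOverlappedResult(*pComDev, &osWrite, &BytesWritten, FALSE);
}
// else: the write failed immediately
}
CloseHandle(osWrite.hEvent);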
Note, however, that the whole I/O is now effectively synchronous. It's handled by the OS asynchronously, but your code doesn't continue until it's finished. If so, why use overlapped I/O at all?
One should use it to enable simultaneous processing of several I/Os (and other tasks) within the same thread. To do this correctly, you should allocate the OVERLAPPED structure on the heap and use one of the available completion mechanisms: an event, an APC, a completion port, etc. Your program's flow logic should also change accordingly.
Since you didn't say that you need asynchronous I/O, you should try synchronous I/O. It's easier. I think if you just pass a null pointer for the OVERLAPPED argument you get synchronous, blocking I/O, provided the handle was opened without FILE_FLAG_OVERLAPPED. Please see the example code I wrote in the "Windows C" section of this document:
http://www.pololu.com/docs/0J40/
Your Sleep(1000); is of no use; it will only execute after WriteFile completes its operation. You have to wait until WriteFile is over.
if(WriteFile(*pComDev,pBuffer,BytesToWrite,&BytesWritten,&osWrite) == FALSE)
{}
As you know, the code inside a conditional only executes if the condition is true.
Here the result is returned to the program after the WriteFile routine completes (successfully or with an error).
OK, I missed the OVERLAPPED parameter in the read/write code, so it's just as well I only replied yesterday as a comment; otherwise I would have been hammered with downvotes :(
The classic way of handling overlapped I/O is to have an OVERLAPPED (_OVL) struct as a data member of the buffer class and to pass its address in the overlapped read/write call. This makes it easy to have read and write calls pending at the same time (or indeed multiple read/write calls with separate buffer instances).
For COM ports, I usually use an APC completion routine whose address is passed in the ReadFileEx/WriteFileEx APIs. This leaves the hEvent field of the _OVL free to hold the instance pointer of the buffer, so it's easy to cast it back inside the completion routine (this means that each buffer class instance contains an _OVL member whose hEvent field points to the buffer class instance; it sounds a bit weird, but works fine).
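The sketch below shows both ways of getting from the OVERLAPPED pointer handed to a completion routine back to the owning buffer: storing this in hEvent (which ReadFileEx/WriteFileEx leave free for the application), and making the OVERLAPPED the first member so the pointer itself can be cast back, which is essentially the "own data after the OVERLAPPED" trick from the question. To keep it compilable off Windows, FakeOverlapped is a hypothetical stand-in for OVERLAPPED, and on_read_complete only mimics the FileIOCompletionRoutine parameter list; IoBuffer and the helper names are hypothetical too. On Windows, the CONTAINING_RECORD macro performs the same recovery for a member at any offset.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for Win32 OVERLAPPED so the sketch compiles anywhere;
// on Windows use OVERLAPPED from <windows.h> instead.
struct FakeOverlapped {
    unsigned long Internal;
    unsigned long InternalHigh;
    unsigned long Offset;
    unsigned long OffsetHigh;
    void* hEvent;  // ignored by ReadFileEx/WriteFileEx, free for our own use
};

// Hypothetical buffer class: the OVERLAPPED is its FIRST data member.
struct IoBuffer {
    FakeOverlapped ovl;      // &ovl is what gets passed to ReadFileEx/WriteFileEx
    std::size_t length = 0;  // bytes delivered by the completed read
    char data[64];

    IoBuffer() {
        std::memset(&ovl, 0, sizeof ovl);
        ovl.hEvent = this;   // variant 1: hEvent carries the instance pointer
    }
    // A pending I/O refers to this exact object, so it must not be copied.
    IoBuffer(const IoBuffer&) = delete;
    IoBuffer& operator=(const IoBuffer&) = delete;
};

static_assert(std::is_standard_layout<IoBuffer>::value,
              "the first-member cast requires a standard-layout class");
static_assert(offsetof(IoBuffer, ovl) == 0, "ovl must be the first member");

// Variant 1: recover the buffer from hEvent.
IoBuffer* buffer_from_hevent(FakeOverlapped* ovl) {
    return static_cast<IoBuffer*>(ovl->hEvent);
}

// Variant 2: ovl is the first member of a standard-layout class, so both
// addresses coincide and the pointer can be cast back directly.
IoBuffer* buffer_from_first_member(FakeOverlapped* ovl) {
    return reinterpret_cast<IoBuffer*>(ovl);
}

// Same parameter list as a FileIOCompletionRoutine: error code, bytes, OVERLAPPED*.
void on_read_complete(unsigned long error, unsigned long bytes, FakeOverlapped* ovl) {
    IoBuffer* buf = buffer_from_hevent(ovl);
    buf->length = (error == 0) ? bytes : 0;
}
```

Either variant works; the hEvent approach does not depend on member order, while the first-member cast frees hEvent for an actual event if you later switch to event-based completion.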

COM port read - Thread remains alive after timeout occurs

I have a DLL, written in C/C++, which includes a function called ReadPort that reads data from a serial COM port. This function is called within an extra thread from another WINAPI function using _beginthreadex. When the COM port has data to be read, the worker thread returns the data and ends normally, the calling thread closes the worker's thread handle, and the DLL works fine.
However, if ReadPort is called without data pending on the COM port, when the timeout occurs WaitForSingleObject returns WAIT_TIMEOUT but the worker thread never ends. As a result, virtual memory grows by about 1 MB every time, physical memory grows by a few KB, and the application that calls the DLL becomes unstable. I also tried to use TerminateThread() but I got the same results.
I have to admit that although I have enough development experience, I am not familiar with C/C++. I did a lot of research before posting but unfortunately I didn't manage to solve my problem.
Does anyone have a clue how I could solve this problem? I would really like to stick to this kind of solution. Also, I think I can't use any global variables for extra events, because each of the DLL's functions may be called many times for every COM port.
I post some parts of my code below:
The Worker Thread:
unsigned int __stdcall ReadPort(void* readstr){
DWORD dwError; int rres;DWORD dwCommModemStatus, dwBytesTransferred;
int ret;
char szBuff[64] = "";
ReadParams* params = (ReadParams*)readstr;
ret = SetCommMask(params->param2, EV_RXCHAR | EV_CTS | EV_DSR | EV_RLSD | EV_RING);
if (ret == 0)
{
_endthreadex(0);
return -1;
}
ret = WaitCommEvent(params->param2, &dwCommModemStatus, 0);
if (ret == 0)
{
_endthreadex(0);
return -2;
}
ret = SetCommMask(params->param2, EV_RXCHAR | EV_CTS | EV_DSR | EV_RLSD| EV_RING);
if (ret == 0)
{
_endthreadex(0);
return -3;
}
if (dwCommModemStatus & EV_RXCHAR||dwCommModemStatus & EV_RLSD)
{
rres = ReadFile(params->param2, szBuff, 64, &dwBytesTransferred,NULL);
if (rres == 0)
{
switch (dwError = GetLastError())
{
case ERROR_HANDLE_EOF:
_endthreadex(0);
return -4;
}
_endthreadex(0);
return -5;
}
else
{
strcpy(params->param1,szBuff);
_endthreadex(0);
return 0;
}
}
else
{
_endthreadex(0);
return 0;
}
_endthreadex(0);
return 0;}
The Calling Thread:
int WINAPI StartReadThread(HANDLE porthandle, HWND windowhandle){
HANDLE hThread;
unsigned threadID;
ReadParams readstr;
DWORD ret, ret2;
readstr.param2 = porthandle;
hThread = (HANDLE)_beginthreadex( NULL, 0, ReadPort, &readstr, 0, &threadID );
ret = WaitForSingleObject(hThread, 500);
if (ret == WAIT_OBJECT_0)
{
CloseHandle(hThread);
if (readstr.param1 != NULL)
// Send message to GUI
return 0;
}
else if (ret == WAIT_TIMEOUT)
{
ret2 = CloseHandle(hThread);
return -1;
}
else
{
ret2 = CloseHandle(hThread);
if (ret2 == 0)
return -2;
}}
Thank you in advance,
Sna.
Don't use WaitCommEvent. You can call ReadFile even when there is no data waiting.
Use SetCommTimeouts to make ReadFile itself time out, instead of building a timeout on top of the inter-thread communication.
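With SetCommTimeouts, the longest a ReadFile call can block is governed by two COMMTIMEOUTS fields: ReadTotalTimeoutMultiplier times the number of bytes requested, plus ReadTotalTimeoutConstant, in milliseconds (ReadIntervalTimeout separately limits the gap between bytes). The helper below is a hypothetical function, not part of the Win32 API; it only does that arithmetic so you can pick values, and it ignores the special MAXDWORD combinations that the COMMTIMEOUTS documentation describes.

```cpp
#include <cstdint>

// Hypothetical helper: the total time a ReadFile call may block, given the
// COMMTIMEOUTS fields ReadTotalTimeoutMultiplier and ReadTotalTimeoutConstant
// (both in milliseconds). If both fields are 0, total timeouts are not used.
std::uint64_t read_total_timeout_ms(std::uint32_t multiplier_ms,
                                    std::uint32_t constant_ms,
                                    std::uint32_t bytes_requested) {
    // Widen before multiplying so that large settings cannot overflow 32 bits.
    return static_cast<std::uint64_t>(multiplier_ms) * bytes_requested + constant_ms;
}
```

For example, reading into the question's 64-byte szBuff with a multiplier of 10 and a constant of 500 bounds each ReadFile at 10 × 64 + 500 = 1140 ms, after which it returns with however many bytes arrived.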
Change the delay in the WaitForSingleObject call to 5000 or 10000 and I bet your problem frequency goes way down.
Edwin's answer is also valid. The spawned thread does not die because you closed the thread handle.
There is no guarantee that the ReadPort thread has even started by the time you are timing out. Windows takes a LONG time to start a thread.
Here are some suggestions:
You never check the return value of _beginthreadex. How do you know the thread started?
Use whatever synchronization method you are comfortable with to sync the ReadPort thread startup with StartReadThread. It could be as simple as an integer flag that ReadPort sets to 1 when it's ready to work. Then the main thread can start its true waiting at that point. Otherwise, short of using a debugger, you'll never know what's happening between the two threads. Do not time out from the call to WaitForSingleObject in StartReadThread until your sync method indicates that ReadPort is working.
You should not use strcpy to copy the bytes received from the serial port with ReadFile. ReadFile tells you how many bytes it read. Use that value and memcpy to fill the buffer.
Look here and here for info on how to have ReadFile time out so your reads are not indefinite. Blocking forever on Windows is a recipe for disaster as it can cause zombie processes you cannot kill, among other problems.
You communicate no status to StartReadThread about what happened in the ReadPort thread. How do you know how many bytes ReadPort placed into szBuff? To get the thread's exit code, use GetExitCodeThread. Documented here. Note that you cannot use GetExitCodeThread if you've closed the thread handle.
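Several of these suggestions can be combined in a portable sketch that uses std::thread and std::promise in place of _beginthreadex and GetExitCodeThread (a substitution for illustration, not the Win32 mechanism): the worker signals that it has started, copies exactly the number of bytes it received with memcpy, and hands back a status code together with the byte count. ReadResult, read_port_worker and start_read are hypothetical names, and the incoming string stands in for the bytes ReadFile would return. Joining after a timeout assumes the worker's own read is bounded (for example with SetCommTimeouts), so the thread is reclaimed instead of being left running.

```cpp
#include <algorithm>
#include <chrono>
#include <cstring>
#include <future>
#include <string>
#include <system_error>
#include <thread>
#include <utility>

// Hypothetical result record: a status code plus the bytes actually received.
struct ReadResult {
    int status = -1;        // 0 = ok, -1 = thread never ran, -2 = timed out
    std::size_t bytes = 0;  // number of valid bytes in data (not NUL-terminated)
    char data[64] = {};
};

// Stand-in for ReadPort(): `incoming` plays the role of what ReadFile returned.
void read_port_worker(std::promise<void> started, std::promise<ReadResult> done,
                      std::string incoming) {
    started.set_value();  // "I am running now"
    ReadResult r;
    r.bytes = std::min(incoming.size(), sizeof r.data);
    std::memcpy(r.data, incoming.data(), r.bytes);  // copy by count, not strcpy
    r.status = 0;
    done.set_value(r);  // status and byte count go back to the caller
}

// Caller side: confirm the thread started, then wait for its result.
ReadResult start_read(const std::string& incoming, std::chrono::milliseconds timeout) {
    std::promise<void> started;
    std::promise<ReadResult> done;
    std::future<void> started_f = started.get_future();
    std::future<ReadResult> done_f = done.get_future();

    std::thread t;
    try {
        t = std::thread(read_port_worker, std::move(started), std::move(done), incoming);
    } catch (const std::system_error&) {
        return ReadResult{};  // thread creation failed: status stays -1
    }

    started_f.wait();  // only start the real timeout once the worker is running
    ReadResult r;
    if (done_f.wait_for(timeout) == std::future_status::ready)
        r = done_f.get();
    else
        r.status = -2;
    t.join();  // assumes the worker's own read is bounded
    return r;
}
```

Because the result carries the byte count, the caller never has to guess how much of data is valid, which is exactly what strcpy got wrong for binary serial data.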
In your calling thread, after a timeout you close the thread handle. This only stops you from using the handle; the worker thread itself is still running. You should use a loop which waits again.