How is it possible to get the Socket ID (Handle) of the created sockets of a program?
I know I can get all the open sockets in all programs with GetTcpTable(), but it has two problems:
It shows every program's sockets
It doesn't return the ID (handle) of the sockets

As Remy said, it's not trivial. You have to call OpenProcess with PROCESS_DUP_HANDLE for each process in the system. You might also need PROCESS_QUERY_INFORMATION and PROCESS_VM_READ, but I've never needed them (I've seen other code that uses them).
For each process, you access the donor process's handle table with NtQuerySystemInformation (with an information class of SystemHandleInformation). Finally, you call DuplicateHandle to make the process's handle your handle, too.
You will have to filter the handle types when enumerating the donor process's handle table. For each handle you have duplicated, call NtQueryObject with ObjectTypeInformation. If the type is a socket, you keep it open and put it in your list. Otherwise, close it and go on.
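To perform the comparison, the code looks similar to the following. The type is returned as a UNICODE_STRING, whose Length field is a byte count rather than a character count:
// info was returned from NtQueryObject, ObjectTypeInformation
// Name.Length is in bytes, so divide by sizeof(WCHAR) to get the number of characters
wstring type( pObjectTypeInfo->Name.Buffer, pObjectTypeInfo->Name.Length / sizeof(WCHAR) );
if( 0 != wcscmp( L"Socket", type.c_str() ) ) { /* Not a Socket */ }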
If there is no Socket type (I don't recall), you should try to get the name associated with the handle (it's still a UNICODE_STRING), and look for \\Device\\Tcp. This time, you would use the same handle, but call NtQueryObject with ObjectNameInformation:
// info was returned from NtQueryObject, ObjectNameInformation
// Name.Length is in bytes, so divide by sizeof(WCHAR) to get the number of characters
wstring name( pObjectNameInfo->Name.Buffer, pObjectNameInfo->Name.Length / sizeof(WCHAR) );
if( name.substr(0, 11) == L"\\Device\\Tcp" ) { /* It's a TCP Socket */ }
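The byte-count detail is easy to get wrong, so here is a minimal, portable sketch of the conversion. CountedString is a hypothetical stand-in that mirrors the UNICODE_STRING layout (it is not the Windows type), and ToString and StartsWith are illustrative helper names, not part of any API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the UNICODE_STRING layout: Length and
// MaximumLength are BYTE counts, and Buffer holds UTF-16 text that is
// not guaranteed to be null-terminated.
struct CountedString {
    std::uint16_t Length;
    std::uint16_t MaximumLength;
    const char16_t* Buffer;
};

// Copy the counted buffer into an owning string, converting the byte
// count into a character count.
std::u16string ToString(const CountedString& s) {
    if (s.Buffer == nullptr) return {};
    assert(s.Length % sizeof(char16_t) == 0);  // a UTF-16 byte count is even
    return std::u16string(s.Buffer, s.Length / sizeof(char16_t));
}

// True when `text` begins with `prefix`.
bool StartsWith(const std::u16string& text, const std::u16string& prefix) {
    return text.compare(0, prefix.size(), prefix) == 0;
}
```

Passing Length straight to the string constructor would read twice as many characters as the buffer actually holds.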
Another fellow and I did something similar a few years ago. Instead of Sockets, we used Mutexes and Events to crash privileged Antivirus components from their userland UI program (which was sharing handles with the privileged component for IPC). See Old Dogs and New Tricks: Do You Know Where Your Handles Are?.

OK, thanks to everyone who tried to solve my problem.
After a lot of work I figured out how to handle it myself. This is how I got hold of the socket:
First I looked at the program's disassembly and found the calls to the WS2_32 send function.
The disassembly shows a call to the socket send function at 0x467781, with the socket handle pushed onto the stack from the EDX register.
Now what I need to do is hook my code into that call.
void GetSocket(int Flag, int DataSize, char* Data, SOCKET Socket)
{
    sSocket = Socket;
    sFlag = Flag;
    sDataSize = DataSize;
    sData = Data;
    SendPacket(sSocket, Data, DataSize); // Send packets manually
}
__declspec(naked) void MyFunc()
{
    __asm
    {
        PUSH EDX // Socket
        PUSH ECX // Buffer
        PUSH EAX // Buffer Size
        PUSH 0 // Flag
        CALL GetSocket
        MOV EAX, sDataSize
        MOV ECX, sData
        MOV EDX, sSocket
        JMP [JumpAddress] // JumpAddress = 0x467787 (After that CALL)
    }
}
Now all I have to do is change that CALL (at 0x467781) into a JMP to our function (MyFunc), which can be done with the following statement:
*(DWORD*) (0x467781 + 0x01) = (DWORD)MyFunc - (0x467781 + 0x05);
Now I'm done. I can easily see each packet that it sends to the server, change them if necessary, and also send my custom packets with its socket :)
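For reference, a relative JMP on x86 is five bytes: the opcode 0xE9 followed by a 32-bit little-endian displacement measured from the end of the instruction. The patch statement above writes only the displacement, so the opcode byte at the patch site also has to be 0xE9 for it to behave as a jump. The following is a minimal sketch of the encoding; EncodeJmpRel32 is a hypothetical helper that only builds the bytes and does not touch process memory (a real patch would also need the target page to be writable, for example via VirtualProtect).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Build the 5 bytes of an x86 "JMP rel32" that, when placed at address
// `site`, transfers control to `target`. Addresses are 32-bit here.
std::array<std::uint8_t, 5> EncodeJmpRel32(std::uint32_t site, std::uint32_t target) {
    // The displacement is relative to the address of the NEXT instruction
    // (site + 5). Unsigned wrap-around yields the correct two's-complement
    // value for backward jumps.
    const std::uint32_t rel = target - (site + 5u);
    return {
        0xE9,  // opcode: JMP rel32 (0xE8 would be CALL rel32)
        static_cast<std::uint8_t>(rel & 0xFFu),
        static_cast<std::uint8_t>((rel >> 8) & 0xFFu),
        static_cast<std::uint8_t>((rel >> 16) & 0xFFu),
        static_cast<std::uint8_t>((rel >> 24) & 0xFFu),
    };
}
```

The displacement formula is the same one used in the statement above: target minus (site + 5).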


Program crashes when trying to retrieve contents of pointer

I'm making a socket program in C++ using winsock2 and I'm trying to use WSAAccept to conditionally accept connections. I copied the example ConditionalFunction from MSDN for the lpfnCondition argument in WSAAccept as seen below.
SOCKET WSAAccept(
    _In_ SOCKET s,
    _Out_ struct sockaddr *addr,
    _Inout_ LPINT addrlen,
    _In_ LPCONDITIONPROC lpfnCondition, //<---------
    _In_ DWORD_PTR dwCallbackData
);
However, when I try to access the contents of lpCallerData in the ConditionalFunction like so: WSABUF buffer = *lpCallerData, my program crashes. I know this is the source of the problem because when I comment that line out my program doesn't crash. I don't think all of my code would be necessary. Any help would be lovely.
int CALLBACK ConditionalAccept(LPWSABUF lpCallerId,LPWSABUF lpCallerData,LPQOS lpSQOS,
GROUP *g,DWORD_PTR dwCallbackData)
{
    WSABUF buffer = *lpCallerData;
    if (lpSQOS != NULL) {
        RtlZeroMemory(lpSQOS, sizeof(QOS));
        return CF_ACCEPT;
    } else
        return CF_REJECT;
}
WSAAccept(slisten, (SOCKADDR*)&acceptSock, &Size, &ConditionalAccept, NULL);
As Luke stated, you are not checking lpCallerData for NULL before dereferencing it. That is why your code is crashing.
int CALLBACK ConditionalAccept(LPWSABUF lpCallerId,LPWSABUF lpCallerData,LPQOS lpSQOS,
GROUP *g,DWORD_PTR dwCallbackData)
{
    WSABUF buffer = {0};
    if (lpCallerData != NULL) { // <-- add this check!
        buffer = *lpCallerData;
    }
    if (lpSQOS != NULL) {
        RtlZeroMemory(lpSQOS, sizeof(QOS));
        return CF_ACCEPT;
    } else
        return CF_REJECT;
}
However, lpCallerData is meaningless in TCP/IP and will always be NULL. TCP/IP does not support exchanging caller/callee data during connection establishment. This is clearly stated in the WSAConnect() documentation:
The lpCallerData parameter contains a pointer to any user data that is to be sent along with the connection request (called connect data). This is additional data, not in the normal network data stream, that is sent with network requests to establish a connection. This option is used by legacy protocols such as DECNet, OSI TP4, and others.
Note Connect data is not supported by the TCP/IP protocol in Windows. Connect data is supported only on ATM (RAWWAN) over a raw socket.

WriteFileEx completion routine succeeds, but bytes transferred is incorrect

I'm communicating between two processes on different machines via a pipe, using IO completion routines.
Occasionally, when the completion routine for WriteFileEx gets called, the completion routine parameter dwErrorCode is 0 (i.e. no error), GetOverlappedResult returns true (i.e. no error), but dwNumberOfBytesTransfered does not match nNumberOfBytesToWrite in the call to WriteFileEx. I only see this on the client end of the pipe however.
If the number of bytes transferred does not match the number of bytes that was requested to transfer, how can this be deemed a success?
This is how the client's handle to the pipe is created:
mHPipe = CreateFile(pipeName, // pipe name
    GENERIC_READ | // read and write access
    GENERIC_WRITE,
    0, // no sharing
    NULL, // default security attributes
    OPEN_EXISTING, // opens existing pipe
    FILE_FLAG_OVERLAPPED | // overlapped
    FILE_FLAG_WRITE_THROUGH, // write through mode
    NULL); // no template file
// do some checking...
// The pipe connected; change to message-read mode.
DWORD dwMode = PIPE_READMODE_MESSAGE;
BOOL fSuccess = SetNamedPipeHandleState(mHPipe, // pipe handle
    &dwMode, // new pipe mode
    NULL, // don't set maximum bytes
    NULL); // don't set maximum time
Can anyone see why this would happen?
The relevant WriteFileEx code is as follows:
void WINAPI CompletedWriteRoutine(DWORD dwErrorCode, DWORD dwNumberOfBytesTransfered, LPOVERLAPPED lpOverLap)
BOOL fWrite = FALSE;
// ! 99.9% of the time, dwNumberOfBytesTransfered == lpPipeInst->cbDataSize
// but 0.1% of the time, they do not match
// Some stuff
// Copy next message to send
memcpy_s(lpPipeInst->chData, sizeof(lpPipeInst->chData), pMsg->msg, pMsg->size);
lpPipeInst->cbDataSize = pMsg->size;
// Some other stuff
fWrite = WriteFileEx(lpPipeInst->hPipeInst,
// Some other, other stuff
Where LPPIPEINST is declared as:
typedef struct
{
    OVERLAPPED oOverlap; // must remain first item
    HANDLE hPipeInst;
    DWORD cbDataSize;
} PIPEINST, *LPPIPEINST;
And the initial call to CompletedWriteRoutine is given the lpOverlap parameter declared thusly:
PIPEINST pipeInstWrite = {0};
pipeInstWrite.hPipeInst = client.getPipeHandle();
pipeInstWrite.oOverlap.hEvent = hEvent[eventWriteComplete];
After trying to re-initialize the OVERLAPPED structure as Harry suggested, I noticed something peculiar.
I memset the OVERLAPPED structure to zero before each WriteFileEx. In roughly 1 in 5000 completion routine callbacks, the cbWritten parameter and the OVERLAPPED structure's InternalHigh member were set to the size of the previous message instead of the most recent message. I added file logging on both the client and server ends of the pipe inside the completion routines, and the data sent and received at both ends was an exact match (and the correct, expected data). This revealed that, in the time taken to write the data to a file, the InternalHigh member in the OVERLAPPED structure had changed to reflect the size of the message I was expecting (cbWritten remained the old message size). I removed the file logging, and am now able to reproduce the issue like clockwork with this code:
void WINAPI CompletedWriteRoutine(DWORD dwErr, DWORD cbWritten, LPOVERLAPPED lpOverLap)
{
    LPPIPEINST lpPipeInst = (LPPIPEINST)lpOverLap; // valid because oOverlap is the first member
    // Completion routine says it wrote the amount of data from the previous callback
    if (cbWritten != lpPipeInst->cbDataSize)
    {
        // Roughly 1 in 5000 callbacks ends up in here
        OVERLAPPED ovl1 = lpPipeInst->oOverlap; // Contains size of previous message, i.e. cbWritten
        OVERLAPPED ovl2 = lpPipeInst->oOverlap; // Contains size of most recent message, i.e lpPipeInst->cbDataSize
    }
}
It seems that sometimes, the completion routine is being called before the OVERLAPPED structure and the completion routine input parameter is updated. I'm using MsgWaitForMultipleObjectsEx(eventLast, hEvent, INFINITE, QS_POSTMESSAGE, MWMO_ALERTABLE); for the completion routines to be called on Windows 7 64 bit.
This MSDN page says:
"The system does not use the OVERLAPPED structure after the completion routine is called, so the completion routine can deallocate the memory used by the overlapped structure."
...so apparently, what this code can reproduce should never happen?
Is this a WINAPI bug?
Added FILE_FLAG_NO_BUFFERING to the CreateFile call - haven't seen the problem since. Thanks everyone who commented for your time.

How to structure worker thread logic for IOCP

I'm creating a client program that communicates with a device connected to my PC via LAN.
A typical communication between my program and the device is as follows:
Program -> Device 1616000D 08 02 00 00 00 21 11 A1 00 01 22 08 00 // Sender sends data (a specific command to the device) to Receiver
Program <- Device 16160002 80 00 // Receiver sends ACK to sender
Program <- Device 16160005 08 20 00 00 00 // Receiver sends command response to sender
Program -> Device 16160002 80 00 // Sender sends ACK to receiver
The last two bytes of the four-byte header give the size of the data that follows (000D = 13 bytes).
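To make the framing concrete, here is a minimal sketch of reading such a header. It assumes, based only on the traces above, that a header is two 0x16 bytes followed by a big-endian 16-bit payload length. ParsePayloadLength, kSync and kHeaderSize are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kSync = 0x16;    // sync byte, sent twice
constexpr std::size_t kHeaderSize = 4;  // 2 sync bytes + 2 length bytes

// Returns the payload length announced by the header, or -1 when fewer
// than four bytes are available or the sync bytes do not match.
int ParsePayloadLength(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() < kHeaderSize) return -1;
    if (bytes[0] != kSync || bytes[1] != kSync) return -1;
    // Assumption: the length is big-endian (00 0D -> 13).
    return (bytes[2] << 8) | bytes[3];
}
```

For the first trace line, the header 16 16 00 0D yields 13, which matches the 13 payload bytes that follow it.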
My send routine looks like:
bool TcpConnection::SendCommand(const Command& rCommand, const std::vector<BYTE>& rvecCommandOptions)
{
    std::vector<BYTE> vecCommandData;
    m_commandBuilder.BuildCommand(rCommand, rvecCommandOptions, vecCommandData);
    if (vecCommandData.empty())
        return false;
    // Plain new throws std::bad_alloc rather than returning NULL, so the
    // nothrow form (from <new>) is needed for the check below to mean anything.
    PerIoData *pPerIoData = new (std::nothrow) PerIoData;
    if (!pPerIoData)
        return false;
    SecureZeroMemory(&(pPerIoData->m_overlapped), sizeof(WSAOVERLAPPED));
    pPerIoData->m_socket = m_socket.Get();
    pPerIoData->m_overlapped.hEvent = WSACreateEvent();
    pPerIoData->m_vecBuffer.assign(vecCommandData.begin(), vecCommandData.end());
    pPerIoData->m_wsaBuf.buf = (CHAR*)(&(pPerIoData->m_vecBuffer[0]));
    pPerIoData->m_wsaBuf.len = pPerIoData->m_vecBuffer.size();
    pPerIoData->m_dwFlags = 0;
    pPerIoData->m_dwNumberOfBytesSent = 0;
    pPerIoData->m_dwNumberOfBytesToSend = pPerIoData->m_wsaBuf.len;
    pPerIoData->m_operationType = OP_TYPE_SEND;
    if (!m_socket.Send(pPerIoData))
        return false;
    return true;
}
And my worker thread routine looks like:
DWORD WINAPI TcpConnection::WorkerThread(LPVOID lpParameter)
HANDLE hCompletionPort = (HANDLE)lpParameter;
DWORD dwNumberOfBytesTransferred;
ULONG_PTR ulCompletionKey; // GetQueuedCompletionStatus takes a PULONG_PTR, which is 64 bits wide on x64
PerIoData *pPerIoData;
DWORD dwNumberOfBytesReceived;
DWORD dwNumberOfBytesSent;
DWORD dwFlags;
while (GetQueuedCompletionStatus(hCompletionPort, &dwNumberOfBytesTransferred, &ulCompletionKey, (LPOVERLAPPED*)&pPerIoData, INFINITE))
if (!pPerIoData)
if ((dwNumberOfBytesTransferred == 0) && ((pPerIoData->m_operationType == OP_TYPE_SEND) || (pPerIoData->m_operationType == OP_TYPE_RECEIVE)))
delete pPerIoData;
if (pPerIoData->m_operationType == OP_TYPE_SEND)
pPerIoData->m_dwNumberOfBytesSent += dwNumberOfBytesTransferred;
if (pPerIoData->m_dwNumberOfBytesSent < pPerIoData->m_dwNumberOfBytesToSend)
pPerIoData->m_wsaBuf.buf = (CHAR*)(&(pPerIoData->m_vecBuffer[pPerIoData->m_dwNumberOfBytesSent]));
pPerIoData->m_wsaBuf.len = (pPerIoData->m_dwNumberOfBytesToSend - pPerIoData->m_dwNumberOfBytesSent);
if (WSASend(pPerIoData->m_socket, &(pPerIoData->m_wsaBuf), 1, &dwNumberOfBytesTransferred, 0, &(pPerIoData->m_overlapped), NULL) == 0)
if (WSAGetLastError() == WSA_IO_PENDING)
else if (pPerIoData->m_dwNumberOfBytesSent == pPerIoData->m_dwNumberOfBytesToSend)
delete pPerIoData;
// Q1. Do I create a new instance of PerIoData here before calling WSARecv() or reuse pPerIoData?
// QA. If I did do "PerIoData *pPerIoData = new PerIoData" here, how do I handle it if this memory allocation request fails? Should I simply "continue" or "return -1"?
// QB. Or is this the wrong place to do this memory allocation to achieve the typical communication between my program and the device?
SecureZeroMemory(&(pPerIoData->m_overlapped), sizeof(WSAOVERLAPPED));
pPerIoData->m_overlapped.hEvent = WSACreateEvent();
pPerIoData->m_wsaBuf.buf = (CHAR*)(&(pPerIoData->m_vecBuffer[0]));
pPerIoData->m_wsaBuf.len = pPerIoData->m_vecBuffer.size();
pPerIoData->m_operationType = OP_TYPE_RECEIVE;
if (WSARecv(pPerIoData->m_socket, &(pPerIoData->m_wsaBuf), 1, &dwNumberOfBytesReceived, &(pPerIoData->m_dwFlags), &(pPerIoData->m_overlapped), NULL) == 0)
if (WSAGetLastError() == WSA_IO_PENDING)
else if (pPerIoData->m_operationType == OP_TYPE_RECEIVE)
if ((pPerIoData->m_vecBuffer[0] == 0x16) && (pPerIoData->m_vecBuffer[1] == 0x16))
// Q2. Do I need to do SecureZeroMemory(&(pPerIoData->m_overlapped), sizeof(WSAOVERLAPPED)); here?
// Q3. Or do I new PerIoData?
pPerIoData->m_wsaBuf.buf = (CHAR*)(&(pPerIoData->m_vecBuffer[0]));
pPerIoData->m_wsaBuf.len = pPerIoData->m_vecBuffer.size();
pPerIoData->m_operationType = OP_TYPE_RECEIVE;
// QC. At this point two syn bytes (0x16) are received. I now need to receive two more bytes of data (000D = 13 bytes) to find out the size of the actual command response data.
// If I clear my m_vecBuffer here and try to resize its size to two, I get this debug assertion: "vector iterators incompatible" at runtime. Do you know how I can fix this problem?
if (WSARecv(pPerIoData->m_socket, &(pPerIoData->m_wsaBuf), 1, &dwNumberOfBytesReceived, &(pPerIoData->m_dwFlags), &(pPerIoData->m_overlapped), NULL) == 0)
if (WSAGetLastError() == WSA_IO_PENDING)
// QD. I'm not sure how to structure this if clause for when m_operationType is OP_TYPE_RECEIVE. I mean how do I distinguish one receive operation for getting two syn bytes from another for getting data size?
// One way I can think of doing is to create more receive operation types such as OP_TYPE_RECEIVE_DATA_SIZE or OP_TYPE_RECEIVE_DATA? So you can have something like below.
// Is this how you would do it?
//else if (pPerIoData->m_operationType == OP_TYPE_RECEIVE_DATA_SIZE)
// Call WSARecv() again to get command response data
return 0;
Please see my questions in the code above.
Many thanks
As the name of your PerIoData type implies, you need one data structure per incomplete I/O request. A PerIoData structure should persist from the time you initiate asynchronous I/O with WSASend or WSARecv until the time that you retrieve that request's completion packet off of the I/O completion port using GetQueuedCompletionStatus.
You should always reinitialize your OVERLAPPED structures when you're about to start a new request.
You can re-use the PerIoData structure as long as the I/O request has completed. Given that you've retrieved pPerIoData off the I/O completion port, you may reuse it for subsequent requests. Just make sure that you've reset any applicable fields in that structure so that it is in a state that is appropriate for a new I/O request.
EDIT to answer follow-up questions:
A. I would continue because you want to continue processing I/O events even though you couldn't initiate an additional request. If you don't continue then you won't be able to handle any more I/O completions. Before you continue you might want to call an error handler of some sort.
B. I don't think there's necessarily a "right" or "wrong" place to allocate, but keep in mind that when you allocate your PerIoData there, what you effectively end up doing is repeated allocations and deletes of the same data structure over and over in a loop. When I write code using I/O completion ports, I allocate a pool of my PerIoData equivalent up front and re-use them.
C. I don't have enough context to know the answer. Show your code that does this and the line where the assertion hits and I might be able to help.
D. You could break up your operation type into finer-grained components as you suggested, such as a OP_TYPE_RECEIVE_DATA_SIZE operation. As a warning, reading a couple of bytes on each WSARecv call won't perform as well as you'd like. Winsock calls are expensive; it's a lot of overhead to make a request for a couple of bytes. I'd suggest that you read a larger block of data into your PerIoData buffer in one WSARecv. Then pull your sizing information out of that buffer, then start copying your data out of that buffer. If there's more data arriving than can fit in the buffer, then you can make additional WSARecv calls until you've read the rest in.
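The approach in D (read a larger block, then pull the size and the data out of it) can be sketched independently of Winsock. The class below is a hypothetical illustration, not code from the question: FrameAssembler and its method names are made up, and the framing (0x16 0x16 followed by a big-endian 16-bit length) is an assumption inferred from the traces in the question. Each completed receive would append whatever bytes arrived, and complete payloads are pulled out afterwards, regardless of how the stream was split.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Accumulates raw bytes from successive receives and hands back complete
// payloads, one frame at a time.
class FrameAssembler {
public:
    // Append whatever one receive delivered (any size, any split).
    void Append(const std::uint8_t* data, std::size_t size) {
        buffer_.insert(buffer_.end(), data, data + size);
    }

    // Copies the next complete payload into `payload` and returns true,
    // or returns false when more bytes are needed.
    bool Next(std::vector<std::uint8_t>& payload) {
        // Resynchronise: drop leading bytes until the buffer starts with 16 16.
        std::size_t start = 0;
        while (start + 1 < buffer_.size() &&
               !(buffer_[start] == 0x16 && buffer_[start + 1] == 0x16)) {
            ++start;
        }
        buffer_.erase(buffer_.begin(), buffer_.begin() + static_cast<std::ptrdiff_t>(start));
        if (buffer_.size() < 4) return false;  // header not complete yet

        const std::size_t length = (static_cast<std::size_t>(buffer_[2]) << 8) | buffer_[3];
        if (buffer_.size() < 4 + length) return false;  // payload not complete yet

        const auto first = buffer_.begin() + 4;
        const auto last = first + static_cast<std::ptrdiff_t>(length);
        payload.assign(first, last);
        buffer_.erase(buffer_.begin(), last);
        return true;
    }

private:
    std::vector<std::uint8_t> buffer_;
};
```

With this shape a single receive operation type is enough: the worker thread appends on every receive completion and loops on Next() until it returns false.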

Attempting asynchronous I/O with Win32 threads

I'm writing serial port software for Windows. To improve performance I'm trying to convert the routines to use asynchronous I/O. I have the code up and working fairly well, but I'm a semi-beginner at this, and I would like to improve the performance of the program further. During stress tests of the program (i.e. bursting data to/from the port as fast as possible at a high baud rate), the CPU load gets quite high.
If anyone out there has experience with asynchronous I/O and multi-threading in Windows, I'd be grateful if you could take a look at my program. I have two main concerns:
Is the asynchronous I/O implemented correctly? I found a fairly reliable source on the net suggesting that you can pass user data to the callback functions by implementing your own OVERLAPPED struct with your own data at the end. This seems to be working just fine, but it does look a bit "hackish" to me. Also, the program's performance didn't improve all that much when I converted from synchronous/polled to asynchronous/callback, making me suspect I'm doing something wrong.
Is it sane to use STL std::deque for the FIFO data buffers? As the program is currently written, I only allow 1 byte of data to be received at a time, before it must be processed. Because I don't know how much data I will receive, it could be endless amounts. I assume this 1-byte-at-a-time approach will yield sluggish behaviour behind the scenes of deque when it has to allocate data. And I don't trust deque to be thread-safe either (should I?).
If using STL deque isn't sane, are there any suggestions for a better data type to use? Static array-based circular ring buffer?
Any other feedback on the code is most welcome as well.
The serial routines are implemented so that I have a parent class called "Comport", which handles everything serial I/O related. From this class I inherit another class called "ThreadedComport", which is a multi-threaded version.
ThreadedComport class (relevant parts of it)
class ThreadedComport : public Comport
HANDLE _hthread_port; /* thread handle */
HANDLE _hmutex_port; /* COM port access */
HANDLE _hmutex_send; /* send buffer access */
HANDLE _hmutex_rec; /* rec buffer access */
deque<uint8> _send_buf;
deque<uint8> _rec_buf;
uint16 _data_sent;
uint16 _data_received;
HANDLE _hevent_kill_thread;
HANDLE _hevent_open;
HANDLE _hevent_close;
HANDLE _hevent_write_done;
HANDLE _hevent_read_done;
HANDLE _hevent_ext_send; /* notifies external thread */
HANDLE _hevent_ext_receive; /* notifies external thread */
typedef struct
OVERLAPPED overlapped;
ThreadedComport* caller; /* add user data to struct */
} OVERLAPPED_overlap;
OVERLAPPED_overlap _send_overlapped;
OVERLAPPED_overlap _rec_overlapped;
uint8* _write_data;
uint8 _read_data;
DWORD _bytes_read;
static DWORD WINAPI _tranceiver_thread (LPVOID param);
void _send_data (void);
void _receive_data (void);
DWORD _wait_for_io (void);
static void WINAPI _send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
static void WINAPI _receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
The main thread routine created through CreateThread():
DWORD WINAPI ThreadedComport::_tranceiver_thread (LPVOID param)
ThreadedComport* caller = (ThreadedComport*) param;
HANDLE handle_array [3] =
caller->_hevent_kill_thread, /* WAIT_OBJECT_0 */
caller->_hevent_open, /* WAIT_OBJECT_1 */
caller->_hevent_close /* WAIT_OBJECT_2 */
DWORD result;
/* wait for anything to happen */
result = WaitForMultipleObjects(3,
false, /* dont wait for all */
if(result == WAIT_OBJECT_1 ) /* open? */
do /* while port is open, work */
result = caller->_wait_for_io(); /* will wait for the same 3 as in handle_array above,
plus all read/write specific events */
} while (result != WAIT_OBJECT_0 && /* while not kill thread */
result != WAIT_OBJECT_2); /* while not close port */
else if(result == WAIT_OBJECT_2) /* close? */
; /* do nothing */
} while (result != WAIT_OBJECT_0); /* kill thread? */
return 0;
which in turn calls the following three functions:
void ThreadedComport::_send_data (void)
uint32 send_buf_size;
if(_send_buf.size() != 0) // anything to send?
WaitForSingleObject(_hmutex_port, INFINITE);
if(_is_open) // double-check port
bool result;
WaitForSingleObject(_hmutex_send, INFINITE);
_data_sent = 0;
send_buf_size = _send_buf.size();
if(send_buf_size > (uint32)_MAX_MESSAGE_LENGTH)
send_buf_size = _MAX_MESSAGE_LENGTH;
_write_data = new uint8 [send_buf_size];
for(uint32 i=0; i<send_buf_size; i++)
_write_data[i] = _send_buf.front();
result = WriteFileEx (_hcom, // handle to output file
(void*)_write_data, // pointer to input buffer
send_buf_size, // number of bytes to write
(LPOVERLAPPED)&_send_overlapped, // pointer to async. i/o data
SleepEx(INFINITE, true); // Allow callback to come
if(result == false)
// error handling here
} // if(_is_open)
else /* nothing to send */
SetEvent(_hevent_write_done); // Skip write
void ThreadedComport::_receive_data (void)
WaitForSingleObject(_hmutex_port, INFINITE);
BOOL result;
_bytes_read = 0;
result = ReadFileEx (_hcom, // handle to output file
(void*)&_read_data, // pointer to input buffer
1, // number of bytes to read
(OVERLAPPED*)&_rec_overlapped, // pointer to async. i/o data
SleepEx(INFINITE, true); // Allow callback to come
if(result == FALSE)
DWORD last_error = GetLastError();
if(last_error == ERROR_OPERATION_ABORTED) // disconnected ?
close(); // close the port
DWORD ThreadedComport::_wait_for_io (void)
DWORD result;
bool is_write_done = false;
bool is_read_done = false;
HANDLE handle_array [5] =
do /* COM port message pump running until sending / receiving is done */
result = WaitForMultipleObjects(5,
false, /* dont wait for all */
if(result <= WAIT_OBJECT_2)
break; /* abort */
else if(result == WAIT_OBJECT_3) /* write done */
is_write_done = true;
else if(result == WAIT_OBJECT_4) /* read done */
is_read_done = true;
if(_bytes_read > 0)
uint32 errors = 0;
WaitForSingleObject(_hmutex_rec, INFINITE);
_data_received += _bytes_read;
while((uint16)_rec_buf.size() > _MAX_MESSAGE_LENGTH)
_bytes_read = 0;
ClearCommError(_hcom, &errors, NULL);
} while(!is_write_done || !is_read_done);
return result;
Asynchronous I/O callback functions:
void WINAPI ThreadedComport::_send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
if(dwErrorCode == 0) // no errors
if(dwNumberOfBytesTransfered > 0)
_this->_data_sent = dwNumberOfBytesTransfered;
delete [] _this->_write_data; /* always clean this up */
void WINAPI ThreadedComport::_receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
if(dwErrorCode == 0) // no errors
if(dwNumberOfBytesTransfered > 0)
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
_this->_bytes_read = dwNumberOfBytesTransfered;
The first question is simple. The method is not hackish; you own the OVERLAPPED memory and everything that follows it. This is best described by Raymond Chen: http://blogs.msdn.com/b/oldnewthing/archive/2010/12/17/10106259.aspx
You would only expect a performance improvement if you've got better things to do while waiting for the I/O to complete. If all you do is SleepEx, you'll only see CPU% go down. The clue is in the name "overlapped" - it allows you to overlap calculations and I/O.
std::deque<unsigned char> can handle FIFO data without big problems. It will probably recycle 4KB chunks (precise number determined by extensive profiling, all done for you).
I've looked into your code a bit further, and it seems the code is needlessly complex. For starters, one of the main benefits of asynchronous I/O is that you don't need all that thread stuff. Threads allow you to use more cores, but you're dealing with a slow I/O device. Even a single core is sufficient, if it doesn't spend all its time waiting. And that's precisely what overlapped I/O is for. You just dedicate one thread to all I/O work for the port. Since it's the only thread, it doesn't need a mutex to access that port.
OTOH, you would want a mutex around the deque<uint8> objects since the producer/consumer threads aren't the same as the comport thread.
I don't see any reason for using asynchronous I/O in a project like this. Asynchronous I/O is good when you're handling a large number of sockets or have work to do while waiting for data, but as far as I can tell, you're only dealing with a single port and not doing any work in between.
Also, just for the sake of knowledge, you would normally use an I/O completion port to handle your asynchronous I/O. I'm not sure if there are any situations where using an I/O completion port has a negative impact on performance.
But yes, your asynchronous I/O usage looks okay. Implementing your own OVERLAPPED struct does look like a hack, but it is correct; there's no other way to associate your own data with the completion.
Boost also has a circular buffer implementation, though I'm not sure if it's thread safe. None of the standard library containers are thread safe, though.
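For comparison, a static array-based ring buffer of the kind the question asks about can be quite small. The following is a minimal sketch, not a drop-in replacement for the deque in the question: ByteRing and its method names are hypothetical, and like the standard containers it is not thread safe, so callers sharing it between threads still need their own lock.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-capacity byte FIFO backed by a static array; it never allocates.
template <std::size_t Capacity>
class ByteRing {
    static_assert(Capacity > 0, "capacity must be non-zero");

public:
    // Copies up to `size` bytes in; returns how many were actually stored.
    std::size_t Write(const std::uint8_t* data, std::size_t size) {
        std::size_t written = 0;
        while (written < size && count_ < Capacity) {
            storage_[(head_ + count_) % Capacity] = data[written++];
            ++count_;
        }
        return written;
    }

    // Copies up to `size` bytes out in FIFO order; returns how many were read.
    std::size_t Read(std::uint8_t* out, std::size_t size) {
        std::size_t read = 0;
        while (read < size && count_ > 0) {
            out[read++] = storage_[head_];
            head_ = (head_ + 1) % Capacity;
            --count_;
        }
        return read;
    }

    std::size_t Size() const { return count_; }

private:
    std::array<std::uint8_t, Capacity> storage_{};
    std::size_t head_ = 0;   // index of the oldest byte
    std::size_t count_ = 0;  // number of bytes currently stored
};
```

The trade-off against std::deque is the fixed capacity: Write() reports a short count when the buffer is full, and the caller has to decide whether to drop data or wait.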
I think that your code has suboptimal design.
You are sharing too many data structures with too many threads, I guess. I think that you should put all handling of the serial device IO for one port into a single thread and put a synchronized command/data queue between the IO thread and all client threads. Have the IO thread watch out for commands/data in the queue.
You seem to be allocating and freeing some buffers for each sent event. Avoid that. If you keep all the IO in a single thread, you can reuse a single buffer. You are limiting the size of the message anyway, you can just pre-allocate a single big enough buffer.
Putting the bytes that you want to send into a std::deque is suboptimal. You have to serialize them into a contiguous memory block for the WriteFile(). Instead, if you use some sort of command/data queue between one IO thread and other threads, you can have the client threads provide the contiguous chunk of memory at once.
Reading 1 byte at a time seems silly, too. Unless it does not work for serial devices, you could provide a large enough buffer to ReadFileEx(). It returns how many bytes it has actually managed to read. It should not block, AFAIK, unless of course I am wrong.
You are waiting for the overlapped IO to finish using the SleepEx() invocation. What is the point of the overlapped IO then if you are just ending up being synchronous?
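A synchronized queue of contiguous chunks, as suggested above, could look roughly like the sketch below. It uses the standard library's std::mutex rather than Win32 mutex handles, and ChunkQueue, Chunk, Push and TryPop are hypothetical names. Client threads would push whole messages; the single I/O thread would pop one and hand its data() pointer and size() straight to the write call.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// One contiguous block of bytes, ready to be passed to a single write call.
using Chunk = std::vector<std::uint8_t>;

// Minimal mutex-protected FIFO of chunks shared between client threads
// (producers) and one I/O thread (consumer).
class ChunkQueue {
public:
    void Push(Chunk chunk) {
        std::lock_guard<std::mutex> lock(mutex_);
        chunks_.push_back(std::move(chunk));
    }

    // Non-blocking: returns false immediately when the queue is empty.
    bool TryPop(Chunk& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (chunks_.empty()) return false;
        out = std::move(chunks_.front());
        chunks_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<Chunk> chunks_;
};
```

This sketch leaves out how the I/O thread is woken up; in the design from the question that would presumably be one of the existing event handles, signalled after Push().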

How to pass user-defined data to a worker thread using IOCP?

Hey... I created a small test server using I/O completion ports and winsock.
I can successfully connect and associate a socket handle with the completion port.
But I don't know how to pass user-defined data structures into the worker thread...
What I've tried so far was passing a user structure as (ULONG_PTR)&structure as the completion key in the association call of CreateIoCompletionPort()
But that did not work.
Now I tried defining my own OVERLAPPED-structure and using CONTAINING_RECORD() as described here http://msdn.microsoft.com/en-us/magazine/cc302334.aspx and http://msdn.microsoft.com/en-us/magazine/bb985148.aspx.
But that does not work either. (I get freaky values for the contents of pHelper)
So my question is: How can I pass data to the worker thread using WSARecv(), GetQueuedCompletionStatus() and the completion packet or the OVERLAPPED structure?
EDIT: How can I successfully transmit "per-connection-data"?... It seems like I got the art of doing it (like explained in the two links above) wrong.
Here goes my code: (Yes, it's ugly and it's only TEST-code)
struct helper
{
    SOCKET m_sock;
    unsigned int m_key;
    OVERLAPPED over;
};
WSABUF wsabuffer;
char cbuf[250];
wsabuffer.buf = cbuf;
wsabuffer.len = 250;
DWORD flags, bytesrecvd;
newSock = accept(AcceptorSock, NULL, NULL);
if(newSock == INVALID_SOCKET)
ErrorAbort("could not accept a connection");
//associate socket with the CP
if(CreateIoCompletionPort((HANDLE)newSock, hCompletionPort, 3,0) != hCompletionPort)
ErrorAbort("Wrong port associated with the connection");
cout << "New Connection made and associated\n";
helper* pHelper = new helper;
pHelper->m_key = 3;
pHelper->m_sock = newSock;
memset(&(pHelper->over), 0, sizeof(OVERLAPPED));
flags = 0;
bytesrecvd = 0;
if(WSARecv(newSock, &wsabuffer, 1, NULL, &flags, (OVERLAPPED*)pHelper, NULL) != 0)
if(WSAGetLastError() != WSA_IO_PENDING)
ErrorAbort("WSARecv didnt work");
return 0;
DWORD dwNumberOfBytes = 0;
OVERLAPPED* pOver = nullptr;
helper* pHelper = nullptr;
char cBuffer[250];
RecvBuf.buf = cBuffer;
RecvBuf.len = 250;
DWORD dwRecvBytes = 0;
DWORD dwFlags = 0;
ULONG_PTR Key = 0;
GetQueuedCompletionStatus(h, &dwNumberOfBytes, &Key, &pOver, INFINITE);
//Extract helper
pHelper = (helper*)CONTAINING_RECORD(pOver, helper, over);
cout << "Received Overlapped item" << endl;
if(WSARecv(pHelper->m_sock, &RecvBuf, 1, &dwRecvBytes, &dwFlags, pOver, NULL) != 0)
cout << "Could not receive data\n";
cout << "Data Received: " << RecvBuf.buf << endl;
If you pass your struct like this it should work just fine:
helper* pHelper = new helper;
CreateIoCompletionPort((HANDLE)newSock, hCompletionPort, (ULONG_PTR)pHelper,0);
helper* pHelper=NULL;
GetQueuedCompletionStatus(h, &dwNumberOfBytes, (PULONG_PTR)&pHelper, &pOver, INFINITE);
Edit to add per IO data:
One of the frequently abused features of the asynchronous apis is they don't copy the OVERLAPPED struct, they simply use the provided one - hence the overlapped struct returned from GetQueuedCompletionStatus points to the originally provided struct. So:
struct helper {
    OVERLAPPED m_over; // first member, so a helper* and its OVERLAPPED* share the same address
    SOCKET m_socket;
    UINT m_key;
};
if(WSARecv(newSock, &wsabuffer, 1, NULL, &flags, &pHelper->m_over, NULL) != 0)
Notice that, again, in your original sample, you were getting your casting wrong. (OVERLAPPED*)pHelper was passing a pointer to the START of the helper struct, but the OVERLAPPED part was declared last. I changed it to pass the address of the actual overlapped part, which means that the code compiles without a cast, which lets us know we are doing the correct thing. I also moved the overlapped struct to be the first member of the struct.
To catch the data on the other side:
// c cast
helper* pConnData = (helper*)pOver;
On this side it is particularly important that the overlapped struct is the first member of the helper struct, as that makes it easy to cast back from the OVERLAPPED* the API gives us to the helper* we actually want.
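When the overlapped member is not first, the outer struct can still be recovered by subtracting the member's offset, which is what CONTAINING_RECORD does. Below is a minimal, portable sketch of that idea. FakeOverlapped is a hypothetical stand-in for the OS-defined OVERLAPPED structure, and Helper and HelperFromOverlapped are illustrative names, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the OS-defined OVERLAPPED structure.
struct FakeOverlapped {
    std::uintptr_t Internal;
    std::uintptr_t InternalHigh;
    void* hEvent;
};

// Per-connection data with the overlapped part deliberately declared LAST,
// as in the original sample.
struct Helper {
    int socket;    // stand-in for SOCKET
    unsigned key;
    FakeOverlapped over;
};

// Same idea as CONTAINING_RECORD(p, Helper, over): step back from the
// member's address by the member's offset to reach the enclosing struct.
Helper* HelperFromOverlapped(FakeOverlapped* p) {
    return reinterpret_cast<Helper*>(
        reinterpret_cast<char*>(p) - offsetof(Helper, over));
}
```

This only works if the pointer handed to the I/O call really is the address of the member (&h.over here). Passing the address of the whole struct cast to the overlapped type, as the original sample did, makes the subtraction land before the struct.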
You can send special-purpose data of your own to the completion port via PostQueuedCompletionStatus.
The I/O completion packet will satisfy an outstanding call to the GetQueuedCompletionStatus function. This function returns with the three values passed as the second, third, and fourth parameters of the call to PostQueuedCompletionStatus. The system does not use or validate these values. In particular, the lpOverlapped parameter need not point to an OVERLAPPED structure.
I use the standard socket routines (socket, closesocket, bind, accept, connect ...) for creating/destroying and ReadFile/WriteFile for I/O as they allow use of the OVERLAPPED structure.
After your socket has accepted or connected you should associate it with the session context that it services. Then you associate your socket to an IOCP and (in the third parameter) provide it with a reference to the session context. The IOCP does not know what this reference is and doesn't care either for that matter. The reference is for YOUR use so that when you get an IOC through GetQueuedCompletionStatus the variable pointed to by parameter 3 will be filled in with the reference so that you immediately find the context associated with the socket event and can begin servicing the event. I usually use an indexed structure containing (among other things) the socket declaration, the overlapped structure as well as other session-specific data. The reference I pass to CreateIoCompletionPort in parameter 3 will be the index to the structure member containing the socket.
You need to check if GetQueuedCompletionStatus returned a completion or a timeout. With a timeout you can run through your indexed structure and see (for example) if one of them has timed out or something else and take appropriate house-keeping actions.
The overlapped structure also needs to be checked to see that the I/O completed correctly.
The function servicing the IOCP should be a separate, multi-threaded entity. Use the same number of threads that you have cores in your system, or at least no more than that as it wastes system resources (you don't have more resources for servicing the event than the number of cores in your system, right?).
IOCPs really are the best of all worlds (too good to be true) and anyone who says "one thread per socket" or "wait on multiple-socket list in one function" doesn't know what they are talking about. The former stresses your scheduler and the latter is polling, and polling is ALWAYS extremely wasteful.