Handling of LPWSAOVERLAPPED after WSASend - c++

I am currently writing a Winsock server-side socket in managed C++. After creating the LPWSAOVERLAPPED object and passing it to the WSASend function, I do not see where to delete it when the operation completes asynchronously (WSASend returns SOCKET_ERROR and WSAGetLastError() returns WSA_IO_PENDING). My current solution is to create a System::Threading::WaitHandle, get the unsafe pointer to the wait handle and pass that as hEvent in the LPWSAOVERLAPPED object. However, this causes unnecessary object creation, since I do not really care when the send operation completes. On the other hand, I need an LPWSAOVERLAPPED object in order to make the operation non-blocking. Does anyone have a better solution? Here is my current code:
void Connectivity::ConnectionInformation::SendData(unsigned char data[], const int length)
{
if (isClosed || sendError)
return;
Monitor::Enter(this->sendSyncRoot);
try
{
LPWSAOVERLAPPED overlapped = OverlappedObjectPool::GetOverlapped();
WaitHandle ^ handle = gcnew ManualResetEvent(false);
IntPtr handlePointer = handle->SafeWaitHandle->DangerousGetHandle();
sendInfo->buf = (char*)data;
sendInfo->len = length;
overlapped->Internal = 0;
overlapped->InternalHigh = 0;
overlapped->Offset = 0;
overlapped->OffsetHigh = 0;
overlapped->Pointer = 0;
overlapped->hEvent = (void*)handlePointer; //Set pointer
if (WSASend(connection, sendInfo, 1, NULL, 0, overlapped, NULL) == SOCKET_ERROR)
{
if (WSAGetLastError() == WSA_IO_PENDING)
{
ThreadPool::UnsafeRegisterWaitForSingleObject(handle, sentCallback, (IntPtr)((void*)overlapped), -1, true);
}
else
{
this->sendError = true;
//The send error bool makes sure that the close function doesn't get called
//during packet processing, which could lead to a lot of null reference exceptions.
OverlappedObjectPool::GiveObject(overlapped);
}
}
else
{
handle->Close();
sentData((IntPtr)((void*)overlapped), false);
}
}
finally
{
Monitor::Exit(this->sendSyncRoot);
}
}

For async I/O, completion is notified either by the calling of a completion routine or by the queueing of an IOCP completion message to an IOCP completion queue. In both cases, it should be noted that the OVL struct should have the lifetime of at least the entire async operation, but can be longer if convenient:)
In the case of a completion routine, the unused hEvent parameter in the OVL can be used to transfer a pointer to an 'IOrequest' class instance that contains the data buffer(s), the WSABUF array and the OVL struct as members (and surely a pointer to the socket object for which the I/O has been issued). The OVL pointer is supplied as a parameter to the completion routine, so the hEvent can be retrieved and cast to the class type, recovering the complete class instance: OVL, data buffer, etc. When the data has been processed (or immediately in the completion routine, in the case of WSASend) and this IOrequest is eventually destroyed (or repooled), the OVL will go with it. This sounds a bit incestuous, but works fine and does not need any nasty macros or other tricks.
A similar approach can be used with full IOCP or, alternatively, the OVL passed as the lpCompletionKey 'spare' parameter.
Oh - and you do care if the operation is completed - you need to at least check for errors.
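The 'IOrequest' idea above can be sketched roughly as follows. This is only an illustration: `Overlapped` is a stand-in for WSAOVERLAPPED (reduced to its hEvent field so the sketch compiles without Winsock headers), and `IoRequest`, `socketId` and `OnSendComplete` are hypothetical names, not part of any API. A real completion routine would have the WSAOVERLAPPED_COMPLETION_ROUTINE signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for WSAOVERLAPPED so the sketch compiles anywhere; only the
// hEvent field matters here. Real code would use the Winsock type.
struct Overlapped {
    void* hEvent = nullptr;
};

// Hypothetical per-operation object: owns the buffer and the overlapped
// struct, so both live exactly as long as the request does.
struct IoRequest {
    Overlapped ovl;
    std::vector<char> buffer;
    int socketId = -1;  // placeholder for a pointer to the socket object

    IoRequest(int sock, const char* data, std::size_t len)
        : buffer(data, data + len), socketId(sock) {
        ovl.hEvent = this;  // back-pointer recovered in the completion routine
    }
};

// What a completion routine would do: recover the request from the
// overlapped pointer, check for errors, then destroy (or repool) it.
// Returns true if the whole buffer was sent without error.
bool OnSendComplete(unsigned long error, unsigned long bytesSent, Overlapped* ovl) {
    assert(ovl != nullptr && ovl->hEvent != nullptr);
    IoRequest* request = static_cast<IoRequest*>(ovl->hEvent);
    const bool ok = (error == 0) && (bytesSent == request->buffer.size());
    delete request;  // the overlapped struct is freed together with its request
    return ok;
}
```

With this layout there is no separate overlapped pool to return objects to and no wait handle per send: the request is allocated before the send is issued and released in exactly one place, the completion routine.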

Related

IOUserClientMethodArguments completion value is always NULL

I'm trying to use IOConnectCallAsyncStructMethod in order set a callback between a client and a driver in DriverKit for iPadOS.
This is how I call IOConnectCallAsyncStructMethod
ret = IOConnectCallAsyncStructMethod(connection, MessageType_RegisterAsyncCallback, masterPort, asyncRef, kIOAsyncCalloutCount, nullptr, 0, &outputAssignCallback, &outputSize);
Where asyncRef is:
asyncRef[kIOAsyncCalloutFuncIndex] = (io_user_reference_t)AsyncCallback;
asyncRef[kIOAsyncCalloutRefconIndex] = (io_user_reference_t)nullptr;
and AsyncCallback is:
static void AsyncCallback(void* refcon, IOReturn result, void** args, uint32_t numArgs)
{
const char* funcName = nullptr;
uint64_t* arrArgs = (uint64_t*)args;
ReadDataStruct* output = (ReadDataStruct*)(arrArgs + 1);
switch (arrArgs[0])
{
case 1:
{
funcName = "'Register Async Callback'";
} break;
case 2:
{
funcName = "'Async Request'";
} break;
default:
{
funcName = "UNKNOWN";
} break;
}
printf("Got callback of %s from dext with returned data ", funcName);
printf("with return code: 0x%08x.\n", result);
// Stop the run loop so our program can return to normal processing.
CFRunLoopStop(globalRunLoop);
}
But IOConnectCallAsyncStructMethod is always returning kIOReturnBadArgument and I can see that when the method:
kern_return_t MyDriverClient::ExternalMethod(uint64_t selector, IOUserClientMethodArguments* arguments, const IOUserClientMethodDispatch* dispatch, OSObject* target, void* reference) {
kern_return_t ret = kIOReturnSuccess;
if (selector < NumberOfExternalMethods)
{
dispatch = &externalMethodChecks[selector];
if (!target)
{
target = this;
}
}
return super::ExternalMethod(selector, arguments, dispatch, target, reference);
}
is called, in IOUserClientMethodArguments* arguments, completion is NULL: completion = (OSAction *) NULL
This is the IOUserClientMethodDispatch I use to check the values:
[ExternalMethodType_RegisterAsyncCallback] =
{
.function = (IOUserClientMethodFunction) &Mk1dDriverClient::StaticRegisterAsyncCallback,
.checkCompletionExists = true,
.checkScalarInputCount = 0,
.checkStructureInputSize = 0,
.checkScalarOutputCount = 0,
.checkStructureOutputSize = sizeof(ReadDataStruct),
},
Any idea what I'm doing wrong? Or any other ideas?
The likely cause for kIOReturnBadArgument:
The port argument in your method call looks suspicious:
IOConnectCallAsyncStructMethod(connection, MessageType_RegisterAsyncCallback, masterPort, …
------------------------------------------------------------------------------^^^^^^^^^^
If you're passing the IOKit main/master port (kIOMasterPortDefault) into here, that's wrong. The purpose of this argument is to provide a notification Mach port which will receive the async completion message. You'll want to create a port and schedule it on an appropriate dispatch queue or runloop. I typically use something like this:
// Save this somewhere for the entire time you might receive notification callbacks:
IONotificationPortRef notify_port = IONotificationPortCreate(kIOMasterPortDefault);
// Set the GCD dispatch queue on which we want callbacks called (can be main queue):
IONotificationPortSetDispatchQueue(notify_port, callback_dispatch_queue);
// This is what you pass to each async method call:
mach_port_t callback_port = IONotificationPortGetMachPort(notify_port);
And once you're done with the notification port, make sure to destroy it using IONotificationPortDestroy().
It looks like you might be using runloops. In that case, instead of calling IONotificationPortSetDispatchQueue, you can use the IONotificationPortGetRunLoopSource function to get the notification port's runloop source, which you can then schedule on the CFRunloop object you're using.
Some notes about async completion arguments:
You haven't posted your DriverKit side AsyncCompletion() call, and at any rate this isn't causing your immediate problem, but will probably blow up once you fix the async call itself:
If your async completion passes only 2 user arguments, you're using the wrong callback function signature on the app side. Instead of IOAsyncCallback you must use the IOAsyncCallback2 form.
Also, even if you are passing 3 or more arguments where the IOAsyncCallback form is correct, I believe this code technically triggers undefined behaviour due to aliasing rules:
uint64_t* arrArgs = (uint64_t*)args;
ReadDataStruct* output = (ReadDataStruct*)(arrArgs + 1);
switch (arrArgs[0])
The following would I think be correct:
ReadDataStruct* output = (ReadDataStruct*)(args + 1);
switch ((uintptr_t)args[0])
(Don't cast the array pointer itself, cast each void* element.)
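As a rough, platform-independent sketch of that per-element cast (the names `DecodedArgs`, `DecodeAsyncArgs` and `SelectorName` are made up for illustration and are not part of IOKit):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of the callback's argument array.
struct DecodedArgs {
    std::uintptr_t selector;     // integer value carried in args[0]
    void** payload;              // remaining arguments, starting at args[1]
    std::uint32_t payloadCount;  // number of remaining arguments
};

// Converts each element individually instead of reinterpreting the
// whole void** array as a uint64_t array.
DecodedArgs DecodeAsyncArgs(void** args, std::uint32_t numArgs) {
    assert(args != nullptr && numArgs >= 1);
    DecodedArgs out;
    out.selector = reinterpret_cast<std::uintptr_t>(args[0]);
    out.payload = args + 1;
    out.payloadCount = numArgs - 1;
    return out;
}

// Same mapping as the switch statement in the question.
const char* SelectorName(std::uintptr_t selector) {
    switch (selector) {
        case 1: return "'Register Async Callback'";
        case 2: return "'Async Request'";
        default: return "UNKNOWN";
    }
}
```

Inside the callback this would be used as `DecodedArgs d = DecodeAsyncArgs(args, numArgs);` followed by `SelectorName(d.selector)`.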
Notes about async output struct arguments
I notice you have a struct output argument in your async method call, with a buffer that looks fairly small. If you're planning to update that with data on the DriverKit side after the initial ExternalMethod returns, you may be in for a surprise: an output struct argument that is not passed as an IOMemoryDescriptor will be copied to the app side immediately on method return, not when the async completion is triggered.
So how do you fix this? For very small data, pass it in the async completion arguments themselves. For arbitrarily sized byte buffers, the only way I know of is to ensure the output struct argument is passed via IOMemoryDescriptor, which can be persistently memory-mapped in a shared mapping between the driver and the app process. OK, how do you pass it as a memory descriptor? Basically, the output struct must be larger than 4096 bytes. Yes, this essentially means that you may have to make your buffer unnaturally large.

Where does pIOContextForward member set to not NULL value?

I am trying to use this Windows IOCP sample code as a starting point in my own IOCP server development.
There is a structure _PER_IO_CONTEXT in IocpServer.h.
//
// data to be associated for every I/O operation on a socket
//
typedef struct _PER_IO_CONTEXT {
WSAOVERLAPPED Overlapped;
char Buffer[MAX_BUFF_SIZE];
WSABUF wsabuf;
int nTotalBytes;
int nSentBytes;
IO_OPERATION IOOperation;
SOCKET SocketAccept;
struct _PER_IO_CONTEXT *pIOContextForward;
} PER_IO_CONTEXT, *PPER_IO_CONTEXT;
It's used in another structure _PER_SOCKET_CONTEXT.
//
// data to be associated with every socket added to the IOCP
//
typedef struct _PER_SOCKET_CONTEXT {
SOCKET Socket;
LPFN_ACCEPTEX fnAcceptEx;
//
//linked list for all outstanding i/o on the socket
//
PPER_IO_CONTEXT pIOContext;
struct _PER_SOCKET_CONTEXT *pCtxtBack;
struct _PER_SOCKET_CONTEXT *pCtxtForward;
} PER_SOCKET_CONTEXT, *PPER_SOCKET_CONTEXT;
From the comments we can guess that pIOContext is meant to be used as a linked list, with the pIOContextForward member serving as the link to the next element.
And it is even used during resources cleanup in IocpServer.Cpp:
//
// Free all i/o context structures per socket
//
pTempIO = (PPER_IO_CONTEXT)(lpPerSocketContext->pIOContext);
do {
pNextIO = (PPER_IO_CONTEXT)(pTempIO->pIOContextForward);
if( pTempIO ) {
//
//The overlapped structure is safe to free when only the posted i/o has
//completed. Here we only need to test those posted but not yet received
//by PQCS in the shutdown process.
//
if( g_bEndServer )
while( !HasOverlappedIoCompleted((LPOVERLAPPED)pTempIO) ) Sleep(0);
xfree(pTempIO);
pTempIO = NULL;
}
pTempIO = pNextIO;
} while( pNextIO );
But pIOContextForward member is never set to anything except NULL.
Maybe pIOContextForward is set implicitly during operations on the overlapped structures?
Maybe this member was intended to be used, but the code is not complete?
I want to understand how this code handles multiple asynchronous tasks on one socket, and it seems that pIOContextForward should be used to implement such functionality.
So my question is: how is the pIOContextForward member assigned its value?
And if this code is not complete, how can I extend it?

How to access user-context data set on epoll when calling epoll_wait

Below I add sockets to epoll and set an application-context index within epoll_event.data.u32.
When receiving packets, recv() requires the socket file descriptor. In all the examples events[i].data.fd is used.
However, events[i].data.fd and events[i].data.u32 are in a union, so how do I also access my user-context index events[i].data.u32? It looks like it is overwritten with the socket file descriptor?
// Initially
int epollFd = epoll_create1(0);
// Adding each socket, along with a user-context index for callbacks
struct epoll_event event;
event.events = EPOLLIN;
event.data.u32 = callbackIndex; // Here is the user-defined index
int sock = createSocket(port, address);
assert(epoll_ctl(epollFd, EPOLL_CTL_ADD, sock, &event) == 0); // epoll_ctl returns 0 on success
// Later when receiving packets
struct epoll_event events[MAX_EVENTS];
while (true)
{
int event_count = epoll_wait(epollFd, events, MAX_EVENTS, 30000);
for (i = 0; i < event_count; i++)
{
int n = recv(events[i].data.fd, &buffer[0], sizeof(buffer), flags);
// How do I access the user-context index I set when adding the socket to epoll?
}
}
You tell epoll_ctl() which socket descriptor you want to listen for events for, and provide an epoll_event struct to associate with that listen operation.
Whenever epoll_wait() detects a registered event on a socket, it gives you back only the epoll_event struct that you had provided for that event, exactly as you had provided it. It does not tell you which socket triggered the event.
So, if you want to discover the socket, you have to either:
- store the socket descriptor itself in the epoll_event, but then you can't use any other user-defined data.
- store the socket descriptor somewhere else (i.e., in an array, an object pool, etc.) and then put the identifying information needed to get back to the socket descriptor as user-defined data in the epoll_event (i.e., array index, object pointer, etc.).
Whatever you put in the epoll_event when calling epoll_ctl() is what you will get back from epoll_wait(). No more, no less.
The design of epoll is beautifully simple. The role of epoll_data_t is to provide a lightweight mapping rather than storage. Notice that it has a void* ptr member, which allows you to map from the fd (passed to epoll_ctl) to anything.
In your particular case, you could allocate a struct Context { int fd; uint32_t index; /*...*/ }; on the heap and point to that structure on EPOLL_CTL_ADD. Some object that owns the context (e.g. a container) would then also have to deallocate it after calling EPOLL_CTL_DEL.
Since you are using C++, you could store a pointer to an abstract EventListener base class, cast the void* back to that class after epoll_wait and dispatch the event to an arbitrary derived handler.
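A minimal sketch of the Context approach, assuming Linux and using illustrative helper names (`AddToEpoll` and `WaitForOne` are not library functions). Ownership of the Context is left to the caller here.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical per-socket context; epoll stores only the pointer to it.
struct Context {
    int fd;
    std::uint32_t index;
};

// Registers ctx->fd for read events and attaches ctx as the user data.
// The Context must stay alive until the fd is removed from epoll.
bool AddToEpoll(int epollFd, Context* ctx) {
    assert(ctx != nullptr);
    epoll_event event{};
    event.events = EPOLLIN;
    event.data.ptr = ctx;  // data is a union: set ptr only, not fd or u32
    return epoll_ctl(epollFd, EPOLL_CTL_ADD, ctx->fd, &event) == 0;
}

// Waits for a single event and returns the context registered for it,
// or nullptr on timeout or error.
Context* WaitForOne(int epollFd, int timeoutMs) {
    epoll_event event{};
    int count = epoll_wait(epollFd, &event, 1, timeoutMs);
    if (count != 1) {
        return nullptr;
    }
    return static_cast<Context*>(event.data.ptr);
}
```

In the receive loop, both values are then available from the same pointer: `recv(ctx->fd, ...)` for the socket and `ctx->index` for the callback index.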

How do you pause a thread?

Hello, I'm trying to pause a thread, but for some reason it keeps crashing the game.
Here is what I've got:
void Test(){
SuspendThread((PVOID)0x83593C24);//0x83593C24 The offset from the game
Scr_AddInt(1);
ResumeThread((PVOID)0x83593C24);
}
Basically, I'm trying to pause the thread, then call Add Int, then resume it.
You need to use the thread handle that was returned when you created the thread. See documentation for CreateThread; SuspendThread; and ResumeThread.
In particular, from the documentation for CreateThread:
If the function succeeds, the return value is a handle to the new thread. If the function fails, the return value is NULL.
Example:
HANDLE thread_handle = CreateThread(/*args*/); // hold on to this value (and check for failure)
if (thread_handle == NULL)
{
// handle creation error
}
DWORD suspend_retval = SuspendThread(thread_handle);
if (suspend_retval == static_cast<DWORD>(-1))
{
// handle suspend error
}
Scr_AddInt(1); // original work
DWORD resume_retval = ResumeThread(thread_handle);
if (resume_retval == static_cast<DWORD>(-1))
{
// handle resume error
}
It may be worthwhile to create a wrapper class that encapsulates thread creation, suspension, resumption, and termination. This class can perform all error checking internally, and throw an exception when appropriate.

Serial asynchronous I/O in Windows 7/64

I have a multi-threaded Windows program which is doing serial port asynchronous I/O through "raw" Win API calls. It is working perfectly fine on any Windows version except Windows 7/64.
The problem is that the program can find and setup the COM port just fine, but it cannot send nor receive any data. No matter if I compile the binary in Win XP or 7, I cannot send/receive on Win 7/64. Compatibility mode, run as admin etc does not help.
I have managed to narrow down the problem to the FileIOCompletionRoutine callback. Every time it is called, dwErrorCode is always 0 and dwNumberOfBytesTransfered is always 0. GetOverlappedResult() called from inside the function always returns TRUE (everything ok), and it seems to set lpNumberOfBytesTransferred correctly. But the lpOverlapped parameter is corrupt: it is a garbage pointer pointing at garbage values.
I can see that it is corrupt by either checking in the debugger what address the correct OVERLAPPED struct is allocated at, or by setting a temp. global variable to point at it.
My question is: why does this happen, and why does it only happen on Windows 7/64? Is there some issue with calling convention that I am not aware of? Or is the overlapped struct treated differently somehow?
Posting relevant parts of the code below:
class ThreadedComport : public Comport
{
private:
typedef struct
{
OVERLAPPED overlapped;
ThreadedComport* caller; /* add user data to struct */
} OVERLAPPED_overlap;
OVERLAPPED_overlap _send_overlapped;
OVERLAPPED_overlap _rec_overlapped;
...
static void WINAPI _send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
static void WINAPI _receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped);
...
};
Open/close is done in a base class that has no multi-threading nor asynchronous I/O implemented:
void Comport::open (void)
{
char port[20];
DCB dcbCommPort;
COMMTIMEOUTS ctmo_new = {0};
if(_is_open)
{
close();
}
sprintf(port, "\\\\.\\COM%d", TEXT(_port_number));
_hcom = CreateFile(port,
GENERIC_READ | GENERIC_WRITE,
0,
0,
OPEN_EXISTING,
0,
0);
if(_hcom == INVALID_HANDLE_VALUE)
{
// error handling
}
GetCommTimeouts(_hcom, &_ctmo_old);
ctmo_new.ReadTotalTimeoutConstant = 10;
ctmo_new.ReadTotalTimeoutMultiplier = 0;
ctmo_new.WriteTotalTimeoutMultiplier = 0;
ctmo_new.WriteTotalTimeoutConstant = 0;
if(SetCommTimeouts(_hcom, &ctmo_new) == FALSE)
{
// error handling
}
dcbCommPort.DCBlength = sizeof(DCB);
if(GetCommState(_hcom, &(DCB)dcbCommPort) == FALSE)
{
// error handling
}
// setup DCB, this seems to work fine
dcbCommPort.DCBlength = sizeof(DCB);
dcbCommPort.BaudRate = baudrate_int;
if(_parity == PAR_NONE)
{
dcbCommPort.fParity = 0; /* disable parity */
}
else
{
dcbCommPort.fParity = 1; /* enable parity */
}
dcbCommPort.Parity = (uint8)_parity;
dcbCommPort.ByteSize = _databits;
dcbCommPort.StopBits = _stopbits;
SetCommState(_hcom, &(DCB)dcbCommPort);
}
void Comport::close (void)
{
if(_hcom != NULL)
{
SetCommTimeouts(_hcom, &_ctmo_old);
CloseHandle(_hcom);
_hcom = NULL;
}
_is_open = false;
}
The whole multi-threading and event handling mechanism is rather complex, relevant parts are:
Send
result = WriteFileEx (_hcom, // handle to output file
(void*)_write_data, // pointer to input buffer
send_buf_size, // number of bytes to write
(LPOVERLAPPED)&_send_overlapped, // pointer to async. i/o data
(LPOVERLAPPED_COMPLETION_ROUTINE )&_send_callback);
Receive
result = ReadFileEx (_hcom, // handle to input file
(void*)_read_data, // pointer to input buffer
_MAX_MESSAGE_LENGTH, // number of bytes to read
(OVERLAPPED*)&_rec_overlapped, // pointer to async. i/o data
(LPOVERLAPPED_COMPLETION_ROUTINE )&_receive_callback);
Callback functions
void WINAPI ThreadedComport::_send_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
{
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
if(dwErrorCode == 0) // no errors
{
if(dwNumberOfBytesTransfered > 0)
{
_this->_data_sent = dwNumberOfBytesTransfered;
}
}
SetEvent(lpOverlapped->hEvent);
}
void WINAPI ThreadedComport::_receive_callback (DWORD dwErrorCode,
DWORD dwNumberOfBytesTransfered,
LPOVERLAPPED lpOverlapped)
{
if(dwErrorCode == 0) // no errors
{
if(dwNumberOfBytesTransfered > 0)
{
ThreadedComport* _this = ((OVERLAPPED_overlap*)lpOverlapped)->caller;
_this->_bytes_read = dwNumberOfBytesTransfered;
}
}
SetEvent(lpOverlapped->hEvent);
}
EDIT
Updated: I have spent most of the day on the theory that the OVERLAPPED variable went out of scope before the callback is executed. I have verified that this never happens and I have even tried to declare the OVERLAPPED struct as static, same problem remains. If the OVERLAPPED struct had gone out of scope, I would expect the callback to point at the memory location where the struct was previously allocated, but it doesn't, it points somewhere else, at an entirely unfamiliar memory location. Why it does that, I have no idea.
Maybe Windows 7/64 makes an internal hardcopy of the OVERLAPPED struct? I can see how that would cause this behavior, since I am relying on additional parameters sneaked in at the end of the struct (which seems like a hack to me, but apparently I got that "hack" from official MSDN examples).
I have also tried to change the calling convention, but this doesn't work at all: if I change it, the program crashes. (The standard calling convention causes it to crash, whatever the standard is, cdecl? __fastcall also causes a crash.) The calling conventions that work are __stdcall, WINAPI and CALLBACK. I think these are all names for the same thing, __stdcall, and I read somewhere that Win 64 ignores that calling convention anyhow.
It would seem that the callback is executed because of some "spurious disturbance" in Win 7/64 generating false callback calls with corrupt or irrelevant parameters.
A multi-threaded race condition is another theory, but in the scenario I am running to reproduce the bug there is only one thread, and I can confirm that the thread calling ReadFileEx is the same one that executes the callback.
I have found the problem, it turned out to be annoyingly simple.
In CreateFile(), I did not specify FILE_FLAG_OVERLAPPED. For reasons unknown, this was not necessary on 32-bit Windows. But if you forget it on 64-bit Windows, it will apparently still generate callbacks with the FileIOCompletionRoutine, but they have corrupted parameters.
I haven't found any documentation of this change of behavior anywhere; perhaps it was just an internal bug fix in Windows, since the older documentation also specifies that you must have FILE_FLAG_OVERLAPPED set.
As for my specific case, the bug appeared because I had a base class that assumed synchronous I/O, which has then been inherited by a class using asynchronous I/O.