I have some really old code from a service that uses named pipes in message mode (PIPE_TYPE_MESSAGE) with overlapped I/O (FILE_FLAG_OVERLAPPED).
The code does the following:
1. ReadFile for 4 bytes with overlapped I/O (header + message length). The client wrote this command with one call to WriteFile.
2. After the 4 bytes are read, a ReadFile call is made to read the rest of the message (with the now-known length), without specifying an OVERLAPPED structure.
3. After the command is executed, the routine continues at step 1 and awaits the next command.
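In code, I read the pattern roughly like this (a sketch with my own names and error handling, not the actual code):

```cpp
#include <windows.h>
#include <vector>

// hPipe was created with FILE_FLAG_OVERLAPPED on a PIPE_TYPE_MESSAGE pipe;
// names and error handling are illustrative.
void ServeOneCommand(HANDLE hPipe, OVERLAPPED& ov)
{
    DWORD len = 0, got = 0;

    // Step 1: overlapped read of the 4-byte header (message length).
    if (!ReadFile(hPipe, &len, sizeof len, nullptr, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
        return;
    if (!GetOverlappedResult(hPipe, &ov, &got, TRUE))
        return;

    // Step 2: read the message body WITHOUT an OVERLAPPED structure,
    // even though the handle is an overlapped handle -- the part the
    // documentation warns about.
    std::vector<char> body(len);
    if (!ReadFile(hPipe, body.data(), len, &got, nullptr))
        return;

    // Step 3: execute the command, then loop back to step 1.
}
```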
When I read the documentation:
Overlapped operations require a file, named pipe, or communications device that was created with the FILE_FLAG_OVERLAPPED flag. When a thread calls a function (such as the ReadFile function) to perform an overlapped operation, the calling thread must specify a pointer to an OVERLAPPED structure. (If this pointer is NULL, the function return value may incorrectly indicate that the operation completed.)
I have to assume that this code will not work, or should at least be called incorrect...
In fact, this code is 15 years old, runs on hundreds of machines, and works without problems.
So do I have to tell my boss and colleagues that this code is buggy, that it is just luck that it works, and that it needs to be corrected?
Yes, this code is incorrect, but it can work without errors.
ReadFile calls ZwReadFile, whose fifth parameter, IoStatusBlock, is a pointer to an IO_STATUS_BLOCK structure. That parameter is mandatory and must never be 0, for any file and for any I/O type, so ReadFile must always pass a pointer to an IO_STATUS_BLOCK when it calls ZwReadFile. If you pass a non-NULL pointer to an OVERLAPPED, ReadFile passes that pointer as the IO_STATUS_BLOCK to ZwReadFile; the first two members of OVERLAPPED correspond to the IO_STATUS_BLOCK. But if you pass 0 in place of the OVERLAPPED pointer, ReadFile allocates a local IO_STATUS_BLOCK variable on its own stack and passes that to ZwReadFile.
The IO_STATUS_BLOCK memory must remain valid until the I/O completes, because at the end of the I/O the kernel writes the final status to this memory. On the other hand, a local variable used as the IO_STATUS_BLOCK becomes invalid (it points to arbitrary stack memory) after ReadFile returns. With synchronous I/O there is no problem, because ReadFile does not return until the I/O has completed. But with asynchronous I/O this is undefined behavior: ReadFile can return while the I/O is still in progress, so the IO_STATUS_BLOCK becomes invalid before the I/O ends, and when the I/O actually completes, memory at an arbitrary place in your thread's stack gets overwritten. This can have no effect at all, or it can corrupt your stack; it's undefined, and depends on where you are (what the stack pointer value is) at that time.
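For reference, the layout correspondence between the two structures (abridged from winternl.h and minwinbase.h; the annotations are mine):

```cpp
// The structure the kernel writes to when the I/O completes.
typedef struct _IO_STATUS_BLOCK {
    union {
        NTSTATUS Status;
        PVOID    Pointer;
    };
    ULONG_PTR Information;       // bytes transferred
} IO_STATUS_BLOCK, *PIO_STATUS_BLOCK;

// The first two members overlay the IO_STATUS_BLOCK exactly.
typedef struct _OVERLAPPED {
    ULONG_PTR Internal;          // receives the final NTSTATUS (Status)
    ULONG_PTR InternalHigh;      // receives bytes transferred (Information)
    union {
        struct {
            DWORD Offset;
            DWORD OffsetHigh;
        };
        PVOID Pointer;
    };
    HANDLE hEvent;
} OVERLAPPED, *LPOVERLAPPED;
```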
After reading more docs, I have to conclude that yes, the code can be said to be incorrect. In particular, these docs on ReadFile:
A pointer to an OVERLAPPED structure is required if the hFile parameter was opened with FILE_FLAG_OVERLAPPED, otherwise it can be NULL.
If hFile is opened with FILE_FLAG_OVERLAPPED, the lpOverlapped parameter must point to a valid and unique OVERLAPPED structure, otherwise the function can incorrectly report that the read operation is complete.
Bear in mind that the MS docs are updated over time, and the docs may have been less clear or incomplete at the time the code was written.
It probably happens to work because the pipe is in message mode. It's strange to have a length-prefix header with message mode pipes in the first place, because message mode handles message boundaries for you. In this specific scenario, the entire message would already be buffered by the local OS, so the incorrect synchronous read would always succeed.
Indeed, the protocol (with a separate header) sounds like it was designed to work with a byte stream abstraction such as a byte mode pipe or TCP/IP connection. It is possible that the protocol was originally designed for byte mode pipes and they switched to message mode pipes when it wasn't working (a synchronous ReadFile may behave unexpectedly on a byte mode pipe, since the message may not be completely present yet).
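For comparison, a corrected two-stage read would pass an OVERLAPPED to both calls and keep it alive until each operation completes. A minimal sketch (event-based wait; names are illustrative):

```cpp
#include <windows.h>
#include <vector>

// Read exactly 'len' bytes with an OVERLAPPED that stays valid until
// the operation completes. Works whether the read completes
// immediately or returns ERROR_IO_PENDING.
bool ReadExactOverlapped(HANDLE hPipe, void* buf, DWORD len, DWORD* read)
{
    OVERLAPPED ov = {};
    ov.hEvent = CreateEvent(nullptr, TRUE, FALSE, nullptr);
    if (!ov.hEvent) return false;

    BOOL ok = ReadFile(hPipe, buf, len, nullptr, &ov);
    if (!ok && GetLastError() != ERROR_IO_PENDING) {
        CloseHandle(ov.hEvent);
        return false;
    }
    // Wait for completion; the OVERLAPPED stays alive until here.
    ok = GetOverlappedResult(hPipe, &ov, read, TRUE);
    CloseHandle(ov.hEvent);
    return ok != FALSE;
}

// Usage: read the 4-byte length header, then the body, both overlapped.
// DWORD msgLen = 0, got = 0;
// ReadExactOverlapped(hPipe, &msgLen, sizeof msgLen, &got);
// std::vector<char> body(msgLen);
// ReadExactOverlapped(hPipe, body.data(), msgLen, &got);
```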
Related
There are two reasons that can cause GetQueuedCompletionStatus() to fail (return FALSE). The first is that the completion port handle associated with it is closed while the call is outstanding; you can recognize this case because *lpOverlapped will be NULL.
The second reason (which is the one I care about) is if the IO operation (for example: WSARecv()) fails. This is what the documentation says about this situation:
If *lpOverlapped is not NULL and the function dequeues a completion packet for a failed I/O operation from the completion port, the function stores information about the failed operation in the variables pointed to by lpNumberOfBytes, lpCompletionKey, and lpOverlapped. To get extended error information, call GetLastError.
I do not find this to be very clear as to what the values of lpNumberOfBytes, lpCompletionKey, and lpOverlapped will be. Will these parameters contain the same values that I supplied when calling WSARecv()? I suppose this is most likely, because how else am I supposed to know which IO operation caused the failure!
If an I/O operation fails then lpCompletionKey and lpOverlapped will be the values that were supplied when you initiated the I/O operation using whichever API was used (WSASend(), WSARecv(), etc.). This is how you identify the 'per device' data and the 'per operation' data for the I/O operation in question.
lpNumberOfBytes is likely to be zero in error situations, though I tend to treat it the same as in the non-error case, as I never use the resulting value (or the buffer contents) during error handling anyway.
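To make the two failure cases concrete, here is a minimal sketch of a dequeue loop (hPort is assumed to be an existing completion port):

```cpp
#include <windows.h>

void CompletionLoop(HANDLE hPort)
{
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* ov = nullptr;

        BOOL ok = GetQueuedCompletionStatus(hPort, &bytes, &key, &ov, INFINITE);
        if (!ok && ov == nullptr) {
            // The call itself failed (e.g. the port handle was closed).
            break;
        }
        if (!ok) {
            // A completion packet for a failed I/O operation was dequeued:
            // key and ov are the values supplied when the I/O was initiated.
            DWORD err = GetLastError();
            // ... handle the failed operation identified by key/ov ...
            continue;
        }
        // Successful completion: bytes, key, and ov describe the operation.
    }
}
```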
I want to use Overlapped I/O in my server, but I am unable to find many tutorials on the subject (most of the tutorials are about Overlapped I/O with Completion Ports, and I want to use a callback function).
My server will have a maximum of 400 clients connected at one time, and it only sends and receives data at long intervals (every 30 seconds, a few kilobytes of data are exchanged between the server and the clients).
The main reason why I want to use Overlapped I/O is because select() can only handle a maximum of 64 sockets (and I have 400!).
So I will tell you how I understand Overlapped I/O and correct me if I'm wrong:
If I want to receive data from one of the clients, I use WSARecv(), supplying the socket handle, a buffer to be filled with the received data, and a callback function. When the data has been received and placed in the buffer, the callback function will be called, and I can process the data.
When I want to send data I use WSASend(), again supplying the socket handle and a callback function, and when the data has been sent (I'm not sure whether that means placed in the underlying send buffer or actually put on the wire), the callback will be called telling me the data was sent, and I can send the next piece of data.
The one thing you appear to have missed is that OVERLAPPED callbacks are actually synchronous.
You said:
When the data is received and filled in the buffer, the callback function will be called
Reality:
When a call is made to an alertable wait function (e.g. SleepEx or MsgWaitForMultipleObjectsEx), if data has been received and filled in the buffer, the callback function will be called
As long as you are aware of that, you should be in good shape. I agree with you that overlapped I/O with callbacks is a great approach in your scenario. Because callbacks occur on the thread performing the I/O, you don't have to worry about synchronizing access from multiple threads, the way you would need to with completion ports and work items on the thread pool.
Oh, also make sure to check for WSA_IO_PENDING, because it's possible for operations to complete synchronously if there's enough data already buffered (for receives) or enough buffer space (for sends). In that case the callback will still occur, but it is queued for the next alertable wait; it never runs immediately. Certain errors will be reported synchronously as well; others will arrive at your callback.
Also, it's guaranteed that your callback gets queued exactly once for every operation that returned 0 or WSA_IO_PENDING, whether that operation completes successfully, is cancelled, or fails with some other error. You can't reuse the buffer until that callback has happened.
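Putting that together, a minimal sketch of the callback flavor (assuming Winsock is initialized and s is a connected SOCKET; link with ws2_32.lib):

```cpp
#include <winsock2.h>
#include <windows.h>

char          g_buf[4096];
WSAOVERLAPPED g_ov = {};

// Completion routine: queued exactly once per operation that returned
// 0 or WSA_IO_PENDING; it runs only during an alertable wait.
void CALLBACK OnRecv(DWORD err, DWORD bytes, LPWSAOVERLAPPED /*ov*/, DWORD /*flags*/)
{
    if (err == 0 && bytes > 0) {
        // ... process 'bytes' bytes of data in g_buf ...
    }
}

void StartRecv(SOCKET s)
{
    WSABUF wb;
    wb.len = sizeof g_buf;
    wb.buf = g_buf;
    DWORD flags = 0;

    int rc = WSARecv(s, &wb, 1, nullptr, &flags, &g_ov, OnRecv);
    if (rc == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING) {
        // Error reported synchronously: no callback will be queued.
        return;
    }
    // rc == 0 (completed immediately) or WSA_IO_PENDING: either way the
    // callback is queued and fires during the next alertable wait.
    SleepEx(INFINITE, TRUE);
}
```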
The IO completion callback mechanism works fine, I've used it a few times, no problem. In 32-bit systems, you can put the 'this' of the socket-context instance into the hEvent field of the OVERLAPPED struct and retrieve it in the callback. Not sure how to do it in 64-bit systems :(
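For what it's worth, HANDLE is pointer-sized on 64-bit Windows too, so the same trick should carry over unchanged; when a completion routine is supplied, the system ignores hEvent, leaving it free for application context. A sketch (embedding the OVERLAPPED in the context struct and using CONTAINING_RECORD is the common alternative):

```cpp
#include <winsock2.h>
#include <windows.h>

class Connection {
public:
    explicit Connection(SOCKET s) : sock_(s) {}

    void StartRecv() {
        WSABUF wb;
        wb.len = sizeof buf_;
        wb.buf = buf_;
        DWORD flags = 0;
        ov_ = {};
        ov_.hEvent = reinterpret_cast<HANDLE>(this);   // stash 'this'
        WSARecv(sock_, &wb, 1, nullptr, &flags, &ov_, &Connection::OnRecv);
    }

private:
    static void CALLBACK OnRecv(DWORD err, DWORD bytes,
                                LPWSAOVERLAPPED ov, DWORD /*flags*/) {
        // Recover the context: works identically on 32- and 64-bit,
        // since HANDLE is pointer-sized on both.
        Connection* self = reinterpret_cast<Connection*>(ov->hEvent);
        if (err == 0 && bytes > 0) {
            // ... self->buf_ holds 'bytes' bytes ...
        }
    }

    SOCKET        sock_;
    WSAOVERLAPPED ov_{};
    char          buf_[4096];
};
```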
As we all know, an echo server is a server that reads from a socket, and writes that very data into another socket.
Since Windows I/O completion ports give you different ways to do things, I was wondering what the best (most efficient) way to implement an echo server is. I'm sure someone has tested the approaches I describe here and can contribute their findings.
My classes are Stream, which abstracts a socket, named pipe, or whatever, and IoRequest, which abstracts both an OVERLAPPED structure and the memory buffer used for the I/O (suitable for both reading and writing, of course). This way, when I allocate an IoRequest I allocate the memory for the data buffer + OVERLAPPED structure in one shot, so I call malloc() only once.
In addition to this, I also implement fancy and useful things in the IoRequest object, such as an atomic reference counter, and so on.
That said, let's explore the ways to build the best echo server:
-------------------------------------------- Method A. ------------------------------------------
1) The "reader" socket completes its reading, the IOCP callback returns, and you have an IoRequest just completed with the memory buffer.
2) Let's copy the buffer just received with the "reader" IoRequest to the "writer" IoRequest. (this will involve a memcpy() or whatever).
3) Let's fire again a new reading with ReadFile() in the "reader", with the same IoRequest used for reading.
4) Let's fire a new writing with WriteFile() in the "writer".
-------------------------------------------- Method B. ------------------------------------------
1) The "reader" socket completes its reading, the IOCP callback returns, and you have an IoRequest just completed with the memory buffer.
2) Instead of copying data, pass that IoRequest to the "writer" for writing, without copying data with memcpy().
3) The "reader" now needs a new IoRequest to continue reading, allocate a new one or pass one already allocated before, maybe one just completed for writing before the new writing does happen.
So, in the first case, every Stream object has its own IoRequest, data is copied with memcpy() or similar functions, and everything works fine.
In the second case the two Stream objects pass IoRequest objects to each other, without copying data, but it's a little more complex: you have to manage the "swapping" of IoRequest objects between the two Stream objects, with the possible drawback of synchronization problems (what about completions that happen on different threads?).
My questions are:
Q1) Is avoiding copying data really worth it!?
Copying two buffers with memcpy() or similar is very fast, not least because the CPU cache is exploited for this very purpose.
Consider that with the first method I can echo from one "reader" socket to multiple "writer" sockets, but with the second one I can't, since I would have to create N new IoRequest objects for the N writers, because each WriteFile() needs its own OVERLAPPED structure.
Q2) I guess that when I fire N writes to N different sockets with WriteFile(), I have to provide N different OVERLAPPED structures AND N different buffers to take the data from.
Or can I fire N WriteFile() calls with N different OVERLAPPED structures, taking the data from the same buffer for all N sockets?
Is avoiding copying data really worth it!?
Depends on how much you are copying. 10 bytes, not so much. 10MB, then yes, it's worth avoiding the copying!
In this case, since you already have an object that contains the rx data and an OVERLAPPED block, it seems somewhat pointless to copy it - just reissue it to WSASend(), or whatever.
but with the second one I can't do that
You can, but you need to separate the 'IORequest' class from a 'Buffer' class. The Buffer holds the data, an atomic int reference count, and any other management info shared across all calls; the IORequest holds the OVERLAPPED block, a pointer to the buffer data, and any other management information for each individual call.
The IORequest is the class used for each send call. Since it contains only a pointer to the buffer, there is no need to copy the data, so it stays reasonably small, O(1) in the data size.
When the tx completions come in, the handler threads take the IORequest, dereference the buffer, and decrement its atomic reference count toward zero. The thread that hits 0 knows the Buffer object is no longer needed and can delete it (or, more likely in a high-performance server, repool it for later reuse).
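A minimal sketch of that Buffer/IORequest split (names are illustrative, not a definitive implementation):

```cpp
#include <windows.h>
#include <atomic>

// The creator owns one reference; each in-flight send owns another
// via its IORequest. The last Release() frees (or repools) the Buffer.
struct Buffer {
    std::atomic<int> refs{1};
    DWORD            len = 0;
    char             data[4096];

    void AddRef()  { refs.fetch_add(1, std::memory_order_relaxed); }
    void Release() {
        if (refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete this;   // or repool in a high-performance server
    }
};

struct IORequest {
    OVERLAPPED ov{};       // one OVERLAPPED per in-flight operation
    Buffer*    buf;        // shared, reference-counted payload

    explicit IORequest(Buffer* b) : buf(b) { buf->AddRef(); }
    ~IORequest() { buf->Release(); }
};

// Echoing one received buffer to N writers: N IORequests (and thus N
// OVERLAPPEDs), zero copies of the payload. Issue each send with
// &req->ov; in the completion handler recover and delete the IORequest.
```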
Or, I can fire N WriteFile() calls with N different OVERLAPPED taking the data from the same buffer for the N sockets?
Yes, you can. See above.
Re. threading: sure, if your 'management data' can be reached from multiple completion-handler threads, then yes, you may want to protect it with a critical section, but an atomic int should do for the buffer refcount.
How does Win32 manage instances of the OVERLAPPED struct in the context of these two functions:
GetQueuedCompletionStatus
PostQueuedCompletionStatus
When I call GetQueuedCompletionStatus, does Win32 free the OVERLAPPED instance, or must I free it myself?
When I send data with PostQueuedCompletionStatus, does Win32 copy it to internal structures? When must I free the memory of the sent data?
Where can I find a diagram of how OVERLAPPED data flows between GetQueuedCompletionStatus, PostQueuedCompletionStatus, and the IOCP queue?
The OVERLAPPED structure must exist from when a successful I/O operation (or manual PostQueuedCompletionStatus()) executes until the OVERLAPPED emerges from a call to GetQueuedCompletionStatus().
You are responsible for the lifetime of the structure.
You'll see from the MSDN docs that GetQueuedCompletionStatus() actually takes "a pointer to a variable that receives the address of the OVERLAPPED structure that was specified when the completed I/O operation was started.". What you actually get out of that call is a pointer to the original OVERLAPPED that you passed when you made the PostQueuedCompletionStatus() call (or initiated an overlapped I/O operation).
This is all actually very useful as the "normal" way to use the OVERLAPPED structure is to place it inside a larger structure which holds all of the 'per operation' information that you might need - so it's the ideal way to navigate directly from the limited information that you're given when you call GetQueuedCompletionStatus() to, for example, the data buffer that you used in your overlapped read call...
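A minimal sketch of that idiom (names are mine; CONTAINING_RECORD recovers the outer per-operation struct from the OVERLAPPED pointer GetQueuedCompletionStatus() hands back):

```cpp
#include <windows.h>

// Per-operation context with the OVERLAPPED embedded inside it.
struct PerOpData {
    OVERLAPPED ov;        // must stay valid until the completion is dequeued
    char       buf[4096]; // the data buffer used by the read/write
    int        opType;    // e.g. to distinguish reads from writes
};

void DrainOne(HANDLE hPort)
{
    DWORD bytes; ULONG_PTR key; OVERLAPPED* pov;
    if (GetQueuedCompletionStatus(hPort, &bytes, &key, &pov, INFINITE) && pov) {
        // Navigate from the OVERLAPPED* back to the enclosing struct.
        PerOpData* op = CONTAINING_RECORD(pov, PerOpData, ov);
        // ... use op->buf and op->opType, then free or repool op ...
    }
}
```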
I find the best way to deal with OVERLAPPED structures is to a) embed them in the buffer you're using for read/write b) reference count them and c) return them to a pool for reuse when the ref count drops to 0.
I have some source code that you could download (here) which may make this a little easier to understand (it's a full IOCP server example so it's a little complex, but it works and shows how these things can be used).
You should pass the address of an OVERLAPPED * to GetQueuedCompletionStatus. This gets filled in with the value passed to PostQueuedCompletionStatus.
You should not free this data in the PostQueuedCompletionStatus context. It should be done by the context using GetQueuedCompletionStatus. (Assuming it was allocated dynamically in the first place - there is no requirement that it is a dynamically allocated structure, it could be taken out of a fixed pool, or be allocated on the stack of a function that doesn't return until it has been signalled that the operation is complete).
I'm not sure there is such a picture.
I'm writing a TCP server on Windows NT, using completion ports to exploit asynchronous I/O.
I have a TcpSocket class, a TcpServer class, and some callbacks (virtual functions) invoked when an I/O operation completes, e.g. onRead() when a read completes. I also have onOpen() for when the connection is established, onEof() for when the connection is closed, and so on.
I always have a pending read on the socket. If the socket actually receives data (the read completes with size > 0) it calls onRead(); if instead the client closes the socket from its side (the read completes with size == 0) it calls onEof(), so the server is aware of when the client closes the socket with closesocket(server_socket); from its side.
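In code, the dispatch looks roughly like this (a sketch; the IOCP plumbing that produces the completion values is omitted):

```cpp
#include <windows.h>

// Hypothetical callbacks mirroring the design described above.
void onRead();
void onEof();
void onError(int err);

// size > 0 means data arrived; size == 0 means the peer closed
// gracefully; a failed completion goes to onError().
void OnReadCompletion(bool ok, DWORD size, int err)
{
    if (ok && size > 0) onRead();
    else if (ok)        onEof();
    else                onError(err);
}
```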
All works gracefully, but I have noticed a thing:
when I call closesocket(client_socket); on the server-side endpoint of the connection, instead of the client side (whether or not I set linger {true, 0}), the pending read completes as erroneous:
that is, the read size is == 0, and GetLastError() returns an error: 64, i.e. ERROR_NETNAME_DELETED. I have searched a lot about this on the web, but didn't find anything interesting.
Then I asked myself: but is this a real error? I mean, can this really be considered an error?
The problem is that on the server side, the onError() callback will be called when I closesocket(client_socket); instead of onEof(). So I thought this:
What if I call onEof() instead of onError() when this 'ERROR_NETNAME_DELETED' "error" is received?
Would that introduce some bugs or undefined behavior?
Another important point that made me ask this question is this:
When I received this read completion with ERROR_NETNAME_DELETED, I checked the OVERLAPPED structure, in particular the overlapped->Internal member, which contains the NTSTATUS error code of the underlying driver. If we look at a list of NTSTATUS error codes [ http://www.tenox.tc/links/ntstatus.html ], we can clearly see that ERROR_NETNAME_DELETED is generated by the NTSTATUS 0xC000013B, which is an error, but it is called 'STATUS_LOCAL_DISCONNECT'. Well, that doesn't look like a name for an error. It seems more like 'ERROR_IO_PENDING', which is nominally an error, but is also a status for correct behavior.
So what about checking the OVERLAPPED structure's Internal member and, when it is == STATUS_LOCAL_DISCONNECT, calling the onEof() callback? Would that mess things up?
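Concretely, the check I'm proposing would look something like this (a sketch; STATUS_LOCAL_DISCONNECT is 0xC000013B from ntstatus.h, and the handler name is mine):

```cpp
#include <windows.h>
#include <winternl.h>   // NTSTATUS

#ifndef STATUS_LOCAL_DISCONNECT
#define STATUS_LOCAL_DISCONNECT ((NTSTATUS)0xC000013BL)
#endif

// Hypothetical handler for a read that completed with an error.
void OnFailedRead(OVERLAPPED* ov)
{
    NTSTATUS st = static_cast<NTSTATUS>(ov->Internal);
    if (st == STATUS_LOCAL_DISCONNECT) {
        // caused by our own closesocket(): route to onEof()?
    } else {
        // a genuine error: route to onError()
    }
}
```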
In addition, I have to say that from the server side, if I call DisconnectEx() before calling closesocket(client_socket); I do not receive that error. But what if I don't want to call DisconnectEx()? E.g. when the server is shutting down and doesn't want to wait for all the DisconnectEx() completions, but just wants to close all connected clients.
It's entirely up to you how you treat an error condition. In your case this error condition is entirely to be expected, and it's perfectly safe for you to treat it as an expected condition.
Another example of this nature is when you call an API function but don't know how large a buffer to provide. So you provide a buffer that you hope will be big enough. But if the API call fails, you then check that the last error is ERROR_INSUFFICIENT_BUFFER. That's an expected error condition. You can then try again with a larger buffer.
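A minimal sketch of that pattern, using GetUserNameW as the example (any API with ERROR_INSUFFICIENT_BUFFER semantics works the same way; link with Advapi32.lib):

```cpp
#include <windows.h>
#include <string>

// A deliberately small first call is *expected* to fail with
// ERROR_INSUFFICIENT_BUFFER, which also reports the required size.
std::wstring QueryUserName()
{
    wchar_t tiny[1];
    DWORD size = 1;
    if (GetUserNameW(tiny, &size))
        return tiny;                     // unlikely: it actually fit
    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
        return L"";                      // a real, unexpected error
    std::wstring name(size, L'\0');      // 'size' now holds the need
    if (!GetUserNameW(&name[0], &size))
        return L"";
    name.resize(size - 1);               // drop the null terminator
    return name;
}
```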
It's up to you how to treat an error condition, but the question is a sign of potential problems in your code (from logic errors to undefined behavior).
The most important point is that you shouldn't touch the SOCKET handle after closesocket. What do you do on EOF? It would be logical to closesocket on your side when you detect EOF, but that is exactly what you cannot do in the ERROR_NETNAME_DELETED handler, because closesocket has already happened and the handle is invalid.
It's also useful to imagine what happens if a pending read completes (with real data available) just before closesocket, and your application detects it right after closesocket. You handle the incoming data and... do you send an answer to the client using the same socket handle? Do you schedule the next read on that handle? It would all be wrong, and there would be no ERROR_NETNAME_DELETED to tell you about it.
What happens if the pending read completes with EOF at that very unfortunate moment, just before closesocket? If your regular onEof callback fires, and that callback does closesocket, it would be wrong again.
The problem you describe might hint at more serious problem if closesocket is done in one thread, while another thread waits for I/O completion. Are you sure that another thread is not calling WSARecv/ReadFile while the first thread is calling closesocket? That's undefined behavior, even though winsock makes it look as if it worked most of the time.
To summarize, code that handles completing (or failing) reads cannot be correct if it's unaware that the socket handle is useless because it has been closed. After closesocket, it's useful to wait for pending I/O to complete, because you can't reuse the OVERLAPPED structure otherwise; but there's no point in handling such a completion as if it had happened during normal operation, with the socket still open (the error/status code is irrelevant).
You're calling the wrong method. You should be calling WSAGetLastError(). The result of GetLastError() after a Winsock API call is meaningless.