TCP IOCP won't receive after acceptex - c++

I'm trying to write an IOCP server. Basically, I have it accepting new connections. For the purpose of my testing, I'm running and connecting to 127.0.0.1.
I create the pseudo socket prior to calling AcceptEx(). Once a connection is accepted, the new pseudo socket is used for communication. This new socket is associated with an I/O completion port [CreateIoCompletionPort]. I then set a few options on it, [SO_EXCLUSIVEADDRUSE] and [SO_CONDITIONAL_ACCEPT], and call WSARecv() to receive incoming data.
The problem is that once my remote connection connects to the server, it sends data, but that data is never received. I'm wondering if someone could offer some ideas as to why it's not receiving data? Perhaps my logic is flawed? I stepped through my code several times, and no errors are recorded.
EDIT: Fixed the wording. I create the socket before AcceptEx() call.
Basic logic in my code:
// Create socket, associate with IOCP
WSASocket(af, type, proto, lpProtoInfo, g, dwFlags);
HANDLE hIOCP = GetPool()->GetQueueHandle();
CreateIoCompletionPort(hSource, hIOCP, 0, 0) != NULL;
// Server bind and listen
bind(m_shSocket, pAddr, nAddrLen);
listen(m_shSocket, nBacklog);
// Creation of the pseudo socket
SOCKET s = ::WSASocket(m_iSocketAf, m_iSocketType, m_iSocketProto, m_pWpi, m_SocketGroup, m_dwSocketFlags);
DWORD dwBytes;
BOOL bRet = m_fnAcceptEx(m_shSocket, s, chOutput, 0, sizeof(SOCKADDR_STORAGE) + 16, sizeof(SOCKADDR_STORAGE) + 16, &dwBytes, m_pcbAccept);
// ... New Connection comes in, it's accepted ...
// Associate new pseudo socket with IOCP
HANDLE hNewIOCP = GetPool()->GetQueueHandle();
CreateIoCompletionPort((HANDLE) s, hNewIOCP , 0, 0) != NULL;
// ... Remote socket sends ...
// ... Remote socket and Pseudo socket call WSARecv ...
// ... Pseudo socket does not receive ...
NOTE: I tried sending from the pseudo socket to the remote socket, same problem as sending data in the reverse way.

You need to post some code but your description doesn't make sense. That's NOT how AcceptEx() based servers operate.
With an AcceptEx() based server you create your accepted socket before you post the AcceptEx(). You then post the AcceptEx() with the listening socket and the new socket and a buffer which allows you to receive the remote address and, optionally, data.
So if you are describing your code in your original question then your code is wrong or you're not using AcceptEx(). I'm currently ignoring the 'few options' that you throw into the mix as they simply further confuse things at present without any code to analyse.
You might be interested in downloading my free IOCP based server framework, which includes working AcceptEx() and traditional Accept() based server code. You can get it from here: http://www.serverframework.com/products---the-free-framework.html

Are you calling GetQueuedCompletionStatus to get the data?
In case you are not doing this just to learn for yourself, I would also recommend that you use boost::asio - an excellent library that allows you to let someone else do the tedious code for handling the io completion ports.

I figured it out. I'm an idiot. I was sending zero bytes.

UnrealEngine4: Recv function would keep blocking when TCP server shutdown

I use a blocking FSocket on the client side, connected to a TCP server. If there is no message from the server, the socket thread blocks in FSocket::Recv(). If the TCP server shuts down, the socket thread is still blocked in this function. With a blocking socket from the BSD socket API, however, the thread returns from recv() and reports an errno when the TCP server shuts down. So is this a defect of FSocket?
uint32 HRecvThread::Run()
{
    uint8* recv_buf = new uint8[RECV_BUF_SIZE];
    uint8* const recv_buf_head = recv_buf;
    int readLenSeq = 0;
    while (Started)
    {
        //if (TcpClient->Connected() && ClientSocket->GetConnectionState() != SCS_Connected)
        //{
        //    // server disconnected
        //    TcpClient->SetConnected(false);
        //    break;
        //}
        int32 bytesRead = 0;
        // this is a blocking socket, so the thread blocks in Recv if there is no message
        ClientSocket->Recv(recv_buf, readLenSeq, bytesRead);
        .....
        //some logic of resolution for tcp msg bytes
        .....
    }
    delete[] recv_buf;
    return 0;
}
As I expected, you are ignoring the return code, which presumably indicates success or failure, so you are looping indefinitely (not blocking) on an error or end of stream condition.
NB You should allocate the recv_buf on the stack, not dynamically. Don't use the heap when you don't have to.
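A minimal sketch of the point about return codes, written against the plain BSD socket API that the question mentions rather than against FSocket (whose Recv reports success through a bool). The function name drain_until_closed and the buffer size are assumptions made for illustration. The loop leaves on end of stream (recv returns 0) and on a real error (recv returns -1) instead of spinning:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: reads from a blocking, connected socket until the
// peer closes the connection or an error occurs.
// Returns the total number of bytes read, or -1 on error.
long drain_until_closed(int fd)
{
    assert(fd >= 0);
    char buf[4096];              // stack buffer, no heap allocation needed
    long total = 0;
    for (;;)
    {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0)
        {
            total += n;          // real code would parse the bytes here
            continue;
        }
        if (n == 0)
            return total;        // orderly shutdown by the peer: stop looping
        if (errno == EINTR)
            continue;            // interrupted by a signal: retry
        return -1;               // real error: stop looping
    }
}
```

The same shape should carry over to the UE4 loop: test what Recv returns and break out of while (Started) when it reports failure or a closed connection.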
There is a similar question on the forums in the UE4 C++ Programming section. Here is the discussion:
https://forums.unrealengine.com/showthread.php?111552-Recv-function-would-keep-blocking-when-TCP-server-shutdown
Long story short, in the UE4 Source, they ignore EWOULDBLOCK as an error. The code comments state that they do not view it as an error.
Also, there are several helper functions you should be using when opening the port and when polling the port (I assume you are polling since you are using blocking calls):
FSocket::Connect returns a bool, so make sure to check that return value.
FSocket::GetLastError returns the UE4 translated error code if an error occurred with the socket.
FSocket::HasPendingData will return a value that informs you if it is safe to read from the socket.
FSocket::HasPendingConnection can check to see your connection state.
FSocket::GetConnectionState will tell you your active connection state.
Using these helper functions for error checking before making a call to FSocket::Recv will help you make sure you are in a good state before trying to read data. Also, it was noted in the forum posts that using the non-blocking code worked as expected. So, if you do not have a specific reason to use blocking code, just use the non-blocking implementation.
Also, as a final hint, using FSocket::Wait will block until your socket is in a desirable state of your choosing with a timeout, i.e. is readable or has data.

zeromq: reset REQ/REP socket state

When you use the simple ZeroMQ REQ/REP pattern you depend on a fixed send()->recv() / recv()->send() sequence.
As this article describes, you get into trouble when a participant disconnects in the middle of a request, because then you can't just start over with receiving the next request from another connection; the state machine would force you to send a request to the disconnected one.
Has a more elegant way to solve this emerged since the mentioned article was written?
Is reconnecting the only way to solve this (apart from not using REQ/REP and using another pattern instead)?
As the accepted answer seems so terribly sad to me, I did some research and found that everything we need was actually in the documentation.
Calling .setsockopt() with the correct parameter can help you reset your socket state machine without brutally destroying it and rebuilding another on top of the previous one's dead body.
(yeah I like the image).
ZMQ_REQ_CORRELATE: match replies with requests
The default behaviour of REQ sockets is to rely on the ordering of messages to match requests and responses and that is usually sufficient. When this option is set to 1, the REQ socket will prefix outgoing messages with an extra frame containing a request id. That means the full message is (request id, 0, user frames…). The REQ socket will discard all incoming messages that don't begin with these two frames.
Option value type: int
Option value unit: 0, 1
Default value: 0
Applicable socket types: ZMQ_REQ
ZMQ_REQ_RELAXED: relax strict alternation between request and reply
By default, a REQ socket does not allow initiating a new request with zmq_send(3) until the reply to the previous one has been received. When set to 1, sending another message is allowed and has the effect of disconnecting the underlying connection to the peer from which the reply was expected, triggering a reconnection attempt on transports that support it. The request-reply state machine is reset and a new request is sent to the next available peer.
If set to 1, also enable ZMQ_REQ_CORRELATE to ensure correct matching of requests and replies. Otherwise a late reply to an aborted request can be reported as the reply to the superseding request.
Option value type: int
Option value unit: 0, 1
Default value: 0
Applicable socket types: ZMQ_REQ
A complete documentation is here
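To make the two options above more concrete, here is a toy model of the REQ state machine in plain C++. It is not libzmq code: the class ToyReqSocket and all of its members are hypothetical and only mirror the behaviour described in the quoted documentation (strict alternation by default, a reset on send with the relaxed option, and dropping of stale replies when correlation is on).

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the REQ send/receive state machine. This is NOT libzmq code;
// the class and its members are hypothetical and only mirror the behaviour
// described in the quoted option documentation.
class ToyReqSocket
{
public:
    ToyReqSocket(bool relaxed, bool correlate)
        : relaxed_(relaxed), correlate_(correlate) {}

    // Starts a new request. Returns false when strict alternation forbids it.
    bool send()
    {
        if (awaiting_reply_ && !relaxed_)
            return false;            // strict mode: previous reply still pending
        ++request_id_;               // relaxed mode: abandon the old request
        awaiting_reply_ = true;
        return true;
    }

    // Offers an incoming reply carrying the id of the request it answers.
    // Returns true if the socket hands it to the application.
    bool deliver_reply(std::uint32_t id)
    {
        if (!awaiting_reply_)
            return false;            // nothing outstanding
        if (correlate_ && id != request_id_)
            return false;            // late reply to an aborted request: dropped
        awaiting_reply_ = false;     // without correlation any reply is accepted
        return true;
    }

    // Id of the most recent request; only valid after at least one send().
    std::uint32_t current_id() const
    {
        assert(request_id_ > 0);
        return request_id_;
    }

private:
    bool relaxed_;
    bool correlate_;
    bool awaiting_reply_ = false;
    std::uint32_t request_id_ = 0;
};
```

In this model a socket built with relaxed set and correlate unset accepts a stale reply as the answer to the newer request, which is the mismatch the documentation warns about when ZMQ_REQ_RELAXED is used without ZMQ_REQ_CORRELATE.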
The good news is that, as of ZMQ 3.0 and later (the modern era), you can set a timeout on a socket. As others have noted elsewhere, you must do this after you have created the socket, but before you connect it:
zmq_req_socket.setsockopt( zmq.RCVTIMEO, 500 ) # milliseconds
Then, when you actually try to receive the reply (after you have sent a message to the REP socket), you can catch the error that will be asserted if the timeout is exceeded:
try:
    send( message, 0 )
    send_failed = False
except zmq.Again:
    logging.warning( "Image send failed." )
    send_failed = True
However! When this happens, as observed elsewhere, your socket will be in a funny state, because it will still be expecting the response. At this point, I cannot find anything that works reliably other than just restarting the socket. Note that if you disconnect() the socket and then connect() it again, it will still be in this bad state. Thus you need to:
def reset_my_socket():
    global zmq_req_socket  # rebind the module-level socket
    zmq_req_socket.close()
    zmq_req_socket = zmq_context.socket( zmq.REQ )
    zmq_req_socket.setsockopt( zmq.RCVTIMEO, 500 ) # milliseconds
    zmq_req_socket.connect( zmq_endpoint )
You will also notice that because I close()d the socket, the receive timeout option was "lost", so it is important to set that on the new socket.
I hope this helps. And I hope that this does not turn out to be the best answer to this question. :)
There is one solution to this, and that is adding timeouts to all calls. Since ZeroMQ by itself does not really provide simple timeout functionality, I recommend using a subclass of the ZeroMQ socket that adds a timeout parameter to all important calls.
So, instead of calling s.recv() you would call s.recv(timeout=5.0), and if a response does not come back within that 5-second window it will return None and stop blocking. I made a futile attempt at this when I ran into this problem.
I'm actually looking into this at the moment, because I am retro fitting a legacy system.
I am coming across code constantly that "needs" to know about the state of the connection. However the thing is I want to move to the message passing paradigm that the library promotes.
I found the following function : zmq_socket_monitor
What it does is monitor the socket passed to it and generate events that are then passed to an "inproc" endpoint - at that point you can add handling code to actually do something.
There is also an example (actually test code) here : github
I have not got any specific code to give at the moment (maybe at the end of the week) but my intention is to respond to the connect and disconnects such that I can actually perform any resetting of logic required.
Hope this helps. Despite quoting the 4.2 docs, I am using 4.0.4, which seems to have the functionality as well.
Note: I notice you talk about Python above, but the question is tagged C++, so that's where my answer is coming from...
Update: I'm updating this answer with this excellent resource here: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/ Socket programming is complicated so do checkout the references in this post.
None of the answers here seem accurate or useful. The OP is not looking for information on BSD socket programming. He is trying to figure out how to robustly handle accept()ed client-socket failures in ZMQ on the REP socket to prevent the server from hanging or crashing.
As already noted -- this problem is complicated by the fact that ZMQ tries to pretend that the server's listen()ing socket is the same as an accept()ed socket (and there is nowhere in the documentation that describes how to set basic timeouts on such sockets.)
My answer:
After doing a lot of digging through the code, the only relevant socket options passed along to accept()ed socks seem to be keep alive options from the parent listen()er. So the solution is to set the following options on the listen socket before calling send or recv:
void zmq_setup(zmq::context_t** context, zmq::socket_t** socket, const char* endpoint)
{
    // Free old references. Deleting the socket closes it and frees its memory.
    if(*socket != NULL)
    {
        delete *socket;
        *socket = NULL;
    }
    if(*context != NULL)
    {
        // Shutdown all previous server client-sockets.
        // Deleting the context terminates it and frees its memory.
        delete *context;
        *context = NULL;
    }
    *context = new zmq::context_t(1);
    *socket = new zmq::socket_t(**context, ZMQ_REP);
    // Enable TCP keep alive.
    int is_tcp_keep_alive = 1;
    (**socket).setsockopt(ZMQ_TCP_KEEPALIVE, &is_tcp_keep_alive, sizeof(is_tcp_keep_alive));
    // Only send 2 probes to check if client is still alive.
    int tcp_probe_no = 2;
    (**socket).setsockopt(ZMQ_TCP_KEEPALIVE_CNT, &tcp_probe_no, sizeof(tcp_probe_no));
    // How long does a con need to be "idle" for in seconds.
    int tcp_idle_timeout = 1;
    (**socket).setsockopt(ZMQ_TCP_KEEPALIVE_IDLE, &tcp_idle_timeout, sizeof(tcp_idle_timeout));
    // Time in seconds between individual keep alive probes.
    int tcp_probe_interval = 1;
    (**socket).setsockopt(ZMQ_TCP_KEEPALIVE_INTVL, &tcp_probe_interval, sizeof(tcp_probe_interval));
    // Discard pending messages in buf on close.
    int is_linger = 0;
    (**socket).setsockopt(ZMQ_LINGER, &is_linger, sizeof(is_linger));
    // TCP user timeout on unacknowledged send buffer
    int is_user_timeout = 2;
    (**socket).setsockopt(ZMQ_TCP_MAXRT, &is_user_timeout, sizeof(is_user_timeout));
    // Start internal enclave event server.
    printf("Host: Starting enclave event server\n");
    (**socket).bind(endpoint);
}
What this does is tell the operating system to aggressively check the client socket for timeouts and reap them for cleanup when a client doesn't return a heart beat in time. The result is that the OS will send a SIGPIPE back to your program and socket errors will bubble up to send / recv - fixing a hung server. You then need to do two more things:
1. Handle SIGPIPE errors so the program doesn't crash
#include <signal.h>
#include <zmq.hpp>
// zmq_setup def here [...]
int main(int argc, char** argv)
{
    // Ignore SIGPIPE signals.
    signal(SIGPIPE, SIG_IGN);
    // ... rest of your code after
    // (Could potentially also restart the server
    // sock on N SIGPIPEs if you're paranoid.)
    // Start server socket.
    const char* endpoint = "tcp://127.0.0.1:47357";
    // Must start as NULL: zmq_setup frees whatever these point to.
    zmq::context_t* context = NULL;
    zmq::socket_t* socket = NULL;
    zmq_setup(&context, &socket, endpoint);
    // Message buffers.
    zmq::message_t request;
    zmq::message_t reply;
    // ... rest of your socket code here
}
2. Check for -1 returned by send or recv and catch ZMQ errors.
// E.g. skip broken accepted sockets (pseudo-code.)
while (1)
{
    try
    {
        if ((*socket).recv(&request) == -1)
            throw -1;
    }
    catch (...)
    {
        // Prevent any endless error loops killing CPU.
        sleep(1);
        // Reset ZMQ state machine.
        try
        {
            zmq::message_t blank_reply = zmq::message_t();
            (*socket).send(blank_reply);
        }
        catch (...)
        {
            // Ignore errors from the reset attempt.
        }
        continue;
    }
    // ... handle the request and send the reply here
}
Notice the weird code that tries to send a reply on a socket failure? In ZMQ, a REP server "socket" is an endpoint to another program making a REQ socket to that server. The result is if you go do a recv on a REP socket with a hung client, the server sock becomes stuck in a broken receive loop where it will wait forever to receive a valid reply.
To force an update on the state machine, you try to send a reply. ZMQ detects that the socket is broken, and removes it from its queue. The server socket becomes "unstuck", and the next recv call returns a new client from the queue.
To enable timeouts on an async client (in Python 3), the code would look something like this:
import asyncio
import zmq
import zmq.asyncio

# ctx was not defined in the original snippet; an asyncio-aware context is assumed.
ctx = zmq.asyncio.Context()

async def req(endpoint):
    ms = 2000 # In milliseconds.
    sock = ctx.socket(zmq.REQ)
    sock.setsockopt(zmq.SNDTIMEO, ms)
    sock.setsockopt(zmq.RCVTIMEO, ms)
    sock.setsockopt(zmq.LINGER, ms) # Wait at most this long for pending messages on close().
    sock.setsockopt(zmq.CONNECT_TIMEOUT, ms)
    # Connect the socket.
    # Connections don't strictly happen here.
    # ZMQ waits until the socket is used (which is confusing, I know.)
    sock.connect(endpoint)
    # Send some bytes.
    await sock.send(b"some bytes")
    # Recv bytes and convert to unicode.
    msg = await sock.recv()
    msg = msg.decode(u"utf-8")
Now you have some failure scenarios when something goes wrong.
By the way -- if anyone's curious -- the default value for TCP idle timeout in Linux seems to be 7200 seconds or 2 hours. So you would be waiting a long time for a hung server to do anything!
Sources:
https://github.com/zeromq/libzmq/blob/84dc40dd90fdc59b91cb011a14c1abb79b01b726/src/tcp_listener.cpp#L82 TCP keep alive options preserved for client sock
http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/ How does keep alive work
https://github.com/zeromq/libzmq/blob/master/builds/zos/README.md Handling sig pipe errors
https://github.com/zeromq/libzmq/issues/2586 for information on closing sockets
https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
https://github.com/zeromq/libzmq/issues/976
Disclaimer:
I've tested this code and it seems to be working, but ZMQ does complicate testing this a fair bit because the client re-connects on failure? If anyone wants to use this solution in production, I recommend writing some basic unit tests, first.
The server code could also be improved a lot with threading or polling to be able to handle multiple clients at once. As it stands, a malicious client can temporarily take up resources from the server (3 second timeout) which isn't ideal.

IOCP AcceptEx not creating completion upon connect

I am currently trying some new libraries (IOCP) for socket programming. And I've stumbled upon the AcceptEx functionality to enable async connections.
As the documentation says:
The AcceptEx function uses overlapped I/O, unlike the accept function. If your application uses AcceptEx, it can service a large number of clients with a relatively small number of threads. As with all overlapped Windows functions, either Windows events or completion ports can be used as a completion notification mechanism.
But I am not receiving any completion when a client connects. I do, however, get a completion when the client sends data.
This is my code:
DWORD dwBytes;
GUID GuidAcceptEx = WSAID_ACCEPTEX;
int iResult = WSAIoctl(m_hSocket, SIO_GET_EXTENSION_FUNCTION_POINTER,
    &GuidAcceptEx, sizeof (GuidAcceptEx),
    &m_lpfnAcceptEx, sizeof (m_lpfnAcceptEx),
    &dwBytes, NULL, NULL);
if (iResult == SOCKET_ERROR)
{
    CloseSocket();
}
And then:
WSAOVERLAPPED olOverlap;
memset(&olOverlap, 0, sizeof (olOverlap));
char lpOutputBuf[1024];
int outBufLen = 1024;
DWORD dwBytes;
BOOL bRet = m_lpfnAcceptEx( m_hSocket, hSocket, lpOutputBuf,
    outBufLen - ((sizeof (sockaddr_in) + 16) * 2),
    sizeof (sockaddr_in) + 16, sizeof (sockaddr_in) + 16,
    &dwBytes, &olOverlap);
if ( bRet == FALSE )
{
    DWORD dwRet = WSAGetLastError();
    if( dwRet != WSA_IO_PENDING )
    {
        return dwRet;
    }
}
Any suggestion of what to do to receive completions?
EDIT:
I bind the hSocket to the completion port after m_lpfnAcceptEx()
Firstly the WSAOVERLAPPED and data buffer that you're declaring on the stack above your call to AcceptEx() will not be in existence when a completion occurs (unless you are calling GetQueuedCompletionStatus() in the same function, which would be a trifle odd). You need to dynamically allocate them or pool them.
Secondly you state that you associate the socket to the completion port after you call AcceptEx(). That's wrong. You need to do these things before you call AcceptEx().
Create a socket with WSA_FLAG_OVERLAPPED set.
Bind it to the address you want to listen on.
Call listen on it with your desired backlog.
Load AcceptEx() dynamically using the listening socket and a call to WSAIoctl (not strictly necessary, and the code you show should work, but this way you can be sure you get your listening socket from the same underlying winsock provider and that it supports AcceptEx()).
Load GetAcceptExSockaddrs() in the same way as you load AcceptEx() - you'll need it once the accept completes.
Associate the listening socket to your IOCP.
Now you can post a number of AcceptEx() calls using the listening socket and new 'accept' socket which you create like this:
Create a socket with WSA_FLAG_OVERLAPPED set.
Associate the socket to your IOCP.
As stated above you need to ensure that the buffer and the OVERLAPPED are unique per call and last until the completion occurs.
When the completion occurs you have to do the following....
Call setsockopt() with SO_UPDATE_ACCEPT_CONTEXT on the accepted socket using the listening socket as the data...
Deblock your addresses using GetAcceptExSockaddrs().
Process any data (if you allocated enough space in the buffer for data).
Note that by design AcceptEx() can be used to accept a new connection and return the initial data from that connection in one operation (this leads to slightly better performance in situations where you know you will always want some data before you can start doing things, but it is horribly complex to manage if you want to defend against the denial of service attack that can be launched simply by connecting and NOT sending data - I wrote about this here).
If you do not want AcceptEx() to wait for data to arrive then simply provide a data buffer that is ONLY big enough for the addresses to be returned and pass 0 as the 'buffer size'. This will cause the AcceptEx() to operate like an overlapped accept() and return as soon as the connection is established.
Note that Martin James' initial comment to your question is in fact the answer you're looking for. Don't pass outBufLen - ((sizeof (sockaddr_in) + 16) * 2), pass 0.
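The arithmetic behind that last point can be sketched as follows. The helper names address_slot and receive_data_length are hypothetical, and the example assumes sizeof (sockaddr_in) is 16 bytes, its usual size. It only reproduces the expression from the question, so it needs no Winsock headers:

```cpp
#include <cassert>
#include <cstddef>

// Bytes AcceptEx needs for each of the two addresses: the address
// structure plus 16 bytes, as in the question's own call.
constexpr std::size_t address_slot(std::size_t sockaddr_len)
{
    return sockaddr_len + 16;
}

// Hypothetical helper: the value the question passes as the receive data
// length, i.e. whatever is left of the buffer after both address slots.
std::size_t receive_data_length(std::size_t buffer_len, std::size_t sockaddr_len)
{
    assert(buffer_len >= 2 * address_slot(sockaddr_len));
    return buffer_len - 2 * address_slot(sockaddr_len);
}
```

With the 1024-byte buffer from the question and a 16-byte sockaddr_in this gives 960, a non-zero receive length, so the accept only completes once data arrives. A buffer of exactly 2 * (16 + 16) = 64 bytes gives 0, which is the value to pass if the completion should fire on connect.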

Winsock not sending in a while loop

I am very new to networking and have an issue with sending messages during a while loop.
To my knowledge I should do something along the lines of this:
Create Socket()
Connect()
While
Do logic
Send()
End while
Close Socket()
However, it sends once and returns -1 thereafter.
The code will only work when I create the socket in the loop.
While
Create Socket()
Connect()
Do logic
Send()
Close Socket()
End while
Here is a section of the code I am using but doesn't work:
//init winsock
WSAStartup(MAKEWORD(2, 0), &wsaData);
//open socket
sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
//connect
memset(&serveraddr, 0, sizeof(serveraddr));
serveraddr.sin_family = AF_INET;
serveraddr.sin_addr.s_addr = inet_addr(ipaddress);
serveraddr.sin_port = htons((unsigned short) port);
connect(sock, (struct sockaddr *) &serveraddr, sizeof(serveraddr));
while(true) {
    if (send(sock, request.c_str(), request.length(), 0)< 0 /*!= request.length()*/) {
        OutputDebugString(TEXT("Failed to send."));
    } else {
        OutputDebugString(TEXT("Activity sent."));
    }
    Sleep(30000);
}
//disconnect
closesocket(sock);
//cleanup
WSACleanup();
The function CheckForLastError() returns: 10053
WSAECONNABORTED
Software caused connection abort.
An established connection was aborted by the software in your host computer, possibly due to a data transmission time-out or protocol error.
Thanks
I have been looking for a solution to this problem too. I am having the same problem with my server. When trying to send a response from inside the loop, the client seems never to receive it.
As I understand the problem, according to user207421's suggestions, when you establish a connection between a client and a server, the protocol should have enough information to let the client know when the server has finished sending the response. If you see this example, you have a minimum HTTP server that responds to requests. In this case, you can use a browser or an application like Postman. And if you see the response message, you will see a header called Connection. Setting its value to close tells the client which one is the last message from the server for that request. The message is being sent, but the client keeps waiting, maybe because there is no closing element the client can recognize. I was also missing the Content-Length header. My HTTP response message was wrong, and the client was lost.
This diagram shows what needs to be outside the loop and what needs to be inside.
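As a rough sketch of that split (socket setup outside the loop, only the send inside it), the helper below sends one message and reports failure so the calling loop can stop and reconnect instead of retrying on a dead connection. It is written against POSIX sockets so that it is self-contained; send_all is a hypothetical name, and MSG_NOSIGNAL is a Linux flag with no direct Winsock counterpart. With Winsock the same shape would use send() with flags 0 and WSAGetLastError().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: sends the whole string, retrying after partial sends.
// Returns false as soon as send() reports an error, so the caller can leave
// its loop and reconnect instead of sending on a dead connection.
bool send_all(int fd, const std::string& data)
{
    assert(fd >= 0);
    std::size_t sent = 0;
    while (sent < data.size())
    {
        // MSG_NOSIGNAL (Linux) turns a broken connection into an error
        // return instead of a SIGPIPE signal.
        ssize_t n = send(fd, data.data() + sent, data.size() - sent, MSG_NOSIGNAL);
        if (n < 0)
            return false;
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```

A loop built on this would create and connect the socket once, call send_all on each iteration, and break out to close and reconnect the first time it returns false.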
To understand how and why your program fails, you have to understand the functions you use.
Some of them are blocking functions and some of them are not. Some of them need previous calls to other functions and some of them don't.
Now, from what I understand, we are talking about a client here, not a server.
The client has only non-blocking functions in this case. That means that whenever you call a function, it will be executed without waiting.
So send() will send data the second it is called and the flow will go on to the next line of code.
If the information to be sent was not yet ready, you will have a problem, since nothing will be sent.
To solve it you could use some sort of delay. The problem with delays is that they are blocking functions, meaning your flow will stop once it hits the delay. To solve it you can create a thread and lock it until the information is ready to be sent.
But that would do the job for one send(). You will send the info and that's that.
If you want to hold the connection and send info repeatedly, you will need to create a while loop. Once you have a while loop you don't have to worry about anything. That is because you can verify that the information is ready with flow control, and you can use send() over and over again before terminating the connection.
Now the question is what is happening on the server side of things?
"ipaddress" should hold the IP of the server. The server might reject your request to connect. Or worse, it might accept your request but be listening with different settings from your client, meaning that maybe the server is not receiving (does not call recv()) while you are trying to send info. That might result in errors, crashes and whatnot.

Socket is invalid while hooking WSASend/WSARecv on the server

I am hooking WSASend, and WSARecv in C++ using the same method I've used to hook the client's WSASend and WSARecv functions. In the client I am able to get the IP, Port, and Socket from the SOCKET structure passed by WSASend/WSARecv; however, for the server when I try to use getpeername or getsockname() they both return the error 10057 (Socket not connected)...
I'm fairly sure that the hook is correct on the server, since it prints the bytes successfully, and I'm also sure the socket SHOULD be valid seeing how client and server establish a successful connection.
Is there a way to resolve this problem by any other alternative methods? I've been looking around the internet to find a solution, but I haven't seen anyone with the same problem.
I've tried this:
sockaddr_in address = {};
int peer_len = sizeof(address); // in/out parameter: must start as the buffer size
if(getpeername(s, (sockaddr*)&address, &peer_len) == 0)
{
    char *Str = inet_ntoa(address.sin_addr);
    printf("[%s", Str);
    printf(":%d]",ntohs(address.sin_port));
}
else
{
    // Only read the error code after the call has reported failure.
    printf("Error %i\n",WSAGetLastError());
}
(Using both getpeername and getsockname.) Both result in the same socket not connected error.
I'm planning on using the packets the C++ dll gets and forwarding the information to the C# dll, since it'll be easier to manage there (for me anyways), but I'd need to distinguish each packet by its socket id.
You can only do that on the connected socket, i.e. the one returned from the accept() call, not on the listening "server" socket.
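A small sketch of that difference, written with POSIX sockets so it is self-contained (peer_status is a hypothetical helper name). ENOTCONN is the POSIX counterpart of Winsock's 10057 (WSAENOTCONN), and on Windows the same check would use getpeername followed by WSAGetLastError():

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>

// Hypothetical helper: returns 0 if fd has a peer, otherwise the errno
// reported by getpeername (ENOTCONN for a socket that is not connected).
int peer_status(int fd)
{
    assert(fd >= 0);
    sockaddr_storage addr{};
    socklen_t len = sizeof(addr);   // in/out: must start as the buffer size
    if (getpeername(fd, reinterpret_cast<sockaddr*>(&addr), &len) == 0)
        return 0;
    return errno;
}
```

Called on a socket that was never connected, such as a listening socket, this is expected to report ENOTCONN, while a socket returned by accept() (or either end of a connected pair) has a peer and yields 0. In the hook that means logging the socket handle passed to WSASend/WSARecv and checking that it is the accepted one.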