CSocket does not work on LAN - c++

I'm learning network programming and am trying to develop a simple socket application that uses the CSocket class from MFC. I've hit a wall and need some help.
I want the server side to listen on a certain port, for example 1001. Then another computer on the same subnet should be able to telnet to that port. My program works correctly on localhost but fails on the LAN, although I have opened that port in the listener's firewall.
Here is my code from listener:
//CListenSocket is derived from CSocket
CListenSocket myListener;
myListener.Create(1001);
myListener.Listen();
//OnAccept()
//CConnectSocket is also derived from CSocket
CConnectSocket myConnect;
myListener.Accept(myConnect);
I built the release version using VS2008, here is the screenshot of the configuration properties:
So at this stage, when I run the program, netstat -an shows this line:
TCP 0.0.0.0:1001 0.0.0.0:0 LISTENING
Then, when I run telnet 127.0.0.1 1001 on that machine, this line appears:
TCP 127.0.0.1:1001 127.0.0.1:2681 ESTABLISHED
So I think my code is correct. After that I tried from another machine on the same subnet, and the telnet failed:
Connecting To 192.168.2.199...Could not open connection to the host, on port 1001: Connect failed
Note that my listener is on 192.168.2.199 and the connector is on 192.168.2.3. Both nodes can successfully ping and share files with each other. I also added both an Inbound Rule and an Outbound Rule for the program on my firewall; here are the properties of the rule:
For more information: the listener node also has an Apache HTTP server installed, so I had the other node telnet to port 80, and that works...
So what did I miss? Please help, thank you in advance.
EDIT 1:
Troubleshooting attempt
So after banging my head on the table for a while, I decided to stop using telnet as the client and wrote a small client program to catch errors:
//CClientSocket is derived from CSocket
CClientSocket clientSocket;
clientSocket.Create();
int iConnect = clientSocket.Connect(ipAddress,1001); //ipAddress is a variable of MFC's text box on GUI.
switch (iConnect)
{
case 0:
{
DWORD errorNumber = ::GetLastError(); //catch error code
CString s_errorNumber;
s_errorNumber.Format("%d",errorNumber); //format to CString for easy echo
MessageBox("Connection fail :"+s_errorNumber);
clientSocket.ShutDown(CAsyncSocket::both);
clientSocket.Close();
}
break;
case SOCKET_ERROR:
if (::GetLastError() != WSAEWOULDBLOCK)
clientSocket.ShutDown(CAsyncSocket::both);
else
clientSocket.AsyncSelect();
break;
default:
{
MessageBox("Connection Established.");
}
break;
}
And the error number is 10061. I checked this code on MSDN and it is WSAECONNREFUSED (connection refused).
Now we know the problem here must be somewhere in the listener's firewall... still banging my head on the table.
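For reading codes like the one above without a trip to MSDN, a small lookup can help. This is only a sketch: winsock_error_name is a hypothetical helper, and the numeric values are the documented Winsock codes written as plain integers so that it compiles anywhere (on Windows you would use the WSAE* macros from winsock2.h instead).

```cpp
#include <string>

// Maps a few Winsock error numbers to their documented names.
// winsock_error_name is a hypothetical helper, not part of MFC or Winsock.
std::string winsock_error_name(int code) {
    switch (code) {
        case 10035: return "WSAEWOULDBLOCK";
        case 10048: return "WSAEADDRINUSE";
        case 10049: return "WSAEADDRNOTAVAIL";
        case 10060: return "WSAETIMEDOUT";
        case 10061: return "WSAECONNREFUSED";
        default:    return "unknown (" + std::to_string(code) + ")";
    }
}
```

Calling winsock_error_name(10061) gives "WSAECONNREFUSED". As a rough rule of thumb, a refusal means something actively rejected the connection attempt, whereas a firewall that silently drops packets would more typically show up as a timeout (10060).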

Related

Winsock sendto returns error 10049 (WSAEADDRNOTAVAIL) for broadcast address after network adapter is disabled or physically disconnected

I am working on a p2p application and to make testing simple, I am currently using udp broadcast for the peer discovery in my local network. Each peer binds one udp socket to port 29292 of the ip address of each local network interface (discovered via GetAdaptersInfo) and each socket periodically sends a packet to the broadcast address of its network interface/local address. The sockets are set to allow port reuse (via setsockopt SO_REUSEADDR), which enables me to run multiple peers on the same local machine without any conflicts. In this case there is only a single peer on the entire network though.
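For reference, the broadcast address of an interface follows from its IP and subnet mask: keep the network bits and set every host bit. A minimal sketch, assuming both values are in host byte order (broadcast_address and to_dotted are hypothetical helper names, not the application's own functions):

```cpp
#include <cstdint>
#include <string>

// Directed broadcast address: network bits from `ip`, all host bits set.
// Both arguments are expected in host byte order.
std::uint32_t broadcast_address(std::uint32_t ip, std::uint32_t subnet_mask) {
    return ip | ~subnet_mask;
}

// Formats a host-byte-order IPv4 address as dotted decimal.
std::string to_dotted(std::uint32_t ip) {
    return std::to_string((ip >> 24) & 0xFF) + "." +
           std::to_string((ip >> 16) & 0xFF) + "." +
           std::to_string((ip >> 8) & 0xFF) + "." +
           std::to_string(ip & 0xFF);
}
```

For an interface at 192.168.178.35 with mask 255.255.255.0 this yields 192.168.178.255.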
This all works perfectly fine (tested with 2 peers on 1 machine and 2 peers on 2 machines) UNTIL a network interface is disconnected. When deactivating the network adapter of either my wifi or a USB-to-LAN adapter in the Windows dialog, or just unplugging the USB cable of the adapter, the next call to sendto will fail with error code 10049. It doesn't matter if the other adapter is still connected, or was at the beginning, it will fail. The only thing that doesn't make it fail is deactivating wifi through the fancy win10 dialog through the taskbar, but that isn't really a surprise because that doesn't deactivate or remove the adapter itself.
I initially thought that this makes sense because when the NIC is gone, how should the system route the packet? But: the fact that the packet can't reach its target has absolutely nothing to do with the address itself being invalid (which is what the error means), so I suspect I am missing something here. I was looking for any information I could use to detect this case and distinguish it from simply trying to sendto INADDR_ANY, but I couldn't find anything. I started to log every bit of information which I suspected could have changed, but it's all the same on a successful sendto and on the one that fails (retrieved via getsockopt):
250 16.24746[886] [debug|debug] local address: 192.168.178.35
251 16.24812[886] [debug|debug] no remote address
252 16.25333[886] [debug|debug] type: SOCK_DGRAM
253 16.25457[886] [debug|debug] protocol: IPPROTO_UDP
254 16.25673[886] [debug|debug] broadcast: 1, dontroute: 0, max_msg_size: 65507, rcv_buffer: 65536, rcv_timeout: 0, reuse_addr: 1, snd_buffer: 65536, sdn_timeout: 0
255 16.25806[886] [debug|debug] Last WSA error on socket was WSA Error Code 0: The operation completed successfully.
256 16.25916[886] [debug|debug] target address windows formatted: 192.168.178.255
257 16.25976[886] [debug|debug] target address 192.168.178.255:29292
258 16.26138[886] [debug|assert] ASSERT FAILED at D:\Workspaces\spaced\source\platform\win32_platform.cpp:4141: sendto failed with (unhandled) WSA Error Code 10049: The requested address is not valid in its context.
The nic that got removed is this one:
1.07254[0] [platform|info] Discovered Network Interface "Realtek USB GbE Family Controller" with IP 192.168.178.35 and Subnet 255.255.255.0
And this is the code that does the sending (dlog_socket_information_and_last_wsaerror generates all the output that is gathered using getsockopt):
void send_slice_over_udp_socket(Socket_Handle handle, Slice<d_byte> buffer, u32 remote_ip, u16 remote_port){
PROFILE_FUNCTION();
auto socket = (UDP_Socket*) sockets[handle.handle];
ASSERT_VALID_UDP_SOCKET(socket);
dlog_socket_information_and_last_wsaerror(socket);
if(socket->is_dummy)
return;
if(buffer.size == 0)
return;
DASSERT(socket->state == Socket_State::created);
u64 bytes_left = buffer.size;
sockaddr_in target_socket_address = create_socket_address(remote_ip, remote_port);
#pragma warning(push)
#pragma warning(disable: 4996)
dlog("target address windows formatted: %s", inet_ntoa(target_socket_address.sin_addr));
#pragma warning(pop)
unsigned char* parts = (unsigned char*)&remote_ip;
dlog("target address %hhu.%hhu.%hhu.%hhu:%hu", parts[3], parts[2], parts[1], parts[0], remote_port);
int sent_bytes = sendto(socket->handle, (char*) buffer.data, bytes_left > (u64) INT32_MAX ? INT32_MAX : (int) bytes_left, 0, (sockaddr*)&target_socket_address, sizeof(target_socket_address));
if(sent_bytes == SOCKET_ERROR){
#define LOG_WARNING(message) log_nonreproducible(message, Category::platform_network, Severity::warning, socket->handle); return;
switch(WSAGetLastError()){
//#TODO handle all (more? I guess many should just be asserted since they should never happen) cases
case WSAEHOSTUNREACH: LOG_WARNING("socket %lld, send failed: The remote host can't be reached at this time.");
case WSAECONNRESET: LOG_WARNING("socket %lld, send failed: Multiple UDP packet deliveries failed. According to documentation we should close the socket. Not sure if this makes sense, this is a UDP port after all. Closing the socket wont change anything, right?");
case WSAENETUNREACH: LOG_WARNING("socket %lld, send failed: the network cannot be reached from this host at this time.");
case WSAETIMEDOUT: LOG_WARNING("socket %lld, send failed: The connection has been dropped, because of a network failure or because the system on the other end went down without notice.");
case WSAEADDRNOTAVAIL:
case WSAENETRESET:
case WSAEACCES:
case WSAEWOULDBLOCK: //can this even happen on a udp port? I expect this to be fire-and-forget-style.
case WSAEMSGSIZE:
case WSANOTINITIALISED:
case WSAENETDOWN:
case WSAEINVAL:
case WSAEINTR:
case WSAEINPROGRESS:
case WSAEFAULT:
case WSAENOBUFS:
case WSAENOTCONN:
case WSAENOTSOCK:
case WSAEOPNOTSUPP:
case WSAESHUTDOWN:
case WSAECONNABORTED:
case WSAEAFNOSUPPORT:
case WSAEDESTADDRREQ:
ASSERT(false, tprint_last_wsa_error_as_formatted_message("sendto failed with (unhandled) ")); break;
default: ASSERT(false, tprint_last_wsa_error_as_formatted_message("sendto failed with (undocumented) ")); //The switch case above should have been exhaustive. This is a bug. We either forgot a case, or maybe the docs were lying? (That happened to me on android. Fun times. Well. Not really.)
}
#undef LOG_WARNING
}
DASSERT(sent_bytes >= 0);
total_bytes_sent += (u64) sent_bytes;
bytes_left -= (u64) sent_bytes;
DASSERT(bytes_left == 0);
}
The code that generates the address from ip and port looks like this:
sockaddr_in create_socket_address(u32 ip, u16 port){
sockaddr_in address_info;
address_info.sin_family = AF_INET;
address_info.sin_port = htons(port);
address_info.sin_addr.s_addr = htonl(ip);
memset(address_info.sin_zero, 0, 8);
return address_info;
}
The error seems to be a little flaky. It reproduces 100% of the time until it decides not to anymore. After a restart it's usually back.
I am looking for a solution to handle this case correctly. I could of course just re-do the network interface discovery when the error occurs, because I "know" that I don't give any broken IPs to sendto, but that would just be a heuristic. I want to solve the actual problem.
I also don't quite understand when error 10049 is supposed to fire exactly anyway. Is it just if I pass an ipv6 address to an ipv4 socket, or send to 0.0.0.0? There is no flat out "illegal" ipv4 address after all, just ones that don't make sense from context.
If you know what I am missing here, please let me know!
This is an issue people have been facing for a while, and the usual suggestion is to read the documentation provided by Microsoft on the following issue.
I don't know whether it is the same issue or not, but the error code returned is the same, which is why I have attached the link:
https://learn.microsoft.com/en-us/answers/questions/537493/binding-winsock-shortly-after-boot-results-in-erro.html
I found a solution (workaround?)
I used NotifyAddrChange to receive changes to the NICs and thought it for some reason didn't trigger when I disabled the NIC. Turns out it does, I'm just stupid and stopped debugging too early: There was a bug in the code that diffs the results from GetAdaptersInfo to the last known state to figure out the differences, so the application missed the NIC disconnecting. Now that it observes the disconnect, it can kill the sockets before they try to send on the disabled NIC, thus preventing the error from happening. This is not really a solution though, since there is a race condition here (NIC gets disabled before send and after check for changes), so I'll still have to handle error 10049.
The bug was this:
My expectation was that, when I disable a NIC, iterating over all existing NICs would show the disabled NIC as disabled. That is not what happens. What happens is that the NIC is just not in the list of existing NICs anymore, even though the Windows dialog will still show it (as disabled). That is somewhat surprising to me but not all that unreasonable I guess.
Before I had these checks to detect changes in the NICs:
Did the NIC exist before, was enabled and is now disabled -> disable notification
Did the NIC exist before, was disabled and is now enabled -> enable notification
Did the NIC not exist before and is now enabled -> enable notification
And the fix was adding a fourth one:
Is there a previously known NIC that is not in the list of NICs anymore -> disable notification
I'm still not 100% happy that there is the possibility of getting a somewhat ambiguous error on a race condition, but I might call it a day here.
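The four checks above can be sketched as a diff between two snapshots. This is an illustration only, not my actual code: NicSnapshot, NicChange and diff_nics are hypothetical names, and a snapshot is assumed to be a map from adapter name to an enabled flag filled from each enumeration pass.

```cpp
#include <map>
#include <string>
#include <vector>

enum class NicEvent { enabled, disabled };

struct NicChange {
    std::string name;
    NicEvent event;
};

// Adapter name -> enabled flag, one snapshot per enumeration pass (assumed shape).
using NicSnapshot = std::map<std::string, bool>;

std::vector<NicChange> diff_nics(const NicSnapshot& before, const NicSnapshot& now) {
    std::vector<NicChange> changes;
    for (const auto& entry : now) {
        auto it = before.find(entry.first);
        const bool enabled_before = (it != before.end()) && it->second;
        if (entry.second && !enabled_before) {
            // existed but was disabled, or did not exist at all: enable notification
            changes.push_back({entry.first, NicEvent::enabled});
        } else if (!entry.second && enabled_before) {
            // still listed, but now reported as disabled: disable notification
            changes.push_back({entry.first, NicEvent::disabled});
        }
    }
    for (const auto& entry : before) {
        // the fourth check: an enabled NIC that vanished from the list entirely
        if (entry.second && now.find(entry.first) == now.end()) {
            changes.push_back({entry.first, NicEvent::disabled});
        }
    }
    return changes;
}
```

With a "before" snapshot containing two enabled adapters and a "now" snapshot containing only one of them, the result is a single disable notification for the missing adapter.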

FTP NLST results in '425: Can't open data connection for transfer' only on some client machines

I'm currently running a FileZilla FTP server on a network. My issue is that on seemingly random machines, when the user navigates to a directory (which they are able to do) and attempts to ls (i.e. a data transfer), their end hangs waiting for a response, while the server reports the 425: Can't open data connection for transfer error mentioned above. The result varies depending on the client machine used: some (either local or remote) are able to proceed and others get stuck here. I understand that this is because simple FTP commands like CWD operate on the 20/21 ports, whereas FTP data transfers operate on some other port number, which in turn may be blocked by a firewall somewhere along the chain. My question is, how do I account for these varying ports (if this truly is the issue), since as best I know they could be anything above 1024?
My end goal with this project is to implement a very simple FTP solution, ideally using WinINet, however, so far I've run into the same problem:
BOOL CWebFileFinder::FindFile(const CString& URL)
{
CString ServerName;
CString strObject;
INTERNET_PORT nPort;
DWORD dwServiceType = AFX_INET_SERVICE_FTP;
try
{
if (AfxParseURL(URL, dwServiceType, ServerName, strObject, nPort))
{
m_Connection = m_Session.GetFtpConnection(ServerName, m_Username, m_Password, nPort/*, true*/); // results in findfile still failing
if (m_Connection)
{
m_Connection->SetCurrentDirectory("sms"); // CDs into this dir
m_Finder = new CFtpFileFind(m_Connection);
if (m_Finder)
{
More = m_Finder->FindFile(_T("*.*")); // hangs here
}
}
}
}
catch (CException* pEx)
{
CString str;
LPTSTR error = str.GetBuffer(255);
pEx->GetErrorMessage(error, 255);
pEx->Delete();
str.ReleaseBuffer();
}
return More;
}
As far as I can see, either I need to call to open this data port prior to the LIST, or find the firewalls blocking these ports and create a rule to prevent that (What ports does Wininet listen on for Active FTP data connection?). Of course I could also be just completely off-base – Any insights at all would be greatly appreciated!
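On the varying ports: in passive mode the client does not have to guess, because the server announces the data port in its reply to PASV (RFC 959) as six comma-separated numbers, the last two being the high and low byte of the port. A small sketch of the decoding, with pasv_data_port as a hypothetical helper name and the sample reply made up for illustration:

```cpp
#include <cstdio>
#include <string>

// Extracts the data port from a PASV reply such as
// "227 Entering Passive Mode (192,168,2,199,19,137)".
// Returns -1 if the reply does not contain the six numbers.
int pasv_data_port(const std::string& reply) {
    const std::size_t open = reply.find('(');
    if (open == std::string::npos) return -1;
    int h1, h2, h3, h4, p1, p2;
    if (std::sscanf(reply.c_str() + open, "(%d,%d,%d,%d,%d,%d",
                    &h1, &h2, &h3, &h4, &p1, &p2) != 6) {
        return -1;
    }
    if (p1 < 0 || p1 > 255 || p2 < 0 || p2 > 255) return -1;
    return p1 * 256 + p2;  // high byte, low byte
}
```

For the sample reply above the port is 19 * 256 + 137 = 5001. Many FTP servers let the administrator restrict the passive port range, which makes a firewall rule for the data connections practical.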
Your FTP server seems to require an encrypted connection (TLS/SSL).
WinInet does not support encrypted FTP.
See C++/Win32 The basics of FTP security and using SSL.

C++ OpenSSL Fails to perform handshake when accepting in non-blocking mode. What is the proper way?

I'm trying to implement OpenSSL into my application which uses raw C sockets and the only issue I'm having is the SSL_accept / SSL_connect part of the code which starts the KeyExchange phase but does not seem to complete it on the serverside.
I've had a look at countless websites and Q&A's here on StackOverflow to get myself through the OpenSSL API since this is basically the first time I'm attempting to implement SSL into an application but the only thing I could not find yet was how to properly manage failed handshakes.
Basically, running process A which serves as a server will listen for incoming connections. Once I run process B, which acts as a client, it will successfully connect to process A, but SSL_accept (on the server) fails with return value -1, and SSL_get_error() reports 2 (SSL_ERROR_WANT_READ).
According to openssl handshake failed, the problem is "easily" worked around by calling SSL_accept within a loop until it finally returns 1 (It successfully connects and completes the handshake). However, I do not believe that this is the proper way of doing things as it looks like a dirty trick. The reason for why I believe it is a dirty trick is because I tried to run a small application I found on https://www.cs.utah.edu/~swalton/listings/articles/ (ssl_client and ssl_server) and magically, everything works just fine. There are no multiple calls to SSL_accept and the handshake is completed right away.
Here's some code where I'm accepting the SSL connection on the server:
if (SSL_accept(conn.ssl) == -1)
{
fprintf(stderr, "Connection failed.\n");
fprintf(stderr, "SSL State: %s [%d]\n", SSL_state_string_long(conn.ssl), SSL_state(conn.ssl));
ERR_print_errors_fp(stderr);
PrintSSLError(conn.ssl, -1, "SSL_accept");
return -1;
}
else
{
fprintf(stderr, "Connection accepted.\n");
fprintf(stderr, "Server -> Client handshake completed");
}
This is the output of PrintSSLError:
SSL State: SSLv3 read client hello B [8465]
[DEBUG] SSL_accept : Failed with return -1
[DEBUG] SSL_get_error() returned : 2
[DEBUG] Error string : error:00000002:lib(0):func(0):system lib
[DEBUG] ERR_get_error() returned : 0
[DEBUG] errno returned : Resource temporarily unavailable
And here's the client side snippet which connects to the server:
if (SSL_connect(conn.ssl) == -1)
{
fprintf(stderr, "Connection failed.\n");
ERR_print_errors_fp(stderr);
PrintSSLError(conn.ssl, -1, "SSL_connect");
return -1;
}
else
{
fprintf(stderr, "Connection established.\n");
fprintf(stderr, "Client -> Server handshake completed");
PrintSSLInfo(conn.ssl);
}
The connection is successfully established client-side (SSL_connect does not return -1) and PrintSSLInfo outputs:
Connection established.
Cipher: DHE-RSA-AES256-GCM-SHA384
SSL State: SSL negotiation finished successfully [3]
And this is how I wrap the C Socket into SSL:
SSLConnection conn;
conn.fd = fd;
conn.ctx = sslContext;
conn.ssl = SSL_new(conn.ctx);
SSL_set_fd(conn.ssl, conn.fd);
The code snippet here resides within a function that takes a file-descriptor of the accepted incoming connection on the raw socket and the SSL Context to use.
To initialize the SSL Contexts I use TLSv1_2_server_method() and TLSv1_2_client_method(). Yes, I know that this will prevent clients from connecting if they do not support TLS 1.2 but this is exactly what I want. Whoever connects to my application will have to do it through my client anyway.
Either way, what am I doing wrong? I'd like to avoid loops in the authentication phase to avoid possible hang ups/slow downs of the application due to unexpected infinite loops since OpenSSL does not specify how many attempts it might take.
The workaround that worked, but that I'd like to avoid, is this:
while ((accept = SSL_accept(conn.ssl)) != 1)
And inside the while loop I check for the return code stored inside accept.
Things I've tried to workaround the SSL_ERROR_WANT_READ error:
Added usleep(50) inside the while loop (still takes several cycles to complete)
Added SSL_do_handshake(conn.ssl) after SSL_connect and SSL_accept (didn't change anything on the end-result)
Had a look at the code shown on roxlu.com (search on Google for "Using OpenSSL with memory BIOs - Roxlu") to guide me through the handshaking phase but since I'm new to this, and I don't directly use BIOs in my code but simply wrap my native C sockets into SSL, it was kind of confusing. I'm also unable to re-write the Networking part of the application as it would be too much work for me right now.
I've done some tests with the openssl command-line as well to troubleshoot the issue but it gives no error. The handshake appears to be successful as no errors such as:
24069864:error:1409E0E5:SSL routines:ssl3_write_bytes:ssl handshake failure:s3_pkt.c:656
appear. Here's the whole output of the command
openssl s_client -connect IP:Port -tls1_2 -prexit -msg
http://pastebin.com/9u1bfuf4
Things to note:
1. I'm using the latest OpenSSL version 1.0.2h
2. Application runs on a Unix system
3. Using self-signed certificates to encrypt the network traffic
Thanks everyone who's going to help me out.
Edit:
I forgot to mention that the sockets are in non-blocking mode since the application serves multiple clients in one-go. Though, client-side they are in blocking mode.
Edit2:
Leaving this here for future reference: jmarshall.com/stuff/handling-nbio-errors-in-openssl.html
You have clarified that the socket in question is non-blocking.
Well, that's your answer. Obviously, when the socket is in a non-blocking mode, the handshake cannot be immediately completed. The handshake involves an exchange of protocol packets between the client and the server, with each one having to wait to receive the response from its peer. This works fine when the socket is in its default blocking mode. The library simply read()s and write()s, which blocks and waits until the message gets successfully read or written. This obviously can't happen when the socket is in the non-blocking mode. Either the read() or write() immediately succeeds, or fails, if there's nothing to read or if the socket's output buffer is full.
The manual pages for SSL_accept() and SSL_connect() explain the procedure you must implement to execute the SSL handshake when the underlying socket is in a non-blocking mode. Rather than repeating the whole thing here, you should read the manual pages yourself. The capsule summary is to use SSL_get_error() to determine if the handshake actually failed, or if the library wants to read or write to/from the socket; and in that eventuality call poll() or select(), accordingly, then call SSL_accept() or SSL_connect() again.
Any other approach, like sprinkling silly sleep() calls, here and there, will result in an unreliable house of cards, that will fail randomly.
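The procedure from the manual pages can be sketched roughly as follows. To keep the sketch compilable without the OpenSSL headers, the handshake attempt is passed in as a callable: drive_handshake, step and the kErr* constants are hypothetical names, and the constants repeat the numeric values of SSL_ERROR_NONE, SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE. The timeout per wait is my own addition, so that a stalled peer cannot keep the loop going forever.

```cpp
#include <poll.h>
#include <functional>

// Same numeric values as OpenSSL's SSL_ERROR_NONE / SSL_ERROR_WANT_READ /
// SSL_ERROR_WANT_WRITE, repeated here so the sketch builds without OpenSSL.
constexpr int kErrNone = 0;
constexpr int kErrWantRead = 2;
constexpr int kErrWantWrite = 3;

// `step` makes one handshake attempt and returns the SSL_get_error() code.
// With OpenSSL it would be something like:
//   [&] { int r = SSL_accept(ssl); return SSL_get_error(ssl, r); }
// Returns true once the handshake is complete, false on a real failure,
// a poll() error, or when the peer stays silent for `timeout_ms`.
bool drive_handshake(int fd, const std::function<int()>& step, int timeout_ms) {
    for (;;) {
        const int err = step();
        if (err == kErrNone) return true;

        pollfd pfd{};
        pfd.fd = fd;
        if (err == kErrWantRead) {
            pfd.events = POLLIN;    // library needs more bytes from the peer
        } else if (err == kErrWantWrite) {
            pfd.events = POLLOUT;   // library needs room in the send buffer
        } else {
            return false;           // the handshake actually failed
        }

        // Sleep until the socket is ready instead of spinning or usleep()ing.
        if (poll(&pfd, 1, timeout_ms) <= 0) return false;
    }
}
```

In a server that multiplexes many clients you would more likely register the descriptor with the main event loop and retry SSL_accept() when it fires, rather than block in poll() per connection; the retry logic stays the same.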

Libnodave - daveStart() Error using TCP Connection

I have established connection to a Siemens S7-300 PLC (simulated via PlcSIM) using the libnodave library. There are no issues connecting and writing data to the PLC. However, I am unable to change the status of the PLC from Start/Stop. I am attempting to use the following libnodave methods for such actions:
int daveStatus = daveStart(dc);
int daveStatus = daveStop(dc);
Both function calls return the same Error: 33794
nodave.c Cites the error as the following:
case 0x8402: return "CPU already in RUN or already in STOP ?";
The use of the daveStart() and daveStop() functions can be viewed in the example testS7online.c:
if(doStop) {
daveStop(dc);
}
if(doRun) {
daveStart(dc);
}
In the examples the start/stop functions are only called when MPI connections to the PLC are made. Does anyone know if the start/stop functions are supported for use with TCP connections? If so, any suggestions as to what may be causing my error?
I have just tried dc.start() and dc.stop() using libnodave 8.4 and the NetToPlcSim tool. It worked perfectly. Possibly you are not using the NetToPlcSim tool, which makes the connection to PLCSim via TCP/IP (that is 127.0.0.1 port 102, obviously), hence dc can't even connect. So if your lines don't work, then you must be doing something wrong.

Socket in use error when reusing sockets

I am writing an XMLRPC client in c++ that is intended to talk to a python XMLRPC server.
Unfortunately, at this time, the python XMLRPC server is only capable of fielding one request on a connection, then it shuts down. I discovered this thanks to mhawke's response to my previous query about a related subject.
Because of this, I have to create a new socket connection to my python server every time I want to make an XMLRPC request. This means the creation and deletion of a lot of sockets. Everything works fine, until I approach ~4000 requests. At this point I get socket error 10048, Socket in use.
I've tried sleeping the thread to let winsock fix its file descriptors, a trick that worked when a python client of mine had an identical issue, to no avail.
I've tried the following
int err = setsockopt(s_,SOL_SOCKET,SO_REUSEADDR,(char*)TRUE,sizeof(BOOL));
with no success.
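One thing worth noting about the call above: setsockopt() reads the option value through the pointer it is given, so the pointer has to be the address of a variable; casting the constant TRUE passes the address 0x1 instead. A minimal sketch of the usual form, written against POSIX sockets so that it builds outside Windows (enable_reuseaddr and reuseaddr_enabled are hypothetical helper names; with Winsock the value pointer is cast to const char*):

```cpp
#include <sys/socket.h>

// Enables SO_REUSEADDR on `fd`; returns true on success.
bool enable_reuseaddr(int fd) {
    int on = 1;  // the option value lives in a variable, and we pass its address
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == 0;
}

// Reads the option back; returns true if it is currently set.
bool reuseaddr_enabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) != 0) return false;
    return value != 0;
}
```

Even when set correctly, SO_REUSEADDR only affects bind() on a local address, so on its own it may well not help with ports that connect() picks automatically.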
I'm using winsock 2.0, so WSADATA::iMaxSockets shouldn't come into play, and either way, I checked and it's set to 0 (I assume that means infinity).
4000 requests doesn't seem like an outlandish number of requests to make during the run of an application. Is there some way to use SO_KEEPALIVE on the client side while the server continually closes and reopens?
Am I totally missing something?
The problem is being caused by sockets hanging around in the TIME_WAIT state which is entered once you close the client's socket. By default the socket will remain in this state for 4 minutes before it is available for reuse. Your client (possibly helped by other processes) is consuming them all within a 4 minute period. See this answer for a good explanation and a possible non-code solution.
Windows dynamically allocates port numbers in the range 1024-5000 (3977 ports) when you do not explicitly bind the socket address. This Python code demonstrates the problem:
import socket
sockets = []
while True:
    s = socket.socket()
    try:
        s.connect(('some_host', 80))
    except socket.error:
        # no dynamic port left (or the host is unreachable): stop opening sockets
        s.close()
        break
    sockets.append(s.getsockname())
    s.close()
print(len(sockets))
sockets.sort()
print("Lowest port: ", sockets[0][1], " Highest port: ", sockets[-1][1])
# on Windows you should see something like this...
# 3960
# Lowest port: 1025 Highest port: 5000
If you try to run this immediately again, it should fail very quickly since all dynamic ports are in the TIME_WAIT state.
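The numbers above give a rough ceiling on how fast a client can open and close connections without running out of ports. A back-of-the-envelope sketch, assuming every closed connection holds its port for the full TIME_WAIT period and nothing else on the machine uses dynamic ports (max_sustained_rate is a hypothetical helper name):

```cpp
// Highest connection rate (per second) that never exhausts the pool when each
// closed connection keeps its port busy for `time_wait_seconds`.
double max_sustained_rate(int dynamic_ports, int time_wait_seconds) {
    return static_cast<double>(dynamic_ports) / time_wait_seconds;
}

// True if opening `rate` connections per second eventually runs out of ports.
bool exhausts_pool(double rate, int dynamic_ports, int time_wait_seconds) {
    return rate > max_sustained_rate(dynamic_ports, time_wait_seconds);
}
```

With 3977 ports and a 4 minute (240 second) wait this comes to about 16.6 connections per second; anything faster than that, sustained, ends in error 10048 after roughly 4000 connections.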
There are a few ways around this:
Manage your own port assignments and use bind() to explicitly bind your client socket to a specific port that you increment each time you create a socket. You'll still have to handle the case where a port is already in use, but you will not be limited to dynamic ports. e.g.
port = 5000
while True:
    s = socket.socket()
    s.bind(('your_host', port))
    s.connect(('some_host', 80))
    s.close()
    port += 1
Fiddle with the SO_LINGER socket option. I have found that this sometimes works in Windows (although not exactly sure why):
s.setsockopt(socket.SOL_SOCKET,
socket.SO_LINGER, 1)
I don't know if this will help in your particular application, however, it is possible to send multiple XMLRPC requests over the same connection using the multicall method. Basically this allows you to accumulate several requests and then send them all at once. You will not get any responses until you actually send the accumulated requests, so you can essentially think of this as batch processing - does this fit in with your application design?
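Since the client in question is C++, here is a rough sketch of the first two options with POSIX-style sockets. Treat it as an illustration under assumptions: make_client_socket is a hypothetical helper, and the linger setting shown (on, zero seconds) requests an abortive close, which sends a reset instead of a normal shutdown and discards any unsent data.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Creates a TCP client socket bound to `local_port` on any local address
// (0 lets the OS choose). With `abortive_close`, SO_LINGER is set to
// {on, 0 seconds}, so close() resets the connection and skips TIME_WAIT.
// Returns the descriptor, or -1 on failure.
int make_client_socket(std::uint16_t local_port, bool abortive_close) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    if (abortive_close) {
        struct linger lg{};
        lg.l_onoff = 1;
        lg.l_linger = 0;
        if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) != 0) {
            close(fd);
            return -1;
        }
    }

    sockaddr_in local{};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(local_port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&local), sizeof(local)) != 0) {
        close(fd);  // e.g. the port is already in use: caller can try the next one
        return -1;
    }
    return fd;
}
```

The caller would then connect() the returned descriptor as usual and, when managing ports itself, retry with the next port number whenever the function returns -1.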
Update:
I tossed this into the code and it seems to be working now.
if(::connect(s_, (sockaddr *) &addr, sizeof(sockaddr)))
{
int err = WSAGetLastError();
if(err == 10048) //if socket in use error, force kill and reopen socket
{
closesocket(s_);
WSACleanup();
WSADATA info;
WSAStartup(MAKEWORD(2,0), &info);
s_ = socket(AF_INET,SOCK_STREAM,0);
setsockopt(s_,SOL_SOCKET,SO_REUSEADDR,(char*)&x,sizeof(BOOL));
}
}
Basically, if you encounter the 10048 error (socket in use), you can simply close the socket, call cleanup, and restart WSA, then reset the socket and its sockopt (the last sockopt may not be necessary).
I must have been missing the WSACleanup/WSAStartup calls before, because closesocket() and socket() were definitely being called.
This error only occurs once every 4000ish calls.
I am curious as to why this may be, even though this seems to fix it.
If anyone has any input on the subject, I would be very curious to hear it.
Do you close the sockets after using them?