Why is the OS changing the assigned outgoing port of my packets? - c++

My C++ software creates SYN packets (using Boost) to my server with specific outgoing ports (chosen according to the IANA port-assignment standards).
I am picking the outgoing ports for internal purposes.
After checking my application on many machines, I am seeing the following issue on one specific machine:
The outgoing port actually used isn't the one I assigned - it looks like the OS (Windows 10) is changing it.
What could be the issue?
Below is the relevant code I am using to assign a specific outgoing port:
std::string exceptionFormat = "exception. Error message: ";

error_code socket_set_option_error_code;
socket->set_option(tcp::socket::reuse_address(true), socket_set_option_error_code);
if (socket_set_option_error_code) {
    throw SocketException("Got socket reuse set option " + exceptionFormat + socket_set_option_error_code.message());
}

const auto source_endpoint = tcp::endpoint(tcp::v4(), source_port);

error_code bind_socket_error_code;
socket->bind(source_endpoint, bind_socket_error_code);
if (bind_socket_error_code) {
    throw SocketException("Got socket bind " + exceptionFormat + bind_socket_error_code.message());
}

Apparently, there were two antivirus products installed on the machine, and one of them (Kaspersky) was changing the outgoing port.

The packets might also be flowing through a NAT (NAPT) module or a firewall, which is another common reason for port numbers to change.
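One way to narrow this down is to compare the port the socket reports locally with the port the server sees. Below is a minimal, self-contained sketch (not the original code): the source port 50000 and the server address 192.0.2.10:443 are placeholders. If local_endpoint() still reports the assigned port after connect, the local stack kept your port and the rewrite happens further along the path (a filter driver such as the antivirus, NAT, or a firewall).

// Hypothetical diagnostic: does the local stack keep the port we bound?
#include <boost/asio.hpp>
#include <iostream>

int main() {
    using boost::asio::ip::tcp;
    boost::asio::io_context io;

    const unsigned short source_port = 50000;   // placeholder source port
    tcp::socket socket(io);
    socket.open(tcp::v4());
    socket.set_option(tcp::socket::reuse_address(true));
    socket.bind(tcp::endpoint(tcp::v4(), source_port));

    // Placeholder server address/port.
    socket.connect(tcp::endpoint(boost::asio::ip::make_address_v4("192.0.2.10"), 443));

    // If this still prints 50000, the OS socket layer did not change the port;
    // compare it with what the server-side capture shows.
    std::cout << "local port in use: " << socket.local_endpoint().port() << "\n";
}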

Related

Winsock sendto returns error 10049 (WSAEADDRNOTAVAIL) for broadcast address after network adapter is disabled or physically disconnected

I am working on a p2p application and to make testing simple, I am currently using udp broadcast for the peer discovery in my local network. Each peer binds one udp socket to port 29292 of the ip address of each local network interface (discovered via GetAdaptersInfo) and each socket periodically sends a packet to the broadcast address of its network interface/local address. The sockets are set to allow port reuse (via setsockopt SO_REUSEADDR), which enables me to run multiple peers on the same local machine without any conflicts. In this case there is only a single peer on the entire network though.
This all works perfectly fine (tested with 2 peers on 1 machine and 2 peers on 2 machines) UNTIL a network interface is disconnected. When deactivating the network adapter of either my wifi or a USB-to-LAN adapter in the Windows dialog, or just unplugging the USB cable of the adapter, the next call to sendto will fail with return code 10049. It doesn't matter if the other adapter is still connected, or was at the beginning; it will fail. The only thing that doesn't make it fail is deactivating wifi through the fancy Win10 dialog in the taskbar, but that isn't really a surprise because that doesn't deactivate or remove the adapter itself.
I initially thought that this makes sense because when the NIC is gone, how should the system route the packet. But: the fact that the packet can't reach its target has absolutely nothing to do with the address itself being invalid (which is what the error means), so I suspect I am missing something here. I was looking for any information I could use to detect this case and distinguish it from simply trying to sendto INADDR_ANY, but I couldn't find anything. I started to log every bit of information which I suspected could have changed, but it's all the same on a successful sendto and the one that crashes (retrieved via getsockopt):
250 16.24746[886] [debug|debug] local address: 192.168.178.35
251 16.24812[886] [debug|debug] no remote address
252 16.25333[886] [debug|debug] type: SOCK_DGRAM
253 16.25457[886] [debug|debug] protocol: IPPROTO_UDP
254 16.25673[886] [debug|debug] broadcast: 1, dontroute: 0, max_msg_size: 65507, rcv_buffer: 65536, rcv_timeout: 0, reuse_addr: 1, snd_buffer: 65536, sdn_timeout: 0
255 16.25806[886] [debug|debug] Last WSA error on socket was WSA Error Code 0: The operation completed successfully.
256 16.25916[886] [debug|debug] target address windows formatted: 192.168.178.255
257 16.25976[886] [debug|debug] target address 192.168.178.255:29292
258 16.26138[886] [debug|assert] ASSERT FAILED at D:\Workspaces\spaced\source\platform\win32_platform.cpp:4141: sendto failed with (unhandled) WSA Error Code 10049: The requested address is not valid in its context.
The nic that got removed is this one:
1.07254[0] [platform|info] Discovered Network Interface "Realtek USB GbE Family Controller" with IP 192.168.178.35 and Subnet 255.255.255.0
And this is the code that does the sending (dlog_socket_information_and_last_wsaerror generates all the output that is gathered using getsockopt):
void send_slice_over_udp_socket(Socket_Handle handle, Slice<d_byte> buffer, u32 remote_ip, u16 remote_port){
    PROFILE_FUNCTION();
    auto socket = (UDP_Socket*) sockets[handle.handle];
    ASSERT_VALID_UDP_SOCKET(socket);
    dlog_socket_information_and_last_wsaerror(socket);

    if(socket->is_dummy)
        return;

    if(buffer.size == 0)
        return;

    DASSERT(socket->state == Socket_State::created);
    u64 bytes_left = buffer.size;

    sockaddr_in target_socket_address = create_socket_address(remote_ip, remote_port);

#pragma warning(push)
#pragma warning(disable: 4996)
    dlog("target address windows formatted: %s", inet_ntoa(target_socket_address.sin_addr));
#pragma warning(pop)

    unsigned char* parts = (unsigned char*)&remote_ip;
    dlog("target address %hhu.%hhu.%hhu.%hhu:%hu", parts[3], parts[2], parts[1], parts[0], remote_port);

    int sent_bytes = sendto(socket->handle, (char*) buffer.data, bytes_left > (u64) INT32_MAX ? INT32_MAX : (int) bytes_left, 0, (sockaddr*)&target_socket_address, sizeof(target_socket_address));

    if(sent_bytes == SOCKET_ERROR){
        #define LOG_WARNING(message) log_nonreproducible(message, Category::platform_network, Severity::warning, socket->handle); return;
        switch(WSAGetLastError()){
        //#TODO handle all (more? I guess many should just be asserted since they should never happen) cases
        case WSAEHOSTUNREACH: LOG_WARNING("socket %lld, send failed: The remote host can't be reached at this time.");
        case WSAECONNRESET:   LOG_WARNING("socket %lld, send failed: Multiple UDP packet deliveries failed. According to documentation we should close the socket. Not sure if this makes sense, this is a UDP port after all. Closing the socket wont change anything, right?");
        case WSAENETUNREACH:  LOG_WARNING("socket %lld, send failed: the network cannot be reached from this host at this time.");
        case WSAETIMEDOUT:    LOG_WARNING("socket %lld, send failed: The connection has been dropped, because of a network failure or because the system on the other end went down without notice.");
        case WSAEADDRNOTAVAIL:
        case WSAENETRESET:
        case WSAEACCES:
        case WSAEWOULDBLOCK: //can this even happen on a udp port? I expect this to be fire-and-forget-style.
        case WSAEMSGSIZE:
        case WSANOTINITIALISED:
        case WSAENETDOWN:
        case WSAEINVAL:
        case WSAEINTR:
        case WSAEINPROGRESS:
        case WSAEFAULT:
        case WSAENOBUFS:
        case WSAENOTCONN:
        case WSAENOTSOCK:
        case WSAEOPNOTSUPP:
        case WSAESHUTDOWN:
        case WSAECONNABORTED:
        case WSAEAFNOSUPPORT:
        case WSAEDESTADDRREQ:
            ASSERT(false, tprint_last_wsa_error_as_formatted_message("sendto failed with (unhandled) ")); break;
        default:
            ASSERT(false, tprint_last_wsa_error_as_formatted_message("sendto failed with (undocumented) ")); //The switch case above should have been exhaustive. This is a bug. We either forgot a case, or maybe the docs were lying? (That happened to me on android. Fun times. Well. Not really.)
        }
        #undef LOG_WARNING
    }

    DASSERT(sent_bytes >= 0);
    total_bytes_sent += (u64) sent_bytes;
    bytes_left -= (u64) sent_bytes;
    DASSERT(bytes_left == 0);
}
The code that generates the address from ip and port looks like this:
sockaddr_in create_socket_address(u32 ip, u16 port){
    sockaddr_in address_info;
    address_info.sin_family = AF_INET;
    address_info.sin_port = htons(port);
    address_info.sin_addr.s_addr = htonl(ip);
    memset(address_info.sin_zero, 0, 8);
    return address_info;
}
The error seems to be a little flaky. It reproduces 100% of the time until it decides not to anymore. After a restart it's usually back.
I am looking for a solution to handle this case correctly. I could of course just re-do the network interface discovery when the error occurs, because I "know" that I don't give any broken IPs to sendto, but that would just be a heuristic. I want to solve the actual problem.
I also don't quite understand when error 10049 is supposed to fire exactly anyway. Is it just if I pass an IPv6 address to an IPv4 socket, or send to 0.0.0.0? There is no flat-out "illegal" IPv4 address after all, just ones that don't make sense in context.
If you know what I am missing here, please let me know!
This is an issue people have been facing for a while, and people have suggested reading the documentation provided by Microsoft on the issue linked below.
(I don't know whether it is exactly the same issue, but the error code returned is the same, which is why I have attached the link.)
https://learn.microsoft.com/en-us/answers/questions/537493/binding-winsock-shortly-after-boot-results-in-erro.html
I found a solution (workaround?)
I used NotifyAddrChange to receive changes to the NICs and thought it for some reason didn't trigger when I disabled the NIC. Turns out it does, I'm just stupid and stopped debugging too early: There was a bug in the code that diffs the results from GetAdaptersInfo to the last known state to figure out the differences, so the application missed the NIC disconnecting. Now that it observes the disconnect, it can kill the sockets before they try to send on the disabled NIC, thus preventing the error from happening. This is not really a solution though, since there is a race condition here (NIC gets disabled before send and after check for changes), so I'll still have to handle error 10049.
The bug was this:
My expectation was that, when I disable a NIC, iterating over all existing NICs would show the disabled NIC as disabled. That is not what happens. What happens is that the NIC is just not in the list of existing NICs anymore, even though the Windows dialog will still show it (as disabled). That is somewhat surprising to me, but not all that unreasonable I guess.
Before I had these checks to detect changes in the NICs:
Did the NIC exist before, was enabled and is now disabled -> disable notification
Did the NIC exist before, was disabled and is now enabled -> enable notification
Did the NIC not exist before, is now enabled -> enable notification
And the fix was adding a fourth one:
Is there an existing NIC that was not in the list of NICs anymore -> disable notification
I'm still not 100% happy that there is the possibility of getting a somewhat ambiguous error on a race condition, but I might call it a day here.
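For reference, a minimal blocking sketch of the NotifyAddrChange approach described above (this is not the project's code): the handle and overlapped parameters are left NULL so the call simply blocks until the address table changes, after which the adapter list is re-read with GetAdaptersInfo. The actual diffing against the previous snapshot and real error handling are omitted.

// Sketch: block until an interface address changes, then re-enumerate adapters.
// Link against Iphlpapi.lib. A real implementation would use the overlapped form
// so it can also wait on sockets/other events at the same time.
#include <winsock2.h>
#include <iphlpapi.h>
#include <cstdio>
#include <vector>
#pragma comment(lib, "iphlpapi.lib")

int main() {
    for (;;) {
        // Synchronous form: returns only when an interface address is added,
        // removed or changed.
        DWORD ret = NotifyAddrChange(NULL, NULL);
        if (ret != NO_ERROR) {
            printf("NotifyAddrChange failed: %lu\n", ret);
            return 1;
        }

        // Re-read the adapter list; diff it against the previous snapshot here.
        ULONG size = 0;
        GetAdaptersInfo(NULL, &size);
        std::vector<unsigned char> buf(size);
        if (GetAdaptersInfo(reinterpret_cast<IP_ADAPTER_INFO*>(buf.data()), &size) == NO_ERROR) {
            for (IP_ADAPTER_INFO* a = reinterpret_cast<IP_ADAPTER_INFO*>(buf.data()); a != NULL; a = a->Next) {
                printf("adapter %s ip %s\n", a->Description, a->IpAddressList.IpAddress.String);
            }
        }
    }
}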

Apache Thrift timeout

I'm using apache thrift in version 0.13.0.
As soon as the time between two calls reaches approximately 1.5 seconds, the connection is closed.
The timeout varies from 1.3 to 1.8 seconds.
keepAlive is set on both server and client. I tried different rx and tx timeout values, but this did not change anything.
My client code used for testing is below.
The client is using windows and the server is running linux.
for (int i = 0; i < 100'000; i += 50) {
    remote_method();
    auto sleep = std::chrono::milliseconds(i);
    std::cout << "Sleep: " << i << "\n";
    std::this_thread::sleep_for(sleep);
}
Thrift will throw an exception in the code snippet below, which is located in TSocket.cpp
// Timed out!
if (errno_copy == THRIFT_ETIMEDOUT) {
    throw TTransportException(TTransportException::TIMED_OUT, "THRIFT_ETIMEDOUT");
}
It looks like something is resetting the connection after this time.
If the method is called with a high frequency no timeout occurs.
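For completeness, this is roughly how the client socket is configured in the Thrift 0.13 C++ API (a sketch from memory, not the original code: "MyServiceClient" stands in for the generated client class, and host/port/timeouts are placeholders):

// Sketch: setting keep-alive and explicit timeouts on the Thrift C++ client socket.
#include <thrift/transport/TSocket.h>
#include <thrift/transport/TBufferTransports.h>
#include <thrift/protocol/TBinaryProtocol.h>
#include <memory>

using namespace apache::thrift::transport;
using namespace apache::thrift::protocol;

void make_client() {
    auto socket = std::make_shared<TSocket>("server-host", 9090); // placeholder host/port
    socket->setKeepAlive(true);      // SO_KEEPALIVE on the client socket
    socket->setConnTimeout(5000);    // connect timeout, ms
    socket->setRecvTimeout(10000);   // receive timeout, ms
    socket->setSendTimeout(10000);   // send timeout, ms

    auto transport = std::make_shared<TBufferedTransport>(socket);
    auto protocol  = std::make_shared<TBinaryProtocol>(transport);
    transport->open();
    // MyServiceClient client(protocol);   // generated stub, project specific
    // client.remote_method();
}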
Thrift was working correctly; other socket-based communication showed this behavior as well. The root cause was the VMware virtual machine the server was running in. The network mode (bridged, NAT, or host-only) did not make a difference. Moving the server to a physical machine solved the problem. Most likely the network configuration of the Linux guest was faulty.

ACE with multiple app instances on same pc - only first gets the message

I'm trying to create an application where multiple instances run on the same machine and communicate via UDP on the same port.
I have read many threads on Stack Overflow saying this should be possible.
However, when I open the connection from each application instance, I can see that each instance sends a message, but only the first instance (or the second if the first is closed, and so on) receives it.
I'm using ACE library for the communication. Excerpt from code:
ACE_SOCK_Dgram_Mcast dgram;
ACE_INET_Addr *listenAddress = new ACE_INET_Addr(12345, ACE_LOCALHOST);
dgram.open(*listenAddress);

ACE_INET_Addr peer_address;
char buffer[1024];
dgram.send(buffer, 256);

while (true)
{
    if (dgram.recv(buffer, 256, peer_address, 0, &receiveLoopTimeout) != -1)
    {
        std::cout << "Received" << std::endl;
    }
}
I also found out that if I call "dgram.join(*listenAddress)" I get an error, code ENODEV, from the first instance of the app.
I'm not sure I understand what you are trying to do... send a message multicast so multiple receivers get it, or allow multiple processes to receive on the same UDP port unicast... I'm guessing the former.
You're using the ACE_SOCK_Dgram_Mcast class but with unicast addressing and operations. So only one instance will receive that message.
Check the ACE_wrappers/tests/Multicast_Test.cpp for examples of how to send and receive multicast.
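For comparison, a rough multicast sketch with ACE_SOCK_Dgram_Mcast might look like the following. The group address 239.255.0.1 and port 12345 are placeholders, and this assumes join() also sets up the default send address (otherwise call open() with the group address first). Every instance that joins the group receives the datagrams, which is what allows several processes on one machine to see the same message.

// Sketch: send and receive on a multicast group with ACE.
#include <ace/SOCK_Dgram_Mcast.h>
#include <ace/INET_Addr.h>
#include <iostream>

int main()
{
    // Placeholder group address/port in the administratively scoped range.
    ACE_INET_Addr group_addr(12345, "239.255.0.1");

    ACE_SOCK_Dgram_Mcast dgram;
    // join() binds the port and subscribes this socket to the group; by default
    // it also enables address reuse, so several processes on one host can join.
    if (dgram.join(group_addr) == -1)
    {
        std::cerr << "join failed" << std::endl;
        return 1;
    }

    char buffer[256] = "hello";
    dgram.send(buffer, sizeof(buffer));              // sent to the joined group

    ACE_INET_Addr peer_address;
    char recv_buffer[256];
    if (dgram.recv(recv_buffer, sizeof(recv_buffer), peer_address) != -1)
        std::cout << "Received from " << peer_address.get_host_addr() << std::endl;

    return 0;
}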

Receiving Data for Multiple Hosts via Linux Sockets

I have a rather strange question. Lately, I have been tasked with developing software to simulate a large (hundreds of nodes and up) network. To make a long story short, we have a head-end server that communicates with each host through a predictable IP addressing scheme via Linux sockets using a mixture of broadcast and unicast. The head-end will issue a request to a given client and will (sometimes) receive data pertaining to the command executed. All data / commands are sent via UDP on a well-defined port.
Now, for testing purposes, we would like to use the original server binary in a virtual environment and still receive reasonable data. For example, we would like to issue a reset command to a particular node and receive a fake notification back. The broadcast bit is easy, as I simply have to listen in on the proper broadcast address and act accordingly. The unicast is what has me stuck.
The Question
Is it possible to receive UDP requests for a large number of discrete hosts via a single (or a reduced) number of Linux sockets? All hosts are on the same subnet and all IP addresses / hosts / network topology are known ahead of time.
Desired Output
Ultimately, we would like to have an app that runs on a host on the network and responds as if it were each of these discrete 'virtualized' hosts based on input datagrams.
Do note that I am not asking for someone to write me a program. I am just simply looking for some direction as to the 'vehicle' by which this can be accomplished.
Possible Solutions
RAW sockets: This has promise as I can trap all inbound data via a single socket and punt it off to a worker thread for processing and response. Unfortunately, I only receive packets that are destined for my host IP and none of the 'fake' IPs.
Abuse IP aliases on Linux, one for each host: This seems to be the most direct approach but it feels like duck hunting with a bazooka. It has the added benefit of appearing to 'be' the host for any other forms of communication, I just worry that creating 400+ aliases might be a bit much for our bastard-child of a Linux environment. As an added complication, the hosts do change based on configuration and can be in any manner of states (up, down, command processing, etc.).
The source code of the server is to be treated as immutable for the purpose of our testing. I fully expect this will be impossible with the constraints given, but someone may have an idea of how to accomplish this as, quite frankly, I have never done anything of this sort before.
Thank you in advance for any assistance.
Personally, I would use your second option - add all the IP addresses to the host, then bind to INADDR_ANY address. This would mean you could use just one socket.
An alternative is to set the IP_TRANSPARENT socket option on your socket, which will then allow your application to bind to non-local addresses (you would route the networks containing those addresses through the machine that your application is running on). This method does require one socket per address, though.
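A rough sketch of that second approach, assuming a placeholder fake address of 10.0.0.50 and port 27333 (one such socket is needed per simulated address, and routing still has to deliver traffic for that address to this host):

// Sketch: bind a UDP socket to a non-local ("fake") address using IP_TRANSPARENT.
// Requires CAP_NET_ADMIN/root. 10.0.0.50:27333 is a placeholder address/port.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>

int main() {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    int one = 1;
    if (setsockopt(s, SOL_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0) {
        perror("setsockopt(IP_TRANSPARENT)");
        return 1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(27333);
    inet_pton(AF_INET, "10.0.0.50", &addr.sin_addr);   // non-local address

    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    char buf[1500];
    ssize_t n = recv(s, buf, sizeof(buf), 0);           // blocks until a datagram arrives
    printf("received %zd bytes as 10.0.0.50\n", n);
    return 0;
}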
So, using a combination of both of caf's solutions, I was able to have my cake and eat it too. I was also heavily influenced by
Python/iptables: Capturing all UDP packets and their original destination
which is a Python example, but does show how I can 'cheat' the packets back to a single interface, negating the need for maintenance of many sockets. That question is well worth the read and contains a lot of good information. For compactness, though, I will restate part of it below.
Hopefully it can help someone else down the road.
Part 1 - Host Configuration
As stated in the above question, we can use a combination of iptables and ip routes to redirect the packets to loopback for processing. This was not stated in my original question, but it is acceptable for the 'simulator' to run on the head-end host itself and not be a discrete node on the network. To do this, we mark each packet via iptables and then route it to lo based on said mark.
iptables -A OUTPUT -t mangle -p udp --dport 27333 -j MARK --set-mark 1
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
In my case, I only need traffic to a certain port so my iptables rule has been adjusted accordingly from the original.
Part 2 - Software
As caf stated in his post, the real trick is to use IP_TRANSPARENT and a raw socket. Raw sockets are necessary in order to get the original source / destination IP addresses. One gotcha that took me a while was the use of IPPROTO_UDP in the call to socket(). Even though this is a raw socket, it will strip out the Ethernet header. A lot of code online shows the calculation of the IP header offset using something similar to the following:
struct iphdr* ipHeader = (struct iphdr *)(buf + sizeof(ethhdr));
Offsetting by ethhdr (which is stripped) will give you some rather entertaining garbage data. With that particular header removed, the necessary IP header is simply the first structure in the buffer.
The Test Code
Below you will find a proof-of-concept example. It is in no way fully functional or complete. In particular, no checking is done on the incoming packets for malicious data (e.g. format string exploits in the payload, pointer math problems, malformed / malicious packets, etc.).
Note that the code binds to lo specifically. This does not mean that we will only get packets destined for one of our 'fake' hosts (other services use loopback, too). Additional checking / filtering is required to get only the packets we want.
#include <arpa/inet.h>
#include <netinet/if_ether.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/udp.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <string>

int main(int argc, char *argv[]) {
    //Set up listening socket
    struct sockaddr_in serverAddr;
    struct iphdr* ipHeader;
    struct udphdr* udpHeader;
    int listenSock = 0;
    char data[65536];
    static int is_transparent = 1;
    std::string device = "lo";

    //Initialize listening socket
    if ((listenSock = socket(AF_INET, SOCK_RAW, IPPROTO_UDP)) < 0) {
        printf("Error creating socket\n");
        return 1;
    }

    setsockopt(listenSock, SOL_IP, IP_TRANSPARENT, &is_transparent, sizeof(is_transparent));
    setsockopt(listenSock, SOL_SOCKET, SO_BINDTODEVICE, device.c_str(), device.size());

    memset(&serverAddr, 0x00, sizeof(serverAddr));
    memset(&data, 0x00, sizeof(data));

    //Setup server address
    serverAddr.sin_family = AF_INET;
    serverAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    serverAddr.sin_port = htons(27333);

    //Bind and listen
    if (bind(listenSock, (struct sockaddr *) &serverAddr, sizeof(serverAddr)) < 0) {
        printf("Error binding socket\n");
        return 1;
    }

    while (1) {
        //Receive the next raw datagram
        recv(listenSock, data, 65536, 0);

        //Get IP header (the buffer starts at the IP header; no Ethernet header is present)
        ipHeader = (struct iphdr*)(data);

        //Only grab UDP packets (17 is the magic number for UDP protocol)
        if ((unsigned int)ipHeader->protocol == 17) {
            //Get UDP header information
            udpHeader = (struct udphdr*)(data + (ipHeader->ihl * 4));

            //DEBUG
            struct sockaddr_in tempDest;
            struct sockaddr_in tempSource;
            char* payload = (char*)(data + ipHeader->ihl * 4 + sizeof(struct udphdr));

            memset(&tempSource, 0x00, sizeof(tempSource));
            memset(&tempDest, 0x00, sizeof(tempDest));

            tempSource.sin_addr.s_addr = ipHeader->saddr;
            tempDest.sin_addr.s_addr = ipHeader->daddr;

            printf("Datagram received\n");
            printf("Source IP: %s\n", inet_ntoa(tempSource.sin_addr));
            printf("Dest IP  : %s\n", inet_ntoa(tempDest.sin_addr));
            printf("Data     : %s\n", payload);
            printf("Port     : %d\n\n", ntohs(udpHeader->dest));
        }
    }
}
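To exercise the receiver, a throwaway sender like the one below can be run on the same host once the iptables/ip rule setup from Part 1 is in place; the OUTPUT mangle rule redirects the datagram to lo, where the raw socket above picks it up. The destination 10.0.0.50 and the payload are placeholders, not values from the original project.

// Throwaway test sender: sends one datagram to a "fake" host on port 27333.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in dest;
    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(27333);
    inet_pton(AF_INET, "10.0.0.50", &dest.sin_addr);   // placeholder simulated node

    const char payload[] = "reset node 42";            // arbitrary test payload
    if (sendto(s, payload, sizeof(payload), 0, (struct sockaddr *)&dest, sizeof(dest)) < 0)
        perror("sendto");

    close(s);
    return 0;
}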
Further Reading
Some very helpful links are below.
http://www.binarytides.com/packet-sniffer-code-in-c-using-linux-sockets-bsd-part-2/
http://bert-hubert.blogspot.com/2012/10/on-binding-datagram-udp-sockets-to-any.html

Socket in use error when reusing sockets

I am writing an XMLRPC client in c++ that is intended to talk to a python XMLRPC server.
Unfortunately, at this time, the Python XMLRPC server is only capable of fielding one request on a connection, then it shuts down. I discovered this thanks to mhawke's response to my previous query about a related subject.
Because of this, I have to create a new socket connection to my python server every time I want to make an XMLRPC request. This means the creation and deletion of a lot of sockets. Everything works fine, until I approach ~4000 requests. At this point I get socket error 10048, Socket in use.
I've tried sleeping the thread to let winsock fix its file descriptors, a trick that worked when a python client of mine had an identical issue, to no avail.
I've tried the following
int err = setsockopt(s_,SOL_SOCKET,SO_REUSEADDR,(char*)TRUE,sizeof(BOOL));
with no success.
I'm using winsock 2.0, so WSADATA::iMaxSockets shouldn't come into play, and either way, I checked and it's set to 0 (I assume that means infinity).
4000 requests doesn't seem like an outlandish number of requests to make during the run of an application. Is there some way to use SO_KEEPALIVE on the client side while the server continually closes and reopens?
Am I totally missing something?
The problem is being caused by sockets hanging around in the TIME_WAIT state which is entered once you close the client's socket. By default the socket will remain in this state for 4 minutes before it is available for reuse. Your client (possibly helped by other processes) is consuming them all within a 4 minute period. See this answer for a good explanation and a possible non-code solution.
Windows dynamically allocates port numbers in the range 1024-5000 (3977 ports) when you do not explicitly bind the socket address. This Python code demonstrates the problem:
import socket

sockets = []
try:
    while True:
        s = socket.socket()
        s.connect(('some_host', 80))
        sockets.append(s.getsockname())
        s.close()
except socket.error:
    pass  # dynamic ports exhausted

print len(sockets)
sockets.sort()
print "Lowest port: ", sockets[0][1], " Highest port: ", sockets[-1][1]
# on Windows you should see something like this...
3960
Lowest port: 1025 Highest port: 5000
If you try to run this immediately again, it should fail very quickly since all dynamic ports are in the TIME_WAIT state.
There are a few ways around this:
Manage your own port assignments and use bind() to explicitly bind your client socket to a specific port that you increment each time you create a socket. You'll still have to handle the case where a port is already in use, but you will not be limited to dynamic ports. e.g.
port = 5000
while True:
    s = socket.socket()
    s.bind(('your_host', port))
    s.connect(('some_host', 80))
    s.close()
    port += 1
Fiddle with the SO_LINGER socket option. I have found that this sometimes works in Windows (although not exactly sure why); a C++/Winsock sketch of this approach is shown after this list.
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))  # requires "import struct"
I don't know if this will help in your particular application, however, it is possible to send multiple XMLRPC requests over the same connection using the multicall method. Basically this allows you to accumulate several requests and then send them all at once. You will not get any responses until you actually send the accumulated requests, so you can essentially think of this as batch processing - does this fit in with your application design?
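As referenced in the second suggestion above, here is a rough C++/Winsock sketch of closing with SO_LINGER set to zero so the socket is reset instead of entering TIME_WAIT. Note that this discards any unsent data, so it is only appropriate once the request/response exchange is complete.

// Sketch: hard-close a Winsock socket to avoid TIME_WAIT.
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

void close_without_time_wait(SOCKET s)
{
    linger lin;
    lin.l_onoff  = 1;   // enable linger
    lin.l_linger = 0;   // timeout 0 -> abortive close (RST), no TIME_WAIT
    setsockopt(s, SOL_SOCKET, SO_LINGER, reinterpret_cast<const char*>(&lin), sizeof(lin));
    closesocket(s);
}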
Update:
I tossed this into the code and it seems to be working now.
if (::connect(s_, (sockaddr *) &addr, sizeof(sockaddr)))
{
    int err = WSAGetLastError();
    if (err == 10048) // if "socket in use" error, force kill and reopen socket
    {
        closesocket(s_);
        WSACleanup();
        WSADATA info;
        WSAStartup(MAKEWORD(2, 0), &info);
        s_ = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(s_, SOL_SOCKET, SO_REUSEADDR, (char*)&x, sizeof(BOOL));
    }
}
Basically, if you encounter the 10048 error (socket in use), you can simply close the socket, call WSACleanup, restart WSA, then reset the socket and its sockopt
(the last sockopt may not be necessary)
I must have been missing the WSACleanup/WSAStartup calls before, because closesocket() and socket() were definitely being called.
This error only occurs once every ~4000 calls.
I am curious as to why this may be, even though this seems to fix it.
If anyone has any input on the subject, I would be very curious to hear it.
Do you close the sockets after using them?