Socket Exception: "There are no more endpoints available from the endpoint mapper" - c++

I am using winsock and C++ to set up a server application. The problem I'm having is that the call to listen results in a first chance exception. I guess normally these can be ignored (?) but I've found others having the same issue I am where it causes the application to hang every once in a while. Any help would be greatly appreciated.
The first chance exception is:
First-chance exception at 0x*12345678* in MyApp.exe: 0x000006D9: There are no more endpoints available from the endpoint mapper.
I've found some evidence that this could be caused by the socket. The code that I'm working with is as follows. The exception occurs on the call to listen, in the fifth line from the bottom.
m_accept_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (m_accept_fd == INVALID_SOCKET)
{
    return false;
}
int optval = 1;
if (setsockopt (m_accept_fd, SOL_SOCKET, SO_REUSEADDR,
                (char*)&optval, sizeof(optval)))
{
    closesocket(m_accept_fd);
    m_accept_fd = INVALID_SOCKET;
    return false;
}
struct sockaddr_in local_addr;
local_addr.sin_family = AF_INET;
local_addr.sin_addr.s_addr = INADDR_ANY;
local_addr.sin_port = htons(m_port);
if (bind(m_accept_fd, (struct sockaddr *)&local_addr,
         sizeof(struct sockaddr_in)) == SOCKET_ERROR)
{
    closesocket(m_accept_fd);
    return false;
}
if (listen (m_accept_fd, 5) == SOCKET_ERROR)
{
    closesocket(m_accept_fd);
    return false;
}

On a very busy server, you may be running out of Sockets. You may have to adjust some TCPIP parameters. Adjust these two in the registry:
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
MaxUserPort REG_DWORD 65534 (decimal)
TcpTimedWaitDelay REG_DWORD 60 (decimal)
By default, there is a delay of a few minutes between releasing a network port (socket) and when it can be reused. Also, depending on the OS version, there are only a few thousand ports in the range that Windows will use. On the server, run this at a command prompt:
netstat -an
and look at the results (piping to a file is easiest: netstat -an > netstat.txt). If you see a large number of ports in the 1025-5000 range in the TIME_WAIT state, then this is your problem, and it's solved by raising the max user port from 5000 to 65534 using the registry entry above. You can also shorten the delay using the registry entry above to recycle the ports more quickly.
If this is not the problem, then the problem is likely the number of pending connections that you have set in your listen() call.

The original problem has nothing to do with winsock. All the answers above are WRONG. Ignore the first-chance exception, it is not a problem with your application, just some internal error handling.

Are you actually seeing a problem, e.g., does the program end because of an unhandled exception?
The debugger may print the message even when there isn't a problem, for example, see here.

Uhh, maybe it's because you're limiting greatly the maximum number of incoming connections?
listen (m_accept_fd, 5)
// Limit here ^^^
If you allow a greater backlog, you should be able to handle your problem. Use something like SOMAXCONN instead of 5.
Also, if your problem is only on server startup, you might want to turn off LINGER (SO_LINGER) to prevent connections from hanging around and blocking the socket...

This won't answer your question directly, but since you're using C++, I would recommend using something like Boost::Asio to handle your socket code. This gives you a nice abstraction over the winsock API, and should allow you to more easily diagnose error conditions.


Linux socket C/C++ - What is the best way to check if ip/port is already in use?

I have a system that can start multiple instances.
Every instance has a client and a server.
They are connected over socket/TCP
Every instance is started by starting a client.
The client starts (checks if IP is available, if not increase the IP by 1, checks again ...) -
The client starts the server with the free IP and connects to it. (for legacy reasons has to be like this)
Instance numbers 2, 3, 4, 5 work without issues.
...
Instance number 6. -> Fails on checking if the first IP in the range is available.
To check if IP is already in use, I do not close the socket on the server side so that it can accept the additional connection.
On the client-side, I check if I can connect to the server-side with the following code:
bool CheckIPInUse(char *ip)
{
    bool ret = false;
    int port = 12345;
    int sock;
    struct sockaddr_in serv_addr;
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(port);
    // **non blocking** because I want the check to be fast.
    sock = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    inet_pton(AF_INET, ip, &serv_addr.sin_addr);
    int ret_conn = connect(sock, (struct sockaddr *)&serv_addr, sizeof(serv_addr));
    if (ret_conn == 0){
        fprintf(stdout, "connected");
        ret = true;
    }
    else if (ret_conn < 0 && (errno != EINPROGRESS)){
        fprintf(stdout, "failed to connect");
    }
    else
    {
        int check_if_connected = 10;
        while (check_if_connected--)
        {
            socklen_t len = sizeof(serv_addr);
            int ret_getpeer = getpeername(sock, (struct sockaddr *)&serv_addr, &len);
            if (ret_getpeer == 0)
            {
                fprintf(stdout, "connected");
                ret = true;
                break;
            }
            usleep(100000);
        }
    }
    close(sock);
    return ret;
}
This works for the first 5 instances.
The 6th instance fails to connect to the first IP in the range and tries to start the server with an IP which is already in use (always the 6th).
Is there any better way to check programmatically if an IP/port is already busy?
Any ideas on what to check for the failure in instance number 6?
The only way to check if an ip/port on a server is available is to bind() to it. If it worked, it was available (but not any more).
Any approach that involves a test connect()ion first, to see if it fails, or anything along the lines of poking somewhere in /proc to see which IPs and ports are in use -- nothing along these lines will ever be 100% foolproof. That's because even if you reach the conclusion that the port is available, it may no longer be by the time you get around to try to bind() to it.
Now, you can take, as a starting position, that a particular IP and/or port range is reserved for your application's use, and you only wish to arbitrate IP/port allocation between different instances of your application. In that case you can do that pretty much however you want; you're not limited to attempting to actually start instances of your application and hoping for the best. One simplistic approach is to use lock files in /var/tmp to represent all possible IP/port combinations. Your application tries, in turn, to acquire a lock on the corresponding lock file first. Once the lock is acquired, the corresponding IP/port can be established at your leisure, but the lock file must remain locked until the IP/port is no longer in use.
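The lock-file idea could be sketched roughly as follows. This is only an illustration under assumptions: it uses flock(2) for the lock, and the helper names acquire_endpoint_lock and release_endpoint_lock, as well as the one-file-per-IP/port naming scheme, are hypothetical and not from the answer above.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#include <string>

// Hypothetical helper: tries to take an exclusive, non-blocking lock on the
// file that stands for one IP/port combination. Returns an open descriptor
// that holds the lock, or -1 if the file could not be opened or another
// holder already has the lock.
int acquire_endpoint_lock(const std::string& path) {
    int fd = open(path.c_str(), O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        return -1;
    }
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);  // someone else owns this IP/port slot
        return -1;
    }
    return fd;      // keep it open for as long as the IP/port is in use
}

// Hypothetical helper: closing the descriptor drops the lock.
void release_endpoint_lock(int fd) {
    close(fd);
}
```

An instance would try one path per candidate IP/port (for example a name built from the address and port), keep the first descriptor it gets, and release it only after the server for that IP/port has shut down. The kernel drops the lock automatically if the process dies, which avoids stale locks.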
But in terms of attempting to check if a socket port is available, or not, the only way to do it is to bind() it, because that, by definition, is what it does. You could attempt to implement a multi-layered approach, like trying to connect() first, and then attempt to bind() it, and if the bind() fails, then keep looking for a free port. But that's creating extra complexity, without much of a benefit.
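A minimal sketch of the "just bind() it" approach on Linux might look like this. The function name bind_tcp is an assumption made for illustration. SO_REUSEADDR is deliberately not set here, so that an address that is already bound is reported as busy.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <cstring>

// Hypothetical helper: tries to bind a TCP socket to ip:port.
// On success the bound descriptor is returned and the caller keeps it
// (holding it is what reserves the address). On failure -1 is returned and
// errno describes the bind() failure (EADDRINUSE if the port is taken).
int bind_tcp(const char* ip, std::uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        errno = EINVAL;  // the text was not a valid IPv4 address
        return -1;
    }
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        int saved = errno;  // close() may overwrite errno
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The caller would walk its IP or port range, call bind_tcp for each candidate, and pass the first descriptor that comes back straight to listen(), so there is no window between the check and the actual use.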
Did you check that the server has not reached its maximum backlog length?
You may be getting "connection refused" if the server you are trying to connect to has more pending connections than the defined backlog.
So if multiple clients are testing at the same time, one of them may encounter this.
The most probable cause of your problem is that your client is getting a connection from the server because of the listen queue. The best way to avoid this problem is to close the socket on which you call accept(2) once all the instances are in use, and reopen it again when any of the server instances have finished.
The listen queue makes the kernel to accept (send the SYN/ACK segment) connections on the otherwise not yet open socket waiting, and this will make the connection establishment quicker for the next server instances if many such connections are entering in the system. All those connections are handled in the accept(2) socket, so the best way to accept five such connections is to close the accept socket as soon as the last connection has been established (this will not avoid the problem if a connection happens to enter the server in the time between one accept(2) and the next, but the connection so established will be closed as soon as the accept socket is still open)
In my opinion, you should have a master server process that forks new processes to handle the different connection and closes the accept socket as soon as it reaches the full capacity. Once one of the servers attending the connections closes one of them, it should reopen the accept socket and accept a new connection.
IMHO, also the most robust way of implementing such a system is to allow the extra connections to get in, but not attend them, so the connection remains open in case a new client happens to enter, and it can close it if the server doesn't attend it in a timeout interval. Having a sixth client already connected, but waiting for the server to say hello, will leave you in a state in which you can start talking to the server as soon as the last service ends.

close on socket not releasing file descriptor

When conducting a stress test on some server code I wrote, I noticed that even though I am calling close() on the descriptor handle (and verifying the result for errors) that the descriptor is not released which eventually causes accept() to return an error "Too many open files".
Now I understand that this is because of the ulimit, what I don't understand is why I am hitting it if I call close() after each synchronous accept/read/send cycle?
I am validating that the descriptors are in fact there by running a watch with lsof:
ctsvr 9733 mike 1017u sock 0,7 0t0 3323579 can't identify protocol
ctsvr 9733 mike 1018u sock 0,7 0t0 3323581 can't identify protocol
...
And sure enough there are about 1000 or so of them. Furthermore, checking with netstat I can see that there are no hanging TCP states (no WAIT or STOPPED or anything).
If I simply do a single connect/send/recv from the client, I do notice that the socket does stay listed in lsof; so this is not even a load issue.
The server is running on an Ubuntu Linux 64-bit machine.
Any thoughts?
So using strace (thanks Gearoid), which I have no idea how I ever lived without, I noted I was in fact closing the descriptors.
However. And for the sake of posterity I lay bare my foolish mistake:
Socket::Socket() : impl(new Impl) {
    impl->fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    ....
}
Socket::ptr_t Socket::accept() {
    auto r = ::accept(impl->fd, NULL, NULL);
    ...
    ptr_t s(new Socket);
    s->impl->fd = r;
    return s;
}
As you can see, my constructor allocated a socket immediately, and then I replaced the descriptor with the one returned by accept - creating a leak. I had refactored the accept code from a standalone Acceptor class into the Socket class without changing this.
Using strace I could easily see socket() being run each time, which led to my light bulb moment.
Thanks all for the help!
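One way to avoid this kind of leak is to give the wrapper a second constructor that adopts an existing descriptor, so that accept() never creates a socket it is about to discard. The sketch below is a simplified stand-in, not the poster's actual class: it drops the Impl and ptr_t indirection, and the member names are assumptions.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Owns exactly one descriptor. The default constructor creates a new TCP
// socket; the explicit constructor adopts a descriptor that already exists
// (for example one returned by ::accept), so nothing is allocated twice.
class Socket {
public:
    Socket() : fd_(::socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) {}
    explicit Socket(int existing_fd) : fd_(existing_fd) {}
    ~Socket() {
        if (fd_ >= 0) {
            ::close(fd_);
        }
    }

    // Copying would lead to a double close, so only moves are allowed.
    Socket(const Socket&) = delete;
    Socket& operator=(const Socket&) = delete;
    Socket(Socket&& other) noexcept : fd_(other.fd_) { other.fd_ = -1; }

    int fd() const { return fd_; }

    // Wraps the accepted descriptor directly instead of constructing a
    // fresh socket and then overwriting its descriptor.
    Socket accept() const {
        return Socket(::accept(fd_, nullptr, nullptr));
    }

private:
    int fd_;
};
```

With this shape, every descriptor has exactly one owner from the moment it is created, and the destructor is the only place that closes it.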
Have you ever called perror() after close()?
I think the message it prints will give you some help.
You are most probably hanging on a recv() or send() call. Consider setting a timeout using setsockopt.
I noticed a similar output on lsof when the socket was closed on the other end but my thread was keeping the socket open hanging on the recv() command waiting for data.
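A receive timeout of the kind suggested above can be set with the SO_RCVTIMEO socket option. The following is a small sketch for Linux; the helper names set_recv_timeout and recv_or_timeout and the -2 return convention are assumptions made for illustration.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <cerrno>
#include <cstddef>

// Makes blocking recv() calls on fd give up after timeout_ms milliseconds.
// Returns true if the option was applied.
bool set_recv_timeout(int fd, long timeout_ms) {
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Returns the number of bytes read, 0 when the peer closed the connection,
// -1 on error, and -2 when the timeout expired without any data arriving.
ssize_t recv_or_timeout(int fd, void* buf, std::size_t len) {
    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return -2;
    }
    return n;
}
```

A worker thread that gets -2 back can then decide to close the descriptor instead of waiting forever on a peer that has gone away.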

Why would connect() give EADDRNOTAVAIL?

I have in my application a failure that arose which does not seem to be reproducible. I have a TCP socket connection which failed and the application tried to reconnect it. In the second call to connect() attempting to reconnect, I got an error result with errno == EADDRNOTAVAIL which the man page for connect() says means: "The specified address is not available from the local machine."
Looking at the call to connect(), the second argument appears to be the address to which the error is referring to, but as I understand it, this argument is the TCP socket address of the remote host, so I am confused about the man page referring to the local machine. Is it that this address to the remote TCP socket host is not available from my local machine? If so, why would this be? It had to have succeeded calling connect() the first time before the connection failed and it attempted to reconnect and got this error. The arguments to connect() were the same both times.
Would this error be a transient one which, if I had tried calling connect again might have gone away if I waited long enough? If not, how should I try to recover from this failure?
Check this link
http://www.toptip.ca/2010/02/linux-eaddrnotavail-address-not.html
EDIT: Yes I meant to add more but had to cut it there because of an emergency
Did you close the socket before attempting to reconnect? Closing will tell the system that the socketpair (ip/port) is now free.
Here are additional items to look at:
If the local port is already connected to the given remote IP and port (i.e., there's already an identical socketpair), you'll receive this error (see bug link below).
Binding a socket address which isn't the local one will produce this error. If the IP addresses of a machine are 127.0.0.1 and 1.2.3.4, and you're trying to bind to 1.2.3.5, you are going to get this error.
EADDRNOTAVAIL: The specified address is unavailable on the remote machine or the address field of the name structure is all zeroes.
Link with a bug similar to yours (answer is close to the bottom)
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4294599
It seems that your socket is basically stuck in one of the TCP internal states and that adding a delay for reconnection might solve your problem as they seem to have done in that bug report.
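The case of binding to an address that does not belong to the local machine is easy to reproduce. The sketch below assumes Linux with default settings (no non-local bind enabled) and uses 192.0.2.1, an address reserved for documentation, as a stand-in for "not one of my addresses". The function name bind_errno is hypothetical.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// Hypothetical helper: tries to bind a TCP socket to ip with a
// kernel-chosen port. Returns 0 on success, otherwise the errno value of
// the call that failed (EINVAL if ip is not a valid IPv4 address).
int bind_errno(const char* ip) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return errno;
    }

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = 0;  // let the kernel pick the port

    int result = 0;
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        result = EINVAL;
    } else if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        result = errno;  // EADDRNOTAVAIL when ip is not a local address
    }
    close(fd);
    return result;
}
```

Calling bind_errno("127.0.0.1") should normally return 0, while bind_errno("192.0.2.1") should return EADDRNOTAVAIL on a machine that does not own that address.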
This can also happen if an invalid port is given, like 0.
If you are unwilling to change the number of temporary ports available (as suggested by David), or you need more connections than the theoretical maximum, there are two other methods to reduce the number of ports in use. However, they are to various degrees violations of the TCP standard, so they should be used with care.
The first is to turn on SO_LINGER with a zero-second timeout, forcing the TCP stack to send a RST packet and flush the connection state. There is one subtlety, however: you should call shutdown on the socket file descriptor before you close, so that you have a chance to send a FIN packet before the RST packet. So the code will look something like:
shutdown(fd, SHUT_RDWR);
struct linger linger;
linger.l_onoff = 1;
linger.l_linger = 0;
// todo: test for error
setsockopt(fd, SOL_SOCKET, SO_LINGER,
(char *) &linger, sizeof(linger));
close(fd);
The server should only see a premature connection reset if the FIN packet gets reordered with the RST packet.
See TCP option SO_LINGER (zero) - when it's required for more details. (Experimentally, it doesn't seem to matter where you set setsockopt.)
The second is to use SO_REUSEADDR and an explicit bind (even if you're the client), which will allow Linux to reuse temporary ports when you run, before they are done waiting. Note that you must use bind with INADDR_ANY and port 0, otherwise SO_REUSEADDR is not respected. Your code will look something like:
int opts = 1;
// todo: test for error
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
(char *) &opts, sizeof(int));
struct sockaddr_in listen_addr;
listen_addr.sin_family = AF_INET;
listen_addr.sin_port = 0;
listen_addr.sin_addr.s_addr = INADDR_ANY;
// todo: test for error
bind(fd, (struct sockaddr *) &listen_addr, sizeof(listen_addr));
// todo: test for addr
// saddr is the struct sockaddr_in you're connecting to
connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
This option is less good because you'll still saturate the internal kernel data structures for TCP connections as per netstat -an | grep -e tcp -e udp | wc -l. However, you won't start reusing ports until this happens.
I had this issue and resolved it by enabling TCP timestamps.
Root cause:
After a connection is closed, it stays in the TIME_WAIT state for some time. During this state, if a new connection comes in with the same IP and port and SO_REUSEADDR was not set on the socket, bind() will fail with error EADDRINUSE. But even with SO_REUSEADDR set, connect() may still fail with error EADDRNOTAVAIL if TCP timestamps are not enabled on both sides.
Solution:
Enable TCP timestamps on both sides, client and server.
echo 1 > /proc/sys/net/ipv4/tcp_timestamps
Reason to enable tcp_timestamp:
When we enable tcp_tw_reuse, sockets in TIME_WAIT state can be used before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If we enable tcp_timestamps, it will make sure that those collisions cannot happen. However, we need TCP timestamps to be enabled on both ends. See the definition of tcp_twsk_unique for the gory details.
reference:
https://serverfault.com/questions/342741/what-are-the-ramifications-of-setting-tcp-tw-recycle-reuse-to-1
Another thing to check is that the interface is up. I got confused by this one recently while using network namespaces, since it seems creating a new network namespace produces an entirely independent loopback interface but doesn't bring it up (at least, with Debian wheezy's versions of things). This escaped me for a while since one doesn't typically think of loopback as ever being down.

Socket in use error when reusing sockets

I am writing an XMLRPC client in c++ that is intended to talk to a python XMLRPC server.
Unfortunately, at this time, the python XMLRPC server is only capable of fielding one request on a connection, then it shuts down. I discovered this thanks to mhawke's response to my previous query about a related subject.
Because of this, I have to create a new socket connection to my python server every time I want to make an XMLRPC request. This means the creation and deletion of a lot of sockets. Everything works fine, until I approach ~4000 requests. At this point I get socket error 10048, Socket in use.
I've tried sleeping the thread to let winsock fix its file descriptors, a trick that worked when a python client of mine had an identical issue, to no avail.
I've tried the following
int err = setsockopt(s_,SOL_SOCKET,SO_REUSEADDR,(char*)TRUE,sizeof(BOOL));
with no success.
I'm using winsock 2.0, so WSADATA::iMaxSockets shouldn't come into play, and either way, I checked and it's set to 0 (I assume that means infinity).
4000 requests doesn't seem like an outlandish number of requests to make during the run of an application. Is there some way to use SO_KEEPALIVE on the client side while the server continually closes and reopens?
Am I totally missing something?
The problem is being caused by sockets hanging around in the TIME_WAIT state which is entered once you close the client's socket. By default the socket will remain in this state for 4 minutes before it is available for reuse. Your client (possibly helped by other processes) is consuming them all within a 4 minute period. See this answer for a good explanation and a possible non-code solution.
Windows dynamically allocates port numbers in the range 1024-5000 (3977 ports) when you do not explicitly bind the socket address. This Python code demonstrates the problem:
import socket

sockets = []
try:
    while True:
        s = socket.socket()
        s.connect(('some_host', 80))
        sockets.append(s.getsockname())
        s.close()
except OSError:
    pass  # connect() starts failing once the dynamic ports are used up

print(len(sockets))
sockets.sort()
print("Lowest port:", sockets[0][1], "Highest port:", sockets[-1][1])
# on Windows you should see something like this...
# 3960
# Lowest port: 1025 Highest port: 5000
If you try to run this again immediately, it should fail very quickly since all dynamic ports are in the TIME_WAIT state.
There are a few ways around this:
Manage your own port assignments and use bind() to explicitly bind your client socket to a specific port that you increment each time you create a socket. You'll still have to handle the case where a port is already in use, but you will not be limited to dynamic ports. e.g.
port = 5000
while True:
    s = socket.socket()
    s.bind(('your_host', port))
    s.connect(('some_host', 80))
    s.close()
    port += 1
Fiddle with the SO_LINGER socket option. I have found that this sometimes works in Windows (although not exactly sure why):
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, 1)
I don't know if this will help in your particular application, however, it is possible to send multiple XMLRPC requests over the same connection using the multicall method. Basically this allows you to accumulate several requests and then send them all at once. You will not get any responses until you actually send the accumulated requests, so you can essentially think of this as batch processing - does this fit in with your application design?
Update:
I tossed this into the code and it seems to be working now.
if(::connect(s_, (sockaddr *) &addr, sizeof(sockaddr)))
{
    int err = WSAGetLastError();
    if(err == 10048) // if socket in use error (WSAEADDRINUSE), force kill and reopen socket
    {
        closesocket(s_);
        WSACleanup();
        WSADATA info;
        WSAStartup(MAKEWORD(2,0), &info);
        s_ = socket(AF_INET,SOCK_STREAM,0);
        setsockopt(s_,SOL_SOCKET,SO_REUSEADDR,(char*)&x,sizeof(BOOL));
    }
}
Basically, if you encounter the 10048 error (socket in use), you can simply close the socket, call cleanup, and restart WSA, then reset the socket and its sockopt (the last sockopt may not be necessary).
I must have been missing the WSACleanup/WSAStartup calls before, because closesocket() and socket() were definitely being called.
This error only occurs once every 4000ish calls.
I am curious as to why this may be, even though this seems to fix it.
If anyone has any input on the subject I would be very curious to hear it.
Do you close the sockets after using them?

socket programming in client

I wrote this program in C++ and on Linux platform.
I wrote a client and server socket program.
In that client program, I wrote socket function and immediately after that I am doing some other actions not at all depending on socket (I wrote 2 for loops for some other logic).
After that I prepared the structures required for the socket and wrote the connect function. There I am getting an error: unable to connect, because connect is returning -1.
But for the same program, if I put that for loop logic above the socket function, followed immediately by the structures and the connect function, then it works fine.
I cannot work out what the reason might be. Please help me with this. Here is my code.
Here index1 and index2 are simple integer variables. The configstring is a char array containing 127.0.0.1:7005 (address and port number). address and port are char array variables that store the address and port number.
struct sockaddr_in s1;
for(index1=0;configstring[index1]!=':';index1++)
{
    address[index1] = configstring[index1];
}
address[index1++]='\0';
for(index2=0;configstring[index1]!='\0';index1++,index2++)
{
    port[index2] = configstring[index1];
}
port[index2++]='\0';
int port_num = changeto_int(port);
if((sock_fd = socket(AF_INET,SOCK_STREAM,0)) == -1)
{
    printf("unable to create a socket\n");
    return 0;
}
s1.sin_family=AF_INET;
s1.sin_port=htons(port_num);
s1.sin_addr.s_addr=inet_addr(address);
memset(s1.sin_zero, '\0', sizeof s1.sin_zero);
int errno;
if(connect(sock_fd,(struct sockaddr *)&s1,sizeof(s1)) == -1)
{
    printf("error:unable to connect\n");
    printf("Error in connect(): %s\n", strerror( errno));
    return -1;
}
First, never do something like this:
int errno;
errno is already defined for you.
Second, I suggest you use perror() instead of
printf("Error in connect(): %s\n", strerror( errno));
Third, you shouldn't call printf and then strerror(errno), because printf may change the value of errno before you read it.
Fourth, I'd suggest taking a look at examples on the internet and starting from them.
I'd suggest reading man select_tut; it has many well-written code examples showing what to do and how.
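The errno advice in this answer could be put together roughly like this. It is a sketch for Linux: the function name connect_or_report is hypothetical, and the point is only that errno comes from the system header and is copied before any other library call can overwrite it.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>    // provides errno; never declare your own
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper: connects to ip:port. Returns the connected
// descriptor, or -1 after reporting the failure on stderr.
int connect_or_report(const char* ip, std::uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        std::fprintf(stderr, "bad address: %s\n", ip);
        close(fd);
        return -1;
    }

    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        int saved = errno;  // copy first: later calls may overwrite errno
        std::fprintf(stderr, "error: unable to connect\n");
        std::fprintf(stderr, "connect(): %s\n", std::strerror(saved));
        close(fd);
        errno = saved;      // hand the original cause back to the caller
        return -1;
    }
    return fd;
}
```

Calling perror("connect") immediately after the failed connect() would work just as well; the saved copy is only needed when something else runs in between.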
Have you tried calling strerror on errno? connect() returning -1 would mean that errno has been set and should have more information about your error.
printf("Error in connect(): %s\n", strerror(errno));
Have you considered simply that your receiver is not listening properly for connections?
As others said, use perror to check errno and print some usable debug to the console.
Without your sample code there is no way to help you. There could be a million reasons. Perhaps there's a firewall on your machine blocking connections? Perhaps the server isn't listening, or is on an incorrect port (you did convert to network byte order didn't you?). Perhaps the client is connecting to a wrong address or port. Maybe you haven't set up your structures correctly.
I recommend reading Beej's Socket Programming Doo-Daa for a good introduction to sockets on Unix (and it follows on to Windows).
struct sockaddr_in s1;
could you please try memset of s1, at the beginning of your program.
I have experienced some thing similar to this.
Could you print debugging info about the address and port string ?
Remove the errno thing, include and use perror.
Compile with -Wall
Judging from your comment that perror() returns "socket operation on non socket"... How are your address and port variables declared? Is it possible that port[index2++]='\0' somehow overwrite onto sock_fd or such?
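If the fixed-size address and port buffers are the suspect, one way to rule out overruns is to split the "address:port" string with std::string instead of hand-written loops. This is only a sketch: the Endpoint struct and parse_endpoint function are hypothetical names, and it stands in for the poster's loops and changeto_int helper rather than reproducing them.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <string>

struct Endpoint {
    std::string address;
    std::uint16_t port;
};

// Splits "address:port" at the last ':' and validates the port number.
// Returns false and leaves *out untouched if the text is malformed.
bool parse_endpoint(const std::string& text, Endpoint* out) {
    std::string::size_type colon = text.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == text.size()) {
        return false;  // no separator, empty address, or empty port
    }

    const std::string port_text = text.substr(colon + 1);
    if (!std::isdigit(static_cast<unsigned char>(port_text[0]))) {
        return false;  // rejects signs and leading whitespace
    }
    char* end = nullptr;
    unsigned long value = std::strtoul(port_text.c_str(), &end, 10);
    if (*end != '\0' || value == 0 || value > 65535) {
        return false;  // trailing junk or out of the valid port range
    }

    out->address = text.substr(0, colon);
    out->port = static_cast<std::uint16_t>(value);
    return true;
}
```

For the configuration string in the question, parse_endpoint("127.0.0.1:7005", &ep) would fill ep.address with "127.0.0.1" and ep.port with 7005, and no stack buffer can be overwritten along the way.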
Try adding:
inet_pton(AF_INET, your IP address, (void *)&server_address);
before making the connection to the server.
Also, I have a hunch that the problem behind the scenes could be on the server side.
Low level socket programming is tedious and error prone. You would be well advised to start using frameworks like Boost or ACE that shield you from these low level details and allow you to program platform independent.