I'm sorry if my question has already been answered, but I have not been able to find it.
I'm using C++ and a connection pool to connect to a PostgreSQL database in a Win32 console application. It runs OK at the beginning. However, after a while the program receives an error: "Server closed the connection unexpectedly. This probably means the server terminated abnormally before or while processing the request".
When I open the PostgreSQL log file, it shows the message: "unexpected EOF on client connection, could not receive data from client: No connection could be made because the target machine actively refused it."
Thank you for any help.
This really sounds like a network problem. I would be looking first at firewalls, then switches. I don't think a cable or a bad network card could cause a problem like this.
What it sounds like is that a connection is getting reset. If you eliminate network issues, then the next area to blame is the connection pooling software. Try swapping it out and see if the problem persists.
I have a server application (unimrcpserver.exe) that is answering requests from client processes. This server process listens to several ports.
With the netstat -a command I get the following lines for my process:
TCP 192.168.10.65:2544 MERTB-PC:0 LISTENING
TCP 192.168.10.65:2554 MERTB-PC:0 LISTENING
TCP 192.168.10.65:9060 MERTB-PC:0 LISTENING
(The netstat output is long, so I only put the relevant lines here.)
Normally, when the system works, I make requests to the server through these ports and each of them works fine.
When I was doing stress tests, I saw a situation where the system no longer responded to the requests that I made through port 2554.
netstat -a still gives me the above lines, so the server is somehow still listening on this port. When I run telnet on the same machine, it gives an error:
telnet 192.168.10.65 2554
Connecting To 192.168.10.65...Could not open connection to the host, on port 2554: Connect failed
I also wrote a simple program in C++ to get the exact error message that the system generates for a connect() request. This time I get the following error:
No connection could be made because the target machine actively refused it
Additional info: Everything is on the same Windows machine. The firewall is disabled. This situation occurred only once, while I was running stress tests that send multiple requests at the same time. Before the situation occurred, the system had handled around 13000 requests, which took around half an hour.
So the question is: How can this situation occur? The port is being reported as "LISTENING" by netstat but I cannot connect to it. If it can be caused by a programming error, what kind of error can cause this kind of behavior?
A new connection can be "actively refused" under several conditions:
1. There is no LISTENING socket on the IP:Port being connected to.
2. There is a LISTENING socket, but its backlog of pending connections is full, so it cannot accept a new connection at that moment.
3. A firewall is blocking it, though a firewall is more likely to use a different error, if it sends an error at all.
Since there is a LISTENING socket, #2 is the most likely/common case. If so, it means the server app is not accepting clients from its backlog fast enough, if at all.
A client cannot differentiate between these conditions. All it can do is detect the connect failure - WSAECONNREFUSED or ECONNREFUSED, depending on platform - and try again later.
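The "try again later" advice could be sketched roughly as follows. This is only an illustration, assuming POSIX sockets and an IPv4 address; the names connect_with_retry and backoff_delay_ms, the doubling delay, and the 5000 ms cap are all made up for the example and are not from the question.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: delay before retry number `attempt` (0-based).
// The delay doubles each time and is capped at cap_ms.
inline int backoff_delay_ms(int attempt, int base_ms, int cap_ms) {
    assert(attempt >= 0 && base_ms > 0 && cap_ms >= base_ms);
    long long delay = base_ms;
    for (int i = 0; i < attempt && delay < cap_ms; ++i) delay *= 2;
    return static_cast<int>(std::min<long long>(delay, cap_ms));
}

// Hypothetical helper: returns a connected descriptor, or -1 on failure.
// Only ECONNREFUSED is retried; any other error gives up immediately.
inline int connect_with_retry(const char* ipv4, std::uint16_t port,
                              int max_attempts, int base_ms) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) return -1;  // not a dotted-quad address

    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return -1;
        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == 0) return fd;

        int err = errno;  // save before close() can overwrite it
        close(fd);
        if (err != ECONNREFUSED) return -1;

        if (attempt + 1 < max_attempts) {
            std::this_thread::sleep_for(
                std::chrono::milliseconds(backoff_delay_ms(attempt, base_ms, 5000)));
        }
    }
    return -1;
}
```

On Windows the same idea would check WSAGetLastError() against WSAECONNREFUSED and use closesocket() instead of close(). Note that retrying only hides the symptom: if the backlog keeps filling up, the server still needs to accept faster.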
So the question is: How can this situation occur? The port is being reported as "LISTENING" with netstat but I cannot connect to it. If it can be caused by a programming error what kind of an error can cause this kind of behavior?
Yes, it could be caused by a programming error on the server. I have seen it happen when the server's listening thread is deadlocked. The socket's state is "listening", but if the listening thread shares some global state with other threads and is blocked waiting for one of them to release a mutex, it never gets back to accept() and you will encounter this.
Also, as others here have stated, if the CPU is heavily loaded by your stress test, the server might refuse connections because its threads are busy processing and the listening thread never gets a chance to accept the connection.
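One common way to keep the listening thread from getting stuck is to let it do nothing except accept() and hand the new descriptor to worker threads through a small queue, so it holds a lock only for the instant of the hand-off. A minimal sketch of such a queue, assuming C++11 threads (the class name ConnectionQueue is hypothetical, and nothing here is taken from unimrcpserver):

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical hand-off queue between the accept thread and worker threads.
class ConnectionQueue {
public:
    // Called by the accept thread right after accept() returns a descriptor.
    // The lock is held only while the descriptor is appended.
    void push(int fd) {
        assert(fd >= 0);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(fd);
        }
        ready_.notify_one();
    }

    // Called by a worker thread; blocks until a connection is available.
    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !pending_.empty(); });
        int fd = pending_.front();
        pending_.pop_front();
        return fd;
    }

    // Number of accepted connections not yet picked up by a worker.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<int> pending_;
};
```

The accept thread would loop on accept() and call push(); each worker would loop on pop() and do the slow request processing without ever touching the listening socket. Whether this fits the server in question is a guess, since its threading model is not shown.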
There are many questions about this issue, but none of them addresses my situation specifically, and I have yet to find a valid explanation of the error itself:
The underlying connection was closed: The connection was closed unexpectedly
In our situation we are making a call to a 3rd-party API via SSL. On my local PC I can connect to that API, make a request and get a response back, but on an IIS production server I get this error. The API uses OAuth to authenticate.
What exactly does it mean? Is the request leaving our server and being rejected by the remote server, or is it not even leaving our server because our own system is preventing it from making the request?
Some more information in case anyone knows what the issue is:
No known changes to any networking, servers, routing, security (apparently)
No code changes recently
According to our own internal logging, the issue started off as an occasional 403 Error-Forbidden, then we saw a number of Cannot Connect to Remote Server errors. Eventually it failed with The underlying connection was closed: The connection was closed unexpectedly.
Can someone please explain what the actual error means? If anyone has experienced this in a similar situation and can shed some light, that would be greatly appreciated.
The underlying connection was closed: The connection was closed unexpectedly
This just says that someone (probably the remote end) closed the TCP connection which underlies the SSL connection. Usually an SSL alert should be sent back on SSL-related errors, but some stacks close the connection instead. It might also be that the peer does not expect SSL at all and thus closes the connection because of invalid data.
On my local PC I can connect to that API make a request and get a response back, but on an IIS Production server I get this error.
It is hard to say what the problem might be, but if this is not only the same API but also the same server, then the problem must be related to differences in the client. These can be differences in supported ciphers, TLS versions, client certificates etc., all of which can vary between machines. If this is not even the same server, you should make sure that the problem is not server related by contacting the non-working server with the working client.
It is also a good idea to make a TCP dump (Wireshark) on both machines and compare the handshakes.
More detailed problem analysis can only be done when you provide more details about the problem, see http://noxxi.de/howto/ssl-debugging.html#hdr2.2 on what might be useful information.
I'm getting a "Socket Error: Connection reset by peer" message using the tradeclient C++ demo code from the QuickFIX download.
Another user commented that it was related to network issues. If anyone has the solution, it would be appreciated.
QuickFix C++ Socket Error Connection Reset By Peer?
<20141221-17:32:11.049, FIX.4.4:myusername->hostusername, event>
(Created session)
<20141221-17:32:11.056, FIX.4.4:myusername-> hostusername, event>
(Connecting to fix.hostusername.com on port 5001)
<20141221-17:32:11.221, FIX.4.4:myusername-> hostusername, outgoing>
(8=FIX.4.49=10735=A34=149=myusername =20141221-17:32:11.21856= hostusername 98=0108=30141=Y10=000)
<20141221-17:32:11.221, FIX.4.4:myusername-> hostusername, event>
(Initiated logon request)
<20141221-17:32:11.253, FIX.4.4:myusername-> hostusername, event>
(Socket Error: Connection reset by peer.)
<20141221-17:32:11.253, FIX.4.4:myusername-> hostusername, event>
(Disconnecting)
I think I found the reason. The host I'm trying to connect to is using QuickFIX Java, which supports SSL. The QuickFIX C++ client doesn't seem to support the SSL enable tag in the session settings. I finally had to resort to Wireshark to determine this. I searched all over the web and many people were reporting this same error. I hope this post saves them and anyone in the future from debugging endlessly to solve this "Socket Error: Connection reset by peer" error.
Two reasons I am aware of for "Socket Error: Connection reset by peer" are:
1) Your SenderCompId/TargetCompId does not match that of the other side. In that case, just make sure you are using the correct ones.
2) The sequence number expected by the server is different from the one you are sending. In that case, try setting the ResetOnLogon field (in your registry file) to No and check if that resolves the issue.
There can be many reasons for this error. However, I doubt that the network is responsible, since the connection request was sent to the server properly. You could search the internet for a wider range of answers.
It could well be a firewall. Do you have the right IP and port, and permission to get there?
We got this error message when we had not correctly imported the security certificate.
OK, let me be clear. I'm using TCP, and that should mean a connection doesn't get interrupted unless it is closed or there are network problems.
So here's my issue:
Using my sockets works perfectly.
After 5 - 10 min of inactivity they stop responding (the connection is still alive [checked with netstat -n]).
It tells me that data is sent (but the other side doesn't receive it, and I'm sure it is waiting for it).
If I keep sending, eventually it will give me WSA error 10038 (invalid socket handle).
EDIT: After a few more tries of sending, it gave me error 10058 (An established connection was aborted by the software in your host machine.)
I'm completely confused. I haven't closed the socket or done anything to it; it has only been inactive. If I use it nonstop for 10 - 20 minutes, it works perfectly.
With error 10058, it's practically certain that a gateway (a proxy, or a firewall, or a router, with or without NAT) is timing out its relay of your connection.
Basically, you are not directly connected with your peer. Instead, the gateway is in between, explicitly transferring data between its connection with you and its connection with your peer. Since sockets are a limited resource, the gateway has an eviction policy under which it shuts down connections that look inactive. If you look dead, boom, you are dead.
Your only option is to remain active, which typically means working in some kind of "heartbeat" into your application protocol. Nasty, but them's the breaks.
Unless you really know what you are doing, do not play around with TCP's SO_KEEPALIVE.
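An application-level heartbeat mostly comes down to tracking how long the connection has been idle and sending a small no-op message before the gateway's timeout is reached. A rough sketch of the bookkeeping, where the class name HeartbeatTimer is hypothetical and the interval is something you would have to choose yourself (well under the 5 - 10 minutes observed in the question):

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: tracks when the connection last carried traffic.
class HeartbeatTimer {
public:
    using Clock = std::chrono::steady_clock;

    HeartbeatTimer(std::chrono::seconds interval, Clock::time_point now)
        : interval_(interval), last_activity_(now) {
        assert(interval.count() > 0);
    }

    // Call whenever application data or a heartbeat message is sent.
    void mark_activity(Clock::time_point now) { last_activity_ = now; }

    // True once the connection has been idle for at least one interval.
    bool heartbeat_due(Clock::time_point now) const {
        return now - last_activity_ >= interval_;
    }

private:
    std::chrono::seconds interval_;
    Clock::time_point last_activity_;
};
```

The send loop would check heartbeat_due(HeartbeatTimer::Clock::now()) periodically and, when it returns true, send whatever no-op message the application protocol defines, then call mark_activity(). The peer has to understand and ignore (or answer) that message, so both sides of the protocol need to agree on it.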
A NAT firewall may be eating your connection without telling you. Try enabling TCP keepalive.
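For reference, enabling keepalive on a socket might look like the sketch below. It assumes Linux, where the per-socket timing options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are available; the function name enable_keepalive and the values passed to it are placeholders. The question is about Winsock, where SO_KEEPALIVE also exists but, as far as I know, the timing is configured differently (through the SIO_KEEPALIVE_VALS ioctl), so treat this as an illustration of the idea rather than a drop-in fix.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#include <cassert>

// Hypothetical helper: turns on TCP keepalive for fd and sets its timing.
//   idle_s     - seconds of silence before the first probe is sent
//   interval_s - seconds between unanswered probes
//   probes     - unanswered probes before the connection is dropped
// Returns true only if every option was applied.
inline bool enable_keepalive(int fd, int idle_s, int interval_s, int probes) {
    assert(idle_s > 0 && interval_s > 0 && probes > 0);
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) == 0;
}
```

A call such as enable_keepalive(fd, 60, 10, 5) would start probing after one minute of silence. The idle time has to be shorter than the gateway's timeout for this to help, and the default idle time on many systems is around two hours, which is far too long for the 5 - 10 minute window described in the question.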
I wrote a server that listens for incoming TCP connections, and clients that connect to it. When I shut down the server and restart it on the same port, I sometimes get the error EADDRINUSE when calling bind(...) (error code 98 on Linux). This happens even though I am setting the option to reuse the address.
The error does not happen all the time, but it seems that it happens more often when clients are connected to the server and sending data while it shuts down. I guess the problem is that there are still pending connections while the server is shut down (related topic: https://stackoverflow.com/questions/41602/how-to-forcibly-close-a-socket-in-time-wait).
On the server side, I am using boost::asio::ip::tcp::acceptor. I initialize it with the option "reuse_address" (see http://beta.boost.org/doc/libs/1_38_0/doc/html/boost_asio/reference/basic_socket_acceptor.html). Here is the code snippet:
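#include <boost/asio.hpp>
using boost::asio::ip::tcp;
// io_service is a boost::asio::io_service and port is the listening port, both defined elsewhere.
tcp::acceptor acceptor(io_service);
tcp::endpoint ep(tcp::v4(), port);
acceptor.open(ep.protocol());
// reuse_address has to be set after open() and before bind() to take effect.
acceptor.set_option(tcp::acceptor::reuse_address(true));
acceptor.bind(ep);
acceptor.listen();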
The acceptor is closed with:
acceptor.close();
I also tried calling acceptor.cancel() before that, but it had the same effect. When this error occurs, I cannot restart the server on the same port for quite some time. Restarting the network helps, but is not a permanent solution.
What am I missing?
Any help would be greatly appreciated! :)
This was originally a comment on the question.
Does your server fork child processes? Also, are you sure the socket is in the TIME_WAIT state? You might want to grab the netstat -ap output when this happens.
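The reason forking matters is that a child process inherits a copy of the listening descriptor, and as long as any copy stays open the port remains bound even after the parent has called close(). A minimal sketch of a listener that guards against one variant of this, assuming plain POSIX sockets rather than Boost.Asio (the function name make_listener is hypothetical):

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

// Hypothetical helper: returns a listening descriptor bound to `port` on all
// IPv4 interfaces, or -1 on failure. Port 0 lets the kernel pick a free port.
inline int make_listener(std::uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int on = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    // FD_CLOEXEC: the descriptor is closed automatically in any program the
    // process later starts with exec(), so it cannot keep the port bound there.
    // SO_REUSEADDR: allows bind() while old connections linger in TIME_WAIT.
    bool ok = fcntl(fd, F_SETFD, FD_CLOEXEC) == 0 &&
              setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) == 0 &&
              bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == 0 &&
              listen(fd, SOMAXCONN) == 0;
    if (!ok) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Close-on-exec only covers children that call exec(). A child that is forked and keeps running the same program still holds its copy and has to close() it explicitly. Whether any of this applies here depends on the answer to the fork question, which the original post does not give.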
When you solve these problems "by force", you are inviting trouble, aren't you?
There is a reason the default behavior requires you to wait: otherwise the network could, for example, mistake an ACK from the previous connection for an ACK belonging to the new connection.
I would not allow this "solution" to be included in release builds in my team.
Remember, when the probability of error is very low, testing is extremely difficult!