I'm working in an embedded Linux environment.
It launches a telnet daemon on startup, which listens on a particular port and launches a program when a connection is received.
i.e.
telnetd -l /usr/local/bin/PROGA -p 1234
PROGA outputs data at irregular intervals. When it is not outputting data, it sends a 'heartbeat' string every X period of time to let the client know we are still active, i.e. "heartbeat\r\n".
After a random amount of time, the client (a Linux version of telnet, launched with: telnet xxx.xxx.xxx.xxx 1234) stops receiving the 'heartbeat\r\n'.
The data the client sees:
heartbeat
heartbeat
heartbeat
...
heartbeat
[nothing, should have received heartbeat]
[nothing forever]
The heartbeat is sent with:
result = printf("%s", heartbeat);
Checking result, it is always the length of heartbeat. Logging to syslog shows that the printf() executes successfully at the proper intervals.
I've since added tcdrain() and fflush() calls; both return success, but they do not seem to help the situation.
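For reference, the sending path described above boils down to roughly this (a sketch, not the actual source):

#include <stdio.h>
#include <string.h>
#include <syslog.h>
#include <termios.h>
#include <unistd.h>

static void send_heartbeat(void)
{
    const char *heartbeat = "heartbeat\r\n";

    int result = printf("%s", heartbeat);
    if (result != (int)strlen(heartbeat))
        syslog(LOG_WARNING, "heartbeat write was short: %d", result);

    /* both of these return success, but the client still stalls: */
    fflush(stdout);
    tcdrain(STDOUT_FILENO);
}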
Any help would be appreciated.
UPDATE: I got a Wireshark capture from the server side. The heartbeat is very clearly being sent continuously: no hiccups, no delays. I did find something interesting on the client, though. The client in this test case (telnet on Ubuntu 9.04) seems to suddenly stop receiving the heartbeat (as described above). Wireshark confirms this: a big pause in packets. Once the client has stopped receiving the heartbeat, pressing any key on the client triggers a spew of data from the client's buffer (all heartbeats). Wireshark on the client also shows this mass of data arriving in one packet.
Unfortunately I don't really know what this means. Is this a line-mode on/off thing? Line endings (\r\n) are very clearly coming through.
UPDATE 2: running netcat instead of telnetd, the problem is not reproducible.
The first thing I would do is get out Wireshark and try to find out if the server is truly sending the message. It would be instructive to run Wireshark at the server as well as on a third-party PC. Is there anything different about the last heartbeat?
Edit: Well, that was an interesting find on your client.
It seems like some terminal-layer processing is getting in the way. You may want to use netcat rather than telnetd. netcat is designed for sending arbitrary data over a TCP session in raw mode, without any special formatting, and it can hook an arbitrary process up to a socket. On a Windows machine you can use PuTTY in raw mode to accomplish the same thing.
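For example, on the embedded box you could replace the telnetd line with something like the following (a guess at the syntax; the exact flags depend on which netcat build your image ships, and the -e / --exec option is only present in some variants such as BusyBox nc and ncat):

nc -l -p 1234 -e /usr/local/bin/PROGA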
It may still be worth examining traffic with a third party between your client and server. The kernel may be optimizing away writes to the network and internally buffering data. That's the only way to ensure that what you see is what's really happening on the wire.
I am measuring the latency of a TCP/IP connection on Windows over the loopback interface and getting about 4 ms from the time a message is sent until a response is received.
For RPC purposes there is a TCF layer on top of TCP/IP.
The messages sent and received contain only a single character as payload in addition to the TCF framing.
The "server" which handles the commands are implemented in C++ using boost asio.
The "client" sending commands is a Python script that uses the Python TCF reference implementation.
I have tried setting the TCP_NODELAY socket option to disable the Nagle algorithm, and experimented with various buffer sizes for the socket, but the round-trip time remains at about 4 ms. I was expecting it to be quite a bit lower.
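(For reference, disabling Nagle on a plain socket amounts to the following; with boost asio the equivalent is setting the ip::tcp::no_delay option on the socket.)

int flag = 1;
/* disable Nagle's algorithm on an already-created TCP socket */
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (const char *)&flag, sizeof flag);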
Profiling on the C++ side shows that it spends about 50% of its execution time waiting for commands, so the next step will be to try replacing the Python script with a C++ implementation, but it would be nice to know what one can expect for the round-trip time on the loopback interface.
This SO question:
Linux Loopback performance with TCP_NODELAY enabled
is related but did not quite answer my question.
You can establish a lower bound on the latency with ping localhost. The number it reports is one packet sent, one received.
If your TCP message is sent on an existing connection, you might get nearly that latency.
If your measured time included the TCP connection setup, you might get 10x that latency.
How can I check whether a remote UDP port is open using native C++? Since UDP is connectionless, calling connect() is not helpful. I cannot try binding it, since it is not local. nmap also cannot tell. (netstat can find out, but I think it looks at internal information about open ports/files.) Is there any way to detect it? If I go a layer down to the network level, is it possible to send an ICMP message from C++ to check for port-unreachable status? Would that give enough information about the port status?
Platform is Linux.
I assume that you are trying to determine whether or not a UDP port on a remote machine is being passed through a firewall and/or has an application running on it.
You cannot reliably determine this. The closest you can come is to try sending a series of small datagrams to that address and port, spaced about 1 second apart for about 10 seconds.
If there are no firewalls blocking the port and no application is running, then the remote system might send back ICMP_UNREACH_PORT (port unreachable). If there are no blocking firewalls and the remote system is down, a router might send back ICMP_UNREACH_HOST or ICMP_UNREACH_NET. If a firewall is blocking you, it might send back ICMP_UNREACH_FILTER_PROHIB, but most firewalls don't send back anything.
The odds of getting any of those back are pretty slim, because most firewalls block that sort of ICMP feedback. Even if an ICMP message does come back, Linux generally does not let you see it unless you are running as root. Some operating systems will report ICMP errors as a failure of the next sendto() to the same address/port, which is why you need to repeat the message several times. But some do not, in which case you must open a raw ICMP socket and parse any returned messages.
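As a rough illustration of that connected-UDP-socket approach on Linux (a sketch only; the one-byte probe, the timeouts, and the return convention are arbitrary choices, and silence still proves nothing):

/* Probe a remote UDP port by connect()ing the UDP socket, so that an ICMP
   port-unreachable comes back as ECONNREFUSED on a later send/recv.
   Returns 1 = something answered, 0 = ICMP says closed, -1 = inconclusive. */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int probe_udp_port(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    struct timeval tv = { 1, 0 };              /* wait 1 second per probe */
    int i, s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    /* connect() on a UDP socket only fixes the peer address, but it also
       lets the kernel hand ICMP errors back to us as errno values */
    if (connect(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(s);
        return -1;
    }
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    for (i = 0; i < 10; i++) {
        char probe = 0, buf[64];
        if (send(s, &probe, 1, 0) < 0 && errno == ECONNREFUSED)
            break;                             /* ICMP port unreachable */
        if (recv(s, buf, sizeof buf, 0) >= 0) {
            close(s);
            return 1;                          /* the application answered */
        }
        if (errno == ECONNREFUSED)
            break;
        /* EAGAIN/EWOULDBLOCK: silence, keep probing */
    }
    close(s);
    return (i < 10) ? 0 : -1;
}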
Even if you do somehow get an ICMP message, understand that they are not reliable. For example, you could get ICMP_UNREACH_PORT even though an application is not only listening, but actively sending you data. (That's rare, but I've seen it happen.)
If an application is running on the given port and if you know what that application is and if you know how to craft a message which will cause that application to respond to you, then doing so and getting a response is the best indication that the port is open. But getting no response means nothing: maybe the port is blocked, maybe the application is not running, or maybe it just didn't like your message.
Bottom line: no, not really.
There is no bulletproof way to check whether a remote port is ready to receive your UDP datagrams. Since UDP is connectionless, you can only tell whether the remote host answers with something meaningful to you. There may be ways to get a hint (as port scanners do), but that is nothing I would rely on in production code.
I am developing a client/server application on Windows using C++ and the Winsock library. It works fine, but if it is on a network and the server has started listening, removing the network cable does not cause the server to show any error in any thread. How can the server socket know that the network cable is unplugged?
If anybody knows, please help me.
While it should be possible to detect that the network cable is unplugged on the host, you will still have the same problem if the network is disrupted somewhere else between your server and the clients.
One common (if not the most common) way to solve this is to have a "keep-alive" message being sent. If no reply to that message is received within some timeout you simply close the connection and release all resources associated with it.
Edit
A "keep-alive" message is like using the "ping" command to see if a remote machine can be reached. It is simply a message that is sent, either by the server or the client (it doesn't matter who initiate it) to see if the other end of the connection is alive and can be reached.
It can be as simple as sending the string "Are you there?" and expecting a reply containing "Yes I am". If you send it once every minute, and don't get a reply withing (for example) one minute, you can consider the connection being dead. The other end, that receives the "Are you there?", knows it will get the message once every minute. If it hasn't arrived for two minutes then the sender is no longer reachable.
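A minimal client-side sketch of such a check with Winsock (the strings, the 60-second timeout, and the assumption that nothing else is being read from the socket at the same time are all illustrative choices):

/* Send an application-level "Are you there?" and wait up to 60 seconds for
   the reply. Returns 1 if the peer answered, 0 otherwise. */
#include <winsock2.h>
#include <string.h>

int keepalive_ok(SOCKET s)
{
    const char *ping = "Are you there?\r\n";
    char reply[64];
    DWORD timeout_ms = 60 * 1000;
    int n;

    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
               (const char *)&timeout_ms, sizeof timeout_ms);

    if (send(s, ping, (int)strlen(ping), 0) == SOCKET_ERROR)
        return 0;                        /* send failed: connection is gone */

    n = recv(s, reply, sizeof reply - 1, 0);
    if (n <= 0)
        return 0;                        /* timeout, reset, or orderly close */

    reply[n] = '\0';
    return strstr(reply, "Yes I am") != NULL;
}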
If the protocol can't be modified to add such messages, then see if some other message can be used instead.
Also, remember that the best, and in some cases only, way to know if something is wrong with a connection is to attempt to read from the socket.
You can unplug a network and then plug it back in, or your Wi-Fi laptop can lose reception for a second and then pick it back up. It would be frustrating if such resumable cases were treated as an error in all the programs we use.
From this Winsock "newbie" FAQ:
The previous question deals with detecting when a protocol connection is dropped normally, but what if you want to detect other problems, like unplugged network cables or crashed workstations? In these cases, the failure prevents notifying the remote peer that something is wrong. My feeling is that this is usually a feature, because the broken component might get fixed before anyone notices, so why demand that the connection be reestablished?
If you feel you have a "special needs" situation you can be aggressive with timeouts. But I wouldn't do that unless there was a really good reason.
System Background:
It's basically a client/server application. The server is an embedded device and the client is a Windows app developed in C++.
Issue: After a runtime of about a week, communication between client and server breaks. Because of this, the server is not able to connect back to the client and needs a restart to recover. It looks like the system is experiencing a socket re-connection problem. The network also sometimes experiences intermittent failures.
Abrupt termination at the remote end
Port locking
I would like some suggestions on how to clean up the socket or shut down cleanly so that re-connection happens properly. Are there other alternative solutions?
Thanks,
Hussain
It does not sound like you are in a position to easily write a stress test app to reproduce this more quickly out of band, which is what I would normally suggest. A pragmatic solution might be to periodically restart the server and client at a time when you think the system is least busy, or when problems arise. This sounds like cheating but many production systems I have been involved with take this approach to maximize system uptime.
My preferred solution here would be to abstract the server and client socket code (hopefully your design allows this to be done without too much work) and use it to implement client and server test apps that can be used to stress test only the socket code by simulating a lot of normal socket traffic in a short space of time - this helps identify timing windows and edge cases that could cause problems over time, and might speed up the process of obtaining a debuggable repro - you can simulate network error in your test code by dropping the socket on the client or server periodically.
A further step to take on the strategic front would be to ensure that you have good diagnostics in your socket handlers on client and server side. Track socket open and close, with special focus on your socket error and reconnect paths given you know the network is unreliable. Make sure the logs are output sequential with a timestamp. Something as simple as this might quickly show you what error or conditions trigger your problems. You can quickly make sure the logs are correct and complete using the test apps I mentioned above.
One thing you might want to check is that you are not being hit by an inability to reuse addresses. Sometimes when a socket gets closed, it cannot be immediately reused for a reconnect attempt because there is still residual activity on one or the other end. You may be able to get around this (based on my Windows/Winsock experience) by experimenting with SO_REUSEADDR and SO_LINGER on your sockets. However, my first focus in your case would be on ensuring the socket code on client and server handles all errors and mainline cases correctly, before worrying about this.
A common issue is that when a connection is dropped, the socket is kept open by the OS in the TIME_WAIT state. If you want to restart the server socket, it will not be able to reopen the same port directly because the OS still considers the port in use.
To avoid that, set the SO_REUSEADDR socket option so that the OS allows you to reuse a port that is in the TIME_WAIT state for a server socket.
Example:
int optval = 1;
// set SO_REUSEADDR to true (1) on the listening socket, before bind():
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);
I'm experiencing something similar with encrypted connections. I believe in my case it is because the client dropped the connection and reconnected in less than the 4-minute FIN_WAIT period. The initial connection is recycled (by the OS) and the server doesn't see the dropout. The SSL authentication is lost when the client loses the connection, so the client tries to re-authenticate. This happens during what the server considers the middle of a conversation. The server then hangs up on the client. I think the server's SSL code considers this a man-in-the-middle attack, or just gets confused, and closes the connection.
I need a little help if someone's got a minute.
I've written a web server using IO completion ports, but I am having some trouble sending out large files. Web pages seem to load fine, but during large file transfers, WSASend() fails after a few minutes with error "The specified network name is no longer available."
Right now, my server just closes the associated connection when any overlapped operation fails. Is this the right thing to do, or should I retry failed overlapped operations a few times before I close the socket? I am using TCP stream sockets.
(fixed) I am also receiving what seem like random 0-byte packets from WSARecv. I am not sure what to make of this, or if the problem is related. (/fixed)
Thanks for any help
Edit: now that the server properly handles connections and has a much more comprehensive log, it seems like Len is right: the client is closing the connection for some reason.
The log:
Initializing Windows Sockets...
Forwarding port 80...
Starting server...
Waiting for incoming connections...
Socket 1128: Client connected.
Socket 1128: Request received
Socket 1128: Sent response
Socket 1128: Error 64: SendChunk() failed. //WSASend()
Socket 1128: Closing connection - GetQueueCompletionStatus == FALSE
So the question is now: why would the client close the connection? It takes anywhere from 2-5 minutes to happen. I have decreased the buffer size to 4098 bytes per send, and I only send the next chunk when the first has completed.
Thanks again for any ideas on this.
P.S. I even just implemented a retry function so that it will retry a failed overlapped I/O operation five times before giving up... still no luck =(
A zero-length read returned from recv indicates that the client on the other end has closed the connection.
That answers why your subsequent send to the client failed.
http://www.opengroup.org/onlinepubs/009695399/functions/recv.html
"If no messages are available to be received and the peer has performed an orderly shutdown, recv() shall return 0."
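In blocking-socket terms that looks like the sketch below; with an IOCP design the equivalent signal is a WSARecv completion that reports zero bytes transferred:

char buf[4096];
int n = recv(sock, buf, sizeof buf, 0);
if (n == 0) {
    /* orderly shutdown by the peer: stop sending and close our side */
    shutdown(sock, SD_SEND);
    closesocket(sock);
} else if (n < 0) {
    /* hard error (connection reset, aborted, ...) */
}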
Are you doing anything to impose some form of flow control on your data transmission?
If not, then you are probably exhausting resources, which is causing the send to fail.
For example, if you are simply issuing LOTS of WSASend() calls one after the other, rather than pacing them based on when they complete, then each one will use system resources (non-paged pool and/or locked pages, which count towards the 'locked pages limit'). You'll then likely eventually fail with ENOBUFS or similar errors.
What you need to do is build a flow control system that works off of the send completions so that you only ever have a known number of sends outstanding at a time.
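A rough sketch of one way to do that, with at most one WSASend() in flight per connection (the structures and the post_wsasend() helper are placeholders for your own code, not a real API):

#include <winsock2.h>
#include <windows.h>

struct send_buf {
    struct send_buf *next;
    char            *data;
    int              len;
};

struct connection {
    SOCKET            sock;
    CRITICAL_SECTION  lock;
    int               send_in_flight;          /* 0 or 1 */
    struct send_buf  *queue_head, *queue_tail;
};

void post_wsasend(struct connection *c, char *data, int len);  /* placeholder: issues the real WSASend() */

void submit_send(struct connection *c, struct send_buf *b)
{
    int send_now = 0;
    EnterCriticalSection(&c->lock);
    if (!c->send_in_flight) {
        c->send_in_flight = 1;
        send_now = 1;
    } else {                                    /* queue it for later */
        b->next = NULL;
        if (c->queue_tail) c->queue_tail->next = b; else c->queue_head = b;
        c->queue_tail = b;
    }
    LeaveCriticalSection(&c->lock);
    if (send_now)
        post_wsasend(c, b->data, b->len);
}

/* Call this from the IOCP worker thread when a WSASend() completion is dequeued. */
void on_send_complete(struct connection *c)
{
    struct send_buf *next;
    EnterCriticalSection(&c->lock);
    next = c->queue_head;
    if (next) {
        c->queue_head = next->next;
        if (!c->queue_head) c->queue_tail = NULL;
    } else {
        c->send_in_flight = 0;
    }
    LeaveCriticalSection(&c->lock);
    if (next)
        post_wsasend(c, next->data, next->len);
}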
See these questions for more detail:
Implement a good performing "to-send" queue with TCP
Limiting TCP sends with a "to-be-sent" queue and other design issues
Finally figured it out.
From the Rogers Internet Terms of Service:
Without limitation, you may not use (or allow anyone else to use) our Services to:
(xvi) operate a server in connection with the Services, including, without limitation, mail, news, file, gopher, telnet, chat, Web, or host configuration servers, multimedia streamers or multi-user interactive forums;
how lame is that? O_o
Good news: the server works fine =)
Edit: I called Rogers. They verified that they are cutting me off and told me that I need a business account to run a web server.