Winsock send() issue with single-byte transmissions - C++

I'm experiencing a frustrating behaviour of Windows sockets that I can't find any info on, so I thought I'd try here.
My problem is as follows:
I have a C++ application that serves as a device driver, communicating with a serial device connected through a serial-to-TCP/IP converter.
The serial protocol requires a lot of single-byte messages to be communicated between the device and my software. I noticed that these small messages are only sent about 3 times after startup, after which they are no longer actually transmitted (checked with Wireshark). All the while, send() keeps returning > 0, indicating that the message has been copied to its send buffer.
I'm using blocking sockets.
I discovered this issue because this particular driver eventually has to drop its connection when the send buffer is completely filled (select() fails because of this after about 5 hours, but it happens much sooner when I reduce the SO_SNDBUF size).
I checked, and noticed that when I call send() with messages of 2 bytes or larger, transmission never fails.
Any input would be very much appreciated; I am out of ideas on how to fix this.

This is a rare case when you should set TCP_NODELAY so that the sends are written individually, not coalesced. But I think you have another problem as well. Are you sure you're reading everything that's being sent back? And acting on it properly? It sounds like an application protocol problem to me.
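For reference, here is a minimal sketch of disabling Nagle with Winsock; it assumes an already-connected SOCKET called sock (a placeholder name), and error handling is reduced to returning false:
#include <winsock2.h>

// Sketch: disable Nagle's algorithm so each small send() is written out
// individually instead of being coalesced while waiting for ACKs.
bool disableNagle(SOCKET sock)
{
    BOOL flag = TRUE;
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                   reinterpret_cast<const char*>(&flag), sizeof(flag)) == SOCKET_ERROR)
    {
        // WSAGetLastError() will tell you why it failed
        return false;
    }
    return true;
}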

Related

Intermittent Delays in C++ TCP Communication in Linux

I have a device which sends data every 20 milliseconds over TCP. I have an application which connects to this device and starts the socket communication. My application listens on a separate thread and reads the data as fast as it is ready, puts the data aside, and some other thread processes it. The device is directly connected to the computer via an Ethernet cable.
I see a strange problem and I am trying to understand the reason why. Almost once every minute, it takes approximately 50 milliseconds to receive a packet from the device. I do a blocking read which will try reading for a second and will finish as soon as data is ready; normally it takes approximately 20 ms, as I would expect, but as I said there are times it takes 50 ms, even though it is very rare (1 in 3000). What I noticed is that the packets after the late one arrive immediately, which makes me think there's some delay at the network layer. I also examined the timestamps of the packets (which are given by the device); they are consistently increasing by 20 ms.
Is it normal to see delays like that when the device is directly connected to the computer? Since it is TCP, there might be a lot of effort under the hood (CRC checks, out-of-order packets, retransmissions, etc.). I would still rather find a way to prevent this delay than simply accept that it might happen.
Any insights will be greatly appreciated.
It's probably the result of Nagle's algorithm, which is turned on by default for TCP sockets.
Use setsockopt() to set the TCP_NODELAY flag on the socket that sends the data to turn it off.
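A minimal sketch for the Linux side, assuming a connected socket descriptor fd (a placeholder name):
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Sketch: turn off Nagle so small packets go out immediately instead of
// being held back until the previous segment is acknowledged.
bool disable_nagle(int fd)
{
    int flag = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) == 0;
}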

Send/Recv Socket Blocking Issues

Another question about my beloved sockets.
I'll first explain what my case is. After that I will tell you what's bothering me.
I have a client and a server. Both applications are written in C++ with the Winsock2 implementation. The connection runs over TCP and WLAN. WLAN is very important, because it's probably causing the issue and is definitely going to be the communication channel.
I'm connecting two sockets to the server: a SendSocket and a ReceiveSocket. I'm constantly sending video data to the server through the SendSocket. The data is processed, sent back to the client, and displayed. Each socket has its own thread.
The video data is encoded, so I achieve about 500 kB/s. Let's treat this rate as fixed, without further explanation.
Perfect communication viewed by the client:
Send Data
Recv Data
Send Data
Recv Data
...
This is for like 100 frames the case.
But every couple of frames, the stream freezes for about 4 frames and continues after that (4 frames are about 500 ms).
That's the issue I'm facing.
What happens to the stream is the following:
Send Data
Recv Data
Send Data
Send Data
Send Data1 -> blocked send
Recv Data
Recv Data
Send Data2 -> not blocked anymore.
The data gets properly sent on the server side.
Since WLAN is not full duplex (as far as I know), I thought that the send calls are prioritized for some reason, and after that the receive calls are prioritized, so the send call blocks until the recv calls are done.
Maybe you can tell me what is happening in the lower layers that could cause the problem.
By the way, I'm definitely not sure that it isn't just a bandwidth issue, but I thought WLAN should be able to handle 500 kB/s. This 500 kB/s is upstream and downstream together.
Important notice: if I reduce the frame rate to a fifth, it does not fix the issue.
I know it's hard to fix this issue with only this much insight. I would be happy if you could share your knowledge, so I may be able to fix it myself.
EDIT: It's perfectly fine if the client recv hangs a little, but it must not block the send. The server needs data continuously.
A blocked send means that the socket send buffer is full, which means either (a) that the socket receive buffer at the receiver is full, which means the receiver isn't reading as fast as you're sending; or else (b) that there are network losses that are causing the sender to retry. In either case there is nothing you can do about it at the sending end.
Someone is bound to mention non-blocking I/O as a solution, but it isn't: at the point where a blocking sender blocks, a non-blocking sender will get -1 from send() with errno == EAGAIN/EWOULDBLOCK, which doesn't solve the actual problem at all.
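To make that concrete, the non-blocking variant on Winsock looks roughly like the sketch below (sock, data and len are placeholder names); all it buys you is that the waiting moves from send() into select():
#include <winsock2.h>

// Sketch: when a non-blocking send() cannot make progress it fails with
// WSAEWOULDBLOCK, and the caller still ends up waiting for buffer space --
// here in select() instead of inside send().
bool sendWhenReady(SOCKET sock, const char* data, int len)
{
    int n = send(sock, data, len, 0);
    if (n == SOCKET_ERROR && WSAGetLastError() == WSAEWOULDBLOCK)
    {
        fd_set writable;
        FD_ZERO(&writable);
        FD_SET(sock, &writable);
        select(0, NULL, &writable, NULL, NULL);   // wait until the send buffer has room
        n = send(sock, data, len, 0);
    }
    return n != SOCKET_ERROR;
}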
Alright then. It was definitely a WLAN issue. I tested over the eduroam WLAN at my university (I don't know if anybody knows it). Now I have tested it with a simple router and it worked fine. It seems the eduroam WLAN has some trouble with bandwidth or direction changes. I won't look into that...

Forced server-side socket close without SO_LINGER > 0 can lose data, right?

I'm writing a cross-platform client application that uses sockets, written in C++. I'm having problems where the server is doing a hard close on the socket when it's done sending me info.
I've been reading other posts on this topic, and I'm not so much interested in the rights or wrongs of this approach, but it seems the server is either explicitly setting SO_LINGER=0, or that's the default behavior on that system (not sure, it's a Linux box).
I can see (in Wireshark) that the data was sent to me, followed within milliseconds by an RST, indicating a hard close by the server. I personally don't agree with this approach, as it should be up to the client to shut down the socket.
The server team is saying there's nothing wrong with that approach (doing a hard close rather than a shutdown); it's typical on servers to avoid accumulating TIME_WAIT sockets. On Windows, my select() returns indicating there's something to read (while I haven't read any of this "in transit" data yet).
However, because of the quick arrival of the RST, on Windows recv() returns -1 and I'm seeing a 10054 for the error code (connection reset by peer). This wouldn't be too bad if I could at least get the data that was sent, but it seems that once my client's socket stack sees the RST any unread bytes are no longer made available to me.
On Linux (client), there's no problem. It seems the TCP stack is behaving slightly differently, in that I can read the outstanding bytes before the RST is honoured. I'm having trouble convincing the server guys they have a bug, given that it works for a Linux client.
First off, am I correct? Is this a server-side issue? I can't see that the client end is doing anything wrong, so it must be right?
It seems the server team is adamant that they want to perform the close, and they don't want to have TIME_WAITs, so I was going to push for them to add a SO_LINGER of, say, 2 seconds. Does that sound like it will solve my problem? From what I understand this will stop the server from sending out an RST so soon after sending data, and should give me a chance to read the outstanding bytes.
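For what it's worth, here is a rough sketch of what a positive SO_LINGER would look like on the server (Linux-style API, with server_fd as a placeholder name); whether it actually helps is what the discussion below addresses:
#include <sys/socket.h>
#include <unistd.h>

// Sketch: linger on close() for up to 2 seconds so queued data (and a FIN)
// can drain, rather than aborting the connection with an immediate RST.
void close_with_linger(int server_fd)
{
    struct linger lin;
    lin.l_onoff  = 1;   // enable lingering close
    lin.l_linger = 2;   // wait up to 2 seconds for unsent data to be acknowledged
    setsockopt(server_fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
    close(server_fd);   // may now block for up to 2 seconds
}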
Found a definitive answer to my own question:
"...Upon reception of RST segment, the receiving side will immediately abort the connection. This statement has more implications than just meaning that you will not be able to receive or send any more data to/from this connection. It also implies that any unread data still in the TCP reception buffer will be lost..." It cites the book "TCP/IP Internetworking Volume II". I don't have that book, so I can only take his word for it. Doesn't seems to discard data on Linux, only Windows...
Olivier Langlois's blog
The side-effect of fiddling with SO_LINGER to force a reset is that all pending data is lost. The fact that you don't receive it is all the proof you need that the server team is wrong to do this.
RFC 793 cited below says 'this command [ABORT] causes all pending SENDs and RECEIVEs to be aborted, ... and a special RESET message to be sent to the TCP on the other side of the connection.' See also W.R. Stevens, TCP/IP Illustrated, Vol. 1, p. 287: 'Aborting a connection provides two features to the application: (1) any queued data is thrown away and the reset is sent immediately, and (2) the receiver of the RST can tell that the other end did an abort instead of a normal close'. There is similar wording, along with an extract from the BSD code that implements it, in Vol. 2.
The TIME_WAIT state only occurs on a socket which sends a FIN before it has received one: see RFC 793. So the server should be waiting for a FIN from the client, with a suitable timeout, rather than resetting. This will also permit the client to do connection pooling.
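As a sketch of that suggestion (Linux-style, fd is a placeholder, and a real server would add a receive timeout so a dead client can't hold the socket open forever):
#include <sys/socket.h>
#include <unistd.h>

// Sketch: graceful server-side close. Send our FIN, then wait for the
// client's FIN (recv() returning 0) before closing, instead of resetting.
void graceful_close(int fd)
{
    shutdown(fd, SHUT_WR);                     // we're done sending; FIN goes out
    char buf[1024];
    while (recv(fd, buf, sizeof(buf), 0) > 0)
    {
        // drain and discard anything the client still sends
    }
    close(fd);                                 // recv() returned 0 (FIN) or an error
}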

Facing an issue with the recv() and send() Winsock APIs. recv() hangs while receiving the last packet

I am facing an issue with the recv() and send() Winsock APIs. recv() hangs while receiving the last packet.
Problem Description:-
System A's app is writing data over a non-blocking socket and system B's app is receiving data over a blocking socket in chunks of 64k.
It seems that while reading (probably) the last packet of 64k, which may be less than or equal to 64k, the receive freezes. I am not sure whether the receive of the last packet or the send of the last packet is the issue, but I am observing this issue intermittently in our legacy applications.
Has anyone faced a similar issue before? If yes, can you please provide your input?
If not, can you please suggest some troubleshooting techniques to narrow down the root cause?
Just for information, I have Win2k3 servers.
Thanks,
Varun
Wireshark is a great tool for troubleshooting networking code. It'll show you exactly what packets are entering and leaving your network interface in near real time.
As to your specific issue: are you saying that the last chunk of data might be shorter than 64k? If so, your protocol should include some message length information so the receiver knows how much data to look for.
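One way to do that is a length prefix. A minimal sketch, assuming a blocking Winsock socket and a 4-byte big-endian length header (recvAll and recvMessage are made-up names for illustration):
#include <winsock2.h>
#include <cstdint>
#include <vector>

// Sketch: read exactly len bytes, looping because recv() may return fewer
// bytes than requested.
static bool recvAll(SOCKET s, char* buf, int len)
{
    while (len > 0)
    {
        int n = recv(s, buf, len, 0);
        if (n <= 0) return false;    // 0 = peer closed, SOCKET_ERROR = failure
        buf += n;
        len -= n;
    }
    return true;
}

// Sketch: each message carries a 4-byte length in network byte order, so the
// receiver always knows how much data to look for.
bool recvMessage(SOCKET s, std::vector<char>& out)
{
    uint32_t netLen = 0;
    if (!recvAll(s, reinterpret_cast<char*>(&netLen), sizeof(netLen)))
        return false;
    uint32_t len = ntohl(netLen);
    out.resize(len);
    return len == 0 || recvAll(s, out.data(), static_cast<int>(len));
}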
A couple of guesses...
If you are using UDP, perhaps one or more packets are being dropped en route (which UDP is permitted to do whenever it feels like). In that case, your receiver might end up waiting for data that is simply never going to arrive; to fix this you would need to either implement some way of automatically resending the lost data, or (if you don't strictly need all the data), some way for the sender to notify the receiver that he is done transmitting, so the receiver might as well stop waiting. (of course you would need to handle the case where this notification gets dropped, as well... it can get complicated if you want 100% robustness)
If you are using TCP, perhaps you are not carefully checking the values returned by send() on the sending side? If you are assuming that send() will always send the number of bytes you asked it to, you might end up thinking send() sent all the bytes when in fact it only sent some (or none) of them... so the sender would think the transmission was complete, while the receiver would be stuck waiting for data that isn't going to arrive.
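A minimal sketch of the kind of checking meant here, for a blocking Winsock socket (sendAll is a hypothetical helper name):
#include <winsock2.h>

// Sketch: keep calling send() until every byte has been handed to the stack,
// since send() may accept fewer bytes than you asked it to.
bool sendAll(SOCKET s, const char* data, int len)
{
    while (len > 0)
    {
        int n = send(s, data, len, 0);
        if (n == SOCKET_ERROR)
            return false;            // check WSAGetLastError()
        data += n;
        len  -= n;
    }
    return true;
}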
You might have a problem with the server sending data down the wire faster than the receiver is able to read it. You could try increasing the receive buffer:
int nSocketBuffer = 131072; // 128k
if (setsockopt(m_sSocket, SOL_SOCKET, SO_RCVBUF, (LPCSTR)&nSocketBuffer, sizeof(int)) == SOCKET_ERROR)
{
    // socket error
    return false;
}

setsockopt TCP_NODELAY question on Windows Mobile

I have a problem on Windows Mobile 6.0.
I would like to create a TCP connection which does not use the Nagle algorithm, so that it sends my data when I call the send() function and does not buffer calls carrying too small an amount of data.
I tried the following:
BOOL b = TRUE;
setsockopt(socketfd, IPPROTO_TCP, TCP_NODELAY, (char*)(&b), sizeof(BOOL));
It works fine on desktop. But on Windows Mobile, if I set this value and then query it, the returned value is 8. And the network traffic analysis shows that nothing changed.
Is there any way to force a flush to my socket?
It seems to me that the TCP_NODELAY option is not supported on Windows Mobile. Check the MSDN documentation; it might have something to that effect. I remember a while back struggling with setting a few socket options, including TCP_NODELAY and the send and receive buffer sizes, and the setsockopt call would fail. Check whether setsockopt fails; if it does, call ::WSAGetLastError() and see if that leads you anywhere. In my case, I remember having to do without these options as they were not supported. I was working on Windows Mobile 5.
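A minimal sketch of that check, reusing the socketfd variable from the question (the error code named in the comment is just a plausible example):
#include <winsock2.h>

// Sketch: set TCP_NODELAY and capture the Winsock error code if it fails,
// which should show whether the option is simply unsupported on the device.
BOOL b = TRUE;
if (setsockopt(socketfd, IPPROTO_TCP, TCP_NODELAY,
               (char*)(&b), sizeof(BOOL)) == SOCKET_ERROR)
{
    int err = ::WSAGetLastError();   // e.g. WSAENOPROTOOPT if the option is unsupported
    // log or otherwise handle err here
}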
Are you setting the option on both ends of the connection and after the connection has been established? I just had someone test it and it worked fine with TCP over ActiveSync, significantly improving the command-response cycle time in the test app (about a 4x improvement in fact).
The server is a given and I cannot modify it, but our Symbian client, for example, works fine with it using this option.
I tried setting this option both before and after creating the connection, but nothing changed.
I use TCP over Windows Mobile Device Center (since I use Vista).
It just occurred to me (this is a wild guess and probably not likely), but maybe you're having a delayed ACK problem due to your send buffer being smaller than the size of the data you're writing. Nagle may have nothing to do with it.
Does the receiving side send any data back immediately? If not, your peer will delay its ACK for up to 200 ms, waiting to piggyback the ACK on some data to make better use of bandwidth.
When the send buffer on the socket is smaller than the data, the call to write will block until the ACK has been received and all the data has been sent.
For example, if your send buffer is 8192 bytes and you send 8193 bytes and your peer sends no data back, then your write will block for 200 ms (or however long your peer's implementation delays the ACK), effectively making it look like Nagle is killing you even when it's disabled.
If this is the case you could either increase the send buffer size or have your peer always send you back a null byte to force the ACK to be sent immediately.
Otherwise, I would maybe try playing around with NTttcp_x86 a bit to model your application's send/receive patterns and see if maybe something else is going on.
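If you go the send-buffer route mentioned above, a minimal sketch for the sending socket (sock is a placeholder handle, and the 256 kB figure is just an example sized above the largest write):
#include <winsock2.h>

// Sketch: enlarge the send buffer so a whole write fits, avoiding the block
// that waits on the peer's delayed ACK.
int sndBuf = 256 * 1024;
if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
               (const char*)&sndBuf, sizeof(sndBuf)) == SOCKET_ERROR)
{
    // check ::WSAGetLastError()
}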