UDP send() to localhost under Winsock throwing away packets?

Scenario is rather simple... not allowed to use "sendto()" so using "send()" instead...
Under Winsock 2.2, normal operation on a brand-new i7 machine running Windows 7 Professional...
Using SOCK_DGRAM socket, Client and Server console applications connect over localhost (127.0.0.1) to test things ...
Have to use packets of constant size...
Client socket uses connect(), Server socket uses bind()...
Client sends N packets using a series of BLOCKING send() calls. Server only uses an ioctlsocket() call with FIONREAD, running in a while loop to constantly printf() the number of bytes waiting to be received...
PACKETS GET LOST UNLESS I PUT SLEEP() WITH A CONSIDERABLE AMOUNT OF TIME... What I mean is the number of bytes on the receiving socket differs between runs if I do not use SLEEP()...
Have played with changing buffer sizes, situation did not change much, except now there is no overflow, but the problem with the delay remains the same ...
I have seen many discussions about the issue between send() and recv(), but in this scenario, recv() is not even involved...
Thoughts anyone?
(P.S. The constraints under which I am programming are required for reasons beyond my control, so no WSA, .NET, MFC, STL, BOOST, QT or other stuff)
It is NOT an issue of buffer overflow, for three reasons:
1. Both incoming and outgoing buffers are set and checked to be significantly larger than ALL of the information being sent.
2. There is no recv(), only checking of the incoming buffer via an ioctlsocket() call (sketched below); recv() is called long after, upon user input.
3. When a Sleep() of >40ms is added between the send()s, the whole thing works, i.e. if there were an overflow, no amount of Sleep() would have helped (again, see point 2).
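For reference, a minimal sketch of the receive-side polling loop described above (the port number and the Sleep() interval are arbitrary assumptions; error handling is omitted):

#include <winsock2.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

int main()
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(12345);                      // assumed port
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);     // 127.0.0.1
    bind(s, (sockaddr*)&addr, sizeof(addr));

    // Poll how much data is queued on the socket, as described above.
    for (;;)
    {
        u_long pending = 0;
        ioctlsocket(s, FIONREAD, &pending);
        printf("bytes waiting: %lu\n", pending);
        Sleep(100);                                    // don't spin flat out
    }
    // closesocket(s); WSACleanup();                   // unreachable in this sketch
}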

PACKETS GET LOST UNLESS I PUT SLEEP() WITH A CONSIDERABLE AMOUNT OF TIME... What I mean is the number of bytes on the receiving socket differs between runs if I do not use SLEEP()...
This is expected behavior; as others have said in the comments, UDP packets can and do get dropped for any reason. In the context of localhost-only communication, however, the reason is usually that a fixed-size packet buffer somewhere is full and can't hold the incoming UDP packet. Note that UDP has no concept of flow control, so if your receiving program can't keep up with your sending program, packet loss is definitely going to occur as soon as the buffers get full.
As for what to do about it, the insert-a-call-to-sleep() solution isn't particularly good because you have no good way of knowing what the "right" sleep-duration ought to be. (Too short a sleep() and you'll still drop packets; too long a sleep() and you're transferring data more slowly than you otherwise could; and of course the "best" value will likely vary from one computer to the next, or one moment to the next, in non-obvious ways.)
One thing you could do is switch to a different transport protocol such as TCP, or (since you're only communicating within localhost), a simple pipe or socketpair. These protocols have the lossless FIFO semantics that you are looking for, so they might be the right tool for the job.
Assuming you are required to use UDP, however, UDP packet loss will be a fact of life for you, but there are some things you can do to reduce packet loss:
send() in blocking mode, or if using non-blocking send(), be sure to wait until the UDP socket select()'s as ready-for-write before calling send(). (I know you said you send() in blocking mode; I'm just including this for completeness)
Make your SO_RCVBUF setting as large as possible on the receiving UDP socket(s). The larger the buffer, the lower the chance of it filling up to capacity.
In the receiving program, be sure that the thread that calls recv() does nothing else that would ever hold it off from getting back to the next recv() call. In particular, no blocking operations (even printf() is a blocking operation that can slow your thread down, especially under Windows where the DOS prompt is infamous for slow scrolling under load)
Run your receiver's network recv() loop in a separate thread that does nothing else but call recv() and place the received data into a FIFO queue (or other shared data structure) somewhere. Then another thread can do the less time-critical work of examining and parsing the data in the FIFO, without fear of causing a dropped packet.
Run the UDP-receive thread at the highest priority you can convince the OS to let you run at. The fewer other tasks that can hold off the UDP-receive thread, the fewer opportunities for packets to get dropped during those hold-off periods. (A short sketch combining this and the SO_RCVBUF suggestion follows after this list.)
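By way of illustration, a minimal sketch combining two of those suggestions: enlarging SO_RCVBUF and raising the priority of the dedicated receive thread. The 1 MB figure and the function name are assumptions, not anything mandated by the API:

#include <winsock2.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

// udpSocket is assumed to be an already-created, bound UDP socket.
void PrepareUdpReceiver(SOCKET udpSocket)
{
    // Ask for a large receive buffer; the bigger it is, the less likely it fills up.
    int rcvbuf = 1 * 1024 * 1024;
    if (setsockopt(udpSocket, SOL_SOCKET, SO_RCVBUF,
                   (const char*)&rcvbuf, sizeof(rcvbuf)) == SOCKET_ERROR)
        printf("setsockopt(SO_RCVBUF) failed: %d\n", WSAGetLastError());

    // Check what the stack actually granted; the request may be clamped.
    int granted = 0, len = sizeof(granted);
    getsockopt(udpSocket, SOL_SOCKET, SO_RCVBUF, (char*)&granted, &len);
    printf("SO_RCVBUF is now %d bytes\n", granted);

    // Call this from inside the dedicated receive thread itself.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
}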
Just keep in mind that no matter how clever you are at reducing the chances for UDP packet loss, UDP packet loss will still happen. So regardless you need to come up with a design that allows your programs to still function in a reasonably useful manner even when packets are lost. This could be done by implementing some kind of automatic-resend mechanism, or (depending on what you are trying to accomplish) by designing the protocol such that packet loss can simply be ignored.

Related

Any way to know how many bytes will be sent on TCP before sending?

I'm aware that ::send within a Linux TCP server can limit the sending of the payload, such that ::send needs to be called multiple times until the entire payload is sent.
i.e. Payload is 1024 bytes
sent_bytes = ::send(fd, ...) where sent_bytes is only 256 bytes so this needs to be called again.
Is there any way to know exactly how many bytes can be sent before sending? If the socket will allow for the entire message, or that the message will be fragmented and by how much?
Example Case
2 messages are sent to the same socket by different threads at the same time on the same TCP client via ::send(). In some cases where messages are large, multiple calls to ::send() are required, as not all the bytes are sent on the initial call. Thus, go with the loop solution until all the bytes are sent. The loop is mutexed, so it can be seen as thread safe: each thread has to perform its sending after the other. But my worry is that, because TCP is a stream, the client will receive fragments of each message. I was thinking that by adding framing to each message I could rebuild the message on the client side, if I knew how many bytes are sent at a time.
Although the call to ::send() is done sequentially, is there any chance that the byte stream is still mixed?
Effectively, could this happen:
Server Side
Message 1: "CiaoCiao"
Message 2: "HelloThere"
Client Side
Received Message: "CiaoHelloCiaoThere"
Although the call to ::send() is done sequentially, is there any chance that the byte stream is still mixed?
Of course. Not only is there a chance of that, it is pretty much going to be a certainty at one point or another. It is going to happen eventually. Guaranteed.
sent to the same socket by different threads
It will be necessary to handle the synchronization at this level, by employing a mutex that each thread locks before sending its message and unlocks only after the entire message is sent.
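A minimal sketch of that idea, assuming POSIX sockets and C++11 (the names g_send_mutex and send_message are illustrative only):

#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>
#include <mutex>

std::mutex g_send_mutex;   // one mutex per socket, shared by all sending threads

// Send the whole message under the lock, so messages from different
// threads cannot interleave on the stream.
bool send_message(int fd, const char* data, size_t len)
{
    std::lock_guard<std::mutex> lock(g_send_mutex);
    size_t sent = 0;
    while (sent < len)
    {
        ssize_t n = ::send(fd, data + sent, len - sent, 0);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;          // interrupted: retry
            return false;          // real error: the caller deals with it
        }
        sent += static_cast<size_t>(n);
    }
    return true;
}

Every thread sends through send_message(), so one complete message always goes out before the next one starts.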
It goes without saying that this leaves open the possibility that a blocked/hung socket will result in a single thread holding this mutex for an excessive amount of time, until the socket times out and your execution thread ends up dealing with a failed send() or write(), in whatever fashion it already does now (you are, of course, checking the return value from send/write, and handling the error conditions appropriately).
There is no single, cookie-cutter, paint-by-numbers solution to this that works in every situation, in every program that needs to do something like this. Each eventual solution needs to be tailored to each program's unique requirements and purpose. Just one possibility would be a dedicated execution thread that handles all socket input/output, with all your other execution threads sending their messages to the socket thread instead of writing to the socket directly. This avoids having every execution thread wedged by a hung socket, at the expense of increased memory use to hold all the unsent data.
But that's just one possible approach. The number of possible, alternative solutions has no limit. You will need to figure out which logic/algorithm-based solution works best for your specific program. There is no operating system/kernel level indication that will give you any kind of a guarantee as to the amount of data a send() or write() call on a socket will accept.
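And a rough sketch of the dedicated-socket-thread alternative described above (names again illustrative): producer threads enqueue complete messages, and only the writer thread ever touches the socket:

#include <sys/types.h>
#include <sys/socket.h>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>

// Shared outbound queue; all of these names are illustrative.
std::mutex              g_queue_mutex;
std::condition_variable g_queue_cv;
std::deque<std::string> g_outbound;
bool                    g_stop = false;

// Any thread may call this; it never blocks on the socket.
void queue_message(std::string msg)
{
    {
        std::lock_guard<std::mutex> lock(g_queue_mutex);
        g_outbound.push_back(std::move(msg));
    }
    g_queue_cv.notify_one();
}

// The only thread that writes to the socket. A hung socket stalls this
// thread (and lets the queue grow), but never the producer threads.
void writer_thread(int fd)
{
    for (;;)
    {
        std::string msg;
        {
            std::unique_lock<std::mutex> lock(g_queue_mutex);
            g_queue_cv.wait(lock, [] { return g_stop || !g_outbound.empty(); });
            if (g_stop && g_outbound.empty())
                return;
            msg = std::move(g_outbound.front());
            g_outbound.pop_front();
        }
        size_t sent = 0;
        while (sent < msg.size())
        {
            ssize_t n = ::send(fd, msg.data() + sent, msg.size() - sent, 0);
            if (n <= 0)
                return;            // error handling elided in this sketch
            sent += static_cast<size_t>(n);
        }
    }
}

The writer would typically be started once with std::thread(writer_thread, fd) and shut down by setting g_stop under the lock and notifying the condition variable.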

Why need to sleep(1) to allow socket to drain?

I downloaded the source code for a simple static web server from
http://www.ibm.com/developerworks/systems/library/es-nweb/sidefile1.html
However, I'm confused by line 130:
#ifdef LINUX
sleep(1); /* to allow socket to drain */
#endif
exit(1);
Since there is no close() for the socket, does it mean I need to wait for the client to close the socket?
Regardless of the author's intent, it is needless and incorrect. exit() is sufficient. When close() is called on a TCP socket, or exit() is called to terminate the process, unless SO_LINGER socket option has been set to a non-default setting, the kernel will keep the socket(s) in a wait state and attempt to deliver any undelivered / buffered data. You can see this with netstat, and is the reason that fast restarting a TCP server that isn't written for fast restart will have a problem reopening the port quickly (there is a proper way to accomplish this too).
I disagree with a couple of things in the accepted answer.
close() and exit() should have the same effect on the socket; traditionally it has only been a matter of style whether to close sockets explicitly if you were about to exit.
It should have nothing to do with overflowing a TCP send buffer, since it happens after all the writes. A full write buffer will be reported immediately by the write() return code; sleeping at the end is irrelevant to that.
sleep(1) should have no effect on the socket buffer or reliable data delivery. If anything, this code throttles the web server's child processes after the writes, so it really has no good effect and could actually increase the potential for a denial-of-service attack.
I am describing default operation. The defaults can be changed via the many options.
For the "bible" on socket programming, see W. Richard Steven's UNIX Network Programming - Networking APIs: Sockets and XTI where he covers this in detail.
That looks like a bit of sloppy code to me.
If a process with an open socket terminates, and the socket has some unwritten data, the kernel is going to tear down the socket without flushing out the unsent data.
When you write something to a socket, the written data will not necessarily be transmitted immediately. The kernel maintains a small buffer that collects the data being written to a socket. Or a pipe, too. It's more efficient to have the process go on, and then the kernel will take care of actually transmitting the written data, when it has time to do that.
A process can obviously write data to a socket much faster than it can be transmitted over a typical network interface, and the size of the internal socket buffer is limited, so if the process keeps writing data to the socket, at some point it will fill up the internal buffer, and will have to wait until the kernel actually transmits the data, and removes the written data from the internal buffer, before there's room to write more.
[*] I am omitting some technical details, such as that the data isn't considered written until the receiver ACKs it.
Anyway, the intent of that sleep() call appears to be to allow some time for the internal buffer to actually be transmitted before the process terminates, because if it terminates before the actual data gets sent out, the kernel won't bother sending it and will tear down the socket, as I just mentioned.
This is a bad example. This is not the right way to do these kinds of things. The socket should simply be close()d. This will correctly flush things out and make sure that everything goes where it's supposed to go. I can't see any valid reason why that example simply didn't properly close the socket, instead of engaging in this kind of hackery.
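For completeness, a minimal sketch of how the end of such a request handler is usually written instead of the sleep(1) (plain POSIX; the function name is illustrative and error handling is omitted):

#include <cstdlib>
#include <sys/socket.h>
#include <unistd.h>

// What the child process does after its last write() on fd, instead of sleep(1).
static void finish_request(int fd)
{
    // Optional: signal end-of-data to the peer while the kernel keeps
    // draining anything still buffered for transmission.
    shutdown(fd, SHUT_WR);

    // close() releases the descriptor; with default SO_LINGER settings the
    // kernel continues trying to deliver buffered data in the background,
    // so no sleep() is needed before exiting.
    close(fd);
    exit(0);
}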

How to handle multiple clients from one UDP Socket?

I'm dealing right now with an issue for which I don't know the right/best solution.
Consider the following example:
Imagine you have one Socket, like this:
SOCKET s = socket(AF_INET,SOCK_DGRAM,IPPROTO_UDP);
On this socket, which I will refer to as the "ServerSocket", many UDP packets arrive from many different IP+port combinations (clients).
Since it does not seem like a good idea to create multiple threads blocking in recvfrom() on this socket, I came to the idea of (maybe) one dedicated thread that just blocks on recvfrom() and puts those IP+port+message combinations into some kind of "global queue" (std::queue, guarded by a mutex).
So far, so well.
I know about IOCP, and the first question about it is: does it make sense to use IOCP for that kind of problem / on one socket? I came to the problem that, even if the UDP packets come in on the socket in the right order (which we all know is not guaranteed by the protocol itself), there will still be the issue of thread ordering.
For example, if I'd use IOCP with four threads and four outstanding overlapped WSARecvFrom() calls, packets 1 2 3 4 might be reordered by the thread scheduler to e.g. 3 4 1 2.
If one uses only one outstanding WSARecvFrom(), everything works as expected, because there is just one thread at a time handling the WSARecvFrom(), putting that message into the client's queue, and posting the next overlapped WSARecvFrom().
Furthermore, I'd like to emulate functions like recvmsg() and sendmsg() in blocking mode, but the problem here is that if you have e.g. thousands of clients, you cannot open thousands of threads that each have their own dedicated recvmsg() blocking on e.g. a condition variable of a client's message queue.
This is an issue as well, since clients might get deleted upon receiving a packet that contains something like "CLOSE_CONNECTION", to emulate closesocket() the way TCP uses it.
I need to use UDP because the data the user sends is time critical, but it doesn't have to be reliable; only the status messages should be as reliable as possible, e.g. "CONNECT_REQUEST" when a client "connects" (as TCP does it, which we all know UDP doesn't do, so we have to write it ourselves if necessary).
In-order for client-messages would be needed as well.
To sum this all up, the following criteria should be given:
- In-order delivery of the client's message part is needed
- reliability for the client's messages is NOT necessary (only for the status packets, like "ACK_PACKAGE" etc. ... having the newest message matters more than reliably receiving every message)
- many clients have to be managed, and things like disconnections (soft/hard, e.g. if a client pulls the network cable) have to be detected (a thread pool of timers?)
So my final question is: what is the best approach to reach a goal like that? With TCP it would be easier, because one IOCP thread could listen to one accept()ed TCP socket, so there wouldn't be that thread-reordering problem. With one UDP socket you can't do it that way, so maybe there must be something like an overlapped request, but just for one ... well, "self-defined" event.
You're correct in that an IOCP based server using multiple threads to service the IOCP can and will require explicit sequencing to ensure that the results from multiple concurrent reads are processed in the correct sequence. This is equally true of TCP connections (see here).
The way that I usually deal with this problem with TCP is to have a per connection counter which is a value added as meta-data to each buffer used for a recv on that connection. You then simply ensure that the buffers are processed in sequence as the sequence of issued reads is the sequence of read completions out of the IOCP (it's just the scheduling of the multiple threads reading from the IOCP that causes the problem).
You can't take this approach with UDP if you have a single 'well known port' that all peers send to as your sequence numbers have no 'connection' to be associated with.
In addition, an added complication with UDP is that the routers between you and your peer may contrive to resequence or duplicate any datagrams before they get to you anyway. It's unlikely but if you don't take it into account then that's bound to be the first thing that happens when you're demoing it to someone important...
This leads to the fact that to sequence UDP you need a sequence number inside the data portion of the datagram. You then get the problem where UDP datagrams can be lost and so that sequence number is less useful for ensuring all inbound data is processed in sequence and only useful in ensuring that you never process any datagrams out of sequence. That is, if you have a sequence number in your datagram all you can do with it is make sure you never process a datagram from that peer with a sequence number less than or equal to the one you last processed (in effect you need to discard potentially valid data).
This is actually the same problem you'd have with a single threaded system with a single peer, though you'd likely get away without being this strict right up until the important demo when you get a network configuration that happens to result in duplicate datagrams or out of sequence datagrams (both quite legal).
To get more reliability out of the system you need to build more of a protocol on top of UDP. Perhaps take a look at this question and the answers to it. And then be careful not to build something slower and less good and less polite to other network users than TCP...
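As an illustration of the "never process a datagram out of sequence" rule above, a minimal sketch; it assumes a 4-byte big-endian sequence number at the start of each datagram's payload, keys peers by their ip:port string, and ignores sequence-number wrap-around:

#include <winsock2.h>        // for ntohl(); on POSIX use <arpa/inet.h> instead
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#pragma comment(lib, "ws2_32.lib")

// Last sequence number processed per peer ("ip:port" -> sequence).
std::map<std::string, uint32_t> g_last_seq;

// Returns true if the datagram should be processed, false if it must be
// discarded as old or duplicated.
bool accept_datagram(const std::string& peer, const char* data, int len)
{
    if (len < 4)
        return false;                       // malformed: no room for a sequence number

    uint32_t seq;
    std::memcpy(&seq, data, sizeof(seq));
    seq = ntohl(seq);

    auto it = g_last_seq.find(peer);
    if (it != g_last_seq.end() && seq <= it->second)
        return false;                       // out of sequence or duplicate: drop it

    g_last_seq[peer] = seq;                 // remember the newest sequence seen
    return true;
}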

TCP WSASend Completion Criteria

I am unable to find a specification of what it means for a TCP WSASend call to complete. Does completion of a WSASend operation require that an ACK response be received?
This question is relevant for slower networks with 200ms - 2s ping (round-trip) times. Will it take 200ms - 2s for the WSASend completion callback to be invoked (or whatever completion mechanism is used)? Or perhaps Windows waits for an ACK only on some packets and considers the WSASend operation complete much faster for all the others?
The exact behavior makes a big difference with regard to buffer life cycle management and in turn has a significant impact on performance (locking, allocation/deallocation, and reference counting).
WSASend does not guarantee the following:
That the data was sent (it might have been buffered)
That it was received (it might have been lost)
That the receiving application processed it (the sender cannot ever know this, as a matter of principle)
It does not require a round-trip. In fact, with nagling enabled, small amounts of data are buffered for up to 200ms in the hope that the application will send more. WSASend must return quickly so that nagling has a chance to work.
If you require confirmation, change the application protocol so that you get a confirmation back. No other way to do it.
To clarify, even without nagling (TCP_NODELAY) you do not get an ACK for your send operation. The data will be sent out to the network, but the remote side does not know that it should ACK it. TCP has no way to say "please ACK this data immediately". Data being sent does not mean it will ever be received. The network could drop the data into a black hole a second after it was pushed out.
It's not documented. It will likely be different depending on whether you have turned off send buffering. However, you always need to pay attention to the potential time that it will take to get a WSASend() completion, especially if you're using asynchronous calls. See this article of mine for details.
You get a WSASend() completion when the TCP stack has finished with your buffer. If you have NOT turned off send buffering by setting SO_SNDBUF to zero then it likely means you will get a completion once the stack copies your data into its buffers. If you HAVE turned off send buffering then it likely means that you will get a completion once you get an ACK (simply because the stack should need your buffer for any potential retransmissions). However, it's not documented.
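For reference, turning off send-side buffering is just a setsockopt() call on the socket; as noted, exactly how that changes completion timing is not documented. The function name here is illustrative:

#include <winsock2.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

// Disable the stack's internal send buffering on an existing TCP socket.
// After this, the stack generally has to keep using *your* buffer until it
// no longer needs it for potential retransmissions, which is why completions
// tend to arrive later -- but, as above, the exact behaviour is undocumented.
void DisableSendBuffering(SOCKET s)
{
    int zero = 0;
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
                   (const char*)&zero, sizeof(zero)) == SOCKET_ERROR)
        printf("setsockopt(SO_SNDBUF) failed: %d\n", WSAGetLastError());
}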

Efficiency between select() and recv with MSG_PEEK. Asynchronous

I would like to know what would be most efficient when checking for incoming data (asynchronously). Let's say I have 500 connections. I have 3 scenarios (that I can think of):
1. Using select() to check FD_SETSIZE sockets at a time, then iterating over all of them to receive the data. (Wouldn't this require two calls to recv for each socket returned? MSG_PEEK to allocate a buffer then recv() it again, which would be the same as #3.)
2. Using select() to check one socket at a time. (Wouldn't this also be like #3? It requires the two calls to recv.)
3. Use recv() with MSG_PEEK one socket at a time, allocate a buffer, then call recv() again. Wouldn't this be better because we can skip all the calls to select()? Or is the overhead of one recv() call too much?
I've already coded situations 1 and 2, but I'm not sure which one to use. Sorry if I'm a bit unclear.
Thanks
FD_SETSIZE is typically 1024, so you can check all 500 connections at once. Then you will perform the two recv calls only on those which are ready -- say, for a very busy system, half a dozen of them each time around. With the other approaches you need about 500 more syscalls (the huge number of "failing" recv or select calls you perform on the many hundreds of sockets which will not be ready at any given time!-).
In addition, with approach 1 you can block until at least one connection is ready (no overhead in that case, which won't be rare in systems that aren't all that busy) -- with the other approaches you'll need to be "polling", i.e., churning continuously, burning huge amounts of CPU to no good purpose (or, if you sleep a while after each loop of checks, you'll have a delay in responding despite the system not being at all busy -- eep!-).
That's why I consider polling to be an anti-pattern: frequently used, but nevertheless destructive. Sometimes you have absolutely no alternative (which basically tells you that you're having to interact with very badly designed systems -- alas, sometimes in this imperfect life you do have to!-), but when any decent alternative does exist, doing polling nevertheless is really a very bad design practice and should be avoided.
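A rough sketch of approach 1, assuming a plain BSD-sockets environment; the connections container and the buffer size are illustrative and error handling is trimmed:

#include <sys/select.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <vector>

// One pass: select() over every current connection, then recv() only on the
// descriptors reported ready.
void poll_connections(std::vector<int>& connections)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    int maxfd = -1;
    for (int fd : connections)
    {
        FD_SET(fd, &readfds);
        if (fd > maxfd)
            maxfd = fd;
    }

    // Blocks until at least one socket is readable (no timeout supplied).
    if (select(maxfd + 1, &readfds, nullptr, nullptr, nullptr) <= 0)
        return;

    char buf[8192];
    for (int fd : connections)
    {
        if (!FD_ISSET(fd, &readfds))
            continue;                       // not ready: no syscall wasted on it
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0)
            continue;                       // 0 = peer closed, <0 = error; real code removes fd
        // ... process the n bytes just read ...
    }
}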
You can simply do a rough efficiency simulation of the 3 solutions in two scenarios:
Scenario A (0 of 500 sockets have incoming data)
for solution #1, you only invoke single select()
for solution #2, you need 500 select()
for solution #3, you need 500 recv()
Scenario B (250 of 500 sockets have incoming data)
for solution #1, single select() + (500 recv())
for solution #2, 500 select() + (500 recv())
for solution #3, 750 recv()
(These counts assume that sockets reporting no incoming data are skipped, i.e. nothing further is read from them.)
The answer is obvious :)
...most efficient when checking for incoming data (asynchronously). Let's say I have 500 connections. I have 3 scenarios (that I can think of):
Using select() to check FD_SETSIZE sockets at a time, then iterating over all of them to receive the data. (Wouldn't this require two calls to recv for each socket returned? MSG_PEEK to allocate a buffer then recv() it again which would be the same as #3)
I trust you're carefully constructing your fd set with only the descriptors that are currently connected...? You then iterate over the set and only issue recv() for those that have read or exception/error conditions (the latter differing between BSD and Windows implementations). While it's OK functionally (and arguably elegant conceptually), in most real-world applications you don't need to peek before recv-ing: even if you're unsure of the message size and know you could peek it from a buffer, you should consider whether you can:
process the message in chunks (e.g. read whatever's a good unit of work - maybe 8k, process it, then read the next <=8k into the same buffer...)
read into a buffer that's big enough for most/all messages, and only dynamically allocate more if you find the message is incomplete
Using select() to check one socket at a time. (Wouldn't this also be like #3? It requires the two calls to recv.)
Not good at all. If you stay single-threaded, you'd need to put a 0 timeout value on select and spin like crazy through the listening and client descriptors. Very wasteful of CPU time, and it will vastly degrade latency.
Use recv() with MSG_PEEK one socket at a time, allocate a buffer then call recv() again. Wouldn't this be better because we can skip all the calls to select()? Or is the overhead of one recv() call too much?
(Ignoring that it's better to try to avoid MSG_PEEK) - how would you know which socket to MSG_PEEK or recv() on? Again, if you're single threaded, then either you'd block on the first peek/recv attempt, or you use non-blocking mode and then spin like crazy through all the descriptors hoping a peek/recv will return something. Wasteful.
So, stick to 1 or move to a multithreaded model. For the latter, the simplest approach to begin with is to have the listening thread loop calling accept, and each time accept yields a new client descriptor it should spawn a new thread to handle the connection. These client-connection handling threads can simply block in recv(). That way, the operating system itself does the monitoring and wake-up of threads in response to events, and you can trust that it will be reasonably efficient. While this model sounds easy, you should be aware that multi-threaded programming has lots of other complications - if you're not already familiar with it you may not want to try to learn that at the same time as socket I/O.
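For what it's worth, a bare-bones sketch of that accept-then-spawn model (C++11 threads over POSIX sockets; listen_fd is assumed to already be bound and listening, and error handling is minimal):

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <thread>

// Handles one client: block in recv() until the peer closes or errors out.
void handle_client(int client_fd)
{
    char buf[4096];
    for (;;)
    {
        ssize_t n = recv(client_fd, buf, sizeof(buf), 0);
        if (n <= 0)
            break;                 // peer closed the connection, or an error occurred
        // ... process n bytes ...
    }
    close(client_fd);
}

// The listening thread's loop: accept, then hand each connection to its own thread.
void accept_loop(int listen_fd)
{
    for (;;)
    {
        int client_fd = accept(listen_fd, nullptr, nullptr);
        if (client_fd < 0)
            continue;              // real code would inspect errno here
        std::thread(handle_client, client_fd).detach();
    }
}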