Efficiently send a stream of UDP packets - c++

I know how to open a UDP socket in C++, and I also know how to send packets through it. When I send a packet I correctly receive it on the other end, and everything works fine.
EDIT: I also built a fully working acknowledgement system: packets are numbered, checksummed and acknowledged, so at any time I know how many of the packets that I sent, say, during the last second were actually received from the other endpoint. Now, the data I am sending will be readable only when ALL the packets are received, so that I really don't care about packet ordering: I just need them all to arrive, so that they could arrive in random sequences and it still would be ok since having them sequentially ordered would still be useless.
Now, I have to transfer a big big chunk of data (say 1 GB) and I'd need it to be transferred as fast as possible. So I split the data in say 512 bytes chunks and send them through the UDP socket.
Now, since UDP is connectionless it obviously doesn't provide any speed or transfer efficiency diagnostics. So if I just try to send a ton of packets through my socket, my socket will just accept them, then they will be sent all at once, and my router will send the first couple and then start dropping them. So this is NOT the most efficient way to get this done.
What I did then was making a cycle:
Sleep for a while
Send a bunch of packets
Sleep again and so on
I tried to do some calibration and I achieved pretty good transfer rates, however I have a thread that is continuously sending packets in small bunches, but I have nothing but an experimental idea on what the interval should be and what the size of the bunch should be. In principle, I can imagine that sleeping for a really small amount of time, then sending just one packet at a time would be the best solution for the router, however it is completely unfeasible in terms of CPU performance (I probably would need to busy wait since the time between two consecutive packets would be really small).
So is there any other solution? Any widely accepted solution? I assume that my router has a buffer or something like that, so that it can accept SOME packets all at once, and then it needs some time to process them. How big is that buffer?
I am not an expert in this so any explanation would be great.
Please note, however, that for technical reasons there is no way at all I can use TCP.

As mentioned in some other comments, what you're describing is a flow control system. The wikipedia article has a good overview of various ways of doing this:
http://en.wikipedia.org/wiki/Flow_control_%28data%29
The solution that you have in place (sleeping for a hard-coded period between packet groups) will work in principle, but in order to get reasonable performance in a real-world system you need to be able to react to changes in the network. This means implementing some kind of feedback where you automatically adjust both the outgoing data rate and packet size in response to network characteristics, such as throughput and packet loss.
One simple way of doing this is to use the number of re-transmitted packets as an input into your flow control system. The basic idea would be that when you have a lot of re-transmitted packets, you would reduce the packet size, reduce the data rate, or both. If you have very few re-transmitted packets, you would increase packet size & data rate until you see an increase in re-transmitted packets.
That's something of a gross oversimplification, but I think you get the idea.
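As a rough illustration of that feedback loop, here is a minimal sketch. The `RateController` class, the 5% and 1% loss thresholds, and the halve / grow-by-10% factors are all assumptions made up for this example, not tuned values; it only adjusts the data rate, and packet size could be adapted the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sender-side rate controller (illustrative only).
// It lowers the send rate when many packets had to be retransmitted
// and raises it slowly when almost none did.
class RateController {
public:
    RateController(double initial_pps, double min_pps, double max_pps)
        : rate_pps_(initial_pps), min_pps_(min_pps), max_pps_(max_pps) {
        assert(min_pps > 0.0 && min_pps <= initial_pps && initial_pps <= max_pps);
    }

    // Call once per measurement interval with the counters for that interval.
    void on_interval(std::uint64_t sent, std::uint64_t retransmitted) {
        if (sent == 0) return;  // nothing measured, keep the current rate
        const double high_loss = 0.05;  // assumed threshold: back off above 5%
        const double low_loss = 0.01;   // assumed threshold: speed up below 1%
        const double loss =
            static_cast<double>(retransmitted) / static_cast<double>(sent);
        if (loss > high_loss) {
            rate_pps_ *= 0.5;  // many retransmissions: back off quickly
        } else if (loss < low_loss) {
            rate_pps_ *= 1.1;  // very few retransmissions: probe for more
        }
        // Keep the rate inside the configured bounds.
        rate_pps_ = std::max(min_pps_, std::min(rate_pps_, max_pps_));
    }

    double packets_per_second() const { return rate_pps_; }

private:
    double rate_pps_;
    double min_pps_;
    double max_pps_;
};
```

The sending thread would then size its bursts and sleeps from `packets_per_second()` instead of from hard-coded constants.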

Related

UDP transfer is too fast, Apache Mina doesn't handle it

We decided to use UDP to send a lot of data like coordinates between:
client [C++] (using poll)
server [JAVA] [Apache MINA]
My datagrams are at most 512 bytes, to avoid fragmentation during the transfer as far as possible.
Each datagram has a header I added (with an ID inside), so that I can monitor :
how many datagrams are received
which ones are received
The problem is that we are sending the datagrams too fast. We receive like the first ones and then have a big loss, and then get some, and big loss again. The sequence of ID datagram received is something like [1], [2], [250], [251].....
The problem also happens locally (using localhost, with only one network card).
I do not care about losing datagrams, but here it is not about simple loss due to network (which I can deal with)
So my questions here are:
On client, how can I get the best :
settings, or socket settings?
way to send as much as I can without sending too much?
On the server, Apache MINA seems to say that it manages the "socket buffer size" itself, but are there still some settings to care about?
Is it possible to reach something like 1MB/s, knowing that our connection already allows at least this bandwidth when downloading regular files?
Currently, when we want to transfer ~4KB of coordinate info, we have to add so much sleep time that we wait 5 minutes or more for it to finish. This is a big issue for us, since we need to send at least 10MB of coordinate information every minute.
If you want reliable transport, you should use TCP. This will let you send almost as fast as the slower of the network and the client, with no losses.
If you want a highly optimized low-latency transport, which does not need to be reliable, you need UDP. This will let you send exactly as fast as the network can handle, but you can also send faster, or faster than the client can read, and then you'll lose packets.
If you want reliable highly optimized low-latency transport with fine-grained control, you're going to end up implementing a custom subset of TCP on top of UDP. It doesn't sound like you could or should do this.
... how can I get the best settings, or socket settings
Typically by experimentation.
If the reason you're losing packets is because the client is slow, you need to make the client faster. Larger receive buffers only buy a fixed amount of headroom (say to soak up bursts), but if you're systematically slower any sanely-sized buffer will fill up eventually.
Note however that this only cures excessive or avoidable drops. The various network stack layers (even without leaving a single box) are allowed to drop packets even if your client can keep up, so you still can't treat it as reliable without custom retransmit logic (and we're back to implementing TCP).
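For the receive-buffer headroom mentioned above, a minimal sketch of asking for a larger buffer on a POSIX receiver could look like this. It is shown in C++ only for illustration (the receiver in the question is Java), and the helper name `request_receive_buffer` is made up for this example. The OS is free to cap or round the requested size, so the function reads back what was actually granted.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/types.h>

// Ask the kernel for a larger receive buffer on an existing socket and
// report the size that is actually in effect afterwards.
// Returns the effective size in bytes, or -1 on error.
int request_receive_buffer(int fd, int requested_bytes) {
    assert(fd >= 0 && requested_bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested_bytes, sizeof requested_bytes) != 0) {
        return -1;
    }
    int granted = 0;
    socklen_t len = sizeof granted;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) != 0) {
        return -1;
    }
    return granted;  // may differ from the request: the OS can cap or round it
}
```

As the answer says, this only absorbs bursts; a receiver that is systematically slower than the sender will still fill any buffer eventually.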
... way to send as much as I can without sending too much?
You need some kind of ack/nack/back-pressure/throttling/congestion/whatever message from the receiver back to the source. This is exactly the kind of thing TCP gives you for free, and which is relatively tricky to implement well yourself.
Is it possible to reach something like 1MB/s ...
I just saw 8MB/s using scp over loopback, so I would say yes. That uses TCP and apparently chose AES128 to encrypt and decrypt the file on the fly - it should be trivial to get equivalent performance if you're just sending plaintext.
UDP is only a viable choice when any number of datagrams can be lost without sacrificing QoS. I am not familiar with Apache MINA, but the scenario described resembles a server that handles every datagram sequentially. In this case all datagrams that arrive while one is being serviced will be lost - there is no queuing of UDP datagrams. Like I said, I do not know if MINA can be tuned for parallel datagram processing, but if it can't, it is simply the wrong choice of tool.

Which method to send/receive data properly in a network game (UDP, but why not TCP)

I have a C++ application with GUI that runs (on PC 1) just like a network game, and receives data packets from another computer (2) via WiFi (ad-hoc, so it's quite reliable) at fairly regular intervals (like 40ms), once per loop on program (2). I use send/read.
Here is the problem:
- Packets are not always fully sent (but apparently you can simply keep send()ing the remaining data until all is sent, and that works well)
- More importantly, packets are stacked in the socket during (1)'s loop until the read() occurs, and then there is no way to distinguish packets in the big stream of data, or know if you were already in the middle of a packet.
I tried to fix this with ID headers (you find an ID as first bytes and you know the length of the packet), but I often get lost (unknown ID : we are not at the beginning of the packet) and am forced to ignore all the remaining data.
So my question is:
Why do packets stack? (generally I have 400B of data whereas my packets are <100B long and fps (1) and (2) are not very different)
How can I have a more reliable way to receive actual packets, say, 80% of packets (discarding packet loss, it's not a question of UDP/TCP)?
Would a separate thread for receiving packets work? (on (1), the server)
How do real-time network games do that (including multiple client management)?
Thanks in advance.
(Sorry I do not have the code here, but I tried to be as clear as I could)
Well:
1) UDP transfers MESSAGES, but is unreliable.
2) TCP transfers BYTE STREAMS, and is reliable.
UDP cannot reliably transfer messages. Anything more reliable requires a protocol on top of UDP.
TCP cannot transfer messages unless they are one byte long. Anything more complex requires a protocol on top of TCP.
Why do packets stack? (generally I have 400B of data whereas my packets are <100B long and fps (1) and (2) are not very different)
Because the time to send packets across the net varies, it typically does not make sense to send packets at a high rate, so most networking libraries (e.g. RakNet) will queue up packets and do a send every 10 ms.
In the case of TCP, there is Nagle's algorithm, which is a more principled way of doing the same thing. You can turn Nagle's algorithm off by setting the TCP_NODELAY socket option.
How can I have a more reliable way to receive actual packets, say, 80% of packets (discarding packet loss, it's not a question of UDP/TCP)?
If you use TCP, you will receive all of the packets and in the right order. The penalty for using TCP is if a packet is dropped, the packets after it wait until that packet can be resent before they are processed. This results in a noticeable delay, so any games that use TCP have sophisticated prediction techniques to hide this delay and other techniques to smoothly "catch up" once the missing packet arrives.
If you use UDP, you can implement a layer on top that gives you reliability but without the ordering if the order of the packets doesn't matter by sending a counter with each packet and having the receiver repeatedly notify the sender of gaps in the counts. You can also implement ordering by doing something similar. Of course, if you enforce both, then you are creating your own TCP layer. See http://www.jenkinssoftware.com/raknet/manual/reliabilitytypes.html for more details.
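A minimal sketch of the receiver side of that counter-and-gaps idea follows. The `GapTracker` class is hypothetical and is not how RakNet does it; it ignores counter wraparound and assumes the counter starts at 0. The receiver would periodically send the contents of `missing()` back to the sender, which then retransmits those packets.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical receiver-side bookkeeping for "reliable but unordered" delivery.
class GapTracker {
public:
    // Record an arriving counter value. Returns true if the packet is new
    // (deliver it to the application), false if it is a duplicate.
    bool on_received(std::uint32_t seq) {
        if (seq < next_expected_) {
            // Either a late packet that fills a known gap, or a duplicate.
            return missing_.erase(seq) > 0;
        }
        // Everything skipped between the old high-water mark and seq is a gap.
        for (std::uint32_t s = next_expected_; s < seq; ++s) {
            missing_.insert(s);
        }
        next_expected_ = seq + 1;
        return true;
    }

    // Counter values the sender should be asked to retransmit, in ascending order.
    std::vector<std::uint32_t> missing() const {
        return std::vector<std::uint32_t>(missing_.begin(), missing_.end());
    }

private:
    std::uint32_t next_expected_ = 0;
    std::set<std::uint32_t> missing_;
};
```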
What you describe is what would happen if you are using TCP without a protocol on top of it to structure your transmitted data. Your idea of using an ID header and packet length is one such protocol. If you send a 4-byte ID followed by a 4-byte length followed by X number of bytes, then the receiver knows that it has to read 4 bytes followed by 4 bytes followed by X bytes to receive a complete packet. It doesn't get much simpler than that. The fact that you are still having problems reading packets with such a simple protocol suggests that your underlying socket reading code is flawed to begin with. Without seeing your actual code, it is difficult to tell you what you are doing wrong.
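A sketch of a reader for the ID + length + payload layout described above is shown here. The `Frame` and `FrameParser` names are hypothetical, and big-endian header fields are an assumption; whatever byte order the sender uses must match. The point is that bytes are accumulated across read() calls, and a frame is only handed out once all of it has arrived.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Frame {
    std::uint32_t id;
    std::vector<std::uint8_t> payload;
};

// Hypothetical parser for: 4-byte ID, 4-byte length, then `length` payload bytes.
class FrameParser {
public:
    // Append whatever read()/recv() returned; chunk boundaries are arbitrary.
    void feed(const std::uint8_t* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        if (len > 0) buffer_.insert(buffer_.end(), data, data + len);
    }

    // Extract one complete frame if available; otherwise leave the buffer
    // untouched and return false so the caller can read more bytes first.
    bool next(Frame& out) {
        const std::size_t header_size = 8;
        if (buffer_.size() < header_size) return false;
        const std::uint32_t id = read_u32(0);
        const std::uint32_t length = read_u32(4);
        if (buffer_.size() - header_size < length) return false;
        const std::ptrdiff_t begin = static_cast<std::ptrdiff_t>(header_size);
        const std::ptrdiff_t end = begin + static_cast<std::ptrdiff_t>(length);
        out.id = id;
        out.payload.assign(buffer_.begin() + begin, buffer_.begin() + end);
        buffer_.erase(buffer_.begin(), buffer_.begin() + end);
        return true;
    }

private:
    // Read a big-endian 32-bit value (assumed wire format).
    std::uint32_t read_u32(std::size_t offset) const {
        return (static_cast<std::uint32_t>(buffer_[offset]) << 24) |
               (static_cast<std::uint32_t>(buffer_[offset + 1]) << 16) |
               (static_cast<std::uint32_t>(buffer_[offset + 2]) << 8) |
               static_cast<std::uint32_t>(buffer_[offset + 3]);
    }

    std::vector<std::uint8_t> buffer_;
};
```

A real implementation would also reject implausibly large length values instead of buffering without limit.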

Using IOCP with UDP?

I'm pretty familiar with what Input/Output Completion Ports are for when it comes to TCP.
But what if I am, for example, coding an FPS game, or anything else where the need for low latency can be a deal breaker? I want an immediate response to the player to provide the best playing experience, even at the cost of losing some spatial data on the go. It becomes obvious that I should use UDP and, aside from sending coordinate updates frequently, I should also implement some kind of semi-reliable protocol (afaik TCP induces packet loss in UDP, so we should avoid mixing the two) to handle events such as chat messages or gunshots, where packet loss may be critical.
Let's say I'm aiming at performance which would apply to MMOFPS game that allows to meet hundreds of players in one, persistent world, and aside from fighting with guns, it allows them to communicate through chat messages etc. - something like this actually exists and works well - check out PlanetSide 2.
Many articles on the net (e.g. these from msdn) say overlapped sockets are the best and IOCP is a god-tier concept, but they don't seem to distinguish the cases where we use protocols other than TCP.
So there is almost no reliable information about I/O techniques used when developing such a server, I've looked at this, but the topic seems to be highly controversial, and I've also seen this , but considering discussions in the first link, I don't know if I should follow assumptions of the second one, whether I should use IOCP with UDP at all, and if not, what is the most scalable and efficient I/O concept when it comes to UDP.
Or maybe am I just making another premature optimization and no thinking ahead is required for the moment ?
Thought about posting it on gamedev.stackexchange.com, but this question better applies to general-purpose networking I think.
I do not recommend using this, but technically the most efficient way to receive UDP datagrams would be to just block in recvfrom (or WSARecvFrom if you will). Of course, you'll need a dedicated thread for that, or not much will happen otherwise while you block.
Other than with TCP, you do not have a connection built into the protocol, and you do not have a stream without defined borders. That means you get the sender's address with every datagram that comes in, and you get a whole message or nothing. Always. No exceptions.
Now, blocking on recvfrom means one context switch to the kernel, and one context switch back when something was received. It won't go any faster by having several overlapped reads in flight either, because only one datagram can arrive on the wire at the same time, which is by far the most limiting factor (CPU time is not the bottleneck!). Using an IOCP means at least 4 context switches, two for the receive and two for the notification. Alternatively, an overlapped receive with completion callback is not much better either, because you must NtTestAlert or SleepEx to run the APC queue, so again you have at least 2 extra context switches (though, it's only +2 for all notifications together, and you might incidentally already sleep anyway).
However:
Using an IOCP and overlapped reads is nevertheless the best way to do it, even if it is not the most efficient one. Completion ports are not tied to TCP; they work just fine with UDP, too. As long as you use an overlapped read, it does not matter what protocol you use (or even whether it's network or disk, or some other waitable or alertable kernel object).
It also does not really matter for either latency or CPU load whether you burn a few hundred cycles extra for the completion port. We're talking about "nano" versus "milli" here, a factor of one to one million. On the other hand, completion ports are overall a very comfortable, sound, and efficient system.
You can for example trivially implement logic for resending when you did not receive an ACK in time (which you must do when a form of reliability is desired, UDP does not do it for you), as well as keepalive.
For keepalive, add a waitable timer (maybe firing after 15 or 20 seconds) that you reset every time you receive anything. If your completion port ever tells you that this timer went off, you know the connection is dead.
For resends, you could e.g. set a timeout on GetQueuedCompletionStatus, and every time you wake up find all packets that are more than so-and-so old and have not been ACKed yet.
The entire logic happens in one place, which is very nice. It's versatile, efficient, and hard to do wrong.
You can even have several threads (and, indeed, more threads than your CPU has cores) block on the completion port. Many threads sounds like an unwise design, but it is in fact the best thing to do.
A completion port wakes up to N threads in last-in-first-out order, N being the number of cores unless you tell it to do something different. If any of these threads block, another one is woken to handle outstanding events. This means that in the worst case, an extra thread may be running for a short time, but this is tolerable. In the average case, it keeps processor usage close to 100% as long as there is some work to do and zero otherwise, which is very nice. LIFO waking is favourable for processor caches and keeps switching thread contexts low.
This means you can block and wait for an incoming datagram and handle it (decrypt, decompress, perform logic, read something from disk, whatever) and another thread will be immediately ready to handle the next datagram that might come in the next microsecond. You can use overlapped disk IO with the same completion port, too. If you have compute work (such as AI) to do that can be split into tasks, you can manually post (PostQueuedCompletionStatus) those on the completion port as well and you have a parallel task scheduler for free. All you have to do is wrap an OVERLAPPED into a structure that has some extra data after it, and use a key that you will recognize. No worrying about thread synchronization, it just magically works (you don't even strictly need to have an OVERLAPPED in your custom structure when posting your own notifications, it will work with any structure you pass, but I don't like lying to the operating system, you never know...).
It does not even matter much whether you block, for example when reading from disk. Sometimes this just happens and you can't help it. So what, one thread blocks, but your system still receives messages and reacts to it! The completion port automatically pulls another thread from its pool when it's necessary.
About TCP inducing packet loss on UDP, this is something that I am inclined to call an urban myth (although it is somewhat correct). The way this common mantra is worded is however misleading. It may have been true once upon a time (there exists research on that matter, which is, however, close to a decade old) that routers would drop UDP in favour of TCP, thereby inducing packet loss. That is, however, certainly not the case nowadays.
A more truthful point of view is that anything you send induces packet loss. TCP induces packet loss on TCP and UDP induces packet loss on TCP and vice versa, this is a normal condition (it's how TCP implements congestion control, by the way). A router will generally forward one incoming packet if the cable on the other plug is "silent", it will queue a few packets with a hard deadline (buffers are often deliberately small), optionally it may apply some form of QoS, and it will simply and silently drop everything else.
A lot of applications with rather harsh realtime requirements (VoIP, video streaming, you name it) nowadays use UDP, and while they cope well with a lost packet or two, they do not at all like significant, recurring packet loss. Still, they demonstrably work fine on networks that have a lot of TCP traffic. My phone (like the phones of millions of people) works exclusively over VoIP, data going over the same router as internet traffic. There is no way I can provoke a dropout with TCP, no matter how hard I try.
From that everyday observation, one can tell for certain that UDP is definitely not dropped in favour of TCP. If anything, QoS might favour UDP over TCP, but it most certainly doesn't penalize it.
Otherwise, services like VoIP would stutter as soon as you open a website and be unavailable altogether if you download something the size of a DVD ISO file.
EDIT:
To give somewhat of an idea of how simple life with IOCP can be (somewhat stripped down, utility functions missing):
for(;;)
{
    if(GetQueuedCompletionStatus(iocp, &n, &k, (OVERLAPPED**)&o, 100) == 0)
    {
        if(o == 0) // ---> timeout, mark and sweep
        {
            CheckAndResendMarkedDgrams(); // resend those from last pass
            MarkUnackedDgrams();          // mark new ones
        }
        else
        {   // zero return value but lpOverlapped is not null:
            // this means an error occurred
            HandleError(k, o);
        }
        continue;
    }
    if(n == 0 && k == 0 && o == 0)
    {
        // zero size and zero handle is my termination message
        // re-post, then break, so all threads on the IOCP will
        // one by one wake up and exit in a controlled manner
        PostQueuedCompletionStatus(iocp, 0, 0, 0);
        break;
    }
    else if(n == -1) // my magic value for "execute user task"
    {
        TaskStruct *t = (TaskStruct*)o;
        t->funcptr(t->arg);
    }
    else
    {
        /* received data or finished file I/O, do whatever you do */
    }
}
Note how the entire logic for both handling completion messages, user tasks, and thread control happens in one simple loop, no obscure stuff, no complicated paths, every thread only executes this same, identical loop.
The same code works for 1 thread serving 1 socket, or for 16 threads out of a pool of 50 serving 5,000 sockets, 10 overlapped file transfers, and executing parallel computations.
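The two helpers called on timeout in the loop above are not shown in the answer. One way such mark-and-sweep bookkeeping might look is sketched below in portable C++. The `ResendTracker` class, its method names, and the send callback are all hypothetical. A datagram is resent only if it was already unacknowledged at the previous timeout, which gives every datagram at least one full timeout period to be ACKed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical mark-and-sweep tracker for unacknowledged datagrams.
class ResendTracker {
public:
    using SendFn =
        std::function<void(std::uint32_t, const std::vector<std::uint8_t>&)>;

    explicit ResendTracker(SendFn send) : send_(std::move(send)) {
        assert(static_cast<bool>(send_));
    }

    // Remember a datagram that was just sent and still awaits its ACK.
    void on_sent(std::uint32_t seq, std::vector<std::uint8_t> payload) {
        pending_[seq] = Entry{std::move(payload), false};
    }

    // An ACK arrived: forget the datagram.
    void on_ack(std::uint32_t seq) { pending_.erase(seq); }

    // Sweep: resend everything that was marked on the previous pass.
    void check_and_resend_marked() {
        for (const auto& kv : pending_) {
            if (kv.second.marked) send_(kv.first, kv.second.payload);
        }
    }

    // Mark: flag everything still unacknowledged for the next pass.
    void mark_unacked() {
        for (auto& kv : pending_) kv.second.marked = true;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    struct Entry {
        std::vector<std::uint8_t> payload;
        bool marked;
    };

    SendFn send_;
    std::map<std::uint32_t, Entry> pending_;
};
```

If several threads service the completion port, calls into a tracker like this would need a lock around them; that is left out here for brevity.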
I've seen the code to many FPS games that use UDP as the networking protocol.
The standard solution is to send all the data you need to update a single game frame in one large UDP packet. That packet should include a frame number, and a checksum. The packet should of course be compressed.
Generally the UDP packet contains the positions and velocities for every entity near the player, any chat messages that were sent, and all recent state changes (e.g. new entity created, entity destroyed, etc.).
Then the client listens for UDP packets. It will use only the packet with the highest frame number. So if out of order packets appear, the older packets are simply ignored.
Any packets with wrong checksums are also ignored.
Each packet should contain all the information to synchronize the client's game state with the server.
Chat messages get sent repeatedly over several packets, and each message has a unique message ID. For example, you retransmit the same chat message for, say, a full second's worth of frames. If a client misses a chat message after it has been sent 60 times, then the quality of the network channel is just too low to play the game. Clients will display any messages they get in a UDP packet that have a message ID they have not yet displayed.
Similarly for objects being created or destroyed. All created or destroyed objects have a unique object Id set by the server. Objects get created or destroyed if the object id they correspond to has not been acted on before.
So the key here is to send data redundantly, and key all state transitions to unique id's set by the server.
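The client-side rules described above (use only the newest frame, act on each server-assigned ID once) could be sketched like this. The `ClientState` class and its method names are hypothetical. Frame-number wraparound is ignored, and the set of seen IDs grows without bound, which a real client would have to prune.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical client-side filter for redundantly sent game state.
class ClientState {
public:
    // True if this snapshot is newer than every snapshot applied so far.
    // Older or repeated frame numbers are simply ignored.
    bool accept_snapshot(std::uint32_t frame_number) {
        if (have_snapshot_ && frame_number <= latest_frame_) return false;
        have_snapshot_ = true;
        latest_frame_ = frame_number;
        return true;
    }

    // True only the first time a server-assigned ID (chat message, object
    // creation, object destruction) is seen; repeats are dropped.
    bool first_sighting(std::uint64_t server_id) {
        return seen_ids_.insert(server_id).second;
    }

private:
    bool have_snapshot_ = false;
    std::uint32_t latest_frame_ = 0;
    std::unordered_set<std::uint64_t> seen_ids_;
};
```

The checksum test mentioned above would run before either of these calls, so corrupted packets never reach this state.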
EDIT: Another poster mentioned that for chat messages you might want to use a different protocol on a different port. And they may be right about that probably being optimal. That is, for message types where latency is not critical but reliability is more important, you might want to open up a different port and use TCP. But I'd leave that as a later exercise. It is certainly easier and cleaner at first for your game to use just one channel, and figure out the vagaries of multiple ports and multiple channels, with their various failure modes, later (e.g. what happens if the UDP channel is working, but the chat channel goes down? What if you succeed in opening one port and not the other?).
When I did this for a client we used ENet as the base reliable UDP protocol and re-implemented this from scratch to use IOCP for the server side whilst using the freely available ENet code for the client side.
IOCP works fine with UDP and integrates nicely with any TCP connections that you might also be handling (we have TCP, WebSocket or UDP client connections in and TCP connections between server nodes and being able to plug all of these into the same thread pool if we want is handy).
If absolute latency and UDP packet processing speed is most important (and it's unlikely it really is) then using the new Server 2012 RIO API might be worth it, but I'm not convinced yet (see here for some preliminary performance tests and some example servers).
You probably want to look at using GetQueuedCompletionStatusEx() for dealing with your inbound data as it reduces the context switches per datagram as you can pull multiple datagrams back with a single call.
A couple things:
1) As a general rule if you need reliability, you are best off just using TCP. A competitive and perhaps even superior solution on top of UDP is possible, but it is extremely difficult to get right and have it perform properly. The main thing people implementing reliability on top of UDP don't bother with is proper flow control. You must have flow control if you intend to send large amounts of data and want it to gracefully take advantage of the bandwidth that is available at the moment (which changes continuously with route conditions). In practice, implementing anything other than essentially the same algorithm TCP uses is likely to be unfriendly to other protocols on the network as well. It's unlikely you will do a better job at implementing that algorithm than TCP does.
2) As for running TCP and UDP in parallel, it is not as huge of a concern these days as others have noted. At one time I heard that overloaded routers along the way were preferentially dropping UDP packets before TCP packets, which makes sense in some ways, since a dropped TCP packet will just be resent anyway, and a lost UDP packet often isn't. That said, I am skeptical that this actually happens. In particular, dropping a TCP packet will cause the sender to throttle back, so it may make more sense to drop the TCP packet.
The one case where TCP may interfere with UDP is that TCP, by the nature of its algorithm, is continuously trying to go faster and faster until it reaches a point where it loses packets; then it throttles back and repeats the process. As the TCP connection continuously bumps against that bandwidth ceiling, it is just as likely to cause UDP loss as TCP loss, which in theory would appear as if the TCP traffic was sporadically causing UDP loss.
However, this is a problem you will run into even if you put your own reliable mechanism on top of UDP (assuming you do flow control properly). If you wanted to avoid this condition, you could intentionally throttle the reliable data at the application layer. Typically in a game the reliable data rate is limited to the rate at which the client or server actually needs to send reliable data, which is often well below the bandwidth capabilities of the pipe, and thus the interference never occurs, regardless of whether it is TCP or UDP-reliable based.
Where things get a bit more difficult is if you are making a streaming asset game. For a game like FreeRealms which does this, the assets are downloaded from a CDN via HTTP/TCP and it will attempt to use all available bandwidth, which will increase packetloss on the main game channel (which is typically UDP). I have generally found the interference low enough that I don't think you should be worrying about it too much.
3) As for IOCP, my experience with them is very limited, but having done extensive game networking in the past, I am skeptical that they add value in the case of UDP. Typically the server will have a single UDP socket that is handling all incoming data. With hundreds of users connected, the rate at which the data is coming into the server is very high. Having a background thread doing a blocking call on the socket as others have suggested and then quickly moving the data into a queue for the main application thread to pick up is a reasonable solution, but somewhat unnecessary, since in practice the data is coming in so fast when under load that there is not much point in ever sleeping the thread when it blocks.
Let me put this another way: if the blocking socket call polled a single packet and then put the thread to sleep until the next packet came in, it would be context-switching to that thread thousands of times per second when the data rate got high. Either that, or by the time the unblocked thread executed and cleared the data, there would already be additional data ready to be processed as well. Instead, I prefer to put the socket in non-blocking mode and then have a background thread spin at around 100fps processing it (sleeping between polls as needed to achieve the frame rate). In this manner, the socket buffer will build up incoming packets for 10ms and then the background thread will wake up once and process all that data in bulk, then go back to sleep, thus preventing gratuitous context switches. I then have that same background thread do other send-related processing when it wakes up as well. Being entirely event-driven loses many of its benefits when the data volume gets the least bit high.
In the case of TCP, the story is quite different, since you need an efficient mechanism to figure out which of hundreds of connections the incoming data is coming from, and polling them all is very slow, even on a periodic basis.
So, in the case of UDP with a home-grown UDP-reliable mechanism on top of it, I typically have a background thread playing the same role that the OS plays: whereas the OS gets the data from the network card and then distributes it to various logical TCP connections internally for processing, my background thread gets the data from the solitary UDP socket (via periodic polling) and distributes it to my own internal logical connection objects for processing. Those internal logical connections then put the application-level packet data into a thread-safe master-queue, flagged with the logical connection they came from. The main application thread then processes that master-queue in turn, routing the packets directly to the game-level objects associated with that connection. From the main application thread's point of view, it simply has an event-driven queue it is processing.
The bottom line is that given that the poll call to the solitary UDP socket rarely comes up empty, it is difficult to imagine there is going to be a more efficient way to solve this problem. The only thing you lose with this method is you wait up to 10ms to wake up when in theory you could be waking up the instant the data first arrived, but that is only meaningful if you were under extremely light load anyway. Plus, the main application thread isn't going to be making use of the data until its next frame cycle anyway, so the difference is moot, and I think the overall system performance is enhanced by this technique.
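The "wake up, drain everything, go back to sleep" step described above might look roughly like this on a POSIX system. This is a sketch, not the code from any real game: `drain_datagrams` is a made-up name, the 2048-byte buffer is an arbitrary assumption, and `MSG_DONTWAIT` is used as a per-call alternative to putting the socket in non-blocking mode.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <sys/socket.h>
#include <sys/types.h>

// Drain every datagram currently queued on a UDP socket without blocking.
// Returns the number of datagrams handed to `handler`.
std::size_t drain_datagrams(
    int fd, const std::function<void(const char*, std::size_t)>& handler) {
    assert(fd >= 0);
    char buf[2048];  // assumed to be larger than any datagram we expect
    std::size_t count = 0;
    for (;;) {
        const ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            break;  // EAGAIN/EWOULDBLOCK: queue is empty; other errors: stop
        }
        handler(buf, static_cast<std::size_t>(n));
        ++count;
    }
    return count;
}
```

The background thread would call this roughly every 10ms (for example with `std::this_thread::sleep_until`), push the results onto the master-queue, and then do its send-related work before sleeping again.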
I wouldn't hold a game as old as PlanetSide up as a paragon of modern network implementation. Especially not having seen the insides of their networking library. :)
Different types of communication require different methodologies. One of the answers above talks around the differences between frame/position updates and chat messages, without recognizing that using the same transport for both is probably silly. You should most definitely use a connected TCP socket between your chat implementation and the chat server, for text-style chat. Don't argue, just do it.
So, for your game client doing updates via arriving UDP packets, the most efficient path from the network adapter through the kernel and into your application is (most likely) going to be a blocking recv. Create a thread that rips packets off the network, verifies their validity (chksum match, sequence number increasing, whatever other checks you have), de-serializes the data into an internal object, then queue the object on an internal queue to the application thread that handles those sorts of updates.
But don't take my word for it: test it! Write a small program that can receive and deserialize 3 or 4 kinds of packets, using a blocking thread and a queue to deliver the objects, then re-write it using a single thread and IOCPs, with the deserialization and queueing in the completion routine. Pound enough packets through it to get the run time up in the minute range, and test which one is fastest. Make sure something (i.e. some thread) in your test app is consuming the objects off the queue so you get a full picture of the relative performance.
Post back here when you have the two test programs done, and let us know which worked out best, mm'kay? Which was fastest, which would you rather maintain in the future, which took the longest to get it working, etc.
If you want to support many simultaneous connections, you need to use an event-driven networking approach. I know of two good libraries: libev (used by nodeJS) and libevent. They are very portable and easy to use. I have successfully used libevent in an application supporting hundreds of parallel TCP/UDP(DNS) connections.
I believe using event-driven network i/o is not premature optimization in a server - it should be the default design pattern. If you want to do a quick prototype implementation it may be better to start in a higher level language. For JavaScript there is nodeJS and for Python there is Twisted. Both I can personally recommend.
How about NodeJS?
It supports UDP and it is highly scalable.

Limitations on sending through UDP sockets

I have a big 1GB file which I am trying to send to another node. After the sender sends 200 packets (well before the complete file has been sent), the code bails out with the error "Sendto no send space available". What could be the problem, and how do I take care of it?
Apart from this, we need maximum throughput in this transfer. What send buffer size should we use to be efficient?
What is the maximum MTU which we can use to transfer the file without fragmentation?
Thanks
Ritu
Thank you for the answers. Actually, our project specifies to use UDP and then some additional code to take care of lost packets.
Now I am able to send the complete file, using blocking UDP sockets.
I am running the whole setup in an Emulab-like environment called DETER. I have set link loss to 0, but some of my packets are still getting lost. What could be the possible reason behind that? Even if I add a delay after sending every packet (assuming the receiver drops packets when its buffer is full), the packet loss persists.
It's possible to use UDP for high speed data transfer, but you have to make sure not to send() the data out faster than your network card can pump it onto the wire. In practice that means either using blocking I/O, or blocking on select() and only sending the next packet when select() indicates that the socket is ready-for-write. (ideally you'd also not send the data faster than the receiving machine can receive it, but that's less of an issue these days since modern CPU speeds are generally much faster than modern network I/O speeds)
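A rough sketch of the "block on select() before each send" idea, assuming a POSIX system and a UDP socket that has already been connect()ed to the receiver (otherwise you would use sendto instead of send). The function names send_when_writable and send_in_chunks are invented for this example.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <cassert>
#include <cstddef>

// Waits until the socket is ready-for-write, then sends one datagram.
// Returns the number of bytes sent, or -1 on error.
ssize_t send_when_writable(int fd, const void* data, std::size_t len) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set writable;
    FD_ZERO(&writable);
    FD_SET(fd, &writable);
    // No timeout: block until the kernel reports room in the send buffer.
    if (select(fd + 1, nullptr, &writable, nullptr, nullptr) < 0) return -1;
    return send(fd, data, len, 0);
}

// Sends a large buffer as a sequence of datagrams of at most `chunk` bytes.
// Returns the number of datagrams sent, or -1 on the first error.
long send_in_chunks(int fd, const unsigned char* data,
                    std::size_t total, std::size_t chunk) {
    assert(chunk > 0);
    long sent = 0;
    for (std::size_t off = 0; off < total; off += chunk) {
        std::size_t n = (total - off < chunk) ? total - off : chunk;
        if (send_when_writable(fd, data + off, n) < 0) return -1;
        ++sent;
    }
    return sent;
}
```

This only keeps you from overrunning the local send buffer; it says nothing about what routers or the receiver drop, so you still need your own loss handling on top.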
Once you have that logic working properly, the size of your send-buffer isn't terribly important. (i.e. your send buffer will never be large enough to hold a 1GB file anyway, so making sure your program doesn't overflow the send buffer is the key issue whether the send buffer is large or small) The size of the receive-buffer on the receiver is important though... best to make that as large as possible, so the receiving computer won't drop packets if the receiving process gets held off of the CPU by another program.
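For the receive-buffer part, something along these lines should work on a POSIX system. request_receive_buffer is a made-up helper name, and the kernel is free to clamp or adjust whatever you ask for, which is why the sketch reads the value back instead of trusting the request.

```cpp
#include <sys/socket.h>
#include <cassert>

// Asks for a receive buffer of `bytes` and returns the size the kernel
// reports afterwards, or -1 on error. The reported size may differ from
// the request (system limits apply).
int request_receive_buffer(int fd, int bytes) {
    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0) return -1;
    int actual = 0;
    socklen_t len = sizeof actual;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0) return -1;
    return actual;
}
```

Call it on the receiving socket before the transfer starts, e.g. request_receive_buffer(fd, 4 * 1024 * 1024), and check the return value to see what you actually got.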
Regarding MTU, if you want to avoid packet fragmentation (and assuming your packets are traveling over Ethernet), then you shouldn't place more than 1472 bytes into each UDP packet (or 1452 bytes if you're using IPv6). (Calculated by subtracting the size of the IP header, 20 bytes for IPv4 without options or 40 bytes for IPv6, and the 8-byte UDP header from Ethernet's 1500-byte MTU.)
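The arithmetic can be written down as a tiny helper. This assumes a 20-byte IPv4 header (no IP options), a 40-byte IPv6 header (no extension headers) and the 8-byte UDP header; max_udp_payload and the constant names are just illustrative.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kIpv4Header = 20;  // IPv4 header without options
const std::size_t kIpv6Header = 40;  // IPv6 header without extension headers
const std::size_t kUdpHeader = 8;

// Largest UDP payload that fits in one unfragmented packet for a given MTU.
std::size_t max_udp_payload(std::size_t mtu, bool ipv6) {
    std::size_t overhead = (ipv6 ? kIpv6Header : kIpv4Header) + kUdpHeader;
    assert(mtu > overhead);
    return mtu - overhead;
}
```

With a 1500-byte MTU this gives 1472 bytes for IPv4 and 1452 bytes for IPv6. Tunnels, VPNs or PPPoE links on the path lower the usable MTU, so treat these as upper bounds.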
Also agree with #jonfen. No UDP for high speed file transfer.
UDP incurs less protocol overhead. However, at the maximum transfer rate, transmission errors (such as packet loss) are inevitable, so one must incorporate a TCP-like error correction scheme. The end result is lower performance than TCP.

What should i know about UDP programming?

I don't mean how to connect to a socket. What should I know about UDP programming?
Do I need to worry about bad data in my socket?
I should assume if I send 200bytes I may get 120 and 60 bytes separately?
Should I worry about another connection sending me bad data on the same port?
If data doesn't arrive, how long may I (typically) not see data for (250ms? 1 second? 1.75sec?)
What do I really need to know?
"i should assume if i send 200bytes i
may get 120 and 60bytes separately?"
When you're sending UDP datagrams, your read size will equal your write size. This is because UDP is a datagram protocol, unlike TCP, which is a stream protocol. However, you can only write data up to the size of the path MTU before the packet may be fragmented or dropped by a router. For general internet use, a conservative datagram size is 576 bytes including headers, which is the minimum every IPv4 host is required to accept.
"i should worry about another
connection sending me bad data on the
same port?"
You don't have a connection, you have a port. You will receive any data sent to that port, regardless of where it's from. It's up to you to determine if it's from the right address.
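As a sketch of "determine if it's from the right address" for IPv4, assuming POSIX sockets: recvfrom hands back the sender's address, and you compare it with the peer you expect. same_endpoint and recv_from_peer are names invented for this example. Keep in mind that a source address can be forged, so this is a filter and not authentication.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <cassert>
#include <cstddef>

// True if both IPv4 endpoints have the same address and port.
bool same_endpoint(const sockaddr_in& a, const sockaddr_in& b) {
    return a.sin_addr.s_addr == b.sin_addr.s_addr && a.sin_port == b.sin_port;
}

// Receives one datagram. `accepted` is set to true only if it came from
// `expected`; the caller should ignore the buffer otherwise.
// Returns what recvfrom() returned (-1 on error).
ssize_t recv_from_peer(int fd, void* buf, std::size_t cap,
                       const sockaddr_in& expected, bool& accepted) {
    assert(buf != nullptr);
    sockaddr_in sender{};
    socklen_t sender_len = sizeof sender;
    ssize_t n = recvfrom(fd, buf, cap, 0,
                         reinterpret_cast<sockaddr*>(&sender), &sender_len);
    accepted = (n >= 0) && same_endpoint(sender, expected);
    return n;
}
```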
If data doesnt arrive typically how long may i (typically) not see data for (250ms? 1 second? 1.75sec?)
Data can be lost forever, data can be delayed, and data can arrive out of order. If any of those things bother you, use TCP. Writing a reliable protocol on top of UDP is a very non trivial task and there is no reason to do so for almost all applications.
Should I worry about another connection sending me bad data on the same port?
Yes, you should worry about it. Any application can send data to your open UDP port at any time. One of the big uses of UDP is many-to-one style communication, where you multiplex communications with several peers on a single port, using the address passed back by recvfrom to differentiate between peers.
However, if you want to avoid this and only accept packets from a single peer, you can actually call connect on your UDP socket. This causes the IP stack to reject packets coming from any host:port combo (socket) other than the one you want to talk to.
A second advantage of calling connect on your UDP socket is that on many OSes it gives a significant speed / latency improvement. When you call sendto on an unconnected UDP socket, the OS actually temporarily connects the socket, sends your data and then disconnects the socket, adding significant overhead.
A third advantage of using connected UDP sockets is that they allow your application to receive ICMP error messages, such as a routing failure or a host that is unreachable because it crashed. If the UDP socket isn't connected, the OS won't know where to deliver ICMP error messages from the network and will silently discard them, potentially leading to your app hanging while waiting for a response from a crashed host (or waiting for your select to time out).
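Here is roughly what calling connect on a UDP socket looks like for an IPv4 peer, assuming POSIX sockets. make_connected_udp is a made-up helper name. No packets are exchanged by connect itself; it only records the peer in the kernel.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Creates a UDP socket and connect()s it to a single IPv4 peer.
// Returns the socket descriptor, or -1 on failure.
int make_connected_udp(const char* peer_ip, std::uint16_t peer_port) {
    assert(peer_ip != nullptr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    sockaddr_in peer;
    std::memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(peer_port);

    // inet_pton returns 1 only for a valid dotted-quad address.
    if (inet_pton(AF_INET, peer_ip, &peer.sin_addr) != 1 ||
        connect(fd, reinterpret_cast<const sockaddr*>(&peer), sizeof peer) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After this you use plain send() and recv() on the descriptor, and an ICMP error from the peer typically shows up as an error return (for example ECONNREFUSED) from a later call.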
Your packet may not get there.
Your packet may get there twice or even more often.
Your packets may not be in order.
You have a size limitation on your packets imposed by the underlying network layers. The packet size may be quite small (possibly 576 bytes).
None of this says "don't use UDP". However you should be aware of all the above and think about what recovery options you may want to take.
Fragmentation and reassembly happens at the IP level, so you need not worry about that (Wikipedia). (This means that you won't receive split or truncated packets).
UDP packets have a checksum for the data and the header, so receiving bogus data is unlikely, but possible. Lost or duplicate packets are also possible. You should check your data in any case anyway.
There's no congestion control, so you may wish to consider that, if you plan on clogging the tubes with a lot of UDP packets.
UDP is a connectionless protocol. Data sent over UDP may reach the receiver, but it may also get lost in transit. UDP is well suited to things like broadcasting and streaming audio or video (i.e. situations where an occasional dropped packet is not a problem). So if you need to ensure your data gets to the other side, stick with TCP.
UDP has less overhead than TCP and is therefore often faster. (TCP needs to build a connection first and also acknowledges, orders and retransmits data, which takes time.)
UDP packets large enough to be fragmented (to be safe, anything bigger than about half a KB) may well be dropped by routers, so split your data into small chunks before sending it. (In some cases, the OS can take care of that.) Note that it is always a whole packet that makes it or doesn't; half packets aren't delivered.
Latency over long distances can be quite high. If you want to retransmit data, I would use a timeout of something like 5 to 10 times the average latency of the current connection. (You can measure the latency by sending and receiving a few packets.)
Hope this helps.
I won't follow suit with the other people who answered this, they all seem to push you toward TCP, and that's not for gaming at all, except maybe for login/chat info. Let's go in order:
Do I need to worry about bad data in my socket?
Yes. Even though UDP contains an extremely simple checksum, it is not 100% effective. You can add your own checksum, but most of the time UDP is used when reliability is already not an issue, so data that doesn't conform should just be dropped.
I should assume if I send 200bytes I may get 120 and 60 bytes separately?
No, UDP reads and writes whole datagrams. However, if a datagram is too large, it may be fragmented along the way, and if any fragment is dropped the whole datagram is lost permanently. Some have said roughly 576 bytes with header; I personally wouldn't use more than 256 bytes (a nice round power of two).
Should I worry about another connection sending me bad data on the same port?
UDP listens for any data from any computer on a port, so in this sense, yes. Also note that UDP is primitive, and a raw socket can be used to fake the sender address, so you should use some sort of "key" that lets the listener verify the sender in addition to checking their IP.
If data doesn't arrive, how long may I (typically) not see data for (250ms? 1 second? 1.75sec?)
Data sent over UDP is usually disposable, so if you don't receive some data, it can simply be ignored. However, sometimes you want "semi-reliable" delivery without the "ordered reliable" delivery that TCP provides. In that case, 1 second is a good estimate of when to treat a packet as dropped. You can number your packets on a rotation and write your own ACK communication: when a packet is received, the receiver records its number and sends back a bitfield letting the sender know which packets it received. You can read this unfinished document for more information (although unfinished, it still yields valuable info):
http://gafferongames.com/networking-for-game-programmers/
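One way the receiver's bookkeeping for such a bitfield could look, as a sketch only: it remembers the newest sequence number seen plus a 32-bit field for the 32 packets before it. AckTracker and its layout are assumptions made for this example, and it deliberately ignores sequence-number wraparound, which a real "numbered on a rotation" scheme has to handle.

```cpp
#include <cassert>
#include <cstdint>

// Tracks the newest sequence number received and a bitfield for the 32
// sequence numbers before it. Bit i set means (latest - 1 - i) was received.
// Assumption: sequence numbers never wrap around.
class AckTracker {
public:
    void on_receive(std::uint32_t seq) {
        if (!any_) {
            any_ = true;
            latest_ = seq;
            bits_ = 0;
            return;
        }
        if (seq > latest_) {
            std::uint32_t shift = seq - latest_;
            // Older entries move further back; the old latest gets its own bit.
            bits_ = (shift >= 32) ? 0u : (bits_ << shift);
            if (shift <= 32) bits_ |= bit_for(shift);
            latest_ = seq;
        } else if (seq < latest_) {
            std::uint32_t back = latest_ - seq;
            if (back <= 32) bits_ |= bit_for(back);  // late arrival inside the window
        }
        // seq == latest_ is a duplicate: nothing to record.
    }

    // True if `seq` is the latest packet or is marked in the bitfield.
    bool received(std::uint32_t seq) const {
        if (!any_ || seq > latest_) return false;
        if (seq == latest_) return true;
        std::uint32_t back = latest_ - seq;
        return back <= 32 && (bits_ & bit_for(back)) != 0;
    }

    std::uint32_t latest() const { return latest_; }
    std::uint32_t bits() const { return bits_; }

private:
    // Bit mask for the packet `distance` steps behind the latest one.
    static std::uint32_t bit_for(std::uint32_t distance) {
        assert(distance >= 1 && distance <= 32);
        return 1u << (distance - 1);
    }
    bool any_ = false;
    std::uint32_t latest_ = 0;
    std::uint32_t bits_ = 0;
};
```

The receiver would put latest() and bits() into each ACK it sends back, and the sender resends whatever is still unacknowledged after its timeout.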
The big thing to know when attempting to use UDP is:
Your packets might not all make it over the line, which means some of your data may simply go missing.
If you're working on an application where 100% of the data needs to arrive reliably to provide functionality, use TCP. If you're working on an application where some loss is allowable (streaming media, etc.) then go for UDP, but don't expect everything to get from one end of the pipe to the other intact.
One way to look at the difference between applications appropriate for UDP vs. TCP is that TCP is good when data delivery is "better late than never", UDP is good when data delivery is "better never than late".
Another aspect is that the stateless, best-effort nature of most UDP-based applications can make scalability a bit easier to achieve. Also note that UDP can be multicast while TCP can't.
In addition to don.neufeld's recommendation to use TCP.
For most applications TCP is easier to implement. If you need to maintain packet boundaries in a TCP stream, a good way is to transmit a two byte header before the data to delimit the messages. The header should contain the message length. At the receiving end just read two bytes and evaluate the value. Then just wait until you have received that many bytes. You then have a complete message and are ready to receive the next 2-byte header.
This gives you some of the benefit of UDP without the hassle of lost data, out-of-order packet arrival etc.
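A small sketch of that two-byte length header scheme. The big-endian byte order and the names encode_frame and FrameDecoder are assumptions for this example; the decoder just accumulates whatever the TCP stream delivers and hands out complete messages.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Prepends a 2-byte big-endian length header to the payload.
std::string encode_frame(const std::string& payload) {
    assert(payload.size() <= 0xFFFF);  // must fit in two bytes
    std::string out;
    out.push_back(static_cast<char>((payload.size() >> 8) & 0xFF));
    out.push_back(static_cast<char>(payload.size() & 0xFF));
    out += payload;
    return out;
}

// Accumulates bytes from the stream and extracts complete messages.
class FrameDecoder {
public:
    // Feed the bytes returned by one recv() call; returns every message
    // that became complete as a result (possibly none).
    std::vector<std::string> feed(const char* data, std::size_t len) {
        buffer_.append(data, len);
        std::vector<std::string> messages;
        while (buffer_.size() >= 2) {
            std::size_t n =
                (static_cast<std::size_t>(static_cast<unsigned char>(buffer_[0])) << 8) |
                static_cast<std::size_t>(static_cast<unsigned char>(buffer_[1]));
            if (buffer_.size() < 2 + n) break;  // body not fully received yet
            messages.push_back(buffer_.substr(2, n));
            buffer_.erase(0, 2 + n);
        }
        return messages;
    }
private:
    std::string buffer_;
};
```

The sender writes encode_frame(msg) to the socket, and the receiver passes each chunk it reads to feed(), however the stream happens to split the bytes.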
And don't assume that if you send a packet it got there.
If there is a packet size limitation imposed by some router along the way, your UDP packets could be fragmented or silently dropped.
Two things:
1) You may or may not receive what was sent
2) Whatever you receive may not be in the same order it was sent.