Data Transfer Protocol Design - C++

I am writing a protocol to transfer gigabytes of data over a network using TCP, to teach myself a little about protocol programming. I am unsure how to design this transfer protocol so that the data is transferred as quickly and efficiently as possible.
I am using Qt on Windows.
At the moment, my design of my application protocol (the part to transfer the data) is as follows:
First send the login details.
Write the first 4-kilobyte data packet into the socket, then wait for the server to confirm it has received the packet.
When the server confirms receipt (by writing the int "1"), write the next 4 kilobytes.
When all data has been transferred, send the MD5 sum of the transferred data to the server.
If the server confirms again with the int 8, the data transfer is complete.
At the moment I cannot get speeds higher than 166 KB/sec when transferring on the same computer over 127.0.0.1. I have tried reading about other protocol designs, but there is hardly any documentation on application-level data transfer protocols.
Is the protocol design that I've posted wrong or suffering from some serious issues?
Should the protocol wait for each packet to be confirmed by the server, or should I write continuously?

First, I would recommend spending some time reading about TCP and about the sliding window protocol.
I think there are two reasons why your implementation is so slow. First, you wait for an acknowledgement of each packet, which is very slow; you should use a sliding window. TCP is already reliable, so the per-packet application-level acknowledgement adds nothing but a full round trip of latency for every 4KB.
Second, you use MD5 checksumming. There is nothing wrong with that as an end-to-end check, but TCP already implements basic checksumming, and the MD5 implementation you use may be slow.
And finally, the typical way to find out why something is slow is profiling.

Related

UDP transfer is too fast, Apache Mina doesn't handle it

We decided to use UDP to send a lot of data like coordinates between:
client [C++] (using poll)
server [JAVA] [Apache MINA]
My datagrams are at most 512 bytes, to avoid fragmentation during transfer as much as possible.
Each datagram has a header I added (with an ID inside), so that I can monitor :
how many datagrams are received
which ones are received
The problem is that we are sending the datagrams too fast. We receive the first ones, then suffer a big loss, then get some, then another big loss. The sequence of datagram IDs received looks like [1], [2], [250], [251].....
The problem happens locally too (using localhost, with only one network card).
I do not care about losing some datagrams, but here it is not simple loss due to the network (which I can deal with).
So my questions here are:
On the client, how can I get the best:
settings, or socket settings?
way to send as much as I can without it being too much?
On the server, Apache MINA seems to say that it manages the socket buffer size itself, but are there still some settings to care about?
Is it possible to reach something like 1MB/s knowing that our connection already allow us to have at least this bandwidth when downloading regular files?
Nowadays, when we want to transfer ~4KB of coordinate info, we have to add sleep time and end up waiting 5 minutes or more for it to finish. That is a big issue for us, since we need to send at least 10MB of coordinate information every minute.
If you want reliable transport, you should use TCP. This will let you send almost as fast as the slower of the network and the client, with no losses.
If you want a highly optimized low-latency transport, which does not need to be reliable, you need UDP. This will let you send exactly as fast as the network can handle, but it will also let you send faster than the network or the client can cope with, and then you'll lose packets.
If you want reliable highly optimized low-latency transport with fine-grained control, you're going to end up implementing a custom subset of TCP on top of UDP. It doesn't sound like you could or should do this.
... how can I get the best settings, or socket settings
Typically by experimentation.
If the reason you're losing packets is because the client is slow, you need to make the client faster. Larger receive buffers only buy a fixed amount of headroom (say to soak up bursts), but if you're systematically slower any sanely-sized buffer will fill up eventually.
Note however that this only cures excessive or avoidable drops. The various network stack layers (even without leaving a single box) are allowed to drop packets even if your client can keep up, so you still can't treat it as reliable without custom retransmit logic (and we're back to implementing TCP).
... way to send as much as I can without it being too much?
You need some kind of ack/nack/back-pressure/throttling/congestion/whatever message from the receiver back to the source. This is exactly the kind of thing TCP gives you for free, and which is relatively tricky to implement well yourself.
Is it possible to reach something like 1MB/s ...
I just saw 8MB/s using scp over loopback, so I would say yes. That uses TCP and apparently chose AES128 to encrypt and decrypt the file on the fly - it should be trivial to get equivalent performance if you're just sending plaintext.
UDP is only a viable choice when some datagrams can be lost without sacrificing QoS. I am not familiar with Apache MINA, but the scenario described resembles a server that handles every datagram sequentially. In that case, datagrams arriving while one is being serviced pile up in the socket receive buffer, and once it fills they are silently dropped. Like I said, I do not know whether MINA can be tuned for parallel datagram processing, but if it can't, it is simply the wrong choice of tool.

Reliable UDP Algorithm?

I'm working on reliable UDP networking and I need to check something. I think a reliable UDP algorithm works like this (I don't know; I'm guessing):
Server sends: (header:6)abcdef
Client receives: (header:6)abdf, sends back "I got 4 bytes of data; they are abdf"
Server sends: (header:2)ce
Client receives: (header:2)ce; OK, I'm going to combine them!
Now, is this the right way to do reliable UDP?
EDIT (after the answer; maybe this can be helpful for someone): I'm going to use TCP, because reliable UDP is not a good way to handle my operations. I'll be sending positions as unimportant, transient values. If I built a reliable UDP algorithm, each reliable exchange would take 3-4 UDP send/receives, in which time I could instead send 3-4 fresh unreliable position updates; since I'm sending small pieces of data, that can be more efficient than reliable UDP.
The "true way" to get reliable UDP is to use TCP.
If you still want to do it over UDP, you can verify the integrity of the message by sending a checksum with the message, and then recalculating the checksum at the other end to see if it matches the checksum you sent.
If it doesn't match, request the packet again. Note that this is essentially reinventing TCP.
Well, even with:
- Client receive: (header:6)abdf, sends back "I got 4 data, they are abdf"
- Server send: (header:2)ce
what if the server never receives your response (which can happen with UDP)? So switching to TCP is a much better option if you're not too concerned about connection speed.
Your problem sounds like it's tailor-made for the Data Distribution Service.
I'll send position like un-important, temporal variables
In fact, location ordinates are the popular examples for many of its vendors. RTI has a demonstration which goes well with your use case.
Yeah, a lot of folks groan when they hear "IDL" but I'd recommend that you give it a fair shake. DDS is unlike very many popular pub-sub/distribution/etc protocols in that it's not a simple encapsulation/pipeline.
I think the really cool thing is that often a lot of logic and design effort goes into the problem of "how do I react when the underlying network or my peer(s) misbehave?" DDS offers quality-of-service negotiation and hooks for your code to react when the QoS terms aren't met.
I would recommend against taking this decision lightly, it's a good deal more complex than TCP, UDP, AMQP, etc. But if you can afford the complexity and can amortize it over a large enough system -- it can pay real dividends.
In the end, DDS does deliver "reliable" messages over UDP. It's designed to support many different transports, and many different dimensions of QoS. It's truly dizzying when you see all the different dimensions of QoS that are considered by this service.

Sending large chunks of data over Boost TCP?

I have to send mesh data via TCP from one computer to another... These meshes can be rather large. I'm having a tough time thinking about what the best way to send them over TCP will be as I don't know much about network programming.
Here is my basic class structure that I need to fit into buffers to be sent via TCP:
class PrimitiveCollection
{
std::vector<Primitive*> primitives;
};
class Primitive
{
PRIMTYPES primType; // PRIMTYPES is just an enum with values for fan, strip, etc...
unsigned int numVertices;
std::vector<Vertex*> vertices;
};
class Vertex
{
float X;
float Y;
float Z;
float XNormal;
float ZNormal;
};
I'm using the Boost library and their TCP stuff... it is fairly easy to use. You can just fill a buffer and send it off via TCP.
However, of course this buffer can only be so big and I could have up to 2 megabytes of data to send.
So what would be the best way to get the above class structure into the buffers needed and sent over the network? I would also need to deserialize on the receiving end.
Any guidance in this would be much appreciated.
EDIT: I realize after reading this again that this really is a more general problem that is not specific to Boost... Its more of a problem of chunking the data and sending it. However I'm still interested to see if Boost has anything that can abstract this away somewhat.
Have you tried it with Boost's TCP? I don't see why 2MB would be an issue to transfer. I'm assuming we're talking about a LAN running at 100 Mbps or 1 Gbps, a computer with plenty of RAM, and response-time requirements no tighter than 20 ms or so. If your goal is just to get all 2MB from one computer to another, just send it; TCP will handle chunking it up for you.
I have a TCP latency checking tool that I wrote with Boost, that tries to send buffers of various sizes, I routinely check up to 20MB and those seem to get through without problems.
I guess what I'm trying to say is don't spend your time developing a solution unless you know you have a problem :-)
--------- Solution Implementation --------
Now that I've had a few minutes on my hands, I went through and made a quick implementation of what you were talking about: https://github.com/teeks99/data-chunker There are three big parts:
The serializer/deserializer: Boost has its own, but it's not much better than rolling your own, so I did.
Sender - Connects to the receiver over TCP and sends the data
Receiver - Waits for connections from the sender and unpacks the data it receives.
I've included the .exe(s) in the zip, run Sender.exe/Receiver.exe --help to see the options, or just look at main.
More detailed explanation:
Open two command prompts, and go to DataChunker\Debug in both of them.
Run Receiver.exe in one of them.
Run Sender.exe in the other one (possible on a different computer, in which case add --remote-host=IP.ADD.RE.SS after the executable name, if you want to try sending more than once and --num-sends=10 to send ten times).
Looking at the code, you can see what's going on: the receiver and sender ends of the TCP socket are created in the respective main() functions. The sender creates a new PrimitiveCollection, fills it with some example data, then serializes and sends it; the receiver deserializes the data into a new PrimitiveCollection, at which point the collection could be used by someone else, but I just wrote to the console that it was done.
Edit: Moved the example to github.
Without anything fancy, from what I remember in my network class:
Send a message to the receiver asking what size data chunks it can handle
Take a minimum of that and your own sending capabilities, then reply saying:
What size you'll be sending, how many you'll be sending
After you get that, just send each chunk. You'll want to wait for an "Ok" reply, so you know you're not wasting time sending to a client that's not there. This is also a good time for the client to send a "I'm canceling" message instead of "Ok".
Send until all packets have been replied with an "Ok"
The data is transferred.
This works because TCP guarantees in-order delivery. UDP would require packet numbers (for ordering).
Compression is the same, except you're sending compressed data. (Data is data, it all depends on how you interpret it). Just make sure you communicate how the data is compressed :)
As for examples, all I could dig up was this page and this old question. I think what you're doing would work well in tandem with Boost.Serialization.
I would like to add one more point to consider: setting the TCP socket buffer size, in order to increase socket performance to some extent.
There is a utility, iperf, that lets you test the speed of exchange over a TCP socket. I ran a few tests on Windows in a 100 Mbps LAN. With the default 8 KB TCP window size the speed was 89 Mbit/s; with a 64 KB TCP window size it was 94 Mbit/s.
In addition to how to chunk and deliver the data, another issue you should consider is platform differences. If the two computers are the same architecture, and the code running on both sides is the same version of the same compiler, then you should, probably, be able to just dump the raw memory structure across the network and have it work on the other side. If everything isn't the same, though, you can run into problems with endianness, structure padding, field alignment, etc.
In general, it's good to define a network format for the data separately from your in-memory representation. That format can be binary, in which case numeric values should be converted to standard forms (mainly, changing endianness to "network order", which is big-endian), or it can be textual. Many network protocols opt for text because it eliminates a lot of formatting issues and because it makes debugging easier. Personally, I really like JSON. It's not too verbose, there are good libraries available for every programming language, and it's really easy for humans to read and understand.
One of the key issues to consider when defining your network protocol is how the receiver knows when it has received all of the data. There are two basic approaches. First, you can send an explicit size at the beginning of the message; the receiver then knows to keep reading until it has gotten that many bytes. The other is to use some sort of end-of-message delimiter. The latter has the advantage that you don't have to know in advance how many bytes you're sending, but the disadvantage that you have to figure out how to make sure the end-of-message delimiter can't appear in the message.
Once you decide how the data should be structured as it's flowing across the network, then you should figure out a way to convert the internal representation to that format, ideally in a "streaming" way, so you can loop through your data structure, converting each piece of it to network format and writing it to the network socket.
On the receiving side, you just reverse the process, decoding the network format to the appropriate in-memory format.
My recommendation for your case is to use JSON. 2 MB is not a lot of data, so the overhead of generating and parsing won't be large, and you can easily represent your data structure directly in JSON. The resulting text will be self-delimiting, human-readable, easy to stream, and easy to parse back into memory on the destination side.

Optimizing Sockets in Symbian

I have a TCP connection opened between Symbian and a server machine, and I would like to transfer large chunks of data (around 32K) between these two endpoints. Unfortunately, the performance figures are pretty poor and I am looking for ideas on how to improve my implementation. One of the things I tried was to increase the number of bytes that can be buffered by the socket for sending and receiving to 64K:
iSocket.SetOpt(KSoTcpSendWinSize, KSolInetTcp, 0x10000);
iSocket.SetOpt(KSoTcpRecvWinSize, KSolInetTcp, 0x10000);
Are there any other things that could be optimized at a socket level for better throughput?
It is also possible that my socket code does something stupid. It follows a simple request/response protocol. I have to use the blocking WaitForRequest routine to be sure that the data has been sent/received so that I can then process it.
//store requestinfo in reqbuf and send it to server; wait for iStatus
iSocket.Send( reqbuff, 0, iStatus, len );
User::WaitForRequest(iStatus);
//store 32K file in resbuff; wait for iStatus to be sure that all data has
//been received
iSocket.Recv(resbuff, 0, iStatus, len);
User::WaitForRequest(iStatus);
//do something with the 32K received
Would be thankful for every comment!
You can send and receive in parallel if you use active objects. There should be example code in the SDK. Obviously, whether that helps depends on the application and the protocol used.
I'm no TCP expert, but I think there are parameters on the socket that can cause your usage pattern (sending one large buffer, then receiving a large buffer) to use the network less optimally than sending approximately equal amounts of data in both directions.
Everything about TCP sockets that can be configured on other OSes should be configurable on Symbian as well, but first you need to figure out what to change. I suggest you ask another, TCP-general question and get some pointers; then you can figure out how to set that up in Symbian.
Are you positive that the
//do something with the 32K received
doesn't take particularly long? Your app appears to be single-threaded, so if this is holding up the line, that's an obvious bottleneck.
Also, what do you mean by poor performance? Have you compared it to other TCP apps?
Lastly, if performance is a big issue you can switch to raw sockets/datagram sockets and optimize your own validation protocol for your specific data.

How do I write data over an ethernet connection to an IP:Port address?

I have a need to write a quick and dirty application to write some data over an ethernet connection to a remote machine.
The remote machine is sitting waiting for data, and I just want to blat some data at it to test the connection and bandwidth etc.
I'd like to be able to, say, send a known pattern of data (simple counting or repeating pattern) over the connection and be able to increase the bandwidth by x2, x10, x100 etc.
No need for handshaking, CRC, specific data format, framing etc. just plain old data.
Please... no third party libraries, just C++ (or C, or python)
If you can use netcat:
echo "BEEFCAKE" | nc remote.host port
I can recommend Beej's Guide to Network Programming. Helped me to understand all that network mumbo-jumbo.
However, if you need something really quick, why not use .NET? That has pretty nice classes for doing things like this. You could write your data in 10 lines.
P.S. Don't get thrown off by the fact that this is written for *nix; Winsock has essentially the same functions.
When you say "IP:Port", you must mean you need something higher-layer than a raw Ethernet frame. You should read up on TCP/UDP/IP programming. I think the best online resource for this is Beej's Guide, which targets Berkeley or Windows sockets.
Python sockets tutorial here.
Or just use teh googles and search for "socket programming in [language]".
This tutorial seems to cover what you want:
http://www.amk.ca/python/howto/sockets/
You can use Sockets.
Here is some basic tutorial.
Sounds like you want to be able to saturate the link for testing purposes without regard to whether the receiver can accept all the data. So, you will not want to use TCP since it is acknowledged, and flow controlled to avoid overrunning the receiver.
Probably easiest to go with UDP, although you could also consider a raw socket if you really want to write Ethernet frames directly (i.e. if you need to send data without IP/UDP headers, just raw Ethernet frames. In this case you'll need the destination MAC address to put in the Ethernet frame so it goes to the right place).
Regarding framing and CRC: Ethernet hardware will frame the packets and generate/check CRCs.
Edit: It might help if you could state what you are trying to achieve. Is it to test the speed of the Ethernet links/switches, or to see how fast the sending and receiving CPUs can exchange data? (It's likely that you can send data over Ethernet faster than the receiving CPU can handle it, although that depends on the speed of the Ethernet and the CPUs, as well as what OS the receiver is running, how busy it is, how its network stack is tuned, etc.)