What does the receiver do if the message is segmented by TCP?

I raised this question when reading the source code of muduo (C++ network library).
If a client sends a large message that TCP will segment, what happens on the server side? (Does the server know that the message has been segmented?)
And is it necessary for a network library to wait for the whole message before notifying the upper layer?

When dealing with a stream protocol like TCP, you already have to reassemble received data into chunks of your own choosing. That's either a fixed number of bytes per chunk, or it's decided dynamically by parsing the data in terms of your application's protocol (e.g. HTTP).
You don't know when you receive a packet from the network layer that it has been segmented: you only know that you received some data. You may know (because you understand your own protocol) that you're expecting more data to finish the chunk, but you won't know whether there is any more data until you receive it. If you do receive it.
Conversely, a single TCP packet may well contain more than a single chunk of your application-layer data! Again, you need to be aware that there is no direct relationship between the two things.
You can, however, depend on the TCP packets being delivered in the same order in which they were sent, which is nice.
Simple analogy: a big ol' ship, carrying cargo. It may be carrying 40 cars, or it may be carrying just half the quantity of parts required to construct an airplane. Or it may be carrying both! You don't know until you read the shipping manifest and consult your own records on delivery. It's then your responsibility to unpack what you've received and do what you need to do with it.
And is it necessary for a network library to wait for the whole message before notifying the upper layer?
If the library wants to pass a full "message" to the upper layer, then usually yes. Some approaches will just block waiting for a full message, but that's not common nowadays. Asynchronous I/O is your friend.
(This was a generic answer, written with no knowledge of what muduo does specifically.)
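To make the reassembly step concrete, here is a minimal sketch of the "decided dynamically" case. It assumes a made-up wire format in which every message is preceded by a 4-byte big-endian length; the FrameBuffer class and its method names are hypothetical and have nothing to do with muduo's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: accumulates bytes from successive reads and hands out
// complete messages. Assumed wire format: 4-byte big-endian length, then payload.
class FrameBuffer {
public:
    // Append whatever a single read returned. It may hold part of a message,
    // exactly one message, or several messages back to back.
    void append(const char* data, std::size_t len) { buf_.append(data, len); }

    // Return the next complete payload, or std::nullopt if more bytes are needed.
    std::optional<std::string> next() {
        constexpr std::size_t kHeader = 4;
        if (buf_.size() < kHeader) return std::nullopt;
        std::uint32_t n = 0;
        for (std::size_t i = 0; i < kHeader; ++i)
            n = (n << 8) | static_cast<unsigned char>(buf_[i]);
        if (buf_.size() - kHeader < n) return std::nullopt;
        std::string payload = buf_.substr(kHeader, n);
        buf_.erase(0, kHeader + n);
        return payload;
    }

private:
    std::string buf_;  // bytes received but not yet consumed
};
```

The caller would append after every read and then call next() in a loop until it returns std::nullopt, so both the "half a message" and the "two messages in one read" cases are handled by the same code path.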

Related

Does Boost Asio networking send/receive have any sort of data completeness guarantee?

I've been using Boost Asio sockets (UDP and TCP) to handle a custom protocol between my client and server programs. It's been working great until I discovered that with TCP async_send/async_receive calls, data can arrive in combined chunks.
For example, if I make two send calls, each with its own packet, they can arrive combined in a single receive call. I wrongly assumed that every send corresponds to a receive. It had worked well for the longest time, until I hit the issue when running the client on a different OS.
So my question is: are there any guarantees about the completeness of the data on arrival for every receive call? (E.g. does data sent with 128-byte async_send calls arrive in multiples of 128 bytes, or must the way it arrives always be treated as random, so that 1 byte arriving and then 127 bytes is possible?)
More specifically, does this mean that:
Data can arrive concatenated or partial for every send call, and I have to always handle the concatenated/partial data manually
Is this true for both UDP and TCP asio sockets?
I searched around and couldn't find any documentation on this, so I was wondering if anyone has any idea.

First, it's important to understand that the Boost Asio socket send and receive methods merely ask the underlying network stack to send or receive data. That network stack could be, for example, the Windows Sockets API.
If you are sending data to the same computer, via a so-called loopback address, the operating system (if there is one) can just "give" it to the listening, i.e. receiving, program. That's the scenario in which you are most likely to get everything in order and complete every time.
However, if you are addressing another computer, or simply because the operating system is in the mood, you will see different behaviour:
TCP was designed so that you get your data in the order in which you sent it. But the size of the chunks or packets in which it is sent can differ even on the same connection, which is a key feature of TCP. Your OS or network adapter might also do some send or receive buffering before informing you. However, nothing gets lost.
So in short for TCP: you can make sure the data is complete by waiting for a certain point in your data; async_read_until exists for exactly this case. Data from multiple send calls might arrive in one receive call or in many.
UDP was designed for low latency in contrast to TCP, but without its ordering and completeness guarantees. So when you send a UDP datagram, i.e. a packet, the OS and network adapter will usually try to send it out as soon as possible. However, on the way to the other computer, the network might lose it, or hold it back until after a datagram that you sent later, so datagrams can arrive in a different order than they were sent, or not at all. But when you do receive a datagram, it is complete in itself.
So in short for UDP: data arrives in whole datagrams, but some datagrams might be missing or might arrive in a different order than they were sent. The data from one send might arrive in one receive, might not arrive at all, or might arrive later than expected.
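The "wait for a certain point in your data" idea for TCP can be sketched without any Asio calls at all. The following is only an illustration of the bookkeeping, assuming newline-terminated messages; extract_lines is a hypothetical helper, not part of Boost.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: pulls every complete delimiter-terminated message out of
// `pending` and leaves any trailing partial message in place for the next read.
std::vector<std::string> extract_lines(std::string& pending, char delim = '\n') {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = pending.find(delim, start);
        if (pos == std::string::npos) break;  // remainder is incomplete
        out.emplace_back(pending, start, pos - start);
        start = pos + 1;  // skip the delimiter itself
    }
    pending.erase(0, start);
    return out;
}
```

After each receive, the application would append the new bytes to pending and call extract_lines. It may get zero, one, or several messages back, which is exactly the variability described above.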

So after some more testing, here's what I concluded: the answer is no. Boost Asio sockets do not have any magic that can enforce data completeness beyond what the TCP/UDP protocols enforce.
Edit:
So here's more of my research:
For TCP, it acts like a data stream. So the data from individual sends may arrive partial or combined, although none of it is lost. The user application therefore needs to handle deserialization of combined or partial data.
For UDP, because it is a datagram packet, if the packet arrives, it is guaranteed to be independent and complete. So there is no need to handle partial or combined packets.

Reassembly of TCP packets

I'm parsing a file containing lots of TCP packets. The problem is that they get segmented, and I can't find any indication of when and where that happens. No flags or anything else indicate that the middle of the current packet may contain the beginning of the next one. The protocol above TCP is FIX (used in online trading), but I'd like my code to work with any protocol (or at least to detect which protocol it is).
I'm writing code in C++ and can't use any additional libraries.
So, how do I figure out what the protocol above TCP is and where it gets segmented?

You can't. TCP/IP is conceptually a stream, not a sequence of messages (the fact that it is ultimately implemented as a sequence of packets is irrelevant). When you write a sequence of bytes to a TCP/IP stream, that sequence is added to the stream; it is not treated as a message which should maintain its own identity. No notion of message begin/end is transmitted along with the stream, unless you do so yourself in your own protocol.
If you find this hard to believe, consider how it works for files: if you write a sequence of bytes to a file, that sequence does not somehow become a record that you can later identify and retrieve. If you want that kind of structure you have to add it yourself. The same is true for TCP/IP.
The transport packets used to implement TCP/IP have no relation to the data blocks you specify with your API calls; they are merely a way to implement the TCP/IP stream. For some use cases there may appear to be a mapping, but this is accidental.
The only way to split a TCP/IP stream back into separate messages is by using knowledge of the protocol running on top of TCP/IP. In your case this is FIX. I assume you know how that works; you can use that knowledge to correctly split the FIX data back into its original messages. A generic TCP/IP message splitter cannot be made.
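As a rough illustration of using protocol knowledge to split the stream, here is a sketch for FIX in its tag=value encoding. It assumes every message begins with "8=...", followed by "9=<BodyLength>", and ends with the 7-byte trailer "10=nnn" plus SOH, and that the captured payloads have already been concatenated in order into one string. The function name is made up, and the checksum is not verified.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Sketch of a FIX tag=value splitter (hypothetical helper). Returns the next
// complete message and removes it from `stream`. Returns std::nullopt both when
// more data is needed and when the bytes do not look like a FIX header; a real
// parser would distinguish the two cases.
std::optional<std::string> next_fix_message(std::string& stream) {
    constexpr char kSoh = '\x01';
    constexpr std::size_t kTrailer = 7;  // "10=" + 3 digits + SOH

    if (stream.compare(0, 2, "8=") != 0) return std::nullopt;
    const std::size_t endBegin = stream.find(kSoh);  // end of BeginString field
    if (endBegin == std::string::npos) return std::nullopt;
    if (stream.compare(endBegin + 1, 2, "9=") != 0) return std::nullopt;
    const std::size_t endLen = stream.find(kSoh, endBegin + 3);  // end of BodyLength
    if (endLen == std::string::npos) return std::nullopt;

    std::size_t bodyLen = 0;
    for (std::size_t i = endBegin + 3; i < endLen; ++i) {
        if (stream[i] < '0' || stream[i] > '9') return std::nullopt;
        bodyLen = bodyLen * 10 + static_cast<std::size_t>(stream[i] - '0');
    }

    // BodyLength counts the bytes after its own SOH up to the checksum field.
    const std::size_t total = endLen + 1 + bodyLen + kTrailer;
    if (stream.size() < total) return std::nullopt;  // message not complete yet
    std::string msg = stream.substr(0, total);
    stream.erase(0, total);
    return msg;
}
```

Note that nothing here looks at TCP packet boundaries at all: the split points come entirely from the FIX header, which is the point the answer makes.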

As I can see, your problem is to separate TCP packets. To solve it you can rely on the length of the payload (this answer) and the checksum. If the checksum is correct for data of the specified length, then your packet is correct; if not, you need to seek in the previous part for the start of the packet, or drop this part of the data. At least this approach will help you find the point where the data was segmented.
For a more precise answer, it would help to see a small part of the data.
But your main problem is the segmentation of packets. For better performance you should try to eliminate this problem (maybe change the network card to Intel).

Qt, TCP/IP communication checksum

I am writing a data display program where I receive the data through a serial port. The listener is written by others and it is quite complex. Now I need to transfer the received data to another program/pc. So I am thinking of the standard tcp communication from Qt.
Is there any class that comes along with the TCP classes and does a job like checksumming?
Suppose I am transmitting an array of 10 doubles each time, but at high frequency. How could I write a client that receives all the data correctly without writing complex algorithms to check the validity of the received bytes?

TCP/IP includes these checks as part of the protocol itself. This includes guarantees for data integrity, as well as the correct re-assembly of data (i.e., it will definitely be in the same order). You mentioned that TCP chops the datastream into packets; this is true, but it will re-assemble the packets in the correct order on the receiving end, or request a re-transmission if it needs to do so. All of this is taken care of by the Qt networking classes.
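Integrity and ordering are handled for you, but a single read can still return less or more than one array of 10 doubles, so the receiver has to collect bytes until it has a whole record. A minimal sketch in plain C++ (no Qt calls), assuming both machines use the same double representation and byte order; take_records is a hypothetical helper:

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kDoublesPerRecord = 10;
constexpr std::size_t kRecordBytes = kDoublesPerRecord * sizeof(double);

// Hypothetical helper: feed it whatever bytes a read returned. It returns every
// complete 10-double record and keeps any leftover bytes in `pending`.
std::vector<std::array<double, kDoublesPerRecord>>
take_records(std::string& pending, const char* data, std::size_t len) {
    pending.append(data, len);
    std::vector<std::array<double, kDoublesPerRecord>> records;
    std::size_t offset = 0;
    while (pending.size() - offset >= kRecordBytes) {
        std::array<double, kDoublesPerRecord> rec;
        // memcpy avoids alignment problems when reading doubles from a byte buffer.
        std::memcpy(rec.data(), pending.data() + offset, kRecordBytes);
        records.push_back(rec);
        offset += kRecordBytes;
    }
    pending.erase(0, offset);  // drop consumed bytes, keep the partial record
    return records;
}
```

In a Qt program this would typically be called with the bytes read from the socket each time data becomes available.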

Sending large chunks of data over Boost TCP?

I have to send mesh data via TCP from one computer to another... These meshes can be rather large. I'm having a tough time thinking about what the best way to send them over TCP will be as I don't know much about network programming.
Here is my basic class structure that I need to fit into buffers to be sent via TCP:
class PrimitiveCollection
{
    std::vector<Primitive*> primitives;
};
class Primitive
{
    PRIMTYPES primType; // PRIMTYPES is just an enum with values for fan, strip, etc...
    unsigned int numVertices;
    std::vector<Vertex*> vertices;
};
class Vertex
{
    float X;
    float Y;
    float Z;
    float XNormal;
    float ZNormal;
};
I'm using the Boost library and their TCP stuff... it is fairly easy to use. You can just fill a buffer and send it off via TCP.
However, of course this buffer can only be so big and I could have up to 2 megabytes of data to send.
So what would be the best way to get the above class structure into the buffers needed and sent over the network? I would need to deserialize on the receiving end also.
Any guidance in this would be much appreciated.
EDIT: I realize after reading this again that this really is a more general problem that is not specific to Boost... It's more of a problem of chunking the data and sending it. However, I'm still interested to see if Boost has anything that can abstract this away somewhat.

Have you tried it with Boost's TCP? I don't see why 2 MB would be an issue to transfer. I'm assuming we're talking about a LAN running at 100 Mbps or 1 Gbps, a computer with plenty of RAM, and that you don't have to have > 20ms response times? If your goal is just to get all 2 MB from one computer to another, just send it; TCP will handle chunking it up for you.
I have a TCP latency checking tool that I wrote with Boost, which tries to send buffers of various sizes. I routinely check up to 20 MB, and those seem to get through without problems.
I guess what I'm trying to say is don't spend your time developing a solution unless you know you have a problem :-)
--------- Solution Implementation --------
Now that I've had a few minutes on my hands, I went through and made a quick implementation of what you were talking about: https://github.com/teeks99/data-chunker There are three big parts:
The serializer/deserializer - Boost has its own, but it's not much better than rolling your own, so I did.
Sender - Connects to the receiver over TCP and sends the data
Receiver - Waits for connections from the sender and unpacks the data it receives.
I've included the .exe(s) in the zip; run Sender.exe/Receiver.exe --help to see the options, or just look at main.
More detailed explanation:
Open two command prompts, and go to DataChunker\Debug in both of them.
Run Receiver.exe in one of them.
Run Sender.exe in the other one (possibly on a different computer, in which case add --remote-host=IP.ADD.RE.SS after the executable name; if you want to try sending more than once, add --num-sends=10 to send ten times).
Looking at the code, you can see what's going on: the receiver and sender ends of the TCP socket are created in the respective main() functions. The sender creates a new PrimitiveCollection and fills it in with some example data, then serializes and sends it. The receiver deserializes the data into a new PrimitiveCollection, at which point the primitive collection could be used by someone else, but I just wrote to the console that it was done.
Edit: Moved the example to github.

Without anything fancy, from what I remember in my network class:
Send a message to the receiver asking what size data chunks it can handle
Take a minimum of that and your own sending capabilities, then reply saying:
What size you'll be sending, how many you'll be sending
After you get that, just send each chunk. You'll want to wait for an "Ok" reply, so you know you're not wasting time sending to a client that's not there. This is also a good time for the client to send a "I'm canceling" message instead of "Ok".
Send until all packets have been replied with an "Ok"
The data is transferred.
This works because TCP guarantees in-order delivery. UDP would require packet numbers (for ordering).
Compression is the same, except you're sending compressed data. (Data is data, it all depends on how you interpret it). Just make sure you communicate how the data is compressed :)
As for examples, all I could dig up was this page and this old question. I think what you're doing would work well in tandem with Boost.Serialization.

I would like to add one more point to consider: setting the TCP socket buffer size in order to increase socket performance to some extent.
There is a utility, Iperf, that lets you test the speed of exchange over a TCP socket. I ran a few tests on Windows in a 100 Mbps LAN. With the default 8 KB TCP window size the speed is 89 Mbits/sec, and with a 64 KB TCP window size the speed is 94 Mbits/sec.

In addition to how to chunk and deliver the data, another issue you should consider is platform differences. If the two computers are the same architecture, and the code running on both sides is the same version of the same compiler, then you should, probably, be able to just dump the raw memory structure across the network and have it work on the other side. If everything isn't the same, though, you can run into problems with endianness, structure padding, field alignment, etc.
In general, it's good to define a network format for the data separately from your in-memory representation. That format can be binary, in which case numeric values should be converted to standard forms (mainly, changing endianness to "network order", which is big-endian), or it can be textual. Many network protocols opt for text because it eliminates a lot of formatting issues and because it makes debugging easier. Personally, I really like JSON. It's not too verbose, there are good libraries available for every programming language, and it's really easy for humans to read and understand.
One of the key issues to consider when defining your network protocol is how the receiver knows when it has received all of the data. There are two basic approaches. First, you can send an explicit size at the beginning of the message, then the receiver knows to keep reading until it's gotten that many bytes. The other is to use some sort of an end-of-message delimiter. The latter has the advantage that you don't have to know in advance how many bytes you're sending, but the disadvantage that you have to figure out how to make sure the end-of-message delimiter can't appear in the message.
Once you decide how the data should be structured as it's flowing across the network, then you should figure out a way to convert the internal representation to that format, ideally in a "streaming" way, so you can loop through your data structure, converting each piece of it to network format and writing it to the network socket.
On the receiving side, you just reverse the process, decoding the network format to the appropriate in-memory format.
My recommendation for your case is to use JSON. 2 MB is not a lot of data, so the overhead of generating and parsing won't be large, and you can easily represent your data structure directly in JSON. The resulting text will be self-delimiting, human-readable, easy to stream, and easy to parse back into memory on the destination side.
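For comparison, the binary option described above (a network format defined separately from the in-memory layout, with values in network byte order) might look roughly like this. The wire layout, the simplified Vertex struct, and all function names are assumptions made for the sketch, and it assumes float is a 32-bit IEEE 754 value on both ends.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

struct Vertex { float x, y, z; };  // simplified stand-in for the question's Vertex

// Append a 32-bit value in network byte order (big-endian).
void put_u32(std::string& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

// Read a big-endian 32-bit value starting at `pos`.
std::uint32_t get_u32(const std::string& in, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<unsigned char>(in[pos + i]);
    return v;
}

// Send the float's bit pattern as a big-endian 32-bit integer.
void put_f32(std::string& out, float f) {
    static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    put_u32(out, bits);
}

float get_f32(const std::string& in, std::size_t pos) {
    const std::uint32_t bits = get_u32(in, pos);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Assumed wire format: u32 vertex count, then x, y, z for each vertex.
std::string serialize(const std::vector<Vertex>& vs) {
    std::string out;
    put_u32(out, static_cast<std::uint32_t>(vs.size()));
    for (const Vertex& v : vs) {
        put_f32(out, v.x);
        put_f32(out, v.y);
        put_f32(out, v.z);
    }
    return out;
}

std::vector<Vertex> deserialize(const std::string& in) {
    if (in.size() < 4) throw std::runtime_error("truncated header");
    const std::uint32_t count = get_u32(in, 0);
    if (in.size() - 4 < static_cast<std::uint64_t>(count) * 12)
        throw std::runtime_error("truncated payload");
    std::vector<Vertex> vs(count);
    std::size_t pos = 4;
    for (Vertex& v : vs) {
        v.x = get_f32(in, pos);
        v.y = get_f32(in, pos + 4);
        v.z = get_f32(in, pos + 8);
        pos += 12;
    }
    return vs;
}
```

The leading count also answers the "how does the receiver know it has everything" question for this format: once 4 + 12 * count bytes have arrived, the message is complete.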

Count the number of packets sent to a server from a client?

So I'm almost done with an assignment involving Win32 programming and sockets, but I have to generate and analyze some statistics about the transfers. The only part I'm having trouble with is how to figure out the number of packets that were sent to the server from the client.
The data sent can be variable-length, so I can't just divide the total bytes received by a #define'd value.
We have to use asynchronous calls to do everything, so I've been trying to increment a counter with every FD_READ message I get for the server's socket. However, because I have to be able to accept a potentially large file size, I have to call recv/recvfrom with a buffer size around 64k. If I send a small packet (a-z), there are no problems. But if I send a string of 1024 characters 10x, the server reports 2 or 3 packets received, but 0% data loss in terms of bytes sent/received.
Any idea how to get the number of packets?
Thanks in advance :)

This really boils down to what you mean by 'packet.'
As you are probably aware, when a TCP/UDP message is sent on the wire, the data being sent is 'wrapped,' or prepended, with a corresponding TCP/UDP header. This is then 'wrapped' in an IP header, which is in turn 'wrapped' in an Ethernet frame. You can see this breakout if you use a sniffing package like Wireshark.
The point is this. When I hear the term 'packet,' I think of data at the IP level. IP data is truly packetized on the wire, so packet counts make sense when talking about IP. However, if you're using regular sockets to send and receive your data, the IP headers, as well as the TCP/UDP headers, are stripped off, i.e., you don't get this information from the socket. And without that information, it is impossible to determine the number of 'packets' (again, I'm thinking IP) that were transmitted.
You could do what others are suggesting by adding your own header with a length and a counter. This information will help you accurately size your receive buffers, but it won't help you determine the number of packets (again, IP...), especially if you're doing TCP.
If you want to accurately determine the number of packets using Winsock sockets, I would suggest creating a 'raw' socket as suggested here. This socket will collect all IP traffic seen by your local NIC. Use the IP and TCP/UDP headers to filter the data based on your client and server sockets, i.e., IP addresses and port numbers. This will give an accurate picture of how many IP packets were actually used to transmit your data.

Not a direct answer to your question but rather a suggestion for a different solution.
What if you send a length-descriptor in front of the data you want to transfer? That way you can already allocate the correct buffer size (not too much, not too little) on the client and also check if there were any losses when the transfer is over.
With TCP you should have no problem at all because the protocol itself handles the error-free transmission or otherwise you should get a meaningful error.
Maybe with UDP you could also split up your transfer into fixed-size chunks with a proper sequence ID. You'd have to accumulate all incoming packets before you sort them (UDP makes no guarantee about the receive order) and paste the data together.
On the other hand you should think about it if it is really necessary to support UDP as there is quite some manual overhead if you want to get that protocol error-safe... (see the Wikipedia Article on TCP for a list of the problems to get around)

Do your packets have a fixed header, or are you allowed to define your own? If you can define your own, include a packet counter in the header, along with the length. You'll have to keep a running total that accounts for rollover in your counter, but this will ensure you're counting packets sent, rather than packets received. For a simple assignment, you probably won't be encountering loss (with UDP, obviously), but if you were, a packet counter would make sure your statistics reflected the sent messages accurately.
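The running total with rollover could be kept along these lines. This is only a sketch: it assumes a 16-bit counter that the sender increments by one per message, in-order arrival without duplicates, and fewer than 65536 messages lost in a row; SequenceTally is a made-up name.

```cpp
#include <cstdint>

// Hypothetical receiver-side tally driven by the sender's 16-bit counter.
class SequenceTally {
public:
    void on_message(std::uint16_t seq) {
        if (received_ == 0) {
            sent_ = 1;  // first message seen; counting starts here
        } else {
            // The cast wraps the difference modulo 65536, so going from
            // 65535 to 0 counts as a step of 1 rather than a huge jump.
            sent_ += static_cast<std::uint16_t>(seq - last_);
        }
        last_ = seq;
        ++received_;
    }

    std::uint64_t received() const { return received_; }
    std::uint64_t sent_estimate() const { return sent_; }
    std::uint64_t lost_estimate() const { return sent_ - received_; }

private:
    std::uint16_t last_ = 0;      // counter value of the previous message
    std::uint64_t received_ = 0;  // messages actually seen
    std::uint64_t sent_ = 0;      // messages the sender must have produced
};
```

With TCP the two totals should always match, since nothing is lost; the difference only becomes interesting with UDP.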
