TCP fragmentation

TCP fragmentation - c++

I know that TCP provides stream-like data transmission, but the main question is - what situations can occur while sending data over TCP?
1. The message can be split to N chunks to fit in MTU size.
2. Two messages can be read in 1 recv call.
Can there be the next situation?
MTU for example 1500 bytes.
Client calls send with 1498 bytes data.
Client calls send with 100 bytes data.
Server calls recv and receives 1500 bytes data.
Server calls recv and receives 98 bytes data.
So it end up with situation when 2 bytes from second client send will be received in first server recv.
My protocol defined as foolows:
4 bytes - data length
data content.
I wonder can I came up with situation when 4 bytes (data length) will be split into 2 chunks?

Yes, a stream of bytes may be split on any byte boundary. You certainly can have your 4 byte data length header split in any of 8 different ways:
4
1-3
2-2
3-1
1-1-2
1-2-1
2-1-1
1-1-1-1
Some of these are more likely to occur than others, but you must account for them. Code that could handle this might look something like the following:
unsigned char buf[4];
size_t len = 0;
while (len < sizeof(buf)) {
ssize_t n = recv(s, buf+len, sizeof(buf)-len, 0);
if (n < 0) {
// error handling here
}
len += n;
}
length = buf[0] | (buf[1] << 8) | (buf[2] << 16) | (buf[3] << 24);

I always write my applications in a manner that expects the data to become fragmented somehow. It's not hard to do once you come up with a good design.
What's the best way to monitor a socket for new data and then process that data?

Related

Prepending a message with the size of the message

I'm writing something server-client related, and I have this code snippet here:
char serverReceiveBuf[65536];
client->read(serverReceiveBuf, client->bytesAvailable());
handleConnection(serverReceiveBuf);
that reads data whenever a readyRead() signal is emitted by the server. Using bytesAvailable() is fine when I test on my local network since there's no latency, but when I deploy the program I want to make sure the entire message is received before I handleConnection().
I was thinking of ways to do this, but read and write only accept chars, so the maximum message size indicator I can send in one char is 127. I want the maximum size to be 65536, but the only way I can think of doing that is have a size-of-size-of-message variable first.
I reworked the code to look like this:
char serverReceiveBuf[65536];
char messageSizeBuffer[512];
int messageSize = 0, i = 0; //max value of messageSize = 65536
client->read(messageSizeBuffer,512);
while((int)messageSizeBuffer[i] != 0 || i <= 512){
messageSize += (int) messageSizeBuffer[i];
//client will always send 512 bytes for size of message size
//if message size < 512 bytes, rest of buffer will be 0
}
client->read(serverReceiveBuf, messageSize);
handleConnection(serverReceiveBuf);
but I'd like a more elegant solution if one exists.

It is a very common technique when sending messages over a stream to send a fixed-sized header before the message payload. This header can include many different pieces of information, but it always includes the payload size. In the simplest case, you can send the message size encoded as a uint16_t for a maximum payload size of 65535 (or uint32_t if that's not sufficient). Just make sure you handle byte ordering with ntohs and htons.
uint16_t messageSize;
client->read((char*)&messageSize, sizeof(uint16_t));
messageSize = ntohs(messageSize);
client->read(serverReceiveBuf, messageSize);
handleConnection(serverReceiveBuf);

read and write work with byte streams. It does not matter to them if the bytes are chars or any other form of data. You can send a 4-byte integer by casting its address to char* and sending 4 bytes. On the receiving end cast the 4 bytes back to an int. (If the machines are of different types you may also have endian issues, requiring the bytes to be rearranged into an int. See htonl and its cousins.)

Garbage values and Buffers differences in TCP

First question: I am confused between Buffers in TCP. I am trying to explain my proble, i read this documentation TCP Buffer, author said a lot about TCP Buffer, thats fine and a really good explanation for a beginner. What i need to know is this TCP Buffer is same buffer with the one we use in our basic client server program (Char *buffer[Some_Size]) or its some different buffer hold by TCP internally ?
My second question is that i am sending a string data with prefix length (This is data From me) from client over socket to server, when i print my data at console along with my string it prints some garbage value also like this "This is data From me zzzzzz 1/2 1/2....." ?. However i fixed it by right shifting char *recvbuf = new char[nlength>>3]; nlength to 3 bits but why i need to do it in this way ?
My third question is in relevance with first question if there is nothing like TCP Buffer and its only about the Char *buffer[some_size] then whats the difference my program will notice using such static memory allocation buffer and by using dynamic memory allocation buffer using char *recvbuf = new char[nlength];. In short which is best and why ?
Client Code
int bytesSent;
int bytesRecv = SOCKET_ERROR;
char sendbuf[200] = "This is data From me";
int nBytes = 200, nLeft, idx;
nLeft = nBytes;
idx = 0;
uint32_t varSize = strlen (sendbuf);
bytesSent = send(ConnectSocket,(char*)&varSize, 4, 0);
assert (bytesSent == sizeof (uint32_t));
std::cout<<"length information is in:"<<bytesSent<<"bytes"<<std::endl;
// code to make sure all data has been sent
while (nLeft > 0)
{
bytesSent = send(ConnectSocket, &sendbuf[idx], nLeft, 0);
if (bytesSent == SOCKET_ERROR)
{
std::cerr<<"send() error: " << WSAGetLastError() <<std::endl;
break;
}
nLeft -= bytesSent;
idx += bytesSent;
}
std::cout<<"Client: Bytes sent:"<< bytesSent;
Server code:
int bytesSent;
char sendbuf[200] = "This string is a test data from server";
int bytesRecv;
int idx = 0;
uint32_t nlength;
int length_received = recv(m_socket,(char*)&nlength, 4, 0);//Data length info
char *recvbuf = new char[nlength];//dynamic memory allocation based on data length info
//code to make sure all data has been received
while (nlength > 0)
{
bytesRecv = recv(m_socket, &recvbuf[idx], nlength, 0);
if (bytesRecv == SOCKET_ERROR)
{
std::cerr<<"recv() error: " << WSAGetLastError() <<std::endl;
break;
}
idx += bytesRecv;
nlength -= bytesRecv;
}
cout<<"Server: Received complete data is:"<< recvbuf<<std::endl;
cout<<"Server: Received bytes are"<<bytesRecv<<std::endl;
WSACleanup();
system("pause");
delete[] recvbuf;
return 0;
}

You send 200 bytes from the client, unconditionally, but in the server you only receive the actual length of the string, and that length does not include the string terminator.
So first of all you don't receive all data that was sent (which means you will fill up the system buffers), and then you don't terminate the string properly (which leads to "garbage" output when trying to print the string).
To fix this, in the client only send the actual length of the string (the value of varSize), and in the receiving server allocate one more character for the terminator, which you of course needs to add.

First question: I am confused between Buffers in TCP. I am trying to
explain my proble, i read this documentation TCP Buffer, author said a
lot about TCP Buffer, thats fine and a really good explanation for a
beginner. What i need to know is this TCP Buffer is same buffer with
the one we use in our basic client server program (Char
*buffer[Some_Size]) or its some different buffer hold by TCP internally ?
When you call send(), the TCP stack will copy some of the bytes out of your char array into an in-kernel buffer, and send() will return the number of bytes that it copied. The TCP stack will then handle the transmission of those in-kernel bytes to its destination across the network as quickly as it can. It's important to note that send()'s return value is not guaranteed to be the same as the number of bytes you specified in the length argument you passed to it; it could be less. It's also important to note that sends()'s return value does not imply that that many bytes have arrived at the receiving program; rather it only indicates the number of bytes that the kernel has accepted from you and will try to deliver.
Likewise, recv() merely copies some bytes from an in-kernel buffer to the array you specify, and then drops them from the in-kernel buffer. Again, the number of bytes copied may be less than the number you asked for, and generally will be different from the number of bytes passed by the sender on any particular call of send(). (E.g if the sender called send() and his send() returned 1000, that might result in you calling recv() twice and having recv() return 500 each time, or recv() might return 250 four times, or (1, 990, 9), or any other combination you can think of that eventually adds up to 1000)
My second question is that i am sending a string data with prefix
length (This is data From me) from client over socket to server, when
i print my data at console along with my string it prints some garbage
value also like this "This is data From me zzzzzz 1/2 1/2....." ?.
However i fixed it by right shifting char *recvbuf = new
char[nlength>>3]; nlength to 3 bits but why i need to it in this way ?
Like Joachim said, this happens because C strings depend on the presence of a NUL-terminator byte (i.e. a zero byte) to indicate their end. You are receiving strlen(sendbuf) bytes, and the value returned by strlen() does not include the NUL byte. When the receiver's string-printing routine tries to print the string, it keeps printing until if finds a NUL byte (by chance) somewhere later on in memory; in the meantime, you get to see all the random bytes that are in memory before that point. To fix the problem, either increase your sent-bytes counter to (strlen(sendbuf)+1), so that the NUL terminator byte gets received as well, or alternatively have your receiver manually place the NUL byte at the end of the string after it has received all of the bytes of the string. Either way is acceptable (the latter way might be slightly preferable as that way the receiver isn't depending on the sender to do the right thing).
Note that if your sender is going to always send 200 bytes rather than just the number of bytes in the string, then your receiver will need to always receive 200 bytes if it wants to receive more than one block; otherwise when it tries to receive the next block it will first get all the extra bytes (after the string) before it gets the next block's send-length field.
My third question is in relevance with first question if there is
nothing like TCP Buffer and its only about the Char *buffer[some_size]
then whats the difference my program will notice using such static
memory allocation buffer and by using dynamic memory allocation buffer
using char *recvbuf = new char[nlength];. In short which is best and
why ?
In terms of performance, it makes no difference at all. send() and receive() don't care a bit whether the pointers you pass to them point at the heap or the stack.
In terms of design, there are some tradeoffs: if you use new, there is a chance that you can leak memory if you don't always call delete[] when you're done with the buffer. (This can particularly happen when exceptions are thrown, or when error paths are taken). Placing the buffer on the stack, on the other hand, is guaranteed not to leak memory, but the amount of space available on the stack is finite so a really huge array could cause your program to run out of stack space and crash. In this case, a single 200-byte array on the stack is no problem, so that's what I would use.

recv the first few bytes from a socket to determine buffer size

I'm writing a distributed system in c++ using TCP/IP and sockets.
For each of my messages, I need to receive the first 5 bytes to know the full length of the incoming message.
What's the best way to do this?
recv() only 5 bytes, then recv() again. if I choose this, would it be safe to assume I'll get 0 or 5 bytes in the recv (aka not write a loop to keep trying)?
use MSG_PEEK
recv() some larger buffer size, then read the first 5 bytes and allocate the final buffer then.

You don't need to know anything. TCP is a stream protocol, and at any given moment you can get as little as one byte, or as much as multiple megabytes of data. The correct and only way to use a TCP socket is to read in a loop.
char buf[4096]; // or whatever
std::deque<char> data;
for (int res ; ; )
{
res = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
if (res == -1)
{
if (errno == EAGAIN || errno == EWOULDBLOCK)
{
break; // done reading
}
else
{
// error, break, die
}
}
if (res == 0)
{
// socket closed, finalise, break
}
else
{
data.insert(data.end(), buf, buf + res);
}
}
The only purpose of the loop is to transfer data from the socket buffer to your application. Your application must then decide independently if there's enough data in the queue to attempt extraction of some sort of higher-level application message.
For example, in your case you would check if the queue's size is at least 5, then inspect the first five bytes, and then check if the queue holds a complete application message. If no, you abort, and if yes, you extract the entire message and pop if off from the front of the queue.

Use a state machine with two states:
First State.
Receive bytes as they arrive into a buffer. When there are 5 or more bytes perform your check on those first 5 and possibly process the rest of the buffer. Switch to the second state.
Second State.
Receive and process bytes as they arrive to the end of the message.

to answer your question specifically:
it's not safe to assume you'll get 0 or 5. it is possible to get 1-4 as well. loop until you get 5 or an error as others have suggested.
i wouldn't bother with PEEK, most of the time you'll block (assuming blocking calls) or get 5 so skip the extra call into the stack.
this is fine too but adds complexity for little gain.

Partial receipt of packets from socket C++

I have a trouble, my server application sends packet 8 bytes length - AABBCC1122334455 but my application receives this packet in two parts AABBCC1122 and 334455, via "recv" function, how can i fix that?
Thanks!

To sum up a liitle bit:
TCP connection doesn't operate with packets or messages on the application level, you're dealing with stream of bytes. From this point of view it's similar to writing and reading from a file.
Both send and recv can send and receive less data than provided in the argument. You have to deal with it correctly (usually by applying proper loop around the call).
As you're dealing with streams, you have to find the way to convert it to meaningful data in your application. In other words, you have to design serialisation protocol.
From what you've already mentioned, you most probably want to send some kind of messages (well, it's usually what people do). The key thing is to discover the boundaries of messages properly. If your messages are of fixed size, you simply grab the same amount of data from the stream and translate it to your message; otherwise, you need a different approach:
If you can come up with a character which cannot exist in your message, it could be your delimiter. You can then read the stream until you reach the character and it'll be your message. If you transfer ASCII characters (strings) you can use zero as a separator.
If you transfer binary data (raw integers etc.), all characters can appear in your message, so nothing can act as a delimiter. Probably the most common approach in this case is to use fixed-size prefix containing size of your message. Size of this extra field depends on the max size of your message (you will be probably safe with 4 bytes, but if you know what is the maximum size, you can use lower values). Then your packet would look like SSSS|PPPPPPPPP... (stream of bytes), where S is the additional size field and P is your payload (the real message in your application, number of P bytes is determined by value of S). You know every packet starts with 4 special bytes (S bytes), so you can read them as an 32-bit integer. Once you know the size of the encapsulated message, you read all the P bytes. After you're done with one packet, you're ready to read another one from the socket.
Good news though, you can come up with something completely different. All you need to know is how to deserialise your message from a stream of bytes and how send/recv behave. Good luck!
EDIT:
Example of function receiving arbitrary number of bytes into array:
bool recv_full(int sock, char *buffer, size_t size)
{
size_t received = 0;
while (received < size)
{
ssize_t r = recv(sock, buffer + received, size - received, 0);
if (r <= 0) break;
received += r;
}
return received == size;
}
And example of receiving packet with 2-byte prefix defining size of payload (size of payload is then limited to 65kB):
uint16_t msgSize = 0;
char msg[0xffff];
if (recv_full(sock, reinterpret_cast<char *>(&msgSize), sizeof(msgSize)) &&
recv_full(sock, msg, msgSize))
{
// Got the message in msg array
}
else
{
// Something bad happened to the connection
}

That's just how recv() works on most platforms. You have to check the number of bytes you receive and continue calling it in a loop until you get the number that you need.

You "fix" that by reading from TCP socket in a loop until you get enough bytes to make sense to your application.

my server application sends packet 8 bytes length
Not really. Your server sends 8 individual bytes, not a packet 8 bytes long. TCP data is sent over a byte stream, not a packet stream. TCP neither respects nor maintains any "packet" boundary that you might have in mind.
If you know that your data is provided in quanta of N bytes, then call recv in a loop:
std::vector<char> read_packet(int N) {
std::vector buffer(N);
int total = 0, count;
while ( total < N && (count = recv(sock_fd, &buffer[N], N-total, 0)) > 0 )
total += count;
return buffer;
}
std::vector<char> packet = read_packet(8);
If your packet is variable length, try sending it before the data itself:
int read_int() {
std::vector<char> buffer = read_packet(sizeof (int));
int result;
memcpy((void*)&result, (void*)&buffer[0], sizeof(int));
return result;
}
int length = read_int();
std::vector<char> data = read_buffer(length);

my c++ client/server file exchange implementation is very slow...why?

Hi have implemented simple file exchange over a client/server connection in c++. Works fine except for the one problem that its so damn slow. This is my code:
For sending the file:
int send_file(int fd)
{
char rec[10];
struct stat stat_buf;
fstat (fd, &stat_buf);
int size=stat_buf.st_size;
while(size > 0)
{
char buffer[1024];
bzero(buffer,1024);
bzero(rec,10);
int n;
if(size>=1024)
{
n=read(fd, buffer, 1024);
// Send a chunk of data
n=send(sockFile_, buffer, n, 0 );
// Wait for an acknowledgement
n = recv(sockFile_, rec, 10, 0 );
}
else // reamining file bytes
{
n=read(fd, buffer, size);
buffer[size]='\0';
send(sockFile_,buffer, n, 0 );
n=recv(sockFile_, rec, 10, 0 ); // ack
}
size -= 1024;
}
// Send a completion string
int n = send(sockFile_, "COMP",strlen("COMP"), 0 );
char buf[10];
bzero(buf,10);
// Receive an acknowledgemnt
n = recv(sockFile_, buf, 10, 0 );
return(0);
}
And for receiving the file:
int receive_file(int size, const char* saveName)
{
ofstream outFile(saveName,ios::out|ios::binary|ios::app);
while(size > 0)
{
// buffer for storing incoming data
char buf[1024];
bzero(buf,1024);
if(size>=1024)
{
// receive chunk of data
n=recv(sockFile_, buf, 1024, 0 );
// write chunk of data to disk
outFile.write(buf,n);
// send acknowledgement
n = send(sockFile_, "OK", strlen("OK"), 0 );
}
else
{
n=recv(sockFile_, buf, size, 0 );
buf[size]='\0';
outFile.write(buf,n);
n = send(sockFile_, "OK", strlen("OK"), 0 );
}
size -= 1024;
}
outFile.close();
// Receive 'COMP' and send acknowledgement
// ---------------------------------------
char buf[10];
bzero(buf,10);
n = recv(sockFile_, buf, 10, 0 );
n = send(sockFile_, "OK", strlen("OK"), 0 );
std::cout<<"File received..."<<std::endl;
return(0);
}
Now here are my initial thoughts: Perhaps the buffer is too small. I should therefore try increasing the size from I dunno, 1024 bytes (1KB) to 65536 (64KB) blocks, possibly. But this results in file corruption. Ok, so perhaps the code is also being slowed down by the need to receive an acknowledgement after each 1024 byte block of data has been sent, so why not remove them? Unfortunately this results in the blocks not arriving in the correct order and hence file corruption.
Perhaps I could split the file into chunks before hand and create multiple connections and send each chunk over its own threaded connection and then reassemble the chunks somehow in the receiver....
Any idea how I could make the file transfer process more efficient (faster)?
Thanks,
Ben.

Skip the acknowledgement of buffers! You insert an artificial round trip (server->client+client->server) for probably each single packet.
This slows down the transfer.
You do not need this ack. You are using TCP, which gives you a reliable stream. Send the number of bytes, then send the whole file. Do not read after send and so on.
EDIT: As a second step, you should increase the buffer size. For internet transfer you can assume an MTU of 1500, so there will be space for a payload of 1452 bytes in each IP packet. This should be your minimal buffer size. Make it larger and let the operating system slice the buffers into packets for you. For LAN you have a much higher MTU.

My guess is that you are getting out of sync and some of your reads are less than 1024. It happens all the time with sockets. The "size -= 1024" statement should be "size -= n".
My guess is that n is sometimes less than 1024 from the recv().

You should certainly increase the buffer size, and if this causes corruption it is an error in your code, which you need to fix. Also, if you use a stream protocol (i.e. TCP/IP) the order and delivery of packets is guaranteed.

Read this thread:
Send and Receive a file in socket programming in Linux with C/C++ (GCC/G++)
Oh, and use sendfile POSIX command, here's an example to get you started:
http://tldp.org/LDP/LGNET/91/misc/tranter/server.c.txt

A couple of things.
1) You are reallocating the buffer each time you go through your while loop:
while(size > 0)
{
char buf[1024];
You can pull it out of the while loop on both sides and you won't be dumping on your stack as much.
2) 1024 is a standard buffer size, and I wouldn't go much above 2048 because then the lower level TCP/IP stack will just have to break it up anyways.
3) If you really need speed, rather than waiting for a recv ack you could just add a packet number to each packet and then check them on the receiving end. This makes your receiving code a little more complex because it has to store packets that are out of order and put them in order. But then you wouldn't need an acknowledgement.
4) It's a little thing, but what if the file that you are sending has a size that is a multiple of 1024... Then you won't send the trailing '/0'. To fix that you just need to change your while to:
while (size >= 0)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

TCP fragmentation - c++

I always write my applications in a manner that expects the data to become fragmented somehow. It's not hard to do once you come up with a good design. What's the best way to monitor a socket for new data and then process that data?

Related

Prepending a message with the size of the message

Garbage values and Buffers differences in TCP

recv the first few bytes from a socket to determine buffer size

Partial receipt of packets from socket C++

my c++ client/server file exchange implementation is very slow...why?

Categories

Resources