Receiving all data sent with C sockets

If I write a server, how can I implement the receive function to get all the data sent by a specific client if I don't know how that client sends the data?
I am using a TCP/IP protocol.

If you really have no protocol defined, then all you can do is accept groups of bytes from the client as they arrive. Without a defined protocol, there is no way to know that you have received "all the bytes" that the client sent, since there is always the possibility that a network failure occurred somewhere between the client and your server during transmission, causing the last part of the stream not to arrive at the server. In that case, you would get the usual end-of-stream or error indication from the TCP socket (e.g. recv() returning 0, or returning -1 with an error such as ECONNRESET; note that EWOULDBLOCK on a non-blocking socket only means that no data is available yet, not that the stream has ended), so you would know that you aren't going to receive any more data from the client (because the TCP connection is now disconnected)... but that isn't quite the same thing as knowing you have received all of the data the client meant for you to receive.
Depending on your application, that might be good enough. If not, then you'll have to work out a protocol, and trust that your clients will abide by the rules of that protocol. Having the client send a header first saying how many bytes it plans to send is a good approach; having it send some special "Okay, that's all I meant to send" indicator is also possible (although if you do it that way, you have to watch out for false positives if the special indicator could appear by chance inside the data itself).

One call to send does not equal one call to recv. Either send a header so the receiver knows how much data to expect, or send some sort of sentinel value so the receiver knows when to stop reading.

It depends on how you want to design your protocol.
ASCII protocols usually use a special character to delimit the end of the data, while binary protocols usually send the length of the data first as a fixed-size integer (both sides know this size) and then the variable-length data follows.

You can combine the size and your data in one buffer and call send once. People often use the first 2 bytes for the size of the data in a packet, like this:
|size N (2 bytes) | data (N bytes) |
In this case, a packet can carry up to 65535 bytes of custom data.
Since TCP does not preserve message boundaries, it doesn't matter how many times you call send. On the receiving side, you call receive until you have the 2 size bytes, which give you N, and then keep calling receive until you have all N bytes of data.
UPDATE: This is just a sample to show how to check message boundary in TCP. Security/Encryption is a whole different story and it deserves a new thread. That said, do not simply copy this design. :)

TCP is stream-based, so there is no concept of a "complete message": it's given by a higher-level protocol (e.g. HTTP) or you'd have to invent it yourself. If you were free to use UDP (datagram-based), then there would be no need to do send() multiple times, or receive().
A newer SCTP protocol also supports the concept of a message natively.
With TCP, to implement messages, you have to tell the receiver the size of the message. It can be the first few bytes (commonly 2, since that allows messages up to 64K -- but you have to be careful of byte order if you may be communicating between different systems), or it can be something more complicated. HTTP, for example, has a whole set of rules by which the receiver determines the length of the message. One of them is the Content-Length HTTP header, which contains a string representing the number of bytes in the body of the message. Header-only HTTP messages are simply delimited by a blank line. As you can see, there are no easy (or standard) answers.

TCP is a stream based protocol. As such there is no concept of length of data built into TCP in the same way as there is no concept of data length for keyboard input.
It is therefore up to the higher level protocol to specify the end of the message. This can be done by including the packet length in the protocol or specifying a special end-of-message byte sequence.
For example, HTTP headers are terminated by a double \r\n sequence, and the length of the message body can be obtained from the Content-Length header.
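As a rough illustration of the HTTP case, a receiver could check whether its buffer already holds a complete message like this. It is a deliberately naive sketch: the function names are made up, only Content-Length is considered (real HTTP has further rules, such as chunked transfer encoding), and a missing header is assumed to mean an empty body.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Offset just past the blank line that ends the headers, or npos if the
// header block has not fully arrived yet.
std::size_t header_end(const std::string& buf) {
    std::size_t pos = buf.find("\r\n\r\n");
    return pos == std::string::npos ? pos : pos + 4;
}

// Naive, case-insensitive Content-Length lookup; returns -1 if absent.
long content_length(const std::string& headers) {
    std::string lower;
    for (unsigned char c : headers)
        lower.push_back(static_cast<char>(std::tolower(c)));
    const std::string key = "\r\ncontent-length:";
    std::size_t pos = lower.find(key);
    if (pos == std::string::npos) return -1;
    pos += key.size();
    while (pos < lower.size() && (lower[pos] == ' ' || lower[pos] == '\t')) ++pos;
    long value = 0;
    bool any_digit = false;
    while (pos < lower.size() && std::isdigit(static_cast<unsigned char>(lower[pos]))) {
        value = value * 10 + (lower[pos] - '0');
        any_digit = true;
        ++pos;
    }
    return any_digit ? value : -1;
}

// True once `buf` holds the full header block plus the declared body.
bool message_complete(const std::string& buf) {
    std::size_t end = header_end(buf);
    if (end == std::string::npos) return false;
    long len = content_length(buf.substr(0, end));
    if (len < 0) len = 0;  // assumption: no Content-Length means no body
    return buf.size() - end >= static_cast<std::size_t>(len);
}
```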

Related

Does Boost Asio networking send/receive have any sort of data completeness guarantee?

I've been using boost asio sockets (UDP and TCP) to handle a custom protocol between my client and server programs. It's been working great until I discovered that with TCP async_send/async_receive calls, data can arrive in combined chunks.
For example, if I make two send calls, each with its own packet, they can arrive combined at a single receive call. I wrongly assumed that every send corresponds to a receive. It had, however, worked well for the longest time, until I found the issue when running the client on a different OS.
So my question is: are there any guarantees about the completeness of the data on arrival for every receive call? (e.g. do 128 bytes sent with async_send always arrive in multiples of 128 bytes, or must the arrival pattern always be treated as arbitrary, so that 1 byte arriving followed by 127 bytes is possible?)
More specifically, does this mean that:
Data can arrive concatenated or partial for every send call, and I always have to handle the concatenated/partial data manually?
Is this true for both UDP and TCP asio sockets?
I searched around and couldn't find any documentation on this, so I was wondering if anyone has any idea.

First, it's important to understand that Boost Asio's socket send and receive methods only instruct the underlying network stack to send or receive data. That network stack could be, for example, the Windows socket API.
If you are sending data to the same computer via a so-called loopback address, the operating system (if there is one) can just "give" it to the listening, i.e. receiving, program. That's the scenario where you are most likely to get things in order and always complete.
However, if you are addressing another computer, or simply because the operating system is in the mood, you will see different behaviour:
TCP was designed so that you get your data in the order you sent it. But the chunk or packet size in which it is sent can differ even on the same connection; this is a key feature of TCP. Your OS or network adapter might also do some send or receive buffering before informing you. However, nothing gets lost.
So in short for TCP: you can make sure the data is complete by waiting for a certain point in your data; async_read_until is there for just this case. Data from multiple send calls might arrive in one receive, or in many.
UDP was designed for low latency in contrast to TCP, but without TCP's ordering and delivery guarantees. So when you send a UDP datagram, i.e. a packet, the OS and network adapter will usually try to send it out ASAP. However, on the way to the other computer the internet might lose it, or hold one packet back until after a later one, so datagrams can arrive in a different order than they were sent, or not at all. But when you do receive a datagram, it is complete in itself.
So in short for UDP: data arrives in datagram chunks, but some datagrams might be missing or might arrive in a different order than sent. The data from one send might arrive in one receive, might arrive later than expected, or might not arrive at all.

So after some more testing, here's what I concluded: the answer is no. Boost Asio sockets do not have any magic that can enforce data completeness beyond what the TCP/UDP protocols enforce.
Edit:
So here's more of my research:
TCP acts like a data stream. So data may arrive partial or combined, but nothing is lost. The user application therefore needs to handle deserialization of combined or partial data.
For UDP, because it is datagram-based, a packet that arrives is guaranteed to be independent and complete. So there is no need to handle partial or combined packets.

Are TCP packets reordered usually?

I am reimplementing an old network layer library, but using boost asio this time. Our software talks over TCP/IP with third-party software. Several messages behave very well on both sides, but there is one case I don't understand:
The third party sends two messages (msg A and B) one after the other (with very short timing), but I receive only a part of message A in TCP packet 1, and the end of message A and the whole of message B in TCP packet 2. (I sniffed with Wireshark.)
I had not thought of this case. I am wondering whether it is common with TCP, and whether my layer should adapt to it, or whether I should ask the third party to check what they do on their side so that I receive both messages in different packets.

Packets can be fragmented and arrive out-of-sequence. The TCP stack which receives them should buffer and reorder them, before presenting the data as an incoming stream to the application layer.
My problem is with message B, that I don't see because it's after the end of message one in the same packet.
You can't rely on "messages" having a one-to-one mapping to "packets": to the application, TCP (not UDP) looks like a "streaming" protocol.
An application which sends via TCP needs another way to separate messages. Sometimes that's done by marking the end of each message. For example SMTP marks the end-of-message as follows:
The transmission of the body of the mail message is initiated with a
DATA command after which it is transmitted verbatim line by line and
is terminated with an end-of-data sequence. This sequence consists of
a new-line (<CR><LF>), a single full stop (period), followed by
another new-line. Since a message body can contain a line with just a
period as part of the text, the client sends two periods every time a
line starts with a period; correspondingly, the server replaces every
sequence of two periods at the beginning of a line with a single one.
Such escaping method is called dot-stuffing.
Alternatively, the protocol might specify a prefix at the start of each message, which will indicate the message-length in bytes.
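If you are coding the TCP stack itself, then you'll have access to the TCP segment header, but note that its "Data offset" field only gives the length of the TCP header, i.e. where the payload starts; the payload length of a segment is derived from the IP header's length field, and neither tells you where an application-level message ends.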
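The dot-stuffing scheme quoted above can be sketched in a few lines. This is an illustration of the escaping idea only, not a full SMTP implementation; dot_stuff and dot_unstuff are hypothetical names, and lines are assumed to be separated by \r\n.

```cpp
#include <cstddef>
#include <string>

// Sender side: double the leading dot of every line that starts with one,
// then append the end-of-data line ".\r\n".
std::string dot_stuff(const std::string& body) {
    std::string out;
    bool line_start = true;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (line_start && body[i] == '.') out.push_back('.');
        out.push_back(body[i]);
        line_start = (body[i] == '\n');
    }
    if (!line_start) out += "\r\n";  // terminator must sit on its own line
    out += ".\r\n";
    return out;
}

// Receiver side: returns true once the end-of-data line has been seen,
// with the unescaped message in `body`. Returns false while the data
// received so far is still incomplete.
bool dot_unstuff(const std::string& wire, std::string& body) {
    body.clear();
    std::size_t pos = 0;
    while (pos < wire.size()) {
        std::size_t eol = wire.find("\r\n", pos);
        if (eol == std::string::npos) return false;  // partial line
        std::string line = wire.substr(pos, eol - pos);
        pos = eol + 2;
        if (line == ".") return true;                // end-of-data marker
        if (!line.empty() && line[0] == '.') line.erase(0, 1);
        body += line + "\r\n";
    }
    return false;  // terminator not received yet
}
```

Because a lone "." line inside the body is sent as "..", it can never be mistaken for the terminator, which is how this scheme avoids false positives.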

Yes, this is common. TCP/IP is a streaming protocol and your "logical" packet may be split across many "physical" packets, so the client is responsible for assembling the higher-level packets. Additionally, TCP/IP guarantees the proper ordering, so you don't have to worry about assembling out of order packets.

your problem has got nothing to do with TCP at all. your problem is that you expected asio to do the message parsing for you. it does not, you have to implement it.
if your messages are all the same size do an async read for that size.
if they are of different length do a async read for your header size, analyze the header and do an async read for the rest of the message according to the header.
if your messages are of variable length and the size is unknown but there is a defined end character or sequence then you have to save the remaining bytes behind that end sequence and append the next read to that remainder.
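The last case, keeping the bytes behind the end sequence and appending the next read to them, might look like this. It is a sketch independent of asio; DelimitedReader is a hypothetical class, and the delimiter is assumed to be non-empty.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical buffer that splits a byte stream on a delimiter and keeps
// any trailing partial message until more data arrives.
class DelimitedReader {
public:
    // `delimiter` must be non-empty.
    explicit DelimitedReader(std::string delimiter)
        : delim_(std::move(delimiter)) {}

    // Append one chunk (e.g. the result of one read) and return every
    // message that is now complete, without its delimiter.
    std::vector<std::string> feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> messages;
        std::size_t pos;
        while ((pos = pending_.find(delim_)) != std::string::npos) {
            messages.push_back(pending_.substr(0, pos));
            pending_.erase(0, pos + delim_.size());
        }
        return messages;
    }

    // Bytes received after the last delimiter (the remainder).
    const std::string& pending() const { return pending_; }

private:
    std::string delim_;
    std::string pending_;
};
```

This handles the situation from the question directly: if the first read contains only part of message A and the second read contains the rest of A plus all of B, the second feed call returns both messages.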

Packet Size modification over Sockets

I am doing socket programming in QT and I have to design a protocol to transfer data over TCP/IP.
Now my protocol design is simple. It sends commands in such a way that the first byte of the data written to the socket for every write is the command. So whenever I write into the socket using socket->write("CDATA"), the first byte, "C" in this case, means a command for the server to do something.
I just want to know one thing: can the write be broken down into multiple reads on the server? I know there will be a buffer size on the server for the read. But can the socket->write() on the client be received in multiple reads on the server when the write is within the buffer limits of the server?
To make this question clear, I will give an example. Let's say the read buffer size of the socket on the server is 4096 bytes. The client writes socket->write("CDATA") to the server. Now is there any possibility that the server will receive this in more than one read? I ask because I have a while loop on the server:
while (true) {
    // readAll() returns whatever bytes have arrived so far
    QByteArray str = socket->readAll();
    // What is the command in the first byte?
    if (!str.isEmpty() && str[0] == 'C') {
        // Do something
    }
}
If the data sent by the client is received in more than one read (even though the client sent it in one write) my protocol design will fail.

Now is there any possibility that server will receive this in more than one read?
Yes, TCP/IP can fragment messages any way it likes. TCP is a stateful stream protocol: you are guaranteed that bytes you put in on one end will come out the other end in the same order. IP is connectionless and datagram based. Due to the nature of carrying TCP over IP, circumstances can arise in which data packets are split, merged, or otherwise processed in transit.
You should find a way to make your program robust against the intricacies of network communication. You can:
Use a datagram protocol like UDP (you lose the guarantee of getting data in the order it was sent, and dropped packets become a possibility as well. Today's networks are fairly robust; this is not usually a problem).
[DATAGRAM (size specified in datagram header)]
Always read blocks of a fixed size from the network
[DATA - block of data of some fixed size]
Include the size of the incoming data as a header attached to the front
[LENGTH - 4 byte integer][DATA - block of data of size LENGTH]
Use some sort of delimiter to indicate end-of-data and continue reading until you get it
[DATA - indeterminately sized data][DELIMITER - end-of-data control sequence]
Chances are you can use library methods to perform this behavior for you requiring very little code on your part.
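Of the options listed, the fixed-size one is the simplest to sketch. The class below is hypothetical and independent of Qt: it collects whatever each read delivers and only hands out blocks once a full one has arrived.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical buffer that turns an arbitrary sequence of reads into
// blocks of exactly `block_size` bytes.
class FixedBlockReader {
public:
    explicit FixedBlockReader(std::size_t block_size) : size_(block_size) {}

    // Append the bytes from one read and return all blocks now complete.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        pending_.append(data, len);
        std::vector<std::string> blocks;
        std::size_t offset = 0;
        while (size_ > 0 && pending_.size() - offset >= size_) {
            blocks.push_back(pending_.substr(offset, size_));
            offset += size_;
        }
        pending_.erase(0, offset);  // keep only the incomplete tail
        return blocks;
    }

private:
    std::size_t size_;
    std::string pending_;
};
```

With a block size of 5, a "CDATA" write that arrives as "CD" followed by "ATA" still comes out as one block, so checking the first byte of each block for the command is safe.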

Packets Are Stacked When Sent at Regular Intervals

I am trying to send a message over a TCP socket at a regular interval (every second). Sometimes the full message will not be sent, or two to four messages will be stacked and sent at once. I have if statements for when the return value is 0 or < 0, but those are never true. I tried the obvious approach of checking the exact return value of send() to see if fewer or more bytes were sent. It just returns the number that I specify in the parameter to send (which makes sense if send blocks until it sends that much), even if fewer bytes are sent. So is there an accurate way to say "was the right size packet sent? no? - do something"?

TCP provides a reliable stream of bytes; there are no message boundaries. If you need to know the length of the message, you have to build this into the protocol, e.g. send every message with a 2-byte header which specifies the message length.

There's no such facility with TCP. It's up to the in-kernel network stack how to slice the TCP stream into packets. Having said that, you can set the TCP_NODELAY option on your socket to disable Nagle's algorithm, which makes small writes go out sooner but still does not create message boundaries.

If I am understanding you right, sometimes you send two or more packets and they are received as one on the distant end.
This is the nature of TCP/IP. You cannot guarantee the packets will arrive as distinct, just that they will arrive in order and reliably.

Not sure what platform you are using or what syntax you are using (streams, FILE objects or file descriptors; some code would clarify this), but you may need to do an explicit flush operation after you write each message to push it to the kernel. I generally use C-style FILE streams, and it is usually sufficient to call fflush on the stream to make whatever I've queued up go out immediately (raw file descriptors have no user-space buffer, so there is nothing to flush).

Confusion about UDP/IP and sendto/recvfrom return values

I'm working with UDP sockets in C++ for the first time, and I'm not sure I understand how they work. I know that sendto/recvfrom and send/recv normally return the number of bytes actually sent or received. I've heard this value can be arbitrarily small (but at least 1), and depends on how much data is in the socket's buffer (when reading) or how much free space is left in the buffer (when writing).
If sendto and recvfrom only guarantee that 1 byte will be sent or received at a time, and datagrams can be received out of order, how can any UDP protocol remain coherent? Doesn't this imply that the bytes in a message can be arbitrarily shuffled when I receive them? Is there a way to guarantee that a message gets sent or received all at once?

It's a little stronger than that. UDP does deliver a full packet: recvfrom returns one whole datagram at a time, so the receive buffer has to be large enough to hold all the data sent in that packet (any excess is discarded). But there's also a size limit: if you want to send a lot of data, you have to break it into packets and be able to reassemble them yourself. Delivery is also not guaranteed, so you have to check to make sure everything comes through.
But since you can implement all of TCP with UDP, it has to be possible.
Usually, what you do with UDP is make small packets that are discrete.
Metaphorically, think of UDP like sending postcards and TCP like making a phone call. When you send a postcard, you have no guarantee of delivery, so you need to do something like have an acknowledgement come back. With a phone call, you know the connection exists, and you hear the answers right away.

Actually you can send a UDP datagram of 0 bytes length. All that gets sent is the IP and UDP headers. The UDP recvfrom() on the other side will return with a length of 0. Unlike TCP this does not mean that the peer closed the connection because with UDP there is no "connection".

No. With sendto you send out packets, which can contain down to a single byte.
If you send 10 bytes as a single sendto call, these 10 bytes get sent into a single packet, which will be received coherent as you would expect.
Of course, if you decide to send those 10 bytes one by one, each of them with a sendto call, then indeed you send and receive 10 different packets (each one containing 1 byte), and they could be in arbitrary order.
It's similar to sending a book via postal service. You can package the book as a whole into a single box, or tear down every page and send each one as an individual letter. In the first case, the package is bulkier but you receive the book as a single, ordered entity. In the latter, each package is very light, but good luck reading that ;)
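The datagram behaviour described in these answers can be observed over the loopback interface. The sketch below assumes a POSIX system; udp_roundtrip_sizes is a hypothetical helper, and although loopback delivery is reliable in practice, UDP itself makes no such promise, so a receive timeout is set.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: send each payload as its own UDP datagram to a
// loopback socket and return the size reported by each recvfrom() call.
std::vector<long> udp_roundtrip_sizes(const std::vector<std::string>& payloads) {
    std::vector<long> sizes;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx >= 0 && tx >= 0) {
        sockaddr_in addr;
        std::memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = 0;  // let the kernel pick a free port
        socklen_t alen = sizeof(addr);
        timeval tv;
        tv.tv_sec = 1;      // give up after 1 s if a datagram is lost
        tv.tv_usec = 0;
        bool ok =
            bind(rx, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
            getsockname(rx, reinterpret_cast<sockaddr*>(&addr), &alen) == 0 &&
            setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
        if (ok) {
            for (const std::string& p : payloads)
                sendto(tx, p.data(), p.size(), 0,
                       reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
            char buf[2048];
            for (std::size_t i = 0; i < payloads.size(); ++i) {
                ssize_t n = recvfrom(rx, buf, sizeof(buf), 0, nullptr, nullptr);
                if (n < 0) break;  // timeout or error
                sizes.push_back(static_cast<long>(n));
            }
        }
    }
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return sizes;
}
```

Sending a 10-byte payload, an empty payload and a 3-byte payload typically yields three receives of 10, 0 and 3 bytes: each sendto maps to one recvfrom, and a 0-byte result is a valid empty datagram rather than a closed connection.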

I have a client program that uses a blocking select (NULL timeout parameter) in a thread dedicated to waiting for incoming data on a UDP socket. Even though it is blocking, the select would sometimes return with an indication that the single read descriptor was "ready". A subsequent recvfrom returned 0.
After some experimentation, I have found that on Windows at least, sending a UDP packet to a port on a host that's not expecting it can result in a subsequent recvfrom getting 0 bytes. I suspect some kind of rejection notice might be coming from the other end. I now use this as a reminder that I've forgotten to start the process on the server that looks for the client's incoming traffic.
BTW, if I instead "sendto" a valid but unused IP address, then the select does not return a ready status and blocks as expected. I've also found that blocking vs. non-blocking sockets makes no difference.
