UDP transfer & maintaining network byte order - c++

So I am having some trouble sending and receiving a custom packet via a client-server program in C++. I have implemented something similar using TCP, however I am having issues converting everything into a single datagram when using UDP.
Currently I have the header consisting of an even number of uint32_t fields. Each of these is stored in network order like so in a header struct:
uint32_t x = htonl(y); // y holds the value in host byte order
...
And I am combining the header with a payload in a packet like so:
typedef struct {
450Header header; // 512 bytes consisting of the uint32_t fields like above
char data[ BLOCKSIZE ]; // 3.5k
// Total Packet Size = 4k
} Packet;
The header portion will contain info about the packet. My question is how to deal with byte order for strings and the payload. I would ideally like to add some string fields to the header for things like the filename. If I am sending a file larger than the block size of a packet, I would like to split it up into multiple packets, so I need to know how to split the file so that it can be interpreted in the right order on the receiving end.
1. I have successfully built a header independently (with all fields in network byte order). Do I also need to convert the byte order of the strings if I add them to the header? I assume that if I keep the strings at a fixed size I can still write a checksum function for the header.
2. If everything is correctly ordered from question 1, do I need to convert the fields back to host order on the receiving end?
3. I have an mmap function which loads the file into a char buffer. Can I simply copy this piece by piece into the data buffer in the Packet using something like memcpy and incrementing the offset? Or do I have to worry about network order for the payload data as well?
4. Do I need to use a checksum on the payload? And how do I do so if the payload does not use the whole buffer and ends up with an odd number of bytes?
Eventually I want the header to contain a sequence number so I can practice implementing recovery procedures for dropped packets (e.g. Go-Back-N), so what I am most worried about is standardizing the order in which everything is sent from the client so it can be interpreted in the right order on the server side.

8-bit data (including 8-bit integers, ANSI/UTF-8 strings, etc.) does not suffer from byte order issues, so you can send and receive it as-is. It is only multi-byte data (like 16-bit/32-bit integers, 16-bit UCS-2/UTF-16 strings, etc.) where you have to deal with byte ordering. For instance, integers should always be in network byte order, but UTF-16 strings can use either UTF-16LE or UTF-16BE at your discretion (though you should use UTF-8 instead).
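As a rough sketch of that rule, the header below writes its integer fields in big-endian (network) order by hand, which gives the same bytes as htonl followed by memcpy, and copies the filename byte for byte. The Header layout, the 32-byte name field and the helper names are assumptions made up for this illustration, not the asker's actual 512-byte header.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical header layout, for illustration only:
//   bytes 0..3   sequence number (big-endian)
//   bytes 4..7   payload length  (big-endian)
//   bytes 8..39  filename, NUL-padded, copied byte for byte
constexpr std::size_t kNameSize = 32;
constexpr std::size_t kHeaderSize = 8 + kNameSize;

struct Header {
    std::uint32_t seq = 0;
    std::uint32_t length = 0;
    std::string filename;
};

// Write a 32-bit value in network (big-endian) order.
inline void put_u32(unsigned char* out, std::uint32_t v) {
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}

// Read a 32-bit big-endian value back into host order.
inline std::uint32_t get_u32(const unsigned char* in) {
    return (std::uint32_t(in[0]) << 24) | (std::uint32_t(in[1]) << 16) |
           (std::uint32_t(in[2]) << 8) | std::uint32_t(in[3]);
}

inline std::array<unsigned char, kHeaderSize> encode(const Header& h) {
    assert(h.filename.size() < kNameSize);         // leave room for a terminating NUL
    std::array<unsigned char, kHeaderSize> buf{};  // zero-filled
    put_u32(buf.data(), h.seq);
    put_u32(buf.data() + 4, h.length);
    // 8-bit characters have no byte order, so the string is copied unchanged.
    std::memcpy(buf.data() + 8, h.filename.data(), h.filename.size());
    return buf;
}

inline Header decode(const std::array<unsigned char, kHeaderSize>& buf) {
    Header h;
    h.seq = get_u32(buf.data());
    h.length = get_u32(buf.data() + 4);
    const char* name = reinterpret_cast<const char*>(buf.data() + 8);
    const void* nul = std::memchr(name, 0, kNameSize);
    const std::size_t len =
        nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - name) : kNameSize;
    h.filename.assign(name, len);
    return h;
}
```

Because the string field has a fixed size and is zero-padded, a header checksum can be computed over the whole encoded array.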
If you have to split data into multiple packets, then you need to put information in the packet header to specify the order of the packets. UDP does not guarantee that packets arrive in the same order in which they were sent, or even that they arrive at all. So the receiver needs to collect the packets (requesting missing packets as needed) and re-order them before processing the data.
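One possible shape for the receiving side is sketched below: chunks are stored under their sequence number, so arrival order does not matter, and the receiver can ask for whatever is still missing. The Reassembler class and the assumption that the total chunk count is known up front are illustrative choices, not part of any particular protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical receiver-side helper: collects chunks keyed by sequence number.
class Reassembler {
public:
    explicit Reassembler(std::uint32_t total_chunks) : total_(total_chunks) {}

    // Store a chunk; duplicates (e.g. retransmissions) keep the first copy.
    void add(std::uint32_t seq, const std::string& chunk) {
        if (seq < total_) chunks_.emplace(seq, chunk);
    }

    bool complete() const { return chunks_.size() == total_; }

    // Sequence numbers not yet received, so they can be re-requested.
    std::vector<std::uint32_t> missing() const {
        std::vector<std::uint32_t> out;
        for (std::uint32_t s = 0; s < total_; ++s)
            if (chunks_.count(s) == 0) out.push_back(s);
        return out;
    }

    // Concatenate the chunks in sequence order (std::map iterates keys ascending).
    std::string data() const {
        assert(complete());
        std::string out;
        for (const auto& kv : chunks_) out += kv.second;
        return out;
    }

private:
    std::uint32_t total_;
    std::map<std::uint32_t, std::string> chunks_;
};
```

A Go-Back-N or selective-repeat scheme would sit on top of something like missing(), deciding when and what to request again.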
Yes, you should always convert multi-byte fields to network byte order before sending them, and convert them back to host byte order when processing them on the receiving end. Converting to network byte order is only for transmission purposes, to ensure a consistent format across platforms.
A checksum is a good idea to ensure the integrity of the data in each packet, but it is not a requirement. You can certainly produce a fixed-length checksum for arbitrary-length data; there are plenty of checksum algorithms that support that.
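As one example of such an algorithm, here is a sketch of the 16-bit ones'-complement checksum described in RFC 1071 (the one used by IP, UDP and TCP). It handles an odd number of bytes by treating the last byte as if it were followed by a zero. The function name is made up for this example, and a CRC would be a stronger choice if error detection matters a lot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 16-bit ones'-complement checksum in the style of RFC 1071.
inline std::uint16_t internet_checksum(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint64_t sum = 0;
    std::size_t i = 0;
    // Add the data as a sequence of big-endian 16-bit words.
    for (; i + 1 < len; i += 2)
        sum += (std::uint32_t(data[i]) << 8) | data[i + 1];
    // Odd length: the trailing byte is padded with a zero byte.
    if (i < len) sum += std::uint32_t(data[i]) << 8;
    // Fold the carries back into the low 16 bits.
    while (sum >> 16) sum = (sum & 0xFFFF) + (sum >> 16);
    return static_cast<std::uint16_t>(~sum);
}
```

The result is always two bytes regardless of how much of the payload buffer is in use, so it fits in a fixed-size header field; only the bytes actually used (len) need to be covered.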

Related

Advantage of asio::streambuf over raw array

I don't quite understand the advantage of using streambuf over the regular array.
Let me explain my problem. I have a network connection which is encrypted using Rijndael 128 ECB plus some easy cipher to encrypt remaining data that is shorter than 16 bytes. The packets are structured as length_of_whole_packet+operationcode+data. Do I actually have to copy all the data from the streambuf so I can apply the decryption algorithm? Why make another copy of data I already have?
I have the same problem with sending data. In secured mode the packet structure is length_of_whole_packet+crc+data, where crc and data are encrypted. I could make some monstrosity such as MakePacket(HEADER, FORMAT, ...) that would allocate an array, format the packet, add the crc and encrypt it, but I would like to avoid a vararg function. I can't use structs because the packets have dynamic length: they can contain arrays or strings. If I used MakePacket(unsigned char opcode, streambuf& sb) there would be a problem with the crc again: I would have to make a copy to encrypt it.
Should I use the vararg monstrosity for sending using regular array as buffer in combination with unsigned char pbyRecvBuffer[BUFFERMAXLEN] for recv?
I'm not really sure how to design this communication while avoiding copies of data.
Thank you for your answers.
When using streambufs, copying of data can often be minimized by using algorithms that operate on iterators, such as std::istreambuf_iterator or boost::asio::buffers_iterator, rather than copying data from a streambuf into another data structure.
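A minimal sketch of the iterator approach, under some assumptions: boost::asio::streambuf derives from std::streambuf, so the function below takes a plain std::streambuf& and can be tried with a std::stringbuf without Boost installed. The XOR step is only a placeholder for a real cipher, and xor_from_streambuf is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Reads directly from the streambuf's input sequence and writes the
// transformed bytes to the output, with no intermediate copy of the input.
// Placeholder transform: XOR with a one-byte key stands in for decryption.
inline std::vector<unsigned char> xor_from_streambuf(std::streambuf& sb, unsigned char key) {
    std::vector<unsigned char> out;
    std::transform(std::istreambuf_iterator<char>(&sb),
                   std::istreambuf_iterator<char>(),
                   std::back_inserter(out),
                   [key](char c) {
                       return static_cast<unsigned char>(static_cast<unsigned char>(c) ^ key);
                   });
    return out;
}
```

Note that std::istreambuf_iterator consumes the input sequence as it goes, so the bytes are gone from the streambuf afterwards. A block cipher would need to pull 16 bytes at a time instead of one, but the iteration pattern stays the same.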
For stream-like application protocols, boost::asio::streambuf is often superior to boost::asio::buffer() compatible types, such as a raw array. For example, consider HTTP, where a delimiter is used to identify boundaries between variable length headers and bodies. The higher-level read_until() operations provide an elegant way to read the protocol, as Boost.Asio will handle the memory allocation, detect the delimiter, and invoke the completion handler once the message boundary has been reached. If an application used a raw array, it would need to read chunks and copy each fragmented chunk into an aggregated memory buffer until the appropriate delimiter was identified.
If the application can determine the exact amount of bytes to read, then it may be worth considering using boost::array for fixed length portions and std::vector for the variable length portions. For example, an application protocol with a:
- fixed length body could be read into a boost::array.
- fixed length header that contains enough information to determine the length of the following variable length body could use a std::vector to read the fixed size header, resize the vector once the body length has been determined, then read the body.
In context of the question, if length_of_whole_packet is of a fixed length, the application could read it into std::vector, resize the vector based on the determined body length, then read the remaining data into the vector. The decryption algorithm could then operate directly on the vector and use an output iterator, such as std::back_insert_iterator, with an auxiliary output buffer if the algorithm cannot be done in-place. The same holds true for encrypting data to be written.

C++ Boost asio get data size?

I am using the boost asio library to read some data using tcp. After using a.accept(*sock);, how to get the size of the 1st packet the client will send?
I use (sock->remote_endpoint().address()).to_string() to get the IP address of the user, so I guess there must be a similar simple way to get the size of the packet, right?
At the application level, it is often far more useful to know the number of bytes currently available for reading, rather than the packet size. The amount of data available for reading may be constructed from one or more TCP segments. In the OSI model, a TCP segment (Layer 4: Transport) may be constructed from one or more IP Layer packets (Layer 3: Network), and each packet may be constructed from one or more Ethernet frames (Layer 2: Data Link).
Therefore, I am going to assume the application is interested in knowing how many bytes to read, rather than knowing lower level details, such as the size of a packet. There are a few solutions to this problem:
Query the socket for how much data is available via socket::available(), then allocate the buffer accordingly.
std::vector<char> data(socket_.available());
boost::asio::read(socket_, boost::asio::buffer(data));
Use a class that Boost.Asio can grow in memory, such as boost::asio::streambuf. Some operations, such as boost::asio::read() accept streambuf objects as their buffer and will allocate memory as is required for the operation. However, a completion condition should be provided; otherwise, the operation will continue until the buffer is full.
boost::asio::streambuf data;
boost::asio::read(socket_, data,
boost::asio::transfer_at_least(socket_.available()));
As Igor R. suggests in the comments, incorporate length as part of the communication protocol. Check the Boost.Asio examples for examples of communication protocols. Focus on the protocol, not necessarily on the Boost.Asio API.
In a fixed length protocol, a constant byte size is used to indicate message boundaries, such as in the Boost.Asio Porthopper example. As the reader knows the size of the message, the reader can allocate a buffer in advance.
In a variable length protocol, such as the one used in the Boost.Asio Chat example, a message is often divided into two parts: a header and a body. One approach is to have a fixed size header that contains various meta-information, such as the length of the body. This allows an application to read a header into a fixed size buffer, extract the body length, allocate a buffer for the body, then read the body.
// Read fixed header.
std::vector<char> data(fixed_header_size);
boost::asio::read(socket_, boost::asio::buffer(data));
protocol::header header(data);
network_to_local(header); // Handle endianness.
// Read body.
data.resize(header.body_length());
boost::asio::read(socket_, boost::asio::buffer(data));
protocol::body body(data);
network_to_local(body); // Handle endianness.
On the other hand, if I am mistaken, and you do need the total length of a packet, then one can use the basic_raw_socket. Boost.Asio's ICMP example demonstrates reading IPv4 packets from a socket, and extracting the header's field values.

Can TCP data overlap in the buffer

If I keep sending data to a receiver is it possible for the data sent to overlap such that they accumulate in the buffer and so the next read to the buffer reads also the data of another sent data?
I'm using Qt and readAll() to receive data and parse it. This data has some structure in it, so I can tell whether the data is complete and whether it is valid at all, but I'm worried that data from separate sends will overlap when I call readAll() and so invalidate this supposed-to-be valid data.
If it can happen, how do I prevent/control it? Or is that something the OS/API worries about instead? I'm worried partly because of how the method is called. lol
TCP is a stream based connection, not a packet based connection, so you may not assume that what is sent in one call will also be received in one call. You still need some kind of protocol to packetize your stream.
For sending strings, you could use the NUL character as a separator, or you could begin with a header which contains a magic number and a length.
According to http://qt-project.org/doc/qt-4.8/qiodevice.html#readAll this function snarfs all the data and returns it as an array. I don't see how the API raises concerns about overlapping data. The array is returned by value, and given that it represents the entire stream, what would it even overlap with? Are you worried that the returned object actually has reference semantics (i.e. that it just holds pointers to storage that is re-used in other calls to the same function)?
If send and receive buffers overlap in any system, that's a bug, unless special care is taken that the use is completely serialized. (I.e. a buffer is somehow used only for sending and only for receiving, without any mixup.)
Why don't you use a fixed length header followed by a variable length packet, with the header holding the length of the packet?
This way you can avoid worrying about packet boundaries. Say for example instead of just sending the string send the length of the string followed by the string. In the receiver end always read the length and then based on the length read the string.
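The length-then-data idea can be sketched as below. Whatever each read returns (for example the bytes from one readAll() call) is appended to a buffer, and only complete messages are handed out, so it does not matter whether one read delivers half a message or several messages at once. The FrameParser class and the 4-byte big-endian length are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper for a stream protocol where every message is sent as
// a 4-byte big-endian length followed by that many bytes.
class FrameParser {
public:
    // Append newly received bytes and return every message that is now complete.
    std::vector<std::string> feed(const std::string& bytes) {
        buffer_ += bytes;
        std::vector<std::string> messages;
        for (;;) {
            if (buffer_.size() < 4) break;        // length prefix not complete yet
            std::uint32_t len = 0;
            for (int i = 0; i < 4; ++i)
                len = (len << 8) | static_cast<unsigned char>(buffer_[i]);
            if (buffer_.size() - 4 < len) break;  // body not fully received yet
            messages.push_back(buffer_.substr(4, len));
            buffer_.erase(0, 4 + static_cast<std::size_t>(len));
        }
        return messages;
    }

private:
    std::string buffer_;  // bytes received so far that do not form a full message
};
```

In real code it would be wise to reject lengths above some sane maximum, so that a corrupt or hostile length field cannot make the buffer grow without bound.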

Sending variable length arrays over a network?

In the game I'm making, I need to be able to send std::vectors of integers over a network.
A packet seems to be made up entirely of a string. Since enet, the network library I'm using, takes care of endianness, my first idea for solving this is to send a message where the first byte is the message id, as usual, the next 4 bytes are an integer indicating the length of the array, and all subsequent bytes are the ints in the array. On the client side I can then push these back into a vector.
Is this how it is usually done or am I missing something critical? Is there a better way to do it?
Thanks
In general, there are two approaches to solving this problem which can be combined. One is to put the length of the array before the actual array. The other involves including some framing to mark the end (or beginning) of your message. There are advantages and disadvantages to each.
Putting the length before the array is simplest. However, if there should ever be a bug where the length does not match the number of integers, there is no way to detect this or recover from it.
Using framing byte(s) to mark the end of the message has the advantage of more robustness and the ability to recover from an improperly formatted message, at the cost of complexity. The complexity comes from the fact that if your framing bytes appear in your array of integers, you must escape them (i.e. prepend an escape character). Also, your code to read messages from the network becomes a little more complicated.
Of course, this all assumes that you are dealing with a stream and not a datagram. If your messages are clearly packetized already, then the length should be enough.
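To give an idea of what the escaping involves, here is a small sketch of byte stuffing with one end marker and one escape byte. The marker values 0x7E and 0x7D and the function names are arbitrary choices for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical framing bytes, chosen for illustration.
constexpr unsigned char kEnd = 0x7E;  // marks the end of a message
constexpr unsigned char kEsc = 0x7D;  // prefixes a literal kEnd or kEsc in the payload

// Escape the payload and append the end marker.
inline std::vector<unsigned char> frame(const std::vector<unsigned char>& payload) {
    std::vector<unsigned char> out;
    for (unsigned char b : payload) {
        if (b == kEnd || b == kEsc) out.push_back(kEsc);
        out.push_back(b);
    }
    out.push_back(kEnd);
    return out;
}

// Split a byte stream into payloads. Bytes after the last end marker are
// dropped in this sketch; a real reader would keep them for the next read.
inline std::vector<std::vector<unsigned char>> unframe(const std::vector<unsigned char>& stream) {
    std::vector<std::vector<unsigned char>> messages;
    std::vector<unsigned char> current;
    bool escaped = false;
    for (unsigned char b : stream) {
        if (escaped) {
            current.push_back(b);  // take the byte literally
            escaped = false;
        } else if (b == kEsc) {
            escaped = true;
        } else if (b == kEnd) {
            messages.push_back(current);
            current.clear();
        } else {
            current.push_back(b);
        }
    }
    return messages;
}
```

After a malformed message the reader can resynchronise by skipping ahead to the next unescaped end marker, which is the recovery property mentioned above.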
There are two ways to send variable length data on a stream: by prefixing the data with the length, or by suffixing it with a delimiter.
With a suffix, the data can have a theoretically unlimited size, but the delimiter must not appear in the data. This approach can be used for strings, with a NUL ('\0') character as the delimiter.
When dealing with binary data, you don't have any choice but to prefix the length. The size of the data is then limited to what the prefix can represent, which is rarely a problem with a 4 byte prefix (because otherwise it means you're sending more than 4 gigabytes of data).
So, it all depends on the data being sent.
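For the vector-of-integers case from the question, a length-prefixed encoding might look like the sketch below. It writes every integer big-endian explicitly instead of relying on the network library for byte order, and the function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in big-endian order.
inline void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>(v >> shift));
}

// Read a 32-bit big-endian value.
inline std::uint32_t get_u32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Layout: [1-byte id][4-byte element count][count * 4-byte integers].
inline std::vector<unsigned char> encode_ints(unsigned char id,
                                              const std::vector<std::int32_t>& values) {
    assert(values.size() <= 0xFFFFFFFFu);
    std::vector<unsigned char> out;
    out.reserve(5 + values.size() * 4);
    out.push_back(id);
    put_u32(out, static_cast<std::uint32_t>(values.size()));
    for (std::int32_t v : values) put_u32(out, static_cast<std::uint32_t>(v));
    return out;
}

// Returns false if the packet is too short or its count disagrees with its size.
inline bool decode_ints(const std::vector<unsigned char>& packet, unsigned char& id,
                        std::vector<std::int32_t>& values) {
    if (packet.size() < 5) return false;
    const std::uint32_t count = get_u32(packet.data() + 1);
    const std::size_t body = packet.size() - 5;
    if (body % 4 != 0 || body / 4 != count) return false;
    id = packet[0];
    values.clear();
    for (std::size_t i = 0; i < count; ++i)
        values.push_back(static_cast<std::int32_t>(get_u32(packet.data() + 5 + 4 * i)));
    return true;
}
```

With a packet-based library the receiver knows the packet size, so the count can be cross-checked as decode_ints does; on a raw stream that check is not available.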
Appending that header information to your packet is a good approach to take. Another option you can do on the receive side, if the data is stored in some unsigned char* buffer of memory, is create a structure like so:
typedef struct network_packet
{
char id;
int message_size;
int data[];
} __attribute__((packed)) network_packet;
You can then simply "overlay" this structure on-top of your received buffer like so:
unsigned char* buffer;
// ...fill the buffer and make sure it's the right size -- endianness is
// taken care of by the library
network_packet* packet_ptr = (network_packet*)buffer;
// access the 10th integer in the packet if packet_ptr->message_size
// is large enough
if (packet_ptr->message_size >= 10) {
    int tenth_int = packet_ptr->data[9];
    // ...use tenth_int here
}
This will avoid you having to go through the expense of copying all the data back, which already exists in a buffer, back into another std::vector on the receive side.
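If the overlay cast is a concern (the flexible array member and the packed attribute are compiler extensions in C++, and the received buffer may not be suitably aligned), a single value can also be pulled out with memcpy without copying the whole array. This is only a sketch: read_int_at is a made-up helper, it assumes the same layout as the packed struct above with 32-bit ints already in host byte order, and it treats message_size as the element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout: [1-byte id][4-byte count][count * 4-byte ints], host byte order.
// Copies out just the integer at `index`; returns false if it is out of range.
inline bool read_int_at(const unsigned char* buffer, std::size_t buffer_size,
                        std::size_t index, std::int32_t& value) {
    assert(buffer != nullptr);
    constexpr std::size_t kHeader = 1 + sizeof(std::int32_t);
    if (buffer_size < kHeader) return false;
    std::int32_t count = 0;
    std::memcpy(&count, buffer + 1, sizeof count);
    if (count < 0 || index >= static_cast<std::size_t>(count)) return false;
    const std::size_t offset = kHeader + index * sizeof(std::int32_t);
    if (offset + sizeof(std::int32_t) > buffer_size) return false;  // truncated packet
    std::memcpy(&value, buffer + offset, sizeof value);
    return true;
}
```

The extra bounds check against buffer_size guards against a packet whose count field claims more data than was actually received.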

C++ byte stream

For a networked application, the way we have been transmitting dynamic data is by memcpying a struct into a (void*). This poses some problems, like when this is done to a std::string. Strings can be of dynamic length, so how will the other side know when the string ends? An idea I had was to use something similar to Java's DataOutputStream, where I could just pass whatever variables to it and it could then be put into a (void*). If this can't be done, then it's cool. I just don't really like memcpying a struct. Something about it doesn't seem quite right.
Thanks,
Robbie
Nothing wrong with memcpy on a struct, as long as the struct is filled with fixed-size buffers. Put a dynamic variable in there and you have to serialise it differently.
If you have a struct with std::strings in there, create a stream operator and use it to format a buffer. You can then memcpy that buffer to the data transport. If you have Boost, use Boost.Serialization, which does all this for you (that link also has links to alternative serialization libs).
Notes: the usual way to pass a variable-size buffer is to begin by sending the length, then that many bytes of data. Occasionally you see data transferred until a delimiter is received (and fields within that data are delimited themselves by another character, eg a comma).
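A small sketch of the length-then-bytes idea, loosely in the spirit of the DataOutputStream the question mentions. ByteWriter and ByteReader are hypothetical classes written for this example, with integers stored big-endian and strings stored as a 4-byte length followed by the characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical writer: appends values to a growing byte buffer.
class ByteWriter {
public:
    void write_u32(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes_.push_back(static_cast<unsigned char>(v >> shift));
    }
    // Strings go out as a 4-byte length followed by the raw characters.
    void write_string(const std::string& s) {
        assert(s.size() <= 0xFFFFFFFFu);
        write_u32(static_cast<std::uint32_t>(s.size()));
        bytes_.insert(bytes_.end(), s.begin(), s.end());
    }
    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
};

// Hypothetical reader: the referenced vector must outlive the reader.
class ByteReader {
public:
    explicit ByteReader(const std::vector<unsigned char>& bytes) : bytes_(bytes) {}
    bool read_u32(std::uint32_t& v) {
        if (bytes_.size() - pos_ < 4) return false;
        v = 0;
        for (int i = 0; i < 4; ++i) v = (v << 8) | bytes_[pos_++];
        return true;
    }
    bool read_string(std::string& s) {
        std::uint32_t len = 0;
        if (!read_u32(len) || bytes_.size() - pos_ < len) return false;
        s.assign(reinterpret_cast<const char*>(bytes_.data() + pos_), len);
        pos_ += len;
        return true;
    }

private:
    const std::vector<unsigned char>& bytes_;
    std::size_t pos_ = 0;
};
```

The buffer returned by bytes() is what would be handed to the transport; the receiver reads the fields back in the same order they were written.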
I see two parts of this question:
- serialization of data over a network
- how to pass structures into a network stack
To serialize data over a network, you'll need a protocol. Doesn't have to be difficult; for ASCII even a cr/lf as packet end may do. If you use a framework (like MFC), it may provide serialization functions for you; in that case you need to worry about how to send this in packets. A packetization which often works well for me is :
<length><data_type>[data....][checksum]
In this case the checksum is optional, and zero-length data is also possible, for instance if the signal is carried in the data_type (e.g. Ack for acknowledgement).
If you're using memcpy with structures, you'll need to consider that memcpy only makes a shallow copy. A pointer is worthless once transmitted over a network; instead you should transmit the data that the pointer refers to (i.e. the contents of the string in your example).
For sending dynamic data across the network you have the following options.
First option, with the size in the same packet:
void SendData()
{
    int size;           // number of valid bytes in payload
    char payload[256];
    Send(messageType);
    Send(size);
    Send(payload);
}
Second option:
void SendData()
{
    char payload[256];  // null-terminated string
    Send(messageType);
    Send(payload);
}
In either situation, though, you will be faced with more of a design choice. In the first example you send the message type, then the payload size, and then the payload.
In the second option you send the message type and then the string, which is delimited by a null terminator.
Neither option fully covers the problem you're facing, I think. First, if you're building a game you need to decide what type of protocol you will be using: UDP or TCP? The second problem you will face is the maximum packet size. On top of that, you need a framework in place to calculate the optimum packet size that will not be fragmented and lost on the internet. After that you have bandwidth control, that is, how much data you can transmit and receive between the client and server.
For example the way that most games approach this situation is each packet is identified with the following.
MessageType
MessageSize
CRCCheckSum
MessageID
void buffer[payload]
In situations where you need to send dynamic data, you would send a series of packets, not just one. For example, if you were to send a file across the network, the best option would be to use TCP/IP because it's a streaming protocol and it guarantees that the complete stream arrives safely at the other end. UDP, on the other hand, is a packet-based protocol and does not check that all packets arrived in order, or at all, at the other end.
So in conclusion.
For dynamic data, send multiple packets, each with a special flag to say whether more data is still to arrive to complete the message.
Keep it simple, and if you're working with C++ don't assume the packet or data will contain a null terminator; check the size against the payload if you decide to use a null terminator.
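The "more data" flag could be sketched like this. The Chunk struct, its field names and split_message are assumptions made for the example; a real packet would also carry the type, size and checksum fields listed earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical packet shape, for illustration only.
struct Chunk {
    std::uint32_t message_id;  // same for every chunk of one message
    std::uint32_t seq;         // position of this chunk within the message
    bool more;                 // true if further chunks follow
    std::string payload;
};

// Split data into chunks of at most max_payload bytes each.
// An empty message still produces one (empty) chunk with more == false.
inline std::vector<Chunk> split_message(std::uint32_t message_id, const std::string& data,
                                        std::size_t max_payload) {
    assert(max_payload > 0);
    std::vector<Chunk> chunks;
    std::size_t offset = 0;
    std::uint32_t seq = 0;
    do {
        const std::size_t n = std::min(max_payload, data.size() - offset);
        const std::size_t start = offset;
        offset += n;
        chunks.push_back(Chunk{message_id, seq++, offset < data.size(),
                               data.substr(start, n)});
    } while (offset < data.size());
    return chunks;
}
```

The payload size is carried by the string length rather than by a terminator, so binary data containing zero bytes passes through unchanged.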