Sending variable length arrays over a network? - c++

In the game I'm making, I need to be able to send std::vectors of integers over a network.
A packet seems to be made up entirely of a string. Since ENet, the network library I'm using, takes care of endianness, my first idea is to send a message where the first byte is the message ID, as usual, the next 4 bytes are an integer giving the length of the array, and all subsequent bytes are the ints in the array. On the client side I can then push these back into a vector.
Is this how it is usually done or am I missing something critical? Is there a better way to do it?
Thanks

In general, there are two approaches to solving this problem which can be combined. One is to put the length of the array before the actual array. The other involves including some framing to mark the end (or beginning) of your message. There are advantages and disadvantages to each.
Putting the length before the array is simplest. However, if there should ever be a bug where the length does not match the number of integers, there is no way to detect this or recover from it.
Using one or more framing bytes to mark the end of the message is more robust and lets you recover from an improperly formatted message, at the cost of complexity. The complexity comes from the fact that if your framing bytes appear in your array of integers, you must escape them (i.e. prepend an escape character). Also, your code to read messages from the network becomes a little more complicated.
Of course, this all assumes that you are dealing with a stream and not a datagram. If your messages are clearly packetized already, then the length should be enough.
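As a rough illustration of the length-prefixed layout from the question (1 byte of message ID, a 4-byte count, then the integers), here is a minimal sketch. The function names are made up for this example, the byte order is written out explicitly as big-endian rather than relying on the library, and the decoder assumes it is handed exactly one whole packet, as a packet-oriented library would normally deliver it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in big-endian (network) byte order.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read a big-endian 32-bit value; the caller guarantees 4 readable bytes.
std::uint32_t get_u32(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Layout: [1 byte id][4 bytes count][count * 4 bytes of values].
std::vector<std::uint8_t> encode_int_array(std::uint8_t id,
                                           const std::vector<std::int32_t>& values) {
    assert(values.size() <= 0xFFFFFFFFu);  // count must fit in the 4-byte prefix
    std::vector<std::uint8_t> out;
    out.reserve(5 + values.size() * 4);
    out.push_back(id);
    put_u32(out, static_cast<std::uint32_t>(values.size()));
    for (std::int32_t v : values) put_u32(out, static_cast<std::uint32_t>(v));
    return out;
}

// Returns false if the packet is too short or the count disagrees with the packet size.
bool decode_int_array(const std::uint8_t* data, std::size_t size,
                      std::vector<std::int32_t>& values) {
    if (size < 5) return false;
    const std::uint32_t count = get_u32(data + 1);
    if ((size - 5) % 4 != 0 || (size - 5) / 4 != count) return false;
    values.clear();
    values.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        values.push_back(static_cast<std::int32_t>(get_u32(data + 5 + 4 * i)));
    return true;
}
```

Because the packet size is known on the receiving side, the decoder can cross-check the count against it, which catches the "length does not match" bug mentioned above in the packetized case.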

There are two ways to send variable length data on a stream: by prefixing the data with the length, or by suffixing it with a delimiter.
With a delimiter, the data can in theory be arbitrarily long, but the delimiter must not appear in the data. This approach can be used for strings, with a NUL ('\0') character as the delimiter.
When dealing with binary data, you have little choice but to prefix the length. The size of the data is then limited to the largest value the prefix can hold, which is rarely a problem with a 4-byte prefix (otherwise you would be sending more than 4 gigabytes in one message).
So, it all depends on the data being sent.
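For the delimiter case, a small sketch may help. It assumes text messages terminated by NUL and a hypothetical `pending` string that accumulates whatever bytes have arrived from the stream so far; the function name is invented for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Extracts every complete delimiter-terminated message from `pending`,
// leaving any trailing partial message in place for the next read.
std::vector<std::string> take_delimited(std::string& pending, char delimiter = '\0') {
    std::vector<std::string> messages;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = pending.find(delimiter, start);
        if (end == std::string::npos) break;  // no complete message left
        messages.push_back(pending.substr(start, end - start));
        start = end + 1;                      // skip past the delimiter
    }
    assert(start <= pending.size());
    pending.erase(0, start);                  // drop the consumed bytes
    return messages;
}
```

A receive loop would append each chunk it reads to `pending` and then call `take_delimited`. This only works when the delimiter can never occur inside a message, which is why it suits strings better than raw integers.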

Prepending that header information to your packet is a good approach to take. Another option on the receive side, if the data is stored in some unsigned char* buffer of memory, is to create a structure like so:
typedef struct network_packet
{
    char id;
    int message_size;  // number of ints in data[]
    int data[];        // flexible array member (C99; a compiler extension in C++)
} __attribute__((packed)) network_packet;  // packed is a GCC/Clang attribute
You can then simply "overlay" this structure on top of your received buffer like so:
unsigned char* buffer;
//...fill the buffer and make sure it's the right size -- endianness is taken care of
//via the library
network_packet* packet_ptr = (network_packet*)buffer;
//access the 10th integer in the packet if packet_ptr->message_size
//is large enough
int tenth_int = 0;
if (packet_ptr->message_size >= 10)
    tenth_int = packet_ptr->data[9];
This avoids the expense of copying data that already exists in a buffer into another std::vector on the receive side. Note that the overlay relies on compiler extensions and is only valid for as long as the buffer itself is alive.
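If the packed-struct overlay is not available on your compiler, or you are worried about misaligned reads, a more portable variation is to pull individual values out of the buffer with memcpy. This is only a sketch: the function name is hypothetical, and it assumes the same layout as the struct above (1 byte ID, 4-byte count, then the ints) with values already in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout: [1 byte id][4 bytes count][count * 4 bytes of ints].
const std::size_t kHeaderSize = 1 + sizeof(std::int32_t);

// Copies element `index` into `out`. Returns false if the index is out of
// range for the declared count or for the bytes actually present.
bool read_int_at(const unsigned char* buffer, std::size_t buffer_size,
                 std::size_t index, std::int32_t& out) {
    assert(buffer != nullptr);
    if (buffer_size < kHeaderSize) return false;
    std::int32_t count = 0;
    std::memcpy(&count, buffer + 1, sizeof count);  // safe for any alignment
    if (count < 0 || index >= static_cast<std::size_t>(count)) return false;
    const std::size_t available = (buffer_size - kHeaderSize) / sizeof(std::int32_t);
    if (index >= available) return false;           // header lied about the count
    std::memcpy(&out, buffer + kHeaderSize + index * sizeof(std::int32_t), sizeof out);
    return true;
}
```

This still avoids copying the whole array; only the values you actually read are copied, and compilers generally turn a 4-byte memcpy into a plain load.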

Related

Sending a structure through send

Before anything, I want to say the title is not the question.
My question is: why can't you just send all the bytes of the structure and then cast them back into that structure (given that you have the structure defined on both sides, which you normally would)?
Thank you!
EDIT: Here's my current structure:
struct COMPUTER_INFO
{
const char* Name;
int Brightness;
int Volume;
};
I was thinking that you can easily calculate the size of all that and then send it through send().
Name is a pointer (it contains an address) that only makes sense to your program on your computer. If you sent this structure as bytes, the receiving program would receive just the address, not the characters that make up the name. The received address would also not point to a valid location in memory on the receiving computer.
Brightness and Volume are ints. Ints do not have a fixed size; they are the "natural" word size of the computer (the standard does impose some restrictions). So sizeof(int) on the sending and receiving computers may be different. There may also be encoding differences, e.g. big vs little endian. See: https://en.wikipedia.org/wiki/Endianness
In general you are right. You can, of course, send the raw byte stream. But how do you want the receiver to recognize the positions of the members of your structure in this byte stream? In particular, the string that the char* in your example points to is of variable size, so a fixed layout will not work.
I recommend using the Boost serialization package. You can find a detailed tutorial here. The package will take care of "serializing" (packaging your object into a byte stream) and "deserializing" (constructing your object from the byte stream). It is convenient for nearly all STL containers and can easily be extended for custom types.
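To make the points above concrete without pulling in a library, here is a hand-rolled sketch (not Boost.Serialization) of what serializing such a structure could look like. The ComputerInfo type, the function names and the wire layout (4-byte name length, name bytes, then two 4-byte integers, all big-endian) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ComputerInfo {
    std::string name;         // owned text instead of a raw pointer
    std::int32_t brightness;  // fixed-width instead of int
    std::int32_t volume;
};

void write_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
}

bool read_u32(const std::vector<unsigned char>& in, std::size_t& pos, std::uint32_t& v) {
    if (pos > in.size() || in.size() - pos < 4) return false;
    v = 0;
    for (int i = 0; i < 4; ++i) v = (v << 8) | in[pos++];
    return true;
}

std::vector<unsigned char> serialize(const ComputerInfo& info) {
    assert(info.name.size() <= 0xFFFFFFFFu);
    std::vector<unsigned char> out;
    write_u32(out, static_cast<std::uint32_t>(info.name.size()));
    out.insert(out.end(), info.name.begin(), info.name.end());
    write_u32(out, static_cast<std::uint32_t>(info.brightness));
    write_u32(out, static_cast<std::uint32_t>(info.volume));
    return out;
}

// Returns false if the input is truncated or the name length is inconsistent.
bool deserialize(const std::vector<unsigned char>& in, ComputerInfo& info) {
    std::size_t pos = 0;
    std::uint32_t len = 0, b = 0, v = 0;
    if (!read_u32(in, pos, len)) return false;
    if (in.size() - pos < len) return false;
    info.name.assign(in.begin() + pos, in.begin() + pos + len);
    pos += len;
    if (!read_u32(in, pos, b) || !read_u32(in, pos, v)) return false;
    info.brightness = static_cast<std::int32_t>(b);
    info.volume = static_cast<std::int32_t>(v);
    return true;
}
```

The bytes returned by serialize() are what you would pass to send(); the receiver rebuilds the structure with deserialize() instead of casting the buffer.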

Sending a part of a byte array

I am reading data from a serial port (in an Arduino) and framing it (syncing on a few bytes). To do that, I am reading the data into a big buffer.
Once I have the frame, I extract the data and want to send it to a different serial port using Serial.write(buf, len), which accepts a byte array and its size.
Since the data size can be random, I need something like a dynamic array (which is not recommended in Arduino). Any ideas?
Since the data size can be random, I need something like a dynamic array
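In C you rarely need a dynamic array for this. Arrays passed to functions do not carry their size with them, which is why functions that take an array also take a length; that means you can pass a pointer into the middle of an existing buffer together with the number of bytes you want.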
Let's say you have your data inside bigBuffer at position startPos, and you wish to send length bytes. All you need to do is
Serial.write(&bigBuffer[startPos], length);
or with pointer arithmetic syntax
Serial.write(bigBuffer+startPos, length);

Advantage of asio::streambuf over raw array

I don't quite understand the advantage of using streambuf over the regular array.
Let me explain my problem. I have a network connection which is encrypted using Rijndael 128 ECB plus a simple cipher to encrypt the remaining data that is shorter than 16 bytes. The packets are structured as length_of_whole_packet+operationcode+data. Do I actually have to copy all the data out of the streambuf so I can apply the decryption algorithm? Why make another copy of data I already have?
I have the same problem when sending data. In secured mode the packet structure is length_of_whole_packet+crc+data, where crc and data are encrypted. I could make some monstrosity such as MakePacket(HEADER, FORMAT, ...) that would allocate an array, format the packet, add the crc and encrypt it, but I would like to avoid a vararg function. I can't use structs because the packets have dynamic length: there can be arrays or strings in them. If I used MakePacket(unsigned char opcode, streambuf& sb) then there would be the problem with the crc again: I would have to make a copy to encrypt it.
Should I use the vararg monstrosity for sending, with a regular array as the buffer, in combination with unsigned char pbyRecvBuffer[BUFFERMAXLEN] for recv?
I'm not really sure how to design this communication while avoiding copies of the data.
Thank you for your answers.
When using streambufs, copying of data can often be minimized by using algorithms that operate on iterators, such as std::istreambuf_iterator or boost::asio::buffers_iterator, rather than copying data from a streambuf into another data structure.
For stream-like application protocols, boost::asio::streambuf is often superior to boost::asio::buffer() compatible types, such as a raw array. For example, consider HTTP, where a delimiter is used to identify boundaries between variable length headers and bodies. The higher-level read_until() operations provide an elegant way to read the protocol, as Boost.Asio will handle the memory allocation, detect the delimiter, and invoke the completion handler once the message boundary has been reached. If an application used a raw array, it would need to read chunks and copy each fragmented chunk into an aggregated memory buffer until the appropriate delimiter was identified.
If the application can determine the exact number of bytes to read, then it may be worth considering using boost::array for fixed length portions and std::vector for the variable length portions. For example, an application protocol with a:
- fixed length body could be read into a boost::array.
- fixed length header that contains enough information to determine the length of the following variable length body could use a std::vector to read the fixed size header, resize the vector once the body length has been determined, then read the body.
In the context of the question, if length_of_whole_packet is of a fixed length, the application could read it into a std::vector, resize the vector based on the determined body length, then read the remaining data into the vector. The decryption algorithm could then operate directly on the vector and use an output iterator, such as std::back_insert_iterator, with an auxiliary output buffer if the algorithm cannot be done in-place. The same holds true for encrypting data to be written.
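The header-then-resize idea can be sketched independently of Boost.Asio. In the code below, ReadExact is a hypothetical callback standing in for whatever blocking "read exactly n bytes" call you have (with Asio that role would typically be played by boost::asio::read on a buffer). The 4-byte big-endian length field that counts the whole packet, and the size limit, are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical callback: fills [dst, dst + n) completely or returns false.
using ReadExact = std::function<bool(unsigned char* dst, std::size_t n)>;

const std::size_t kLengthFieldSize = 4;        // assumed width of length_of_whole_packet
const std::size_t kMaxPacketSize = 64 * 1024;  // assumed sanity limit

// Reads one packet into `packet` (length field included) without any
// intermediate buffer; decryption can then run directly on the vector.
bool read_packet(const ReadExact& read_exact, std::vector<unsigned char>& packet) {
    assert(static_cast<bool>(read_exact));
    packet.resize(kLengthFieldSize);
    if (!read_exact(packet.data(), kLengthFieldSize)) return false;
    const std::uint32_t total =
        (std::uint32_t{packet[0]} << 24) | (std::uint32_t{packet[1]} << 16) |
        (std::uint32_t{packet[2]} << 8) | std::uint32_t{packet[3]};
    if (total < kLengthFieldSize || total > kMaxPacketSize) return false;
    packet.resize(total);  // keeps the header bytes and makes room for the body
    if (total == kLengthFieldSize) return true;  // empty body
    return read_exact(packet.data() + kLengthFieldSize, total - kLengthFieldSize);
}
```

Rejecting implausible lengths before resizing matters in practice: otherwise a corrupted or hostile length field makes the receiver allocate whatever the peer asks for.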

Searching for a Binary Value

I am trying to find a way to identify the start of a chunk of data sent via a TCP socket. The data chunk has the value of the integer 1192 written into it as the first four bytes, followed by the content length. How can I search the binary data (the char* received) for this value? I realize I can loop through and advance the pointer by one each time, copy out the first four bytes, and check it, but that isn't the most elegant or possibly efficient solution.
Is there also another way this could be done that I'm not thinking of?
Thanks in advance.
It sounds like linear scanning might be required, but you shouldn't really be losing your message positioning if the sending side of the connection is making its send()/write() calls in a sensible manner, you are reading in your buffers properly, and there isn't an indeterminate amount of "dead" space in the stream between messages.
If the protocol itself is sensible (there is at least a length field!), you should never lose track of message boundaries. Just read the marker/length pair, then read length payload bytes, and the next message should start immediately after this, so a linear scan shouldn't have to go anywhere ideally.
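Also, you don't need a byte-by-byte copy to check the value. A single memcpy into a uint32_t is enough, and unlike casting the pointer it is safe when the data is not aligned:
uint32_t x;
std::memcpy(&x, charptr, sizeof x);  // needs <cstring>
x = ntohl(x);  // convert from network byte order if the sender used htonl()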
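If you do end up needing the linear scan, for example to resynchronize after a corrupted message, std::search will do it without a hand-written loop. This is a sketch under the assumption that the sender writes the marker in network (big-endian) byte order; find_marker is a name invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the offset of the first occurrence of the 4-byte marker in
// [data, data + size), or `size` if the marker is not present.
std::size_t find_marker(const unsigned char* data, std::size_t size, std::uint32_t marker) {
    assert(data != nullptr || size == 0);
    // Spell the marker out byte by byte so the host's endianness does not matter.
    const unsigned char pattern[4] = {
        static_cast<unsigned char>(marker >> 24), static_cast<unsigned char>(marker >> 16),
        static_cast<unsigned char>(marker >> 8), static_cast<unsigned char>(marker)};
    const unsigned char* hit = std::search(data, data + size, pattern, pattern + 4);
    return static_cast<std::size_t>(hit - data);
}
```

Keep in mind that the same four bytes can also occur inside a payload, so a match is only a candidate start until the length that follows it checks out.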

limit on string size in c++?

I have about a million records, each of about 30 characters, coming in over a socket. Can I read all of it into a single string? Is there a limit on the string size I can allocate?
If so, is there some way I can send data over the socket record by record and receive it record by record? I don't know the size of each record until runtime.
To answer your first question: The maximum size of a C++ string is given by string::max_size
std::string::max_size() will tell you the theoretical limit imposed by the architecture your program is running under. Other than that, as long as you have sufficient RAM and/or disk swap space, you can have std::strings of huge size.
The answer to your second question is yes, you can send record by record. Moreover, you might not be able to send big chunks of data over a socket at once, since there are limits on the size of a single send operation. That the size of a record is not known until runtime is not a problem; it doesn't need to be known at compile time to send it over a socket. How to actually send those strings record by record depends on which socket/networking library you are using; consult the relevant documentation.
There is no official limit on the size of a string. The software will ask your system for memory and, as long as it gets it, it will be able to add characters to your string.
The rest of your question is not clear.
The only practical limit on string size in C++ is your available memory. That being said, it will be expensive to reallocate your string to the right size as you keep receiving data (assuming you do not know its total size in advance). Normally you would read chunks of the data into a fixed-size buffer and decode it into its natural shape (your records) as you get it.
The size of a string is only limited by the amount of memory available to the program; it is more of an operating system limitation than a C++ limitation. C strings are null terminated, so the C string routines will happily process extremely long strings until they find a null, while std::string keeps track of its length itself.
On Win32 the maximum amount of memory available for data is normally around 2 Gigs.
You can read arbitrarily large amounts of data from a socket, but you must have some way of delimiting the data that you're reading. There must be an end-of-record marker or a length associated with the records that you are reading, so use that to parse the records. Do you really want to read the data into a string? What happens if you don't have enough free memory to hold the data in RAM? I suspect there is a more efficient way to handle this data, but I don't know enough about the problem.
In theory, no. But don't go allocating 100GB of memory, because the user will probably not have that much RAM. If you are using std::strings then the max size is std::string::npos.
If we are talking about char*, you are limited to something around 2^32 on 32-bit systems and 2^64 on (surprise) 64-bit ones.
Update: This is wrong. See comments
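How about sending them in a different format, with each record preceded by its length? In pseudocode:
in your server:
send(strlen(szRecordServer));     // length first
send(szRecordServer);             // then the record bytes
in your client:
recv(cbRecord);                   // read the length
alloc(szRecordClient, cbRecord);  // make room for one record
recv(szRecordClient, cbRecord);   // read exactly cbRecord bytes
and repeat this a million times.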
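One caveat with the scheme above is that a single recv() on a stream socket may return less or more than one record, so the receiver usually has to buffer. Here is a minimal sketch of that buffering, assuming a 4-byte big-endian length in front of each record; RecordReader and frame_record are names invented for this example, and no actual socket calls are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Sender side: 4-byte big-endian length followed by the record bytes.
std::string frame_record(const std::string& record) {
    assert(record.size() <= 0xFFFFFFFFu);
    const std::uint32_t len = static_cast<std::uint32_t>(record.size());
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((len >> shift) & 0xFF));
    out += record;
    return out;
}

class RecordReader {
public:
    // Feed whatever recv() returned; chunk boundaries need not match record boundaries.
    void feed(const char* data, std::size_t size) { pending_.append(data, size); }

    // Moves the next complete record into `record`; returns false if more bytes are needed.
    bool next(std::string& record) {
        if (pending_.size() < 4) return false;
        std::uint32_t len = 0;
        for (int i = 0; i < 4; ++i)
            len = (len << 8) | static_cast<unsigned char>(pending_[i]);
        if (pending_.size() - 4 < len) return false;  // record not fully received yet
        record.assign(pending_, 4, len);
        pending_.erase(0, 4 + static_cast<std::size_t>(len));
        return true;
    }

private:
    std::string pending_;  // bytes received but not yet consumed
};
```

The receive loop would call feed() after every recv() and then drain next() until it returns false, handling each record as it appears instead of building one giant string.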