Sending a structure through send - C++

Before anything, I want to say the title is not the question.
My question is: why can't you just send all the bytes of the structure and then cast them back into that structure (given that you have the structure defined on both sides, which you presumably do)?
Thank you!
EDIT: Here's my current structure:
struct COMPUTER_INFO
{
const char* Name;
int Brightness;
int Volume;
};
I was thinking that you can easily calculate the size of all that and then send it through send().

Name is a pointer (contains an address) that only makes sense to your program on your computer. If you sent this structure as bytes the receiving program would receive just the address not the characters that comprise the name. The received address would also not point to a valid location in memory in the receiving computer.
Brightness and Volume are ints. An int does not have a fixed size; traditionally it is the "natural" word size of the computer (the standard does impose some restrictions, such as a minimum range). So sizeof(int) on the sending and receiving computers may be different. There may also be encoding differences, e.g. big vs. little endian. See: https://en.wikipedia.org/wiki/Endianness
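To make the two points above concrete, here is a minimal sketch of sending the structure's contents rather than its raw bytes. The names ComputerInfoMsg, put_u32, get_u32, serialize and deserialize are made up for illustration, and the layout (4-byte big-endian length, name bytes, two 4-byte integers) is just one assumed convention, not a standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical wire-friendly version of COMPUTER_INFO: the name is owned
// by the struct and the integers have a fixed width.
struct ComputerInfoMsg {
    std::string name;
    std::int32_t brightness;
    std::int32_t volume;
};

// Append a 32-bit value in big-endian (network) byte order.
static void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
}

// Read a 32-bit big-endian value starting at p.
static std::uint32_t get_u32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Assumed layout: [name length: 4][name bytes][brightness: 4][volume: 4]
std::vector<unsigned char> serialize(const ComputerInfoMsg& m) {
    assert(m.name.size() <= 0xFFFFFFFFu);  // length must fit the 4-byte prefix
    std::vector<unsigned char> out;
    put_u32(out, static_cast<std::uint32_t>(m.name.size()));
    out.insert(out.end(), m.name.begin(), m.name.end());
    put_u32(out, static_cast<std::uint32_t>(m.brightness));
    put_u32(out, static_cast<std::uint32_t>(m.volume));
    return out;
}

// Returns false if the buffer is too short to hold a complete message.
bool deserialize(const std::vector<unsigned char>& in, ComputerInfoMsg& m) {
    if (in.size() < 4) return false;
    const std::uint32_t len = get_u32(in.data());
    if (in.size() - 4 < len || in.size() - 4 - len < 8) return false;
    m.name.assign(in.begin() + 4, in.begin() + 4 + len);
    m.brightness = static_cast<std::int32_t>(get_u32(in.data() + 4 + len));
    m.volume = static_cast<std::int32_t>(get_u32(in.data() + 8 + len));
    return true;
}
```

The bytes returned by serialize() are what you would hand to send(); the receiver rebuilds its own string from the characters instead of trusting a foreign address.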

In general you are right. You can, of course, send the raw byte stream. But how do you want the receiver to recognize the positions of the members of your structure in this byte stream? In particular, the string that the char* in your example points to is of variable size, so a fixed layout cannot describe it.
I recommend using the Boost serialization package. You can find a detailed tutorial here. The package will take care of "serializing" (packaging your object into a byte stream) and "deserializing" (constructing your object from the byte stream). It works conveniently with nearly all STL containers and can easily be extended for custom types.

Related

Memcpy and Structures

I'm working to copy the following structure to a byte array to send over a named pipe. I've run into a problem since switching from a byte array with a static definition to a vector, which I need because my host name will be of varying lengths.
Here is the outline of my structure:
USHORT version; // Header Version
USHORT type; // IPVersion
USHORT count; // Number of IP addresses of remote system
USHORT length; // Header Length (1)
BYTE SysConfigLocIP[4];
BYTE SysConfigRemoteIP[4];
USHORT lengthHost;
std::vector<BYTE>HostName;
Later, after filling the structure, I copy it to a byte array like so:
BYTE Response[sizeof(aMsg)];
memcpy(Response, &aMsg, sizeof(aMsg));
I find that my vector is holding the correct information for the host when I inspect the container during a debug session. However, after the copy to the Response byte array, I'm finding that the data that has been copied is drastically different. Is this a valid operation? If so, what can I do to correctly copy the data from my vector to the BYTE array? If not, what other strategies can I use to dynamically size the structure to send the hostnames? Thank you for taking the time to read my question; I appreciate any feedback.
I'm working to copy the following structure to a byte array to send
over a named pipe.
A named pipe (or any other form of inter-process or inter-processor communication) does not understand your struct, nor does it understand vector. It just operates on the concept of byte-in-byte-out. It is up to you, the programmer, to assign meaning to those bytes.
As suggested, please read up on serialization. Try starting at http://en.wikipedia.org/wiki/Serialization. If permitted, you can use the Boost solution, http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/index.html, but I would still encourage you to understand the basics first.
As an exercise, first try transferring a vector<int> from sender to receiver. The number of elements in the vector must not be implicitly known by the receiver. Once you achieve that, migrating from int to your struct would be trivial.
That memcpy will only work for POD (plain old data) types. A vector is not POD. Instead, write code to put each byte in the buffer exactly where it needs to be. Don't rely on "magic".
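As a rough sketch of "put each byte in the buffer exactly where it needs to be", the structure from the question could be written out field by field. The struct name Msg and the helpers put_u16 and to_bytes are hypothetical, BYTE and USHORT are redefined here only so the snippet stands alone, and the high-byte-first order for 16-bit fields is an assumption (both ends just have to agree).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for the Windows typedefs used in the question.
using BYTE = unsigned char;
using USHORT = unsigned short;

// Hypothetical name for the struct; the field names are the question's.
struct Msg {
    USHORT version;
    USHORT type;
    USHORT count;
    USHORT length;
    BYTE SysConfigLocIP[4];
    BYTE SysConfigRemoteIP[4];
    USHORT lengthHost;
    std::vector<BYTE> HostName;
};

// Append a 16-bit value, high byte first (an assumed convention).
static void put_u16(std::vector<BYTE>& out, USHORT v) {
    out.push_back(static_cast<BYTE>((v >> 8) & 0xFF));
    out.push_back(static_cast<BYTE>(v & 0xFF));
}

// Build the response field by field. The vector's contents are copied,
// not its internal pointers, which is what memcpy(&aMsg, ...) picked up.
std::vector<BYTE> to_bytes(const Msg& m) {
    assert(m.lengthHost == m.HostName.size());
    std::vector<BYTE> out;
    out.reserve(18 + m.HostName.size());  // 18 bytes of fixed-size fields
    put_u16(out, m.version);
    put_u16(out, m.type);
    put_u16(out, m.count);
    put_u16(out, m.length);
    out.insert(out.end(), m.SysConfigLocIP, m.SysConfigLocIP + 4);
    out.insert(out.end(), m.SysConfigRemoteIP, m.SysConfigRemoteIP + 4);
    put_u16(out, m.lengthHost);
    out.insert(out.end(), m.HostName.begin(), m.HostName.end());
    return out;
}
```

The resulting vector's data() and size() can then be passed to whatever writes to the pipe, and the receiver reads lengthHost to know how many host name bytes follow.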
99% of the time in C++ there is no reason to use memcpy. It breaks classes. Learn about copy constructors and std::copy and use them instead.

Why Serialization when a class object in memory is already binary (C/C++)?

My guess is that the data is scattered in physical memory (even though the data of a class object is sequential in virtual memory), so in order to send the data correctly it needs to be reassembled. To be able to send it over the network, one additional step is the transformation from host byte order to network byte order. Is that correct?
Proper serialization can be used to send data to arbitrary systems, that might not work under the same architecture as the source host.
Even an object that only consists of native types can be troublesome to share between two systems because of the extra padding that might exist between and after members, among other things. Sharing raw memory dumps of objects between programs compiled for the same architecture but with different compiler versions can also turn into a big hassle. There is no guarantee how a variable of type T is actually stored in memory.
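A small sketch of the padding point, using a made-up struct called Sample. The exact numbers depend on the compiler and platform, so the comments only say what is typical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct made only of native types.
struct Sample {
    char tag;             // 1 byte
    std::uint32_t value;  // 4 bytes, usually aligned to a 4-byte boundary
    char flag;            // 1 byte
};

// Sum of the member sizes: what a naive "just send the bytes" estimate assumes.
constexpr std::size_t kPayloadBytes =
    sizeof(char) + sizeof(std::uint32_t) + sizeof(char);

// Where `value` actually starts; often 4 rather than 1.
constexpr std::size_t kValueOffset = offsetof(Sample, value);

// Bytes the compiler inserted between and after members on this platform.
std::size_t padding_bytes() {
    assert(sizeof(Sample) >= kPayloadBytes);
    return sizeof(Sample) - kPayloadBytes;
}
```

If padding_bytes() is non-zero, a raw dump of Sample contains bytes that carry no data and whose position another compiler is free to choose differently.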
If you are not working with pointers (references included), and the data is meant to be read by the same binary as it's dumped from, it's usually safe just to dump a raw struct to disk, but when sending data to another host... drum roll... serialization is the way to go.
I've heard developers talking about ntohl / htonl / htons / ntohs as methods of serializing/deserializing integers, and when you think about it, saying that isn't that far from the truth.
The word "serialization" is often used to describe this "complicated method of storing data in a generic way", but then again; your first programming assignment where you were asked to save information about Dogs to file (hopefully*) made use of serialization, in some way or another.
* "hopefully" meaning that you didn't dump the raw memory representation of your Dog object to disk
Pointers!
If you've allocated memory on the heap you'll just end up with a serialised pointer pointing to an arbitrary area of memory. If you just have a few ints and chars then yes you can just write it out directly to a file, but that then becomes platform dependent because of the byte ordering that you mentioned.
Pointers and data packing (alignment)
If you memcpy your object's memory, there is a danger of copying a wild pointer value instead of its data. There is another risk: if the sender and receiver use different data packing (alignment) rules, you will get rubbish after decoding.
Binary representations may be different between different architectures, compilers and even different versions of the same compiler. There's no guarantee that what system A sees as a signed integer will be seen as the same on system B. Byte ordering, word lengths, struct padding etc. will become hard-to-debug problems if you don't properly define the protocol or file format for exchanging the data.
A class (when we speak of C++) may also include virtual method table pointers, and these must be reconstructed on the receiving end.

Sending variable length arrays over a network?

In the game I'm making, I need to be able to send std::vectors of integers over a network.
A packet seems to be made up entirely of a string. Since enet, the network library I'm using, takes care of endianness, my first idea for solving this is to send a message where the first byte is the message id, as usual, the next 4 bytes are an integer indicating the length of the array, and all subsequent bytes are the ints in the array. On the client side I can then push these back into a vector.
Is this how it is usually done or am I missing something critical? Is there a better way to do it?
Thanks
In general, there are two approaches to solving this problem which can be combined. One is to put the length of the array before the actual array. The other involves including some framing to mark the end (or beginning) of your message. There are advantages and disadvantages to each.
Putting the length before the array is simplest. However, if there should ever be a bug where the length does not match the number of integers, there is no way to detect this or recover from it.
Using framing byte(s) to mark the end of the message has the advantage of more robustness and the ability to recover from an improperly formatted message, at the cost of complexity. The complexity comes from the fact that if your framing bytes appear in your array of integers, you must escape them (i.e. prepend an escape character). Also, your code to read messages from the network becomes a little more complicated.
Of course, this all assumes that you are dealing with a stream and not a datagram. If your messages are clearly packetized already, then the length should be enough.
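The framing-with-escaping approach can be sketched roughly as follows. The marker values 0x7E and 0x7D, the XOR with 0x20, and the function names frame and unframe are choices made for this illustration only; any scheme where the end marker can never appear unescaped inside the payload would do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Assumed framing values, chosen for illustration.
const unsigned char kEnd = 0x7E;  // marks the end of a message
const unsigned char kEsc = 0x7D;  // escape prefix
const unsigned char kXor = 0x20;  // applied to an escaped byte

// Escape any payload byte that collides with a marker, then append kEnd.
Bytes frame(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    Bytes out;
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned char b = data[i];
        if (b == kEnd || b == kEsc) {
            out.push_back(kEsc);
            out.push_back(static_cast<unsigned char>(b ^ kXor));
        } else {
            out.push_back(b);
        }
    }
    out.push_back(kEnd);
    return out;
}

// Extract the first complete message from `stream`.
// `consumed` is set to the number of stream bytes that message used.
bool unframe(const Bytes& stream, Bytes& payload, std::size_t& consumed) {
    payload.clear();
    for (std::size_t i = 0; i < stream.size(); ++i) {
        unsigned char b = stream[i];
        if (b == kEnd) {
            consumed = i + 1;
            return true;
        }
        if (b == kEsc) {
            if (++i == stream.size()) return false;  // escape cut off
            b = static_cast<unsigned char>(stream[i] ^ kXor);
        }
        payload.push_back(b);
    }
    return false;  // no end marker received yet
}
```

After a corrupted message, a reader can skip ahead to the next kEnd and resume, which is the recovery property mentioned above.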
There are two ways to send variable length data on a stream: by prefixing the data with the length, or by suffixing it with a delimiter.
A suffix can have a theoretically infinite size, but it means the delimiter must not appear in the data. This approach can be used for strings, with a NUL ('\0') character as the delimiter.
When dealing with binary data, you don't have any choice but to prefix the length. The size of the data will be limited by the size of the prefix, which is rarely a problem with a 4-byte prefix (because otherwise it means you're sending more than 4 gigabytes of data).
So, it all depends on the data being sent.
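For the length-prefix approach, here is a minimal sketch of the layout described in the question: one byte of message id, a 4-byte count, then the integers. The function names encode and decode are made up, the byte order is written out explicitly as big-endian (an assumption, in case the library does not handle it for you), and decode assumes the whole packet arrives as one unit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Append a 32-bit value in big-endian byte order.
static void put_u32(Bytes& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
}

// Read a 32-bit big-endian value starting at p.
static std::uint32_t get_u32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Layout: [id: 1][count: 4][count * 4 bytes of integers]
Bytes encode(unsigned char id, const std::vector<std::int32_t>& values) {
    assert(values.size() <= 0xFFFFFFFFu);  // count must fit the prefix
    Bytes out;
    out.push_back(id);
    put_u32(out, static_cast<std::uint32_t>(values.size()));
    for (std::int32_t v : values) put_u32(out, static_cast<std::uint32_t>(v));
    return out;
}

// Returns false if the packet is too short or the count does not match
// the number of bytes actually received.
bool decode(const Bytes& in, unsigned char& id, std::vector<std::int32_t>& values) {
    if (in.size() < 5) return false;
    id = in[0];
    const std::uint32_t count = get_u32(&in[1]);
    if ((in.size() - 5) % 4 != 0 || (in.size() - 5) / 4 != count) return false;
    values.clear();
    for (std::size_t i = 0; i < count; ++i)
        values.push_back(static_cast<std::int32_t>(get_u32(&in[5 + 4 * i])));
    return true;
}
```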
Appending that header information to your packet is a good approach to take. Another option you can do on the receive side, if the data is stored in some unsigned char* buffer of memory, is create a structure like so:
typedef struct network_packet
{
char id;
int message_size;
int data[];
} __attribute__((packed)) network_packet;
You can then simply "overlay" this structure on-top of your received buffer like so:
unsigned char* buffer;
//...fill the buffer and make sure it's the right size -- endian is taken care of
//via library
network_packet* packet_ptr = (network_packet*)buffer;
//access the 10th integer in the packet if the packet_ptr->message_size
//is long enough
int tenth_int = 0;
if (packet_ptr->message_size >= 10)
tenth_int = packet_ptr->data[9];
This avoids the expense of copying all the data, which already exists in a buffer, into another std::vector on the receive side.
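If the overlay cast is a concern on your platform (the buffer may not be suitably aligned for int, and __attribute__((packed)) is compiler-specific), a variation is to read individual fields out of the buffer with memcpy. This is a different technique from the overlay above, sketched under the assumption that the layout is [id: 1][message_size: 4][data: 4 bytes each], that message_size counts integers, and that values are already in host byte order. The names load_i32 and read_element are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed header size: 1-byte id followed by a 4-byte element count.
const std::size_t kHeaderBytes = 1 + sizeof(std::int32_t);

// Read one 32-bit integer at a byte offset; memcpy works for unaligned addresses.
static std::int32_t load_i32(const unsigned char* buffer, std::size_t offset) {
    std::int32_t v;
    std::memcpy(&v, buffer + offset, sizeof v);
    return v;
}

// Fetch data[index] if the buffer really holds it; returns false otherwise.
bool read_element(const unsigned char* buffer, std::size_t buffer_len,
                  std::size_t index, std::int32_t& out) {
    assert(buffer != nullptr);
    if (buffer_len < kHeaderBytes) return false;
    const std::int32_t count = load_i32(buffer, 1);
    if (count < 0 || index >= static_cast<std::size_t>(count)) return false;
    const std::size_t offset = kHeaderBytes + index * sizeof(std::int32_t);
    if (offset + sizeof(std::int32_t) > buffer_len) return false;
    out = load_i32(buffer, offset);
    return true;
}
```

This still avoids copying the whole payload into a std::vector; only the element you ask for is copied.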

How to interpret binary data in C++?

I am sending and receiving binary data to/from a device in packets (64 byte). The data has a specific format, parts of which vary with different request / response.
Now I am designing an interpreter for the received data. Simply reading the data by position is OK, but doesn't look that cool when I have a dozen different response formats. I am currently thinking about creating a few structs for that purpose, but I don't know how that will work out with padding.
Maybe there's a better way?
Related:
Safe, efficient way to access unaligned data in a network packet from C
You need to use structs and/or unions. You'll need to make sure your data is properly packed on both sides of the connection, and you may want to translate to and from network byte order on each end if there is any chance that either side of the connection could be running with a different endianness.
As an example:
#pragma pack(push) /* push current alignment to stack */
#pragma pack(1) /* set alignment to 1 byte boundary */
typedef struct {
unsigned int packetID; // identifies packet in one direction
unsigned int data_length;
char receipt_flag; // indicates to ack packet or keep sending packet till acked
char data[]; // this is typically ascii string data w/ \n terminated fields but could also be binary
} tPacketBuffer ;
#pragma pack(pop) /* restore original alignment from stack */
and then when assigning:
packetBuffer.packetID = htonl(123456);
and then when receiving:
packetBuffer.packetID = ntohl(packetBuffer.packetID);
Here are some discussions of Endianness and Alignment and Structure Packing
If you don't pack the structure, it will end up aligned to word boundaries, and the internal layout of the structure and its size will not match the packet format.
I've done this innumerable times before: it's a very common scenario. There's a number of things which I virtually always do.
Don't worry too much about making it the most efficient thing available.
If we do wind up spending a lot of time packing and unpacking packets, then we can always change it to be more efficient. Whilst I've not encountered a case where I've had to as yet, I've not been implementing network routers!
Whilst using structs/unions is the most efficient approach in terms of runtime, it comes with a number of complications: convincing your compiler to pack the structs/unions to match the octet structure of the packets you need, work to avoid alignment and endianness issues, and a lack of safety, since there is little or no opportunity to do sanity checks in debug builds.
I often wind up with an architecture including the following kinds of things:
A packet base class. Any common data fields are accessible (but not modifiable). If the data isn't stored in a packed format, then there's a virtual function which will produce a packed packet.
A number of presentation classes for specific packet types, derived from common packet type. If we're using a packing function, then each presentation class must implement it.
Anything which can be inferred from the specific type of the presentation class (i.e. a packet type id from a common data field), is dealt with as part of initialisation and is otherwise unmodifiable.
Each presentation class can be constructed from an unpacked packet, or will gracefully fail if the packet data is invalid for that type. This can then be wrapped up in a factory for convenience.
If we don't have RTTI available, we can get "poor-man's RTTI" using the packet id to determine which specific presentation class an object really is.
In all of this, it's possible (even if just for debug builds) to verify that each modifiable field is being set to a sane value. Whilst it might seem like a lot of work, it makes it very difficult to end up with an invalidly formatted packet, and a pre-packed packet's contents can easily be checked by eye using a debugger (since it's all in normal platform-native format variables).
If we do have to implement a more efficient storage scheme, that too can be wrapped in this abstraction with little additional performance cost.
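A stripped-down sketch of that kind of architecture might look like the following. Everything in it is hypothetical: the packet ids, the class names PingPacket and SetLevelPacket, the one-byte wire layout and the 0..100 range check are invented purely to show the shape of the design.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical packet ids for illustration.
enum PacketId : unsigned char { kPing = 1, kSetLevel = 2 };

// Base class: the common field (the id) is readable but not modifiable.
class Packet {
public:
    virtual ~Packet() = default;
    PacketId id() const { return id_; }
    virtual Bytes pack() const = 0;  // produce the packed wire format
protected:
    explicit Packet(PacketId id) : id_(id) {}
private:
    const PacketId id_;
};

// Presentation class with no payload.
class PingPacket : public Packet {
public:
    PingPacket() : Packet(kPing) {}
    Bytes pack() const override {
        Bytes out;
        out.push_back(kPing);
        return out;
    }
};

// Presentation class with one field, stored in platform-native form.
class SetLevelPacket : public Packet {
public:
    explicit SetLevelPacket(unsigned level) : Packet(kSetLevel), level_(level) {
        assert(level <= 100);  // sanity check, active in debug builds
    }
    unsigned level() const { return level_; }
    Bytes pack() const override {
        Bytes out;
        out.push_back(kSetLevel);
        out.push_back(static_cast<unsigned char>(level_));
        return out;
    }
private:
    unsigned level_;
};

// Factory: returns nullptr if the bytes are not a valid packet of a known type.
std::unique_ptr<Packet> parse(const Bytes& raw) {
    if (raw.empty()) return nullptr;
    switch (raw[0]) {
    case kPing:
        if (raw.size() != 1) return nullptr;
        return std::unique_ptr<Packet>(new PingPacket());
    case kSetLevel:
        if (raw.size() != 2 || raw[1] > 100) return nullptr;
        return std::unique_ptr<Packet>(new SetLevelPacket(raw[1]));
    default:
        return nullptr;
    }
}
```

Checking id() before a static_cast to the concrete class is the "poor-man's RTTI" mentioned above.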
It's hard to say what the best solution is without knowing the exact format(s) of the data. Have you considered using unions?
I agree with Wuggy. You can also use code generation to do this. Use a simple data-definition file to define all your packet types, then run a Python script over it to generate prototype structures and serialization/deserialization functions for each one.
This is an "out-of-the-box" solution, but I'd suggest to take a look at the Python construct library.
Construct is a python library for parsing and building of data structures (binary or textual). It is based on the concept of defining data structures in a declarative manner, rather than procedural code: more complex constructs are composed of a hierarchy of simpler ones. It's the first library that makes parsing fun, instead of the usual headache it is today.
construct is very robust and powerful, and just reading the tutorial will help you understand the problem better. The author also has plans for auto-generating C code from definitions, so it's definitely worth the effort to read about.

C++ byte stream

For a networked application, the way we have been transmitting dynamic data is by memcpying a struct into a (void*). This poses some problems, for example when this is done to a std::string. Strings can be of dynamic length, so how will the other side know when the string ends? An idea I had was to use something similar to Java's DataOutputStream, where I could just pass whatever variables to it and they could then be put into a (void*). If this can't be done, then that's cool. I just don't really like memcpying a struct. Something about it doesn't seem quite right.
Thanks,
Robbie
There's nothing wrong with memcpy on a struct, as long as the struct is filled with fixed-size buffers. Put a dynamic variable in there and you have to serialise it differently.
If you have a struct with std::strings in there, create a stream operator and use it to format a buffer. You can then memcpy that buffer to the data transport. If you have Boost, use Boost.Serialization, which does all this for you (that link also has links to alternative serialization libs).
Notes: the usual way to pass a variable-size buffer is to begin by sending the length, then that many bytes of data. Occasionally you see data transferred until a delimiter is received (and fields within that data are delimited themselves by another character, eg a comma).
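A minimal sketch of the stream-operator idea, in the spirit of the DataOutputStream the question mentions. The ByteWriter class and the Player struct are invented for this example, and the 4-byte big-endian length prefix for strings is an assumed convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical writer that accumulates values into a byte buffer.
class ByteWriter {
public:
    // Integers are written as 4 bytes, most significant byte first.
    ByteWriter& operator<<(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            buf_.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
        return *this;
    }
    // Strings are written as a 4-byte length followed by the characters,
    // so the other side knows where the string ends.
    ByteWriter& operator<<(const std::string& s) {
        assert(s.size() <= 0xFFFFFFFFu);
        *this << static_cast<std::uint32_t>(s.size());
        buf_.insert(buf_.end(), s.begin(), s.end());
        return *this;
    }
    const std::vector<unsigned char>& bytes() const { return buf_; }
private:
    std::vector<unsigned char> buf_;
};

// Hypothetical struct with a dynamic member.
struct Player {
    std::uint32_t id;
    std::string name;
};

// The stream operator decides, member by member, what goes on the wire.
ByteWriter& operator<<(ByteWriter& w, const Player& p) {
    return w << p.id << p.name;
}
```

Usage would be along the lines of `ByteWriter w; w << player;` followed by handing w.bytes().data() and w.bytes().size() to the transport.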
I see two parts of this question:
- serialization of data over a network
- how to pass structures into a network stack
To serialize data over a network, you'll need a protocol. Doesn't have to be difficult; for ASCII even a cr/lf as packet end may do. If you use a framework (like MFC), it may provide serialization functions for you; in that case you need to worry about how to send this in packets. A packetization which often works well for me is :
<length><data_type>[data....][checksum]
In this case the checksum is optional, and zero-length data is also possible, for instance if the signal is carried in the data_type (e.g. Ack for acknowledgement).
If you're using memcpy with structures, you'll need to consider that memcpy only makes a shallow copy. A pointer is worthless once transmitted over a network; instead you should transmit the data that the pointer refers to (e.g. the contents of the string in your example).
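The <length><data_type>[data....][checksum] packetization could be sketched like this. The field widths (2-byte length counting only the data, 1-byte type, 1-byte checksum), the simple additive checksum, the kAck value and the function names are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Bytes = std::vector<unsigned char>;

const unsigned char kAck = 0x06;  // hypothetical type value; carries no data

// Sum of the first n bytes, modulo 256 (a deliberately simple checksum).
static unsigned char checksum(const Bytes& b, std::size_t n) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += b[i];
    return static_cast<unsigned char>(sum & 0xFF);
}

// Layout: [length: 2, high byte first][type: 1][data: length][checksum: 1]
Bytes make_packet(unsigned char type, const Bytes& data) {
    assert(data.size() <= 0xFFFF);  // must fit the 2-byte length field
    Bytes out;
    out.push_back(static_cast<unsigned char>(data.size() >> 8));
    out.push_back(static_cast<unsigned char>(data.size() & 0xFF));
    out.push_back(type);
    out.insert(out.end(), data.begin(), data.end());
    out.push_back(checksum(out, out.size()));
    return out;
}

// Returns false on a truncated packet, a length mismatch or a bad checksum.
bool parse_packet(const Bytes& in, unsigned char& type, Bytes& data) {
    if (in.size() < 4) return false;
    const std::size_t len = (std::size_t(in[0]) << 8) | in[1];
    if (in.size() != len + 4) return false;
    if (checksum(in, in.size() - 1) != in.back()) return false;
    type = in[2];
    data.assign(in.begin() + 3, in.end() - 1);
    return true;
}
```

An acknowledgement is then just make_packet(kAck, Bytes()), a four-byte packet with zero-length data.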
For sending dynamic data across the network you have the following options.
First option: send the size in the same packet as the payload.
void SendData()
{
    int size;           // number of payload bytes actually used
    char payload[256];
    Send(messageType);
    Send(size);
    Send(payload);
}
Second option: send only the payload and rely on a delimiter.
void SendData()
{
    char payload[256];  // null-terminated string
    Send(messageType);
    Send(payload);
}
In either situation, though, you are faced with a design choice. In the first example you send the message type, then the payload size, and then the payload.
In the second option you send the message type and then the string, delimited by a null terminator.
Neither option fully covers the problem you're facing, I think. Firstly, if you're building a game, you need to determine what type of protocol you will be using: UDP? TCP? The second problem you will face is the maximum packet size. On top of that, you need to have the framework in place so that you can calculate the optimum packet size that will not be fragmented and lost on the Internet. After that, you have bandwidth control with regard to how much data you can transmit and receive between the client and server.
For example the way that most games approach this situation is each packet is identified with the following.
MessageType
MessageSize
CRCCheckSum
MessageID
void buffer[payload]
In a situation where you need to send dynamic data, you would send a series of packets, not just one. For example, if you were to send a file across the network, the best option would be to use TCP/IP, because it is a streaming protocol and it guarantees that the complete stream arrives safely at the other end. On the other hand, UDP is a packet-based protocol and does not do any checking that all packets arrived in order, or at all, at the other end.
So in conclusion.
For dynamic data, send multiple packets, but with a special flag to say that more data is to arrive to complete this message.
Keep it simple, and if you're working with C++, don't assume the packet or data will contain a null terminator; if you decide to use a null terminator, check the size against the payload.
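The "multiple packets with a special flag" idea can be sketched as follows. The Fragment struct and the split and reassemble functions are made up for illustration, and the sketch assumes the fragments are delivered in order and without loss (for example over TCP, or a reliable channel layered on UDP).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical packet: a flag saying whether more pieces follow, plus a chunk.
struct Fragment {
    bool more;
    Bytes chunk;
};

// Split a message into packets holding at most max_chunk bytes each.
// An empty message still produces one (empty) final fragment.
std::vector<Fragment> split(const Bytes& message, std::size_t max_chunk) {
    assert(max_chunk > 0);
    std::vector<Fragment> out;
    std::size_t pos = 0;
    do {
        const std::size_t n = std::min(max_chunk, message.size() - pos);
        Fragment f;
        f.chunk.assign(message.begin() + pos, message.begin() + pos + n);
        pos += n;
        f.more = pos < message.size();  // the special "more data" flag
        out.push_back(f);
    } while (pos < message.size());
    return out;
}

// Feed fragments in order; returns true once the final one has been appended.
bool reassemble(Bytes& message, const Fragment& f) {
    message.insert(message.end(), f.chunk.begin(), f.chunk.end());
    return !f.more;
}
```

Because each fragment carries its own size (the chunk length), the receiver never has to rely on a null terminator to know where the data ends.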