Combine multiple variables to send as a UDP packet - C++

I need to send a packet of data over a UDP connection in C++. The first message I need to send is built up of two 32 bit integers and a 64 bit integer. What is the best way to combine multiple variable types into one block of data ready for sending over a UDP connection?

It depends on the requirements for your network. Do you care about endianness? If you do, you should not use just any serialisation, but one that is safe with regard to endianness.
Generally, each class/struct sendable through the network should have special methods or overloaded operators to stream them in and out. Ultimately you'll have to use macros/functions like hton/ntoh for streaming primitive types, e.g. int, int64, float, double, etc.
Update: also, if your network endpoint applications run on different platforms/compilers, you may have different sizes of int, long, short, etc. So when serialising, you'll have to convert your integers to some predefined types with sizes guaranteed to be the same on all supported platforms.
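As a sketch of this approach, here is one way to pack the exact message from the question (two 32-bit integers and a 64-bit integer) into a byte buffer in network byte order, using fixed-width types. The function names are made up for illustration; the 64-bit value is split into two 32-bit halves because there is no standard htonll:

```cpp
#include <arpa/inet.h>  // htonl, ntohl
#include <cstdint>
#include <cstring>

// Pack two 32-bit ints and one 64-bit int into a 16-byte buffer,
// all in network (big-endian) byte order.
void pack_message(uint32_t a, uint32_t b, uint64_t c, uint8_t buf[16]) {
    uint32_t na  = htonl(a);
    uint32_t nb  = htonl(b);
    uint32_t chi = htonl(static_cast<uint32_t>(c >> 32));         // high half
    uint32_t clo = htonl(static_cast<uint32_t>(c & 0xFFFFFFFFu)); // low half
    std::memcpy(buf + 0,  &na,  4);
    std::memcpy(buf + 4,  &nb,  4);
    std::memcpy(buf + 8,  &chi, 4);
    std::memcpy(buf + 12, &clo, 4);
}

// Reverse operation on the receiving side.
void unpack_message(const uint8_t buf[16], uint32_t& a, uint32_t& b, uint64_t& c) {
    uint32_t na, nb, chi, clo;
    std::memcpy(&na,  buf + 0,  4);
    std::memcpy(&nb,  buf + 4,  4);
    std::memcpy(&chi, buf + 8,  4);
    std::memcpy(&clo, buf + 12, 4);
    a = ntohl(na);
    b = ntohl(nb);
    c = (static_cast<uint64_t>(ntohl(chi)) << 32) | ntohl(clo);
}
```

The resulting 16-byte buffer can be handed directly to sendto(); the receiver runs the inverse and gets the same values regardless of either host's endianness.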

I wrote a DNS resolver by hand in C, and structs is indeed the way I did it. Use bit fields to specify sizes of each piece. More about bit fields: http://msdn.microsoft.com/en-us/library/ewwyfdbe.aspx
Make sure to use hton/ntoh to take care of byte order. More information here: http://www.beej.us/guide/bgnet/output/html/multipage/htonsman.html
In fact, peruse beej's guide -- mucho useful information there!
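A sketch of what the bit-field approach might look like for a DNS-style flags word (field names follow the DNS header in RFC 1035; the layout shown is an assumption, since the ordering of bit fields within a word is itself implementation-defined, which is one reason portable code often uses explicit shifts and masks instead):

```cpp
#include <cstdint>

// DNS-style flags word sketched with bit fields. The declaration order
// shown matches how GCC/Clang lay out bit fields on little-endian
// targets; other compilers may differ, so verify on your platform.
struct DnsFlags {
    uint16_t rcode  : 4;  // response code
    uint16_t z      : 3;  // reserved, must be zero
    uint16_t ra     : 1;  // recursion available
    uint16_t rd     : 1;  // recursion desired
    uint16_t tc     : 1;  // message truncated
    uint16_t aa     : 1;  // authoritative answer
    uint16_t opcode : 4;  // kind of query
    uint16_t qr     : 1;  // query (0) / response (1)
};
```

The sixteen one-bit-to-four-bit fields above total exactly 16 bits, so the struct occupies a single uint16_t on common compilers.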

Related

Endianness influence in C++ code

I know that this might be a silly question, but I am a newbie C++ developer and I need some clarifications about the endianness.
I have to implement a communication interface that relies on SCTP protocol in order to communicate between two different machines (one ARM based, and the other Intel based).
The aim is to:
encode messages into a stream of bytes to be sent on the socket (I used a vector of uint8_t, and positioned each byte of the different fields -taking care of splitting uint16/32/64 to single bytes- following big-endian convention)
send the bytestream via socket to the receiver (using SCTP)
retrieve the stream and parse it in order to fill the message object with the correct elements (represented by header + TV information elements)
I am confused on where I could have problem with the endianness of the underlying architecture of the 2 machines in where the interface will be used.
I think that splitting objects into single bytes and positioning them in big-endian order should prevent the stream from being represented differently on arrival, right? Or am I missing something?
Also, I am in doubt about the role of C++ representation of multiple-byte variables, for example:
uint16_t var=0x0123;
//low byte 0x23
uint8_t low = (uint8_t)var;
//hi byte 0x01
uint8_t hi = (uint8_t)(var >> 8);
Is this piece of code endianness-dependent or not? I.e., if I work on a big-endian machine I suppose that the above code is OK, but if it is little-endian, will I pick up the bytes in a different order?
I've searched already for such questions but no one gave me a clear reply, so I have still doubts on this.
Thank you all in advance guys, have a nice day!
Is this piece of code endianness-dependent or not?
No, the code doesn't depend on the endianness of the target machine. Bitwise operations work the same way as, e.g., mathematical operators do.
They are independent of the internal representation of the numbers.
Though if you're exchanging data over the wire, you need to have a defined byte order known at both sides. Usually that's network byte ordering (i.e. big endian).
The functions of the htonx()/ntohx() family will help you en-/decode the (multibyte) numbers correctly and transparently.
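To make the effect of these functions concrete: after htons, the value's bytes sit in memory in big-endian order on any host, so they can be copied straight into a send buffer. A small helper (the name to_wire is made up for illustration):

```cpp
#include <arpa/inet.h>  // htons
#include <cstdint>
#include <cstring>

// Convert a 16-bit value to network byte order and copy its bytes out.
// out[0] ends up holding the most significant byte on every host,
// whether the host itself is big- or little-endian.
void to_wire(uint16_t host_value, uint8_t out[2]) {
    uint16_t wire = htons(host_value);
    std::memcpy(out, &wire, 2);
}
```

Running this with 0x0123 always yields out[0] == 0x01 and out[1] == 0x23, which is exactly the byte order the shift-based code in the question produces.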
The code you presented is endian-independent, and likely the correct approach for your use case.
What won't work, and is not portable, is code that depends on the memory layout of objects:
// Don't do this!
uint16_t var=0x0123;
auto p = reinterpret_cast<char*>(&var);
uint8_t hi = p[0]; // 0x01 or 0x23 (probably!)
uint8_t lo = p[1]; // 0x23 or 0x01 (probably!)
(I've written probably in the comments to show that these are the likely real-world values, rather than anything specified by Standard C++)

What is the data type of content sent over socket?

When using Berkeley socket api, what is the data type of content that is sent over the read/send or write/recv calls? For example -
char *msg = "Our Message!";
int len, bytes_sent;
len = strlen(msg);
bytes_sent = send(sockfd, msg, len, 0);
in this code, we are using the char type, but are we limited to just char, since send/write/sendto usually take a void * type? I've also seen arguments that if we send some int, it might actually be stored in little-endian/big-endian order, causing problems between source and destination if their endianness doesn't match. Then why doesn't the char type suffer from this problem too?
Also, different languages like C and C++ may have different sizes of char, so why isn't this a problem? If the socket doesn't care about any type and just sees the content as a buffer, why don't we see random corruption of data when different TCP servers/clients are written in different languages and communicate with each other?
In short, what values(type) can I send safely through sockets?
You cannot safely send anything through a raw socket and expect the receiver to make sense of it. For example, the sending process might be on a machine where the character encoding is EBCDIC, and the receiving process might be on a machine where the character encoding was ASCII. It's up to the processes to either negotiate a protocol to sort this out, or to simply say in their specifications "We are using ASCII (or whatever)".
Once you have got the character encodings worked out, my advice is to transmit the data as text. This avoids all endian problems, and is easier to debug and log.
The simplest answer is that the data is an uninterpreted stream of octets, that is to say 8-bit bytes. Any interpretation of it is done by the sender and receiver, and they had better agree. You certainly need to take both the size and endianness of integers into account, and compiler alignment and padding rules too. This is why, for example, you should not use C structs as network protocols.
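To illustrate why raw structs are risky as wire formats, compare a struct's in-memory size with the sum of its fields; padding alone can make them differ from one compiler or packing setting to the next (the struct below is a made-up example):

```cpp
#include <cstdint>

// A struct the compiler is free to pad and align as it sees fit.
struct Message {
    uint8_t  type;   // 1 byte
    uint32_t value;  // 4 bytes, but typically aligned to a 4-byte boundary
};

// sizeof(Message) is usually 8, not 5: most compilers insert 3 padding
// bytes after 'type' so that 'value' is 4-byte aligned. A different
// compiler or a #pragma pack setting may lay it out differently, so
// sending the struct verbatim ties the wire format to one build.
```

Serializing each field explicitly (with fixed widths and a defined byte order) avoids depending on any of this.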

C++ Byte order in socket programming

In C++ we send data over the network using sockets. I am aware that we need to use the htons()/ntohs() functions to maintain byte order (big endian vs. little endian).
Suppose we have the following data to be sent:
int roll;
int id;
char name[100];
This can also be wrapped into struct.
My confusion here is: for roll and id, we can use the htons() function. But for the string name, what should we do, and how? Do we need to use any such function? Will it work on every machine (Mac, Intel, and others) and across the network?
I want to send all three fields in one packet.
You'd use htonl for int, not htons.
The name doesn't need to be reordered, since the bytes of the array correspond directly to bytes on the network.
The issue of byte-order only arises with words larger than a byte, since different architectures choose different ends at which to place the least-significant byte.
For char arrays this conversion is not necessary since they do not have a network byte order but are sequentially transmitted. The reason that ntohs and htons exist, is that some data types consist of lesser and more significant bits, which are interpreted differently on different architectures. This is not the case in strings.
To add to helpful comments here - if your structs get much more complex you could be better off considering a serialization library like Boost.Serialization or Google Protocol Buffers, which handle endianness for you under the covers.
When encoding the string, make sure you send a length (probably a short, handled using htons) before the string itself; don't just send 100 chars every time.
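Putting the advice above together, a sketch of serializing the question's three fields (two ints in network order, then a length-prefixed name rather than a fixed 100 chars; the function name pack_record is made up for illustration):

```cpp
#include <arpa/inet.h>  // htonl, htons
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Serialize roll, id, and a variable-length name into one buffer:
// two 32-bit ints in network byte order, a 16-bit length prefix,
// then the name bytes (no trailing NUL needed on the wire).
std::vector<uint8_t> pack_record(int32_t roll, int32_t id, const std::string& name) {
    std::vector<uint8_t> buf;
    uint32_t nroll = htonl(static_cast<uint32_t>(roll));
    uint32_t nid   = htonl(static_cast<uint32_t>(id));
    uint16_t nlen  = htons(static_cast<uint16_t>(name.size()));
    auto append = [&buf](const void* p, size_t n) {
        const uint8_t* b = static_cast<const uint8_t*>(p);
        buf.insert(buf.end(), b, b + n);
    };
    append(&nroll, 4);
    append(&nid, 4);
    append(&nlen, 2);
    append(name.data(), name.size());
    return buf;
}
```

The whole vector can then be sent as one packet; the char bytes go out as-is, and only the integer fields needed byte-order conversion.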

Can marshalling or packing be implemented by unions?

In beej's guide to networking there is a section of marshalling or packing data for Serialization where he describes various functions for packing and unpacking data (int,float,double ..etc).
It seems easier to use a union (similar ones can be defined for float and double), as defined below, and transmit integer.pack as the packed version of integer.i, rather than using pack and unpack functions.
union _integer {
    char pack[4];
    int i;
} integer;
Can some one shed some light on why union is a bad choice?
Is there any better method of packing data?
Different computers may lay the data out differently. The classic issue is endianness (in your example, whether pack[0] holds the MSB or the LSB). Using a union like this ties the data to the specific representation on the computer that generated it.
If you want to see other ways to marshall data, check out the Boost serialization and Google protobuf.
The union trick is not guaranteed to work, although it usually does. It's perfectly valid (according to the standard) for you to set the char data, and then read 0s when you attempt to read the int, or vice-versa. union was designed to be a memory micro-optimization, not a replacement for casting.
At this point, usually you either wrap up the conversion in a handy object or use reinterpret_cast. Slightly bulky, or ugly... but neither of those are necessarily bad things when you're packing data.
Why not just do a reinterpret_cast to a char* or a memcpy into a char buffer? They're basically the same thing and less confusing.
Your idea would work, so go for it if you want, but I find that clean code is happy code. The easier it is to understand my work, the less likely it is that someone (like my future self) will break it.
Also note that only POD (plain old data) types can be placed in a union, which puts some limitations on the union approach that aren't there in a more intuitive one.
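To make the contrast concrete: packing by shifting, as Beej's guide does, produces the same bytes on every host, because shifts operate on values rather than on memory layout. A minimal sketch (function names are made up for illustration):

```cpp
#include <cstdint>

// Pack a 32-bit integer into bytes by shifting, big-endian on the wire.
// Unlike the union (or memcpy/reinterpret_cast) trick, the result does
// not depend on the host's byte order.
void pack_be32(uint32_t v, uint8_t out[4]) {
    out[0] = static_cast<uint8_t>(v >> 24);  // most significant byte first
    out[1] = static_cast<uint8_t>(v >> 16);
    out[2] = static_cast<uint8_t>(v >> 8);
    out[3] = static_cast<uint8_t>(v);
}

uint32_t unpack_be32(const uint8_t in[4]) {
    return (static_cast<uint32_t>(in[0]) << 24) |
           (static_cast<uint32_t>(in[1]) << 16) |
           (static_cast<uint32_t>(in[2]) << 8)  |
            static_cast<uint32_t>(in[3]);
}
```

This is essentially what the pack/unpack functions in Beej's guide do, and it sidesteps both the endianness and the type-punning concerns raised above.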

How to use protocol buffers?

Could someone please help and tell me how to use protocol buffers? Actually, I want to exchange data through sockets between a program running on Unix and another running on Windows, in order to run simulation studies.
The programs that use sockets to exchange data are written in C/C++, and I would be glad if someone could help me use protocol buffers in order to exchange data in the form of:
struct snd_data {
    char *var = "temp";
    int var1 = 1;
    float var2;
    double var2;
};
I tried several ways, but the data is still not exchanged correctly. Any help would be much appreciated.
Thanks for your help,
You start by defining your message in a .proto file:
package foo;

message snd_data {
    required string var = 1;
    required int32 var1 = 2;
    optional float var2 = 3;
    optional double var3 = 4;
}
(I guess the float and double actually are different variables...)
Then you compile it using protoc and then you have code implementing your buffer.
For further information see: http://code.google.com/apis/protocolbuffers/docs/cpptutorial.html
How are you writing your messages to the socket? Protobuf is not endian-sensitive itself, but neither does it define a transport mechanism: protobuf defines a mapping between a message and its serialized form (which is a sequence of 8-bit bytes), and it is your responsibility to transfer this sequence of bytes to the remote host.
In our case, we define a very simple transport protocol: first we write the message size as a 32-bit integer (big endian), then comes the message itself. (Also remember that protobuf messages are not self-identifying, which means that you need to know which message you are sending. This is typically managed by having a wrapper message containing optional fields for all messages you want to send. See the protobuf website and mailing list archives for more info about this technique.)
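The framing described above can be sketched independently of protobuf; here the payload is treated as an opaque byte string, which is exactly what a protobuf message's serialized form is (the function names frame/unframe are made up for illustration):

```cpp
#include <arpa/inet.h>  // htonl, ntohl
#include <cstdint>
#include <cstring>
#include <string>

// Prefix an opaque payload (e.g. a serialized protobuf message) with
// its size as a 32-bit big-endian integer.
std::string frame(const std::string& payload) {
    uint32_t nlen = htonl(static_cast<uint32_t>(payload.size()));
    std::string out(reinterpret_cast<const char*>(&nlen), 4);
    out += payload;
    return out;
}

// Inverse: given a buffer that starts with one frame, return the payload.
// A real receiver would first read exactly 4 bytes, decode the length,
// then read exactly that many more bytes from the socket.
std::string unframe(const std::string& buf) {
    uint32_t nlen;
    std::memcpy(&nlen, buf.data(), 4);
    return buf.substr(4, ntohl(nlen));
}
```

On the sending side you would pass your message's SerializeToString() result to frame() and write the whole thing; the receiver reads the length first, so it always knows how many bytes belong to the message.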
Endianness is handled within protobuf.
See:
https://groups.google.com/forum/?fromgroups#!topic/protobuf/XbzBwCj4yL8
How cross-platform is Google's Protocol Buffer's handling of floating-point types in practice?
Are both machines x86? Otherwise you need to watch for big-endian and little-endian differences. It's also worth paying attention to struct packing. Passing pointers can be problematic too, due to the fact that pointers are different sizes on different platforms. All in all, there is far too little information in your post to say for certain what is going wrong...
The answer lies in the endianness of the data being transmitted; this is something you need to consider very carefully and check. Endianness mismatches can cause data to get messed up on both the receiver and the sender. There is no guarantee that data sent from a Unix box will arrive on the Windows box with the same in-memory layout. The padding of the structure on the Unix box may also differ from the padding on the Windows box; it boils down to how the compiler's command-line switches are used. Think structure alignment.