Best way to implement binary serial protocol in C++ [closed]

I'm implementing a binary serial protocol in C++ with the following message format:
| sync word (0xAA) | sync word (0xBB) | message length (2 bytes) | device id (4 bytes) | message type (1 byte) | timestamp (4 bytes) | payload (N bytes) | crc (2 bytes) |
For the binary handling, I can only come up with a C-style approach: for example, using memcpy to extract each field at a pre-defined offset:
void Parse(const std::string& data) {
    bool found_msg_head = false;
    size_t i = 0;
    // Compare as unsigned char: a plain char may be signed, in which case
    // data[i] == 0xAA would never match.
    for (; i + 1 < data.size(); i++) {
        if ((unsigned char)data[i] == 0xAA && (unsigned char)data[i + 1] == 0xBB) {
            found_msg_head = true;
            break;
        }
    }
    if (found_msg_head) {
        uint16_t msg_length;
        uint32_t device_id;
        // declare fields ...
        memcpy(&msg_length, data.c_str() + i + 2, 2);
        memcpy(&device_id, data.c_str() + i + 4, 4);
        // memcpy to each field ...
        // memcpy crc and validate...
    }
}
void SendMsg(const MyMsg& msg)
{
    const uint16_t msg_len = sizeof(msg) + 16; // 14 bytes of header + payload + 2-byte crc
    const uint32_t dev_id = 0x01;
    const uint32_t timestamp = (uint32_t)GetCurrentTimestamp();
    const uint8_t msg_type = 0xAB;
    char buf[msg_len];
    buf[0] = 0xAA;
    buf[1] = 0xBB;
    memcpy(buf + 2, &msg_len, sizeof(msg_len));
    memcpy(buf + 4, &dev_id, sizeof(dev_id));
    memcpy(buf + 8, &timestamp, sizeof(timestamp));
    memcpy(buf + 12, &msg_type, sizeof(msg_type));
    memcpy(buf + 13, &g_msg_num, sizeof(g_msg_num));
    memcpy(buf + 14, &msg, sizeof(msg));
    uint16_t crc = CalculateCrc16((uint8_t*)buf, sizeof(msg) + 14); // everything before the crc
    memcpy(buf + sizeof(buf) - 2, &crc, 2);
    std::string str(buf, sizeof(buf));
    // send str to serial port or other end points...
}
Is there a better way to implement this kind of protocol in C or C++?
If I want to keep my C++ code in C++ style (i.e. use the C++ standard library only, no memcpy, no taking variable addresses with &), what is the C++-only equivalent of the mixed C/C++ code above?
Thanks.

Binary communication protocols are notorious for requiring lots of error-prone boilerplate code. I strongly recommend looking into existing libraries and/or code-generation solutions; the CommsChampion Ecosystem could be a good fit.

I recommend documenting your protocol as well as possible.
The X11 protocol specification could be inspirational.
Did you consider re-using some existing communication protocol, e.g. HTTP? Do you care about communication between heterogeneous systems and machines of different endianness and word sizes (e.g. a Raspberry Pi communicating with an Arduino, or with a Linux/PowerPC or a Linux/x86-64 server)?
You could use C++ code generators like SWIG, or C code generators like RPCGEN, or in some cases write your own C++ code generator (perhaps with the help of GNU m4, GPP, or GNU autoconf).
You may use (or adapt) existing C++ frameworks like VMIME (which implements mail-related protocols).
You could use existing Web service protocols (HTTP-like) with libraries like libonion or Wt.
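If you do hand-roll the framing, one way to stay in C++ style and remain endianness-safe is to serialize each field byte by byte with shifts into a std::vector<uint8_t>, instead of memcpy-ing native representations. A minimal sketch following the question's field order (the little-endian wire order is an assumption, and CalculateCrc16 stands in for the CRC routine from the question):

#include <cstddef>
#include <cstdint>
#include <vector>

uint16_t CalculateCrc16(const uint8_t* data, size_t len); // assumed, as in the question

// Append an unsigned integer in little-endian byte order (assumed wire order).
template <typename T>
void PutLE(std::vector<uint8_t>& out, T value) {
    for (size_t i = 0; i < sizeof(T); ++i)
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
}

std::vector<uint8_t> BuildFrame(uint32_t device_id, uint8_t msg_type,
                                uint32_t timestamp,
                                const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> frame{0xAA, 0xBB};                        // sync words
    PutLE<uint16_t>(frame, static_cast<uint16_t>(payload.size())); // message length
    PutLE<uint32_t>(frame, device_id);                             // device id
    frame.push_back(msg_type);                                     // message type
    PutLE<uint32_t>(frame, timestamp);                             // timestamp
    frame.insert(frame.end(), payload.begin(), payload.end());     // payload
    PutLE<uint16_t>(frame, CalculateCrc16(frame.data(), frame.size())); // crc
    return frame;
}

Decoding is the mirror image: read bytes and reassemble values with shifts, which sidesteps both memcpy and any dependence on the host's endianness.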

Related

How does .Byte[] function on a specific byte? [closed]

I am working on the following lines of code:
#define WDOG_STATUS 0x0440
#define ESM_OP 0x08
and in a method of my defined class I have:
bool WatchDog = 0;
bool Operational = 0;
unsigned char i;
ULONG TempLong;
unsigned char Status;
TempLong.Long = SPIReadRegisterIndirect (WDOG_STATUS, 1); // read watchdog status
if ((TempLong.Byte[0] & 0x01) == 0x01)
    WatchDog = 0;
else
    WatchDog = 1;
TempLong.Long = SPIReadRegisterIndirect (AL_STATUS, 1);
Status = TempLong.Byte[0] & 0x0F;
if (Status == ESM_OP)
    Operational = 1;
else
    Operational = 0;
What SPIReadRegisterIndirect() does is take an unsigned short as the address of the register to read and an unsigned char Len as the number of bytes to read.
What is baffling me is Byte[]. I assume it is a way to pick out part of the value in Long (which is read from SPIReadRegisterIndirect), but why [0]? Shouldn't it be 1? And how does it work? If it is isolating one byte, for example in the WatchDog case, is TempLong.Byte[0] equal to 04 or 40? (When I print the value before the if statement, it shows as 0, which is neither 04 nor 40 from the WDOG_STATUS register defined above.)
Please consider that I am new to this subject. I have done a Google search and other searches, but unfortunately I could not find what I wanted. Could somebody please help me understand how this syntax works, or direct me to documentation where I can read about it?
Thank you in advance.
Thank you in advance.
Your ULONG must be defined somewhere; otherwise you'd get the error 'ULONG' does not name a type.
Probably something like:
typedef union { unsigned long Long; byte Byte[4]; } ULONG;
Check union (and typedef) in your C/C++ book, and you'll see that this union helps reinterpret the long variable as an array of bytes.
Byte[0] is the first byte. Depending on the hardware (AVR controllers are little-endian), it's probably the LSB (least significant byte).
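For illustration, a minimal sketch of how such a union behaves (assuming a 4-byte unsigned long, as on AVR; note that reading a union member other than the one last written is technically undefined in standard C++, though it is well-defined in C and widely supported by embedded compilers):

#include <cstdio>

typedef unsigned char byte;
typedef union { unsigned long Long; byte Byte[4]; } ULONG;

int main() {
    ULONG u;
    u.Long = 0x12345678UL;
    // On a little-endian target, Byte[0] is the least significant byte (0x78).
    for (int i = 0; i < 4; i++)
        std::printf("Byte[%d] = 0x%02X\n", i, u.Byte[i]);
}

So in the watchdog code, TempLong.Byte[0] is the low byte of the register value that SPIReadRegisterIndirect returned, not a byte of the 0x0440 register address.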

Is it necessary to cast individual indices of a pointer after you cast the whole pointer in c? Why? [closed]

In the code below, the address of ip is cast to uint8_t *.
But then each element accessed through the cast pointer is cast again to uint8_t.
Why did the programmer do this? Does it make a difference if we remove all the casts that come after the initial one? This code converts an IPv4 address to an IP number. Thank you.
uint32_t Dot2LongIP(char* ipstring)
{
    uint32_t ip = inet_addr(ipstring);
    uint8_t *ptr = (uint8_t *) &ip;
    uint32_t a = 0;
    if (ipstring != NULL) {
        a = (uint8_t)(ptr[3]);
        a += (uint8_t)(ptr[2]) * 256;
        a += (uint8_t)(ptr[1]) * 256 * 256;
        a += (uint8_t)(ptr[0]) * 256 * 256 * 256;
    }
    return a;
}
Why the programmer has done this?
Ignorance, fear, or other incompetence.
The type of ptr is uint8_t *, so the type of ptr[i] is uint8_t. Converting a uint8_t to a uint8_t has no effect. Also, putting it in parentheses has no effect.
Does it make a difference if we remove all those casts that come after the initial cast?
Yes, it makes the code smaller and clearer. It has no effect on the program semantics.
This code converts an IPv4 IP Address to an IP Number.
No, it does not, not correctly; the code is broken.
When the uint8_t value is used in multiplication with 256, the usual arithmetic conversions are applied. These promote the uint8_t to int, and then the result of the * operator is an int. For ptr[0], as two more multiplications by 256 are performed, the result remains an int. Unfortunately, if the high bit (bit 7) of ptr[0] is set, these multiplications overflow a 32-bit int. Then the behavior of the program is not defined by the C standard.
To avoid this, the value should have been cast to uint32_t. (This speaks only to getting the arithmetic correct; I make no assertion about the usefulness of taking apart an in_addr_t returned by inet_addr and reassembling it in this way.)
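For reference, a minimal corrected sketch of that arithmetic (same byte order as the original, with uint32_t casts so no multiplication is done in plain int; the NULL check is also moved before the pointer is used):

#include <stdint.h>
#include <arpa/inet.h>

uint32_t Dot2LongIP(char* ipstring)
{
    if (ipstring == NULL) return 0;
    uint32_t ip = inet_addr(ipstring);
    uint8_t *ptr = (uint8_t *) &ip;
    // Widen each byte to uint32_t before multiplying, so the arithmetic
    // happens in unsigned 32-bit and cannot overflow a signed int.
    uint32_t a = (uint32_t)ptr[3];
    a += (uint32_t)ptr[2] * 256;
    a += (uint32_t)ptr[1] * 256 * 256;
    a += (uint32_t)ptr[0] * 256 * 256 * 256;
    return a;
}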
I suspect that is Arduino code :)
uint32_t Dot2LongIP(char* ipstring)
{
    uint32_t ip = inet_addr(ipstring);
    return ntohl(ip);
}
or, if you do not want to use ntohl:
uint32_t Dot2LongIP(char* ipstring)
{
    uint32_t ip = inet_addr(ipstring);
    ip = ((ip & 0xff000000) >> 24) | ((ip & 0x00ff0000) >> 8) |
         ((ip & 0x0000ff00) << 8) | ((ip & 0x000000ff) << 24);
    return ip;
}
Dereferencing any index of ptr (i.e., ptr[0], ptr[1], etc.) already yields a uint8_t, so the casts performed on them are redundant.

Writing a program for a computer that uses little or big endian, and having the same result [duplicate]

This question already has answers here: Detecting endianness programmatically in a C++ program (29 answers)
This question is about endianness.
The goal is to write 2 bytes to a file for a game. I want to make sure that people with different computers get the same result, whether their machines are little- or big-endian.
Which of these snippets do I use?
char a[2] = { 0x5c, 0x7B };
fout.write(a, 2);
or
int a = 0x7B5C;
fout.write((char*)&a, 2);
Thanks a bunch.
From Wikipedia:
In its most common usage, endianness indicates the ordering of bytes within a multi-byte number.
So for char a[2] = { 0x5c, 0x7B };, a[1] will always be 0x7B.
However, for int a = 0x7B5C;, with char* oneByte = (char*)&a;, oneByte[0] may be 0x7B or 0x5C depending on the platform. As you can see, you have to play with casts and byte pointers (and bear in mind that this kind of type punning can run into undefined behaviour; it is shown here only for explanation purposes).
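For illustration, a minimal check along those lines (inspecting an object through an unsigned char pointer is permitted; the output depends on the machine it runs on):

#include <cstdio>

int main() {
    int a = 0x7B5C;
    // The first byte in memory reveals the native byte order.
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&a);
    if (bytes[0] == 0x5C)
        std::printf("little-endian\n");
    else
        std::printf("big-endian\n");
}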
One way that is used quite often is to write a 'signature' or 'magic' number as the first data in the file - typically a 16-bit integer whose value, when read back, will depend on whether or not the reading platform has the same endianness as the writing platform. If you then detect a mismatch, all data (of more than one byte) read from the file will need to be byte swapped.
Here's some outline code:
void ByteSwap(void *buffer, size_t length)
{
    unsigned char *p = static_cast<unsigned char *>(buffer);
    for (size_t i = 0; i < length / 2; ++i) {
        unsigned char tmp = *(p + i);
        *(p + i) = *(p + length - i - 1);
        *(p + length - i - 1) = tmp;
    }
    return;
}

bool WriteData(void *data, size_t size, size_t num, FILE *file)
{
    uint16_t magic = 0xAB12; // Something that can be tested for byte-reversal
    if (fwrite(&magic, sizeof(uint16_t), 1, file) != 1) return false;
    if (fwrite(data, size, num, file) != num) return false;
    return true;
}

bool ReadData(void *data, size_t size, size_t num, FILE *file)
{
    uint16_t test_magic;
    bool is_reversed;
    if (fread(&test_magic, sizeof(uint16_t), 1, file) != 1) return false;
    if (test_magic == 0xAB12) is_reversed = false;
    else if (test_magic == 0x12AB) is_reversed = true;
    else return false; // Error - needs handling!
    if (fread(data, size, num, file) != num) return false;
    if (is_reversed && (size > 1)) {
        for (size_t i = 0; i < num; ++i) ByteSwap(static_cast<char *>(data) + (i * size), size);
    }
    return true;
}
Of course, in the real world, you wouldn't need to write/read the 'magic' number for every input/output operation - just once per file, and store the is_reversed flag for future use when reading data back.
Also, with proper use of C++, you would probably be using std::istream/std::ostream arguments rather than the FILE* I have shown - but the sample I have posted has been extracted (with only very little modification) from code that I actually use in my projects (to do just this test). Conversion to more modern C++ should be straightforward.
Feel free to ask for further clarification and/or explanation.
NOTE: The ByteSwap function I have provided is not ideal! It may run into alignment or aliasing trouble on some platforms if used carelessly, and it is not the most efficient method for small data units (like int variables). One could (and should) provide one's own byte-reversal function(s) to handle specific types of variables (a good case for overloading the function with different argument types).
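For instance, a minimal sketch of such a type-specific overload (my illustration, not the author's code):

#include <cstdint>

// Byte-reverse a 32-bit value directly, without going through a void* buffer.
uint32_t ByteSwap(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}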
Which of these snippet do I use?
The first one. It has same output regardless of native endianness.
But you'll find that if you need to interpret those bytes as some integer value, that is not so straightforward. char a[2] = { 0x5c, 0x7B } can represent either 0x5c7B (big endian) or 0x7B5c (little endian). So, which one did you intend?
The solution for cross platform interpretation of integers is to decide on particular byte order for the reading and writing. De-facto "standard" for cross platform data is to use big endian.
To write a number in big endian, start by bit-shifting the input value right so that the most significant byte is in the place of the least significant byte. Mask all other bytes (technically redundant in the first iteration, but we'll loop back shortly). Write this byte to the output. Repeat for the remaining bytes in order of significance.
This algorithm produces same output regardless of the native endianness - it will even work on exotic "middle" endian systems if you ever encounter one. Writing to little endian is similar, but in reverse order.
To read a big endian value, read the first byte of input, shift it left so that it goes to the place of most significant byte. Combine the shifted byte with the result (initially zero) using bitwise-or. Repeat with the next byte by shifting to the second most significant place and so on.
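A minimal sketch of both directions for 16-bit values (matching the two bytes in the question; wider types extend the same loop):

#include <cstdint>
#include <istream>
#include <ostream>

// Write a 16-bit value in big-endian order, regardless of native endianness.
void write_u16_be(std::ostream& out, uint16_t value) {
    char bytes[2] = {
        static_cast<char>((value >> 8) & 0xFF), // most significant byte first
        static_cast<char>(value & 0xFF)
    };
    out.write(bytes, 2);
}

// Read a 16-bit big-endian value back, on any host.
uint16_t read_u16_be(std::istream& in) {
    unsigned char bytes[2];
    in.read(reinterpret_cast<char*>(bytes), 2);
    return static_cast<uint16_t>((bytes[0] << 8) | bytes[1]);
}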
To know the endianness of a computer?
To know endianness of a system, you can use std::endian in the upcoming C++20. Prior to that, you can use implementation specific macros from endian.h header. Or you can do a simple calculation like you suggest.
But you never really need to know the endianness of a system. You can simply use the algorithms that I described, which work on systems of all endianness without having to know what that endianness is.

Computing an 8-bit checksum in C++ [closed]

I have the following setup where I am trying to write a custom file header. The fields are described as follows:
// Assume these variables are initialized.
unsigned short x, y, z;
unsigned int dl;
unsigned int offset;
// End assume
u_int8_t major_version = 0x31;
u_int8_t minor_version = 0x30;
u_int32_t data_offset = (u_int32_t)offset;
u_int16_t size_x = (u_int16_t)x;
u_int16_t size_y = (u_int16_t)y;
u_int16_t size_z = (u_int16_t)z;
u_int32_t data_size = (u_int32_t)dl;
Now, what I would like to do is compute an 8-bit header checksum over the fields from major_version to data_size. I am fairly new to this, and something simple would suffice for my current needs.
I'm not sure what you are trying to achieve, but to strictly answer your question, an 8-bit checksum is usually computed as
sum_of_all_elements % 255
Simply put, add all the elements together, and sum % 255 is your checksum.
Watch out for overflow when doing the addition (you can compute partial sums if you run into trouble).
On a side note, an 8-bit checksum is not that good - it won't help you distinguish all cases, and it won't help with data recovery; that's why a 16-bit checksum is usually preferred.
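For illustration, a minimal sketch of that computation (my example; it assumes the header fields have already been serialized into a byte buffer in the order they appear in the header):

#include <cstddef>
#include <cstdint>

// Sum the bytes of a buffer modulo 255 - a simple 8-bit checksum.
uint8_t Checksum8(const uint8_t* data, size_t len)
{
    uint32_t sum = 0; // wide accumulator, so the additions cannot overflow
    for (size_t i = 0; i < len; ++i)
        sum += data[i];
    return static_cast<uint8_t>(sum % 255);
}

You would run this over the bytes of major_version through data_size and store the result in the header's checksum field.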

Appropriate hashing function to hash random binary strings

I have an array char data1[length], where length is a multiple of 8, i.e. length can be 8, 16, 24, ... The array contains binary data read from a file that is open in binary mode. I will keep reading from the file, and every time I read I will store the value in a hash table. The binary data is randomly distributed. I would like to hash each array and store it in the hash table, in order to be able to look up the entry with that specific data again. What would be a good hashing function to achieve this task? Thanks.
Please note that I am writing this in C++ and C, so a solution in either language would be great.
If the data that you read is 8 bytes long and really randomly distributed, and your hash code needs to be 32 bits, what about this:
uint32_t get_uint32_le(const unsigned char *data) {
    uint32_t value = 0;
    // Widen before shifting: data[3] << 24 on a promoted int could overflow.
    value |= (uint32_t)data[0] << 0;
    value |= (uint32_t)data[1] << 8;
    value |= (uint32_t)data[2] << 16;
    value |= (uint32_t)data[3] << 24;
    return value;
}
uint32_t hashcode(const unsigned char *data) {
    uint32_t hash = 0;
    hash ^= get_uint32_le(data + 0);
    hash ^= get_uint32_le(data + 4);
    return hash;
}
If you need more speed, this code can probably be made a lot faster if you can guarantee that data is always properly aligned to be interpreted as a const uint32_t *.
I have successfully used MurmurHash3 in one of my projects.
Pros:
It is fast. Very fast.
It supposedly has a low collision rate.
Cons:
It's not suitable for cryptography applications.
It's not standardized in any shape or form.
It's not portable to non-x86 platforms. However, it's small enough that you should be able to port it if you really need to - I was able to port it to Java, although that's not nearly the same thing.
It's a good possibility for use in e.g. a fast hash-table implementation...
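For illustration, a sketch of plugging a hash like the first answer's into std::unordered_map (the FixedKey wrapper and the mapped std::string value are my assumptions for the example):

#include <array>
#include <cstdint>
#include <string>
#include <unordered_map>

uint32_t hashcode(const unsigned char *data); // from the first answer above

using FixedKey = std::array<unsigned char, 8>; // one 8-byte record from the file

struct FixedKeyHash {
    size_t operator()(const FixedKey& k) const {
        return hashcode(k.data());
    }
};

// Maps each 8-byte binary record to an associated value;
// std::array provides the equality comparison the table needs.
std::unordered_map<FixedKey, std::string, FixedKeyHash> table;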