Managing buffers of binary data in C++


I'm trying to create a simple binary format to transmit over a Bluetooth LE module on Arduino. I want to describe the properties of a list of objects, and for starters I'm trying to transmit just a single property.
The format I'm attempting to encode and pass around is as follows.
namePropertyID, nameLength, nameString...
So given a name of "Bob"
0x01 0x03 0x42 0x6F 0x62
nameID 3 chars "B" "o" "b"
But when I pass the buffer around, it seems to mutate.
Before I pass it, it reads:
0x01 0x03 0x42 0x6F 0x62
But after I pass it, it reads:
0x00 0x3C 0x18 0x04 0x00
Program.h
#include <Arduino.h> // uint8_t, strlen(), Serial
typedef enum {
InfoTypeName = 0x01
} InfoType;
class Program {
public:
char *name;
uint8_t * data();
uint8_t dataLen();
};
Program.cpp
#include "Program.h"
uint8_t* Program::data() {
uint8_t nameLength = strlen(name);
uint8_t buff[dataLen()];
buff[0] = InfoTypeName;
buff[1] = nameLength;
for (uint8_t i = 0; i < nameLength; i++) {
buff[i+2] = (uint8_t)name[i];
}
// First check of data, things look ok.
for (uint8_t i = 0; i < nameLength+2; i++) {
Serial.print(F(" 0x")); Serial.print(buff[i], HEX);
}
Serial.println();
return buff;
}
uint8_t Program::dataLen() {
return strlen(name) + 2;
}
Elsewhere I pass this to the Bluetooth library:
BTLEserial.write(program.data(), program.dataLen());
Which is implemented like so, and is printing out seemingly incorrect data:
size_t Adafruit_BLE_UART::write(uint8_t * buffer, uint8_t len)
{
Serial.print(F("\tWriting out to BTLE:"));
for (uint8_t i=0; i<len; i++) {
Serial.print(F(" 0x")); Serial.print(buffer[i], HEX);
}
Serial.println();
// actually sends the data over bluetooth here...
}
So a few questions:
Why is the data mutating?
Is this a good approach for generating buffers?
Is having two separate methods, one for length and one for data, a good pattern?

The problem is that in Program::data(), buff is a local variable. You are returning a pointer to its first element, which is a dangling pointer at the call site: the array's lifetime ends when data() returns, so its stack memory is free to be overwritten by later calls such as write(). You need to ensure the buffer you export stays alive long enough. There are different ways of doing this, but I am not entirely familiar with the limitations Arduino places on which parts of the C and C++ standard libraries you can use.
The simplest approach could be to reserve a buffer in main, and pass that around to the code that populates it and consumes it. Alternatively, you could give your Program class a buffer data member. The main problem is going to be ensuring that the buffer is large enough for different messages.
I would first try something like this:
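void create_msg_(const char* name, uint8_t* buff, size_t size)
{
    // populate buff with the message: type byte, length byte, then the name
    buff[0] = InfoTypeName;
    buff[1] = size - 2;
    memcpy(buff + 2, name, size - 2);
}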
void send_msg(const char* name)
{
size_t size = strlen(name) + 2;
uint8_t buff[size]; // VLA extension, not std C++
create_msg_(name, buff, size);
BTLEserial.write(buff, size);
}
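The other option mentioned above, giving Program a buffer data member, might look like the sketch below. It assumes a fixed upper bound on the message size; kMaxMessage = 32 and the method name encode() are hypothetical choices for illustration, not part of the original code. Because the buffer lives inside the object, the returned pointer stays valid for as long as the Program instance does (until the next encode() call).

```cpp
#include <stdint.h>
#include <string.h>

// Hypothetical capacity: 2 header bytes plus the longest name you expect.
constexpr uint8_t kMaxMessage = 32;

enum InfoType : uint8_t { InfoTypeName = 0x01 };  // same value as in the question

class Program {
public:
    const char* name = nullptr;  // must be set before calling encode()

    // Encodes [InfoTypeName, nameLength, name bytes...] into buff_, which is a
    // member, so the returned pointer stays valid while this Program exists
    // (until the next encode() call). Returns nullptr if the name is too long.
    uint8_t* encode() {
        size_t nameLength = strlen(name);
        if (nameLength + 2 > kMaxMessage) {
            len_ = 0;
            return nullptr;
        }
        buff_[0] = InfoTypeName;
        buff_[1] = static_cast<uint8_t>(nameLength);
        memcpy(buff_ + 2, name, nameLength);
        len_ = static_cast<uint8_t>(nameLength + 2);
        return buff_;
    }

    // Number of valid bytes produced by the last encode() call.
    uint8_t dataLen() const { return len_; }

private:
    uint8_t buff_[kMaxMessage];
    uint8_t len_ = 0;
};
```

On the caller's side this would be `uint8_t* msg = program.encode(); if (msg) BTLEserial.write(msg, program.dataLen());`. Computing the length inside encode() means strlen() runs once, and the length you send always matches the bytes just written, which is one way to address the question about separate length and data methods.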

Related

Subsetting char array without copying it in C++

I have a long array of char (coming from a raster file via GDAL), all composed of 0 and 1. To compact the data, I want to convert it to an array of bits (thus dividing the size by 8), 32 characters (4 output bytes) at a time, writing the result to a different file. This is what I have come up with so far:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
uint32_t bytes2bits(char b[33]) {
    b[32] = 0;
    return strtoul(b,0,2);
}
int main() {
    const char data[36] = "00000000000000000000000010000000101"; // 101 is to be ignored
    char word[33];
    strncpy(word,data,32);
    uint32_t byte = bytes2bits(word);
    printf("Data: %lu\n", (unsigned long)byte); // 128
}
The code is working, and the result is going to be written in a separate file. What I'd like to know is: can I do that without copying the characters to a new array?
EDIT: I'm using a const variable here just to make a minimal, reproducible example. In my program it's a char *, which is continually changing value inside a loop.

Yes, you can, as long as you can modify the source string (in your example code you can't because it is a constant, but I assume in reality you have the string in writable memory):
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
uint32_t bytes2bits(const char* b) {
    return strtoul(b,0,2);
}
void compress (char* data) {
    // You would need to make sure that the `data` argument always has
    // at least 33 characters (the null terminator at the end
    // of the original string counts)
    char temp = data[32];
    data[32] = 0;
    uint32_t byte = bytes2bits(data);
    data[32] = temp;
    printf("Data: %lu\n", (unsigned long)byte); // 128
}

In this example, by using a char* as the buffer that stores the long data, it is not necessary to copy each part into a temporary buffer to convert it to a number.
Just use a variable to step through the buffer one 32-character period at a time. After each 32nd character there needs to be a 0 terminator byte, which you place temporarily and then restore.
So your code would look like:
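#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
uint32_t bytes2bits(const char* b) {
    return strtoul(b,0,2);
}
void compress (char* data) {
    int dataLen = strlen(data);
    int periodLen = 32;
    char* periodStr = data;    // start of the current 32-character period
    int periodPos = periodLen; // index just past the current period
    char tmp;
    uint32_t byte;
    while(periodPos < dataLen)
    {
        // temporarily terminate the period, convert it, then restore the character
        tmp = data[periodPos];
        data[periodPos] = 0;
        byte = bytes2bits(periodStr);
        printf("Data: %lu\n", (unsigned long)byte);
        data[periodPos] = tmp;
        periodStr = &data[periodPos];
        periodPos += periodLen;
    }
    // the last period is already terminated by the string's own '\0'
    if(periodPos - periodLen < dataLen)
    {
        byte = bytes2bits(periodStr);
        printf("Data: %lu\n", (unsigned long)byte);
    }
}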
Please be careful with the last period, which could be shorter than 32 characters.

const char data[36]
You are in violation of your contract with the compiler if you declare something as const and then modify it.
Generally speaking, the compiler won't let you modify it, so to even attempt it with a const declaration you'd have to cast the const away (but don't):
char *sneaky_ptr = (char*)data;
sneaky_ptr[0] = 'U'; /* the U is for "undefined behavior" */
See: Can we change the value of an object defined with const through pointers?
So if you wanted to do this, you'd have to be sure the data was legitimately non-const.

The right way to do this in modern C++ is to use std::string to hold your string and std::string_view to process parts of that string without copying it.
You can use string_view with the char array you have, too. It's common to use it to modernize code built around classic null-terminated const char* strings.
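One catch with the string_view approach is that a view is not null-terminated, so strtoul cannot be pointed at a 32-character slice directly. A minimal sketch, assuming C++17, swaps strtoul for std::from_chars (from <charconv>), which takes an explicit [first, last) range. This reads the chunks in place, works on the const array from the question without casting, and never writes a temporary terminator. The function name pack_bits is hypothetical.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <system_error>
#include <vector>

// Packs a string of '0'/'1' characters into 32-bit words, 32 characters at a
// time, reading the source in place through std::string_view. The last word
// may come from fewer than 32 characters; like strtoul, its value is then
// right-aligned (e.g. "101" gives 5).
std::vector<std::uint32_t> pack_bits(std::string_view bits) {
    std::vector<std::uint32_t> words;
    for (std::size_t pos = 0; pos < bits.size(); pos += 32) {
        std::string_view chunk = bits.substr(pos, 32);  // a view, not a copy
        std::uint32_t value = 0;
        auto [end, ec] = std::from_chars(chunk.data(), chunk.data() + chunk.size(), value, 2);
        if (ec != std::errc{} || end != chunk.data() + chunk.size()) {
            break;  // hit a character that is not '0' or '1'
        }
        words.push_back(value);
    }
    return words;
}
```

Calling pack_bits(data) with the question's const char data[36] yields the words 128 and 5 (the trailing "101"); drop or pad the final word if a short tail should be ignored.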

Build struct element with correct size

I am intercepting some packets and then putting them into a structure.
#pragma pack(push, 1)
struct PacketHeader {
short Size;
short Checksum;
short Index;
};
#pragma pack(pop)
I have a packet with PacketHeader and some other bytes that fill this structure:
struct STRUCT_SVC_ROOM_CREATE {
PacketHeader Header;
unsigned char TitleLength; // 1 byte
char* RoomTitle[23];
short Unknow;
short Unknow2;
short Password1;
short Password2;
char LastByte;
};
In the above struct, TitleLength is one byte whose value can be 0x17 (23) or any other number. This number is the number of chars contained in RoomTitle.
I need to set the size of RoomTitle according to the TitleLength byte (as a decimal number).
How could I modify the struct to handle the text size in the right location inside the struct?

You should do something like follows, to parse the RoomTitle from the packet received at your socket:
struct STRUCT_SVC_ROOM_CREATE {
PacketHeader Header; // Header length is sizeof(PacketHeader)
unsigned char TitleLength; // 1 byte
char RoomTitle[256]; // I suspect you don't have 23 `RoomTitle[23];` char*
// pointers at this point, but just a char array sized for
// the maximum number that TitleLength can hold (255),
// plus one for the '\0' terminator.
short Unknow; // Unknow length is sizeof(short)
short Unknow2; // ... ditto and so on.
short Password1;
short Password2;
char LastByte;
};
As I pointed out in the code comments above:
Read the PacketHeader (take care of Size and CRC endianness!).
Read the payload data according to PacketHeader::Size from the packet into another buffer (consider checking the CRC).
Read the TitleLength and RoomTitle from the payload data accordingly. If you want to handle the RoomTitle data as a C-style string, make sure it is actually terminated with '\0'. Also use the TitleLength information when copying it elsewhere.
Read the data with well-known sizes that comes after it (take care of endianness again).
Some pseudo code (not tested):
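#include <unistd.h> // read()
int recv_ROOM_CREATE_packet(int sockfd, STRUCT_SVC_ROOM_CREATE* packet) {
    // read() on a socket may return fewer bytes than requested; real code
    // should loop until each field is complete. Here a short read is an error.
    if (read(sockfd, &(packet->Header), sizeof(PacketHeader)) != (ssize_t)sizeof(PacketHeader))
        return -1;
    if (read(sockfd, &(packet->TitleLength), 1) != 1)
        return -1;
    // refuse titles that would leave no room for the '\0' terminator
    if (packet->TitleLength >= sizeof(packet->RoomTitle))
        return -1;
    if (read(sockfd, packet->RoomTitle, packet->TitleLength) != packet->TitleLength)
        return -1;
    // ensure that packet->RoomTitle is a correctly terminated C-style string:
    // the terminator goes right after the last title character
    packet->RoomTitle[packet->TitleLength] = '\0';
    // and so on ... read the remaining fixed-size fields the same way
    return 0;
}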
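The steps listed above can also be applied to a packet that is already in memory (for example, one you intercepted) instead of reading from a socket. The sketch below parses field by field with bounds checks, so a bogus TitleLength cannot run past the end of the buffer. It assumes the 16-bit fields are little-endian on the wire, which is only a guess for this protocol; the names RoomCreate, Reader and parseRoomCreate are hypothetical, and the title is stored in a std::string instead of a fixed char array.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical parsed representation; field names follow the question.
struct RoomCreate {
    std::uint16_t size = 0, checksum = 0, index = 0;  // PacketHeader
    std::string roomTitle;  // exactly TitleLength bytes; the wire has no '\0'
    std::uint16_t unknow = 0, unknow2 = 0, password1 = 0, password2 = 0;
    char lastByte = 0;
};

// Bounds-checked cursor over a packet that has already been received.
struct Reader {
    const std::uint8_t* p;
    std::size_t left;

    bool u8(std::uint8_t& out) {
        if (left < 1) return false;
        out = *p++;
        --left;
        return true;
    }
    // ASSUMPTION: 16-bit fields are little-endian on the wire.
    bool u16le(std::uint16_t& out) {
        if (left < 2) return false;
        out = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
        p += 2;
        left -= 2;
        return true;
    }
    bool text(std::string& out, std::size_t n) {
        if (left < n) return false;
        out.assign(reinterpret_cast<const char*>(p), n);
        p += n;
        left -= n;
        return true;
    }
};

// Returns false if the buffer is shorter than the fields it claims to hold.
bool parseRoomCreate(const std::uint8_t* buf, std::size_t len, RoomCreate& out) {
    Reader r{buf, len};
    std::uint8_t titleLength = 0, last = 0;
    if (!r.u16le(out.size) || !r.u16le(out.checksum) || !r.u16le(out.index)) return false;
    if (!r.u8(titleLength) || !r.text(out.roomTitle, titleLength)) return false;
    if (!r.u16le(out.unknow) || !r.u16le(out.unknow2) ||
        !r.u16le(out.password1) || !r.u16le(out.password2)) return false;
    if (!r.u8(last)) return false;
    out.lastByte = static_cast<char>(last);
    return true;
}
```

Cast the uint16_t fields to short afterwards if you need signed values. Verifying the checksum would go between reading the header and the payload.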

array in C/C++ for AVR keeps appending

In C, I have an array that receives bytes from a sensor; I save them in a buffer and then print them out like this:
unsigned char responseFrame[300];
int main(void) {
UART_init();
while(1) {
receive(responseFrame);
myLog(responseFrame, sizeof(responseFrame));
}
}
I populate the array by doing the following:
void receive(unsigned char *rcv_buff) {
uint8_t recv_data;
for (int i=0; i<300; i++){
USART1_Flush();
rcv_buff[i] = USART1_RX();
}
}
Then I print out what's in the buffer using the following:
// Logs this output to the serial port; used for debugging
void myLog(unsigned char *msg, int size) {
for (int i=0; i<size; i++) {
USART0_TX(msg[i]);
}
}
This prints out the array, but when another batch of bytes is received, everything is appended. For example, if I first receive {0xFF, 0xFF}, my output for the first iteration is:
0xFF, 0xFF, 0x00, 0x00 ... 0x00
But upon the next iteration let's say {0x0A, 0x0A} is received instead, in the output I see this:
0xFF, 0xFF, 0x0A, 0x0A, 0x00, 0x00 ... 0x00
NOTE: The ellipsis just indicates more 0x00s, which are printed until the end of the array is reached.
Why is this appending and not overwriting from the start of the array?
Here's my USART0_TX and USART1_RX functions:
void USART0_TX(uint8_t myData) {
// Wait if a byte is being transmitted
while( !(UCSR0A & (1<<UDRE0)) );
// Transmit data
UDR0 = myData;
};
uint8_t USART1_RX(void) {
// Wait until recv buffer is full
while( !(UCSR1A & (1<<RXC1)) );
// Return recvd data
return UDR1;
};
Here's the code I'm using to flush my USART1 RX:
//USART1 flush, clears USART1 buffer
void USART1_Flush( void )
{
unsigned char dummy;
while ( UCSR1A & (1<<RXC1) ) dummy = UDR1;
}

I believe your function called "Flush" is really acting as a "Poll" function, looking for a character to appear (which would be the normal usage: wait for the character to appear). The logic where you use RXC1 appears inverted. Try looking at this quite professional-looking AVR driver (it has a poll option, just as you are using):
usart.c
Another nicely commented polled driver (small, with lots of comments): avr uart driver

Typecasting from byte[] to struct

I'm currently working on a small C++ project where I use a client-server model someone else built. Data gets sent over the network and in my opinion it's in the wrong order. However, that's not something I can change.
Example data stream (simplified):
0x20 0x00 (C++: short with value 32)
0x10 0x35 (C++: short with value 13584)
0x61 0x62 0x63 0x00 (char*: abc)
0x01 (bool: true)
0x00 (bool: false)
I can represent this specific stream as:
struct test {
short sh1;
short sh2;
char abc[4];
bool bool1;
bool bool2;
};
And I can typecast it with: test *t = (test*)stream; The problem is that the char* part has a variable length, although it is always null-terminated.
I understand that there's no way of actually casting the stream to a struct, but I was wondering whether there is a better way than struct test { test(char* data) { ... } } (i.e. converting it via the constructor).

This is called marshalling, or serialization.
What you must do is read the stream one byte at a time (or put all in a buffer and read from that), and as soon as you have enough data for a member in the structure you fill it in.
When it comes to the string, you simply read until you hit the terminating zero, and then allocate memory and copy the string to that buffer and assign it to a pointer in the struct.
Reading strings this way is simplest and most effective if you have the whole message in a buffer already, because then you don't need a temporary buffer for the string.
Remember though, that with this scheme you have to manually free the memory containing the string when you are done with the structure.
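A minimal sketch of that read-as-you-go approach for the example stream, with one substitution: the string goes into a std::string instead of a hand-allocated char*, so there is nothing to free manually. It assumes the shorts are little-endian, as the question's sample bytes suggest (0x20 0x00 is 32), and that a bool is one byte on the wire. The names TestRecord and parse are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

struct TestRecord {
    std::int16_t sh1 = 0;
    std::int16_t sh2 = 0;
    std::string abc;  // owns its memory, so nothing to free by hand
    bool bool1 = false;
    bool bool2 = false;

    // Parses: two little-endian shorts (ASSUMPTION), a '\0'-terminated
    // string, then two one-byte bools. Returns false on truncated input.
    static bool parse(const std::uint8_t* buf, std::size_t len, TestRecord& out) {
        std::size_t pos = 0;
        auto readShort = [&](std::int16_t& v) {
            if (len - pos < 2) return false;
            v = static_cast<std::int16_t>(buf[pos] | (buf[pos + 1] << 8));
            pos += 2;
            return true;
        };
        if (!readShort(out.sh1) || !readShort(out.sh2)) return false;

        // Find the terminator within the remaining bytes only.
        const void* nul = std::memchr(buf + pos, 0, len - pos);
        if (!nul) return false;
        std::size_t n = static_cast<const std::uint8_t*>(nul) - (buf + pos);
        out.abc.assign(reinterpret_cast<const char*>(buf + pos), n);
        pos += n + 1;  // skip the string and its terminator

        if (len - pos < 2) return false;
        out.bool1 = buf[pos] != 0;
        out.bool2 = buf[pos + 1] != 0;
        return true;
    }
};
```

Using a static function rather than a converting constructor keeps the parse step explicit and gives you a place to report failure.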

Just add a member function that takes the character buffer (a char * input parameter) and populates the test structure by parsing it.
This makes it clearer and more readable as well.
If you provide an implicit conversion constructor, you create a menace that will do the conversion when you least expect it; if you do use a constructor, mark it explicit.

When reading variable-length data from a sequence of bytes,
you shouldn't fit everything into a single structure or variable.
A pointer is used to store the variable-length part.
The following suggestion is not tested:
#include <stdlib.h> // malloc, free
#include <string.h> // memcpy
typedef unsigned char byte;
// data is stored in memory
// in a different way,
// NOT as the sequence of bytes
// provided by the stream
struct data {
    short sh1;
    short sh2;
    int abclength;
    // a pointer to variable-length, heap-allocated memory !!!
    char* abc;
    bool bool1;
    bool bool2;
};
// reads a single byte
bool readByte(byte* MyByteBuffer)
{
    // your reading code goes here,
    // character by character, from stream,
    // file, pipe, whatever.
    // The result should be true if no error,
    // false if nothing more can be read
    (void)MyByteBuffer;
    return false; // placeholder until real reading code is added
}
// used for reading several variables,
// with different sizes in bytes
int readBuffer(void* Buffer, int BufferSize)
{
    int RealCount = 0;
    byte* p = static_cast<byte*>(Buffer);
    // check the size BEFORE reading, so we never write past the end
    while (RealCount < BufferSize && readByte(p))
    {
        RealCount++;
        p++;
    }
    return RealCount;
}
// reads a null-terminated string, terminator included;
// returns the number of bytes stored in Buffer
int readString(char* Buffer, int BufferSize)
{
    int RealCount = 0;
    byte b = 0;
    while (RealCount < BufferSize && readByte(&b))
    {
        Buffer[RealCount++] = static_cast<char>(b);
        if (b == '\0')
            break;
    }
    return RealCount;
}
void read()
{
    // real data here:
    data MyData = {};
    // long enough, used to temporarily hold the variable-length string
    // (static, so 64000 bytes are not placed on the stack)
    static char temp[64000];
    int RealCount = 0;
    // try read "sh1" field
    RealCount = readBuffer(&MyData.sh1, sizeof(short));
    if (RealCount == sizeof(short))
    {
        // try read "sh2" field
        RealCount = readBuffer(&MyData.sh2, sizeof(short));
        if (RealCount == sizeof(short))
        {
            RealCount = readString(temp, sizeof(temp));
            // accept the string only if its terminator was actually read
            if (RealCount > 0 && temp[RealCount - 1] == '\0')
            {
                // store real bytes count (terminator included)
                MyData.abclength = RealCount;
                // allocate dynamic memory block for variable length data
                MyData.abc = static_cast<char*>(malloc(RealCount));
                if (!MyData.abc)
                    return;
                // copy data from the temporary buffer into the block the pointer refers to;
                // arrays in "plain c" or "c++" don't require the "&" operator for their address
                memcpy(MyData.abc, temp, RealCount);
                // continue with rest of data
                RealCount = readBuffer(&MyData.bool1, sizeof(bool));
                if (RealCount > 0)
                {
                    // continue with rest of data
                    RealCount = readBuffer(&MyData.bool2, sizeof(bool));
                }
            }
        }
    }
    // ... use MyData here ...
    free(MyData.abc); // free(NULL) is a harmless no-op
} // void read()
Cheers.

Serialization/Deserialization of a struct to a char* in C

I have a struct
struct Packet {
int senderId;
int sequenceNumber;
char data[MaxDataSize];
char* Serialize() {
char *message = new char[MaxMailSize];
message[0] = senderId;
message[1] = sequenceNumber;
for (unsigned i=0;i<MaxDataSize;i++)
message[i+2] = data[i];
return message;
}
void Deserialize(char *message) {
senderId = message[0];
sequenceNumber = message[1];
for (unsigned i=0;i<MaxDataSize;i++)
data[i] = message[i+2];
}
};
I need to convert this to a char* of maximum length MaxMailSize (> MaxDataSize) for sending over the network, and then deserialize it at the other end.
I can't use tpl or any other library.
Is there any way to make this better? I am not that comfortable with it. Or is this the best we can do?

Since this is to be sent over a network, I strongly advise you to convert the data into network byte order before transmitting, and back into host byte order when receiving. Byte ordering is not the same on every machine, and once your bytes are not in the right order it may become very difficult to reverse them (depending on the programming language used on the receiving side). Byte-ordering functions are defined along with sockets and are named htons(), htonl(), ntohs() and ntohl(). (In those names, h means 'host' or your computer, n means 'network', s means 'short' or a 16-bit value, and l means 'long' or a 32-bit value.)
Beyond that, you are on your own with serialization: C and C++ have no automatic way to perform it. Some tools can generate the code for you, like the ASN.1 implementation asn1c, but they are difficult to use because they involve much more than just copying data over the network.
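As a small illustration of the byte-order functions, the sketch below writes a 32-bit value into a byte buffer in network order and reads it back. It assumes a POSIX system (<arpa/inet.h>; on Windows these functions come from <winsock2.h>), and the helper names put_u32 and get_u32 are hypothetical. memcpy is used instead of casting the char buffer to a uint32_t*, which avoids misaligned access.

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX)
#include <cstdint>
#include <cstring>

// Writes a 32-bit value into buf in network (big-endian) byte order.
void put_u32(unsigned char* buf, std::uint32_t host_value) {
    std::uint32_t net = htonl(host_value);
    std::memcpy(buf, &net, sizeof net);
}

// Reads a 32-bit network-order value from buf back into host order.
std::uint32_t get_u32(const unsigned char* buf) {
    std::uint32_t net;
    std::memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

Whatever the host's endianness, put_u32(buf, 0x01020304) leaves the bytes 01 02 03 04 in buf, which is what the receiver's ntohl() expects.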

Depending on whether you have enough space or not, you might simply use streams :)
std::string Serialize() {
std::ostringstream out;
char version = '1';
out << version << senderId << '|' << sequenceNumber << '|' << data;
return out.str();
}
void Deserialize(const std::string& iString)
{
std::istringstream in(iString);
char version = 0, check1 = 0, check2 = 0;
in >> version;
switch(version)
{
case '1':
in >> senderId >> check1 >> sequenceNumber >> check2 >> data;
break;
default:
// Handle an unknown version
break;
}
// You can check here than 'check1' and 'check2' both equal to '|'
}
I readily admit it takes more space... or that it might.
Actually, on a 32-bit architecture an int usually covers 4 bytes (4 chars). Serializing one with streams only takes more than 4 chars if the value is greater than 9999 (or less than -999, because of the sign), which usually gives some room.
Also note that you should probably include some guards in your stream, just to check when you get it back that it's alright.
Versioning is probably a good idea, it does not cost much and allows for unplanned later development.

You can have a class representing the object you use in your software, with all the niceties, member functions and whatever else you need. Then you have a 'serialized' struct that's more of a description of what will end up on the network.
To ensure the compiler does exactly what you tell it to, you need to instruct it to 'pack' the structure. The directive I used here is for gcc; see your compiler's documentation if you're not using gcc.
Then the serialize and deserialize routine just convert between the two, ensuring byte order and details like that.
#include <arpa/inet.h> /* ntohl htonl */
#include <string.h> /* memcpy */
class Packet {
int senderId;
int sequenceNumber;
char data[MaxDataSize];
public:
void* Serialize();
void Deserialize(void *message);
};
struct SerializedPacket {
int senderId;
int sequenceNumber;
char data[MaxDataSize];
} __attribute__((packed));
void* Packet::Serialize() {
struct SerializedPacket *s = new SerializedPacket();
s->senderId = htonl(this->senderId);
s->sequenceNumber = htonl(this->sequenceNumber);
memcpy(s->data, this->data, MaxDataSize);
return s;
}
void Packet::Deserialize(void *message) {
struct SerializedPacket *s = (struct SerializedPacket*)message;
this->senderId = ntohl(s->senderId);
this->sequenceNumber = ntohl(s->sequenceNumber);
memcpy(this->data, s->data, MaxDataSize);
}

int senderId;
int sequenceNumber;
...
char *message = new char[MaxMailSize];
message[0] = senderId;
message[1] = sequenceNumber;
You're losing data here: senderId and sequenceNumber are both ints and will take up more than sizeof(char) bytes on most architectures, so storing each one in a single char truncates it. Try something more like this:
char * message = new char[MaxMailSize];
int offset = 0;
memcpy(message + offset, &senderId, sizeof(senderId));
offset += sizeof(senderId);
memcpy(message + offset, &sequenceNumber, sizeof(sequenceNumber));
offset += sizeof(sequenceNumber);
memcpy(message + offset, data, MaxDataSize);
EDIT:
fixed code written in a stupor. Also, as noted in comment, any such packet is not portable due to endian differences.

To answer your question generally: C++ has no reflection mechanism, so manual serialize and deserialize functions defined on a per-class basis are the best you can do. That being said, the serialization function you wrote will mangle your data. Here is a correct implementation:
// MaxMailSize must be at least 2 * sizeof(uint32_t) + MaxDataSize
char * message = new char[MaxMailSize];
uint32_t net_senderId = htonl(senderId);
uint32_t net_sequenceNumber = htonl(sequenceNumber);
memcpy(message, &net_senderId, sizeof(net_senderId));
memcpy(message + sizeof(net_senderId), &net_sequenceNumber, sizeof(net_sequenceNumber));
memcpy(message + sizeof(net_senderId) + sizeof(net_sequenceNumber), data, MaxDataSize);
return message;

As mentioned in other posts, senderId and sequenceNumber are both of type int, which is likely to be larger than char, so these values will be truncated.
If that's acceptable, then the code is OK. If not, then you need to split them into their constituent bytes. Given that the protocol you are using will specify the byte order of multi-byte fields, the most portable and least ambiguous way of doing this is through shifting.
For example, let's say that senderId and sequenceNumber are both 2 bytes long, and the protocol requires that the higher byte goes first:
char* Serialize() {
char *message = new char[MaxMailSize];
message[0] = senderId >> 8;
message[1] = senderId;
message[2] = sequenceNumber >> 8;
message[3] = sequenceNumber;
memcpy(&message[4], data, MaxDataSize);
return message;
}
I'd also recommend replacing the for loop with memcpy (if available), as it's unlikely to be less efficient, and it makes the code shorter.
Finally, this all assumes that char is 8 bits wide (CHAR_BIT == 8). If it isn't, then all the data will need to be masked, e.g.:
message[0] = (senderId >> 8) & 0xFF;
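The receiving side reverses the same shifts. A minimal sketch of both directions, assuming (as in the example above) 16-bit fields sent high byte first; MaxDataSize = 8 and HeaderSize are hypothetical values for illustration. The buffer is unsigned char so that shifting a byte back into place does not sign-extend values above 0x7F.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sizes for illustration; use your real constants.
constexpr std::size_t MaxDataSize = 8;
constexpr std::size_t HeaderSize = 4;  // two 16-bit fields

struct Packet {
    std::uint16_t senderId = 0;
    std::uint16_t sequenceNumber = 0;
    unsigned char data[MaxDataSize] = {};

    // High byte first, matching the Serialize() above.
    void serialize(unsigned char* out) const {
        out[0] = (senderId >> 8) & 0xFF;
        out[1] = senderId & 0xFF;
        out[2] = (sequenceNumber >> 8) & 0xFF;
        out[3] = sequenceNumber & 0xFF;
        std::memcpy(out + HeaderSize, data, MaxDataSize);
    }

    // Reverses serialize(): shift each byte back into its position.
    void deserialize(const unsigned char* in) {
        senderId = static_cast<std::uint16_t>((in[0] << 8) | in[1]);
        sequenceNumber = static_cast<std::uint16_t>((in[2] << 8) | in[3]);
        std::memcpy(data, in + HeaderSize, MaxDataSize);
    }
};
```

Because only shifts and masks are involved, this produces the same bytes on little- and big-endian hosts, with no need for htons()/ntohs().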

You can use Protocol Buffers for defining and serializing structs and classes. This is what Google uses internally, and it has a very compact wire format.
http://code.google.com/apis/protocolbuffers/
