Deciphering unsigned char* - c++

I have a process that listens to a UDP multicast stream and reads in the data as an unsigned char*.
I have a specification that indicates fields within this unsigned char*.
Fields are defined in the specification with a type and size.
Types are: uInt32, uInt64, unsigned int, and single byte string.
For the single byte string I can merely access the offset of the field in the unsigned char* and cast to a char, such as:
char character = (char)(data[1]);
For a single-byte uint32 I've been doing the following, which also seems to work:
uint32_t integer = (uint32_t)(data[20]);
However, for multiple byte conversions I seem to be stuck.
How would I convert several bytes in a row (substring of data) to its corresponding datatype?
Also, is it safe to wrap data in a string (for use of substring functionality)? I am worried about losing information, since I'd have to cast unsigned char* to char*, like:
std::string wrapper((char*)(data),length); //Is this safe?
I tried something like this:
std::string wrapper((char*)(data),length); //Is this safe?
uint32_t integer = (uint32_t)(wrapper.substr(20,4).c_str()); //4 byte int
But it doesn't work.
Thoughts?
Update
I've tried the suggested bit shift:
void function(const unsigned char* data, size_t data_len)
{
//From specification: Field type: uInt32 Byte Length: 4
//All integer fields are big endian.
uint32_t integer = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | (data[3]);
}
This sadly gives me garbage (the same number for every call; the function is invoked from a callback).

I think you should be very explicit, and not just do "clever" tricks with casts and pointers. Instead, write a function like this:
uint32_t read_uint32_t(unsigned char **data)
{
const unsigned char *get = *data;
*data += 4;
return (uint32_t(get[0]) << 24) | (uint32_t(get[1]) << 16) | (uint32_t(get[2]) << 8) | get[3]; // widen before shifting so nothing lands in the sign bit of int
}
This extracts a single uint32_t value from a buffer of unsigned char, and increases the buffer pointer to point at the next byte of data in the buffer.
This assumes big-endian data; you need a well-defined idea of the buffer's byte order in order to interpret it.
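As a rough sketch of the same idea, here is an offset-based variant that leaves the buffer pointer alone. The names load_be32 and load_be64 are made up for illustration, and the caller is assumed to have already checked that the field lies inside the buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads a big-endian 32-bit field starting at `offset`.
// Assumption: the caller has verified that offset + 4 <= buffer length.
inline std::uint32_t load_be32(const unsigned char* data, std::size_t offset)
{
    const unsigned char* p = data + offset;
    // Widen each byte to 32 bits before shifting, so no bit is shifted
    // into the sign bit of a promoted int.
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: same idea for a big-endian 64-bit field.
// Assumption: the caller has verified that offset + 8 <= buffer length.
inline std::uint64_t load_be64(const unsigned char* data, std::size_t offset)
{
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i)
        value = (value << 8) | data[offset + i];  // most significant byte first
    return value;
}
```

With the layout from the question, the 4-byte field at offset 20 would be read as load_be32(data, 20).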

It depends on the byte ordering of the protocol. For big-endian, the so-called network byte order, do:
uint32_t i = uint32_t(data[0]) << 24 | uint32_t(data[1]) << 16 | uint32_t(data[2]) << 8 | data[3];

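Without commenting on whether it's a good idea or not, the reason it doesn't work for you is that wrapper.substr(20,4).c_str() yields a pointer (const char*), not an integer. Casting that pointer to uint32_t converts the address itself, not the four bytes it points to. To read the bytes you would cast to a pointer type and dereference it in the same expression:
uint32_t integer = *(const uint32_t *)(wrapper.substr(20,4).c_str());
Note that this gives you the bytes in host byte order and assumes the address is suitably aligned.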

uint32_t integer = ntohl(*reinterpret_cast<const uint32_t*>(data + 20));
or (handles alignment issues):
uint32_t integer;
memcpy(&integer, data+20, sizeof integer);
integer = ntohl(integer);

The pointer way:
uint32_t n = *(uint32_t*)&data[20];
You will run into problems on different endian architectures though. The solution with bit shifts is better and consistent.
std::string wrapper((char*)(data),length); //Is this safe?
This should be safe since you specified the length of the data.
On the other hand if you did this:
std::string wrapper((char*)data);
The string length would be determined wherever the first 0 byte occurs, and you will more than likely chop off some data.
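A small sketch of that difference, assuming the payload may contain zero bytes. The helper names wrap_bytes and byte_at are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies `length` bytes into a std::string.
// The (pointer, length) constructor keeps embedded zero bytes.
inline std::string wrap_bytes(const unsigned char* data, std::size_t length)
{
    return std::string(reinterpret_cast<const char*>(data), length);
}

// Hypothetical helper: reads a byte back as an unsigned value (0..255),
// regardless of whether plain char is signed on this platform.
inline unsigned byte_at(const std::string& s, std::size_t i)
{
    return static_cast<unsigned char>(s[i]);
}
```

No information is lost by the cast itself; the bit patterns are copied unchanged. The thing to watch is converting each char back to unsigned char before doing arithmetic on it.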

Related

memcpy unsigned char to int

I'm trying to get an int value from a file I read. The trick is that I don't know how many bytes the value occupies, so I first read the length octet, then try to read as many data bytes as the length octet tells me. The issue comes when I try to put the data octets into an int variable and print it: if the first data octet is 0, only the one that comes after it is copied, so the int I read is wrong, as 0x00A2 is not the same as 0xA200. If I use ntohs or ntohl, then 0xA200 is decoded wrongly as 0x00A2, so that does not resolve the whole problem. I am using memcpy like this:
memcpy(&dst, (const void *)src, bytes2read);
where dst is int, src is unsigned char * and bytes2read is a size_t.
So what am I doing wrong? Thank you!
You cannot use memcpy to portably store bytes in an integer, because the order of bytes is not specified by the standard, not to mention possible padding bits. The portable way is to use bitwise operations and shifts:
unsigned char b, len;
unsigned int val = 0;
fdin >> len; // read the field len
if (len > sizeof(val)) { // ensure it will fit into an unsigned int
// process error: cannot fit in an int variable
...
}
while (len-- > 0) { // store and shift one byte at a time
val <<= 8; // shift previous value to leave room for new byte
fdin >> b; // read it
val |= b; // and store..
}
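The same loop can be sketched over an in-memory buffer instead of a stream. This assumes the length octet comes first and the data octets follow in big-endian order; read_length_prefixed is a hypothetical name, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: src[0] is the length octet, followed by that many
// big-endian data octets. Returns false if the buffer is too short or the
// value would not fit in 32 bits; otherwise stores the value in `out`.
inline bool read_length_prefixed(const unsigned char* src,
                                 std::size_t available,
                                 std::uint32_t& out)
{
    if (available < 1) return false;
    const std::size_t len = src[0];
    if (len > sizeof(out) || len > available - 1) return false;

    std::uint32_t value = 0;
    for (std::size_t i = 0; i < len; ++i)
        value = (value << 8) | src[1 + i];  // make room, then append the next octet
    out = value;
    return true;
}
```

Because each octet is shifted into place explicitly, a leading zero octet is handled like any other, and the result does not depend on the host's byte order.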

Compare uint32_t with a loaded char[] from file C++

I have a binary file whose entire contents I load into an unsigned char[], and a constant const uint32_t LITTLE_ENDIAN_ID = 0x49696949;
I need to compare the first four characters of the loaded char[] with the given uint32_t.
Is that possible somehow?
If buff is your unsigned char[] buffer, you can do:
memcmp((unsigned char*)&LITTLE_ENDIAN_ID, buff, 4) == 0
memcmp is declared in string.h (cstring in C++).
Yes, it's absolutely possible, but your question is underspecified. What you want to do is take the first 4 characters of your character array and convert them into a uint32_t; the obvious question is which character corresponds to which byte of the 32-bit int. This is probably equivalent to asking whether these bytes are stored in little-endian or big-endian order. Though now that I see your LITTLE_ENDIAN_ID I realize that it doesn't matter: it's (oddly) the same forwards and backwards.
Anyhow, what you want is either:
unsigned char text[] = ...
uint32_t x = (uint32_t(text[0]) << 24) | (uint32_t(text[1]) << 16) | (uint32_t(text[2]) << 8) | text[3];
if (x == LITTLE_ENDIAN_ID)
// do something
Or the same thing, but with
uint32_t x = (uint32_t(text[3]) << 24) | (uint32_t(text[2]) << 16) | (uint32_t(text[1]) << 8) | text[0];
Alternatively we could do something a little more unusual like
union Converter {
uint32_t int_value;
unsigned char characters[4];
};
unsigned char text[] = ...
Converter x;
for (int i=0; i < 4; i++)
x.characters[i] = text[i];
if (x.int_value == LITTLE_ENDIAN_ID) // bytes are interpreted in host byte order
// do something
This is probably closer to what you want if you are actually looking to test the endianness of the current system.
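A minimal sketch of both kinds of comparison, using hypothetical helper names. Note that + binds more tightly than << in C++, so the bytes are combined with | and explicit parentheses here. The host-order variant uses std::memcpy rather than a union, which is one way to get the same effect.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: assembles four bytes as a little-endian value
// (text[0] is the least significant byte).
inline std::uint32_t load_le32(const unsigned char* text)
{
    return  static_cast<std::uint32_t>(text[0])        |
           (static_cast<std::uint32_t>(text[1]) << 8)  |
           (static_cast<std::uint32_t>(text[2]) << 16) |
           (static_cast<std::uint32_t>(text[3]) << 24);
}

// Hypothetical helper: copies the four bytes into an integer as they are,
// so the comparison depends on the byte order of the machine running it.
inline bool matches_host_order(const unsigned char* text, std::uint32_t id)
{
    std::uint32_t x;
    std::memcpy(&x, text, sizeof x);
    return x == id;
}
```

For a palindromic marker such as 0x49696949 both functions agree on any host; for other values only the explicit shift version gives a host-independent answer.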

unsigned char concatenation

I am creating a C++ program for communication with a gripper on a serial port.
I have to send a buffer of type "unsigned char [8]", but of these 8 bytes, 4 are entered from the keyboard, and 2 are the CRC, calculated at the time.
So, how can I concatenate several pieces in a single buffer of 8 bytes unsigned char?
For example:
unsigned char buffer[8];
----
unsigned char DLEN[1]={0x05};
----
unsigned char CMD[1]={0x01};
----
unsigned char data[4]={0x00,0x01,0x20,0x41};
----
unsigned char CRC[2]={0xFF,0x41};
----
How can I get this buffer: {0x05,0x01,0x00,0x01,0x20,0x41,0xFF,0x41}, that is, the concatenation of DLEN, CMD, data and CRC?
This:
buffer[0] = DLEN[0];
buffer[1] = CMD[0];
buffer[2] = data[0];
buffer[3] = data[1];
buffer[4] = data[2];
buffer[5] = data[3];
buffer[6] = CRC[0];
buffer[7] = CRC[1];
An alternative solution is this:
Start off with an unsigned char array of 8 characters.
When you need to hand it to other methods that insert data into it, pass a pointer into the buffer, like this: updateCRC(&buffer[6]), with the method signature taking an unsigned char pointer. As long as each method respects the size of its piece, you get the best of both worlds: you handle the parts of the buffer as if they were separate arrays, and you never have to merge them into a single array afterwards.
You can use bit shifting, the << and >> operators, to get the appropriate fields to the right places in the buffer.
Something like buffer |= (DLEN << 7);
Just make sure your buffer is cleared to be all 0's first.
My version of hmjd's answer:
buffer[0] = DLEN[0];
buffer[1] = CMD[0];
std::copy(std::begin(data), std::end(data), buffer + sizeof DLEN + sizeof CMD);
std::copy(std::begin(CRC), std::end(CRC), buffer + sizeof DLEN + sizeof CMD + sizeof data);
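One way to package the std::copy approach is a small function that returns the finished frame. This is only a sketch: build_frame is a made-up name, and the layout DLEN, CMD, 4 data bytes, 2 CRC bytes is taken from the question.

```cpp
#include <algorithm>
#include <array>
#include <iterator>

// Hypothetical helper: builds the 8-byte frame DLEN | CMD | data[4] | CRC[2].
inline std::array<unsigned char, 8> build_frame(unsigned char dlen,
                                                unsigned char cmd,
                                                const unsigned char (&data)[4],
                                                const unsigned char (&crc)[2])
{
    std::array<unsigned char, 8> frame{};
    auto out = frame.begin();
    *out++ = dlen;
    *out++ = cmd;
    // std::copy returns the position just past the last element written,
    // so the pieces can be appended one after another.
    out = std::copy(std::begin(data), std::end(data), out);
    std::copy(std::begin(crc), std::end(crc), out);
    return frame;
}
```

Taking the arrays by reference fixes their sizes at compile time, so a caller cannot pass a piece of the wrong length. frame.data() then gives the unsigned char pointer to hand to the serial write call.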

"Right" way to retrieve an int from a big-endian binary file in c++

I have a binary file in big-endian format from which I am retrieving 2-byte and 4-byte integer data. The machine I'm running on is little-endian.
Does anyone have any suggestions or a best-practice on pulling integer data from a known format binary and switching endianness on the fly? I'm not sure that my current solution is even correct:
int myInt;
ifstream dataFile(dataFileLocation, ios::in | ios::binary);
dataFile.seekg(99, ios::beg); //Pull data starting at byte 100;
//For 4-byte value:
char chunk[4];
dataFile.read(chunk, 4);
myInt = (int)(chunk[0] << 24 | chunk[1] << 16 | chunk[2] << 8 | chunk[3]);
//For 2-byte value:
char chunk[2];
dataFile.read(chunk, 4);
myInt = (int)(chunk[0] << 8 | chunk[1]);
This seems to work fine for 2-byte data but gives what I believe are incorrect values on 4-byte data. I've read about htonl() but from what I've read that's not a smart way to go for flexibility.
Use unsigned integral types only and you'll be fine:
unsigned char buf[4];
infile.read(reinterpret_cast<char*>(buf), 4);
unsigned int b4 = (buf[0] << 24) + ... + (buf[3]);
unsigned int b2 = (buf[0] << 8) + (buf[1]);
Shifting involves type promotions, and indefinite sign extensions (given the implementation-defined nature of char). Basically you always want everything to be unsigned when manipulating bits.
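To see the sign-extension effect in isolation, here is a small sketch with hypothetical helper names. It uses signed char explicitly so the behaviour does not depend on whether plain char is signed, and it converts to an unsigned type before shifting so that the demonstration itself stays well defined.

```cpp
#include <cstdint>

// Combines four bytes that are stored in a signed type. A byte with the
// top bit set is negative, and converting it to a wider type fills the
// upper bits with ones, which then swamp the other bytes when OR-ed in.
inline std::uint32_t combine_signed(const signed char* chunk)
{
    return (static_cast<std::uint32_t>(chunk[0]) << 24) |
           (static_cast<std::uint32_t>(chunk[1]) << 16) |
           (static_cast<std::uint32_t>(chunk[2]) << 8)  |
            static_cast<std::uint32_t>(chunk[3]);
}

// The same combination with unsigned bytes: every byte widens to 0..255.
inline std::uint32_t combine_unsigned(const unsigned char* buf)
{
    return (static_cast<std::uint32_t>(buf[0]) << 24) |
           (static_cast<std::uint32_t>(buf[1]) << 16) |
           (static_cast<std::uint32_t>(buf[2]) << 8)  |
            static_cast<std::uint32_t>(buf[3]);
}
```

For the bytes 00 00 01 80 the unsigned version yields 0x00000180, while the signed version yields 0xFFFFFF80 because the last byte widens to 0xFFFFFF80 before it is OR-ed in. This would also explain why the 2-byte case in the question often appears to work: the damage only shows when a byte of 0x80 or above sits below other non-zero bytes.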

Bitwise operators and converting an int to 2 bytes and back again

My background is PHP, so entering the world of low-level stuff (chars are bytes, which are bits, which are binary values, etc.) is taking some time to get the hang of.
What I am trying to do here is send some values from an Arduino board to openFrameworks (both are C++).
What this script currently does (and works well for one sensor I might add) when asked for the data to be sent is:
int value_01 = analogRead(0); // which outputs between 0-1023
unsigned char val1;
unsigned char val2;
//some Complicated bitshift operation
val1 = value_01 &0xFF;
val2 = (value_01 >> 8) &0xFF;
//send both bytes
Serial.print(val1, BYTE);
Serial.print(val2, BYTE);
Apparently this is the most reliable way of getting the data across.
So now that it is sent via the serial port, the bytes are added to a char string and converted back by:
int num = ( (unsigned char)bytesReadString[1] << 8 | (unsigned char)bytesReadString[0] );
So to recap, I'm trying to get four sensors' worth of data (which I am assuming will be 8 of those Serial.print calls?) and to have int num_01 through num_04 at the end of it all.
I'm assuming this (as with most things) might be quite easy for someone with experience in these concepts.
Write a function to abstract sending the data (I've gotten rid of your temporary variables because they don't add much value):
void send16(int value)
{
//send both bytes
Serial.print(value & 0xFF, BYTE);
Serial.print((value >> 8) & 0xFF, BYTE);
}
Now you can easily send any data you want:
send16(analogRead(0));
send16(analogRead(1));
...
Just send them one after the other.
Note that the serial driver lets you send one byte (8 bits) at a time. A value between 0 and 1023 inclusive (which looks like what you're getting) fits in 10 bits. So 1 byte is not enough. 2 bytes, i.e. 16 bits, are enough (there is some extra space, but unless transfer speed is an issue, you don't need to worry about this wasted space).
So, the first two bytes can carry the data for your first sensor. The next two bytes carry the data for the second sensor, the next two bytes for the third sensor, and the last two bytes for the last sensor.
I suggest you use the function that R Samuel Klatchko suggested on the sending side, and hopefully you can work out what you need to do on the receiving side.
int num = ( (unsigned char)bytesReadString[1] << 8 |
(unsigned char)bytesReadString[0] );
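That code is more fragile than it looks, and it is worth understanding why it works at all.
If the shift were carried out in 8 bits, you would lose the extra bits:
11111111 << 3 == 11111000
11111111 << 8 == 00000000
i.e. any 8-bit value, when shifted left by 8 bits, would be zero. In C++ the unsigned char operands are promoted to int before the shift, so the bits survive, but the arithmetic is then signed and depends on the width of int.
It is safer to make the widening explicit and unsigned, something more like this: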
typedef unsigned uint;
typedef unsigned char uchar;
uint num = (static_cast<uint>(static_cast<uchar>(bytesReadString[1])) << 8 ) |
static_cast<uint>(static_cast<uchar>(bytesReadString[0]));
You might get the same result from:
typedef unsigned short ushort;
uint num = *reinterpret_cast<ushort *>(bytesReadString);
This works only if the host byte order matches the order on the wire. It should work on little-endian machines (x86 or x64), but not on big-endian ones (PPC, SPARC, etc.).
To generalise the "Send" code a bit --
void SendBuff(const void *pBuff, size_t nBytes)
{
const char *p = reinterpret_cast<const char *>(pBuff);
for (size_t i=0; i<nBytes; i++)
Serial.print(p[i], BYTE);
}
template <typename T>
void Send(const T &t)
{
SendBuff(&t, sizeof(T));
}
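To tie the sending and receiving sides together, here is a sketch of the round trip for several sensors. The function names are invented for the example, and the byte order (low byte first) is assumed to match the send16 function shown earlier in the thread.

```cpp
#include <cstddef>
#include <cstdint>

// Splits a 16-bit value into two bytes, low byte first.
inline void encode16(std::uint16_t value, unsigned char* out)
{
    out[0] = static_cast<unsigned char>(value & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 8) & 0xFF);
}

// Rebuilds a 16-bit value from two bytes, low byte first.
inline std::uint16_t decode16(const unsigned char* in)
{
    return static_cast<std::uint16_t>(in[0] | (static_cast<unsigned>(in[1]) << 8));
}

// Decodes `count` consecutive readings from a buffer holding 2 * count bytes.
inline void decode_readings(const unsigned char* bytes, std::size_t count,
                            std::uint16_t* readings)
{
    for (std::size_t i = 0; i < count; ++i)
        readings[i] = decode16(bytes + 2 * i);
}
```

With four sensors the receiver would collect 8 bytes and call decode_readings(bytes, 4, readings). A real serial link also needs some way to find the start of each 8-byte group, which this sketch does not address.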