how to parse unsigned char array to numerical data

The setup of my question is as follows:
I have a source sending a UDP Packet to my receiving computer
Receiving computer takes the UDP packet and receives it into unsigned char *message.
I can print the packet out byte-wise using
for (size_t i = 0; i < messageLength; i++) { // messageLength: number of bytes received; sizeof(message) would only give the size of the pointer
printf("0x%02x\n", message[i]);
}
And this is where I am! Now I'd like to start parsing these bytes I received from the network as shorts, ints, longs, and strings.
I've written a series of functions like:
short unsignedShortToInt(const unsigned char c[]) {
short i = 0;
i |= c[1] & 0xff;
i <<= 8;
i |= c[0] & 0xff;
return i;
}
to parse the bytes and shift them into ints, longs, and shorts. I can use sprintf() to create strings from byte arrays.
My question is: what's the best way to get the substrings from my massive UDP packet? The packet is over 100 characters long, so I'd like an easy way to pass message[0:6] or message[20:22] to these various utility functions.
Possible options:
I can use strcpy() to create a temporary array for each function call, but that seems a bit messy.
I can turn the entire packet into a string and use std::string::substr. This seems nice, but I'm concerned that converting the unsigned chars into signed chars (part of the string conversion process) might cause some errors (maybe this concern is unwarranted?).
Maybe another way?
So I ask you, stackoverflow, to recommend a clean, concise way to do this task!
thanks!

Why not use proper serialization?
e.g. MsgPack
You'll need a scheme for differentiating messages. You could for example make them self-describing, something like:
struct my_message {
string protocol;
string data;
};
and dispatch decoding based on the protocol.
You'll most probably be better off using a tested serialization library than finding out that your system is vulnerable to buffer overflow attacks and malfunction.
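A rough sketch of what "dispatch decoding based on the protocol" could look like. The protocol names ("text", "binary") and the decoder functions are hypothetical placeholders, not part of MsgPack or any other library:

```cpp
#include <string>

// Hypothetical self-describing message, mirroring the struct above.
struct my_message {
    std::string protocol; // names the encoding of `data`
    std::string data;     // raw payload bytes
};

// Hypothetical decoders; real ones would parse `data` into typed fields.
std::string decode_text(const std::string& data) {
    return "text:" + data;
}
std::string decode_binary(const std::string& data) {
    return "binary:" + std::to_string(data.size()) + " bytes";
}

// Dispatch decoding based on the protocol name; unknown protocols are rejected.
std::string dispatch(const my_message& m) {
    if (m.protocol == "text")   return decode_text(m.data);
    if (m.protocol == "binary") return decode_binary(m.data);
    return "error:unknown protocol";
}
```

Rejecting unknown protocol names explicitly keeps a malformed packet from being fed to the wrong decoder.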

I think you have two problems to solve here. First, you need to make sure the integer data are properly aligned in memory after extracting them from the character buffer. Next, you need to ensure the correct byte order of the integer data after their extraction.
The alignment problem can be solved with a union containing the integral data type super-imposed upon a character array of the correct size. The network byte order problem can be solved using the standard ntohs() and ntohl() functions. This will only work if the sending software also used the standard byte-order produced by the inverse of these functions.
See: http://www.beej.us/guide/bgnet/output/html/multipage/htonsman.html
Here are a couple of UNTESTED functions you may find useful. I think they should just about do what you are after.
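#include <netinet/in.h> // ntohs(), ntohl()
#include <cstddef> // size_t
#include <cstdint> // uint16_t, uint32_t
#include <cstring> // std::strlen
#include <string>
/**
* General routine to extract aligned integral types
* from the UDP packet.
*
* @param data Pointer into the UDP packet data
* @param type Integral type to extract
*
* @return data pointer advanced to next position after extracted integral.
*/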
template<typename Type>
unsigned char const* extract(unsigned char const* data, Type& type)
{
// This union will ensure the integral data type is correctly aligned
union tx_t
{
unsigned char cdata[sizeof(Type)];
Type tdata;
} tx;
for(size_t i(0); i < sizeof(Type); ++i)
tx.cdata[i] = data[i];
type = tx.tdata;
return data + sizeof(Type);
}
/**
* If strings are null terminated in the buffer then this could be used to extract them.
*
* @param data Pointer into the UDP packet data
* @param s std::string type to extract
*
* @return data pointer advanced to next position after extracted std::string.
*/
unsigned char const* extract(unsigned char const* data, std::string& s)
{
s.assign((char const*)data, std::strlen((char const*)data));
return data + s.size() + 1; // + 1 to step over the terminating null byte
}
/**
* Function to parse entire UDP packet
*
* @param data The entire UDP packet data
*/
void read_data(unsigned char const* const data)
{
uint16_t i1;
std::string s1;
uint32_t i2;
std::string s2;
unsigned char const* p = data;
p = extract(p, i1); // p contains next position to read
i1 = ntohs(i1);
p = extract(p, s1);
p = extract(p, i2);
i2 = ntohl(i2);
p = extract(p, s2);
}
Hope that helps.
EDIT:
I have edited the example to include strings. It very much depends on how the strings are stored in the stream. This example assumes the strings are null-terminated c-strings.
EDIT2:
Whoops, changed code to accept unsigned chars as per question.
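For the message[0:6] / message[20:22] style of access from the question, one alternative is to read each field at an explicit offset and assemble integers with byte shifts. This avoids alignment issues and does not depend on the host byte order; it also avoids reading an inactive union member, which standard C++ does not guarantee to work. The sketch below assumes the sender uses big-endian (network) order and fixed-length strings. PacketReader and its method names are made up for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: reads fields at explicit offsets inside a received packet.
// Every read is bounds-checked; ok() turns false after the first out-of-range access.
class PacketReader {
public:
    PacketReader(const unsigned char* data, std::size_t size)
        : data_(data), size_(size) {}

    // Big-endian (network order) 16-bit value at `offset`.
    std::uint16_t u16(std::size_t offset) {
        if (!check(offset, 2)) return 0;
        return static_cast<std::uint16_t>((data_[offset] << 8) | data_[offset + 1]);
    }

    // Big-endian 32-bit value at `offset`.
    std::uint32_t u32(std::size_t offset) {
        if (!check(offset, 4)) return 0;
        return (static_cast<std::uint32_t>(data_[offset]) << 24) |
               (static_cast<std::uint32_t>(data_[offset + 1]) << 16) |
               (static_cast<std::uint32_t>(data_[offset + 2]) << 8) |
                static_cast<std::uint32_t>(data_[offset + 3]);
    }

    // Fixed-length string occupying bytes [offset, offset + len).
    std::string str(std::size_t offset, std::size_t len) {
        if (!check(offset, len)) return std::string();
        return std::string(reinterpret_cast<const char*>(data_ + offset), len);
    }

    bool ok() const { return ok_; }

private:
    // True if [offset, offset + len) lies inside the packet.
    bool check(std::size_t offset, std::size_t len) {
        if (offset > size_ || len > size_ - offset) {
            ok_ = false;
            return false;
        }
        return true;
    }

    const unsigned char* data_;
    std::size_t size_;
    bool ok_ = true;
};
```

With this, message[20:22] becomes something like reader.u16(20), and a truncated packet shows up as ok() == false instead of a read past the end of the buffer.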

If the array is only 100 characters long, just create a char buffer[100] and a queue of them so you don't miss processing any of the messages.
Next, you could index that buffer as you described; if you know the structure of the message, then you know the index points.
Then you can union the types, i.e.
union myType {
char buf[4];
int x;
};
giving you the value as an int from the chars, if that's what you need.

Related

What is the safe approach to convert incoming network `char*` data to `uint8_t` and back

This question on SO deals with the char <-> uint8_t issue mainly from the perspective of the Strict Aliasing Rule. Roughly speaking, it clarifies that as long as uint8_t is implemented as either char or unsigned char, we're fine.
I'm interested in understanding whether or not the possible incompatibility of the signedness of uint8_t and char matters when using reinterpret_cast.
When I need to deal directly with bytes, I prefer using uint8_t. However, the Winsock API deals with char*s.
I would like to understand how to handle these conversions correctly, in order not to run into Undefined Behavior or other phenomena that damage the portability of the app.
The following function takes a std::array<uint8_t, 4> and converts it to a uint32_t, i.e., takes 4 bytes and converts them to an integer.
uint32_t bytes_to_u32(const std::array<uint8_t, 4>& bytes) {
return (uint32_t(bytes[0]) << 24) + (uint32_t(bytes[1]) << 16) + (uint32_t(bytes[2]) << 8) + bytes[3]; // widen before shifting so the arithmetic is unsigned
}
However, the data incoming from the socket (using the recv function) comes in char* form.
One approach is the following:
std::array<uint8_t, 4> length_buffer;
int bytes_received = 0;
while (bytes_received < 4) {
bytes_received += recv(sock, reinterpret_cast<char*>(length_buffer.data()) + bytes_received, 4 - bytes_received, 0);
}
It seems to work on my machine. However - is this safe? If I'm not mistaken, on a different machine or compiler, a char may be signed, meaning the length_buffer will hold wrong values after the conversion. Am I wrong?
I know that reinterpret_cast does not change the bit pattern at all - it leaves the binary data the same. Knowing this, it still doesn't fully register in my brain whether or not this technique is the right way to go.
Please explain how to approach this problem.
EDIT: Also noting, after converting the char* to uint8_t*, I need to be able to convert the uint8_t* to a valid numeric value, or sometimes test the numeric values of individual bytes in the buffer. In order to interpret the "commands" I was sent over the network, and send some back to the other side.

I hope I understood your question correctly. You can solve this problem using unions:
//Union is template so you can use this for any given type
template<typename T>
union ConvertBytes
{
T value;
char byte[sizeof(T)];
};
void process()
{
recv(socket, buffer, bufferLength, 0); // Receive data
ConvertBytes<uint32_t> converter;
for (size_t i = 0; i < sizeof(uint32_t); i++) // Assuming you receive only that one uint32_t
{
converter.byte[i] = buffer[i]; // Assign all bytes into the union
}
uint32_t result = converter.value; //Get uint32_t value from union
}
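On the signedness worry itself: the bits in the buffer are the same whether you look at them through char* or unsigned char*; only value conversions are affected. A minimal sketch, assuming the sender transmits the integer in big-endian order (unlike the union above, which yields host byte order). The function names are illustrative only:

```cpp
#include <cstddef>
#include <cstdint>

// Assemble a big-endian 32-bit value from the first four bytes of a char buffer.
// Each char is converted to unsigned char first, so a negative char such as
// '\xff' contributes 0xFF rather than a sign-extended value.
std::uint32_t be_bytes_to_u32(const char* buf) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value = (value << 8) | static_cast<unsigned char>(buf[i]);
    }
    return value;
}

// Viewing the same storage through unsigned char* is permitted by the aliasing
// rules; the bits are unchanged, only their interpretation differs.
const unsigned char* as_bytes(const char* buf) {
    return reinterpret_cast<const unsigned char*>(buf);
}
```

Testing the numeric value of a single byte then works the same way on platforms where plain char is signed and where it is unsigned.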

Convert char* to uint8_t

I transfer messages through the CAN protocol.
To do so, the CAN message needs data of type uint8_t, so I need to convert my char* to uint8_t. Based on my research on this site, I produced this code:
char* bufferSlidePressure = ui->canDataModifiableTableWidget->item(6,3)->text().toUtf8().data();//My char*
/* Conversion */
uint8_t slidePressure [8];
sscanf(bufferSlidePressure,"%c",
&slidePressure[0]);
As you can see, my char* must fit in slidePressure[0].
My problem is that even though I get no errors during compilation, the data in slidePressure is totally incorrect. Indeed, I tested it with a char* = 0 and I got unknown characters... So I think the problem must come from the conversion.
My data can be Bool, Uchar, Ushort and float.
Thanks for your help.

Is your string an integer? E.g. char* bufferSlidePressure = "123";?
If so, I would simply do:
uint8_t slidePressure = (uint8_t)atoi(bufferSlidePressure);
Or, if you need to put it in an array:
slidePressure[0] = (uint8_t)atoi(bufferSlidePressure);
Edit: Following your comment, if your data could be anything, I guess you would have to copy it into the buffer of the new data type. E.g. something like:
/* in case you'd expect a float*/
float slidePressure;
memcpy(&slidePressure, bufferSlidePressure, sizeof(float));
/* in case you'd expect a bool*/
bool isSlidePressure;
memcpy(&isSlidePressure, bufferSlidePressure, sizeof(bool));
/*same thing for uint8_t, etc */
/* in case you'd expect char buffer, just a byte to byte copy */
char * slidePressure = new char[ size ]; // or a stack buffer
memcpy(slidePressure, (const char*)bufferSlidePressure, size ); // no sizeof, since sizeof(char)=1

uint8_t is 8 bits of memory, and can store values from 0 to 255
char is probably 8 bits of memory
char * is probably 32 or 64 bits of memory containing the address of a different place in memory in which there is a char
First, make sure you don't try to put the memory address (the char *) into the uint8 - put what it points to in:
char from;
char * pfrom = &from;
uint8_t to;
to = *pfrom;
Then work out what you are really trying to do ... because this isn't quite making sense. For example, a float is probably 32 or 64 bits of memory. If you think there is a float somewhere in your char * data you have a lot of explaining to do before we can help :/

char * is a pointer, not a single character. It is possible that it points to the character you want.
uint8_t is unsigned but on most systems will be the same size as a char and you can simply cast the value.
You may need to manage the memory and lifetime of what your function returns. This could be done with vector< unsigned char> as the return type of your function rather than char *, especially if toUtf8() has to create the memory for the data.
Your question is totally ambiguous.
ui->canDataModifiableTableWidget->item(6,3)->text().toUtf8().data();
That is a lot of cascading calls. We have no idea what any of them do and whether they are yours or not. It looks dangerous.

A safer example, the C++ way:
const char* bufferSlidePressure = "123";
std::string buffer(bufferSlidePressure);
std::stringstream stream;
stream << buffer;
int n = 0;
// convert to int
if (!(stream >> n)){
//could not convert
}
Also, if Boost is available:
int n = boost::lexical_cast<int>(buffer);
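If the text really holds a small decimal number destined for one CAN data byte, it may be worth checking the range before narrowing to uint8_t. A minimal sketch using only the standard library; parse_u8 is a made-up helper name:

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parse decimal text into a uint8_t.
// Returns false on empty input, trailing garbage, or a value outside 0..255.
bool parse_u8(const char* text, std::uint8_t& out) {
    char* end = nullptr;
    errno = 0;
    const long v = std::strtol(text, &end, 10);
    if (end == text || *end != '\0') return false;         // nothing parsed, or junk after the digits
    if (errno == ERANGE || v < 0 || v > 255) return false; // does not fit in one byte
    out = static_cast<std::uint8_t>(v);
    return true;
}
```

Unlike a plain (uint8_t)atoi(...) cast, an input such as "300" is reported as an error instead of silently wrapping around.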

Python's struct.pack/unpack equivalence in C++

I used struct.pack in Python to transform data into a serialized byte stream.
>>> import struct
>>> struct.pack('i', 1234)
'\xd2\x04\x00\x00'
What is the equivalent in C++?

You'll probably be better off in the long run using a third party library (e.g. Google Protocol Buffers), but if you insist on rolling your own, the C++ version of your example might be something like this:
#include <arpa/inet.h> // htonl(), ntohl() (POSIX; on Windows use <winsock2.h>)
#include <stdint.h>
#include <string.h>
int32_t myValueToPack = 1234; // or whatever
uint8_t myByteArray[sizeof(myValueToPack)];
int32_t bigEndianValue = htonl(myValueToPack); // convert the value to big-endian for cross-platform compatibility
memcpy(&myByteArray[0], &bigEndianValue, sizeof(bigEndianValue));
// At this point, myByteArray contains the "packed" data in network-endian (aka big-endian) format
The corresponding 'unpack' code would look like this:
// Assume at this point we have the packed array myByteArray, from before
int32_t bigEndianValue;
memcpy(&bigEndianValue, &myByteArray[0], sizeof(bigEndianValue));
int32_t theUnpackedValue = ntohl(bigEndianValue);
In real life you'd probably be packing more than one value, which is easy enough to do (by making the array size larger and calling htonl() and memcpy() in a loop -- don't forget to increase memcpy()'s first argument as you go, so that your second value doesn't overwrite the first value's location in the array, and so on).
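A rough sketch of that loop, assuming a POSIX system for htonl()/ntohl(). The names pack_all and unpack_all are invented for this example:

```cpp
#include <arpa/inet.h> // htonl(), ntohl() (POSIX; Windows uses <winsock2.h>)
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Pack each value as 4 big-endian bytes, one after another.
std::vector<std::uint8_t> pack_all(const std::vector<std::int32_t>& values) {
    std::vector<std::uint8_t> out(values.size() * sizeof(std::uint32_t));
    for (std::size_t i = 0; i < values.size(); ++i) {
        const std::uint32_t be = htonl(static_cast<std::uint32_t>(values[i]));
        // Advance the destination by 4 bytes per value so nothing is overwritten.
        std::memcpy(out.data() + i * sizeof be, &be, sizeof be);
    }
    return out;
}

// Inverse of pack_all(); trailing bytes that do not form a full value are ignored.
std::vector<std::int32_t> unpack_all(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::int32_t> values(bytes.size() / sizeof(std::uint32_t));
    for (std::size_t i = 0; i < values.size(); ++i) {
        std::uint32_t be;
        std::memcpy(&be, bytes.data() + i * sizeof be, sizeof be);
        values[i] = static_cast<std::int32_t>(ntohl(be));
    }
    return values;
}
```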
You'd also probably want to pack (aka serialize) different data types as well. uint8_t's (aka chars) and booleans are simple enough, as no endian-handling is necessary for them: you can just copy each of them into the array verbatim as a single byte. uint16_t's you can convert to big-endian via htons(), and convert back to native-endian via ntohs(). Floating point values are a bit tricky, since there is no built-in htonf(), but you can roll your own that will work on IEEE754-compliant machines:
uint32_t htonf(float f)
{
uint32_t x;
memcpy(&x, &f, sizeof(float));
return htonl(x);
}
.... and the corresponding ntohf() to unpack them:
float ntohf(uint32_t nf)
{
float x;
nf = ntohl(nf);
memcpy(&x, &nf, sizeof(float));
return x;
}
Lastly, for strings you can just add the bytes of the string to the buffer (including the NUL terminator) via memcpy:
const char * s = "hello";
size_t slen = strlen(s);
memcpy(myByteArray, s, slen+1); // +1 for the NUL byte; myByteArray must have room for slen+1 bytes

There isn't one. C++ doesn't have built-in serialization.
You would have to write individual objects to a byte array/vector, being careful about endianness (if you want your code to be portable).
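As an illustration of writing values into a byte vector by hand, here is a small sketch that always emits little-endian bytes, which matches what struct.pack('<i', 1234) produces in Python (and what the plain 'i' format produced on the little-endian machine in the question). The helper names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` as 4 little-endian bytes, the layout Python produces for
// struct.pack('<i', value), independent of the host's byte order.
void append_le32(std::vector<std::uint8_t>& out, std::int32_t value) {
    const std::uint32_t u = static_cast<std::uint32_t>(value);
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((u >> shift) & 0xFF));
    }
}

// Read 4 little-endian bytes starting at `pos`; the caller must ensure they exist.
std::int32_t read_le32(const std::vector<std::uint8_t>& in, std::size_t pos) {
    std::uint32_t u = 0;
    for (int i = 3; i >= 0; --i) {
        u = (u << 8) | in[pos + static_cast<std::size_t>(i)];
    }
    return static_cast<std::int32_t>(u);
}
```

Because the bytes are produced with shifts rather than by copying the object representation, the output does not change when the code is compiled for a big-endian machine.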

https://github.com/karkason/cppystruct
#include "cppystruct.h"
// icmp_header can be any type that supports std::size and std::data and holds bytes
auto [type, code, checksum, p_id, sequence] = pystruct::unpack(PY_STRING("bbHHh"), icmp_header);
int leet = 1337;
auto runtimePacked = pystruct::pack(PY_STRING(">2i10s"), leet, 20, "String!");
// runtimePacked is an std::array filled with "\x00\x00\x059\x00\x00\x00\x14String!\x00\x00\x00"
// The format is "compiled" and has zero overhead in runtime
constexpr auto packed = pystruct::pack(PY_STRING("<2i10s"), 10, 20, "String!");
// packed is an std::array filled with "\x0a\x00\x00\x00\x14\x00\x00\x00String!\x00\x00\x00"

You could check out Boost.Serialization, but I doubt you can get it to use the same format as Python's pack.

I was also looking for the same thing. Luckily I found https://github.com/mpapierski/struct
With a few additions you can add missing types to struct.hpp; I think it's the best so far.
To use it, just define your params like this:
DEFINE_STRUCT(test,
((2, TYPE_UNSIGNED_INT))
((20, TYPE_CHAR))
((20, TYPE_CHAR))
)
Then just call this function, which will be generated at compile time:
pack(unsigned int p1, unsigned int p2, const char * p3, const char * p4)
The number and type of parameters will depend on what you defined above.
The return type is a char* which contains your packed data.
There is also an unpack() function, which you can use to read the buffer.

You can use a union to get a different view into the same memory.
For example:
union Pack{
int i;
char c[sizeof(int)];
};
Pack p = {};
p.i = 1234;
std::string packed(p.c, sizeof(int)); // "\xd2\x04\x00\0" on a little-endian machine with 4-byte int
As mentioned in the other answers, you have to mind the endianness.

How to send float[] via UDP + Unsigned-Long-Operator-Curiosity

I am writing a C++ application which reads several voltages from a device. I receive these measurements in a float[] and I want to send this array via UDP to a MATLAB script.
The C++ function sendto needs a char[] buffer, and I really have no idea how to convert the float[] into a char[] buffer so I can reassemble it easily in MATLAB. Any ideas?
Another problem I encountered is this line:
addr.sin_addr = inet_addr("127.0.0.1");
inet_addr returns an unsigned long, but my compiler tells me that the = operator does not accept an unsigned long on its right side. Any ideas about this?

You can always treat any object variable as a sequence of bytes. For this very purpose, it is explicitly allowed (and does not violate aliasing or constitute type punning) to reinterpret any object pointer as a pointer to the first element in an array of bytes (i.e. any char type).
Example:
T x;
char const * p = reinterpret_cast<char const *>(&x);
for (std::size_t i = 0; i != sizeof x; ++i) { /* p[i] is the ith byte in x */ }
For your case:
float data[N];
char const * p = reinterpret_cast<char const *>(data);
write(fd, p, sizeof data);
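If you would rather hand sendto an explicit char buffer, the same bytes can be copied out with memcpy. This is only a sketch under the assumption that the receiver uses the same float format and byte order as the sender (on the MATLAB side, something like typecast to 'single' would then reinterpret the received bytes). The function names are invented here:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copy the raw object representation of `count` floats into a char buffer
// suitable for sendto(). No byte-order conversion is done (assumption: the
// receiver uses the same float format and endianness as the sender).
std::vector<char> floats_to_bytes(const float* data, std::size_t count) {
    std::vector<char> bytes(count * sizeof(float));
    if (count != 0) std::memcpy(bytes.data(), data, bytes.size());
    return bytes;
}

// Receiver side in C++: rebuild floats from the bytes; a trailing partial float is dropped.
std::vector<float> bytes_to_floats(const char* bytes, std::size_t size) {
    std::vector<float> out(size / sizeof(float));
    if (!out.empty()) std::memcpy(out.data(), bytes, out.size() * sizeof(float));
    return out;
}
```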

Decide if you want to format the UDP messages as text or binary. If text, you can convert floats to strings using boost::lexical_cast. You can frame the string values in the UDP message any way you want (comma separated values, newline separated, etc.), or you could use a known format such as JSON.
If you want to transmit binary data, select a known format, such as XDR, which is used by ONC RPC, and use existing library tools to create the binary messages.
As for the inet_addr error, addr.sin_addr is a struct in_addr. You need to assign the result to the s_addr member of the sin_addr structure like this:
addr.sin_addr.s_addr = inet_addr("127.0.0.1");

There are two questions in your post. I believe that is not how it's supposed to be.
As for the float[] -> byte[] conversion: you should check how MATLAB stores its floating-point variables. If it uses the same format as your compiler on your particular computer setup, you can simply send the floats as a byte array. In any other case (incompatible float byte format, multiple machines) you have to write a manual conversion: convert each float to (for example) a string, then join the strings. Your line could look like:
1.41234;1.63756;456345.45634
As for addr.sin_addr, I think you are doing it wrong. You should access the s_addr member:
addr.sin_addr.s_addr = inet_addr("1.1.1.1");
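A minimal sketch of the text approach with ';' as the separator, using only the standard library and assuming the default "C" locale. join_floats and split_floats are hypothetical names:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Join floats into one ';'-separated line.
// 9 significant digits are enough to round-trip an IEEE 754 single.
std::string join_floats(const std::vector<float>& values) {
    std::ostringstream os;
    os.precision(9);
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) os << ';';
        os << values[i];
    }
    return os.str();
}

// Split such a line back into floats; parsing stops at the first malformed field.
std::vector<float> split_floats(const std::string& line) {
    std::vector<float> out;
    std::istringstream is(line);
    std::string field;
    while (std::getline(is, field, ';')) {
        std::istringstream fs(field);
        float f;
        if (!(fs >> f)) break;
        out.push_back(f);
    }
    return out;
}
```

The resulting string can be passed to sendto as-is, and on the receiving side it is easy to inspect by eye.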

Reinterpret float vector as unsigned char array and back

I've searched and searched stackoverflow for the answer, but have not found what I needed.
I have a routine that takes an unsigned char array as a parameter in order to encode it as Base64. I would like to encode an STL float vector (vector<float>) in Base64, and therefore would need to reinterpret the bytes in the float vector as an array of unsigned characters in order to pass it to the encode routine. I have tried a number of things, from reinterpret and static casts to memory copies, but none of them seem to work (at least not the way I implemented them).
Likewise, I'll need to do the exact opposite when decoding the encoded data back to a float array. The decode routine will provide the decoded data as an unsigned char array, and I will need to reinterpret that array of bytes, converting it to a float vector again.
Here is a stripped down version of my C++ code to do the encoding:
std::string
EncodeBase64FloatVector( const vector<float>& p_vector )
{
unsigned char* sourceArray;
// SOMEHOW FILL THE sourceArray WITH THE FLOAT VECTOR DATA BITS!!
char* target;
size_t targetSize = p_vector.size() * sizeof(float);
target = new char[ targetSize ];
int result = EncodeBase64( sourceArray, floatArraySizeInUChars, target, targetSize );
string returnResult;
if( result != -1 )
{
returnResult = target;
}
delete target;
delete sourceArray;
return returnResult;
}
Any help would be greatly appreciated. Thanks.
Raymond.

std::vector guarantees the data will be contiguous, and you can get a pointer to the first element in the vector by taking the address of the first element (assuming it's not empty).
typedef unsigned char byte;
std::vector<float> original_data;
...
if (!original_data.empty()) {
const float *p_floats = &(original_data[0]); // parens for clarity
Now, to treat that as an array of unsigned char, you use a reinterpret_cast:
const byte *p_bytes = reinterpret_cast<const byte *>(p_floats);
// pass p_bytes to your base-64 encoder
}
You might want to encode the length of the vector before the rest of the data, in order to make it easier to decode them.
CAUTION: You still have to worry about endianness and representation details. This will only work if you read back on the same platform (or a compatible one) that you wrote with.
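To illustrate the length-prefix suggestion, here is a sketch that stores a 4-byte element count ahead of the raw floats and checks it again when decoding. It copies with memcpy rather than casting pointers, and it makes the same assumption as the caution above: writer and reader share float format and byte order. to_bytes and from_bytes are made-up names:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Serialize as: 4-byte element count (host byte order), then the raw floats.
std::vector<unsigned char> to_bytes(const std::vector<float>& v) {
    const std::uint32_t count = static_cast<std::uint32_t>(v.size());
    std::vector<unsigned char> out(sizeof count + v.size() * sizeof(float));
    std::memcpy(out.data(), &count, sizeof count);
    if (!v.empty()) {
        std::memcpy(out.data() + sizeof count, v.data(), v.size() * sizeof(float));
    }
    return out;
}

// Rebuild the vector; returns an empty vector if the buffer is too short
// for the count it claims.
std::vector<float> from_bytes(const std::vector<unsigned char>& bytes) {
    std::uint32_t count = 0;
    if (bytes.size() < sizeof count) return {};
    std::memcpy(&count, bytes.data(), sizeof count);
    if ((bytes.size() - sizeof count) / sizeof(float) < count) return {};
    std::vector<float> v(count);
    if (count != 0) {
        std::memcpy(v.data(), bytes.data() + sizeof count, count * sizeof(float));
    }
    return v;
}
```

The output of to_bytes is what would be handed to the Base64 encoder, and the decoder's output would go through from_bytes.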

sourceArray = reinterpret_cast<const unsigned char *>(&(p_vector[0]))

I would highly recommend checking out Google's protobuf to solve your problem. Floats and doubles can vary in size and layout between platforms and that package has solved all those problems for you. Additionally, it can easily handle your data structure should it ever become more complicated than a simple array of floats.
If you do use that, you will have to do your own base64 encoding still as protobuf encodes data assuming you have an 8-bit clean channel to work with. But that's fairly trivial.
