Convert integer to char array and then convert it back - c++

I am a little confused on how casts work in C++.
I have a 4 bytes integer which I need to convert to a char[32] and then convert it back in some other function.
I am doing the following :
uint32_t v = 100;
char ch[32]; // This is 32 bytes reserved memory
memcpy(ch,&v,4);
uint32_t w = *(reinterpret_cast<int*>(ch)); // w should be equal to v
I am getting the correct results on my compiler, but I want to make sure if this is a correct way to do it.

Technically, no. You are at risk of falling foul of your CPU's alignment rules, if it has any.
You may alias an object byte-by-byte using char*, but you can't take an actual char array (no matter where its values came from) and pretend it's some other object.
You will see that reinterpret_cast<int*> method a lot, and on many systems it will appear to work. However, the "proper" method (if you really need to do this at all) is:
const auto INT_SIZE = sizeof(int);
char ch[INT_SIZE] = {};
// Convert to char array
const int x = 100;
std::copy(
reinterpret_cast<const char*>(&x),
reinterpret_cast<const char*>(&x) + INT_SIZE,
&ch[0]
);
// Convert back again
int y = 0;
std::copy(
&ch[0],
&ch[0] + INT_SIZE,
reinterpret_cast<char*>(&y)
);
Notice that I only ever pretend an int is a bunch of chars, never the other way around.
Notice also that I have swapped your memcpy for type-safe std::copy (although since we're nuking the types anyway, that's sort of by-the-by).
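For the case in the question (a fixed-width uint32_t stored at the start of a char[32]), the same idea can be wrapped in two small helpers. This is only a minimal sketch: store_u32 and load_u32 are hypothetical names, and the bytes are kept in the host's native byte order, so the buffer is only meaningful on machines with the same endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the 4 bytes of v into dst.
// dst must point to at least sizeof(std::uint32_t) writable bytes.
void store_u32(char* dst, std::uint32_t v)
{
    assert(dst != nullptr);
    std::memcpy(dst, &v, sizeof v);
}

// Hypothetical helper: rebuild the integer by copying the bytes back into a
// real uint32_t object, so the char array is never accessed through an int*.
std::uint32_t load_u32(const char* src)
{
    assert(src != nullptr);
    std::uint32_t v = 0;
    std::memcpy(&v, src, sizeof v);
    return v;
}
```

With char ch[32] = {}; a call to store_u32(ch, 100) followed by load_u32(ch) gives back 100, and because only bytes are copied, the position inside the array does not need to be aligned.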


What is *(uint32_t *) &buffer[index]?

In some supposed-C++ code I found, I have buffer defined as const void *buffer; (it's arbitrary binary data that, I think, gets interpreted as a stream of 32-bit unsigned integers) and in many places, I have
*(uint32_t *) &buffer[index]
where index is some kind of integer (I think it was long or unsigned long and got swept up in my replacing those with int32_t and uint32_t when I was making the code work on a 64-bit system).
I recognize that this is taking the address of buffer (&buffer), casting it as a pointer to a uint32_t, and dereferencing that, at least based on this question... but then I'm confused by how the [index] part interacts with that or where I missed inserting the [index] part in between the steps I listed.
What, conceptually, is this doing? Is there some way I could define another variable to be a better type, with the casting there once, and then use that, rather than having this complicated expression throughout the code? Is this actually C++ or is this C99?
edit: The first couple of lines of the code are:
const void *buffer = data.bytes;
if (ntohl(*(int32_t *) buffer) != 'ttcf') {
return;
}
uint32_t ttf_count = ntohl(*(uint32_t *) &buffer[0x08]);
where data.bytes has type const void *. Before I was getting buffer from data.bytes, it was char *.
edit 2: Apparently, having const void *buffer work is not normal C (though it absolutely works in my situation), so if it makes more sense, assume it's const char *buffer.
Putting parentheses in place to make the order of operations more explicit:
*((uint32_t *) &(buffer[index]))
So you're treating buffer as an array, however because buffer is a void * you can't dereference it directly.
Assuming you want to treat this buffer as an array of uint32_t, what you want to do is this:
((uint32_t *)buffer)[index]
Which can also be written as:
*((uint32_t *)buffer + index)
EDIT:
If index is the byte offset in the buffer, that changes things. In that case, I'd recommend defining the buffer as const char * instead of const void *. That way, you can be sure the dereferencing of the array is working properly.
So to break down the expression:
*(uint32_t *) &buffer[index]
You're going index bytes into buffer: buffer[index]
Then taking the address of that byte: &buffer[index]
Then casting that address to a pointer to uint32_t: (uint32_t *) &buffer[index]
Then dereferencing that pointer to read a uint32_t value: *(uint32_t *) &buffer[index]
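The difference between the two readings (index as an element number versus index as a byte offset) can be made explicit with two small helpers. A minimal sketch, assuming the buffer holds values in the host's byte order; load_u32_at_byte and load_u32_at_element are hypothetical names, and memcpy is used in place of the pointer cast so that unaligned offsets are also safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read 4 bytes starting `offset` bytes into the buffer
// (the intent of *(uint32_t *) &buffer[offset] for a char buffer).
std::uint32_t load_u32_at_byte(const char* buffer, std::size_t offset)
{
    assert(buffer != nullptr);
    std::uint32_t value = 0;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}

// Read the index-th 32-bit element
// (the intent of ((uint32_t *)buffer)[index]).
std::uint32_t load_u32_at_element(const char* buffer, std::size_t index)
{
    return load_u32_at_byte(buffer, index * sizeof(std::uint32_t));
}
```

Element 1 and byte offset 4 name the same four bytes, which is why mixing up the two interpretations silently reads the wrong data instead of failing.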
Lots of issues here! First of all, a void * cannot be dereferenced. buffer[index] is illegal in ISO C, although some compilers apparently have an extension that will treat it as (void)((char *)buffer)[index].
You suggest in comments that the code originally used char * - I recommend you leave it that way. Assuming buffer returns to being const char *:
if (ntohl(*(int32_t *) buffer) != 'ttcf') { return; }
The intent here is to pretend that the first four bytes of buffer contain an integer; read that integer, and compare it to 'ttcf'. The latter is a multi-character constant, the behaviour of which is implementation-defined. It could represent four characters 't', 't', 'c', 'f', or 'f', 'c', 't', 't', or in fact anything else at all of type int.
A greater problem is that pretending a buffer contains an int when it did not actually get written via an expression of type int violates the strict aliasing rule. This is unfortunately a common technique in older code, but ever since the first C standard it has caused undefined behaviour. If you use a compiler that performs type-based aliasing optimization it could wreck your code.
A way to write this code avoiding both of those problems is:
if ( memcmp(buffer, "ttcf", 4) ) { return; }
The later line uint32_t ttf_count = ntohl(*(uint32_t *) &buffer[0x08]); has similar issues. In this case there is no doubt that the best fix is:
uint32_t ttf_count;
memcpy(&ttf_count, buffer + 0x08, sizeof ttf_count);
ttf_count = ntohl(ttf_count);
As discussed in comments, you could make an inline function to keep this tidy. In my own code I do something like:
static inline uint32_t be_to_uint32(void const *ptr)
{
unsigned char const *p = ptr;
return p[0] * 0x1000000ul + p[1] * 0x10000ul + p[2] * 0x100 + p[3];
}
and a similar version le_to_uint32 that reads bytes in the opposite order; then I use whichever of those corresponds to the input format instead of using ntohl.
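For readers compiling this as C++, where the implicit conversion from void const * needs an explicit cast, the pair of helpers might look like the sketch below. The be_to_uint32 body follows the function above; the le_to_uint32 body is an assumption about what the "similar version" looks like, since it is not shown in the answer.

```cpp
#include <cassert>
#include <cstdint>

// Big-endian reader: the first byte is the most significant one.
std::uint32_t be_to_uint32(const void* ptr)
{
    assert(ptr != nullptr);
    const unsigned char* p = static_cast<const unsigned char*>(ptr);
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8)  |  std::uint32_t{p[3]};
}

// Assumed little-endian counterpart: the first byte is the least significant one.
std::uint32_t le_to_uint32(const void* ptr)
{
    assert(ptr != nullptr);
    const unsigned char* p = static_cast<const unsigned char*>(ptr);
    return (std::uint32_t{p[3]} << 24) | (std::uint32_t{p[2]} << 16) |
           (std::uint32_t{p[1]} << 8)  |  std::uint32_t{p[0]};
}
```

Because the result is assembled from individual bytes, it is the same on every host regardless of the machine's own endianness or alignment rules.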

Convert char* to uint8_t

I transfer messages through the CAN protocol.
To do so, the CAN message needs data of uint8_t type. So I need to convert my char* to uint8_t. From my research on this site, I produced this code:
char* bufferSlidePressure = ui->canDataModifiableTableWidget->item(6,3)->text().toUtf8().data();//My char*
/* Conversion */
uint8_t slidePressure [8];
sscanf(bufferSlidePressure,"%c",
&slidePressure[0]);
As you can see, my char* must fit in slidePressure[0].
My problem is that even though there are no errors during compilation, the data in slidePressure is totally incorrect. Indeed, when I test it with a char* = 0, I get unknown characters... So I think the problem must come from the conversion.
My data can be Bool, Uchar, Ushort and float.
Thanks for your help.
Is your string an integer? E.g. char* bufferSlidePressure = "123";?
If so, I would simply do:
uint8_t slidePressure = (uint8_t)atoi(bufferSlidePressure);
Or, if you need to put it in an array:
slidePressure[0] = (uint8_t)atoi(bufferSlidePressure);
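Note that atoi gives no way to tell bad input from a genuine 0, and the cast silently wraps values above 255. A minimal sketch of a checked variant follows; parse_u8 is a hypothetical helper name, and strtol is swapped in for atoi because it reports where parsing stopped.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parse decimal text into a uint8_t.
// Returns false if the text is not a whole number or is outside 0..255.
bool parse_u8(const char* text, std::uint8_t& out)
{
    assert(text != nullptr);
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0') return false;  // no digits, or trailing junk
    if (errno == ERANGE || value < 0 || value > 255) return false;
    out = static_cast<std::uint8_t>(value);
    return true;
}
```

A caller would then write something like if (parse_u8(bufferSlidePressure, slidePressure[0])) and handle the failure case explicitly instead of sending a wrapped value over the bus.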
Edit: Following your comment, if your data could be anything, I guess you would have to copy it into the buffer of the new data type. E.g. something like:
/* in case you'd expect a float*/
float slidePressure;
memcpy(&slidePressure, bufferSlidePressure, sizeof(float));
/* in case you'd expect a bool*/
bool isSlidePressure;
memcpy(&isSlidePressure, bufferSlidePressure, sizeof(bool));
/*same thing for uint8_t, etc */
/* in case you'd expect char buffer, just a byte to byte copy */
char * slidePressure = new char[ size ]; // or a stack buffer
memcpy(slidePressure, (const char*)bufferSlidePressure, size ); // no sizeof, since sizeof(char)=1
uint8_t is 8 bits of memory, and can store values from 0 to 255
char is probably 8 bits of memory
char * is probably 32 or 64 bits of memory containing the address of a different place in memory in which there is a char
First, make sure you don't try to put the memory address (the char *) into the uint8 - put what it points to in:
char from;
char * pfrom = &from;
uint8_t to;
to = *pfrom;
Then work out what you are really trying to do ... because this isn't quite making sense. For example, a float is probably 32 or 64 bits of memory. If you think there is a float somewhere in your char * data you have a lot of explaining to do before we can help :/
char * is a pointer, not a single character. It is possible that it points to the character you want.
uint8_t is unsigned but on most systems will be the same size as a char and you can simply cast the value.
You may need to manage the memory and lifetime of what your function returns. This could be done with vector< unsigned char> as the return type of your function rather than char *, especially if toUtf8() has to create the memory for the data.
Your question is totally ambiguous.
ui->canDataModifiableTableWidget->item(6,3)->text().toUtf8().data();
That is a lot of cascading calls. We have no idea what any of them do and whether they are yours or not. It looks dangerous.
A safer example, the C++ way (needs <string> and <sstream>):
const char* bufferSlidePressure = "123";
std::string buffer(bufferSlidePressure);
std::stringstream stream;
stream << buffer;
int n = 0;
// convert to int
if (!(stream >> n)){
//could not convert
}
Also, if Boost is available:
int n = boost::lexical_cast<int>( buffer );

Python's struct.pack/unpack equivalence in C++

I used struct.pack in Python to transform a data into serialized byte stream.
>>> import struct
>>> struct.pack('i', 1234)
'\xd2\x04\x00\x00'
What is the equivalence in C++?
You'll probably be better off in the long run using a third party library (e.g. Google Protocol Buffers), but if you insist on rolling your own, the C++ version of your example might be something like this:
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> // htonl()/ntohl() on POSIX systems; on Windows they come from <winsock2.h>
int32_t myValueToPack = 1234; // or whatever
uint8_t myByteArray[sizeof(myValueToPack)];
int32_t bigEndianValue = htonl(myValueToPack); // convert the value to big-endian for cross-platform compatibility
memcpy(&myByteArray[0], &bigEndianValue, sizeof(bigEndianValue));
// At this point, myByteArray contains the "packed" data in network-endian (aka big-endian) format
The corresponding 'unpack' code would look like this:
// Assume at this point we have the packed array myByteArray, from before
int32_t bigEndianValue;
memcpy(&bigEndianValue, &myByteArray[0], sizeof(bigEndianValue));
int32_t theUnpackedValue = ntohl(bigEndianValue);
In real life you'd probably be packing more than one value, which is easy enough to do (by making the array size larger and calling htonl() and memcpy() in a loop -- don't forget to increase memcpy()'s first argument as you go, so that your second value doesn't overwrite the first value's location in the array, and so on).
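The loop described above might look like the following sketch. pack_u32s and unpack_u32s are hypothetical names, and the big-endian bytes are produced with shifts instead of htonl() plus memcpy(); the resulting layout is the same, but no sockets header is needed and the destination offset advances automatically as bytes are appended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append each value as 4 big-endian bytes.
std::vector<std::uint8_t> pack_u32s(const std::vector<std::uint32_t>& values)
{
    std::vector<std::uint8_t> out;
    out.reserve(values.size() * 4);
    for (std::uint32_t v : values) {
        out.push_back(static_cast<std::uint8_t>(v >> 24));
        out.push_back(static_cast<std::uint8_t>(v >> 16));
        out.push_back(static_cast<std::uint8_t>(v >> 8));
        out.push_back(static_cast<std::uint8_t>(v));
    }
    return out;
}

// Hypothetical inverse: rebuild the values from consecutive 4-byte groups.
std::vector<std::uint32_t> unpack_u32s(const std::vector<std::uint8_t>& bytes)
{
    assert(bytes.size() % 4 == 0);  // the input must hold whole 32-bit values
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < bytes.size(); i += 4) {
        out.push_back((std::uint32_t{bytes[i]} << 24) |
                      (std::uint32_t{bytes[i + 1]} << 16) |
                      (std::uint32_t{bytes[i + 2]} << 8) |
                       std::uint32_t{bytes[i + 3]});
    }
    return out;
}
```

Packing 1234 this way yields the bytes 00 00 04 D2, which is what Python's struct.pack('>I', 1234) produces.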
You'd also probably want to pack (aka serialize) different data types as well. uint8_t's (aka chars) and booleans are simple enough as no endian-handling is necessary for them -- you can just copy each of them into the array verbatim as a single byte. uint16_t's you can convert to big-endian via htons(), and convert back to native-endian via ntohs(). Floating point values are a bit tricky, since there is no built-in htonf(), but you can roll your own that will work on IEEE754-compliant machines:
uint32_t htonf(float f)
{
uint32_t x;
memcpy(&x, &f, sizeof(float));
return htonl(x);
}
.... and the corresponding ntohf() to unpack them:
float ntohf(uint32_t nf)
{
float x;
nf = ntohl(nf);
memcpy(&x, &nf, sizeof(float));
return x;
}
Lastly for strings you can just add the bytes of the string to the buffer (including the NUL terminator) via memcpy:
const char * s = "hello";
int slen = strlen(s);
memcpy(myByteArray, s, slen+1); // +1 for the NUL byte
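When the string is not the first field, it has to be written after the bytes already packed, not at the start of the array. A minimal sketch of that, assuming the packed data lives in a growable std::vector instead of the fixed myByteArray; append_cstring is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: append s and its terminating NUL after the bytes
// that are already in `out`.
void append_cstring(std::vector<std::uint8_t>& out, const char* s)
{
    assert(s != nullptr);
    const std::size_t len = std::strlen(s) + 1;  // +1 for the NUL byte
    const std::uint8_t* first = reinterpret_cast<const std::uint8_t*>(s);
    out.insert(out.end(), first, first + len);
}
```

The reader on the other side can then find the end of the string by scanning for the NUL byte before moving on to the next field.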
There isn't one. C++ doesn't have built-in serialization.
You would have to write individual objects to a byte array/vector, being careful about endianness (if you want your code to be portable).
https://github.com/karkason/cppystruct
#include "cppystruct.h"
// icmp_header can be any type that supports std::size and std::data and holds bytes
auto [type, code, checksum, p_id, sequence] = pystruct::unpack(PY_STRING("bbHHh"), icmp_header);
int leet = 1337;
auto runtimePacked = pystruct::pack(PY_STRING(">2i10s"), leet, 20, "String!");
// runtimePacked is an std::array filled with "\x00\x00\x059\x00\x00\x00\x10String!\x00\x00\x00"
// The format is "compiled" and has zero overhead in runtime
constexpr auto packed = pystruct::pack(PY_STRING("<2i10s"), 10, 20, "String!");
// packed is an std::array filled with "\x00\x01\x00\x00\x10\x00\x00\x00String!\x00\x00\x00"
You could check out Boost.Serialization, but I doubt you can get it to use the same format as Python's pack.
I was also looking for the same thing. Luckily I found https://github.com/mpapierski/struct
With a few additions you can add missing types to struct.hpp; I think it's the best so far.
To use it, just define your params like this:
DEFINE_STRUCT(test,
((2, TYPE_UNSIGNED_INT))
((20, TYPE_CHAR))
((20, TYPE_CHAR))
)
Then just call this function, which will be generated at compile time:
pack(unsigned int p1, unsigned int p2, const char * p3, const char * p4)
The number and type of parameters will depend on what you defined above.
The return type is a char* which contains your packed data.
There is also another unpack() function which you can use to read the buffer
You can use union to get different view into the same memory.
For example:
union Pack{
int i;
char c[sizeof(int)];
};
Pack p = {};
p.i = 1234;
std::string packed(p.c, sizeof(int)); // "\xd2\x04\x00\0"
As mentioned in the other answers, you have to notice the endianness.
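One caveat: in C++, reading a union member other than the one most recently written is formally undefined behaviour, even though many compilers support it as an extension. A memcpy-based sketch that gives the same native-endian bytes without relying on that extension is shown here; pack_int and unpack_int are hypothetical names.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: copy the object representation of value into a string.
std::string pack_int(int value)
{
    std::string packed(sizeof value, '\0');
    std::memcpy(&packed[0], &value, sizeof value);
    return packed;
}

// Hypothetical inverse: copy the bytes back into a real int object.
int unpack_int(const std::string& packed)
{
    assert(packed.size() == sizeof(int));
    int value = 0;
    std::memcpy(&value, packed.data(), sizeof value);
    return value;
}
```

As with the union, the byte order in the string is whatever the host uses, so it matches struct.pack('i', ...) only when both sides run on machines with the same endianness and int size.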

Proper Way To Initialize Unsigned Char*

What is the proper way to initialize unsigned char*? I am currently doing this:
unsigned char* tempBuffer;
tempBuffer = "";
Or should I be using memset(tempBuffer, 0, sizeof(tempBuffer)); ?
To "properly" initialize a pointer (unsigned char * as in your example), you need to do just a simple
unsigned char *tempBuffer = NULL;
If you want to initialize an array of unsigned chars, you can do either of following things:
unsigned char *tempBuffer = new unsigned char[1024]();
// and do not forget to delete it later
delete[] tempBuffer;
or
unsigned char tempBuffer[1024] = {};
I would also recommend to take a look at std::vector<unsigned char>, which you can initialize like this:
std::vector<unsigned char> tempBuffer(1024, 0);
The second method will leave you with a null pointer. Note that you aren't declaring any space for a buffer here, you're declaring a pointer to a buffer that must be created elsewhere. If you initialize it to "", that will make the pointer point to a static buffer with exactly one byte—the null terminator. If you want a buffer you can write characters into later, use Fred's array suggestion or something like malloc.
As it's a pointer, you either want to initialize it to NULL first like this:
unsigned char* tempBuffer = NULL;
unsigned char* tempBuffer = 0;
or assign an address of a variable, like so:
unsigned char c = 'c';
unsigned char* tempBuffer = &c;
EDIT:
If you wish to assign a string, this can be done as follows:
unsigned char myString [] = "This is my string";
unsigned char* tmpBuffer = &myString[0];
If you know the size of the buffer at compile time:
unsigned char buffer[SIZE] = {0};
For dynamically allocated buffers (buffers allocated during run-time or on the heap):
1. Prefer the new operator:
unsigned char * buffer = 0; // Pointer to a buffer, buffer not allocated.
buffer = new unsigned char [runtime_size];
2. Many solutions to "initialize" or fill with a simple value:
std::fill(buffer, buffer + runtime_size, 0); // Prefer to use STL
memset(buffer, 0, runtime_size);
for (size_t i = 0; i < runtime_size; ++i) buffer[i] = 0; // Using a loop (indexing, so buffer keeps pointing at the start)
3. The C language side provides allocation and initialization with one call.
However, the function does not call the object's constructors:
buffer = static_cast<unsigned char *>(calloc(runtime_size, sizeof(unsigned char))); // release with free(), not delete[]
Note that this also sets all bits in the buffer to zero; you don't get a choice in the initial value.
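If the buffer is allocated with new[], the matching delete[] is easy to forget. One possible way to get a zero-filled buffer that cleans up after itself is sketched below; make_zeroed_buffer is a hypothetical helper name, and it assumes C++11 or later for std::unique_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: allocate n bytes, all zero, owned by a smart pointer.
std::unique_ptr<unsigned char[]> make_zeroed_buffer(std::size_t n)
{
    assert(n > 0);
    // The trailing () value-initializes the array, which zeroes every byte.
    return std::unique_ptr<unsigned char[]>(new unsigned char[n]());
}
```

The array specialization of unique_ptr calls delete[] when the owner goes out of scope, and buffer.get() yields the raw unsigned char* for APIs that expect one.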
It depends on what you want to achieve (e.g. do you ever want to modify the string). See e.g. http://c-faq.com/charstring/index.html for more details.
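Note that if you declare a pointer to a string literal, it should be const. In C++ a string literal is an array of const char, so pointing an unsigned char pointer at it also needs a cast, i.e.:
const unsigned char *tempBuffer = reinterpret_cast<const unsigned char *>("");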
If the plan is for it to be a buffer and you want to move it later to point to something, then initialise it to NULL until it really points somewhere to which you want to write, not an empty string.
unsigned char * tempBuffer = NULL;
std::vector< unsigned char > realBuffer( 1024 );
tempBuffer = &realBuffer[0]; // now it really points to writable memory
memcpy( tempBuffer, someStuff, someSizeThatFits );
The answer depends on what you intend to use the unsigned char for. A char is nothing else but a small integer, which is of size 8 bits on 99% of all implementations.
C happens to have some string support that fits well with char, but that doesn't limit the usage of char to strings.
The proper way to initialize a pointer depends on 1) its scope and 2) its intended use.
If the pointer is declared static, and/or declared at file scope, then ISO C/C++ guarantees that it is initialized to NULL. Programming style purists would still set it to NULL to keep their style consistent with local scope variables, but theoretically it is pointless to do so.
As for what to initialize it to... set it to NULL. Don't set it to point at "", because that will allocate a static dummy byte containing a null termination, which will become a tiny little static memory leak as soon as the pointer is assigned to something else.
One may question why you need to initialize it to anything at all in the first place. Just set it to something valid before using it. If you worry about using a pointer before giving it a valid value, you should get a proper static analyzer to find such simple bugs. Even most compilers will catch that bug and give you a warning.

Reinterpret float vector as unsigned char array and back

I've searched and searched stackoverflow for the answer, but have not found what I needed.
I have a routine that takes an unsigned char array as a parameter in order to encode it as Base64. I would like to encode an STL float vector (vector<float>) in Base64, and therefore would need to reinterpret the bytes in the float vector as an array of unsigned characters in order to pass it to the encode routine. I have tried a number of things from reinterpret and static casts, to mem copies, etc, but none of them seem to work (at least not the way I implemented them).
Likewise, I'll need to do the exact opposite when decoding the encoded data back to a float array. The decode routine will provide the decoded data as an unsigned char array, and I will need to reinterpret that array of bytes, converting it to a float vector again.
Here is a stripped down version of my C++ code to do the encoding:
std::string
EncodeBase64FloatVector( const vector<float>& p_vector )
{
unsigned char* sourceArray;
// SOMEHOW FILL THE sourceArray WITH THE FLOAT VECTOR DATA BITS!!
char* target;
size_t targetSize = p_vector.size() * sizeof(float);
target = new char[ targetSize ];
int result = EncodeBase64( sourceArray, floatArraySizeInUChars, target, targetSize );
string returnResult;
if( result != -1 )
{
returnResult = target;
}
delete target;
delete sourceArray;
return returnResult;
}
Any help would be greatly appreciated. Thanks.
Raymond.
std::vector guarantees the data will be contiguous, and you can get a pointer to the first element in the vector by taking the address of the first element (assuming it's not empty).
typedef unsigned char byte;
std::vector<float> original_data;
...
if (!original_data.empty()) {
const float *p_floats = &(original_data[0]); // parens for clarity
Now, to treat that as an array of unsigned char, you use a reinterpret_cast:
const byte *p_bytes = reinterpret_cast<const byte *>(p_floats);
// pass p_bytes to your base-64 encoder
}
You might want to encode the length of the vector before the rest of the data, in order to make it easier to decode them.
CAUTION: You still have to worry about endianness and representation details. This will only work if you read back on the same platform (or a compatible one) that you wrote with.
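For the decoding direction, the bytes have to end up in real float objects again, and copying is the safe way to do that. A minimal sketch of both directions, under the same-platform assumption stated above; floats_to_bytes and bytes_to_floats are hypothetical names, and the Base64 step is left out.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the raw bytes of the floats into a byte vector.
std::vector<unsigned char> floats_to_bytes(const std::vector<float>& values)
{
    std::vector<unsigned char> bytes(values.size() * sizeof(float));
    if (!values.empty()) {
        std::memcpy(bytes.data(), values.data(), bytes.size());
    }
    return bytes;
}

// Hypothetical inverse: copy decoded bytes back into float objects.
std::vector<float> bytes_to_floats(const std::vector<unsigned char>& bytes)
{
    assert(bytes.size() % sizeof(float) == 0);  // must hold whole floats
    std::vector<float> values(bytes.size() / sizeof(float));
    if (!bytes.empty()) {
        std::memcpy(values.data(), bytes.data(), bytes.size());
    }
    return values;
}
```

The decoded length divided by sizeof(float) gives the element count, which is one reason to store the vector length (or at least validate the byte count) alongside the encoded data.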
const unsigned char* sourceArray = reinterpret_cast<const unsigned char *>(&(p_vector[0]));
I would highly recommend checking out Google's protobuf to solve your problem. Floats and doubles can vary in size and layout between platforms and that package has solved all those problems for you. Additionally, it can easily handle your data structure should it ever become more complicated than a simple array of floats.
If you do use that, you will have to do your own base64 encoding still as protobuf encodes data assuming you have an 8-bit clean channel to work with. But that's fairly trivial.