What is *(uint32_t *) &buffer[index]? - c++

In some supposed-C++ code I found, I have buffer defined as const void *buffer; (it's arbitrary binary data that, I think, gets interpreted as a stream of 32-bit unsigned integers) and in many places, I have
*(uint32_t *) &buffer[index]
where index is some kind of integer (I think it was long or unsigned long and got swept up in my replacing those with int32_t and uint32_t when I was making the code work on a 64-bit system).
I recognize that this is taking the address of buffer (&buffer), casting it as a pointer to a uint32_t, and dereferencing that, at least based on this question... but then I'm confused by how the [index] part interacts with that or where I missed inserting the [index] part in between the steps I listed.
What, conceptually, is this doing? Is there some way I could define another variable to be a better type, with the casting there once, and then use that, rather than having this complicated expression throughout the code? Is this actually C++ or is this C99?
edit: The first couple of lines of the code are:
const void *buffer = data.bytes;
if (ntohl(*(int32_t *) buffer) != 'ttcf') {
return;
}
uint32_t ttf_count = ntohl(*(uint32_t *) &buffer[0x08]);
where data.bytes has type const void *. Before I was getting buffer from data.bytes, it was char *.
edit 2: Apparently, having const void *buffer work is not normal C (though it absolutely works in my situation), so if it makes more sense, assume it's const char *buffer.

Putting parentheses in place to make the order of operations more explicit:
*((uint32_t *) &(buffer[index]))
So you're treating buffer as an array; however, because buffer is a void *, you can't index or dereference it directly.
Assuming you want to treat this buffer as an array of uint32_t, what you want to do is this:
((uint32_t *)buffer)[index]
Which can also be written as:
*((uint32_t *)buffer + index)
EDIT:
If index is the byte offset in the buffer, that changes things. In that case, I'd recommend defining the buffer as const char * instead of const void *. That way, you can be sure the dereferencing of the array is working properly.
So to break down the expression:
*(uint32_t *) &buffer[index]
You're going index bytes into buffer: buffer[index]
Then taking the address of that byte: &buffer[index]
Then casting that address to a pointer to uint32_t: (uint32_t *) &buffer[index]
Then dereferencing that pointer to read a uint32_t value: *(uint32_t *) &buffer[index]
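To make the difference between an element index and a byte offset concrete, here is a minimal sketch. The helper names read_u32_at_byte and read_u32_at_element are hypothetical (they are not in the original code), buffer is assumed to be a const char *, and the sketch uses memcpy instead of the pointer cast so that it does not depend on alignment. Values are read in host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads the 4 bytes that start `offset` bytes into buffer.
// This is what *(uint32_t *) &buffer[index] means when buffer is a char pointer.
inline std::uint32_t read_u32_at_byte(const char *buffer, std::size_t offset)
{
    std::uint32_t value;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}

// Hypothetical helper: treats buffer as an array of uint32_t and reads element
// number `index`. This is what ((uint32_t *)buffer)[index] means.
inline std::uint32_t read_u32_at_element(const char *buffer, std::size_t index)
{
    return read_u32_at_byte(buffer, index * sizeof(std::uint32_t));
}
```

With these, read_u32_at_byte(buffer, 4) and read_u32_at_element(buffer, 1) read the same four bytes, while read_u32_at_byte(buffer, 1) reads a window shifted by a single byte.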

Lots of issues here! First of all, a void * cannot be dereferenced. buffer[index] is illegal in ISO C, although some compilers apparently have an extension that will treat it as (void)((char *)buffer)[index].
You suggest in comments that the code originally used char * - I recommend you leave it that way. Assuming buffer returns to being const char *:
if (ntohl(*(int32_t *) buffer) != 'ttcf') { return; }
The intent here is to pretend that the first four bytes of buffer contain an integer; read that integer, and compare it to 'ttcf'. The latter is a multibyte character constant, the behaviour of which is implementation-defined. It could represent four characters 't', 't', 'c', 'f', or 'f', 'c', 't', 't', or in fact anything else at all of type int.
A greater problem is that pretending a buffer contains an int when it did not actually get written via an expression of type int violates the strict aliasing rule. This is unfortunately a common technique in older code, but ever since the first C standard it has caused undefined behaviour. If you use a compiler that performs type-based aliasing optimization, it could wreck your code.
A way to write this code avoiding both of those problems is:
if ( memcmp(buffer, "ttcf", 4) ) { return; }
The later line uint32_t ttf_count = ntohl(*(uint32_t *) &buffer[0x08]); has similar issues. In this case there is no doubt that the best fix is:
uint32_t ttf_count;
memcpy(&ttf_count, buffer + 0x08, sizeof ttf_count);
ttf_count = ntohl(ttf_count);
As discussed in comments, you could make an inline function to keep this tidy. In my own code I do something like:
static inline uint32_t be_to_uint32(void const *ptr)
{
unsigned char const *p = ptr;
return p[0] * 0x1000000ul + p[1] * 0x10000ul + p[2] * 0x100 + p[3];
}
and a similar version le_to_uint32 that reads bytes in the opposite order; then I use whichever of those corresponds to the input format instead of using ntohl.
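For illustration, the pair of readers might look like the sketch below when compiled as C++. The helper as written above compiles as C; C++ needs an explicit cast from void const *. The shifts are just another way to write the multiplications, and has_ttcf_tag is a hypothetical name for the memcmp check shown earlier.

```cpp
#include <cstdint>
#include <cstring>

// Big-endian read: the first byte is the most significant.
inline std::uint32_t be_to_uint32(const void *ptr)
{
    const unsigned char *p = static_cast<const unsigned char *>(ptr);
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Little-endian read: the same bytes, taken in the opposite order.
inline std::uint32_t le_to_uint32(const void *ptr)
{
    const unsigned char *p = static_cast<const unsigned char *>(ptr);
    return (std::uint32_t{p[3]} << 24) | (std::uint32_t{p[2]} << 16) |
           (std::uint32_t{p[1]} << 8) | std::uint32_t{p[0]};
}

// Hypothetical helper: true if the buffer starts with the 4-byte tag "ttcf".
inline bool has_ttcf_tag(const void *buffer)
{
    return std::memcmp(buffer, "ttcf", 4) == 0;
}
```

The two lines from the question would then read roughly as if (!has_ttcf_tag(buffer)) return; followed by uint32_t ttf_count = be_to_uint32(buffer + 0x08);, assuming buffer is a const char *.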

Related

cast to pointer from integer of different size when converting uint64_t to bytes

[EDIT] I want to write a uint64_t to a char* array in network byte order, to send it as a UDP datagram with sendto. A uint64_t has 8 bytes, so I convert them as follows:
void strcat_number(uint64_t v, char* datagram) {
uint64_t net_order = htobe64(v);
for (uint8_t i=0; i<8 ;++i) {
strcat(datagram, (const char*)((uint8_t*)&net_order)[i]);
}
}
which gives me
warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
strcat(datagram, (const char*)((uint8_t*)&net_order)[i]);
how can I get rid of this warning or maybe do this number converting simpler or clearer?
((uint8_t*)&net_order)
this is a pointer to net_order cast to a uint8_t pointer
((uint8_t*)&net_order)[i]
this is the i-th byte of the underlying representation of net_order.
(const char*)((uint8_t*)&net_order)[i]
this takes that byte value and brutally casts it to a const char *, i.e. it treats the value of the byte as if it were an address. The result is an invalid pointer, and that is what the compiler is warning you about; even just creating this pointer is undefined behavior, and using it in any way will almost surely result in a crash.
Notice that, even if you somehow managed to make this kludge work, strcat is still the wrong function, as it deals with NUL-terminated strings, while here you are trying to put binary data inside your buffer, and binary data can naturally contain embedded NULs. strcat will append at the first NUL (and stop at the first NUL in the second parameter) instead of at the "real" end.
If you are building a buffer of binary data you have to use straight memcpy, and most importantly you cannot use string-related functions that rely on the final NUL to know where the string ends, but you have to keep track explicitly of how many bytes you used (i.e. the current position in the datagram).
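As a rough sketch of keeping track of the position explicitly: the helper below (append_u64_be is a hypothetical name) writes the eight bytes most significant first and returns the new write position. It uses shifts rather than htobe64 plus memcpy, so it does not depend on a non-standard header; the caller is assumed to guarantee that the buffer has room for pos + 8 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: stores v into datagram[pos .. pos+7] in network
// (big-endian) byte order and returns the position just past the written bytes.
// The caller must make sure the buffer holds at least pos + 8 bytes.
inline std::size_t append_u64_be(unsigned char *datagram, std::size_t pos,
                                 std::uint64_t v)
{
    for (int shift = 56; shift >= 0; shift -= 8) {
        datagram[pos++] = static_cast<unsigned char>((v >> shift) & 0xFF);
    }
    return pos;
}
```

The returned position is also the number of bytes to hand to sendto once the datagram is complete, which is exactly the information strcat cannot give you for binary data.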

Convert integer to char array and then convert it back

I am a little confused on how casts work in C++.
I have a 4 bytes integer which I need to convert to a char[32] and then convert it back in some other function.
I am doing the following :
uint32_t v = 100;
char ch[32]; // This is 32 bytes reserved memory
memcpy(ch,&v,4);
uint32_t w = *(reinterpret_cast<int*>(ch)); // w should be equal to v
I am getting the correct results on my compiler, but I want to make sure if this is a correct way to do it.
Technically, no. You are at risk of falling foul of your CPU's alignment rules, if it has any.
You may alias an object byte-by-byte using char*, but you can't take an actual char array (no matter where its values came from) and pretend it's some other object.
You will see that reinterpret_cast<int*> method a lot, and on many systems it will appear to work. However, the "proper" method (if you really need to do this at all) is:
const auto INT_SIZE = sizeof(int);
char ch[INT_SIZE] = {};
// Convert to char array
const int x = 100;
std::copy(
reinterpret_cast<const char*>(&x),
reinterpret_cast<const char*>(&x) + INT_SIZE,
&ch[0]
);
// Convert back again
int y = 0;
std::copy(
&ch[0],
&ch[0] + INT_SIZE,
reinterpret_cast<char*>(&y)
);
Notice that I only ever pretend an int is a bunch of chars, never the other way around.
Notice also that I have swapped your memcpy for type-safe std::copy (although since we're nuking the types anyway, that's sort of by-the-by).
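The same round trip can be wrapped in a pair of small helpers. This is only a sketch: to_bytes and from_bytes are hypothetical names, and it uses memcpy (as the question did) rather than std::copy. Like the code above, it only ever copies an object's bytes into and out of a char array and never reads the array through an int pointer.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of value into a char array.
template <typename T>
std::array<char, sizeof(T)> to_bytes(const T &value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    std::array<char, sizeof(T)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

// Hypothetical helper: rebuilds a T by copying the bytes back into a real T object.
template <typename T>
T from_bytes(const std::array<char, sizeof(T)> &bytes)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

Usage would be along the lines of auto bytes = to_bytes(v); followed by std::uint32_t w = from_bytes<std::uint32_t>(bytes);. The template argument has to be spelled out on the way back because it cannot be deduced from the array size.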

Taking an index out of const char* argument

I have the following code:
int some_array[256] = { ... };
int do_stuff(const char* str)
{
int index = *str;
return some_array[index];
}
Apparently the above code causes a bug on some platforms, because *str can in fact be negative.
So I thought of two possible solutions:
Casting the value on assignment (unsigned int index = (unsigned char)*str;).
Passing const unsigned char* instead.
Edit: The rest of this question did not get a treatment, so I moved it to a new thread.
The signedness of char is indeed platform-dependent, but what you do know is that there are as many values of char as there are of unsigned char, and the conversion is injective. So you can absolutely cast the value to associate a lookup index with each character:
unsigned char idx = *str;
return arr[idx];
You should of course make sure that the arr has at least UCHAR_MAX + 1 elements. (This may cause hilarious edge cases when sizeof(unsigned long long int) == 1, which is fortunately rare.)
Characters are allowed to be signed or unsigned, depending on the platform. An assumption of unsigned range is what causes your bug.
Your do_stuff code does not treat const char* as a string representation. It uses it as a sequence of byte-sized indexes into a look-up table. Therefore, there is nothing wrong with forcing unsigned char type on the characters of your string inside do_stuff (i.e. use your solution #1). This keeps re-interpretation of char as an index localized to the implementation of do_stuff function.
Of course, this assumes that other parts of your code do treat str as a C string.
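Put together, solution #1 might look like this sketch. The table contents are placeholders, and sizing the table as UCHAR_MAX + 1 is an assumption that follows the advice above rather than the original 256.

```cpp
#include <climits>

// Placeholder table with one entry for every possible unsigned char value.
static int some_array[UCHAR_MAX + 1] = {};

int do_stuff(const char *str)
{
    // Convert to unsigned char first, so the index is always in [0, UCHAR_MAX]
    // regardless of whether plain char is signed on this platform.
    const unsigned char index = static_cast<unsigned char>(*str);
    return some_array[index];
}
```

The function signature stays the same, so callers that treat str as an ordinary C string are unaffected.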

String encryption function works with char[], but not a plain string

I'm using a version of XTEA encryption from Wikipedia that's written in C++. I wrote a function to encrypt a string:
const char* charPtrDecrypt(const char* encString, int len, bool encrypt)
{
/********************************************************
* This currently uses a hard-coded key, but I'll implement
* a dynamic key based on string length or something.
*********************************************************/
unsigned int key[4] = { 0xB5D1, 0x22BA, 0xC2BC, 0x9A4E };
int n_blocks=len/BLOCK_SIZE;
if (len%BLOCK_SIZE != 0)
++n_blocks;
for (int i = 0; i < n_blocks; i++)
{
if (encrypt)
xtea::Encrypt(32, (uint32_t*)(encString + (i*BLOCK_SIZE)), key);
else
xtea::Decrypt(32, (uint32_t*)(encString + (i*BLOCK_SIZE)), key);
}
return encString;
}
It works when I supply a const char encString[] = "Hello, World!", but when I supply a raw string, e.g. const char* a = charPtrDecrypt("Hello, World!", 14, true), it crashes.
There's an old saying (I know it's old, because I first posted it to Usenet around 1992 or so) that: "If you lie to the compiler, it will get its revenge." That's what's happening here.
Here:
const char* charPtrDecrypt(const char* encString, int len, bool encrypt)
...you promise that you will not modify the characters that encString points at. That's what the const says/means/does.
Here, however:
xtea::Encrypt(32, (uint32_t*)(encString + (i*BLOCK_SIZE)), key);
...you cast away that constness (cast to uint32_t *, with no const qualifier), and pass the pointer to a function that modifies the buffer it points at.
Then the compiler gets its revenge: it allows you to pass a pointer to data you can't modify, because you promise not to modify it--but then when you turn around and try to modify it anyway, your program crashes and burns because you try to modify read-only data.
This can be avoided in any number of ways. One would be to get away from the relatively low-level constructs you're using now, and pass/return std::strings instead of pointers to [const] char.
The code has still more problems than just that though. For one thing, it treats the input as a block of uint32_t items, and rounds its view of the length up to the next multiple of the size of a uint32_t (typically 4). Unfortunately, it doesn't actually change the size of the buffer, so even when the buffer is writable, it doesn't really work correctly--it still reads and writes beyond the end of the buffer.
Here again, std::string will be helpful: it lets us resize the string up to the correct size instead of just reading/writing past the end of the fixed-size buffer.
Along with that, there's a fact the compiler won't care about, but you (and any reader of this code) will (or at least should): the name of the function is misleading, and it has parameters whose meaning isn't at all apparent--particularly the Boolean that governs whether to encrypt or decrypt. I'd advise using an enumeration instead, and renaming the function to something that can encompass either encryption or decryption.
Finally, I'd move the if statement that determines whether to encrypt or decrypt outside the loop, since we aren't going to change from one to the other as we process one input string.
Taking all those into account, we could end up with code something like this:
// needs <string> and <cstdint>; xtea and BLOCK_SIZE come from the question's code
enum direction { ENCRYPT, DECRYPT };
std::string xtea_process(std::string enc_string, direction d) {
    unsigned int key[4] = { 0xB5D1, 0x22BA, 0xC2BC, 0x9A4E };
    size_t len = enc_string.size();
    // round up to the next multiple of BLOCK_SIZE
    len = (len + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
    enc_string.resize(len); // enlarge the string to that size, if necessary
    if (d == DECRYPT) {
        for (size_t i = 0; i < len; i += BLOCK_SIZE)
            xtea::Decrypt(32, reinterpret_cast<uint32_t *>(&enc_string[i]), key);
    } else {
        for (size_t i = 0; i < len; i += BLOCK_SIZE)
            xtea::Encrypt(32, reinterpret_cast<uint32_t *>(&enc_string[i]), key);
    }
    return enc_string;
}
This does still leave (at least) one point that I haven't bothered to deal with: some machines may have stricter alignment requirements for a uint32_t than for char, and it's theoretically possible that the buffer used in a string won't meet those stricter alignment requirements. You could run into a situation where you need to copy the data out of the string, into a buffer that's properly aligned for uint32_t access, do the encryption/decryption, then copy the result back.
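A sketch of that copy-out, process, copy-back approach follows. The helper name process_blocks_aligned is hypothetical, and the sketch assumes an 8-byte block (two uint32_t words, which is what XTEA operates on) and a string whose size is already a multiple of that. The callable fn stands in for a call such as xtea::Encrypt(32, words, key).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: runs fn on every 8-byte block of data through a
// temporary that is correctly aligned for uint32_t access.
// Assumes data.size() is already a multiple of 8.
template <typename BlockFn>
void process_blocks_aligned(std::string &data, BlockFn fn)
{
    const std::size_t block_size = 2 * sizeof(std::uint32_t);
    for (std::size_t i = 0; i + block_size <= data.size(); i += block_size) {
        std::uint32_t words[2];                    // aligned for uint32_t
        std::memcpy(words, &data[i], block_size);  // copy out of the string
        fn(words);                                 // process the block in place
        std::memcpy(&data[i], words, block_size);  // copy the result back
    }
}
```

The extra copies are small, and on platforms without strict alignment requirements a compiler can usually optimize them away.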
You pass a constant const char* to the function but cast it to a non-constant uint32_t*. I guess that xtea::Encrypt modifies the string buffer in place.
In the first version const char encString[] = "Hello, World!" the variable --while being const-- most likely lies on the stack which is modifiable. So it's "not nice" to remove the const, but it works.
In the second version your string most likely lies in a read-only data segment. So casting away const lets you call the Encrypt function, but it crashes as soon as the function really tries to modify the string.

Convert char* to uint8_t

I transfer messages through a CAN protocol.
To do so, the CAN message needs data of uint8_t type, so I need to convert my char* to uint8_t. From my research on this site, I produced this code:
char* bufferSlidePressure = ui->canDataModifiableTableWidget->item(6,3)->text().toUtf8().data();//My char*
/* Conversion */
uint8_t slidePressure [8];
sscanf(bufferSlidePressure,"%c",
&slidePressure[0]);
As you can see, my char* must fit in slidePressure[0].
My problem is that even though I get no errors during compilation, the data in slidePressure is totally incorrect. Indeed, when I test it with a char* = 0, I get unknown characters... So I think the problem must come from the conversion.
My data can be Bool, Uchar, Ushort and float.
Thanks for your help.
Is your string an integer? E.g. char* bufferSlidePressure = "123";?
If so, I would simply do:
uint8_t slidePressure = (uint8_t)atoi(bufferSlidePressure);
Or, if you need to put it in an array:
slidePressure[0] = (uint8_t)atoi(bufferSlidePressure);
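One caveat with atoi is that it cannot report failure: it returns 0 when the text is not a number, and the cast silently wraps values above 255. If that matters, a range-checked variant could look like the sketch below, where parse_u8 is a hypothetical helper name and strtol is swapped in for atoi.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parses a decimal string into a uint8_t.
// Returns false if the text is not a plain number or does not fit in 0..255.
inline bool parse_u8(const char *text, std::uint8_t &out)
{
    char *end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;  // nothing parsed, trailing junk, or out of long's range
    }
    if (value < 0 || value > 255) {
        return false;  // does not fit in a uint8_t
    }
    out = static_cast<std::uint8_t>(value);
    return true;
}
```

It would be used as if (parse_u8(bufferSlidePressure, slidePressure[0])) { ... }, leaving the array untouched when the text is invalid.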
Edit: Following your comment, if your data could be anything, I guess you would have to copy it into the buffer of the new data type. E.g. something like:
/* in case you'd expect a float*/
float slidePressure;
memcpy(&slidePressure, bufferSlidePressure, sizeof(float));
/* in case you'd expect a bool*/
bool isSlidePressure;
memcpy(&isSlidePressure, bufferSlidePressure, sizeof(bool));
/*same thing for uint8_t, etc */
/* in case you'd expect char buffer, just a byte to byte copy */
char * slidePressure = new char[ size ]; // or a stack buffer
memcpy(slidePressure, (const char*)bufferSlidePressure, size ); // no sizeof, since sizeof(char)=1
uint8_t is 8 bits of memory, and can store values from 0 to 255
char is probably 8 bits of memory
char * is probably 32 or 64 bits of memory containing the address of a different place in memory in which there is a char
First, make sure you don't try to put the memory address (the char *) into the uint8 - put what it points to in:
char from;
char * pfrom = &from;
uint8_t to;
to = *pfrom;
Then work out what you are really trying to do ... because this isn't quite making sense. For example, a float is probably 32 or 64 bits of memory. If you think there is a float somewhere in your char * data you have a lot of explaining to do before we can help :/
char * is a pointer, not a single character. It is possible that it points to the character you want.
uint8_t is unsigned but on most systems will be the same size as a char and you can simply cast the value.
You may need to manage the memory and lifetime of what your function returns. This could be done with vector< unsigned char> as the return type of your function rather than char *, especially if toUtf8() has to create the memory for the data.
Your question is totally ambiguous.
ui->canDataModifiableTableWidget->item(6,3)->text().toUtf8().data();
That is a lot of cascading calls. We have no idea what any of them do and whether they are yours or not. It looks dangerous.
A safer example, done the C++ way:
#include <sstream>
#include <string>
const char* bufferSlidePressure = "123";
std::string buffer(bufferSlidePressure);
std::stringstream stream;
stream << buffer;
int n = 0;
// convert to int
if (!(stream >> n)){
//could not convert
}
Also, if Boost is available:
int n = boost::lexical_cast<int>(buffer);