I'm wondering why, in my sample code, converting a pointer to the first element of m_arr into a pointer of a bigger type makes the program read the memory into m_val in little-endian byte order. With this way of thinking, *(std::uint8_t*)m_arr should point to 0x38, but it doesn't.
My CPU uses little-endian byte order.
#include <iostream>
#include <iomanip>
#include <cstdint>

int main() {
    std::uint8_t m_arr[2] = { 0x5a, 0x38 };
    // explain why m_val is 0x385a and not 0x5a38
    std::uint16_t m_val = *(std::uint16_t*)m_arr;
    std::cout << std::hex << m_val << std::endl;
    return 0;
}
Byte ordering (endianness) describes the order in which the bytes of a value are laid out in memory when it is stored as its native multi-byte type. Regardless of whether your machine is big- or little-endian, a plain sequence of bytes is always in its natural order: element i of the array is simply the i-th byte in memory.
The situation you describe (where the first byte is 0x38) is what you would observe if you created a uint16_t and got a uint8_t pointer to it. Instead, you have a uint8_t array and you get a uint16_t pointer to it.
Little endian means that the least significant byte comes first in memory (at the lowest address).
So translate that logic to your array, { 0x5a, 0x38 }. On a little-endian system, the 0x5a is least significant and the 0x38 is most significant... hence you get 0x385a.
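If you want a value that does not depend on the byte order of the host at all, assemble it from the array yourself. A minimal sketch (the memcpy line assumes a little-endian host, like yours):

#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    std::uint8_t m_arr[2] = { 0x5a, 0x38 };

    // Build the value explicitly: m_arr[0] is taken as the low byte,
    // so the result is 0x385a regardless of host endianness.
    std::uint16_t shifted = static_cast<std::uint16_t>(m_arr[0]) |
                            static_cast<std::uint16_t>(m_arr[1]) << 8;

    // The memcpy equivalent of the pointer cast: on a little-endian host
    // this also gives 0x385a (and avoids the aliasing cast).
    std::uint16_t copied;
    std::memcpy(&copied, m_arr, sizeof copied);

    std::cout << std::hex << shifted << ' ' << copied << '\n'; // 385a 385a
    return 0;
}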
Related
Now there is an unsigned char bytes[4], and I already know that the byte array was generated from an int in C++. How can I convert the array back to an int?
You can do that using std::memcpy():
#include <iostream>
#include <cstring>

int main() {
    unsigned char bytes[4]{ 0xdd, 0xcc, 0xbb, 0xaa };
    int value;
    std::memcpy(&value, bytes, sizeof(int));   // copy the raw bytes into the int
    std::cout << std::hex << value << '\n';    // prints aabbccdd on a little-endian machine
}
I already know that the byte array was generated from an int in C++.
It is crucial to know how the array is generated from an int. If the array was generated by simply copying the bytes on the same CPU, then you can convert back by simply copying:
int value;
assert(sizeof value == sizeof bytes);
std::memcpy(&value, bytes, sizeof bytes);
However, if the array may follow another representation than what your CPU uses (for example, if you've received the array from another computer, over the network), then you must convert the representation. In order to convert the representation, you must know what representation the source data follows.
Theoretically, you would need to handle different sign representations, but in practice, 2's complement is fairly ubiquitous. A consideration that is actually relevant in practice is the byte-endianness.
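For example, if you know the bytes were written most significant byte first (network byte order), here is a sketch of a conversion that gives the same result on any host (the byte values are just an illustration):

#include <cstdint>
#include <iostream>

int main() {
    // Assume these four bytes arrived most significant byte first.
    unsigned char bytes[4]{ 0x00, 0xa8, 0x4f, 0x00 };

    // Shifts operate on values, not on memory layout, so this yields the
    // same result on little-endian and big-endian machines alike.
    std::uint32_t value = (std::uint32_t(bytes[0]) << 24) |
                          (std::uint32_t(bytes[1]) << 16) |
                          (std::uint32_t(bytes[2]) <<  8) |
                           std::uint32_t(bytes[3]);

    std::cout << value << '\n'; // 11030272
    return 0;
}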
I'm trying to convert a BYTE array into an equivalent unsigned long long int value, but my code is not working as expected. Please help me fix it or suggest an alternative method.
Extra information: the 4 bytes are combined into a hexadecimal number, and its equivalent decimal number is the output. For a given byteArray = {0x00, 0xa8, 0x4f, 0x00}, the hexadecimal number is 00a84f00 and its equivalent decimal number is 11030272.
#include <iostream>
#include <string>
#include <cstdlib>
#include <cstdio>

typedef unsigned char BYTE;

int main(int argc, char *argv[])
{
    BYTE byteArray[4] = { 0x00, 0x08, 0x00, 0x00 };
    std::string str(reinterpret_cast<char*>(&byteArray[0]), 4);
    std::cout << str << std::endl;
    unsigned long long ull = std::strtoull(str.c_str(), NULL, 0);
    printf("The decimal equivalents are: %llu", ull);
    return EXIT_SUCCESS;
}
I'm getting the following output:
The decimal equivalents are: 0
While the expected output was:
The decimal equivalents are: 2048
When you call std::strtoull(str.c_str(), NULL, 0), the first argument it receives is effectively an empty string: a C string is a null-terminated sequence of characters, and the very first byte of your array is 0x00.
Second, std::strtoull() does not operate on raw byte sequences; it parses the textual representation of a number. For example, std::strtoull("2048", NULL, 10) gives you 2048.
Another thing to note is that unsigned long long is a 64-bit data type, whereas your byte array only provides 32 bits. You need to fill the other 4 bytes with zero to get the correct result. I use a direct assignment, but you could also use std::memset() here.
What you want to do is:
ull = 0ULL;
std::memcpy(&ull, byteArray, 4);
Given that your platform is little-endian, the result will be 2048.
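Putting those pieces together, a corrected sketch of your program (assuming, as above, a little-endian host) could look like this:

#include <cstdio>
#include <cstring>

typedef unsigned char BYTE;

int main()
{
    BYTE byteArray[4] = { 0x00, 0x08, 0x00, 0x00 };

    unsigned long long ull = 0ULL;      // zero the upper 4 bytes first
    std::memcpy(&ull, byteArray, 4);    // copy the 4 input bytes into the low half

    std::printf("The decimal equivalents are: %llu\n", ull); // 2048 on a little-endian host
    return 0;
}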
What you first must remember is that a string is really a null-terminated string. Secondly, a string is a sequence of characters, which is not what you have. The third problem is that you have an array of four bytes, which corresponds to an unsigned 32-bit integer, whereas you want an (at least) 64-bit type, which is 8 bytes.
You can solve all these problems with a temporary variable, a simple call to std::memcpy, and an assignment:
uint32_t temp;
std::memcpy(&temp, byteArray, 4);
ull = temp;
Of course, this assumes that the endianness is correct.
Note that I use std::memcpy instead of std::copy (or std::copy_n) because std::memcpy is explicitly allowed to bypass strict aliasing in this way, while I don't think the std::copy functions are. Also, the std::copy functions are intended more for copying elements than anonymous bytes (even though they can do that too, just with a clunkier syntax).
Given the answers are using std::memcpy, I want to point out that there's a more idiomatic way of doing this operation:
char byteArray[] = { 0x00, 0x08, 0x00, 0x00 };
uint32_t cp;
std::copy(byteArray, byteArray + sizeof(cp), reinterpret_cast<char*>(&cp));
std::copy is similar to std::memcpy, but is the C++ way of doing it.
Note that you need to cast the address of the output variable cp to one of: char *, unsigned char *, signed char *, or std::byte *, because otherwise the operation wouldn't be byte oriented.
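For reference, a self-contained version of that snippet (the printed value of 2048 assumes a little-endian host):

#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
    char byteArray[] = { 0x00, 0x08, 0x00, 0x00 };

    std::uint32_t cp;
    // The destination is viewed through a char*, so the copy is byte oriented.
    std::copy(byteArray, byteArray + sizeof(cp), reinterpret_cast<char*>(&cp));

    std::cout << cp << '\n'; // 2048 on a little-endian host
    return 0;
}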
I am using an API that uses a struct to represent an array (and allows filling up that array while accessing the struct).
If data is a pointer to the struct and data->magic is a uint32_t, when I run the following:
printf("0x%08X", data->magic);
I get the value: 0xAAAABEEF
while printing the array directly as such:
printf("0x");
for (int i = 0; i < size; ++i) {
printf("%02X", payload[i]);
}
I get the value: 0xEFBEAAAA
the struct definition goes like this:
struct Data {
    uint32_t magic;
} __attribute__((packed));
I believe the data variable is declared something like this:
// Declared and initialized somewhat like this:
uint8_t payload[kMaxSize];
Data* data = reinterpret_cast<Data*>(payload);
data->magic = 0xAAAABEEF;
I am curious why the two printf calls do not show the same value. Is it because the machine stores the data LSB (least significant byte) first?
Your guess is correct. On a little-endian processor (e.g. x86), the least significant byte is stored first in memory, so the number 0xAAAABEEF is stored as the four bytes {0xEF, 0xBE, 0xAA, 0xAA}.
When your program looks at those four bytes in memory, the way the data is interpreted (its type) determines what you see. If {0xEF, 0xBE, 0xAA, 0xAA} is interpreted as individual bytes, you get "EF BE AA AA". But if they are interpreted as a uint32_t, the load reassembles the bytes according to the machine's endianness, and the value displays as 0xAAAABEEF.
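Here is a small standalone sketch (independent of the API in the question) that shows both views of the same four bytes on a little-endian machine:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    std::uint32_t magic = 0xAAAABEEF;

    // Copy the object's bytes out so we can look at them one by one.
    std::uint8_t bytes[sizeof magic];
    std::memcpy(bytes, &magic, sizeof magic);

    std::printf("as uint32_t: 0x%08X\n", static_cast<unsigned>(magic)); // 0xAAAABEEF
    std::printf("as bytes:    0x");
    for (std::size_t i = 0; i < sizeof magic; ++i) {
        std::printf("%02X", static_cast<unsigned>(bytes[i]));  // EF BE AA AA on a little-endian machine
    }
    std::printf("\n");
    return 0;
}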
#include <iostream>
#include <bitset>
#include <cstring>
#include <cstdint>

typedef struct
{
    int i;
    char a[4];
    uint8_t j:1;
    uint8_t k:1;
} abctest;

int main()
{
    abctest tryabc;
    std::memset(&tryabc, 0x00, sizeof(tryabc));

    std::bitset<1> b;
    b = false;
    std::cout << b << '\n';
    b = true;
    std::cout << sizeof(b) << '\n';
}
My problem is that I have a char array that actually holds a structure received from some module, and that structure contains bit fields. I can use memcpy, but I cannot cast the buffer to the structure type (e.g. if my char* arr is actually of type struct abc, I cannot do abc* temp = (abc*)arr).
All I can do is memcpy, so I want to know how I can fill the bit fields without type casting.
If you know the exact data type and its size in bytes, you can use a variable together with bit-shifting and masking to store bits into and extract bits from the array. This is lower-level work that still exists in C++ but belongs more to the low-level programming style of C.
Another way is to use division and modulo with powers of 2 to encode bits at exact locations. I'd suggest you first look at how binary representation works and then convince yourself that shifting right by 1 divides by 2.
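As a rough sketch of what that shifting and masking looks like, suppose (purely as an assumption for illustration, since the language does not pin down the bit-field layout) that the sender packed j into bit 0 and k into bit 1 of one byte of the buffer:

#include <cstdint>
#include <cstdio>

int main() {
    // One byte of the received buffer; assume j lives in bit 0 and k in bit 1.
    std::uint8_t flags = 0x03;

    unsigned j = (flags >> 0) & 0x01;   // extract bit 0
    unsigned k = (flags >> 1) & 0x01;   // extract bit 1
    std::printf("j=%u k=%u\n", j, k);   // j=1 k=1

    // Packing works the same way in reverse.
    std::uint8_t out = 0;
    out |= (j & 0x01) << 0;
    out |= (k & 0x01) << 1;
    std::printf("out=0x%02X\n", static_cast<unsigned>(out)); // out=0x03
    return 0;
}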
Cheers!
Why can't you typecast a char array into an abctest pointer? I tested it and all works well:
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef struct
{
    int i;
    char a[4];
    uint8_t j:1;
    uint8_t k:1;
} abctest;

int main(int argc, char **argv)
{
    char buf[sizeof(abctest)];   /* make sure the buffer is large enough for the struct */
    abctest *abc = (abctest*)buf;
    memset(buf, 0x00, sizeof(buf));
    printf("%d\n", abc->j);
}
However, while you definitely CAN typecast a char array into an abctest pointer, that doesn't mean you SHOULD. I think you should definitely learn about data serialization and deserialization. If you want to convert a complex data structure into a character array, typecasting is not the solution, because the structure's members may have different sizes or alignment constraints on a 64-bit machine than on a 32-bit machine. Furthermore, if you typecast a char array into a struct pointer, the buffer may not be properly aligned for the struct, which can cause problems on RISC processors.
You could serialize the data by, for example, writing i as a 32-bit integer in network byte order, a as 4 characters, and j and k as two bits of one additional character (the remaining 6 bits being unused). When you deserialize it, you read i back from the 32-bit network-byte-order integer, a from the 4 characters, and the last character gives you j and k, as sketched below.
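A sketch of that scheme, using a plain (non-bit-field) mirror struct for clarity; htonl/ntohl come from <arpa/inet.h> on POSIX systems, so adjust the header for your platform:

#include <arpa/inet.h>  // htonl / ntohl (POSIX)
#include <cstdint>
#include <cstdio>
#include <cstring>

struct abc_plain {
    int i;
    char a[4];
    unsigned j;  // 1-bit flag
    unsigned k;  // 1-bit flag
};

// Wire format: 4 bytes i (network byte order), 4 bytes a,
// 1 byte holding j in bit 0 and k in bit 1 (remaining 6 bits unused).
void serialize(const abc_plain& in, unsigned char out[9]) {
    std::uint32_t ni = htonl(static_cast<std::uint32_t>(in.i));
    std::memcpy(out, &ni, 4);
    std::memcpy(out + 4, in.a, 4);
    out[8] = static_cast<unsigned char>((in.j & 1) | ((in.k & 1) << 1));
}

void unserialize(const unsigned char in[9], abc_plain& out) {
    std::uint32_t ni;
    std::memcpy(&ni, in, 4);
    out.i = static_cast<int>(ntohl(ni));
    std::memcpy(out.a, in + 4, 4);
    out.j = in[8] & 1;
    out.k = (in[8] >> 1) & 1;
}

int main() {
    abc_plain src = { 1234, { 'a', 'b', 'c', 'd' }, 1, 0 }, dst = {};
    unsigned char buf[9];
    serialize(src, buf);
    unserialize(buf, dst);
    std::printf("i=%d j=%u k=%u\n", dst.i, dst.j, dst.k); // i=1234 j=1 k=0
    return 0;
}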
#include "stdio.h"
typedef struct CustomStruct
{
short Element1[10];
}CustomStruct;
void F2(char* Y)
{
*Y=0x00;
Y++;
*Y=0x1F;
}
void F1(CustomStruct* X)
{
F2((char *)X);
printf("s = %x\n", (*X).Element1[0]);
}
int main(void)
{
CustomStruct s;
F1(&s);
return 0;
}
The above C code prints 0x1f00 when compiled and run on my PC. But when I flash it to an embedded target (a microcontroller) and debug it, I find that (*X).Element1[0] == 0x001f.
1- Why are the results different on the PC and on the embedded target?
2- What can I modify in this code so that it also prints 0x001f on the PC, without changing the core of the code (perhaps by adding a compiler option or something similar)?
shorts are typically two bytes and 16 bits. When you say:
short s;
((char*)&s)[0] = 0x00;
((char*)&s)[1] = 0x1f;
This sets the first of those two bytes to 0x00 and the second to 0x1f. The thing is, C++ doesn't specify what effect setting the first or second byte has on the value of the overall short, so different platforms can do different things. In particular, some platforms say that setting the first byte affects the 'most significant' bits of the short's 16 bits and setting the second byte affects the 'least significant' bits. Other platforms say the opposite: setting the first byte affects the least significant bits and setting the second byte affects the most significant bits. These two platform behaviors are referred to as big-endian and little-endian respectively.
The solution to getting consistent behavior independent of these differences is to not access the bytes of the short this way. Instead you should simply manipulate the value of the short using methods that the language does define, such as with bitwise and arithmetic operators.
short s;
s = (0x00 << 8) | (0x1f << 0); // set the most significant byte to 0x00 and the least significant byte to 0x1f, i.e. s == 0x001f on every platform.
The problem is that, for many reasons, I can only change the body of the function F2; I cannot change its prototype. Is there a way to find the size of what Y points to before it was cast, or something like that?
You cannot determine the original type and size using only the char*. You have to know the correct type and size through some other means. If F2 is never called except with CustomStruct then you can simply cast the char* back to CustomStruct like this:
void F2(char* Y)
{
    CustomStruct *X = (CustomStruct*)Y;
    X->Element1[0] = 0x001F;
}
But remember, such casts are not safe in general; you should only cast a pointer back to what it was originally cast from.
The portable way is to change the definition of F2:
void F2(short *p)
{
    *p = 0x1F;
}

void F1(CustomStruct* X)
{
    F2(&X->Element1[0]);
    printf("s = %x\n", (*X).Element1[0]);
}
When you reinterpret an object as an array of chars, you expose the implementation details of the representation, which is inherently non-portable and... implementation-dependent.
If you need to do I/O, i.e. interface with a fixed, specified, external wire format, use functions like htons and ntohs to convert and leave the platform specifics to your library.
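For example, if the wire format says "this 16-bit value is big-endian", a sketch using htons/ntohs (from <arpa/inet.h> on POSIX; this is an illustration, not a change to the original code) looks like:

#include <arpa/inet.h>  // htons / ntohs (POSIX)
#include <cstdio>
#include <cstring>

int main(void) {
    short host_value = 0x001F;

    // Convert to the fixed big-endian wire representation before writing it out.
    unsigned short wire = htons(static_cast<unsigned short>(host_value));
    unsigned char buffer[2];
    std::memcpy(buffer, &wire, 2);   // buffer now holds { 0x00, 0x1F } on any host

    // Convert back after reading, regardless of the host's endianness.
    unsigned short received;
    std::memcpy(&received, buffer, 2);
    short back = static_cast<short>(ntohs(received));

    std::printf("s = %x\n", static_cast<unsigned>(back)); // prints 1f everywhere
    return 0;
}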
It appears that the PC is little endian and the target is either big-endian, or has 16-bit char.
There isn't a great way to modify the C code on the PC, unless you replace your char * references with short * references, and perhaps use macros to abstract the differences between your microcontroller and your PC.
For example, you might make a macro PACK_BYTES(hi, lo) that packs two bytes into a short the same way regardless of the machine's endianness. Your example becomes:
#include "stdio.h"
#define PACK_BYTES(hi,lo) (((short)((hi) & 0xFF)) << 8 | (0xFF & (lo)))
typedef struct CustomStruct
{
short Element1[10];
}CustomStruct;
void F2(short* Y)
{
*Y = PACK_BYTES(0x00, 0x1F);
}
void F1(CustomStruct* X)
{
F2(&(X->Element1[0]));
printf("s = %x\n", (*X).Element1[0]);
}
int main(void)
{
CustomStruct s;
F1(&s);
return 0;
}