I am using an API that uses a struct to represent an array (and allows filling up that array while accessing the struct).
If data points to such a struct and magic is a uint32_t member, running the following:
printf("0x%08X", data->magic);
I get the value: 0xAAAABEEF
while printing the underlying byte array directly, like this:
printf("0x");
for (int i = 0; i < size; ++i) {
printf("%02X", payload[i]);
}
I get the value: 0xEFBEAAAA
the struct definition goes like this:
struct Data {
    uint32_t magic;
} __attribute__((packed));
I believe the data variable is declared something like this:
// Declared and initialized somewhat like this:
uint8_t payload[kMaxSize];
Data* data = reinterpret_cast<Data*>(payload);
data->magic = 0xAAAABEEF;
I am curious why the two printf calls do not print the same value. Is it because the machine is storing the data least significant byte (LSB) first?
Your guess is correct. On a little-endian processor (e.g. x86), the least significant byte is stored first in memory, so the number 0xAAAABEEF is stored as four bytes in memory: {0xEF, 0xBE, 0xAA, 0xAA}.
When your program looks at those four bytes in memory, the type through which the data is interpreted determines how it appears. If {0xEF, 0xBE, 0xAA, 0xAA} is interpreted as individual bytes, you get "EF BE AA AA". But if it's read back as a uint32_t, the CPU again treats the first byte as the least significant, so the value comes out as 0xAAAABEEF.
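Here is a minimal, self-contained sketch of that effect (the payload/Data names mirror the question, and the output noted in the comments assumes a little-endian machine):

#include <cstdint>
#include <cstdio>

struct Data {
    uint32_t magic;
} __attribute__((packed));

int main() {
    uint8_t payload[4];
    // Mirrors the question's reinterpret_cast approach
    Data* data = reinterpret_cast<Data*>(payload);
    data->magic = 0xAAAABEEF;

    // Interpreted as a uint32_t: prints 0xAAAABEEF
    printf("0x%08X\n", data->magic);

    // Interpreted byte by byte: prints 0xEFBEAAAA on little-endian
    printf("0x");
    for (size_t i = 0; i < sizeof payload; ++i) {
        printf("%02X", payload[i]);
    }
    printf("\n");
}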
Related
Now there is an unsigned char bytes[4], and I already know that the byte array was generated from an int in C++. How can I convert the array back to an int?
You can do that using std::memcpy():
#include <iostream>
#include <cstring>

int main() {
    unsigned char bytes[4]{ 0xdd, 0xcc, 0xbb, 0xaa };
    int value;
    std::memcpy(&value, bytes, sizeof(int));
    std::cout << std::hex << value << '\n';
}
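On a little-endian machine this prints aabbccdd; on a big-endian machine the same four bytes would read back as ddccbbaa.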
I already know that the byte array was generated from an int in C++.
It is crucial to know how the array is generated from an int. If the array was generated by simply copying the bytes on the same CPU, then you can convert back by simply copying:
int value;
assert(sizeof value == sizeof bytes);
std::memcpy(&value, bytes, sizeof bytes);
However, if the array may follow another representation than what your CPU uses (for example, if you've received the array from another computer, over the network), then you must convert the representation. In order to convert the representation, you must know what representation the source data follows.
Theoretically you would need to handle different sign representations too, but in practice two's complement is ubiquitous. The consideration that actually matters in practice is byte endianness.
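For example, here is a small sketch (assuming the sender wrote the bytes in little-endian order) that reconstructs the value with shifts, so it works the same on any receiving CPU:

#include <cstdint>

// Reassemble a 32-bit value from bytes known to be little-endian.
// This never reinterprets memory, so host endianness is irrelevant.
std::uint32_t from_little_endian(const unsigned char bytes[4]) {
    return static_cast<std::uint32_t>(bytes[0])
         | static_cast<std::uint32_t>(bytes[1]) << 8
         | static_cast<std::uint32_t>(bytes[2]) << 16
         | static_cast<std::uint32_t>(bytes[3]) << 24;
}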
We know an integer variable takes four bytes of memory. I was just wondering: if we initialize an integer variable and make a pointer to it, I can get the value through the pointer (which holds the address of the variable, 0x22fef8 on my computer). But what about the memory addresses after 0x22fef8, that is 0x22fef9, 0x22fefa and 0x22fefb? What is in there? Will we get the value of the variable if we dereference those addresses? How can we access them?
You're right: on a 32-bit computer an integer takes up four bytes. In C, that can be expressed by the following code:
int i = 0x12345678;
int *p_i = &i;
If p_i gets the value 0x22fef8, then after p_i++ it would hold 0x22fefc, since it would then point to the next integer. If you want to see what's in the individual bytes that make up i, you need a different pointer type:
typedef uint8_t byte;
byte *p_b = (byte *)&i;
That means that you change the pointer-to-int that &i represents and typecast it to be a pointer-to-byte. It will still have the value 0x22fef8 since that's where the first byte of i is - but now if you do a p_b++ it will change to 0x22fef9. And note that if you print out the original value of *p_b (that is, the byte that it is pointing to), it will not give the same value as i. Depending on the computer, it will print out either the first byte or the last byte: 0x12 or 0x78, or at least the decimal versions thereof.
This is due to the "endianness" of the computer, which affects how multi-byte values are stored. Little-endian computers like the x86 store the least significant part of the value first (the 0x78), while big-endian machines such as the PowerPC store the most significant part first (the 0x12).
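Putting those fragments together, a minimal runnable sketch (what it prints depends on your machine's endianness):

#include <cstdint>
#include <cstdio>

int main() {
    int i = 0x12345678;
    uint8_t *p_b = (uint8_t *)&i;

    // Prints "78 56 34 12" on a little-endian machine
    // and "12 34 56 78" on a big-endian one.
    for (size_t n = 0; n < sizeof i; ++n) {
        printf("%02X ", p_b[n]);
    }
    printf("\n");
}

Since uint8_t is an unsigned character type on virtually all platforms, inspecting an object's bytes this way is well-defined.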
int takes up four bytes on your system, so all four of those addresses are occupied by the int. You can't read the middle bytes through an int*; to inspect them individually you need a char (or unsigned char) pointer.
Note also that int takes up only two bytes on some other systems; its exact size isn't fixed by the standard, which only guarantees at least 16 bits.
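You can check the size on your own system:

#include <cstdio>

int main() {
    // Implementation-defined: commonly 4, but the standard only
    // guarantees that int is at least 16 bits wide.
    printf("sizeof(int) = %zu\n", sizeof(int));
}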
If you want to see the values at those addresses, you can do something like the following, which prints whatever happens to sit in the next 10 int-sized slots. Be warned that only the first read is well-defined; the later ones dereference memory outside the variable, which is undefined behavior:
#include <iostream>
using std::cout;
using std::endl;

int main()
{
    int a = 5;
    int *p_i = &a; // The address of 'a' is stored in pointer 'p_i'
    // Now look at the values in the next 10 int-sized slots
    for ( int i = 0; i < 10; i++ )
    {
        int value = *p_i; // Dereference the pointer to read an int
        cout << value << endl;
        p_i++;
    }
    return 0;
}
I've got an array of bytes, declared like so:
typedef unsigned char byte;
vector<byte> myBytes = {255, 0, 76, ...}; // individual bytes no larger in value than 255
The problem I have is that I need to access the raw data of the vector (without any copying, of course), and I need to be able to write a value of an arbitrary number of bits at any given element position.
In other words, I need to assign, say an unsigned int to a certain position in the vector.
So given the example above, I am looking to do something like below:
myBytes[0] = static_cast<unsigned int>(76535); //assign n-bit (here 32-bit) value to any index in the vector
So that the vector data would now look like:
{247, 42, 1, 0} //raw little-endian representation of the 32-bit int 76535
Is this possible? I kind of need to use a vector and am just wondering whether the raw data can be accessed in this way, or does how the vector stores raw data make this impossible or worse - unsafe?
Thanks in advance!
EDIT
I didn't want to add complication, but I'm constructing variously sized integers as follows:
//**N_TYPES
u16& VMTypes::u8sto16(u8& first, u8& last) {
    return *new u16((first << 8) | last & 0xffff);
}
u8* VMTypes::u16to8s(u16& orig) {
    u8 first = (u8)orig;
    u8 last = (u8)(orig >> 8);
    return new u8[2]{ first, last };
}
What's terrible about this is that I'm not sure of the endianness of the numbers generated. But I know that I construct and deconstruct them the same way everywhere (I'm writing a stack machine), so if I'm not mistaken, endianness is not affected by what I'm trying to do.
EDIT 2
I am constructing ints in the following horrible way:
u32 a = 76535;
u16* b = VMTypes::u32to16s(a);
u8 aa[4] = { VMTypes::u16to8s(b[0])[0], VMTypes::u16to8s(b[0])[1], VMTypes::u16to8s(b[1])[0], VMTypes::u16to8s(b[1])[1] };
Could this then work?:
memcpy(&_stack[0], aa, sizeof(u32));
Yes, it is possible. Take the starting address with &myBytes[n] and memcpy your int to that location. Make sure that you stay within the bounds of your vector.
The other way around works too. Take the location and memcpy out of it to your int.
As suggested: by using memcpy you will copy the byte representation of your integer into the vector. That byte representation or byte order may be different from your expectation. Keywords are big and little endian.
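A minimal sketch of both directions (assuming the vector already has room for the four bytes):

#include <cstdint>
#include <cstring>
#include <vector>

int main() {
    std::vector<unsigned char> myBytes(8);

    // Write a 32-bit value into the vector's raw storage at index 0.
    std::uint32_t in = 76535;
    std::memcpy(&myBytes[0], &in, sizeof in);

    // Read it back out the same way.
    std::uint32_t out;
    std::memcpy(&out, &myBytes[0], sizeof out);
    // out == 76535; the byte order inside myBytes is whatever the host uses.
}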
As knivil says, memcpy will work if you know the endianness of your system. However, if you want to be safe, you can do this with bitwise arithmetic:
unsigned int myInt = 76535;
const int ratio = sizeof(int) / sizeof(byte);
for(int b = 0; b < ratio; b++)
{
    // Most significant byte first (big-endian order), regardless of host
    myBytes[b] = byte(myInt >> (8 * sizeof(byte) * (ratio - b - 1)));
}
The int can be read out of the vector using a similar pattern; a sketch follows.
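For completeness, here is that reverse direction, mirroring the loop above (same most-significant-byte-first order):

unsigned int readBack = 0;
for(int b = 0; b < ratio; b++)
{
    // Shift the bytes accumulated so far up and append the next one.
    readBack = (readBack << (8 * sizeof(byte))) | myBytes[b];
}
// readBack == 76535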
I'm wondering why, in my sample code, converting a pointer to the first element of m_arr to a pointer of a bigger type makes the program read the memory into m_val in little-endian byte order. With this way of thinking, *(std::uint8_t*)m_arr should point to 0x38, but it doesn't.
My CPU uses little-endian byte order.
#include <iostream>
#include <iomanip>
#include <cstdint>

int main() {
    std::uint8_t m_arr[2] = { 0x5a, 0x38 };
    // explain why m_val is 0x385a and not 0x5a38
    std::uint16_t m_val = *(std::uint16_t*)m_arr;
    std::cout << std::hex << m_val << std::endl;
    return 0;
}
Byte ordering describes how the bytes of a multi-byte value are laid out when the value is stored in memory. Regardless of whether your machine is big or little endian, a plain sequence of bytes has no byte order of its own: it is always in its natural array order.
The situation you describe (where the first byte is 0x38) is what you would observe if you created a uint16_t and got a uint8_t pointer to it. Instead, you have a uint8_t array and you get a uint16_t pointer to it.
Little endian means that the least significant byte goes first. Translate that logic to your array, { 0x5a, 0x38 }: on a little-endian system, 0x5a is the least significant byte and 0x38 is the most significant one... hence you get 0x385a.
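Incidentally, the cast in the question violates strict aliasing; here is a sketch of the same experiment done with memcpy, which is well-defined:

#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    std::uint8_t m_arr[2] = { 0x5a, 0x38 };
    std::uint16_t m_val;
    std::memcpy(&m_val, m_arr, sizeof m_val);
    std::cout << std::hex << m_val << '\n'; // 385a on a little-endian machine
}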
#include <iostream>
#include <bitset>
#include <cstring>
#include <cstdint>
typedef struct
{
    int i;
    char a[4];
    uint8_t j:1;
    uint8_t k:1;
} abctest;

int main()
{
    abctest tryabc;
    memset(&tryabc, 0x00, sizeof(tryabc));
    std::bitset<1> b;
    b = false;
    std::cout << b << '\n';
    b = true;
    std::cout << sizeof(b) << '\n';
}
My question is this: I have a char array that actually holds a structure received in some module, and that structure contains bit fields. I can use memcpy, but I cannot typecast the buffer to the structure (e.g. if my char* arr is actually of type struct abc, I cannot do abc* temp = (abc*)arr).
All I can do is memcpy, so I want to know how I can fill the bit fields without typecasting.
If you know the exact layout (each field's type and its size in bits and bytes), you can use bit-shifting and masking to store and extract the bits in the array yourself, as sketched below. This is a lower-level technique that still exists in C++ but belongs more to the low-level programming style of C.
Another way is to use division and modulo with powers of 2 to encode bits at exact locations. I'd suggest you look up how binary works first, and then work out why shifting to the right by 1 divides by 2.
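A minimal sketch of that technique, assuming j and k arrive packed as the two lowest bits of one byte in the buffer (the offset is hypothetical; it depends on your actual wire layout):

// Hypothetical layout: byte 8 of the buffer holds j in bit 0 and k in bit 1.
const int kFlagsOffset = 8;

void set_flags(unsigned char* buf, bool j, bool k) {
    buf[kFlagsOffset] = static_cast<unsigned char>((j ? 1 : 0) | (k ? 2 : 0));
}

bool get_j(const unsigned char* buf) { return (buf[kFlagsOffset] >> 0) & 1; }
bool get_k(const unsigned char* buf) { return (buf[kFlagsOffset] >> 1) & 1; }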
Cheers!
Why can't you typecast a char array into an abctest pointer? I tested it and all works well:
#include <cstdio>
#include <cstring>
#include <cstdint>

typedef struct
{
    int i;
    char a[4];
    uint8_t j:1;
    uint8_t k:1;
} abctest;
int main(int argc, char **argv)
{
    alignas(abctest) char buf[sizeof(abctest)]; // properly sized and aligned for the struct
    abctest *abc = (abctest*)buf;
    memset(buf, 0x00, sizeof(buf));
    printf("%d\n", abc->j);
}
However, while you definitely CAN typecast a char array into an abctest pointer, it doesn't mean you SHOULD. You should learn about data serialization and unserialization instead. If you want to convert a complex data structure into a character array, typecasting is not the solution: the structure's members may have different sizes or alignment constraints on a 64-bit machine than on a 32-bit machine. Furthermore, if you typecast a char array into a struct pointer, the alignment may be incorrect, which can cause problems on RISC processors.
You could serialize the data by, e.g., writing i as a 32-bit integer in network byte order, a as 4 characters, and j and k as two bits in one character (the remaining 6 bits being unused). Then when you unserialize it, you read i from the 32-bit integer in network byte order, a from the 4 characters, and the remaining character gives j and k.
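A minimal sketch of that scheme, reusing the abctest definition above (the 9-byte wire format is a hypothetical choice; htonl/ntohl come from <arpa/inet.h> on POSIX systems):

#include <arpa/inet.h> // htonl/ntohl
#include <cstdint>
#include <cstring>

// Wire format: 4 bytes of i (network byte order), 4 bytes of a, 1 flag byte.
void serialize(const abctest& s, unsigned char out[9]) {
    uint32_t net_i = htonl(static_cast<uint32_t>(s.i));
    std::memcpy(out, &net_i, 4);
    std::memcpy(out + 4, s.a, 4);
    out[8] = static_cast<unsigned char>((s.j & 1) | ((s.k & 1) << 1));
}

void unserialize(const unsigned char in[9], abctest& s) {
    uint32_t net_i;
    std::memcpy(&net_i, in, 4);
    s.i = static_cast<int>(ntohl(net_i));
    std::memcpy(s.a, in + 4, 4);
    s.j = in[8] & 1;
    s.k = (in[8] >> 1) & 1;
}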