I'm trying this pointer trick and I can't figure out how to fix it. I'm running g++ 4.6 on Ubuntu 12.04 64-bit. Check out the code below:
int arr[5];
arr[3] = 50;
((short*) arr)[6] = 2;
cout << arr[3] << endl;
The logic is: since short is 2 bytes, int is 4 bytes, I want to change the first 2 bytes in arr[3], while keeping the value of the second 2 bytes as 50. So I'm just messing with the bit pattern. Unfortunately, sizeof(int*) and sizeof(short*) are both 8 bytes. Is there a type cast that returns a pointer of size 2 bytes?
Update:
I realized that the question is poorly written, so I'll fix that:
The output from cout << arr[3] << endl; that I'm getting is 2. The output I would like to get is neither 2 nor 50, but rather a large number, indicating that the left part of the int's bit pattern has been changed, while the right part (the second 2 bytes) of the int stored in arr[3] is still unchanged.
sizeof(int*) and sizeof(short*) are both going to be the same -- as will sizeof(void*) -- you're asking for the size of a pointer, not the size of the thing the pointer points to.
Use sizeof(int) or sizeof(short) instead.
Now, as for your code snippet, you are making assumptions about the endianness of the machine on which you're running. The "first" part of the int on a given platform may be the bytes with higher address, or the bytes with lower address.
For instance, your memory block may be laid out like this. Let's label the bytes so that the least significant byte of each value has index zero and the most significant byte has the highest index. On a big endian architecture an int may look like this:
 <------------- 4 bytes --------------->
+---------+---------+---------+---------+
|  int:3  |  int:2  |  int:1  |  int:0  |
| short:1 | short:0 | short:1 | short:0 |
+---------+---------+---------+---------+
Notice how the first short in the int -- which in your case would have been ((short*) arr)[6] -- contains the most significant bits of the int, not the least significant. So if you overwrite ((short*) arr)[6], you are overwriting the most significant bits of arr[3], which appears to be what you wanted. But x64 is not a big endian machine.
On a little endian architecture, you would see this instead:
 <------------- 4 bytes --------------->
+---------+---------+---------+---------+
|  int:0  |  int:1  |  int:2  |  int:3  |
| short:0 | short:1 | short:0 | short:1 |
+---------+---------+---------+---------+
leading to the opposite behavior -- ((short*) arr)[6] would be the least significant bits of arr[3], and ((short*) arr)[7] would be the most significant.
Here's what my machine happens to do -- your machine may be different:
C:\Users\Billy\Desktop>type example.cpp
#include <iostream>
int main()
{
std::cout << "Size of int is " << sizeof(int) << " and size of short is "
<< sizeof(short) << std::endl;
int arr[5];
arr[3] = 50;
((short*) arr)[6] = 2;
std::cout << arr[3] << std::endl;
((short*) arr)[7] = 2;
std::cout << arr[3] << std::endl;
}
C:\Users\Billy\Desktop>cl /W4 /EHsc /nologo example.cpp && example.exe
example.cpp
Size of int is 4 and size of short is 2
2
131074
Your problem is due to endianness. Intel CPUs are little endian, meaning that the least significant byte of an int is stored at the lowest address. Let me show you an example:
Let's assume that arr[3] is at address 10:
Then arr[3] = 50; writes the following to memory:
10: 0x32
11: 0x00
12: 0x00
13: 0x00
And ((short*) arr)[6] = 2; writes the following to memory
10: 0x02
11: 0x00
When you index a pointer, it adds the index multiplied by the size of the pointed-to type. So, you don't need a 2-byte pointer.
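To make that concrete, here is a minimal sketch (assuming the usual 2-byte short and 4-byte int) showing that the scaling comes from the pointed-to type, not from the pointer's own size:
#include <iostream>
int main()
{
    int arr[5] = {};
    // (short*)arr + 6 is 6 * sizeof(short) = 12 bytes past the start of arr,
    // the same spot as arr + 3 (3 * sizeof(int) = 12 bytes).
    char* base     = reinterpret_cast<char*>(arr);
    char* as_short = reinterpret_cast<char*>(reinterpret_cast<short*>(arr) + 6);
    char* as_int   = reinterpret_cast<char*>(arr + 3);
    std::cout << (as_short - base) << ' ' << (as_int - base) << std::endl; // prints "12 12"
}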
You are making a lot of assumptions that might not hold water. Also, what does the size of pointers have to do with the problem at hand?
Why not just use bit masks:
arr[3] |= (top_2_bytes << 16);
This should set the upper 16 bits without disturbing the lower 16. (You may get into signed/unsigned dramas.)
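A minimal sketch of that approach (assuming a 32-bit int; top_2_bytes is just the placeholder name from the line above). Note that a plain |= only sets bits, so the sketch clears the upper half before ORing in the new value:
#include <cstdint>
#include <iostream>
int main()
{
    int arr[5] = {};
    arr[3] = 50;
    unsigned top_2_bytes = 2; // placeholder value for the upper half
    // Work in unsigned to avoid signed-shift surprises: clear the upper
    // 16 bits, then OR in the new value.
    std::uint32_t v = static_cast<std::uint32_t>(arr[3]);
    v = (v & 0xFFFFu) | (top_2_bytes << 16);
    arr[3] = static_cast<int>(v);
    std::cout << arr[3] << std::endl; // 131122, i.e. 0x00020032
}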
Having said all the above, the standard prohibits doing such things: setting a variable through a pointer to another type invokes undefined behaviour. If you know how your machine works (sizes of int and short, endianness, ...) and you know how your compiler is likely to translate your code, then you might get away with it. It makes for neat parlor tricks, and for spectacular explosions when the machine/compiler/phase of the moon changes.
If it wins any performance, the win will be minimal, and it could even be a net loss. One compiler I fiddled with long ago, while I was playing the "I can implement this loop better than the compiler, I know exactly what goes on here" game, generated much worse code for my `label: ... if () goto label;` than for the exact same logic written as a plain loop: my "smart" code confused the compiler, and its pattern for loops did not apply.
You wouldn't want your actual pointer to be of size two bytes; that would mean it could only address about 64k of memory. However, casting to a short *, as you are doing, lets you access the memory in two-byte steps (since the compiler will then view the array as an array of shorts, not of ints).
I am studying C++ code from an Arduino example for reading external data.
The code uses casts to (int16_t) and also to (int).
int16_t is a fixed-width integer type, but I don't understand its purpose in the code.
_vRaw[0] = (int)(int16_t)(Wire.read() | Wire.read() << 8);
Is there any difference if I write like below?
_vRaw[0] = (int)(Wire.read() | Wire.read() << 8);
I've written code like this before so I can explain the purpose of the cast to int16_t.
The Arduino command Wire.read() generally returns a number between 0 and 255 representing a byte read from the I2C device.
Sometimes, two of the bytes you read from a device will represent a single signed 16-bit number using Two's complement. So, for example, if the first byte is 1 and the second byte is 2, we want to interpret that as a value of 1 + 2*256 = 513. If both bytes are 255, we want to interpret that as -1.
Let's assume that your code is being compiled for a system where int is 32-bit (almost any 32-bit microcontroller), and both bytes are 255, and think about this expression:
(Wire.read() | Wire.read() << 8)
The value of this expression would simply be 255 | (255 << 8) which is 65535. That is bad; we wanted the value to be -1.
The easy way to fix that bug is to add a cast to int16_t:
(int16_t)(Wire.read() | Wire.read() << 8)
When we cast 65535 to an int16_t type, we get -1 because an int16_t cannot hold 65535 (it can only hold values from -32768 to 32767). (The C++ standard does not guarantee that we get -1, but it is implementation-defined behavior, so you can check the manual of your compiler if you want to make sure.) If we later cast it back to an int (either with an explicit cast or by setting some int equal to it), the compiler will do the correct sign extension and give us an int with a value of -1.
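A standalone sketch of that effect (not Arduino code; two hard-coded bytes stand in for the values returned by Wire.read()):
#include <cstdint>
#include <iostream>
int main()
{
    int lo = 255; // first byte read from the device
    int hi = 255; // second byte read from the device
    int without_cast = lo | (hi << 8);            // 65535: wrong for signed data
    int with_cast    = (int16_t)(lo | (hi << 8)); // reinterpreted as int16_t, then
                                                  // sign-extended back to int: -1
                                                  // (implementation-defined, but what
                                                  // two's-complement targets do)
    std::cout << without_cast << ' ' << with_cast << std::endl; // "65535 -1"
}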
By the way, I'm not confident that your compiler will run the two calls to Wire.read() in the right order unless you have a semicolon between them. Also, the final cast to int is probably not needed. So I would rewrite the code to be something like this:
uint8_t lsb = Wire.read();
_vRaw[0] = (int16_t)(lsb | Wire.read() << 8);
The (int16_t) cast will truncate the value to 16 bits. The (int) cast does nothing, unless _vRaw[0] has a type that is implicitly convertible from int but not from int16_t, which is very unlikely.
In all probability, Wire.read() returns char, and neither of these casts does anything, because (Wire.read() | Wire.read() << 8) is already an int expression that fits within 16 bits.
If the string is:
char message[] = "HI THERE";
How would I take the first four bytes ("HI T") and typecast them to a 4 byte int?
Total value should equal 1411402056.
The safe way to do it which will always work is to just shift each byte individually:
uint32_t i = (uint8_t(message[0]) << 24) | (uint8_t(message[1]) << 16) | (uint8_t(message[2]) << 8) | uint8_t(message[3]);
You might need to reverse the order of the bytes depending on the endianness of your string.
You may also be able to simply reinterpret_cast to an integer, depending on whether the endianness of the string matches the endianness of your processor, whether the string is aligned to the correct byte boundary, etc.:
int i = *reinterpret_cast<int*>(message);
The simplest way is:
int32_t num = *reinterpret_cast<int32_t*>(message);
But this is technically a violation of Strict Aliasing. A safer way is:
int32_t num;
memcpy(&num, message, sizeof(num));
Though, to be really safe, you should use the bit-shifting approach described in Alan Birtles's answer.
You might have to swap the order of the int's bytes afterwards, depending on the endianness of your system.
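For reference, a runnable sketch of the memcpy approach; the value shown in the comment assumes a little-endian machine:
#include <cstdint>
#include <cstring>
#include <iostream>
int main()
{
    char message[] = "HI THERE";
    int32_t num;
    std::memcpy(&num, message, sizeof(num)); // copy the first four bytes
    // On a little-endian machine this prints 1411402056, i.e. 0x54204948
    // ('T', ' ', 'I', 'H' from most to least significant byte).
    std::cout << num << std::endl;
}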
So I have a little piece of code that takes 2 uint8_t's and places them next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0, and the second uint8_t is 1, I expect the uint16_t to also be one.
However, this is in my code not the case.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to turn the bytes uint8_t pointer into a uint16_t pointer, and then read the value it points to. I expect that value to be 1 since x86 is little endian. However, it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!
There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored in the lowest address. Thus 2 byte value 1 is stored as 1, 0 and not 0, 1.
Does endianness also affect the order of the bits in the byte, or not?
There is no way to access a bit by "address"1, so there is no concept of endianness. When converting to text, bits are conventionally shown most significant on left and least on right; just like digits of decimal numbers. I don't know if this is true in right to left writing systems.
1 You can sort of create "virtual addresses" for bits using bitfields. The order of bitfields i.e. whether the first bitfield is most or least significant is implementation defined and not necessarily related to byte endianness at all.
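A small sketch of that footnote; where each bit-field lands inside the byte is up to the implementation (the struct and field names here are just illustrative):
#include <cstdint>
#include <cstring>
#include <iostream>
// Whether 'low' occupies the least or most significant half of the byte is
// implementation-defined, and not tied to the machine's byte endianness.
struct Nibbles {
    std::uint8_t low  : 4;
    std::uint8_t high : 4;
};
int main()
{
    Nibbles n{};
    n.low  = 0x1;
    n.high = 0x2;
    std::uint8_t raw = 0;
    std::memcpy(&raw, &n, 1);
    // Commonly prints 0x21 (first bit-field in the low bits), but the
    // standard does not require that layout.
    std::cout << "0x" << std::hex << static_cast<int>(raw) << std::endl;
}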
Here is a correct way to set two octets as uint16_t. The result will depend on endianness of the system:
// no need to complicate a simple example with dynamic allocation
uint16_t out;
// note that there is an exception in language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
data[0] = 1;
data[1] = 0;
Note that assuming that input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating through network, or accessing files that may be shared to other systems.
In these cases, the communication protocol, or the file format typically specify that the data is in specific endianness which may or may not be the same as the native endianness of your target system. De facto standard in network communication is to use big endian. Data in particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
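A minimal sketch of that conversion for a 16-bit value received in big-endian (network) order; the function name is just illustrative:
#include <cstdint>
#include <iostream>
// Assemble a uint16_t from two bytes in big-endian order. Because only
// shifts and ORs are used, the result is the same on little and big
// endian hosts.
std::uint16_t read_u16_be(const unsigned char* data)
{
    return static_cast<std::uint16_t>((data[0] << 8) | data[1]);
}
int main()
{
    unsigned char wire[] = {0x01, 0x02}; // big-endian encoding of 0x0102
    std::cout << read_u16_be(wire) << std::endl; // 258 on any host
}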
In a little endian system the low-order bytes are placed first. In other words: the low byte is placed at offset 0, and the high byte at offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x to the left by y places (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between Arithmetic and Logical shifts - in C/C++ unsigned values use logical shifts, and signed use arithmetic shifts.
If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since Intel CPUs are little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
Your code works, but you misinterpreted how to read the bytes:
#include <cstdint>
#include <cstddef>
#include <iostream>
int main()
{
    uint8_t *in = new uint8_t[2];
    in[0] = 3;
    in[1] = 1;
    uint16_t out = *((uint16_t*)in);
    std::cout << "out: " << out << "\n in: " << in[1] * 256 + in[0] << std::endl;
    return 0;
}
By the way, you should take care of alignment when casting this way.
One way to think about numbers is in MSB/LSB order, where the MSB is the highest bit and the LSB is the lowest bit. For example:
(u)int32: MSB:Bit 31 ... LSB: Bit 0
(u)int16: MSB:Bit 15 ... LSB: Bit 0
(u)int8 : MSB:Bit 7 ... LSB: Bit 0
With your cast to a 16-bit value, the bytes are arranged like this:
16-bit value          <=   BYTE[1]       BYTE[0]
MSB ............ LSB       Bit 7 ... 0   Bit 7 ... 0
Bit 15 ........ Bit 0
0000 0001 0000 0000        0000 0001     0000 0000
which is 256, the correct value.
if I have the following code:
int i = 5;
void * ptr = &i;
printf("%p", ptr);
Will I get the LSB address of i, or the MSB?
Will it act differently between platforms?
Is there a difference here between C and C++?
Consider that the size of an int is 4 bytes. &i will always give you the first (lowest) address of those 4 bytes.
If the architecture is little endian, then the lower address will have the LSB like below.
        +------+------+------+------+
Address | 1000 | 1001 | 1002 | 1003 |
        +------+------+------+------+
Value   |    5 |    0 |    0 |    0 |
        +------+------+------+------+
If the architecture is big endian, then the lower address will have the MSB like below.
        +------+------+------+------+
Address | 1000 | 1001 | 1002 | 1003 |
        +------+------+------+------+
Value   |    0 |    0 |    0 |    5 |
        +------+------+------+------+
So &i will give the address of the LSB of i on a little endian machine, and the address of the MSB on a big endian machine.
On a mixed endian (bi-endian) machine, either little or big endian may be selected dynamically for each task.
The logic below will tell you the endianness:
int i = 5;
void * ptr = &i;
char * ch = (char *) ptr;
printf("%p", ptr);
if (5 == (*ch))
    printf("\nlittle endian\n");
else
    printf("\nbig endian\n");
This behaviour is the same for both C and C++.
Will I get the LSB address of i, or the MSB?
This is platform dependent: it will be the lowest addressed byte, which may be MSB or LSB depending on your platform's endianness.
Although this is not written in the standard directly, this is what's implied by section 6.3.2.3.7:
When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object.
Will it act differently between platforms?
Yes
Is there a difference here between C and C++?
No: it is platform-dependent in both C and C++
It depends on the endianness of the platform; if it's a little-endian platform, you'll get a pointer to the LSB, if it's a big-endian platform it will point to the MSB. There are even some mixed-endian platforms, in that case may God have mercy on your soul check the specific documentation of your compiler/CPU.
Still, you can perform a quick check at runtime:
uint32_t i=0x01020304;
char le[4]={4, 3, 2, 1};
char be[4]={1, 2, 3, 4};
if(memcmp(&i, le, 4)==0)
    puts("Little endian");
else if(memcmp(&i, be, 4)==0)
    puts("Big endian");
else
    puts("Mixed endian");
By the way, to print pointers you must use the %p placeholder, not %d.
ptr stores the address of the starting byte of the integer object. Whether this is where the most or the least significant byte is stored depends on your platform. Some weird platforms even use mixed endianness in which case it'll be neither the MSB nor the LSB.
There is no difference between C and C++ in that respect.
What it points to is the MSB for my VC++ 2010 and Digital Mars compilers. But it is related to endianness.
This question's answers give some info for you:
Detecting endianness programmatically in a C++ program.
Here, user "none" says:
#define BIG_ENDIAN 0
#define LITTLE_ENDIAN 1
int TestByteOrder()
{
    short int word = 0x0001;
    char *byte = (char *) &word;
    return (byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
}
This gives some endianness info
Will I get the LSB address of i, or the MSB?
It depends on the machine and the OS. On big endian machines and OS's you will get the MSB and on little endian machines and OS's you will get the LSB.
Windows is always little endian. All (most?) flavors of Linux/Unix on x86 are little endian. Linux/Unix on Motorola machines is big endian. Mac OS on x86 machines is little endian. On PowerPC machines it's big endian.
Will it act differently between platforms?
Yes it will.
Is there a difference here between C and C++?
Probably not.
I want to OR two big chunks of memory... but it doesn't work.
Consider that I have three char * buffers: bm, bm_old, and bm_res.
#define to_uint64(buffer,n) {(uint64_t)buffer[n] << 56 | (uint64_t)buffer[n+1] << 48 | (uint64_t)buffer[n+2] << 40 | (uint64_t)buffer[n+3] << 32 | (uint64_t) buffer[n+4] << 24 | (uint64_t)buffer[n+5] << 16 | (uint64_t)buffer[n+6] << 8 | (uint64_t)buffer[n+7];}
...
for (unsigned int i=0; i<bitmapsize(size)/8; i++){
    uint64_t or_res = (to_uint64(bm_old,i*8)) | (to_uint64(bm,i*8));
    memcpy(bm_res+i*sizeof(uint64_t), &or_res, sizeof(uint64_t));
}
bm_res is not correct!
Have any clue?
Thanks,
Amir.
Enclose the definition of to_uint64 in parentheses () instead of braces {} and get rid of the semicolon at the end. Using #define creates a macro whose text is inserted verbatim wherever it's used, not an actual function, so you were attempting to |-together two blocks rather than those blocks' "return values."
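Applied to the macro from the question, the fixed definition would look something like this (the only essential changes are the () delimiters and the dropped trailing semicolon; parenthesizing the macro arguments is just ordinary macro hygiene on top of that):
#define to_uint64(buffer, n)                                              \
    ((uint64_t)(buffer)[(n)]   << 56 | (uint64_t)(buffer)[(n)+1] << 48 |  \
     (uint64_t)(buffer)[(n)+2] << 40 | (uint64_t)(buffer)[(n)+3] << 32 |  \
     (uint64_t)(buffer)[(n)+4] << 24 | (uint64_t)(buffer)[(n)+5] << 16 |  \
     (uint64_t)(buffer)[(n)+6] << 8  | (uint64_t)(buffer)[(n)+7])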
I think you need to advance your output pointer by the correct size:
memcpy(bm_res + i * sizeof(uint64_t), &or_res, sizeof(uint64_t));
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Since bm_res is a char-pointer, + 1 advances by just one byte.
You're incrementing bm_res by one for every eight-byte block you move. Further, you never increment bm or bm_old at all. So you're basically tiling the first byte of or_res over bm_res, which is probably not what you want.
More importantly, your code is byte-order sensitive - whether or_res is represented in memory as least-order-byte first or highest-order-byte first matters.
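A byte-by-byte OR sidesteps the byte-order question entirely; a minimal sketch (the function and parameter names are just illustrative, with len standing in for bitmapsize(size)):
#include <cstddef>
// OR two buffers into a third, one byte at a time: simple, correct, and
// independent of byte order.
void or_buffers(const char* a, const char* b, char* out, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<char>(a[i] | b[i]);
}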
I would recommend you just do a byte-by-byte or first, and only try to optimize it if that is too slow. When you do optimize it, don't use your crazy to_uint64 macro there - it'll be slower than just going byte-by-byte. Instead, cast to uint64_t * directly. While this is, strictly speaking, undefined behavior, it works on every platform I've ever seen, and should be byteorder agnostic.