Does a pointer point to the LSB or MSB? - c++

If I have the following code:
int i = 5;
void * ptr = &i;
printf("%p", ptr);
Will I get the LSB address of i, or the MSB?
Will it act differently between platforms?
Is there a difference here between C and C++?

Assume the size of int is 4 bytes. &i will always give you the address of the first (lowest-addressed) of those 4 bytes.
If the architecture is little endian, then the lowest address will hold the LSB, like below:
        +------+------+------+------+
Address | 1000 | 1001 | 1002 | 1003 |
        +------+------+------+------+
Value   |  5   |  0   |  0   |  0   |
        +------+------+------+------+
If the architecture is big endian, then the lowest address will hold the MSB, like below:
        +------+------+------+------+
Address | 1000 | 1001 | 1002 | 1003 |
        +------+------+------+------+
Value   |  0   |  0   |  0   |  5   |
        +------+------+------+------+
So &i gives the address of the LSB of i on a little-endian machine, and the address of the MSB on a big-endian machine.
On mixed-endian (bi-endian) hardware, either little or big endian is chosen for each task, so you still end up in one of the two cases above.
The logic below will tell you the endianness:
int i = 5;
void *ptr = &i;
char *ch = (char *) ptr;
printf("%p", ptr);

if (5 == (*ch))
    printf("\nlittle endian\n");
else
    printf("\nbig endian\n");
This behaviour is the same in both C and C++.

Will I get the LSB address of i, or the MSB?
This is platform-dependent: it will be the lowest addressed byte, which may be the MSB or the LSB depending on your platform's endianness.
Although this is not written in the standard directly, it is implied by C11 section 6.3.2.3, paragraph 7:
When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object.
Will it act differently between platforms?
Yes.
Is there a difference here between C and C++?
No: it is platform-dependent in both C and C++.
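To make the quoted rule concrete, here is a minimal sketch (mine, not from the answer) that walks i byte by byte through an unsigned char pointer; bytes[0] is the lowest addressed byte, and whether it holds 5 (the LSB) or 0 depends on your platform:
#include <cstdio>
#include <cstddef>

int main()
{
    int i = 5;
    // Converting to a character pointer gives the lowest addressed byte of
    // the object; whether that byte is the LSB or the MSB is a property of
    // the platform, not of the language.
    unsigned char *bytes = reinterpret_cast<unsigned char *>(&i);
    for (std::size_t n = 0; n < sizeof i; ++n)
        std::printf("byte %zu at %p = %d\n", n,
                    static_cast<void *>(bytes + n), bytes[n]);
}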

It depends on the endianness of the platform: on a little-endian platform you'll get a pointer to the LSB, and on a big-endian platform it will point to the MSB. There are even some mixed-endian platforms; in that case, check the specific documentation of your compiler/CPU (and may God have mercy on your soul).
Still, you can perform a quick check at runtime:
uint32_t i = 0x01020304;
char le[4] = {4, 3, 2, 1};
char be[4] = {1, 2, 3, 4};

if (memcmp(&i, le, 4) == 0)
    puts("Little endian");
else if (memcmp(&i, be, 4) == 0)
    puts("Big endian");
else
    puts("Mixed endian");
By the way, to print pointers you must use the %p conversion specifier, not %d.

ptr stores the address of the starting byte of the integer object. Whether this is where the most or the least significant byte is stored depends on your platform. Some weird platforms even use mixed endianness in which case it'll be neither the MSB nor the LSB.
There is no difference between C and C++ in that respect.

What it points to is the MSB for my VC++ 2010 and Digital Mars. But it is related to endianness.
The answers to this question give some info for you:
Detecting endianness programmatically in a C++ program.
Here, user "none" says:
#define BIG_ENDIAN 0
#define LITTLE_ENDIAN 1

int TestByteOrder()
{
    short int word = 0x0001;
    char *byte = (char *) &word;
    return byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN;
}
This gives some endianness info

Will I get the LSB address of i, or the MSB?
It depends on the machine and the OS. On big-endian machines and OSes you will get the MSB, and on little-endian machines and OSes you will get the LSB.
Windows is always little endian. All (most?) flavors of Linux/Unix on x86 are little endian. Linux/Unix on Motorola machines is big endian. Mac OS on x86 machines is little endian; on PowerPC machines it's big endian.
Will it act differently between platforms?
Yes, it will.
Is there a difference here between C and C++?
Probably not.
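As a side note not covered in the original answers: if you can use C++20, std::endian in <bit> reports the native byte order at compile time. A minimal sketch:
#include <bit>
#include <iostream>

int main()
{
    // std::endian::native tells you the byte order of the target at compile time
    if constexpr (std::endian::native == std::endian::little)
        std::cout << "little endian\n";
    else if constexpr (std::endian::native == std::endian::big)
        std::cout << "big endian\n";
    else
        std::cout << "mixed endian\n";
}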

Related

c++ combining 2 uint8_t into one uint16_t not working?

So I have a little piece of code that takes 2 uint8_t's and places them next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0, and the second uint8_t is 1, I expect the uint16_t to also be one.
However, this is in my code not the case.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to turn the bytes uint8_t pointer into a uint16_t pointer, and then read the value through it. I expect that value to be 1 since x86 is little endian. However, it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!
There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored in the lowest address. Thus 2 byte value 1 is stored as 1, 0 and not 0, 1.
Does endianness also affect the order of the bits in a byte, or not?
There is no way to access a bit by "address"[1], so there is no concept of endianness. When converting to text, bits are conventionally shown most significant on the left and least significant on the right, just like the digits of decimal numbers. I don't know if this is true in right-to-left writing systems.
[1] You can sort of create "virtual addresses" for bits using bitfields. The order of bitfields, i.e. whether the first bitfield is most or least significant, is implementation defined and not necessarily related to byte endianness at all.
Here is a correct way to set two octets as uint16_t. The result will depend on endianness of the system:
// no need to complicate a simple example with dynamic allocation
uint16_t out;

// note that there is an exception in the language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus the following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
data[0] = std::byte{1};   // std::byte has no implicit conversion from int,
data[1] = std::byte{0};   // so the values need an explicit std::byte{...}
Note that assuming that input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating through network, or accessing files that may be shared to other systems.
In these cases, the communication protocol, or the file format typically specify that the data is in specific endianness which may or may not be the same as the native endianness of your target system. De facto standard in network communication is to use big endian. Data in particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
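As a small illustration of that advice (my own sketch; the function name is hypothetical), decoding a 16-bit big-endian field from a byte buffer with shifts gives the same result on any host:
#include <cstdint>

// Decode a 16-bit big-endian value from a byte buffer.
// The result does not depend on the host's native endianness.
uint16_t read_u16_be(const uint8_t* buf)
{
    return static_cast<uint16_t>((buf[0] << 8) | buf[1]);
}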
In a little-endian system the least significant bytes are placed first. In other words, the low byte is placed at offset 0, and the high byte at offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x y places to the left (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between arithmetic and logical shifts: in C/C++, shifts of unsigned values are logical, while right shifts of signed values are typically arithmetic (strictly speaking, implementation-defined).
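Putting that together, a minimal self-contained sketch of the shift-based approach (variable values are mine, chosen to match the question):
#include <cstdint>
#include <iostream>

int main()
{
    uint8_t lowByte = 1;
    uint8_t highByte = 0;

    // Endianness-independent: the low byte always ends up in bits 0-7,
    // the high byte in bits 8-15.
    uint16_t out = static_cast<uint16_t>(lowByte | (highByte << 8));

    std::cout << out << '\n';   // prints 1
}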
If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since Intel x86 is little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
Your code works, but you misinterpreted how to read the bytes:
#include <cstdint>
#include <cstddef>
#include <iostream>

int main()
{
    uint8_t *in = new uint8_t[2];
    in[0] = 3;
    in[1] = 1;

    uint16_t out = *((uint16_t*)in);
    std::cout << "out: " << out << "\n in: " << in[1] * 256 + in[0] << std::endl;
    return 0;
}
By the way, you should take care of alignment when casting this way.
One way to think about the numbers is in terms of MSB and LSB bit order, where the MSB is the highest bit and the LSB is the lowest bit:
(u)int32: MSB: bit 31 ... LSB: bit 0
(u)int16: MSB: bit 15 ... LSB: bit 0
(u)int8 : MSB: bit 7  ... LSB: bit 0
With your cast to a 16-bit value, the bytes arrange themselves like this:

16-bit value              <=  BYTE[1]     BYTE[0]
MSB ............... LSB       Bit7 .. 0   Bit7 .. 0
0000 0001 0000 0000           0000 0001   0000 0000

which is 256 -> the correct value.

Escape sequence in char confusion

I came across this example here.
#include <iostream>

int main()
{
    int i = 7;
    char* p = reinterpret_cast<char*>(&i);
    if (p[0] == '\x7')  // POINT OF INTEREST
        std::cout << "This system is little-endian\n";
    else
        std::cout << "This system is big-endian\n";
}
What I'm confused about is the if statement. How do the escape sequences behave here? I get the same result with p[0] == '\07' (\x being hexadecimal escape sequence). How would checking if p[0] == '\x7' tell me if the system is little or big endian?
The layout of a (32-bit) integer in memory is:
Big endian:
+-----+-----+-----+-----+
| 0 | 0 | 0 | 7 |
+-----+-----+-----+-----+
^ pointer to int points here
Little endian:
+-----+-----+-----+-----+
| 7 | 0 | 0 | 0 |
+-----+-----+-----+-----+
^ pointer to int points here
What the code basically does is read the first char that the integer pointer points to, which in the case of little endian is \x7, and in the case of big endian is \x0.
Hex 7 and octal 7 happen to be the same value, as is decimal 7.
The check is intended to try to determine if the value ends up in the first or last byte of the int.
A little endian system will store the bytes of the value in "reverse" order, with the lower part first
07 00 00 00
A big endian system would store the "big" end first
00 00 00 07
By reading the first byte, the code will see if the 7 ends up there, or not.
7 in decimal is the same as 7 in hexadecimal and 7 in octal, so it doesn't matter if you use '\x7', '\07', or even just 7 (numeric literal, not a character one).
As for the endianness test: the value of i is 7, meaning it will have the number 7 in its least significant byte, and 0 in all other bytes. The cast char* p = reinterpret_cast<char*>(&i); makes p point to the first byte in the representation of i. The test then checks whether that byte's value is 7. If so, it's the least significant byte, implying a little-endian system. If the value is not 7, it's not a little-endian system. The code assumes that it's big-endian, although that's not strictly established (I believe there were exotic systems with some sort of mixed endianness as well, although the code will probably not run on such systems in practice).
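For comparison (this variation is mine, not from the answer), the same test can be written with memcpy, which copies out the lowest addressed byte explicitly:
#include <cstring>
#include <iostream>

int main()
{
    int i = 7;
    unsigned char first;
    std::memcpy(&first, &i, 1);   // copy the lowest addressed byte of i

    if (first == 7)
        std::cout << "This system is little-endian\n";
    else
        std::cout << "This system is big-endian\n";
}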

Shifting syntax error

I have a byte array:
byte data[2]
I want to keep the 7 least significant bits from the first and the 3 most significant bits from the second.
I do this:
unsigned int the=((data[0]<<8 | data[1])<<1)>>6;
Can you give me a hint why this does not work?
If I do it in different lines it works fine.
Can you give me a hint why this does not work?
Hint:
You have two bytes and want to preserve the 7 least significant bits from the first and the 3 most significant bits from the second:
data[0]: -xxxxxxx data[1]: xxx-----
-'s represent bits to remove, x's represent bits to preserve.
After this
(data[0]<<8 | data[1])<<1
you have:
the: 00000000 0000000- xxxxxxxx xx-----0
Then you do >>6 and the result is:
the: 00000000 00000000 00000-xx xxxxxxxx
See, you did not remove the high bit of data[0].
Keep the 7 least significant bits from the first and the 3 most significant bits from the second.
Assuming the 10 bits to be preserved should be the LSB of the unsigned int value, and should be contiguous, and that the 3 bits should be the LSB of the result, this should do the job:
unsigned int value = ((data[0] & 0x7F) << 3) | ((data[1] & 0xE0) >> 5);
You might not need all the masking operands; it depends in part on the definition of byte (probably unsigned char, or perhaps plain char on a machine where char is unsigned), but what's written should work anywhere (16-bit, 32-bit or 64-bit int; signed or unsigned 8-bit (or 16-bit, or 32-bit, or 64-bit) values for byte).
Your code does not remove the high bit from data[0] at any point — unless, perhaps, you're on a platform where unsigned int is a 16-bit value, but if that's the case, it is unusual enough these days to warrant a comment.
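A small self-contained sketch of that mask-and-shift expression (the test values and the assumption that byte is unsigned char are mine):
#include <cstdio>

typedef unsigned char byte;   // assumption: "byte" is an unsigned 8-bit type

int main()
{
    byte data[2] = { 0xAB, 0xE0 };   // 0xAB = 1010 1011, 0xE0 = 1110 0000

    // 7 LSBs of data[0] land in bits 9..3, 3 MSBs of data[1] in bits 2..0
    unsigned int value = ((data[0] & 0x7F) << 3) | ((data[1] & 0xE0) >> 5);

    std::printf("0x%X\n", value);   // 0x2B shifted left by 3 is 0x158; OR 0x7 gives 0x15F
}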

C/C++ pointer trick fix

I'm trying this pointer trick and I can't figure out how to fix it, I'm running g++ 4.6 on ubuntu 12.04 64-bit. Check out this code below:
int arr[5];
arr[3] = 50;
((short*) arr)[6] = 2;
cout << arr[3] << endl;
The logic is: since short is 2 bytes, int is 4 bytes, I want to change the first 2 bytes in arr[3], while keeping the value of the second 2 bytes as 50. So I'm just messing with the bit pattern. Unfortunately, sizeof(int*) and sizeof(short*) are both 8 bytes. Is there a type cast that returns a pointer of size 2 bytes?
Update:
I realized that the question is poorly written, so I'll fix that:
The output from cout << arr[3] << endl; that I'm getting is 2. The output I would like to get is neither 2 nor 50, but rather a large number, indicating that the left part of the int's bit pattern has been changed, while the right part (the other 2 bytes) of the int stored in arr[3] is still unchanged.
sizeof(int*) and sizeof(short*) are both going to be the same -- as will sizeof(void*) -- you're asking for the size of a pointer, not the size of the thing the pointer points to.
Use sizeof(int) or sizeof(short) instead.
Now, as for your code snippet, you are making assumptions about the endianness of the machine on which you're running. The "first" part of the int on a given platform may be the bytes with higher address, or the bytes with lower address.
For instance, your memory block may be laid out like this. Let's number the bytes within each object so that the least significant byte has index zero and the most significant byte has the highest index. On a big-endian architecture an int may look like this:
<------------- 4 bytes --------------->
+---------+---------+---------+---------+
| int:3 | int:2 | int:1 | int:0 |
| short:1 | short:0 | short:1 | short:0 |
+---------+---------+---------+---------+
Notice how the first short in the int -- which in your case would have been ((short*) arr)[6] -- contains the most significant bits of the int, not the least significant. So if you overwrite ((short*) arr)[6], you are overwriting the most significant bits of arr[3], which appears to be what you wanted. But x64 is not a big endian machine.
On a little endian architecture, you would see this instead:
<------------- 4 bytes --------------->
+---------+---------+---------+---------+
| int:0 | int:1 | int:2 | int:3 |
| short:0 | short:1 | short:0 | short:1 |
+---------+---------+---------+---------+
leading to the opposite behavior -- ((short*) arr)[6] would be the least significant bits of arr[3], and ((short*) arr)[7] would be the most significant.
Here's what my machine happens to do -- your machine may be different:
C:\Users\Billy\Desktop>type example.cpp
#include <iostream>

int main()
{
    std::cout << "Size of int is " << sizeof(int) << " and size of short is "
              << sizeof(short) << std::endl;
    int arr[5];
    arr[3] = 50;
    ((short*) arr)[6] = 2;
    std::cout << arr[3] << std::endl;
    ((short*) arr)[7] = 2;
    std::cout << arr[3] << std::endl;
}
C:\Users\Billy\Desktop>cl /W4 /EHsc /nologo example.cpp && example.exe
example.cpp
Size of int is 4 and size of short is 2
2
131074
Your problem is due to endianness. Intel CPUs are little endian, meaning that the least significant byte of an int is stored at the lowest address. Let me show you an example:
Let's assume that arr[3] is at address 10:
Then arr[3] = 50; writes the following to memory:
10: 0x32
11: 0x00
12: 0x00
13: 0x00
And ((short*) arr)[6] = 2; writes the following to memory
10: 0x02
11: 0x00
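To see this on your own machine, here is a small sketch (mine, not the answerer's) that dumps the bytes of arr[3] before and after the short write; note that the write through a short* reproduces the question's type punning, which another answer below points out is technically undefined behaviour:
#include <cstdio>

int main()
{
    int arr[5];
    arr[3] = 50;

    unsigned char *b = reinterpret_cast<unsigned char *>(&arr[3]);
    std::printf("before: %02x %02x %02x %02x\n",
                (unsigned) b[0], (unsigned) b[1], (unsigned) b[2], (unsigned) b[3]);

    ((short *) arr)[6] = 2;   // same type-punning write as in the question
    std::printf("after:  %02x %02x %02x %02x\n",
                (unsigned) b[0], (unsigned) b[1], (unsigned) b[2], (unsigned) b[3]);
}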
When you index a pointer, it adds the index multiplied by the size of the pointed-to type. So, you don't need a 2-byte pointer.
You are making a lot of assumptions which might not hold water. Also, what does the size of pointers have to do with the problem at hand?
Why not just use bit masks:
arr[3] |= (top_2_bytes << 16);
This should set the upper 16 bits without disturbing the lower 16. (You may get into signed/unsigned dramas.)
Having said all the above, the standard prohibits doing such things: setting a variable through a pointer to another type invokes undefined behaviour. If you know how your machine works (sizes of int and short, endianness, ...) and you know how your compiler (is likely to) translate your code, then you might get away with it. It makes for neat parlor tricks, and spectacular explosions when the machine/compiler/phase of the moon changes.
If it wins any performance, the win will be minimal, and it could even be a net loss. (With one compiler I fiddled with long ago, playing the "I can implement this loop better than the compiler, I know exactly what goes on here" game generated much worse code for my `label: ... if () goto label;` than the exact same logic written naturally as a loop: my "smart" code confused the compiler, and its pattern for loops did not apply.)
You wouldn't want your actual pointer to be of size two bytes; that would mean it could only address about 64 KiB of memory. However, using a cast, as you are, to a short *, will let you access the memory every two bytes (since the compiler will view the array as an array of shorts, not of ints).
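A hedged alternative (my own sketch) that does what the updated question asks for, changing the upper half of arr[3] while keeping the lower half at 50, without any pointer casts, using the masking idea above:
#include <iostream>

int main()
{
    int arr[5];
    arr[3] = 50;

    // Set the upper 16 bits to 2, keep the lower 16 bits (still 50).
    unsigned int tmp = static_cast<unsigned int>(arr[3]);
    tmp = (tmp & 0xFFFFu) | (2u << 16);
    arr[3] = static_cast<int>(tmp);

    std::cout << arr[3] << std::endl;   // 131122 on a 32-bit int (2 * 65536 + 50)
}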

convert 4 bytes to 3 bytes in C++

I have a requirement, where 3 bytes (24 bits) need to be populated in a binary protocol. The original value is stored in an int (32 bits). One way to achieve this would be as follows:-
Technique1:-
long x = 24;
long y = htonl(x);
long z = y>>8;
memcpy(dest, z, 3);
Please let me know if the above is the correct way to do it.
The other way, which I don't understand, was implemented as below:
Technique2:-
typedef struct {
    char data1;
    char data2[3];
} some_data;

typedef union {
    long original_data;
    some_data data;
} combined_data;

long x = 24;
combined_data somedata;
somedata.original_data = htonl(x);
memcpy(dest, &combined_data.data.data2, 3);
What I don't understand is: how did the 3 bytes end up in combined_data.data.data2, as opposed to the first byte going into combined_data.data.data1 and the next 2 bytes going into combined_data.data.data2?
This is x86_64 platform running 2.6.x linux and gcc.
PARTIALLY SOLVED:-
On the x86_64 platform the least significant byte is stored at the lowest address (little endian); in the diagrams below the lowest address (Byte1) is on the right. So a variable of type long with value 24 will have the following memory representation:
|--Byte4--|--Byte3--|--Byte2--|--Byte1--|
|    0    |    0    |    0    |  0x18   |
With htonl() applied to the above value, the memory becomes:
|--Byte4--|--Byte3--|--Byte2--|--Byte1--|
|  0x18   |    0    |    0    |    0    |
In the struct some_data:
data1    = Byte1
data2[0] = Byte2
data2[1] = Byte3
data2[2] = Byte4
But my question still holds: why not simply right shift by 8 as shown in Technique 1?
A byte takes 8 bits :-)
int x = 24;
int y = x << 8;
Shifting by 0 changes nothing; shifting by 1 multiplies by 2, by 2 multiplies by 4, by 8 multiplies by 256.
If we are on a BIG ENDIAN machine, the 4 bytes are put in memory as 2143, and such algorithms won't work for numbers greater than 2^15. On the other hand, on a BIG ENDIAN machine you should define what "putting an integer in 3 bytes" means.
Hmm. I think the second proposed algorithm will be OK, but change the order of the bytes: you have them as 2143; you need 321, I think. But better check it.
Edit: I checked on wiki - x86 is little endian, they say, so the algorithms are OK.
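For what it's worth, here is a minimal sketch of shift-based packing (the function name and the dest buffer are mine, not from the question): it writes the low 24 bits of a 32-bit value into 3 bytes in network (big-endian) order, so it does not depend on the host's byte order at all:
#include <cstdint>
#include <cstdio>

// Write the low 24 bits of v into dest[0..2], most significant byte first.
void put_u24_be(uint8_t* dest, uint32_t v)
{
    dest[0] = static_cast<uint8_t>((v >> 16) & 0xFF);
    dest[1] = static_cast<uint8_t>((v >> 8) & 0xFF);
    dest[2] = static_cast<uint8_t>(v & 0xFF);
}

int main()
{
    uint8_t dest[3];
    put_u24_be(dest, 24);
    std::printf("%02x %02x %02x\n",
                (unsigned) dest[0], (unsigned) dest[1], (unsigned) dest[2]);   // 00 00 18
}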