Can someone explain this "endian-ness" function for me? - c++

Write a program to determine whether a computer is big-endian or little-endian.
bool endianness() {
int i = 1;
char *ptr;
ptr = (char*) &i;
return (*ptr);
}
So I have the above function. I don't really get it. ptr = (char*) &i, which I think means a pointer to a character at the address where i is sitting; so if an int is 4 bytes, say ABCD, are we talking about A or D when you cast that to char*, and why?
Would some one please explain this in more detail? Thanks.
So specifically, with ptr = (char*) &i;, when you cast it to char*, what part of i do I get?

If you have a little-endian architecture, i will look like this in memory (in hex):
01 00 00 00
^
If you have a big-endian architecture, i will look like this in memory (in hex):
00 00 00 01
^
The cast to char* gives you a pointer to the first byte of the int (to which I have pointed with a ^), so the value pointed to by the char* will be 01 if you are on a little-endian architecture and 00 if you are on a big-endian architecture.
When you return that value, 0 is converted to false and 1 is converted to true. So, if you have a little-endian architecture, this function will return true and if you have a big-endian architecture, it will return false.
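Putting that together, a minimal complete program along these lines (a sketch, not part of the original question) could look like:
#include <iostream>

bool endianness() {
    int i = 1;
    char *ptr = (char *) &i;   // points at the lowest-addressed byte of i
    return *ptr;               // 1 on little-endian, 0 on big-endian
}

int main() {
    std::cout << (endianness() ? "little-endian\n" : "big-endian\n");
}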

Whether ptr points to byte A or byte D depends on the endianness of the machine. ptr points to the byte of the integer that is at the lowest address (the other bytes would be at ptr+1, ...).
On a big-endian machine the most significant byte of the integer (which is 0x00) will be stored at this lowest address, so the function will return zero.
On a little-endian machine it is the opposite: the least significant byte of the integer (0x01) will be stored at the lowest address, so the function will return one in this case.
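To see all of the bytes rather than just the lowest-addressed one, a small sketch (assuming a 4-byte int) could print them in address order:
#include <cstdio>

int main() {
    int i = 1;
    unsigned char *ptr = (unsigned char *) &i;
    // Print every byte of i, starting from the lowest address (ptr, ptr+1, ...).
    for (unsigned k = 0; k < sizeof i; ++k)
        std::printf("byte %u: %02x\n", k, (unsigned) ptr[k]);
    // little-endian prints 01 00 00 00, big-endian prints 00 00 00 01
}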

This is using type punning to access an integer as an array of characters. If the machine is big endian, the first byte will be the most significant one and will have a value of zero, but if the machine is little endian, it will be the least significant byte, which will have a value of one. (Instead of accessing i as a single integer, the same memory is accessed as an array of four chars.)

Whether *((char*)&i) is byte A or byte D gets to the heart of endianness. On a little endian system, the integer 0x41424344 will be laid out in memory as: 0x44 43 42 41 (least significant byte first; in ASCII, this is "DCBA"). On a big endian system, it will be laid out as: 0x41 42 43 44. A pointer to this integer will hold the address of the first byte. Consider the pointer as an integer pointer, and you get the whole integer. Consider it as a char pointer, and you get the first byte, since that's the size of a char.
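To make the "DCBA" versus "ABCD" point concrete, a small illustrative sketch (assuming a 4-byte int, which is not guaranteed) could be:
#include <cstdio>

int main() {
    unsigned int i = 0x41424344;                     // bytes 'A' 'B' 'C' 'D'
    const char *p = reinterpret_cast<const char *>(&i);
    // Print the bytes in memory order: "DCBA" on little endian, "ABCD" on big endian.
    for (unsigned k = 0; k < sizeof i; ++k)
        std::putchar(p[k]);
    std::putchar('\n');
}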

Assume int is 4 bytes (in C it may not be). This assumption is just to simplify the example...
You can look at each of these 4 bytes individually.
char is a byte, so it's looking at the first byte of a 4 byte buffer.
If the first byte is non-zero, that tells you the least significant byte of the value is stored in the first (lowest-addressed) byte.
I chose the number 42 at random to avoid any confusion with a special meaning of the value 1.
int num = 42;
if(*(char *)&num == 42)
{
printf("\nLittle-Endian\n");
}
else
{
printf("Big-Endian\n");
}
Breakdown:
int num = 42;
//memory of the 4 bytes is either: (where each byte is 0 to 255)
//1) 0 0 0 42
//2) 42 0 0 0
char*p = #/*Cast the int pointer to a char pointer, pointing to the first byte*/
bool firstByteOf4Is42 = *p == 42;/*Checks to make sure the first byte is 1.*/
//Advance to the 2nd byte
++p;
assert(*p == 0);
//Advance to the 3rd byte
++p;
assert(*p == 0);
//Advance to the 4th byte
++p;
bool lastByteOf4Is42 = *p == 42;
assert(firstByteOf4Is42 == !lastByteOf4Is42);
If firstByteOf4Is42 is true you have little-endian. If lastByteOf4Is42 is true then you have big-endian.
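Collected into one compilable program, the breakdown above might look like this (a sketch assuming a 4-byte int):
#include <cassert>
#include <cstdio>

int main() {
    int num = 42;
    char *p = (char *) &num;                       // first (lowest-addressed) byte of num

    bool firstByteOf4Is42 = (*p == 42);
    bool lastByteOf4Is42  = (p[sizeof num - 1] == 42);
    assert(firstByteOf4Is42 != lastByteOf4Is42);   // exactly one of the two holds

    std::printf(firstByteOf4Is42 ? "Little-Endian\n" : "Big-Endian\n");
}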

Sure, let's take a look:
bool endianness() {
int i = 1; //This is 0x1:
char *ptr;
ptr = (char*) &i; //pointer to 0001
return (*ptr);
}
If the machine is little-endian, the byte *ptr points to will be 0000 0001 (the least significant byte is stored at the lowest address).
If the machine is big-endian, the byte order is inverted, so i will be laid out in memory as
00 00 00 01
and *ptr will hold 0x0.
Finally, returning *ptr is equivalent to
if (*ptr == 0x1) //little endian
else //big endian

Related

Assign char address to an int pointer, and consequently write an int-size piece of memory

From the book Stroustrup, Programming: Principles and Practice Using C++, §17.3 (about memory, addresses and pointers), it is supposed to be allowed to assign a char* to an int*:
char ch1 = 'a';
char ch2 = 'b';
char ch3 = 'c';
char ch4 = 'd';
int* pi = &ch3; // point to ch3, a char-size piece of memory
*pi = 12345; // write to an int-size piece of memory
*pi = 67890;
quoting from the source:
Had the compiler allowed the code, we would have been writing 12345 to the memory starting at &ch3. That would definitely have changed the value of some nearby memory, such as ch2 or ch4, or we would have overwritten part of pi itself.
In that case, the next assignment *pi = 67890 would place 67890 in some completely different part of memory.
I don't understand why the next assignment would place it in some completely different part of memory. The address stored in int *pi is still &ch3, so that assignment would overwrite the content at that address, i.e. 12345. Why isn't that so?
Please, can you help me? Many thanks!
char ch3 = 'c';
int* pi = &ch3;
it is supposed to be allowed to assign a char* to int*:
Not quite - there is an alignment concern. It is undefined behavior (UB) when
If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. C17dr § 6.3.2.3 7
Example: Some processors require an int * to be even; if &ch3 was odd, storing the address might fail, and dereferencing the address certainly fails: bus error.
The next is certainly UB as the destination is outside the memory of ch3.
ch1, ch2, ch4 might be nearby and give seemingly reasonable results, but it is still UB.
// undefined behavior
*pi = 12345; // write to an int-size piece of memory
When code attempts to write outside its bounds - it is UB, anything may happen, including writing into neighboring data.
The address stored in int *pi is still &ch3
Maybe, maybe not. UB has occurred.
why the next assignment would place it: in some completely different part of memory?
The abusive code suggests that pi itself is overwritten by *pi = 12345;. This might happen, it might not. It is UB. A subsequent use of *pi is simply more UB.
Recall with UB you might get what you hope for, you might not - it is not defined by C.
You seem to have skipped part of the explanation you quoted:
or we would have overwritten part of pi itself
Think of it this way: since ints are larger than chars, if an int* points to a location that stores a char, memory will overflow when you assign an integer value to that location, because only a single byte is allocated there but you are writing 4 bytes' worth of data. You cannot fit 4 bytes of data into one, so the other 3 bytes will go somewhere.
Assume then that the overflowing bytes partially change the value stored in pi. Now the next assignment will go to a random memory location.
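For comparison, if the goal is simply to store an int's bytes into char-sized storage, a well-defined approach (a sketch, not the book's example) is to give the destination enough room and copy with memcpy:
#include <cstdio>
#include <cstring>

int main() {
    int value = 12345;
    unsigned char buffer[sizeof(int)];             // room for a whole int, unlike a single char

    std::memcpy(buffer, &value, sizeof value);     // well-defined: copies all of the int's bytes

    // The order of the stored bytes depends on the machine's endianness.
    for (unsigned k = 0; k < sizeof value; ++k)
        std::printf("%02x ", (unsigned) buffer[k]);
    std::printf("\n");
}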
Let's assume the memory address layout is:
0 1 2 3 4 5 6 7
The left half (addresses 0, 1, 2 and 3) holds the four characters; the right half (addresses 4, 5, 6 and 7) holds the int* pi.
The values in each byte in hex may be:
61 62 63 64 02 00 00 00
Note how the first four are ASCII values and the last four are the address of ch3. Writing *pi = 12345; changes the values like so:
61 62 39 30 00 00 00 00
The byte sequence 39 30 00 00 is 12345 (0x3039) stored little-endian.
The next write, *pi = 67890;, would start from memory address 00 00 00 00, not 02 00 00 00 as one could expect.
Firstly, you have to understand that everything is a number i.e., a char, int, int* all contain numbers. Memory addresses are also numbers. Let's assume the current example compiles and we have memory like following:
--------------------------
Address | Variable | Value
--------------------------
0x01    | ch1      | a
0x02    | ch2      | b
0x03    | ch3      | c
0x04    | ch4      | d
0x05    | pi       | &ch3 = 0x03
Now let's dereference pi and reassign a new value to ch3:
*pi = 12345;
Let's assume int is 4 bytes. Since pi is an int pointer, it will write a 4 byte value to the location pointed by pi. Now, char can only contain one byte values so, what would happen if we try to write 4 bytes to that location? Strictly speaking, this is undefined behaviour but I will try to explain what the author means.
Since char cannot contain values larger than 1 byte, *pi = 12345 will overflow ch3. When this overflow happens, the remaining 3 bytes out of the 4 may get written to the nearby memory locations. What memory locations do we have nearby? ch4 and pi itself! ch4 can only hold 1 byte as well, which leaves us with 2 bytes, and the next location is pi itself, meaning pi will overwrite its own value!
--------------------------
Address | Variable | Value
--------------------------
0x01    | ch1      | a
0x02    | ch2      | b
0x03    | ch3      | 12   // 12 ended up here
0x04    | ch4      | 34   // 34 ended up here
0x05    | pi       | 5    // 5 got written here, overwriting pi
As you can see, pi is now pointing to some other memory address, which is definitely not ch3.

Escape sequence in char confusion

I came across this example here:
#include <iostream>
int main()
{
int i = 7;
char* p = reinterpret_cast<char*>(&i);
if(p[0] == '\x7') //POINT OF INTEREST
std::cout << "This system is little-endian\n";
else
std::cout << "This system is big-endian\n";
}
What I'm confused about is the if statement. How do the escape sequences behave here? I get the same result with p[0] == '\07' ('\x' introduces a hexadecimal escape sequence, '\0...' an octal one). How would checking p[0] == '\x7' tell me if the system is little or big endian?
The layout of a (32-bit) integer in memory is:
Big endian:
+-----+-----+-----+-----+
| 0 | 0 | 0 | 7 |
+-----+-----+-----+-----+
^ pointer to int points here
Little endian:
+-----+-----+-----+-----+
| 7 | 0 | 0 | 0 |
+-----+-----+-----+-----+
^ pointer to int points here
What the code basically does is read the first char that the integer pointer points to, which in the case of little endian is \x7, and in the case of big endian is \x0.
Hex 7 and octal 7 happen to be the same value, as is decimal 7.
The check is intended to try to determine if the value ends up in the first or last byte of the int.
A little endian system will store the bytes of the value in "reverse" order, with the lower part first
07 00 00 00
A big endian system would store the "big" end first
00 00 00 07
By reading the first byte, the code will see if the 7 ends up there, or not.
7 in decimal is the same as 7 in hexadecimal and 7 in octal, so it doesn't matter if you use '\x7', '\07', or even just 7 (numeric literal, not a character one).
As for the endianness test: the value of i is 7, meaning it will have the number 7 in its least significant byte, and 0 in all other bytes. The cast char* p = reinterpret_cast<char*>(&i); makes p point to the first byte in the representation of i. The test then checks whether that byte's value is 7. If so, it's the least significant byte, implying a little-endian system. If the value is not 7, it's not a little-endian system. The code assumes that it's big-endian, although that's not strictly established (I believe there were exotic systems with some sort of mixed endianness as well, although the code will probably not run on such a system in practice).
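As an aside, if your compiler supports C++20, the native byte order can be queried directly through std::endian, with no pointer cast at all; a minimal sketch:
#include <bit>
#include <iostream>

int main() {
    // std::endian::native reports the byte order at compile time.
    if constexpr (std::endian::native == std::endian::little)
        std::cout << "This system is little-endian\n";
    else if constexpr (std::endian::native == std::endian::big)
        std::cout << "This system is big-endian\n";
    else
        std::cout << "This system is mixed-endian\n";
}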

memcpy from Byte * to unsigned int Is Reversing Byte Order

I have a CFBitVector that looks like '1000000000000000'
I pass the byte array to CFBitVectorGetBits, which fills it with the values from this CFBitVector. After this call, the bytes array looks like:
bytes[0] == '0x80'
bytes[1] == '0x00'
This is exactly what I would expect. However, when copying the contents of bytes to unsigned int bytesValue, the value is 128 when it should be 32768. The decimal value 128 is represented by the hex value 0x0080. Essentially it seems that the byte order is reversed while performing memcpy. What is going on here? Is this just an issue with endianness?
Thanks
CFMutableBitVectorRef bitVector = CFBitVectorCreateMutable(kCFAllocatorDefault, 16);
CFBitVectorSetCount(bitVector, 16);
CFBitVectorSetBitAtIndex(bitVector, 0, 1);
CFRange range = CFRangeMake(0, 16);
Byte bytes[2] = {0,0};
unsigned int bytesValue = 0;
CFBitVectorGetBits(bitVector, range, bytes);
memcpy(&bytesValue, bytes, sizeof(bytes));
return bytesValue;
What is going on here? Is this just an issue with endianness?
Yes.
Your computer is little endian. The 16-bit value 32768 would be represented in-memory as:
00 80
On a little endian machine. You have:
80 00
Which is the opposite, representing 128 as you're seeing.
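If you want to obtain 32768 regardless of the host's byte order, assemble the value from the bytes explicitly instead of memcpy'ing them; a sketch using the two bytes from your example:
#include <cstdio>

int main() {
    unsigned char bytes[2] = { 0x80, 0x00 };   // as filled in by CFBitVectorGetBits

    // bytes[0] holds the high-order bits of the vector, so shift it up explicitly.
    unsigned int bytesValue = ((unsigned int) bytes[0] << 8) | bytes[1];

    std::printf("%u\n", bytesValue);           // 32768 on any endianness
}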

Code to find Endianness-pointer typecasting

I was trying to search for a code to determine the endianness of the system, and this is what I found:
int main()
{
unsigned int i= 1;
char *c = (char *)&i;
if (*c) {
printf("Little Endian\n");
} else {
printf("Big Endian\n");
}
}
Could someone tell me how this code works? More specifically, why is the ampersand needed here in this typecasting:
char *c = (char *)&i;
What is getting stored into the pointer c: the value i contains, or the actual address i is stored at? Also, why is char used for this program?
When dereferencing a character pointer, only one byte is interpreted (assuming a char variable takes one byte). In little-endian mode, the least significant byte of an integer is stored first. So a 4-byte integer, say 3, is stored as
00000011 00000000 00000000 00000000
while in big-endian mode it is stored as:
00000000 00000000 00000000 00000011
So in the first case, the char* interprets the first byte and displays 3, but in the second case it displays 0.
Had you not typecast it as:
char *c = (char *)&i;
it would show a warning about incompatible pointer types. Had c been an integer pointer, dereferencing it would give the integer value 3 irrespective of the endianness, as all 4 bytes would be interpreted.
NB: You need to initialize the variable i to see the whole picture; otherwise a garbage value is stored in the variable by default.
Warning!! OP, we discussed the difference between little-endian and big-endian, but it's more important to know the difference between little-endian and little-indian. I noticed that you used the latter. Well, the difference is that little-indian can cost you your dream job at Google or $3 million in venture capital if your interviewer is a Nikesh Arora, Sundar Pichai, Vinod Dham or Vinod Khosla :-)
Let's try to walk through this: (in comments)
int main(void){
unsigned int i = 1; // i is an int in memory that can be conceptualized as
// int[0x00 00 00 01]
char *c = (char *)&i; // We take the address of i and then cast it to a char pointer
// which we then dereference. This cast from int(4 bytes)
// to char(1 byte) results in only keeping the lowest byte by
if(*c){ // Endian-ness.
puts("little!\n"); // This means that on a Little Endian machine, 0x01 will be
} else { // the byte kept, but on a Big Endian machine, 0x00 is kept.
puts("big!\n"); // int[0x00 00 00 (char)[01]] vs int[0x01 00 00 (char)[00]]
}
return 0;
}

unsigned char array to unsigned int back to unsigned char array via memcpy is reversed

This isn't cross-platform code... everything is being performed on the same platform (i.e. the endianness is the same: little endian).
I have this code:
unsigned char array[4] = {'t', 'e', 's', 't'};
unsigned int out = ((array[0]<<24)|(array[1]<<16)|(array[2]<<8)|(array[3]));
std::cout << out << std::endl;
unsigned char buff[4];
memcpy(buff, &out, sizeof(unsigned int));
std::cout << buff << std::endl;
I'd expect the output of buff to be "test" (with a garbage trailing character because of the lack of '\0'), but instead the output is "tset." Obviously changing the order of characters that I'm shifting (3, 2, 1, 0 instead of 0, 1, 2, 3) fixes the problem, but I don't understand the problem. Is memcpy not acting the way I expect?
Thanks.
This is because your CPU is little-endian. In memory, the array is stored as:
+----+----+----+----+
array | 74 | 65 | 73 | 74 |
+----+----+----+----+
This is represented with increasing byte addresses to the right. However, the integer is stored in memory with the least significant bytes at the left:
+----+----+----+----+
out | 74 | 73 | 65 | 74 |
+----+----+----+----+
This happens to represent the integer 0x74657374. Using memcpy() to copy that into buff reverses the bytes from your original array.
You're running this on a little-endian platform.
On a little-endian platform, a 32-bit int is stored in memory with the least significant byte in the lowest memory address. So bits 0-7 are stored at address P, bits 8-15 in address P + 1, bits 16-23 in address P + 2 and bits 24-31 in address P + 3.
In your example: bits 0-7 = 't', bits 8-15 = 's', bits 16-23 = 'e', bits 24-31 = 't'
So that's the order that the bytes are written to memory: "tset"
If you address the memory then as separate bytes (unsigned chars), you'll read them in the order they are written to memory.
On a little-endian platform the output should be tset. The original sequence was test from lower addresses to higher addresses. Then you put it into an unsigned int with first 't' going into the most significant byte and the last 't' going into the least significant byte. On a little-endian machine the least significant byte is stored at lower address. This is how it will be copied to the final buf. This is how it is going to be output: from the last 't' to the first 't', i.e. tset.
On a big-endian machine you would not observe the reversal.
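If you want to get "test" back on any platform, undo the shifts explicitly rather than copying raw bytes; a sketch:
#include <iostream>

int main() {
    unsigned char array[4] = {'t', 'e', 's', 't'};
    unsigned int out = (array[0] << 24) | (array[1] << 16) | (array[2] << 8) | array[3];

    // Extract the bytes in the same order they were shifted in,
    // so the result does not depend on the machine's byte order.
    unsigned char buff[5];
    buff[0] = (out >> 24) & 0xFF;
    buff[1] = (out >> 16) & 0xFF;
    buff[2] = (out >>  8) & 0xFF;
    buff[3] =  out        & 0xFF;
    buff[4] = '\0';

    std::cout << buff << std::endl;   // prints "test" on any endianness
}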
You have written a test for platform byte order, and it has concluded: little endian.
How about adding a '\0' to your buff?