Code to find Endianness - pointer typecasting - C++

I was trying to search for code to determine the endianness of the system, and this is what I found:
#include <stdio.h>

int main()
{
    unsigned int i = 1;
    char *c = (char *)&i;
    if (*c) {
        printf("Little Endian\n");
    } else {
        printf("Big Endian\n");
    }
}
Could someone tell me how this code works? More specifically, why is the ampersand needed in this typecast:
char *c = (char *)&i;
What is getting stored in the pointer c: the value i contains, or the actual address i is stored at? Also, why is a char used for this program?

While dereferencing a character pointer, only one byte is interpreted (a char takes one byte). In little-endian mode, the least significant byte of an integer is stored first. So a 4-byte integer, say 3, is stored as
00000011 00000000 00000000 00000000
while in big-endian mode it is stored as:
00000000 00000000 00000000 00000011
So in the first case, the char* interprets the first byte and displays 3, but in the second case it displays 0.
Had you not typecast it as:
char *c = (char *)&i;
it would show a warning about an incompatible pointer type. Had c been an integer pointer, dereferencing it would yield the integer value 3 irrespective of the endianness, as all 4 bytes would be interpreted.
NB: You need to initialize the variable i to see the whole picture. Otherwise the variable holds an indeterminate (garbage) value by default.
Warning!! OP, we discussed the difference between little-endian and big-endian, but it's more important to know the difference between little-endian and little-indian. I noticed that you used the latter. Well, the difference is that little-indian can cost you your dream job at Google or $3 million in venture capital if your interviewer is a Nikesh Arora, Sundar Pichai, Vinod Dham or Vinod Khosla :-)

Let's try to walk through this (in comments):
#include <stdio.h>

int main(void)
{
    unsigned int i = 1;    // i is an int in memory that can be conceptualized as
                           // int[0x00 00 00 01]
    char *c = (char *)&i;  // We take the address of i and cast it to a char pointer.
                           // Dereferencing that pointer reads only one byte of the
                           // int (4 bytes), and which byte that is depends on
    if (*c) {              // endianness.
        puts("little!");   // This means that on a Little Endian machine, 0x01 is
    } else {               // the byte read, but on a Big Endian machine, 0x00 is read.
        puts("big!");      // int[0x00 00 00 (char)[01]] vs int[0x01 00 00 (char)[00]]
    }
    return 0;
}

Related

Assign char address to an int pointer, and consequently write an int-size piece of memory

from the book Stroustrup - Programming: Principles and Practice Using C++. In §17.3, about memory, addresses and pointers, it is supposed to be allowed to assign a char* to an int*:
char ch1 = 'a';
char ch2 = 'b';
char ch3 = 'c';
char ch4 = 'd';
int* pi = &ch3; // point to ch3, a char-size piece of memory
*pi = 12345; // write to an int-size piece of memory
*pi = 67890;
Graphically we have something like this (the book includes a diagram here). Quoting from the source:
Had the compiler allowed the code, we would have been writing 12345 to the memory starting at &ch3. That would definitely have changed the value of some nearby memory, such as ch2 or ch4, or we would have overwritten part of pi itself.
In that case, the next assignment *pi = 67890 would place 67890 in some completely different part of memory.
I don't understand why the next assignment would place it in some completely different part of memory. The address stored in int *pi is still &ch3, so that assignment would overwrite the content at that address, i.e. 12345. Why isn't it so?
Please, can you help me? Many thanks!
char ch3 = 'c';
int* pi = &ch3;
it is supposed to be allowed to assign a char* to int*:
Not quite - there is an alignment concern. It is undefined behavior (UB) when the converted pointer is misaligned:
If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. C17dr § 6.3.2.3 7
Example: Some processors require an int * to be even, and if &ch3 were odd, storing the address might fail, and dereferencing the address certainly fails: bus error.
The next is certainly UB as the destination is outside the memory of ch3.
ch1, ch2, ch4 might be nearby and the write might appear to land there, but the result is still UB.
// undefined behavior
*pi = 12345; // write to an int-size piece of memory
When code attempts to write outside its bounds, it is UB: anything may happen, including writing into neighboring data.
The address stored in int *pi is still &ch3
Maybe, maybe not. UB has occurred.
why the next assignment would place it: in some completely different part of memory?
The abusive code suggests that pi itself is overwritten by *pi = 12345;. This might happen, it might not. It is UB. A subsequent use of *pi is simply more UB.
Recall with UB you might get what you hope for, you might not - it is not defined by C.
You seem to have skipped part of the explanation you quoted:
or we would have overwritten part of pi itself
Think of it this way: since ints are larger than chars, if an int* points to a location that stores a char, memory will overflow when you attempt to assign an integer value to that location, as you only have a single byte of memory allocated but are assigning 4 bytes worth of data. You cannot fit 4 bytes of data into one byte, so the other 3 bytes will go somewhere.
Assume then that the overflowing bytes partially change the value stored in pi. Now the next assignment will go to a random memory location.
Let's assume the memory address layout is:
0 1 2 3 4 5 6 7
Addresses 0, 1, 2 and 3 on the left hold the four characters; addresses 4, 5, 6 and 7 on the right hold the int* pi.
The values in each byte in hex may be:
61 62 63 64 02 00 00 00
Note how the first four bytes are ASCII values and the last four are the address of ch3 (0x02). Writing *pi = 12345; changes the values like so:
61 62 39 30 00 00 00 00
The four bytes 39 30 00 00 are 12345 (0x3039) stored little-endian starting at address 2; the last two of them spill past ch3 and ch4 into pi, zeroing it.
The next write *pi = 67890; would therefore start from memory address 00 00 00 00, not 02 00 00 00 as one might expect.
Firstly, you have to understand that everything is a number i.e., a char, int, int* all contain numbers. Memory addresses are also numbers. Let's assume the current example compiles and we have memory like following:
--------------------------
Address | Variable | Value
--------------------------
0x01    | ch1      | a
0x02    | ch2      | b
0x03    | ch3      | c
0x04    | ch4      | d
0x05    | pi       | &ch3 = 0x03
Now let's dereference pi and reassign a new value to ch3:
*pi = 12345;
Let's assume int is 4 bytes. Since pi is an int pointer, it will write a 4 byte value to the location pointed by pi. Now, char can only contain one byte values so, what would happen if we try to write 4 bytes to that location? Strictly speaking, this is undefined behaviour but I will try to explain what the author means.
Since char cannot contain values larger than 1 byte, *pi = 12345 will overflow ch3. When this overflow happens, the remaining 3 of the 4 bytes may get written to nearby memory locations. What memory locations do we have nearby? ch4 and pi itself! ch4 can only hold 1 byte as well, which leaves 2 bytes, and the next location is pi itself. Meaning pi will overwrite its own value!
--------------------------
Address | Variable | Value
--------------------------
0x01    | ch1      | a
0x02    | ch2      | b
0x03    | ch3      | 12  // part of 12345 ended up here
0x04    | ch4      | 34  // part of 12345 ended up here
0x05    | pi       | 5   // the rest overwrote pi itself
As you can see that pi is now pointing to some other memory address which is definitely not ch3.

C++ : int* / float* to char*, why get different result using reinterpret_cast?

int* to char* :
int* pNum = new int[1];
pNum[0] = 57;
char* pChar = reinterpret_cast< char* >(pNum);
Result : pChar[0] = '9'; //'9' ASCII 57
float* to char* :
float* pFloat = new float[1];
pFloat[0] = 57; //assign the same value as before
char* pChar = reinterpret_cast< char* >(pFloat);
Result : pChar[0] = 'a';
So why am I getting two different results?
Thanks for your help.
You have this because floating point values don't use the same encoding as integer values (IEEE-754 encoding with a sign bit, exponent and mantissa).
Besides, I suppose you're running a little endian CPU, otherwise your first test would have yielded 0 (I mean '\0').
Both float and int are data types which are (usually) represented by four bytes:
b1 b2 b3 b4
However, those bytes are interpreted quite differently across the two types - if they weren't, there would hardly be any need for two types.
Now if you reinterpret the pointers to pointers-to-char, the result points only to the first byte, as this is the length of a char:
b1 b2 b3 b4
^^
your char* points to here
As said, this first byte has a very different meaning for the two data types, and this is why the representation as a char in general differs.
Application to your example:
The number 57 in float (IEEE754 Single precision 32-bit) is represented in bits as
01000010 01100100 00000000 00000000
In contrast, the representation in a 32-bit integer format is
00000000 00000000 00000000 00111001
Here the numbers are shown in "big-endian" format, where the most significant byte (the one which changes the value of the int the most) comes first. As mentioned by @Jean-François Fabre, on your PC it seems to be the other way round, but never mind. For both conversions, I used this site.
Now your char* pointers point to the first of those 8-bit-blocks, respectively. And obviously they're different.

Casting pointer to other pointer

Right now I'm watching this lecture:
https://www.youtube.com/watch?v=jTSvthW34GU
In around the 50th minute of the video he says that this code will return a non-zero value:
float f = 7.0;
short s = *(short*)&f;
Correct me if I'm mistaking:
&f is a pointer to float.
We take &f and cast it to pointer to short.
Then we dereference (don't know if it's a verb) that pointer so eventually the whole statement represents a value of 7.
If I print that it displays 0. Why?
Dereferencing through a cast pointer does not cause a conversion to take place the way casting a value does. No bits are changed. So, while
float f = 7.0;
short s = (short)f;
will result in s having the integer value 7,
short s = *(short *)&f;
will simply copy the first 16 bits (depending on platform) of the floating point representation of the value 7.0 into the short. On my system, using little-endian IEEE-754, those bits are all zero, so the value is zero.
Floats are represented internally as 4-byte floating point numbers (1 sign bit, 8 exponent bits, 23 mantissa bits) while shorts are 2-byte integer types (two's complement numbers). The code above will reinterpret the top two or bottom two bytes (depending on endianness) of the floating point number as a short integer.
So in the case of 7.0, the floating point number looks like (sign, exponent, mantissa):
0 10000001 11000000000000000000000
So on some machines, it will take the bottom 2 bytes (all 0s) and on others, it will take the top bytes (non-zero).
For more, see:
Floating-point: http://en.wikipedia.org/wiki/Floating_point
Endianness: http://en.wikipedia.org/wiki/Endianness
Casting a pointer to a different type does not cause any conversion of the pointed-to value; you are just interpreting the pointed-to bytes through the "lens" of a different type.
In the general case, casting a pointer to a different pointer type causes undefined behavior. In this case that behavior happens to depend on your architecture.
To get a picture of what is going on, we can write a general function that will display the bits of an object given a pointer to it:
#include <iostream>

template <typename T>
void display_bits(T const * p)
{
    char const * c = reinterpret_cast<char const *>(p);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        unsigned char b = static_cast<unsigned char>(*(c++));
        for (int j = 0; j < 8; ++j) {
            std::cout << ((b & 0x80) ? '1' : '0');
            b <<= 1;
        }
        std::cout << ' ';
    }
    std::cout << std::endl;
}
If we run the following code, this will give you a good idea of what is going on:
int main() {
    float f = 7.0;
    display_bits(&f);
    display_bits(reinterpret_cast<short*>(&f));
    return 0;
}
The output on my system is:
00000000 00000000 11100000 01000000
00000000 00000000
The result you get should now be pretty clear, but again it depends on the compiler and/or architecture. For example, using the same representation for float but on a big-endian machine, the result would be quite different because the bytes in the float would be reversed. In that case the short* would be pointing at the bytes 01000000 11100000.

Interesting problem on pointers..Please help

#include <iostream>
#include <conio.h>
using namespace std;

int main()
{
    int x = 65;
    int *ptr = &x;
    char *a = (char *)ptr;
    cout << (int)*a;
    getch();
    return 0;
}
sizeof(ptr) and sizeof(a) display 4.
sizeof(int) displays 4 and sizeof(char) displays 1.
So 65 is stored in 4 bytes, i.e.
00000000 00000000 00000000 01000001, and the address of the first byte is stored in ptr.
In the above code I have typecast the int* to char*, aiming to print the value stored in the first byte of x (type int).
So after the typecast, "a" stores the address of the first byte, i.e. the same address contained in ptr.
Now, on displaying (int)*a, shouldn't it consider only the first byte for showing the value?
But the output is 65 instead of 0 (the first byte's value). Where am I going wrong?
What I have learnt is:
char *ptr1;
ptr1++; // ptr1 goes to the next byte; *ptr1 will display only a 1-byte value
int *ptr2;
ptr2++; // ptr2 goes to the next 4 bytes; *ptr2 will display the value contained in 4 bytes
PS - I am working on Dev-C++
Your machine is little-endian, and least significant bytes go first.

Can someone explain this "endian-ness" function for me?

Write a program to determine whether a computer is big-endian or little-endian.
bool endianness() {
    int i = 1;
    char *ptr;
    ptr = (char*) &i;
    return (*ptr);
}
So I have the above function. I don't really get it. ptr = (char*) &i, which I think means a pointer to a character at address of where i is sitting, so if an int is 4 bytes, say ABCD, are we talking about A or D when you call char* on that? and why?
Would some one please explain this in more detail? Thanks.
So specifically, ptr = (char*) &i; when you cast it to char*, what part of &i do I get?
If you have a little-endian architecture, i will look like this in memory (in hex):
01 00 00 00
^
If you have a big-endian architecture, i will look like this in memory (in hex):
00 00 00 01
^
The cast to char* gives you a pointer to the first byte of the int (to which I have pointed with a ^), so the value pointed to by the char* will be 01 if you are on a little-endian architecture and 00 if you are on a big-endian architecture.
When you return that value, 0 is converted to false and 1 is converted to true. So, if you have a little-endian architecture, this function will return true and if you have a big-endian architecture, it will return false.
If ptr points to byte A or D depends on the endianness of the machine. ptr points to that byte of the integer that is at the lowest address (the other bytes would be at ptr+1,...).
On a big-endian machine the most significant byte of the integer (which is 0x00) will be stored at this lowest address, so the function will return zero.
On a litte-endian machine it is the opposite, the least significant byte of the integer (0x01) will be stored at the lowest address, so the function will return one in this case.
This is using type punning to access an integer as an array of characters. If the machine is big endian, this will be the major byte, and will have a value of zero, but if the machine is little endian, it will be the minor byte, which will have a value of one. (Instead of accessing i as a single integer, the same memory is accessed as an array of four chars).
Whether *((char*)&i) is byte A or byte D gets to the heart of endianness. On a little endian system, the integer 0x41424344 will be laid out in memory as: 0x44 43 42 41 (least significant byte first; in ASCII, this is "DCBA"). On a big endian system, it will be laid out as: 0x41 42 43 44. A pointer to this integer will hold the address of the first byte. Consider the pointer as an integer pointer, and you get the whole integer. Consider it as a char pointer, and you get the first byte, since that's the size of a char.
Assume int is 4 bytes (in C it may not be). This assumption is just to simplify the example...
You can look at each of these 4 bytes individually.
char is a byte, so it's looking at the first byte of a 4 byte buffer.
If the first byte is non-zero, that tells you the least significant byte is stored at the lowest address.
I randomly chose the number 42 to avoid confusion of any special meaning in the value 1.
#include <stdio.h>

int main(void)
{
    int num = 42;
    if (*(char *)&num == 42) {
        printf("Little-Endian\n");
    } else {
        printf("Big-Endian\n");
    }
    return 0;
}
Breakdown:
int num = 42;
// Memory of the 4 bytes is either (where each byte is 0 to 255):
// 1) 0 0 0 42
// 2) 42 0 0 0
char *p = (char *)&num; /* Cast the int pointer to a char pointer, pointing to the first byte */
bool firstByteOf4Is42 = *p == 42; /* Checks whether the first byte is 42. */
// Advance to the 2nd byte
++p;
assert(*p == 0);
// Advance to the 3rd byte
++p;
assert(*p == 0);
// Advance to the 4th byte
++p;
bool lastByteOf4Is42 = *p == 42;
assert(firstByteOf4Is42 == !lastByteOf4Is42);
If firstByteOf4Is42 is true you have little-endian. If lastByteOf4Is42 is true then you have big-endian.
Sure, let's take a look:
bool endianness() {
    int i = 1;        // this is 0x00000001
    char *ptr;
    ptr = (char*) &i; // pointer to the first byte of i
    return (*ptr);
}
If the machine is little-endian, the byte *ptr points to will hold 0000 0001.
If the machine is big-endian, the byte order is reversed, i.e. i is laid out in memory as
00000000 00000000 00000000 00000001
so *ptr will hold 0x0.
Finally, returning *ptr is equivalent to
if (*ptr == 0x1) /* little endian */
else             /* big endian */