Interesting problem on pointers..Please help - c++

#include<iostream>
#include<conio.h>
using namespace std;
int main()
{
int x = 65;
int *ptr = &x;
char * a= (char *)ptr;
cout<<(int)*(a);
getch();return 0;
}
sizeof(ptr) and sizeof(a) both display 4.
sizeof(int) displays 4 and sizeof(char) displays 1.
So 65 is stored in 4 bytes, i.e.
00000000 00000000 00000000 01000001, and the address of the first byte is stored in ptr.
In the above code I cast the int* to char* in order to print the value of the first byte of x (type int).
So after the cast, "a" holds the address of the first byte, the same address contained in ptr.
Now, when displaying (int)*a, shouldn't it consider only the first byte when showing the value?
But the output is 65 instead of 0 (the first byte's value). Where am I going wrong?
What I have learnt is:
char * ptr1;
ptr1++; // ptr1 goes to the next byte; *ptr1 reads only a 1-byte value
int * ptr2;
ptr2++; // ptr2 goes forward 4 bytes; *ptr2 reads the value contained in 4 bytes
PS - I am working on Dev-C++

Your machine is little-endian: the least significant byte is stored first (at the lowest address), so the first byte of x holds 65, not 0.

Related

Assign char address to an int pointer, and consequently write an int-size piece of memory

This is from the book Stroustrup - Programming: Principles and Practice Using C++. In §17.3, about memory, addresses and pointers, it is supposed to be allowed to assign a char* to an int*:
char ch1 = 'a';
char ch2 = 'b';
char ch3 = 'c';
char ch4 = 'd';
int* pi = &ch3; // point to ch3, a char-size piece of memory
*pi = 12345; // write to an int-size piece of memory
*pi = 67890;
Graphically we have something like this: [figure not preserved]
Quoting from the source:
Had the compiler allowed the code, we would have been writing 12345 to the memory starting at &ch3. That would definitely have changed the value of some nearby memory, such as ch2 or ch4, or we would have overwritten part of pi itself.
In that case, the next assignment *pi = 67890 would place 67890 in some completely different part of memory.
I don't understand why the next assignment would place it in some completely different part of memory. The address stored in int *pi is still &ch3, so that assignment would overwrite the content at that address, i.e. 12345. Why isn't that so?
Can you please help me? Many thanks!
char ch3 = 'c';
int* pi = &ch3;
it is supposed to be allowed to assign a char* to int*:
Not quite - there is an alignment concern. It is undefined behavior (UB) when
If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. C17dr § 6.3.2.3 7
Example: some processors require an int * to be even, and if &ch3 were odd, storing the address might fail, and dereferencing the address certainly fails: bus error.
The next is certainly UB as the destination is outside the memory of ch3.
ch1, ch2, ch4 might be nearby and provide some reasonable undefined behavior, but the result is UB.
// undefined behavior
*pi = 12345; // write to an int-size piece of memory
When code attempts to write outside its bounds - it is UB, anything may happen, including writing into neighboring data.
The address stored in int *pi is still &ch3
Maybe, maybe not. UB has occurred.
why the next assignment would place it: in some completely different part of memory?
The abusive code suggests that pi itself is overwritten by *pi = 12345;. This might happen, it might not. It is UB. A subsequent use of *pi is simply more UB.
Recall with UB you might get what you hope for, you might not - it is not defined by C.
You seem to have skipped part of the explanation you quoted:
or we would have overwritten part of pi itself
Think of it this way, since ints are larger than chars, if an int* points to an address location that stores a char, there will be overflowing memory when you attempt to assign an integer value to that location, as you only have a single byte of memory allocated but are assigning 4 bytes worth of data. i.e. you cannot fit 4 bytes of data into one, so the other 3 bytes will go somewhere.
Assume then that the overflowing bytes partially change the value stored in pi. Now the next assignment will go to a random memory location.
Let's assume the memory address layout is:
0 1 2 3 4 5 6 7
From the left 0, 1, 2 and 3 are characters. From the right 4, 5, 6 and 7 are an int*.
The values in each byte in hex may be:
61 62 63 64 02 00 00 00
Note how the first four are ASCII values and the last four are the address of ch3. Writing *pi = 12345; changes the values like so:
61 62 39 30 00 00 00 00
The bytes 39 30 00 00 are 12345 (0x00003039) stored in little-endian order.
The next write, *pi = 67890;, would start from memory address 00 00 00 00, not 02 00 00 00 as one might expect.
Firstly, you have to understand that everything is a number i.e., a char, int, int* all contain numbers. Memory addresses are also numbers. Let's assume the current example compiles and we have memory like following:
--------------------------
Address | Variable | Value
--------------------------
0x01 | ch1 a
0x02 | ch2 b
0x03 | ch3 c
0x04 | ch4 d
0x05 | pi &ch3 = 0x03
Now let's dereference pi and reassign a new value to ch3:
*pi = 12345;
Let's assume int is 4 bytes. Since pi is an int pointer, it will write a 4 byte value to the location pointed by pi. Now, char can only contain one byte values so, what would happen if we try to write 4 bytes to that location? Strictly speaking, this is undefined behaviour but I will try to explain what the author means.
Since a char cannot contain values larger than 1 byte, *pi = 12345 will overflow ch3. When this overflow happens, the remaining 3 of the 4 bytes may get written to nearby memory locations. What memory locations do we have nearby? ch4 and pi itself! ch4 can only contain 1 byte as well, which leaves us with 2 bytes, and the next location is pi itself. Meaning the write through pi will overwrite pi's own value!
--------------------------
Address | Variable | Value
--------------------------
0x01 | ch1 a
0x02 | ch2 b
0x03 | ch3 0x39 // low byte of 12345 (0x3039), assuming a little-endian machine
0x04 | ch4 0x30 // second byte of 12345
0x05 | pi 0x00 // the two remaining zero bytes land here, replacing the low bytes of 0x03
As you can see, pi now points to some other memory address, which is definitely not ch3.

Why can't I access int with different pointer in C++?

I want to be able to access value with p pointer. But when I use p pointer I'm always getting b variable equal to zero. Please refer to code snippet below.
basepointer = malloc(512);
*((int*)basepointer+32) = 455; // putting int value in memory
void *p = basepointer + 32; // creating other pointer
int a,b;
a = *((int*)basepointer+32); // 455, retrieving value using basepointer
b = *((int*)p); // 0, retrieving value using p
Why does this happen? How can I access the value with my p pointer?
I can't find a good duplicate answer, so here's what's going on:
Pointer arithmetic always happens in units of the pointer's base type. That is, when you have T *ptr (a pointer to some type T), then ptr + 1 is not the next byte in memory, but the next T.
In other words, you can imagine a pointer like a combination of an array and an index:
T *ptr;
T array[/*some size*/];
ptr = &array[n];
If ptr is a pointer to array[n] (the nth element), then ptr + i is a pointer to array[n + i] (the (n+i)th element).
Let's take a look at your code:
*((int*)basepointer+32) = 455;
Here you're casting basepointer to (int*), then adding 32 to it. This gives you the address of the 32nd int after basepointer. If your platform uses 4 byte ints, then the actual offset is 32 * 4 = 128 bytes. This is where we store 455.
Then you do
void *p = basepointer + 32;
This is technically invalid code because basepointer is a void *, and you can't do arithmetic in terms of void because void has no size. As an extension, gcc supports this and pretends void has size 1. (But you really shouldn't rely on this: cast to unsigned char * if you want bytewise addressing.)
Now p is at offset 32 after basepointer.
a = *((int*)basepointer+32);
This repeats the pointer arithmetic from above and retrieves the value from int offset 32 (i.e. byte offset 128), which is still 455.
b = *((int*)p);
This retrieves the int value stored at byte offset 32 (which would correspond to int offset 8 in this example). We never stored anything here, so b is essentially garbage (it happens to be 0 on your platform).
The smallest change to make this code work as expected is probably
void *p = (int *)basepointer + 32; // use int-wise arithmetic to compute p
void pointer arithmetic is illegal in C and C++; GCC allows it as an extension.
You need to change this line:
void *p = basepointer + 32;
to
void *p = basepointer + 32 * sizeof(int);
because pointer arithmetic is scaled by the size of the pointed-to type.
For example:
if sizeof(int) = 4
and sizeof(short) = 2
and int *p holds the address 3000
and short *sp holds the address 4000
then p + 3 yields the address 3000 + 3 * sizeof(int) = 3012
and sp + 3 yields the address 4000 + 3 * sizeof(short) = 4006.
For void the scale is 1, if the compiler allows it.

Memory layout in memset

I have this "buggy" code :
int arr[15];
memset(arr, 1, sizeof(arr));
memset sets each byte to 1, but since an int is generally 4 bytes, it won't give the desired output. I know that each int in the array will be initialized to 0x01010101 = 16843009. Since I have a (very) weak understanding of hex values and memory layouts, can someone explain why it gets initialized to that hex value? What would happen if I had, say, 4 in place of 1?
If I trust the man page
The memset() function writes len bytes of value c (converted to an unsigned char) to the string b.
In your case it will convert 0x00000001 (as an int) into 0x01 (as an unsigned char), then fill each byte of the memory with this value. Four of those bytes fit in an int, so each int will become 0x01010101.
If you had 4, it would be converted to the unsigned char 0x04, and each int would be filled with 0x04040404.
Does that make sense to you?
What memset does is
Converts the value ch to unsigned char and copies it into each of the first count characters of the object pointed to by dest.
So, first your value (1) will be converted to unsigned char, which occupies 1 byte, so that will be 0b00000001. Then memset will fill the whole array's memory with copies of that byte. Since an int takes 4 bytes on your machine, the value of each int in the array would be 00000001000000010000000100000001, which is 16843009. If you pass another value instead of 1, the array's memory will be filled with that byte value instead.
Note that memset converts its second argument to an unsigned char which is one byte. One byte is eight bits, and you're setting each byte to the value 1. So we get
0b00000001 00000001 00000001 00000001
or in hexadecimal,
0x01010101
or the decimal number 16843009. Why that value? Because
0b00000001000000010000000100000001 = 1*2^0 + 1*2^8 + 1*2^16 + 1*2^24
= 1 + 256 + 65536 + 16777216
= 16843009
Each group of four binary digits corresponds to one hexadecimal digit. Since 0b0000 = 0x0 and 0b0001 = 0x1, your final value is 0x01010101. With memset(arr, 4, sizeof(arr)); you would get 0x04040404 and with 12 you would get 0x0c0c0c0c.

Code to find Endianness-pointer typecasting

I was trying to search for a code to determine the endianness of the system, and this is what I found:
#include <stdio.h>

int main()
{
    unsigned int i = 1;
    char *c = (char *)&i;
    if (*c) {
        printf("Little Endian\n");
    } else {
        printf("Big Endian\n");
    }
    return 0;
}
Could someone tell me how this code works? More specifically, why is the ampersand needed here in this typecasting :
char *c = (char *)&i;
What is getting stored into the pointer c.. the value i contains or the actual address i is contained in? Also why is this a char for this program?
When a character pointer is dereferenced, only one byte is interpreted (assuming a char variable takes one byte). In little-endian mode, the least significant byte of an integer is stored first. So a 4-byte integer, say 3, is stored as
00000011 00000000 00000000 00000000
while for big-endian mode it is stored as:
00000000 00000000 00000000 00000011
So in the first case, the char* interprets the first byte and displays 3, but in the second case it displays 0.
Had you not cast it, as in:
char *c = (char *)&i;
the compiler would show a warning about an incompatible pointer type. Had c been an integer pointer, dereferencing it would yield the integer value 3 irrespective of the endianness, as all 4 bytes would be interpreted.
NB: You need to initialize the variable i to see the whole picture. Otherwise the variable holds a garbage value by default.
Warning!! OP, we discussed the difference between little-endian and big-endian, but it's more important to know the difference between little-endian and little-indian. I noticed that you used the latter. Well, the difference is that little-indian can cost you your dream job at Google or $3 million in venture capital if your interviewer is a Nikesh Arora, Sundar Pichai, Vinod Dham or Vinod Khosla :-)
Let's try to walk through this: (in comments)
#include <stdio.h>

int main(void) {
    unsigned int i = 1;   // value 0x00000001; the byte order in memory depends on the machine
    char *c = (char *)&i; // take the address of i and view it as a pointer to char,
                          // so *c reads only the byte at the lowest address of i
    if (*c) {             // little-endian: the lowest address holds the least significant byte, 0x01
        puts("little!");
    } else {              // big-endian: the lowest address holds the most significant byte, 0x00
        puts("big!");
    }
    return 0;
}

Can someone explain this "endian-ness" function for me?

Write a program to determine whether a computer is big-endian or little-endian.
bool endianness() {
int i = 1;
char *ptr;
ptr = (char*) &i;
return (*ptr);
}
So I have the above function, and I don't really get it. ptr = (char*) &i, which I think means a pointer to a character at the address where i is sitting. So if an int is 4 bytes, say ABCD, are we talking about A or D when you cast to char* like that, and why?
Would someone please explain this in more detail? Thanks.
So specifically, with ptr = (char*) &i;, when you cast to char*, what part of i do I get?
If you have a little-endian architecture, i will look like this in memory (in hex):
01 00 00 00
^
If you have a big-endian architecture, i will look like this in memory (in hex):
00 00 00 01
^
The cast to char* gives you a pointer to the first byte of the int (to which I have pointed with a ^), so the value pointed to by the char* will be 01 if you are on a little-endian architecture and 00 if you are on a big-endian architecture.
When you return that value, 0 is converted to false and 1 is converted to true. So, if you have a little-endian architecture, this function will return true and if you have a big-endian architecture, it will return false.
Whether ptr points to byte A or D depends on the endianness of the machine. ptr points to the byte of the integer that is at the lowest address (the other bytes would be at ptr+1, ...).
On a big-endian machine the most significant byte of the integer (which is 0x00) will be stored at this lowest address, so the function will return zero.
On a little-endian machine it is the opposite: the least significant byte of the integer (0x01) will be stored at the lowest address, so the function will return one in this case.
This is using type punning to access an integer as an array of characters. If the machine is big-endian, this will be the major byte, and will have a value of zero, but if the machine is little-endian, it will be the minor byte, which will have a value of one. (Instead of accessing i as a single integer, the same memory is accessed as an array of four chars.)
Whether *((char*)&i) is byte A or byte D gets to the heart of endianness. On a little-endian system, the integer 0x41424344 will be laid out in memory as the bytes 44 43 42 41 (least significant byte first; in ASCII, this is "DCBA"). On a big-endian system, it will be laid out as 41 42 43 44. A pointer to this integer will hold the address of the first byte. Consider the pointer as an integer pointer, and you get the whole integer. Consider the pointer as a char pointer, and you get the first byte, since that's the size of a char.
Assume int is 4 bytes (in C it may not be). This assumption is just to simplify the example...
You can look at each of these 4 bytes individually.
char is a byte, so it's looking at the first byte of a 4 byte buffer.
If the first byte is nonzero, that tells you the lowest-order bits are contained in the first byte.
I randomly chose the number 42 to avoid confusion of any special meaning in the value 1.
int num = 42;
if(*(char *)&num == 42)
{
printf("\nLittle-Endian\n");
}
else
{
printf("Big-Endian\n");
}
Breakdown:
int num = 42;
//memory of the 4 bytes is either: (where each byte is 0 to 255)
//1) 0 0 0 42
//2) 42 0 0 0
char *p = (char *)&num;/*Cast the int pointer to a char pointer, pointing to the first byte*/
bool firstByteOf4Is42 = *p == 42;/*Checks whether the first byte is 42.*/
//Advance to the 2nd byte
++p;
assert(*p == 0);
//Advance to the 3rd byte
++p;
assert(*p == 0);
//Advance to the 4th byte
++p;
bool lastByteOf4Is42 = *p == 42;
assert(firstByteOf4Is42 == !lastByteOf4Is42);
If firstByteOf4Is42 is true you have little-endian. If lastByteOf4Is42 is true then you have big-endian.
Sure, let's take a look:
bool endianness() {
int i = 1; // This is 0x1
char *ptr;
ptr = (char*) &i; // pointer to the first (lowest-address) byte of i
return (*ptr);
}
If the machine is little-endian, the byte at *ptr will be 0000 0001.
If the machine is big-endian, the byte order is reversed; that is, i will be laid out in memory as
i = 0000 0000 0000 0000 0000 0000 0000 0001
so *ptr will hold 0x0.
Finally, return *ptr is equivalent to
if (*ptr == 0x1) return true; // little endian
else return false; // big endian