I'm having trouble understanding pointer arithmetic and how memory is laid out. In the code snippet below, I am trying to access the value of 'size = 1', which is located 8 bytes before 'test', but I don't get size's value, and the value I do get is not random. So I may have an issue with understanding byte sizes. If void*, long, and char are 8 bytes, should it matter when using pointer arithmetic?
#include <iostream>
using namespace std;
char arrayOfCrap[100];
void * what(){
long * size ;
size = (long*)&arrayOfCrap[28];
*size = 1;
return ((void*) &arrayOfCrap[29]);
}
int main(){
long * test;
test = (long*)what();
*test = 1221;
cout << "Value of test: " << *test << endl;
cout << "Long number before test: " << *(test-1) << endl;
}
The code works when main advances the void* pointer returned by what():
#include <iostream>
using namespace std;
char arrayOfCrap[100];
void * what(){
long * size ;
size = (long*)&arrayOfCrap[28];
*size = 1;
return ((void*) &arrayOfCrap[28]); //change from above
}
int main(){
long * test;
test = (long*)what();
test++; //change from above
*test = 1221;
cout << "Value of test: " << *test << endl;
cout << "Long number before test: " << *(test-1) << endl;
}
Your code is not locating *size eight bytes before *test:
size = (long*)&arrayOfCrap[28];
Since arrayOfCrap is declared as char arrayOfCrap[100], arrayOfCrap[28] is the char at offset 28 and arrayOfCrap[29] is the char at offset 29: the two are only one byte apart, not sizeof(long) bytes apart.
The reason test++ works is that test is of type long*, so incrementing it actually moves to the next position for a long, whereas incrementing a char* or using an index on a char array gives you the next position for a char.
You could also do one of these:
void * what(){
long * size ;
size = (long*)&arrayOfCrap[28];
*size = 1;
return size+1;
}
void * what(){
long * size ;
size = (long*)&arrayOfCrap[28];
*size = 1;
return ((void*) &arrayOfCrap[28 + sizeof(long)]);
}
By the way, it's not necessarily safe to take a pointer to an arbitrary memory location and treat it as a pointer to another type. Some platforms require certain types to be 'aligned', that is, to exist only at addresses that are multiples of a certain value. On those platforms, reading or writing an unaligned object may crash (bus error) or otherwise have undefined behavior. Other platforms may not crash or behave incorrectly, but have much better performance when reading/writing aligned objects. I know this is completely beside the point of your experimentation, but it's something you should know for real code. Here's an example of what not to do in real code:
int read_int(char *&c) {
int out = *(int*)c; // c may not be properly aligned!
c += sizeof(int);
return out;
}
Unfortunately on a common platform, x86, unaligned access is usually just slow rather than something that will always cause a crash, so users of that platform have to be especially careful.
When you increment a pointer, it advances not by the size of the pointer, but by the size of the type it points to. A char* advances by sizeof(char), and a long* advances by sizeof(long).
sizeof(char *) and sizeof(long *) are typically the same (generally 4 bytes on 32-bit systems and 8 bytes on 64-bit systems).
However, sizeof(char) and sizeof(long) are not the same: sizeof(char) is always 1, while sizeof(long) is typically 4 or 8.
You are confusing your pointer size with the integer size.
#include <iostream>
using namespace std;
int main()
{
cout << "\n sizeof(char*) " << sizeof(char *);
cout << "\n sizeof(char) " << sizeof(char);
cout << "\n sizeof(long*) " << sizeof(long *);
cout << "\n sizeof(long) " << sizeof(long);
}
See it in action here: http://ideone.com/gBcjS
#include <iostream>
using namespace std;
int main(int argc, char** argv) {
unsigned int a =5;
unsigned int *pint = NULL;
cout << "&a = " << &a << endl;
cout << " &pint = " << &pint << endl;
}
Output:
&a = 0x6ffe04
&pint = 0x6ffdf8
I'm wondering why the address of pint equals 0x6ffdf8. pint is an unsigned int (4 bytes), shouldn't the address of it be 0x6ffe00?
Your pint is not an unsigned int. It is a pointer to an unsigned int.
A pointer can have a different size; in particular, it can be 8 bytes.
It could therefore fit at 0x6ffdfc, directly before 0x6ffe04.
But it can also have a stricter alignment requirement: it wants an address divisible by 8, so an address ending in ...c is ruled out, and it needs one ending in e.g. ...8. Since an 8-byte object at 0x6ffe00 would overlap a, 0x6ffdf8 is the next such candidate below it.
I agree with 463035818_is_not_a_number that this is not really predictable; there are implementation-specific aspects. That is why I phrase things "softly" with "can", "wants", "e.g.".
I am trying to convert a char array to integers:
const int LENGTH = 3 * sizeof(int);
char data[LENGTH];
/* data is filled */
for (int i = 0; i < LENGTH; i += sizeof(int)) {
std::cout << "Integer: " << (int)data[i] << std::endl;
}
for (int i = 0; i < LENGTH; i += sizeof(short)) {
std::cout << (short)data[i] << " ";
}
The output is:
Integer: 0
Integer: 0
Integer: 0
0 3 0 3 0 3
I'd expect that if the shorts are not zero, the integers shouldn't be either. Probably the conversion as written only works on that one character/byte, rather than on the following 4 bytes as I expected. How can I fix that?
To be clear: I want bytes 0 to 3 cast into one integer, then the next bytes (4 to 7) into the next integer, and so on...
You are casting data[i] to an int. However, data[i] is a char, so no matter how you cast it, the cast is not going to magically read extra bytes. Instead, you have to cast the data pointer to int * and only then dereference it.
Basically, you'll end up with something like this:
auto voidPtr = static_cast<void const *>(data);
auto intPtr = static_cast<int const *>(voidPtr);
for (size_t i = 0; i < LENGTH / sizeof(int); ++i) {
std::cout << "Int: " << intPtr[i] << "\n";
}
Note how i is only incremented by 1 each time, while the loop bound is LENGTH divided by sizeof(int). This is because the compiler automatically scales the index by sizeof(int) when you're indexing an int *.
Also be aware that the values you get back might not be what you expect, depending on whether the machine you're running this on is big- or little-endian.
P.S.: C-style casts are generally discouraged; static_cast<int> shows much more explicitly what you want to achieve.
As @underscore_d pointed out, *((int*)&data[i]) from this answer results in undefined behaviour, and memcpy should be used instead:
int intData[3];
std::memcpy(intData, data, sizeof data);
for (int i = 0; i < 3; i++) {
std::cout << "int: " << intData[i] << " ";
}
This works fine and is consistent with the reference documentation for memcpy.
I have an array, and I want to store the address of the next element in the current position.
So far I have
char *a = new char[50];
char *free = a;
*a = &(a + 1); //value of a[0] is equal to the address of a[1]
Also I'm using a char array so I'm sure I'll need to cast some stuff.
Any help would be nice.
You can't store a char* in a char array.
A character is equal to one byte. The size of a pointer, such as char*, varies depending on your computer. On my computer, it's 8 bytes, and 8 bytes can't fit in 1 byte.
#include <iostream>
int main()
{
std::cout << "sizeof(char): " << sizeof(char) << std::endl;
std::cout << "sizeof(char*): " << sizeof(char*) << std::endl;
return 0;
}
// Outputs
// sizeof(char): 1
// sizeof(char*): 8
You also won't be able to cast a char* to a char to fit it in the array, as your compiler will yell at you:
#include <iostream>
int main()
{
char myArray[10];
std::cout << (char)&myArray[0];
}
// Compiler error:
// g++ main.cpp -std=gnu++11
// main.cpp: In function ‘int main()’:
// main.cpp:7:34: error: cast from ‘char*’ to ‘char’ loses precision [-fpermissive]
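The closest thing you can do to get this working is to use an array of an integer type that is as wide as a pointer. On common platforms, size_t has the same size as a pointer, so the number of bytes in size_t and size_t* is equal, and you can therefore put a size_t* in an array of size_t... after casting. (Strictly speaking, the integer type guaranteed to hold a pointer value is std::uintptr_t from <cstdint>.)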
#include <iostream>
int main()
{
size_t myArray[10];
myArray[0] = reinterpret_cast<size_t>(&myArray[1]);
std::cout << std::hex << "0x" << myArray[0] << std::endl;
}
// Outputs: 0x7fff4eded5c8
Also, consider using indices ([]) instead of pointer addition. It's more readable, and it does the same thing under the hood: a[1] == *(a+1).
#include <iostream>
using namespace std;
struct test
{
int i;
double h;
int j;
};
int main()
{
test te;
te.i = 5;
te.h = 6.5;
te.j = 10;
cout << "size of an int: " << sizeof(int) << endl; // Should be 4
cout << "size of a double: " << sizeof(double) << endl; //Should be 8
cout << "size of test: " << sizeof(test) << endl; // Should be 24 (word size of 8 for double)
//These two should be the same
cout << "start address of the object: " << &te << endl;
cout << "address of i member: " << &te.i << endl;
//These two should be the same
cout << "start address of the double field: " << &te.h << endl;
cout << "calculate the offset of the double field: " << (&te + sizeof(double)) << endl; //NOT THE SAME
return 0;
}
Output:
size of an int: 4
size of a double: 8
size of test: 24
start address of the object: 0x7fffb9fd44e0
address of i member: 0x7fffb9fd44e0
start address of the double field: 0x7fffb9fd44e8
calculate the offset of the double field: 0x7fffb9fd45a0
Why do the last two lines produce different values? Am I doing something wrong with pointer arithmetic?
(&te + sizeof(double))
This is the same as:
&((&te)[sizeof(double)])
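You should do:
(char*)(&te) + offsetof(test, h)
Use offsetof (from <cstddef>) rather than sizeof(int): your output shows h at offset 8, not 4, because the compiler inserted padding after i.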
You are correct -- the problem is with pointer arithmetic.
When you add to a pointer, you advance it by a multiple of the size of the pointed-to type.
Therefore, &te + 1 will be 24 bytes after &te.
Your code &te + sizeof(double) will add 24 * sizeof(double), or 192 bytes.
First, your code is wrong: you'd want to add the size of the fields before h (i.e. an int); there's no reason to assume double. Second, you need to normalise everything to char * first (pointer arithmetic is done in units of the thing being pointed to).
More generally, you can't rely on code like this to work. The compiler is free to insert padding between fields to align things to word boundaries and so on. If you really want to know the offset of a particular field, there's an offsetof macro that you can use. It's defined in <stddef.h> in C, <cstddef> in C++.
Many compilers offer a way to remove all padding (e.g. GCC's __attribute__ ((packed))).
I believe it's only well-defined to use offsetof on POD types (in C++11 and later, on standard-layout types).
struct test
{
int i;
int j;
double h;
};
Since your largest data type is 8 bytes, the compiler adds padding around your ints. Either group the smaller members together as above, put the largest data type first, or account for the padding yourself. Hope this helps!
&te + sizeof(double) is equivalent to &te + 8, which is equivalent to &((&te)[8]). That is, since &te has type test *, &te + 8 advances by eight times the size of a test.
You can see what's going on more clearly using the offsetof() macro:
#include <iostream>
#include <cstddef>
using namespace std;
struct test
{
int i;
double h;
int j;
};
int main()
{
test te;
te.i = 5;
te.h = 6.5;
te.j = 10;
cout << "size of an int: " << sizeof(int) << endl; // Should be 4
cout << "size of a double: " << sizeof(double) << endl; // Should be 8
cout << "size of test: " << sizeof(test) << endl; // Should be 24 (word size of 8 for double)
cout << "i: size = " << sizeof te.i << ", offset = " << offsetof(test, i) << endl;
cout << "h: size = " << sizeof te.h << ", offset = " << offsetof(test, h) << endl;
cout << "j: size = " << sizeof te.j << ", offset = " << offsetof(test, j) << endl;
return 0;
}
On my system (x86), I get the following output:
size of an int: 4
size of a double: 8
size of test: 16
i: size = 4, offset = 0
h: size = 8, offset = 4
j: size = 4, offset = 12
On another system (SPARC), I get:
size of an int: 4
size of a double: 8
size of test: 24
i: size = 4, offset = 0
h: size = 8, offset = 8
j: size = 4, offset = 16
The compiler will insert padding bytes between struct members to ensure that each member is aligned properly. As you can see, alignment requirements vary from system to system; on one system (x86), double is 8 bytes but only requires 4-byte alignment, and on another system (SPARC), double is 8 bytes and requires 8-byte alignment.
Padding can also be added at the end of a struct to ensure that everything stays aligned when you have an array of the struct type. On SPARC, for example, the compiler adds 4 bytes of padding at the end of the struct.
The language guarantees that the first declared member will be at an offset of 0, and that members are laid out in the order in which they're declared. (At least that's true for simple structs; C++ metadata might complicate things.)
Compilers are free to space out structs however they want past the first member, and usually use padding to align to word boundaries for speed.
See these:
C struct sizes inconsistence
Struct varies in memory size?
et al.
I have been having some problems with downward type conversion in C++ using pointers. Before I came up with the idea of doing it this way, Google basically told me this was impossible, and it wasn't covered in any of the books I learned C++ from. I figured this would work...
long int TheLong=723330;
int TheInt1=0;
int TheInt2=0;
long int * pTheLong1 = &TheLong;
long int * pTheLong2 = &TheLong + 0x4;
TheInt1 = *pTheLong1;
TheInt2 = *pTheLong2;
cout << "The double is " << TheLong << " which is "
<< TheInt1 << " * " << TheInt2 << "\n";
The increment on line five might not be correct, but the output has me worried that my compiler (I am using gcc 3.4.2) is automatically turning TheInt1 into a long int or something. The output looks like this...
The double is 723330 which is 723330 * 4067360
The output from TheInt1 is impossibly high, and the output from TheInt2 is absent.
I have three questions...
Am I even on the right track?
What is the proper increment for line five?
Why the hell is TheInt1/TheInt2 allowing such a large value?
int is probably 32 bits, which gives it a range of roughly -2.1*10^9 to 2.1*10^9 (exactly -2^31 to 2^31 - 1).
In the line long int * pTheLong2 = &TheLong + 0x4; you are doing pointer arithmetic on a long int*, which means the address increases by 4 times the size of a long int. I guess you are assuming that long int is twice the size of int. This is absolutely not guaranteed, but probably true if you are compiling in 64-bit mode (64-bit Windows is a notable exception, where long stays 32 bits). So you want to add half the size of a long int (exactly the size of an int, under your assumption) to your pointer. int * pTheLong2 = (int*)(&TheLong) + 1; achieves this.
You are on the right track, but please keep in mind, as others have pointed out, that you are now exploring undefined behaviour. This means that portability is broken and optimization flags may very well change the behaviour.
By the way, a more correct thing to output (assuming that the machine is little-endian) would be:
cout << "The long is " << TheLong << " which is "
<< TheInt1 << " + " << TheInt2 << " * 2^32" << endl;
For completeness' sake, a well-defined conversion of a 32 bit integer to two 16 bit ones:
#include <cstdint>
#include <iostream>
int main() {
uint32_t fullInt = 723330;
uint16_t lowBits = (fullInt >> 0) & 0x0000FFFF;
uint16_t highBits = (fullInt >> 16) & 0x0000FFFF;
std::cout << fullInt << " = "
<< lowBits << " + " << highBits << " * 2^16"
<< std::endl;
return 0;
}
Output: 723330 = 2434 + 11 * 2^16
Am I even on the right track?
Probably not. You seem confused.
What is the proper increment for line five?
There is none. Pointer arithmetic is only defined within arrays (a single object counts as an array of one element), and you have no arrays here. So
long int * pTheLong2 = &TheLong + 0x4;
is undefined behavior, and replacing 0x4 with any value other than 0 or 1 would also be UB. (Even with 1, the result is a one-past-the-end pointer, which may be compared but not dereferenced.)
Why the hell is TheInt1/TheInt2 allowing such a large value?
int and long int often have the same range of possible values.
TheInt2 = *pTheLong2;
This invokes undefined behavior, because the C++ Standard does not give any guarantee as to which memory location pTheLong2 is pointing to, as it's initialized as:
long int * pTheLong2 = &TheLong + 0x4;
&TheLong is the address of the variable TheLong, and pTheLong2 is initialized to an address that either is not part of the program's memory at all (hence illegal to access), or points somewhere within the program itself, though you don't know where exactly, and the C++ Standard gives no guarantee about where it points.
Hence, dereferencing such a pointer invokes undefined behavior.