Please take a look at this macro. It is used in the Symbian OS SDK, whose compiler is based on GCC (a version earlier than 4).
#ifndef _FOFF
#if __GNUC__ < 4
#define _FOFF(c,f) (((TInt)&(((c *)0x1000)->f))-0x1000)
#else
#define _FOFF(c,f) __builtin_offsetof(c,f)
#endif
#endif
I understand that it is calculating offset to specific class/struct member. But I cannot understand how that weird statement works - what is the constant 0x1000 and why is it there? Could somebody please explain this to me?
IMO 0x1000 is just an arbitrarily chosen number. It is not a valid pointer, and you could probably use zero instead of it.
How it works:
Casts 0x1000 into a class pointer (a pointer of type c *): (c *)0x1000
Takes the pointer to the f member of class c: &(((c *)0x1000)->f)
Casts it into TInt: ((TInt)&(((c *)0x1000)->f))
Subtracts the integer value of the pointer to the base (0x1000 in this case) from the integer value of the pointer to c's member: (((TInt)&(((c *)0x1000)->f))-0x1000)
Because f is never actually accessed (only its address is taken), there is no access violation/segfault.
You could probably use zero instead of 0x1000 and discard the subtraction (i.e. just use "((TInt)&(((c *)0x0000)->f))"), but maybe the author thought that subtracting the base pointer from the pointer to the member is a more "proper" way than directly casting a pointer into an integer. Or maybe the compiler provides "hidden" class members that can have negative offsets (which is possible in some compilers; for example, the Delphi compiler (I know it isn't C++) provided multiple hidden "fields" located before the "self" (analogue of "this") pointer), in which case using 0x1000 instead of 0 makes sense.
"If there was a member of struct c starting exactly at the (perfectly-aligned;-) address 0x1000, then at what address would the struct's member f be?" -- answer: the offset you're looking for, minus of course the hypothetical starting address 0x1000 for the struct... with the difference, AKA distance or offset, computed as integers, otherwise the automatic scaling in address arithmetic throws you off (whence the cast).
What parts of the expression, specifically, are giving you problems?
The inner part &(((c *)0x1000)->f) is "the address of member f of a hypothetical struct c located at 0x1000". Right in front of it is the cast (I assume TInt is some kind of integer type, of course), then the - 0x1000 to get the offset (AKA distance or difference between the address of the specific member of interest and the start of the whole structure).
It is working out the address of 'f' as a member of a class/struct at address 0x1000, and then subtracting 0x1000 so that only the difference between the class/struct address and the member's address is returned. I imagine a non-zero value (i.e. the 0x1000) is used to avoid null-pointer detection.
Related
I'm not a C++ programmer and came across the following macro definition in some source code:
// HACK: gcc warns about applying offsetof() to non-POD object or calculating
// offset directly when base address is NULL. Use 16 to get around the
// warning. gcc-3.4 has an option -Wno-invalid-offsetof to suppress
// this warning.
#define offset_of(klass,field) (size_t)((intx)&(((klass*)16)->field) - 16)
The things that confused me are these:
((klass*)16)->field — is this casting 16 to klass* and then dereferencing to the field field?
(intx)&(...) — is this a bitwise AND with intx, or a cast to a reference to some type intx?
Where intx is defined as follows:
typedef intptr_t intx;
The purpose of such kind of macro is to get memory offset of a specific field in a struct. It's not unusual.
To explain the details: (klass*)16 converts 16 to a pointer, then (intx)&(((klass*)16)->field) takes the address of the field field and interprets that address as an integer. Combined with the following minus 16, the offset of the field is obtained.
For example:
#define offset_of(klass,field) (size_t)((size_t)&(((klass*)16)->field) - 16)

struct foo {
    int a;
    char b;
    short c;
};

int main() {
    size_t offset_of_b = offset_of(foo, b);
}
Here the offset_of_b is just the memory offset of field b in the struct foo.
The & in this case is an address-of operator, taking the address of field within the klass type.
The rest of it is just a matter of parentheses to make sure that the RIGHT bits get evaluated in the right place, plus several casts to ensure that the pointer to the field becomes an integer type that can be cast to size_t.
This macro replaces code whose behavior is undefined with different code whose behavior is also undefined, in an attempt to avoid a warning that the behavior of the first version is undefined.
Trying to analyze why it "works" is a parlor game. It is not engineering. That's why it's labeled "HACK".
I'm testing some ways of calculating the size, in bytes, of a function (I'm familiar with opcodes on x86). The code is quite self-explanatory:
void exec(void* addr) {
    int (WINAPI *msg)(HWND, LPCSTR, LPCSTR, UINT) = (int (WINAPI *)(HWND, LPCSTR, LPCSTR, UINT))addr;
    msg(0, "content", "title", 0);
}

void dump() {}

int main()
{
    cout << (char*)dump - (char*)exec; // this is 53
    return 0;
}
It is supposed to subtract the address of 'exec' from 'dump'. This works, but I noticed the values differ when using other types of pointers, like DWORD*:
    cout << (DWORD*)dump - (DWORD*)exec; // this is 13
From my understanding, no matter the pointee type, a pointer is always the same size (so that it can hold any address), in my case 4 bytes (x86 system). The only thing that changes between pointers is the data type they point to.
What is the explanation?
Pointer arithmetic in C/C++ is designed for accessing elements of an array. In fact, array indexing is merely a simpler syntax for pointer arithmetic. For example, if you have an array named array, array[1] is the same thing as *(array+1), regardless of the data type of the elements in array.
(I'm assuming here that no operator overloading is going on; that could change everything.)
If you have a char* or unsigned char*, the pointer points to a single byte, and incrementing the pointer advances it to the next byte.
In Windows, DWORD is a 32-bit value (four bytes), and DWORD* points to a 32-bit value. If you increment a DWORD*, the pointer is advanced by four bytes, just as array[1] gives you the second element of the array, which is four bytes (one DWORD) after the first element. Similarly, if you add 10 to a DWORD*, it advances 40 bytes, not 10 bytes.
Either way, incrementing or adding to a pointer is only valid if the resulting pointer points into the same array as the original one, or one element past the end. Otherwise it is undefined behavior.
Pointer subtraction works just like addition. When you subtract one pointer from another, they must be the same type, and must be pointers into the same array or one past the end.
What you're doing is counting the number of elements between the two pointers, as if they were pointers into the same array (or one past the end). But when the two pointers don't point into the same array (or again, one past the end), the result is undefined behavior.
Here is a reference from Carnegie Mellon University about this:
ARR36-C. Do not subtract or compare two pointers that do not refer to the same array - SEI CERT C Coding Standard
Pointer subtraction tells you the number of elements between the two addresses, so using DWORD * it will be in DWORD sized units.
You have:
cout<<(char*)dump-(char*)exec;
where dump and exec are the names of functions. Each cast converts a function pointer to char*.
I'm not sure about the status of such a conversion in C++. I think it either has undefined behavior or is illegal (making your program ill-formed). When I compile with g++ 4.8.4 with options -pedantic -std=c++11, it complains:
warning: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Wpedantic]
(There's a similar diagnostic for C, which I believe is not strictly correct, but that's another story.)
There's no guarantee that there's any meaningful relationship between object pointers and function pointers.
Apparently your compiler lets you get away with the casts, and presumably the result is a char* representation of the address of the function. Subtracting two pointers yields the distance between the two addresses in units of the type the pointers point to. Subtracting two char* pointers yields a ptrdiff_t result that is the difference in bytes. Subtracting two DWORD* pointers yields the difference in units of sizeof (DWORD) (probably 4 bytes?). That explains why you get different results. If the two addresses aren't a whole number of DWORDs apart in memory, the results are unpredictable, but in your example getting 13 (53 / 4, truncated) is plausible.
However, pointer subtraction is defined only when both pointer operands point to elements of the same array object, or just past the end of it. For any other operands, the behavior is undefined.
For an implementation that permits the casts, that uses the same representation for object pointers and for function pointers, and on which the value of a function pointer refers to a memory address in the same way that the value of an object pointer does, you can likely determine the size of a function by converting its address to char* and subtracting the result from the converted address of an adjacent function. But a compiler and/or linker is free to generate code for functions in any order it likes, including perhaps inserting code for other functions between two functions whose definitions are adjacent in your source code.
If you want to determine the size in bytes, use pointers to byte-sized types such as char. And be aware that the method you're using is not portable and is not guaranteed to work.
If you really need the size of a function, see if you can get your linker to generate some kind of map showing the allocated sizes and locations of your functions. There's no portable way to do it from within C++.
I want to know if there is any way I can store the address of a variable as an integer value.
For example, let's say that I have a number stored in some location in memory
int i= 20;
and we know that for example, the location of the variable i is 0x77C79AB2.
e.g.
int * ip = &i;
so we know that ip = 0x77C79AB2.
At this point, though, the variable ip is just a pointer. Let's say that I now want to store the address 0x77C79AB2 in a variable of type int (NOT of a pointer type).
So, somehow, I want to make another variable of type int actually store the number 0x77C79AB2 as a value, not as an address:
int a = 0x77C79AB2;
So I could do whatever I want with the variable a. For example, I want to treat a as an integer and add the hex number 0x20 to it.
e.g.
int b = a + 0x20 = 0x77C79AB2 + 0x20 = 0x77C79AD2
Is this possible?
How could I make this assignment ?
Pointers are not integers. If you want to store a pointer value, you should almost always store it in a pointer object (variable). That's what pointer types are for.
You can convert a pointer value to an integer using a cast, either a C-style cast:
int foo;
int addr = (int)&foo;
or using a C++-style cast:
int foo;
int addr = reinterpret_cast<int>(&foo);
But this is rarely a useful thing to do, and it can lose information on systems where int happens to be smaller than a pointer.
C provides two typedefs intptr_t and uintptr_t that are guaranteed to be able to hold a converted pointer value without loss of information. (If no integer types are wide enough for this, intptr_t and uintptr_t will not be defined). These are declared in the <stdint.h> header, or <cstdint> in C++:
#include <stdint.h>
// ...
int foo;
uintptr_t addr = reinterpret_cast<uintptr_t>(&foo);
You can then perform integer arithmetic on the value -- but there's no guarantee that the result of any arithmetic is meaningful.
I suspect that you're trying to do something that doesn't make much sense. There are rare cases where it makes sense to convert a pointer value to an integer and even rarer cases where it makes sense to perform arithmetic on the result. Reading your question, I don't see any real indication that you actually need to do this. What exactly are you trying to accomplish? It's likely that whatever you're trying to do, there's a cleaner way to do it.
Q: Is there any way I can store the address location of a variable as an integral value.
A: Sure. All you have to do is cast.
CAVEATS:
1) Just remember that sizeof(int) may differ from sizeof(char *) depending on the platform. As mentioned above, prefer uintptr_t whenever possible.
2) For C++, consider using reinterpret_cast<>:
http://en.cppreference.com/w/cpp/language/reinterpret_cast
The header cstdint defines a type uintptr_t which is an integer type large enough to hold a pointer. Cast your pointer type to it with reinterpret_cast. e.g:
#include <cstdint>
...
int i = 20;
uintptr_t ip = reinterpret_cast<uintptr_t>(&i);
You can already do things like adding an offset by just going back to array syntax on the original pointer, e.g. (&i)[0x20]. You can cast between different pointer types to change the offset "step size".
Can anyone parse this following expression for me
#define Mask(x) (*((int *)&(x)))
I applied the popular right-left rule to work it out, but I can't... :(
Thanks a bunch in advance :)
This just reads out the first sizeof (int) bytes at the address of the argument, and returns them as an integer.
This defines a macro Mask that interprets its argument as an int.
&(x) - address of x...
(int *)&(x) - ...interpreted as a pointer to int
*((int *)&(x)) - the value at that pointer
You need to read it from the inside out:
Find the address of x.
Cast this to an integer pointer.
Dereference this pointer to return the value.
As such, int y = Mask(something); returns an integer interpretation of something.
To keep it simple:
#define Mask(x) reinterpret_cast<int&>(x)
I am assuming that no const_cast is needed, that is, that the argument x is of non-const type; otherwise there would be an extra const_cast in there.
This kind of macro can cast a float value to a DWORD in its binary form and vice versa. This can be used with libraries whose functions take DWORD as a generic input type.
An example would be SetRenderState() in DirectX :
HRESULT SetRenderState(
    D3DRENDERSTATETYPE state,
    DWORD value
);
In this particular case, some states require you to give a float value. Now, trying to pass 6.78f directly to that function would truncate it to an integer, which would be 6. What we want is the binary form 0x40D8F5C3, so that the library will be able to cast it back to 6.78f.
That's what we call a reinterpret cast. It's platform-dependent and potentially dangerous unless you know what you are doing.
#include <stdio.h>

int main()
{
    int *d = 0;
    printf("%d\n", *d);
    return 0;
}
This works fine:
>cc legal.c
> ./a.out
0
If I change the statement int *d=0; to int *d=1; I see this error:
cc: "legal.c", line 6: error 1522: Cannot initialize a pointer with an integer constant other than zero.
So it's obvious that it will allow only zero. I want to know what happens in memory when we write int *d = 0 that makes it valid syntax.
I am just asking this out of curiosity!
I'm surprised that you didn't get a segfault when you ran your code: the *d in the printf statement is dereferencing a null pointer. To answer your question, though: C++ allows the constant 0 as an initializer for any scalar type, which is why it can be used to initialize a pointer to null (a literal 0 in a pointer context is a null pointer constant). With the value 1, you are asking the compiler to convert an integer to a pointer, which requires an explicit cast.
When initializing a pointer with 0, that 0 is implicitly converted to a null pointer. What that null pointer looks like depends on your platform; the compiler will use the correct binary value.
When you try to initialize the pointer with 1 (or any other non-zero integer) the compiler doesn't know how to convert this value to a valid pointer and issues a diagnostic.
You are creating a pointer variable called d on the stack which is said to "point to an integer". You then assign 0 to that pointer variable, which makes it point to memory address 0x0, which is valid (and the same as NULL in C).
To make this clearer, int *d = 0 is the same as:
int *d;
d = 0; // set it to address 0
If you want to point to an integer 1 then you need this:
int x = 1;
int *d = &x; // "set it to 'address of x'"
In the case of int *d = 0, your integer pointer d is being initialized to 0, which is valid. Basically, you are declaring a pointer, so it makes sense to initialize the pointer itself.
But you don't want to initialize just the pointer; you then use the memory pointed to by it, which is incorrect here.
When you write int *d = 1, the pointer's value becomes 1, and when your printf statement executes, it will try to access the value at address 1, which will not be allowed.
Hope this helps.
In your "working" example you are dereferencing a null pointer, and the implementation is passing whatever bits it finds as the argument to printf. That it works at all is a totally implementation-dependent quirk of your compiler and machine, and it will likely segfault in another implementation.
That the code works seems to be an indication that your compiler is doing something odd behind the scenes in an attempt to "protect" coders from a very common error; that's a bad idea. I'd love to see what assembly is generated by your compiler with cc -S.
In ISO-C99, there are two types of null pointer constants: integer constant expression of value 0 - eg 0, 1 - 1, (int)0.0 - and such expressions cast to void * - eg (void *)0, which is often used to define NULL.
Converting a null pointer constant to an arbitrary pointer type yields a null pointer of that type. This conversion is implicit, but may actually involve address translation as the standard doesn't require null pointers to have the bit-representation 0.
This conversion is also defined for function pointer types, even if it's normally illegal to convert object pointers to function pointers:
void (*foo)(void) = (void *)0; // valid
void *bar = 0; // valid
void (*baz)(void) = (void (*)(void))bar; // invalid even with explicit cast
This also means that you can use 0 to initialize any scalar type without casting, and it's the only value for which this is true: Converting 0 to pointer types will always yield a null pointer, whereas converting any other integral value is possible, but requires explicit casting, has an implementation-defined result and might fail due to alignment or address space restrictions.