Strange behavior when obtaining addresses of functions - c++

I've yet again been faced with the challenge of a slight hack in my code to accommodate for an unmodifiable library of some poorly written code.
After hours of research (after finding out you cannot pass any sort of state to a function pointer), I need to NOP out the ASM bytes within the function header that reset EAX to 0xCCCCCCCC.
By using the built-in VC++ debugger to obtain the 'address' of the function and manually creating an array-of-bytes entry in Cheat Engine (which is surprisingly useful for this sort of thing), it successfully pulled up the bytes and I could NOP the 5-byte sequence manually.
However, doing this programmatically is a little different.
When I do any of the following, the address is significantly higher than what the debugger is reporting, thus giving me a pointer to the wrong bytes.
unsigned int ptr = reinterpret_cast<unsigned int>(testhack);
unsigned char* tstc = (unsigned char*)&testhack;
FARPROC prc = GetProcAddress(GetModuleHandle("mod.dll"), "testhack");
// Still returns the incorrect address (same as the previous two)
What is the debugger doing to find the correct address? How can I find the correct address programmatically?

Presumably the function address is in a DLL. In that case, what you're seeing is the loader relocating addresses such as &testhack to match the location at which your module was loaded. The debugger probably already knows about the relocations and is helpfully taking care of them for you automatically, allowing you to modify the bytes.
Unfortunately I'm not aware of any mechanism to work around this, but if you link statically, the addresses will be fixed at link time rather than at runtime.

Figured it out.
Totally forgot about (what I believe are called) lookup tables. The address I was receiving was the pointer to the lookup table entry, which was a simple 5-byte unconditional JMP to the real function body. All I had to do was get the offset, add it to the end of the lookup table entry, and then I had my correct address.
Here is the code to reach the correct start address of the function.
// Get the address of the lookup table (thunk) entry
const char* ptr = (const char*)&testhack;
// Read the JMP's 4-byte relative offset (skip the 1-byte 0xE9 opcode)
unsigned int offset = *(const unsigned int*)(ptr + 1);
// The offset is relative to the end of the 5-byte JMP instruction,
// so add it to ptr + 5 to reach the real function body
const char* tst = ptr + offset + 5;
tst then holds the correct address!
As of now, the calling ASM instructions put the pointer to the struct containing the function pointers into EAX. Normally, the callback function would reset EAX, but with this hack applied that part is NOP'd out. The first lines of code within the function create a local pointer and then use inline assembly (mov [ptr], eax) to store that address into the pointer. I can then use the pointer normally.
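For completeness, here is a minimal sketch of how the NOP patch itself could be applied from code on Windows. This is an assumption-laden illustration rather than the exact code used: the target pointer, the offset of the instruction within the function, and the 5-byte length all come from the analysis above.

#include <windows.h>
#include <cstring>

// Hedged sketch: overwrite 'len' bytes at 'target' with x86 NOPs (0x90).
// Assumes 'target' points at the instruction to remove inside our own process.
void nop_out(void* target, size_t len)
{
    DWORD oldProtect;
    // Code pages are normally read/execute only, so make them writable first
    VirtualProtect(target, len, PAGE_EXECUTE_READWRITE, &oldProtect);
    std::memset(target, 0x90, len);                        // 0x90 = NOP
    VirtualProtect(target, len, oldProtect, &oldProtect);  // restore protection
    // Keep the instruction cache coherent after modifying code
    FlushInstructionCache(GetCurrentProcess(), target, len);
}

// Usage (the offset of the mov instruction within the function is hypothetical):
// nop_out(const_cast<char*>(tst) + kOffsetOfMovInstruction, 5);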

Related

embed a functions assembly code in a struct

I have a rather special question: is it possible in C/C++ (both, because I am sure the question is the same in both languages) to specify a function's location? Why? I have a very large list of function pointers, and I want to eliminate them.
Currently this looks like the following (repeated over a million times, stored in the user's RAM):
struct {
    int i;
    void (*funptr)();
} test;
Because I know that in most assembly languages, functions are just "goto" directives, I had the following idea. Is it possible to optimize the above construct so that it looks like this?
struct {
    int i;
    // embed the assembler of the function here
    // so that all the function's
    // instructions are located here
    // like this: mov rax, rbx
    //            jmp _start ; just demo code
} test2;
In the end, the thing should look like this in memory: An int holding any value, followed by the function's assembly code, referenced by test2. I should be able to call these functions like that: ((void(*)()) (&pointerToTheStruct + sizeof(int)))();
You might think that I'm insane to optimize the app that way, and I cannot disclose any more details on its function, but if anyone has some pointers on how to solve this problem, I would appreciate it.
I do not think that there is a standard way to this, so any hacky way to do this via inline assembler/other crazy things is also appreciated!
The only thing you really have to do is make the compiler aware of the (constant) value of the function pointer you want in the struct. The compiler will then (presumably/hopefully) inline that function call wherever it sees it called through that function pointer:
template<void (*FPtr)()>
struct function_struct {
    int i;
    static constexpr auto funptr = FPtr;
};

void testFunc()
{
    volatile int x = 0;
}

using test = function_struct<testFunc>;

int main()
{
    test::funptr();
}
Demo - no call or jmp after optimization.
It remains unclear what the point of the int i is. Note that the code is not technically "directly after the i" here, but it is even more unclear what you'd expect instances of the struct to look like (is the code in them, or is it "static" in a way? I feel there is some misunderstanding on your part about what compilers actually produce...). But consider the ways that compiler inlining can help you, and you might find the solution you need. If you're worried about executable size after inlining, tell the compiler and it will compromise between speed and size.
This sounds like a terrible idea for a lot of reasons that probably won't save memory, and will hurt performance by diluting L1I-cache with data and L1D-cache with code. And worse if you ever modify or copy objects: self-modifying code stalls.
But yes, this would be possible in C99/C11 with a flexible array member at the end of the struct, which you cast to a function pointer.
struct int_with_code {
    int i;
    char code[];   // C99 flexible array member. GNU extension in C++
                   // Store machine code here
                   // you can't get the compiler to do this for you. Good Luck!
};

void foo(struct int_with_code *p) {
    // explicit C-style cast compiles as both C and C++
    void (*funcp)(void) = ( void (*)(void) ) p->code;
    funcp();
}
Compiler output from clang7.0, on the Godbolt compiler explorer is the same when compiled as either C or C++. This is targeting the x86-64 System V ABI, where the first function arg is passed in RDI.
# this is the code that *uses* such an object, not the code that goes in its code[]
# This proves that it compiles,
# without showing any way to get compiler-generated code into code[]
foo:                           # @foo
    add     rdi, 4             # move the pointer 4 bytes forward, to point at code[]
    jmp     rdi                # TAILCALL
(If you leave out the (void) arg-type declaration in C, the compiler will zero AL first in the x86-64 SysV calling convention, in case it's actually a variadic function, because it's passing no FP args in registers.)
You'd have to allocate your objects in memory that was executable (normally not done unless they're const with static storage), e.g. compile with gcc -zexecstack. Or use a custom mmap/mprotect or VirtualAlloc/VirtualProtect on POSIX or Windows.
Or if your objects are all statically allocated, it might be possible to massage compiler output to turn functions in the .text section into objects by adding an int member right before each one. Maybe with some .section and linker tricks, and maybe a linker script, you could even somehow automate it.
But unless they're all the same length (e.g. with padding like char code[60]), that won't form an array you can index, so you'll need some way of referencing all these variable-length objects.
There are potentially huge performance downsides if you ever modify an object before calling its function: on x86 you'll get self-modifying-code pipeline nuke for executing code near a just-written memory location.
Or if you copied an object before calling its function: x86 pipeline flush, or on other ISAs you need to manually flush caches to get the I-cache in sync with D-cache (so the newly-written bytes can be executed). But you can't copy such objects because their size isn't stored anywhere. You can't search the machine code for a ret instruction, because a 0xc3 byte might appear somewhere that's not the start of an x86 instruction. Or on any ISA, the function might have multiple ret instructions (tail duplication optimization). Or end with a jmp instead of a ret (tailcall).
Storing a size would start to defeat the purpose of saving size, eating up at least an extra byte in each object.
Writing code to an object at runtime, then casting to a function pointer, is undefined behaviour in ISO C and C++. On GNU C/C++, make sure you call __builtin___clear_cache on it to sync caches or whatever else is necessary. Yes, this is needed even on x86 to disable dead-store elimination optimizations: see this test case. On x86 it's just a compile-time thing, no extra asm. It doesn't actually clear any caches.
If you do copy at runtime startup, maybe allocate a big chunk of memory and carve out variable-length chunks of it, while copying. If you malloc each separately, you're wasting memory-management overhead on it.
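As a rough, Linux-flavored sketch of that "one big chunk" approach (all names here are made up, error handling is omitted, and mapping memory with PROT_EXEC is assumed to be permitted on your system):

#include <sys/mman.h>
#include <cstring>
#include <cstddef>

// Hedged sketch: one executable arena carved into variable-length chunks,
// instead of a separate allocation per code blob.
static char  *arena = nullptr;
static size_t arena_used = 0;

void arena_init(size_t total) {
    arena = static_cast<char*>(mmap(nullptr, total,
                                    PROT_READ | PROT_WRITE | PROT_EXEC,
                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
}

void *arena_copy_code(const void *src, size_t len) {
    char *dst = arena + arena_used;
    std::memcpy(dst, src, len);
    arena_used += len;
    __builtin___clear_cache(dst, dst + len);  // GNU C: keep I-cache in sync with D-cache
    return dst;
}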
This idea will not save you memory unless you have about as many functions as you have objects
Normally you have a fairly limited number of actual functions, with many objects having copies of the same function pointer. (You've kind of hand-rolled C++ virtual functions, but with only one function you just have a function pointer directly instead of a vtable pointer to a table of pointers for that class type. One fewer levels of indirection, and apparently you're not passing the object's own address to the function.)
One of the several benefits of this level of indirection is that one pointer is usually significantly smaller than the entire code for a function. For that to not be the case, your functions would have to be tiny.
Example: with 10 different functions of 32 bytes each, and 1000 objects with function pointers, you have a total of 320 bytes of code (which will stay hot in I-cache), and 8000 bytes of function pointers. (And in your objects, another 4 bytes per object wasted on padding to align the pointer, making the total size 16 instead of 12 bytes per object.) Anyway, that's 16320 bytes total for entire structs + code. If you allocated each object separately, there's per-object bookkeeping.
With inlining machine code into each object, and no padding, that's 1000 * (4+32) = 36000 bytes, over twice the total size.
x86-64 is probably a best-case scenario, where a pointer is 8 bytes and x86-64 machine code uses a (famously complex) variable-length instruction encoding which allows for high code density in some cases, especially when optimizing for code-size. (e.g. code-golfing. https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code). But unless your functions are mostly something trivial like lea eax, [rdi + rdi*2] (3 bytes=opcode + ModRM + SIB) / ret (1 byte), they're still going to take more than 8 bytes. (That's return x*3; for a function that takes a 32-bit integer x arg, in the x86-64 System V ABI.)
If they're wrappers for larger functions, a normal call rel32 instruction is 5 bytes. A load of static data is at least 6 bytes (opcode + modrm + rel32 for a RIP-relative addressing mode, or loading EAX specifically can use the special no-modrm encoding for an absolute address. But in x86-64 that's a 64-bit absolute unless you use an address-size prefix too, potentially causing an LCP stall in the decoders on Intel. mov eax, [32 bit absolute address] = addr32 (0x67) + opcode + abs32 = 6 bytes again, so this is worse for no benefit).
Your function-pointer type doesn't have any args (assuming this is C++ where foo() means foo(void) in a declaration, not like old C where an empty arg list is somewhat similar to (...)). Thus we can assume you're not passing args, so to do anything useful the functions are probably accessing some static data or making another call.
Ideas that make more sense:
Use an ILP32 ABI like Linux x32, where the CPU runs in 64-bit mode but your code uses 32-bit pointers. This would make each of your objects only 8 bytes instead of 16. Avoiding pointer-bloat is a classic use-case for x32 or ILP32 ABIs in general.
Or (yuck) compile your code as 32-bit. But then you have obsolete 32-bit calling conventions that pass args on the stack instead of registers, and less than half the registers, and much higher overhead for position-independent code. (No EIP/RIP-relative addressing.)
Store an unsigned int table index to a table of function pointers. If you have 100 functions but 10k objects, the table is only 100 pointers long. In asm you could index an array of code directly (computed goto style) if all the functions were padded to the same length, but in C++ you can't do that. An extra level of indirection with a table of function pointers is probably your best bet.
e.g.
void (*const fptrs[])(void) = {
    func1, func2, func3, ...
};

struct int_with_func {
    int i;
    unsigned f;
};

void bar(struct int_with_func *p) {
    fptrs[p->f]();
}
clang/gcc -O3 output:
bar(int_with_func*):
    mov     eax, dword ptr [rdi + 4]    # load p->f
    jmp     qword ptr [8*rax + fptrs]   # TAILCALL: index the global table for a memory-indirect jmp
If you were compiling a shared library, PIE executable, or not targeting Linux, the compiler couldn't use a 32-bit absolute address to index a static array with one instruction. So there'd be a RIP-relative LEA in there and something like jmp [rcx+rax*8].
This is an extra level of indirection vs. storing a function pointer in each object, but it lets you shrink each object to 8 bytes, down from 16, like using 32-bit pointers. Or to 5 or 6 bytes, if you use an unsigned short or uint8_t and pack the structs with __attribute__((packed)) in GNU C.
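As an illustration of that last point, a packed variant might look like the sketch below (GNU C/C++ __attribute__((packed)); the struct name is made up, and unaligned access may cost you on some targets):

#include <cstdint>

// Hedged sketch: store a 1-byte index into fptrs[] instead of an 8-byte
// function pointer, and pack the struct so no padding is added.
struct __attribute__((packed)) small_int_with_func {
    int     i;   // 4 bytes
    uint8_t f;   // index into a table of at most 256 function pointers
};

static_assert(sizeof(small_int_with_func) == 5, "packed to 5 bytes per object");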
No, not really.
The way to specify a function's location is to use a function pointer, which you're already doing.
You could make different types which have their own different member functions, but then you're back to the original problem.
I have in the past experimented with auto-generating (as a pre-build step, using Python) a function with a long switch statement that does the work of mapping int i to a normal function call. This gets rid of the function pointers, at the expense of branching. I don't remember whether it ended up being worthwhile in my case and, even if I did, that wouldn't tell us whether it's worthwhile in your case.
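For what it's worth, the generated dispatcher was shaped roughly like the sketch below (the function names here are placeholders, not anything from a real project):

// Hedged sketch: a machine-generated switch that maps an int id to a direct
// call, so objects store only the id instead of a function pointer.
void func0();
void func1();
void func2();

void dispatch(int i) {
    switch (i) {
        case 0: func0(); break;
        case 1: func1(); break;
        case 2: func2(); break;
        default: break;   // unknown id: do nothing
    }
}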
Because I know that in most assembly languages, functions are just "goto" directives
Well, it's perhaps a little more complicated than that…
You might think that I'm insane to optimize the app that way
Perhaps. Trying to eliminate indirection is not, in itself, a bad thing, so I don't think you're wrong to try to improve this. I just don't think that you necessarily can.
but if anyone has some pointers
lol
I don't understand the goal of this "optimization". Is it about saving memory?
I might be misunderstanding the question, but if you just replace your function pointer with a regular member function, then your struct contains only the int as data; the function's address is supplied by the compiler wherever you take it, instead of being stored in memory.
So just do
struct {
    int i;
    void func();
} test;
Then sizeof(test)==sizeof(int) should hold true if you set alignment/packing to be tight.
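A quick way to convince yourself of that (using a named struct so the member can actually be defined; the names are just for illustration):

#include <cstdio>

// Hedged sketch: a non-virtual member function takes no per-object storage,
// so the struct is exactly the size of its int member.
struct Test {
    int i;
    void func();
};

void Test::func() { std::printf("i = %d\n", i); }

int main() {
    static_assert(sizeof(Test) == sizeof(int), "no function pointer stored per object");
    Test t{42};
    t.func();
}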

Accessing memory mapped register

Assume there is a memory-mapped device at address 0x1ffff670. The device register is only 8 bits wide. I need to get the value in that register, increment it by one, and write it back.
Following is my approach to do that,
In memory, I think this is how the scenario looks.
void increment_reg(){
    int c;                                           // holds the value read from the register
    char *control_register_ptr = (char*) 0x1ffff670; // memory-mapped address; char because the register is 8 bits
    c = (int) *control_register_ptr;                 // read the register and save it to c as an integer
    c++;                                             // increment by one
    *control_register_ptr = c;                       // write the new bit pattern to the control register
}
Is this approach correct? Many thanks.
Your approach is almost correct. The only missing part - as pointed out in the comments on the question - is adding a volatile to the pointer type like so:
volatile unsigned char * control_register_ptr = ...
I would also make it unsigned char, since that is usually a better fit, but that's basically not that much different (the only meaningful difference would be when shifting the value down.)
The volatile keyword signals to the compiler the fact that the value at that address might change from outside the program (i.e. by code that the compiler doesn't see and know about.) This will make the compiler more conservative in optimizing loads and stores away, for example.
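Putting that together, a corrected version of the question's function might look like this sketch (same hypothetical register address; whether a plain read-modify-write is safe for your particular device is a separate question):

// Hedged sketch: the original function with volatile and unsigned char applied.
void increment_reg(void) {
    volatile unsigned char *control_register_ptr =
        (volatile unsigned char *) 0x1ffff670;   // memory-mapped register from the question
    unsigned char c = *control_register_ptr;     // read the 8-bit register
    c++;                                         // increment by one
    *control_register_ptr = c;                   // write it back
}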

How to change returned value of function

There is a function in this program, that currently returns a 1. I would prefer for it to return a 0.
uregs[R_PC] is the program counter.
arg0 is the program counter offset from where we left the function (assembly, "ret").
From this I deduce: we can add the offset to the program counter, uregs[R_PC]+arg0, to find the address of the return value.
I have allocated a 32-bit "0", and I try to write 2 bytes of it into the address where the return value lives (our function expects to return a BOOL16, so we only need 2 bytes of 0):
sudo dtrace -p "$(getpid)" -w -n '
int *zero;
BEGIN { zero=alloca(4); *zero=0; }
pid$target::TextOutA:return {
copyout(zero, uregs[R_PC]+arg0, 2);
}'
Of course I get:
dtrace: error on enabled probe ID 2 (ID 320426: pid60498:gdi32.dll.so:TextOutA:return): invalid address (0x41f21c) in action #1 at DIF offset 60
uregs[R_PC] is presumably a userspace address. Probably copyout() wants a kernel address.
How do I translate the userspace address uregs[R_PC] to kernel-space? I know that with copyin() we can read data stored at user-space address, into kernel-space. But that doesn't give us the kernel address of that memory.
Alternatively: is there some other way to change the return value using DTrace?
DTrace is not the right tool for this. You should instead use a debugger like dbx, mdb or gdb.
In the meantime, I'll try to clarify some of the concepts that you've mentioned.
To begin, you may well see in the source code for a simple function that there is a single return. It is quite possible that the compiled result, i.e. the function's machine-specific implementation, also contains only a single point of exit. Typically, however, the implementation is likely to contain more than one exit point and it may be useful for a developer to know from which specific one a function returned. It is this information, described as an offset from the start of the function, that is given by a return probe's arg0. Your D script, then, is attempting to update part of the program or library itself; although the addition of arg0 makes the destination address somewhat random, the result is most likely still within the text section, which is read-only.
Secondly, in the common case, a function's implementation returns a value by storing it in a specific register; e.g. %rax on amd64. Thus overriding a return value would necessitate overriding a register value. This is impossible because DTrace's access to the user-land registers is read-only.
It is possible that a function is implemented in such a way that, as it returns, it recovers the return value from a specific memory location before writing it into the appropriate register. If this were the case then one could, indeed, modify the value in memory (given its location) just before it is accessed. However, this is going to work for only a subset of cases: the return value might equally be contained in another register or else simply expressed as a constant in the program text itself. In any case, it would be far more trouble than it's worth given the existence of more appropriate debugging tools.

C++ Memory editing- Editing assembly/writing bytes

At address 10134CE0 I have
10134CE0 - 40 - inc eax
How could I change this (using C++ hopefully with WriteProcessMemory) to make it
dec eax
I know 40 means inc eax and 48 means dec eax but how could I change the 40 into 48?
First, if this is code and part of your program, you should make sure that the segment is writable to you. Otherwise, you cannot dynamically patch your code.
If it is, then the following will do the trick in C (in C++ you would use reinterpret_cast<> rather than the C-style cast):
uint8_t *code = (uint8_t*)0x10134ce0;
*code = 0x48;
The first line declares a pointer and assigns it the address of your code. In the second line you then use this pointer to overwrite the original instruction.
If you are thinking about patching x86 code in general, note that simply doing this will not suffice. x86 is a variable-length instruction set, so operations may have different lengths. In that case, overwriting one instruction with another might be hard, because the new instruction may be longer, and you would thereby overwrite one or more instructions you did not mean to patch.
For such cases, you'll need to disassemble the original code and re-assemble a new instance that you use instead of your old code. For such purposes, I like using udis86 as a disassembler, and AsmJit to create new code on the fly.
Using WriteProcessMemory (if appropriate):
uint8_t buffer;
BOOL ok;
buffer = 0x48;
ok = WriteProcessMemory(<handle of the process>, (LPVOID)0x10134CE0, &buffer, 1, NULL);
But there's the question as to whether you should be writing to the memory of another process, if that's what you're doing, even if you have the permission.
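If you do intend to patch another process, a hedged sketch of obtaining a suitable handle could look like this (the process ID parameter is a placeholder; the address is the one from the question):

#include <windows.h>

// Hedged sketch: open the target with just enough rights for WriteProcessMemory,
// patch the single byte (0x48 = dec eax), then close the handle.
bool patch_byte(DWORD pid) {
    HANDLE target = OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION, FALSE, pid);
    if (!target)
        return false;
    unsigned char val = 0x48;
    BOOL ok = WriteProcessMemory(target, (LPVOID)0x10134CE0, &val, 1, NULL);
    CloseHandle(target);
    return ok != FALSE;
}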
If all you're asking is "how do I write the byte 0x48 at memory location 0x10134CE0", that's:
*(char *)0x10134CE0 = 0x48;
Doing the same thing in another process's memory space looks like this:
char val = 0x48;
BOOL success = WriteProcessMemory(target, (LPVOID)0x10134CE0, &val, 1, NULL);
Presumably you want to do more than just this, but you haven't explained what that "more" is, which is going to make it hard for anyone to answer. BjoemD made some valiant attempts at reading your mind. If he succeeded, great. If not, please tell us what else you need.

off-by-one error with string functions (C/C++) and security potentials

So this code has the off-by-one error:
void foo(const char *str) {
    char buffer[64];
    strncpy(buffer, str, sizeof(buffer));
    buffer[sizeof(buffer)] = '\0';
    printf("whoa: %s", buffer);
}
What can malicious attackers do if she figured out how the function foo() works?
Basically, to what kind of security potential problems is this code vulnerable?
I personally thought that the attacker can't really do anything in this case, but I heard that they can do a lot of things even when limited to working with 1 byte.
The only off-by-one error I see here is this line:
buffer[sizeof(buffer)] = '\0';
Is that what you're talking about? I'm not an expert on these things, so maybe I'm overlooking something, but since the only thing that will ever get written to that wrong byte is a zero, I think the possibilities are quite limited. The attacker can't control what's being written there. Most likely it would just cause a crash, but it could also cause tons of other odd behavior, all of it specific to your application. I don't see any code injection vulnerability here unless this error causes your app to expose another such vulnerability that would be used as the vector for the actual attack.
Again, take with a grain of salt...
Read Shell Coder's Handbook 2nd Edition for lots of information.
Disclaimer: This is inferred knowledge from some research I just did, and should not be taken as gospel.
It's going to overwrite part or all of your saved frame pointer with a null byte - that's the reference point that your calling function will use to offset its memory accesses. So at that point the calling function's memory operations are going to a different location. I don't know what that location will be, but you don't want to be accessing the wrong memory. I won't say you can do anything, but you might be able to do something.
How do I know this (really, how did I infer this)? Smashing the Stack for Fun and Profit by Aleph One. It's quite old, and I don't know if Windows or compilers have changed the way the stack behaves to avoid these problems. But it's a starting point.
example1.c:
void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[10];
}

void main() {
    function(1,2,3);
}
To understand what the program does to call function() we compile it with gcc using the -S switch to generate assembly code output:
$ gcc -S -o example1.s example1.c
By looking at the assembly language output we see that the call to function() is translated to:
pushl $3
pushl $2
pushl $1
call function
This pushes the 3 arguments to function backwards into the stack, and calls function(). The instruction 'call' will push the instruction pointer (IP) onto the stack. We'll call the saved IP the return address (RET). The first thing done in function is the procedure prolog:
pushl %ebp
movl %esp,%ebp
subl $20,%esp
This pushes EBP, the frame pointer, onto the stack. It then copies the current SP onto EBP, making it the new FP pointer. We'll call the saved FP pointer SFP. It then allocates space for the local variables by subtracting their size from SP.
We must remember that memory can only be addressed in multiples of the word size. A word in our case is 4 bytes, or 32 bits. So our 5 byte buffer is really going to take 8 bytes (2 words) of memory, and our 10 byte buffer is going to take 12 bytes (3 words) of memory. That is why SP is being subtracted by 20. With that in mind our stack looks like this when function() is called (each space represents a byte):
bottom of                                                            top of
memory                                                               memory
                       buffer2       buffer1   sfp   ret   a   b   c
<------                [            ][        ][    ][    ][  ][  ][  ]

top of                                                            bottom of
stack                                                                stack
What can malicious attackers do if she figured out how the function foo() works? Basically, to what kind of security potential problems is this code vulnerable?
This is probably not the best example of a bug that could be easily exploited for security purposes, although it could be exploited to crash the code simply by using a string of 64 characters or longer.
While it certainly is a bug that will corrupt the memory immediately after the array (on the stack) with a single zero byte, there is no easy way for a hacker to inject data into the corrupted area. Calling the printf() function will push parameters onto the stack and may overwrite the zero that was written out of array bounds, leading to a potentially unterminated string being passed to printf.
However, without intimate knowledge of what goes on in printf (and needing to exploit printf as well as foo), a hacker would be hard pressed to do anything other than crash your code.
FWIW, this is a good reason to compile with warnings on or to use functions like strncpy_s which both respects buffer size and also includes a terminating null even if the copied string is larger than the buffer. With strncpy_s, the line "buffer[sizeof(buffer)] = '\0';" is not even necessary.
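For comparison, one portable fix for the original function that stays inside the buffer (sticking to plain strncpy rather than the _s variants) might look like:

#include <stdio.h>
#include <string.h>

// Hedged sketch: copy at most sizeof(buffer)-1 characters and terminate
// inside the array, so no byte past the end of the buffer is ever written.
void foo_fixed(const char *str) {
    char buffer[64];
    strncpy(buffer, str, sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';   // last valid index is sizeof(buffer) - 1
    printf("whoa: %s", buffer);
}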
The issue is that you don't have permission to write to the item after the array. When you asked for 64 chars for buffer, the system is required to give you at least 64 bytes. It's normal for the system to give you more than that -- in which case the memory belongs to you and there is no problem in practice -- but it is possible that even the first byte after the array belongs to "somebody else."
So what happens if you overwrite it? If the "somebody else" is actually inside your program (maybe in a different structure or thread) the operating system probably won't notice you trampled on that data, but that other structure or thread might. There's no telling what data should be there or how trampling over it will affect things.
In this case you allocated buffer on the stack, which means (1) the somebody else is you, and in fact is your current stack frame, and (2) it's not in another thread (but could affect other local variables in the current stack frame).