This is something that recently crossed my mind. Quoting from Wikipedia: "To initialize a function pointer, you must give it the address of a function in your program."
So I can't make it point to an arbitrary memory address. But what if I overwrite the memory at the address of the function with a piece of data of the same size, and then invoke it via the pointer? If that data corresponds to an actual function and the two functions have matching signatures, the latter should be invoked instead of the first.
Is it theoretically possible?
I apologize if this is impossible for some very obvious reason that I should be aware of.
If you're writing something like a JIT, which generates native code on the fly, then yes you could do all of those things.
However, in order to generate native code you obviously need to know some implementation details of the system you're on, including how its function pointers work and what special measures need to be taken for executable code. For one example, on some systems after modifying memory containing code you need to flush the instruction cache before you can safely execute the new code. You can't do any of this portably using standard C or C++.
You might find when you come to overwrite the function, that you can only do it for functions that your program generated at runtime. Functions that are part of the running executable are liable to be marked write-protected by the OS.
The issue you may run into is Data Execution Prevention (DEP). It tries to keep you from executing data as code, or from writing to code as if it were data. You can turn it off on Windows. Some compilers and OSes may also place code into const-like sections of memory that the OS or hardware protects. The standard says nothing about what should or should not work when you write an array of bytes to a memory location and then call a function that jumps to that location. It all depends on your hardware and your OS.
While the standard does not provide any guarantees as to what would happen if you make a function pointer that does not refer to a function, in real life, on your particular implementation and knowing the platform, you may be able to do that with raw data.
I have seen example programs that created a char array with the appropriate binary code and executed it by careful casting of pointers. So in practice, and in a non-portable way, you can achieve that behavior.
It is possible, with the caveats given in other answers. You definitely do not want to overwrite memory at some existing function's address with custom code, though. Not only is executable memory typically not writable, but you have no guarantees as to how the compiler might have used that code. For all you know, the code may be shared by many functions that you think you're not modifying.
So, what you need to do is:
1. Allocate one or more memory pages from the system.
2. Write your custom machine code into them.
3. Mark the pages as non-writable and executable.
4. Run the code. There are two ways of doing it:
   a. Cast the address of the pages you got in #1 to a function pointer, and call the pointer.
   b. Execute the code in another thread. You pass the pointer to the code directly to a system API or framework function that starts the thread.
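The steps above can be sketched roughly as follows. This is a minimal, non-portable illustration that assumes a POSIX system (tested in spirit against Linux) with mmap and mprotect, a GCC or Clang compiler (for __builtin___clear_cache), and an x86-64 or AArch64 CPU. The function name run_generated_code and the hand-assembled bytes for "return 42" are my own choices for the example, not something from the answer.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hand-assembled machine code for a function that returns 42.
// Only these two architectures are covered in this sketch.
#if defined(__x86_64__)
static const unsigned char kCode[] = {
    0xB8, 0x2A, 0x00, 0x00, 0x00,  // mov eax, 42
    0xC3                           // ret
};
#elif defined(__aarch64__)
static const unsigned char kCode[] = {
    0x40, 0x05, 0x80, 0x52,  // mov w0, #42
    0xC0, 0x03, 0x5F, 0xD6   // ret
};
#else
#error "this sketch only carries machine code for x86-64 and AArch64"
#endif

// Returns the generated function's result, or -1 if a system call fails.
int run_generated_code() {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    assert(sizeof kCode <= page);

    // Step 1: allocate one page, writable but not yet executable.
    void* mem = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    // Step 2: write the machine code into the page.
    std::memcpy(mem, kCode, sizeof kCode);

    // Step 3: make the page executable and non-writable.
    if (mprotect(mem, page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, page);
        return -1;
    }

    // Some CPUs need the instruction cache flushed after code is written.
    char* begin = static_cast<char*>(mem);
    __builtin___clear_cache(begin, begin + sizeof kCode);

    // Step 4a: cast the page address to a function pointer and call it.
    using Fn = int (*)();
    Fn fn = reinterpret_cast<Fn>(mem);
    const int result = fn();

    munmap(mem, page);
    return result;
}
```

On a system that accepts this sequence, run_generated_code() should return 42. Hardened platforms may refuse the mprotect call or require extra flags, in which case the sketch returns -1 rather than executing anything.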
Your question is confusingly worded.
You can reassign function pointers and you can assign them to null. The same goes for member pointers. Unless you declare them const, you can reassign them, and yes, the new function will be called instead. The signatures must match exactly. Consider using std::function instead.
You cannot "overwrite the memory at the address of a function" in any portable way. You can probably do it somehow, but just don't. You'd be writing into your program code and are likely to screw it up badly.
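As a small illustration of reassigning a function pointer, setting it to null, and doing the same with std::function, here is a sketch. The names add, mul, BinaryOp, call_or_zero and call_or_zero_fn are made up for the example.

```cpp
#include <cassert>
#include <functional>

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

// A pointer to any function that takes two ints and returns an int.
using BinaryOp = int (*)(int, int);

// Calls through the pointer, treating a null pointer as "nothing to call".
int call_or_zero(BinaryOp op, int a, int b) {
    return op ? op(a, b) : 0;
}

// Same idea with std::function, which can also hold capturing lambdas.
int call_or_zero_fn(const std::function<int(int, int)>& op, int a, int b) {
    return op ? op(a, b) : 0;
}
```

A caller could write `BinaryOp op = add;`, call it, then assign `op = mul;` or `op = nullptr;` and call again. Only the pointer changes; the code of add and mul is never touched.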
This is a terrible idea, but I'm seeing if it's even feasible before I walk down this road.
I have to write a Win32 C++ program that can dynamically load a library based on a file that has serialized information on what DLL, function, signature, and arguments to use. Loading the library is trivial (LoadLibraryEx works fine). Getting the function pointer is easy too (GetProcAddress takes care of this). However, the rest is tricky.
Here's my plan of attack, feel free to let me know if this isn't the best approach:
Open the serialized information from a file on what DLL to load, and what function to execute.
LoadLibraryEx to bring in the DLL
GetProcAddress to get the function pointer (after casting the byte array to a string)
Write the arguments (which are read in as a byte array) to memory in bytes.
Get the address of the beginning of each argument (I'll know from serialization what the size of each argument is).
Using assembly, push the addresses of the arguments (which live on the heap) onto the stack in reverse order, then jump to the address held by the function pointer.
Execute and get back the return value address (as a void*?)
Use the memory address of the return value (that I got from assembly) and the size (which I got from the serialization) of the return type value and write the raw bytes back to a file.
Keep in mind my limitations:
I will never know until run time what the signature, DLL, and function name are.
It is always read in from a file.
Is there a better approach? Will this approach even work?
Update
For anyone who comes poking around in this thread to learn more, I found a solution. In C you can dynamically load a library using dlopen (there's a Windows library of this for ease of use). Once loaded, you can dynamically execute functions using libffi (supports Mac/iOS/Windows, 64- and 32-bit). This only gets you to C functions and primitive types (pointer, uint, int, double, float), and that's about it. However, using the Mac OS X Objective-C bridge you can access Objective-C by loading libobjc (OS X's native Obj-C to C "toll free" bridge). Then through that you can dynamically create Obj-C and C++ classes. A similar technique can be done on Windows using C# and its marshaling capabilities.
This ends up with HIGH overhead, and you must be VERY careful about your memory. In addition, don't mix pointers from C/C#/C++. Finally, whatever you do at runtime, BE ABSOLUTELY SURE YOU KNOW YOUR TYPES... seriously. BTW, libffi and cinvoke are amazing libraries.
There are existing libraries that can do what you describe, such as C/Invoke:
http://www.nongnu.org/cinvoke/
General rule: if you have a terrible idea, drop it and find a good one.
If the signature is not known at all, what you describe will fall on its face. Suppose your call works for my function as it is. If I change the function from __stdcall to __cdecl or back, you will crash.
Also you don't handle the return.
If you relax the "unknown" to allow some limitations, like fixing a return type and calling convention, you are somewhat ahead -- then really you can emulate the call but what is it good for? This whole thing sounds like a self-service hack-me engine.
The normal way is to publish some fixed interface (using function signatures), and make the DLL support it. Or arrange some uniform data transfer mechanism.
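To make the "fixed interface plus uniform data transfer" idea concrete, here is a sketch in which every exported function shares one signature and exchanges raw bytes. Everything named here (PluginFn, sum_bytes, registry, invoke) is hypothetical. The registry map merely stands in for name lookup, which in the real program would be GetProcAddress with the result cast to PluginFn.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// The one signature every exported function must have (hypothetical).
// Arguments and results travel as raw bytes. Returns the number of bytes
// written to `out`, or -1 on failure.
using PluginFn = int (*)(const unsigned char* in, std::size_t in_len,
                         unsigned char* out, std::size_t out_cap);

// Example plugin function: writes the sum (mod 256) of the input bytes.
int sum_bytes(const unsigned char* in, std::size_t in_len,
              unsigned char* out, std::size_t out_cap) {
    if (out_cap < 1) return -1;
    unsigned char total = 0;
    for (std::size_t i = 0; i < in_len; ++i) {
        total = static_cast<unsigned char>(total + in[i]);
    }
    out[0] = total;
    return 1;
}

// Stand-in for looking a function up by name in a loaded library.
const std::map<std::string, PluginFn>& registry() {
    static const std::map<std::string, PluginFn> table = {
        {"sum_bytes", sum_bytes},
    };
    return table;
}

// Looks up `name` and calls it with the serialized argument bytes.
// Returns the result bytes, or an empty vector if the lookup or call fails.
std::vector<unsigned char> invoke(const std::string& name,
                                  const std::vector<unsigned char>& args) {
    const auto it = registry().find(name);
    if (it == registry().end()) return {};
    std::vector<unsigned char> out(64);
    const int n = it->second(args.data(), args.size(), out.data(), out.size());
    if (n < 0) return {};
    out.resize(static_cast<std::size_t>(n));
    return out;
}
```

Because the calling convention and signature are fixed, the host needs no assembly and no knowledge of each function's real parameter list. Each function is responsible for decoding its own arguments from the byte buffer.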
I'd suggest you describe what you're really after and ask about that, maybe on Programmers SE.
I'm creating a class for a Lua binding which holds a pointer and can be changed by the scripter. It will include a few functions such as :ReadString and :ReadBool, but I don't want the application to crash if I can tell that the address they supplied will cause an access violation.
Is there a good way to detect if an address is outside of the readable/writable memory? Thanks!
A family of functions that may be useful is the "Virtual" functions, for example VirtualQuery.
I'm not really looking for a foolproof design, I just want to rule out the obvious (null pointers, pointers way outside the possible memory locations).
I understand how unsafe this library is, and I'm not looking for safety, just sanity.
There are ways, but they do not serve the purpose you intend. That is: yes, you can determine whether an address appears to be valid at the current moment in time. But no, you cannot determine whether that address will be valid a few clock cycles from now. Another thread could change the virtual memory map and a formerly valid address would become invalid.
The only way to properly handle the possibility of accessing suspect pointers is to use whatever native exception handling is available on your platform. This may involve handling the signal SIGBUS (or SIGSEGV), or it may involve using the proprietary __try and __except extensions.
The idiom to use is the one wherein you attempt the access, and explicitly handle the resulting exception, if any does happen to occur.
You also have the problem of determining whether the pointers you are handed point to your memory or to some other memory. For that, you could make your own data structure, a tree springs to mind, which stores the valid address ranges your "pointers" can reach. Otherwise, the script code can hand you some absolute addresses and you will end up modifying memory structures that the operating system maintains for the process environment.
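One way to sketch such a structure is with std::map, which is a balanced tree underneath, keyed by the base address of each range. The class name AddressRanges and its methods are invented for this example, and it assumes the registered ranges never overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Keeps track of the memory ranges a script is allowed to touch.
class AddressRanges {
public:
    // Registers [base, base + size) as accessible.
    void add(const void* base, std::size_t size) {
        if (size == 0) return;
        ranges_[reinterpret_cast<std::uintptr_t>(base)] = size;
    }

    // Forgets the range that starts at `base`, if any.
    void remove(const void* base) {
        ranges_.erase(reinterpret_cast<std::uintptr_t>(base));
    }

    // True if [addr, addr + len) lies entirely inside one registered range.
    bool contains(const void* addr, std::size_t len) const {
        const auto a = reinterpret_cast<std::uintptr_t>(addr);
        auto it = ranges_.upper_bound(a);  // first range starting after a
        if (it == ranges_.begin()) return false;
        --it;                              // last range starting at or before a
        const std::uintptr_t offset = a - it->first;
        return offset < it->second && len <= it->second - offset;
    }

private:
    // base address -> size in bytes; ranges are assumed not to overlap
    std::map<std::uintptr_t, std::size_t> ranges_;
};
```

A :ReadString or :ReadBool binding could call contains() before dereferencing and raise a script error instead of touching memory it never registered. This only says the address belongs to a range you recorded; it says nothing about whether that memory is still mapped.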
The application you write about is highly suspect and you should probably start over with a less explosive design. But I thought I would tell you how to play with fire if you really want to.
Check out Raymond Chen's blog post, which goes more deeply into why this practice is bad. Most interestingly, he points out that, once a page is tested by IsBadReadPtr, further accesses to that page will not raise exceptions!
There isn't, and that's why you should never do things like this.
Perhaps try using segvcatch, which can convert segfaults into C++ exceptions.
I'm writing a NES emulator in C/C++ for Mac OS (I've already written one, so I know the basics). Since many hardware registers are mapped to memory locations, I was wondering if there was some syscall I could use to map an address to the result of a function: when it would be accessed, the function would be called. (I'm pretty sure I can't, but hey, it's worth asking.)
Here is what I'd like to do:
int getStatusRegisterValue()
{
    return 0xCAFEBABE;
}
// obviously, more parameters than just this would be involved I suppose
int* statusRegister = syscall_to_map_function_to_address(getStatusRegisterValue);
// from here on, doing (*statusRegister) should call getStatusRegisterValue and
// return its value
*statusRegister == 0xCAFEBABE;
This project is going to be my try at LLVM, and my goal is to recompile the ROM to LLVM bytecode. That's why it would be convenient if the simple memory access could trigger the function (just like on real NES hardware). The two other obvious possibilities to solve my problem are to either cache the register values and store them in actual memory, or call a function from the recompiled code to map the memory locations to whatever they really are.
Thanks!
Maybe you could try installing a SEGV handler and checking the faulting address there. As I don't use Mac OS I can't help you more.
This almost sounds just like normal function pointers:
typedef int (*function_type)(void);
function_type fn = &getStatusRegisterValue; // store
int i = fn(); // call
Different syntax, same idea?
This can't be done in C (or C++, but let's just stick to C for simplicity).
You can "emulate" (ha) this effect with operator overloading and functors with explicit addressing, but it won't be the real thing. There are too many assumptions that must be made about the target function to do this normally.
1) You assume it always returns the same value.
Actually, that's about it. Still though, it's a big assumption to make!
In C++ you can overload the * and -> operators for user-defined classes. Would this allow you to achieve what you want?
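As a rough sketch of that operator-overloading idea, the class below makes `*statusRegister` call a function on every dereference. The class name FunctionBackedPtr and the g_reads counter are made up for illustration. It only covers reads; writes would need a proxy object returned from operator*, and none of this helps recompiled code that performs real memory loads.

```cpp
#include <cassert>
#include <cstdint>

// Pointer-like object: dereferencing it calls a function instead of
// reading memory.
class FunctionBackedPtr {
public:
    using ReadFn = std::uint32_t (*)();

    explicit FunctionBackedPtr(ReadFn read) : read_(read) {}

    // `*ptr` invokes the handler and yields its result.
    std::uint32_t operator*() const { return read_(); }

private:
    ReadFn read_;
};

// Counts how many times the "register" was read, to show that every
// dereference really runs the function.
static int g_reads = 0;

std::uint32_t getStatusRegisterValue() {
    ++g_reads;
    return 0xCAFEBABE;
}
```

With `FunctionBackedPtr statusRegister(getStatusRegisterValue);`, the expression `*statusRegister == 0xCAFEBABE` holds and g_reads goes up by one per dereference.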
You can't trigger functions to be executed on memory access unless you use memory breakpoints (which is really just for debugging, if available on the system).
This is also independent from the programming language, your question is rather aimed at modern computer platforms in general.
In my application I have quite a few void pointers (for historical reasons: the application was originally written in pure C). In one of my modules I know that the void pointers point to instances of classes that could inherit from a known base class, but I cannot be 100% sure of it. Therefore, doing a dynamic_cast on the void pointer might cause problems. Possibly, the void pointer even points to a plain struct (so no vptr in the struct).
I would like to investigate the first 4 bytes of the memory the void-pointer is pointing to, to see if this is the address of the valid vtable. I know this is platform, maybe even compiler-version-specific, but it could help me in moving the application forward, and getting rid of all the void-pointers over a limited time period (let's say 3 years).
Is there a way to get a list of all vtables in the application, or a way to check whether a pointer points to a valid vtable, and whether that instance pointing to the vtable inherits from a known base class?
"I would like to investigate the first 4 bytes of the memory the void-pointer is pointing to, to see if this is the address of the valid vtable."
You can do that, but you have no guarantees whatsoever it will work. You don't even know if the void* will point to the vtable pointer. Last time I looked into this (5+ years ago) I believe some compiler stored the vtable pointer before the address pointed to by the instance pointer.
"I know this is platform, maybe even compiler-version-specific,"
It may also be compiler-option specific, depending on what optimizations you use and so on.
"but it could help me in moving the application forward, and getting rid of all the void-pointers over a limited time period (let's say 3 years)."
Is this the only option you can see for moving the application forward? Have you considered others?
"Is there a way to get a list of all vtables in the application,"
No :(
"or a way to check whether a pointer points to a valid vtable,"
No standard way. What you can do is open some class pointers in your favorite debugger (or cast the memory to bytes and log it to a file), compare them, and see if it makes sense. Even so, you have no guarantees that any of your data (or other pointers in the application) will not look similar enough (when cast as bytes) to confuse whatever code you write.
"and whether that instance pointing to the vtable inherits from a known base class?"
No again.
Here are some questions (you may have considered them already). Answers to these may give you more options, or may give us other ideas to propose:
How large is the code base? Is it feasible to introduce global changes, or is the functionality too spread around for that?
Do you treat all pointers uniformly (that is: are there common points in your source code where you could plug in and add your own metadata)?
What can you change in your source code? (If you have access to your memory allocation subroutines, or could plug in your own, you may be able to add your own metadata.)
If different data types are cast to void* in various parts of your code, how do you decide later what is in those pointers? Can you use the code that discriminates the void* to decide if they are classes or not?
Does your code-base allow for refactoring methodologies? (refactoring in small iterations, by plugging in alternate implementations for parts of your code, then removing the initial implementation and testing everything)
Edit (proposed solution):
Do the following steps:
define a metadata (base) class
replace your memory allocation routines with custom ones which just refer to the standard / old routines (and make sure your code still works with the custom routines).
on each allocation, allocate the requested size + sizeof(Metadata*) (and make sure your code still works).
replace the first sizeof(Metadata*) bytes of your allocation with a standard byte sequence that you can easily test for (I'm partial to 0xDEADBEEF :D). Then, return [allocated address] + sizeof(Metadata*) to the application. On deallocation, take the received pointer, decrement it by sizeof(Metadata*), then call the system / previous routine to perform the deallocation. Now, you have an extra buffer allocated in your code, specifically for metadata on each allocation.
In the cases you're interested in having metadata for, create/obtain a metadata class pointer, then set it in the 0xDEADBEEF zone. When you need to check metadata, reinterpret_cast<Metadata**>([your void* here]), decrement it by one so that it points at the metadata slot, then check if the pointer value stored there is 0xDEADBEEF (no metadata) or something else.
Note that this code should only be there for refactoring - for production code it is slow, error prone and generally other bad things that you do not want your production code to be. I would make all this code dependent on some REFACTORING_SUPPORT_ENABLED macro that would never allow your Metadata class to see the light of a production release (except for testing builds maybe).
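The steps above might look roughly like this. It is only a sketch: the names Metadata, Header, tracked_alloc, tracked_free, set_metadata and get_metadata are made up, and it departs from the description in one respect: the header is padded to the size of std::max_align_t rather than being exactly sizeof(Metadata*), so that the pointer returned to the application stays suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Whatever you want to know about an allocation (hypothetical content).
struct Metadata {
    const char* type_name;
};

// Header stored in front of every block handed to the application.
struct alignas(std::max_align_t) Header {
    std::uintptr_t slot;  // kNoMetadata, or a Metadata* stored as an integer
};

const std::uintptr_t kNoMetadata = 0xDEADBEEF;

// Allocates `size` bytes plus a header marked "no metadata".
void* tracked_alloc(std::size_t size) {
    void* raw = std::malloc(sizeof(Header) + size);
    if (!raw) return nullptr;
    static_cast<Header*>(raw)->slot = kNoMetadata;
    return static_cast<Header*>(raw) + 1;  // the application sees this address
}

// Steps back to the header and releases the whole block.
void tracked_free(void* p) {
    if (p) std::free(static_cast<Header*>(p) - 1);
}

// Attaches metadata to a block obtained from tracked_alloc.
void set_metadata(void* p, const Metadata* m) {
    (static_cast<Header*>(p) - 1)->slot = reinterpret_cast<std::uintptr_t>(m);
}

// Returns the attached metadata, or nullptr if none was set.
const Metadata* get_metadata(const void* p) {
    const std::uintptr_t slot = (static_cast<const Header*>(p) - 1)->slot;
    return slot == kNoMetadata ? nullptr
                               : reinterpret_cast<const Metadata*>(slot);
}
```

get_metadata is only meaningful for pointers that came from tracked_alloc. Calling it on any other void* reads memory in front of an unrelated object, which is exactly why this belongs behind a refactoring-only macro.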
I would say it is not possible without the relevant declarations (header files).
If you want to replace those void pointers with the correct interface types, here is how I would automate it:
Go through your codebase to get a list of all classes that have virtual functions. You could do this quickly by writing a script, for example in Perl.
Write a function which takes a void* pointer as input, iterates over those classes trying to dynamic_cast to each one, and logs information on success, such as the interface type and code line. (Note that dynamic_cast cannot be applied to a void* directly; the pointer first has to be cast to some polymorphic type, which is only safe if it really points to one.)
Call this function anywhere you use a void* pointer. You could wrap it in a macro so you can get file and line information easily.
Run your full automated tests (if you have them) and analyse the output.
The easier way would be to overload operator new for your particular base class. That way, if you know your void* pointers are to heap objects, then you can also with 100% certainty determine whether they're pointing to your object.
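A sketch of that idea follows, under the assumption that the overloaded operator new also records each address it hands out, which is what makes the later check possible. The names Base, Derived, is_tracked and live are placeholders, and the registry is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <set>

class Base {
public:
    virtual ~Base() = default;

    // Every heap allocation of Base or a derived class passes through here.
    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        live().insert(p);
        return p;
    }

    static void operator delete(void* p) {
        live().erase(p);
        ::operator delete(p);
    }

    // True if `p` is the start of a live object allocated via Base::operator new.
    static bool is_tracked(const void* p) {
        return live().count(const_cast<void*>(p)) != 0;
    }

private:
    // Addresses currently handed out by Base::operator new.
    static std::set<void*>& live() {
        static std::set<void*> addresses;
        return addresses;
    }
};

class Derived : public Base {
public:
    int value = 7;
};
```

Given some `void* v`, a call to Base::is_tracked(v) tells you whether a static_cast<Base*>(v) is reasonable. The check compares against the start of the allocation, so with multiple inheritance a void* that was made from a non-primary base subobject may not match.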
I have a large inherited C/C++ project. Are there any good tools or techniques to produce a report on the "sizeof" of all the datatypes, and a breakdown of the stack footprints of each function in such a project?
I'm curious to know why you want to do this, but that's merely a curiosity.
Determining the sizeof for every class used should be simple, unless they've been templated, in which case you'd have to check every instantiation, also.
Likewise, determining the per call sizeof on a function is simple: it's a sizeof on each passed parameter plus some function overhead.
Determining the full memory usage of the whole program, if it's not all statically defined, can't be done without a runtime profiler.
Writing a shell script that would collect all the class names into a file would be pretty simple. That file could be constructed as a .cpp file that was a series of calls to sizeof on each class. If the file also #included each header file, it could be compiled and run to get an output of the memory footprint of just the classes.
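The generated file might look something like the sketch below. Point and Packet are stand-ins for types that would really come from the project's headers, and SIZE_ENTRY and size_report are names invented here; a script would emit one SIZE_ENTRY line per class it finds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for types that would come from the project's own headers.
struct Point { int x; int y; };
struct Packet { char tag; double payload; };

// Pairs the spelled-out type name with its size in bytes.
#define SIZE_ENTRY(T) std::make_pair(std::string(#T), sizeof(T))

// One entry per type; a script would generate this list.
std::vector<std::pair<std::string, std::size_t>> size_report() {
    return {
        SIZE_ENTRY(Point),
        SIZE_ENTRY(Packet),
    };
}
```

A small main() could loop over size_report() and print each name and size. Type names that contain commas, such as templates with several arguments, would need a typedef first because of how macro arguments are split.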
Likewise, culling all of the function definitions to see when they're not using reference or pointer arguments (ie copying the entire class instance onto the stack) should be pretty straight-forward.
All this goes to say that I know of no existing tool, but writing one shouldn't be difficult.
I'm not aware of any tools, but if you're working under MSVC you can use the DIA SDK to extract size information from .PDB files. Sadly, this won't work for stack footprints IIRC.
I'm not sure if the concept of the stack footprint actually exists with modern compilers. That is to say, I think that determining the amount of stack space used depends on the branches taken, which in turn depends on input parameters, and in general requires solving the halting problem.
I am looking for the same information about stack footprint for functions, and I don't believe what warren said is true. Yes, part of what impacts the stack in a function is the parameters, but I've also found that every local variable in a function, regardless of the scoping of said variable, is used to determine the amount of stack space to reserve for the function.
In the particularly poor code example I am working with, there are >200 local class instances, each guarded by if (blah-blah) clauses, but the stack space reserved is still affected by these guarded local variables.
I know that what I need is to be able to read the function prologue for each method to determine the amount of space being reserved for the function. Now, how would I do that?