Calling code stored in the heap from VC++ - C++

Imagine I am doing something like this:
void *p = malloc (1000);
*((char*)p) = some_opcode;
*((char*)p+1) = another_opcode; // for the sake of the example: the opcodes are ok
....
etc...
How can I define a function pointer to call p as if it were a function? (I'm using VC++ 2008 Express.)
Thanks

There wasn't enough space in a comment. Joe_Muc is correct: you should not stuff code into memory obtained by malloc or new. You will run into problems if you change the page properties of pages that Windows allocates.
This isn't a problem, because using VirtualAlloc() and the related Win32 APIs is very easy: call VirtualAlloc() and set flProtect to PAGE_EXECUTE_READWRITE.
Note that you should probably do three allocations: one guard page, the pages you need for your code, then another guard page. This will give you a little protection from bad code.
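A hedged sketch of what that layout could look like (assuming a 4 KiB page size and omitting error checks; this is not the poster's exact code):
// Reserve one region, commit only the middle pages as executable, and leave
// an uncommitted no-access page on each side as a guard.
SIZE_T page = 4096;
SIZE_T codeBytes = 2 * page;
BYTE *region = (BYTE*)VirtualAlloc(NULL, codeBytes + 2 * page, MEM_RESERVE, PAGE_NOACCESS);
BYTE *code   = (BYTE*)VirtualAlloc(region + page, codeBytes, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
// A stray write or jump just before or just after the code now faults immediately.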
Also wrap calls to your generated code with structured exception handling.
Next, the Windows x86 ABI (calling conventions) is not well documented (I know, I've looked). There is some info here, here, and here. The best way to see how things work is to look at the code generated by the compiler. This is easy to do with the /FA switches (there are four of them).
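For example (yourfile.cpp is a placeholder, and this assumes cl.exe from VC++ is on the path):
cl /c /FAcs yourfile.cpp
writes yourfile.cod, an assembly listing annotated with the source lines and machine code bytes; /FA, /FAc, and /FAs give the other combinations.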
You can find the 64-bit calling conventions here.
Also, you can still obtain Microsoft's Macro Assembler, MASM, here. I recommend writing your machine code in MASM and looking at its output, then having your machine code generator do similar things.
Intel's and AMD's processor manuals are good references - get them if you don't have them.

Actually, malloc probably won't cut it. On Windows you probably need to call something like [VirtualAlloc](http://msdn.microsoft.com/en-us/library/aa366887(VS.85).aspx) in order to get an executable page of memory.
Starting small:
#include <windows.h>

int main(void)
{
    // Ask for one executable, writable page and plant a single "ret" in it.
    char *p = (char*)VirtualAlloc(NULL, 4096, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    p[0] = (char)0xC3; // ret
    typedef void (*functype)();
    functype func = (functype)p;
    (*func)();
    return 0;
}
The next step for playing nice with your code is to preserve the EBP register. This is left as an exercise. :-)
After writing this, I ran it with malloc and it also worked. That may be because I'm running an admin account on Windows 2000 Server. Other versions of Windows may actually need the VirtualAlloc call. Who knows.
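For what it's worth, here is a hedged sketch (not from the original answer) of the standard 32-bit prologue/epilogue bytes that would preserve EBP around the generated body, with n standing for whatever the next free offset is after the body:
p[0] = (char)0x55;                     // push ebp
p[1] = (char)0x8B;                     // mov ebp, esp ...
p[2] = (char)0xEC;                     // ... (two-byte instruction)
// ... emit the generated body here, then at the next free offset n:
p[n]     = (char)0x5D;                 // pop ebp
p[n + 1] = (char)0xC3;                 // ret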

If you have the right opcodes in place, calling can be as simple as casting to a function pointer and calling it.
typedef void (*voidFunc)();
char *p = (char*)malloc(1000);
p[0] = some_opcode;
p[1] = another_opcode; // for the sake of the example: the opcodes are ok
p[n] = // return opcode...
((voidFunc)p)();
Note though that unless you mark the page as executable, your processor may not let you execute code generated on the heap.
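If you do start from malloc, one hedged option (Windows-specific, assuming windows.h is included, and note it changes protection for the whole 4 KiB pages containing p, which may also hold unrelated heap data) is VirtualProtect:
DWORD oldProtect;
VirtualProtect(p, 1000, PAGE_EXECUTE_READWRITE, &oldProtect);   // make those pages executable
((voidFunc)p)();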

I'm also currently looking into executing generated code, and while the answers here didn't give me precisely what I needed, you guys sent me on the right track.
If you need to mark a page as executable on POSIX systems (Linux, BSD etc.), check out the mmap(2) function.
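A minimal, untested sketch of that route (error checking omitted; hardened systems that enforce W^X may refuse a mapping that is both writable and executable):
#include <string.h>
#include <sys/mman.h>

void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
unsigned char code[] = { 0xC3 };   /* x86 "ret" */
memcpy(p, code, sizeof code);
((void (*)(void))p)();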

Related

Determine allocation call site over time (line of code) [duplicate]

I am looking for a way to track memory allocations in a C++ program. I am not interested in memory leaks, which seem to be what most tools are trying to find, but rather in creating a memory usage profile for the application. Ideal output would be either a big list of function names plus the maximum number of allocated bytes over time or, better yet, a graphical representation of the heap over time: horizontal axis is time, vertical axis is heap space. Every function would get its own color and draw lines according to allocated heap bytes. Bonus points for identifying allocated object types as well.
The idea is to find memory bottlenecks/to visualize which functions/threads consume the most memory and should be targeted for further optimization.
I have briefly looked at Purify, BoundsChecker and AQTime but they don't seem to be what I'm after. Valgrind looks suitable, however, I'm on Windows. Memtrack looks promising, but requires significant changes to the source code.
My Google skills must have failed me, because this doesn't seem like such an uncommon request. All the needed information to create a tool like that should be readily available from the program's debug symbols plus runtime API calls - no?
Use Valgrind and its tool Massif. Example output (a part of it):
99.48% (20,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->49.74% (10,000B) 0x804841A: main (example.c:20)
|
->39.79% (8,000B) 0x80483C2: g (example.c:5)
| ->19.90% (4,000B) 0x80483E2: f (example.c:11)
| | ->19.90% (4,000B) 0x8048431: main (example.c:23)
| |
| ->19.90% (4,000B) 0x8048436: main (example.c:25)
|
->09.95% (2,000B) 0x80483DA: f (example.c:10)
->09.95% (2,000B) 0x8048431: main (example.c:23)
So, you will get detailed information:
WHO allocated the memory (functions: g(), f(), and main() in the above example); you also get the complete backtrace leading to the allocating function,
WHICH data structure the memory went to (no data structures in the above example),
WHEN it happened,
what PERCENTAGE of all allocated memory it is (g: 39.79%, f: 9.95%, main: 49.74%).
Here is the Massif manual.
You can track heap allocation as well as stack allocation (turned off by default).
PS. I just read that you're on Windows. I will leave the answer though, because it gives a picture of what you can get from a possible tool.
Microsoft has well-documented memory-tracking functions. However, for some reason they are not really well known in the developer community. These are the CRT debug functions. A good starting point is the CRT Debug Heap functions.
Check the following links for more details
Heap state reporting functions
Tracking heap allocation requests. This is probably the functionality that you are looking for.
For a generic C++ memory tracker you will need to overload the following:
global operator new
global operator new []
global operator delete
global operator delete []
any class allocators
any in-place allocators
The tricky bit is getting useful information: the overloaded operators only have size information for allocations and memory pointers for deletes. One answer is to use macros. I know. Nasty. An example - place this in a header which is included from all source files:
#undef new
void *operator new (size_t size, const char *file, int line, const char *function);
// other operators
#define new new (__FILE__, __LINE__, __FUNCTION__)
and create a source file with:
void *operator new (size_t size, const char *file, int line, const char *function)
{
    // add tracking code here...
    return malloc (size);
}
The above only works if you don't have any operator new defined at class scope. If you do have some at class scope, do:
#define NEW new (__FILE__, __LINE__, __FUNCTION__)
and replace 'new type' with 'NEW type', but that potentially requires changing a lot of code.
As it's a macro, removing the memory tracker is quite straightforward; the header becomes:
#if defined ENABLED_MEMORY_TRACKER
#undef new
void *operator new (size_t size, const char *file, int line, const char *function);
// other operators
#define NEW new (__FILE__, __LINE__, __FUNCTION__)
#else
#define NEW new
#endif
and the implementation file:
#if defined ENABLED_MEMORY_TRACKER
void *operator new (size_t size, const char *file, int line, const char *function)
{
// add tracking code here...
return malloc (size);
}
#endif
Update to the answer of @Skizz:
Since C++20, we can use std::source_location instead of macros like __FILE__ and __LINE__.
(As this is a major simplification, I believe that it deserves a separate answer.)
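A minimal sketch of what that could look like (hedged: requires C++20, still uses a NEW-style macro so the call site is captured automatically, and it just prints and forwards to the normal allocator rather than doing real tracking):
#include <cstdio>
#include <new>
#include <source_location>

// Placement operator new that records the call site as a single typed object
// instead of the __FILE__/__LINE__/__FUNCTION__ macro triple.
void *operator new(std::size_t size, const std::source_location &loc)
{
    std::printf("alloc %zu bytes at %s:%u (%s)\n",
                size, loc.file_name(),
                static_cast<unsigned>(loc.line()), loc.function_name());
    return ::operator new(size);
}

// Matching operator delete, used only if a constructor throws.
void operator delete(void *p, const std::source_location &) noexcept
{
    ::operator delete(p);
}

#define NEW new (std::source_location::current())

int main()
{
    int *p = NEW int(42);   // prints the allocating file, line and function
    delete p;
}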
On Xcode, you can use Instruments to track allocations, VM usage, and several other parameters. Mostly popular among iOS developers, but worth a try.
On Mac OS X, you can use the code profiling tool Shark to do this, IIRC.
"A graphical representation of the heap over time" - close to what you are looking for is implemented in Intel(R) Single Event API, details can be found in this article (its rather big to put it here).
It shows you timeline of per-block-size allocations and allows to add additional mark up to your code to understand the whole picture better.
The Visual Studio IDE has built-in heap profiling support (since 2015), which is probably the easiest to start with. It has graphical views of heap usage over time, and can track allocations by function/method.
Measure memory usage in Visual Studio
The CRT also has debug and profile support, which is more detailed and more low-level. You could track the data and plot the results using some other tool:
CRT Debug Heap Details
In particular, look at _CrtMemCheckpoint and related functions.
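A minimal sketch of the checkpoint/difference pattern (MSVC debug builds only, since the CRT debug heap needs _DEBUG; what you measure between the checkpoints is up to you):
#include <crtdbg.h>

void measure_region()
{
    _CrtMemState before, after, diff;
    _CrtMemCheckpoint(&before);

    // ... the code whose heap usage you want to measure ...

    _CrtMemCheckpoint(&after);
    if (_CrtMemDifference(&diff, &before, &after))
        _CrtMemDumpStatistics(&diff);   // block counts and byte totals go to the debug output
}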

Overloading base types with a custom allocator, and its alternatives

So, this is a bit of an open question. But let's say that I have a large application which globally overrides the various new and delete operators so that they use home-brewed jemalloc-style arenas and custom alignments.
All fine and good, but I have been running into segfault issues because other C++-based DLLs and their dependencies also use the overloaded allocators when they shouldn't (namely LLVM), bringing the little custom allocator to its knees (lack of memory and other stresses).
Testing workarounds, I have wrapped (and moved) those global operators into a class, and I made all base classes inherit from it. And well, that works for classes, but not for base types. That's the problem.
Given that C++ doesn't allow useful things like having separate allocators per namespace, or limiting the new operator per executable module, what is the best way of emulating this in base data types, where I can't directly subclass an int?
The obvious way is wrapping them in a custom template, but the problem is performance. Do I have to emulate all the array and indexing operations under a second layer just so that I can malloc from a different place, without having to change the rest of the functional code? Is there a better way?
P.S.: I have also been thinking about using special global new/delete operators with extra parameters, while leaving the standard ones alone, thus ensuring that I am (well, my executable module is) the only one calling those global functions. It should be a simple search-and-replace.
Well, quick update. What I did in the end to 'solve' this conundrum was to manually detect whether the code that called the overridden global allocators comes from the main executable module, and to conditionally redirect all the external new/delete calls to their corresponding malloc/free while still using the custom arena allocator for our own internal code.
How? After doing some R&D I found that this could be done by using the _ReturnAddress() built-in on MSVC and __builtin_extract_return_addr(__builtin_return_address(0)) on GCC/Clang; and I can say that it seems to work fine so far in production software.
Now, when some C++ code from our address space wants some memory we can see where it comes from.
But how do we find out if that address is part of some other module in our process space or of our own? We need to find both the base and end addresses of the main program, cache them at startup as globals, and check that the return address is within those bounds - a sketch of the resulting allocator follows below.
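A hedged sketch of the redirecting operator new (arena_alloc is a placeholder stand-in for our real arena, and base_addr/base_end are the globals filled in by the per-platform __initialize_base_address() snippets below):
#include <cstdint>
#include <cstdlib>
#include <new>

#if defined(_MSC_VER)
  #include <intrin.h>
  #define CALLER_ADDRESS() _ReturnAddress()
#else
  #define CALLER_ADDRESS() __builtin_extract_return_addr(__builtin_return_address(0))
#endif

static uintptr_t base_addr = 0;   // filled in at startup by __initialize_base_address()
static uintptr_t base_end  = 0;

// Stand-in for the real arena so the sketch is self-contained.
static void *arena_alloc(std::size_t size) { return std::malloc(size); }

void *operator new(std::size_t size)
{
    uintptr_t caller = (uintptr_t)CALLER_ADDRESS();
    // Allocations made from our own executable go to the arena; anything
    // coming from another module (LLVM, Mesa, ...) falls back to plain malloc.
    if (caller >= base_addr && caller < base_end)
        return arena_alloc(size);
    return std::malloc(size);
}
// Note: the matching operator delete cannot use the caller's address (the freeing
// module may differ from the allocating one); it has to ask the arena whether it
// owns the pointer and fall back to free() otherwise.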
All for extremely little overhead. But our second problem is that retrieving the base address is different on every platform. After some research I found that things were more straightforward than expected:
On Windows/Win32 we can simply do this:
#include <windows.h>
#include <psapi.h>   // GetModuleInformation; link with psapi.lib

inline void __initialize_base_address()
{
    MODULEINFO minfo;
    GetModuleInformation(GetCurrentProcess(), GetModuleHandle(NULL), &minfo, sizeof(minfo));
    base_addr = (uintptr_t) minfo.lpBaseOfDll;
    base_end  = (uintptr_t) minfo.lpBaseOfDll + minfo.SizeOfImage;
}
On Linux there are a thousand ways of doing this, including linker globals and some debugger-style (verbose and unreliable) ways of walking the process module table. I was looking at the linker map output and noticed that the _init and _fini functions always seem to wrap the rest of the .text section symbols. Sometimes it's hard to arrive at the simplest solution that works everywhere:
#include <dlfcn.h>   // dlopen/dlsym; link with -ldl

inline void __initialize_base_address()
{
    void *handle = dlopen(0, RTLD_NOW);
    base_addr = (uintptr_t) dlsym(handle, "_init");
    base_end  = (uintptr_t) dlsym(handle, "_fini");
    dlclose(handle);
}
On macOS things are even less documented, and I had to cobble together my own thing using the Darwin kernel open-source code and tracking down some obscure low-level tools as references. Keep in mind that _NSGetMachExecuteHeader() is just a wrapper for the internal _mh_execute_header linker global. If you need to do anything about parsing the Mach-O format and its structures, then getsect.h is the way to go:
#include <mach-o/getsect.h>
#include <mach-o/ldsyms.h>
#include <crt_externs.h>

inline void __initialize_base_address()
{
    size_t size;
    void *ptr = getsectiondata(&_mh_execute_header, SEG_TEXT, SECT_TEXT, &size);
    base_addr = (uintptr_t) _NSGetMachExecuteHeader();
    base_end  = (uintptr_t) ptr + size;
}
Another thing to keep in mind is that this some-other-C++-module-is-using-our-internal-allocator-that-globally-overrides-new-and-causes-weird-bugs issue seems to be a problem on Linux and maybe macOS. I didn't have this issue on Windows, probably because no conflicting DLLs were loaded in the process, it being mostly C API-based - or maybe each module gets its own C++ runtime on that platform.
The main issue I had was caused by Mesa3D, which uses LLVM (pure C++ in and out) for many of their GLSL shader compilers and liked to gobble up big chunks of my small custom-tailored memory arena uninvited.
Rewriting a legacy program that is structurally dependent on these allocators was out of the question due to its sheer size and complexity, so this turned out to be the best way of making things work as expected.
It's only a few lines of optional, sneaky, extra per-platform code.

Calling a kernel32.dll function without including windows.h

If kernel32.dll is guaranteed to be loaded into a process's virtual memory, why can't I call a function such as Sleep without including windows.h?
The below is an excerpt quoted from vividmachine.com:
5. So, what about windows? How do I find the addresses of my needed DLL functions? Don't these addresses change with every service pack upgrade?
There are multitudes of ways to find the addresses of the functions that you need to use in your shellcode. There are two methods for addressing functions; you can find the desired function at runtime or use hard coded addresses. This tutorial will mostly discuss the hard coded method. The only DLL that is guaranteed to be mapped into the shellcode's address space is kernel32.dll. This DLL will hold LoadLibrary and GetProcAddress, the two functions needed to obtain any functions address that can be mapped into the exploits process space. There is a problem with this method though, the address offsets will change with every new release of Windows (service packs, patches etc.). So, if you use this method your shellcode will ONLY work for a specific version of Windows. Further dynamic addressing will be referenced at the end of the paper in the Further Reading section.
The article you quoted focuses on getting the address of the function. You still need the function's prototype (which doesn't change across versions) in order to generate the code for calling it - with appropriate handling of input and output arguments, register values, and the stack.
The windows.h header provides the function prototype that you wish to call to the C/C++ compiler, so that the code for calling the function (the passing of arguments via register or stack, and getting the function's return value) can be generated.
After knowing the function prototype by reading windows.h, a skillful assembly programmer may also be able to write the assembly code to call the Sleep function. Together with the function's address, these are all you need to make the function call.
With some black magic you can ;). There have been many custom implementations of GetProcAddress which would allow you to get away with not needing windows.h. This, however, isn't the be-all and end-all, and could probably end up with problems due to internal Windows changes. Another method is using ToolHelp to enumerate the modules in the process to get kernel32.dll's base, then spelunking its PE for the export address table (EAT) and grabbing the address of GetProcAddress. From there you just need function pointer prototypes to call the addresses correctly (and any structure definitions needed), which isn't too hard, merely labour-intensive (if you have many functions). In fact, under Windows XP this is required to disable DEP due to service pack differencing. Of course you need windows.h as a reference to get this; you just don't need to include it.
You'd still need to declare the function in order to call it, and you'd need to link with kernel32.lib. The header file isn't anything magic, it's basically just a lot of function declarations.
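As a hedged illustration of that point: if you copy the exact prototype (and calling convention) out of the documentation yourself and link against kernel32.lib, this compiles without windows.h:
// Sleep is really "VOID WINAPI Sleep(DWORD)"; WINAPI is __stdcall on 32-bit
// Windows and DWORD is an unsigned 32-bit integer.
extern "C" __declspec(dllimport) void __stdcall Sleep(unsigned long dwMilliseconds);

int main()
{
    Sleep(1000);   // resolved through the kernel32 import library as usual
    return 0;
}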
I can do it with one line of assembly and then some helper functions that walk the PEB by hard-coding the correct offsets to the different members.
You'll have to start here:
static void*
JMIM_ASM_GetBaseAddr_PEB_x64()
{
    void* base_address = 0;
    unsigned long long var_out = 0;
    // On x64 Windows, gs:0x60 in the TEB holds the pointer to the PEB.
    __asm__(
        " movq %%gs:0x60, %[sym_out] ; \n\t"
        :[sym_out] "=r" (var_out) //:OUTPUTS
    );
    //: printf("[var_out]:%d\n", (int)var_out);
    base_address = (void*)var_out;
    return( base_address );
}
Then use windbg on an executable file to inspect the data structures on your machine.
A lot of the values you'll be needing are hard to find and only really documented by random hackers. You'll find yourself on a lot of malware writing sites digging for answers.
dt nt!_PEB -r @$peb
was pretty useful in windbg for getting information on the PEB structure.
There is a full working implementation of this in my game engine.
Just look in: /DEP/PEB2020 for the code.
https://github.com/KanjiCoder/AAC2020
I don't include <windows.h> in my game engine, yet I use "GetProcAddress" and "LoadLibraryA". It might be inadvisable to do this, but my thought was: the more moving parts, the more that can go wrong. So I figured I'd take "#define WIN32_LEAN_AND_MEAN" to its absurd conclusion and not include <windows.h> at all.

Does this have anything to do with endian-ness?

For this code:
#include<stdio.h>
void hello() { printf("hello\n"); }
void bye() { printf("bye\n"); }
int main() {
printf("%p\n", hello);
printf("%p\n", bye);
return 0;
}
output on my machine:
0x80483f4
0x8048408
[second address is bigger in value]
on Codepad
0x8048541
0x8048511
[second address is smaller in value]
Does this have anything to do with endian-ness of the machines? If not,
Why the difference in the ordering of the addresses?
Also, Why the difference in the difference?
0x8048541 - 0x8048511 = 0x30
0x8048408 - 0x80483f4 = 0x14
Btw, I just checked. This code (taken from here) says that both the machines are Little-Endian
#include<stdio.h>
int main() {
int num = 1;
if(*(char *)&num == 1)
printf("Little-Endian\n");
else
printf("Big-Endian\n");
return 0;
}
No, this has nothing to do with endianness. It has everything to do with compilers and linkers being free to order function definitions in memory pretty much as they see fit, and different compilers choosing different memory layout strategies.
It has nothing to do with endianness, but with the C++ standard. The compiler isn't required to write functions to disk in the order you see them (think about cross-file linking and even linking other libraries - that's just not feasible); it can write them in any order it wishes.
About the difference between the actual values: one compiler might add guards around a block to detect memory overwrites (or other related stuff, usually only in debug mode). And there's nothing preventing the compiler from writing other functions between your two functions. Keep in mind that even a simple hello-world application comes with thousands of bytes of executable code.
The bottom line is: never assume anything about how things are positioned in memory. Your assumptions will almost always be wrong. And why even assume? There's nothing to be gained over writing normal, safe, structured code anyway.
The location and ordering of functions is extremely specific to platform, architecture, compiler, compiler version and even compiler flags (especially those).
You are printing function addresses. That's purely in the domain of the linker; the compiler doesn't do anything involved with creating the binary image of the program, other than generating the blobs of machine code for each function. The linker arranges those blobs in the final image. Some linkers have command-line options that affect the order; otherwise it rarely matters.
Endian-ness cannot affect the output of printf() here. It knows how to interpret the bytes correctly if the pointer value was generated on the same machine.

Does an arbitrary instruction pointer reside in a specific function?

I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.
This won't work. You're presuming functions are contiguous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA APIs to figure this out - for instance, SymFromAddr. There may be some ambiguity here, as a single address can map to multiple functions.
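A hedged sketch of the SymFromAddr route (needs dbghelp.lib and matching PDBs; error handling is omitted, and SymInitialize should really be called once per process rather than per lookup):
#include <windows.h>
#include <dbghelp.h>   // link with dbghelp.lib
#include <string.h>

BOOL AddressIsInFunction(DWORD64 ip, const char *name)
{
    // Buffer layout recommended for SYMBOL_INFO with a trailing name field.
    ULONG64 buffer[(sizeof(SYMBOL_INFO) + MAX_SYM_NAME * sizeof(TCHAR) + sizeof(ULONG64) - 1) / sizeof(ULONG64)];
    PSYMBOL_INFO sym = (PSYMBOL_INFO)buffer;
    sym->SizeOfStruct = sizeof(SYMBOL_INFO);
    sym->MaxNameLen = MAX_SYM_NAME;

    SymInitialize(GetCurrentProcess(), NULL, TRUE);   // load symbols for loaded modules
    DWORD64 displacement = 0;
    if (!SymFromAddr(GetCurrentProcess(), ip, &displacement, sym))
        return FALSE;
    return strcmp(sym->Name, name) == 0;   // ip falls inside a symbol with this name
}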
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.
Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support - ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff; most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and an empty wrapper function with those labels.
The most simple solution is maintaining a state variable:
volatile int FOO_is_running = 0;

int Foo( int par ){
    FOO_is_running = 1;
    /* do the work */
    FOO_is_running = 0;
    return 0;
}
Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.