So recently I have been learning about low-level programming languages (such as assembly, which from my understanding is just symbolic machine code) and have come across shellcoding (e.g. "\x4D..." etc.). I found out that you can put shellcode into a C/C++ application and then execute it. My question is: is it possible to generate shellcode from an existing .exe application and then use this generated shellcode in a C/C++ application? Have I misunderstood the possibilities of shellcoding? Many thanks - a person with very limited knowledge of low-level programming
is it possible to generate Shellcode from an existing exe application and then use this generated Shellcode in a C/C++ application
Answer: No. Shellcode has to be position-independent, while an executable PE file has a large amount of headers etc.; you can't execute it directly without doing some work first.
Shellcode is a very big topic.
First of all, you need to know that the addresses of functions from external libraries such as kernel32, user32, etc. are stored in the Import Address Table (IAT), which is filled in by the Windows loader at load time. All memory accesses go through addresses computed at compile/link time, so in shellcode you need to find the addresses yourself.
To call functions from shellcode you have to have your own loader of function addresses. This loader must locate the kernel32.dll library, find GetProcAddress, and then resolve everything the code needs by itself.
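For orientation, here is a minimal sketch of what that hand-rolled loader ultimately has to achieve. This is not shellcode; it deliberately uses the normal, documented APIs (GetModuleHandleA/GetProcAddress) to show the goal. Real shellcode reaches the same result by walking the PEB and the export tables by hand, exactly as the byte string further down does.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* What the shellcode loader must achieve, expressed with documented APIs:
       1) find kernel32.dll, 2) resolve an export, 3) call through it. */
    HMODULE k32 = GetModuleHandleA("kernel32.dll");
    if (!k32) return 1;

    /* In real shellcode this lookup is done by parsing the export table manually. */
    FARPROC pLoadLibraryA = GetProcAddress(k32, "LoadLibraryA");
    printf("LoadLibraryA resolved at %p\n", (void *)pLoadLibraryA);
    return 0;
}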
You don't know at what address your shellcode will be loaded; you can find it out with code like this, called a "delta offset":
call delta
delta:
pop ebp
sub ebp,offset delta
Now ebp holds the offset to the real addresses, so to get the address of a variable or function you need to add that offset, for example:
lea eax, [variable]
add eax, ebp; adding a delta-offset
mov ecx, dword ptr DS:[eax]
To assemble code for later use you can use something like FASM; after assembling, open the output in the WinHex editor and use copy -> copy all -> C source.
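If you prefer to skip the hex-editor step, a tiny helper program can do the same conversion. This is just a sketch of the idea; the tool name and file names are made up:

#include <stdio.h>

/* bin2c: print a raw binary file as a C string of "\xNN" escapes.
   Hypothetical usage: bin2c payload.bin > payload.h */
int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: bin2c <file>\n"); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    int c, n = 0;
    printf("char shellcode[] =\n\"");
    while ((c = fgetc(f)) != EOF) {
        printf("\\x%02x", c);
        if (++n % 16 == 0) printf("\"\n\"");   /* wrap every 16 bytes */
    }
    printf("\";\n");
    fclose(f);
    return 0;
}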
And you will get something like "\x00\x28" etc. To call it you need to give your shellcode array execute permissions and then transfer control to it, e.g. with a jmp/call or simply through a function pointer.
Here is an example that shows a "Hello, World" MessageBox on a Windows system:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <windows.h>

int main(void)
{
    char *shellcode = "\x33\xc9\x64\x8b\x49\x30\x8b\x49\x0c\x8b"
                      "\x49\x1c\x8b\x59\x08\x8b\x41\x20\x8b\x09"
                      "\x80\x78\x0c\x33\x75\xf2\x8b\xeb\x03\x6d"
                      "\x3c\x8b\x6d\x78\x03\xeb\x8b\x45\x20\x03"
                      "\xc3\x33\xd2\x8b\x34\x90\x03\xf3\x42\x81"
                      "\x3e\x47\x65\x74\x50\x75\xf2\x81\x7e\x04"
                      "\x72\x6f\x63\x41\x75\xe9\x8b\x75\x24\x03"
                      "\xf3\x66\x8b\x14\x56\x8b\x75\x1c\x03\xf3"
                      "\x8b\x74\x96\xfc\x03\xf3\x33\xff\x57\x68"
                      "\x61\x72\x79\x41\x68\x4c\x69\x62\x72\x68"
                      "\x4c\x6f\x61\x64\x54\x53\xff\xd6\x33\xc9"
                      "\x57\x66\xb9\x33\x32\x51\x68\x75\x73\x65"
                      "\x72\x54\xff\xd0\x57\x68\x6f\x78\x41\x01"
                      "\xfe\x4c\x24\x03\x68\x61\x67\x65\x42\x68"
                      "\x4d\x65\x73\x73\x54\x50\xff\xd6\x57\x68"
                      "\x72\x6c\x64\x21\x68\x6f\x20\x57\x6f\x68"
                      "\x48\x65\x6c\x6c\x8b\xcc\x57\x57\x51\x57"
                      "\xff\xd0\x57\x68\x65\x73\x73\x01\xfe\x4c"
                      "\x24\x03\x68\x50\x72\x6f\x63\x68\x45\x78"
                      "\x69\x74\x54\x53\xff\xd6\x57\xff\xd0";

    // VirtualProtect requires a place to store the old protection; it may not be NULL.
    DWORD old_protect;
    BOOL ret = VirtualProtect(shellcode, strlen(shellcode),
                              PAGE_EXECUTE_READWRITE, &old_protect);
    if (!ret) {
        printf("VirtualProtect\n");
        return EXIT_FAILURE;
    }
    printf("strlen(shellcode)=%d\n", strlen(shellcode));
    ((void (*)(void))shellcode)();
    return EXIT_SUCCESS;
}
You are probably looking for the RunPE technique. This algorithm can execute a PE executable inside another: you open another process, copy the sections, fill in the IAT, and resume the target process from the new entry point. It is a code-injection technique used by malware, so I will not explain how to implement it.
Shellcode is machine code that's used as the payload of an exploit (such as a buffer overflow). Depending on the exploit it's used with, it may have limitations such as a maximum length, or certain byte values (e.g. zero) not allowed. There's no one-size-fits-all answer to what shellcode can be.
In general, though: yes, it's possible in principle to embed a complete program in shellcode. It could take the form of a small wrapper (probably hand-written in assembly) that writes the program to a new .exe file and then runs it, or it could use more-sophisticated techniques to replace the current program in memory. There are probably automated tools to create this sort of shellcode, though I don't know of any specifically.
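As a very rough illustration of that first approach (the embedded byte array and file names here are assumptions, not a real tool): a stub carries the original .exe as data, writes it back out to disk, and launches it with the ordinary process-creation API.

#include <windows.h>
#include <stdio.h>

/* Hypothetical: the original .exe embedded as data (e.g. produced by a
   bin2c-style converter). Truncated here for illustration. */
static const unsigned char embedded_exe[] = { 0x4D, 0x5A /* , ... rest of the file ... */ };

int main(void)
{
    /* 1) Write the embedded image back out as a normal file. */
    FILE *f = fopen("dropped.exe", "wb");
    if (!f) return 1;
    fwrite(embedded_exe, 1, sizeof(embedded_exe), f);
    fclose(f);

    /* 2) Run it as an ordinary child process. */
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi = { 0 };
    if (CreateProcessA("dropped.exe", NULL, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
    return 0;
}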
However, the tone of your question makes me think you might be misunderstanding something important:
I found out that you can input Shellcode into a C/C++ application and then execute it
This is a bug, not a feature. Being able to inject new code into a running program, where the program isn't specifically meant to allow that, is a major security flaw. This sort of thing has been the root of a great many security breaches over the span of decades, and developers spend a great deal of effort trying to prevent it from happening.
If it's possible to inject shellcode into a program, the program is broken.
Related
I have an MCU (say an STM32) running, and I would like to 'pass' it a separately compiled binary file over UART/USB and use it like calling a function, where I can pass it data and collect its output. After it's complete, a second, different binary would be sent to be executed, and so on.
How can I do this? Does this require an OS be running? I'd like to avoid that overhead.
Thanks!
The exact call mechanism is somewhat specific to the MCU, but you are just making a function call. You can try the function-pointer thing, but that has been known to fail with Thumb on gcc (the STM32 uses the Thumb instruction set from ARM).
First off you need to decide in your overall system design if you want to use a specific address for this code, for example 0x20001000, or whether you want to have several of these resident at the same time and load them at any one of multiple possible addresses. This will determine how you link this code. Is this code standalone, with its own variables, or does it need to call functions in other code? All of this determines how you build it. The easiest, at least to first try this out, is a fixed address: build like you build your normal application, but based at a RAM address like 0x20001000. Then you load the program sent to you at that address.
In any case, the normal way to "call" a function in Thumb (say on an STM32) is the bl or blx instruction. In this situation you would normally use bx, but to make it a call you need a return address. The way ARM/Thumb works is that for bx and related instructions the lsbit determines the mode you switch to/stay in when branching: lsbit set is Thumb, lsbit clear is ARM. This is all documented in the ARM documentation, which completely covers your question BTW, not sure why you are asking...
Gcc (and I assume llvm) struggles to get this right, and then some users know enough to be dangerous and do the worst thing: ADDing one (rather than ORRing one), or even attempting to put the one there by hand. Sometimes putting the one there helps the compiler (this is if you try the function-pointer approach and hope the compiler does all the work for you, the *myfun = 0x10000 kind of thing). But it has been shown on this site that you can make subtle changes to the code, or the exact situation can change, and the compiler will get it right or wrong, and without looking at the generated code you have to help with the orr-one thing. As with most things, when you need an exact instruction, just do this in asm (not inline please, use real assembly) yourself; it makes your life 10000 times easier... and your code significantly more reliable.
So here is my trivial solution, extremely reliable, port the asm to your assembly language.
.thumb
.thumb_func
.globl HOP
HOP:
bx r0
In C it looks like this:
void HOP ( unsigned int );
Now if you loaded to address 0x20001000, then after loading you call:
HOP(0x20001000|1);
Or you can
.thumb
.thumb_func
.globl HOP
HOP:
orr r0,#1
bx r0
Then
HOP(0x20001000);
The compiler generates a bl to hop which means the return path is covered.
If you want to send say a parameter...
.thumb
.thumb_func
.globl HOP
HOP:
orr r1,#1
bx r1
void HOP ( unsigned int, unsigned int );
HOP(myparameter,0x20001000);
Easy and extremely reliable, compiler cannot mess this up.
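Putting it together, a minimal sketch of the receive-and-run path on the MCU side might look like the following. The UART receive routine and the load address are assumptions for illustration; they match the fixed-address scheme described above, and HOP is the assembly trampoline from the previous snippets.

#include <stdint.h>
#include <stddef.h>

#define LOAD_ADDR  0x20001000u          /* must match the downloaded binary's link address */

extern void HOP(unsigned int param, unsigned int addr);   /* the asm trampoline above */
extern size_t uart_receive(uint8_t *dst, size_t max);     /* hypothetical UART driver */

void receive_and_run(unsigned int param)
{
    uint8_t *dst = (uint8_t *)LOAD_ADDR;

    /* Copy the incoming image into RAM at its link address. */
    size_t len = uart_receive(dst, 0x1000);
    (void)len;

    /* Jump to it; HOP sets the Thumb bit and bx's to the address. */
    HOP(param, LOAD_ADDR);
}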
If you need to share functions and global variables between the main app and the downloaded app, then there are a few solutions, and they involve resolving addresses. If the loaded app and the main app are not linked at the same time (doing a copy-and-jump with a single link is generally painful and should be avoided, but...), then like any shared library you need a mechanism for resolving addresses. If this downloaded code has several functions and global variables, and/or your main app has several functions and global variables that the downloaded library needs, then you have to solve this. Essentially one side has to have a table of addresses in a format both sides agree on: it could be as simple as an array of addresses where both sides know which address is which purely from its position, or you create a list of addresses with labels and then search through the list, matching names to addresses for everything you need to resolve. You could for example use the above to have a setup function that you pass an array/structure to (structures across compile domains are of course a very bad thing). That function then sets up all the local function pointers and variable pointers to the main app so that subsequent functions in this downloaded library can call the functions in the main app; and/or vice versa, this first function can pass back an array/structure of all the things in the library.
Alternatively, at a known offset in the downloaded library there could be an array/structure, for example in the first words/bytes of that downloaded library. Providing one or the other or both, the main app can find all the function addresses and variables, and/or the library can be given the main application's function addresses and variables, so that when one calls the other it all works... This of course means function pointers and variable pointers in both directions for all of this to work. Think about how .so or .dll files work in Linux or Windows; you have to replicate that yourself.
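A bare-bones sketch of that "agree on a table layout" idea (all names here are invented for illustration): the main app hands the downloaded module a struct of function pointers through a setup call, and the module stores it for later use.

#include <stdint.h>

/* Layout both sides agree on; order and types must match exactly on both
   sides, since there is no linker to check it for you. */
typedef struct {
    int  (*app_printf)(const char *fmt, ...);
    void (*app_delay_ms)(uint32_t ms);
    uint32_t *app_shared_counter;
} host_api_t;

/* Inside the downloaded module: remember the host's table. */
static const host_api_t *host;

void module_setup(const host_api_t *api)   /* called first, e.g. via HOP() */
{
    host = api;
}

void module_do_work(void)                  /* later calls can use the host's services */
{
    host->app_printf("counter=%lu\n", (unsigned long)*host->app_shared_counter);
    host->app_delay_ms(10);
}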
Or you go the path of linking at the same time; then the downloaded code has to have been built along with the code being run, which is probably not desirable, but some folks do this (or they do it to load code from flash to RAM for various reasons). That is a way to resolve all the addresses at build time; you then extract that part of the binary separately from the final binary and pass it around later.
If you do not want a fixed address, then you need to build the downloaded binary as position independent, and you should link that with .text and .bss and .data at the same address.
MEMORY
{
hello : ORIGIN = 0x20001000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > hello
.rodata : { *(.rodata*) } > hello
.bss : { *(.bss*) } > hello
.data : { *(.data*) } > hello
}
You should obviously do this anyway, but with position-independent code you then have it all packed in along with the GOT (you might need a .got entry, but I think it knows to use .data). Note: with gnu tools at least, if you put .data after .bss and ensure you have at least one .data item (even a bogus variable you do not use), then .bss is zero padded and allocated for you, with no need to set it up in a bootstrap.
If you build for position independence then you can load it almost anywhere, clearly on arm/thumb at least on a word boundary.
In general, for other instruction sets the function pointer thing works just fine. In ALL cases you simply look at the documentation for the processor, see the instruction(s) used for calling and returning or branching, and simply use that instruction, be it by having the compiler do it or by forcing the right instruction so that it does not fail down the road in a re-compile (and cause a very painful debug). arm and mips have 16-bit modes that require specific instructions or solutions for switching modes. x86 has different modes, 32-bit and 64-bit, and ways to switch modes, but normally you do not need to mess with this for something like this. For msp430, pic, avr, a simple function pointer in C should work fine. In general, do the function pointer thing, then see what the compiler generates and compare that to the processor documentation (compare it to a non-function-pointer call).
If you do not know these basic C concepts of function pointer, linking a bare metal app on an mcu/processor, bootstrap, .text, .data, etc. You need to go learn all that.
The times you decide to switch to an operating system are... if you need a filesystem, networking, or a few things like that which you just do not want to write yourself. Sure, there is lwip for networking and there are some embedded filesystem libraries, and if you need multithreading an OS helps as well, but if all you want to do is generate a branch/jump/call instruction, you do not need an operating system for that. Just generate the call/branch/whatever.
Loading and executing a fully linked binary, and loading and calling a single function (and returning to the caller), are not really the same thing. The latter is somewhat complicated and involves "dynamic linking", where the loaded code effectively executes in the same execution environment as the caller.
Loading a complete stand-alone executable, on the other hand, is more straightforward and is the function of a bootloader. A bootloader loads and jumps to the loaded executable, which then establishes its own execution environment. Returning to the bootloader requires a processor reset.
In this case it would make sense to have the bootloader load and execute code in RAM if you are going to be frequently loading different code. However be aware that on Harvard Architecture devices like STM32, RAM execution may slow down execution because data and instruction fetch share the same bus.
The actual implementation of a bootloader will depend on the target architecture, but for Cortex-M devices is fairly straightforward and dealt with elsewhere.
STM32 actually includes an on-chip bootloader (you need to configure the boot source pins to invoke it), which I believe can load and execute code in RAM. It is normally used to load a secondary bootloader to load and program flash, but it can be used for loading any code.
You do need to build and link your code to run from RAM at the address the loader locates it, or, if supported, build position-independent code that can run from anywhere.
I was reading this question because I'm trying to find the size of a function in a C++ program. It is hinted that there may be a way that is platform specific. My targeted platform is Windows.
The method I currently have in my head is the following:
1. Obtain a pointer to the function
2. Increment the Pointer (& counter) until I reach the machine code value for ret
3. The counter will be the size of the function?
Edit1: To clarify what I mean by 'size' I mean the number of bytes (machine code) that make up the function.
Edit2: There have been a few comments asking why or what do I plan to do with this. The honest answer is I have no intention, and I can't really see the benefits of knowing a functions length pre-compile time. (although I'm sure there are some)
This seems like a valid method to me, will this work?
Wow, I use function size counting all the time and it has lots and lots of uses. Is it reliable? No way. Is it standard c++? No way. But that's why you need to check it in the disassembler to make sure it worked, every time that you release a new version. Compiler flags can mess up the ordering.
#include <windows.h>   // for UINT32

static void funcIwantToCount()
{
    // do stuff
}

static void funcToDelimitMyOtherFunc()
{
    __asm _emit 0xCC
    __asm _emit 0xCC
    __asm _emit 0xCC
    __asm _emit 0xCC
}

int getlength(void *funcaddress)
{
    // scan forward until we hit the four 0xCC bytes emitted by the delimiter function
    int length = 0;
    for (length = 0; *((UINT32 *)(&((unsigned char *)funcaddress)[length])) != 0xCCCCCCCC; ++length);
    return length;
}
It seems to work better with static functions. Global optimizations can kill it.
P.S. I hate people asking why you want to do this, saying it's impossible, etc. Stop asking these questions, please. It makes you sound stupid. Programmers are often asked to do non-standard things, because new products almost always push the limits of what's available. If they don't, your product is probably a rehash of what's already been done. Boring!!!
No, this will not work:
There is no guarantee that your function only contains a single ret instruction.
Even if it only does contain a single ret, you can't just look at the individual bytes - because the corresponding value could appear as simply a value, rather than an instruction.
The first problem can possibly be worked around if you restrict your coding style to, say, only have a single point of return in your function, but the other basically requires a disassembler so you can tell the individual instructions apart.
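For instance, a sketch with a third-party disassembler such as Capstone (assuming it is available and that the bytes you hand it really are the function body) at least lets you step instruction by instruction rather than byte by byte:

#include <capstone/capstone.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Walk instructions from 'code' until the first 'ret'; returns bytes consumed.
   This still inherits all the caveats above (multiple returns, data embedded
   in the instruction stream, functions that end in jmp, etc.). */
size_t bytes_until_ret(const uint8_t *code, size_t max_len)
{
    csh handle;
    cs_insn *insn;
    size_t total = 0;

    if (cs_open(CS_ARCH_X86, CS_MODE_32, &handle) != CS_ERR_OK)
        return 0;

    size_t count = cs_disasm(handle, code, max_len, (uint64_t)(uintptr_t)code, 0, &insn);
    for (size_t i = 0; i < count; i++) {
        total += insn[i].size;
        if (strcmp(insn[i].mnemonic, "ret") == 0)
            break;
    }
    cs_free(insn, count);
    cs_close(&handle);
    return total;
}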
It is possible to obtain all blocks of a function, but it is an unnatural question to ask what the 'size' of a function is. Optimized code will rearrange code blocks into the order of execution and will move seldom used blocks (exception paths) into outer parts of the module. For more details, see Profile-Guided Optimizations for an example of how Visual C++ achieves this in link-time code generation. So a function can start at address 0x00001000, branch at 0x00001100 into a jump to 0x20001000 and a ret, and have some exception handling code at 0x20001000. At 0x00001110 another function starts. What is the 'size' of your function? It does span from 0x00001000 to 0x20001000, but it 'owns' only a few blocks in that span. So your question should be unasked.
There are other valid questions in this context, like the total number of instructions a function has (this can be determined from the program symbol database and from the image), and more importantly, the number of instructions in the frequently executed code path inside the function. All of these are questions normally asked in the context of performance measurement, and there are tools that instrument code and can give very detailed answers.
Chasing pointers in memory and searching for ret will get you nowhere I'm afraid. Modern code is way way way more complex than that.
This won't work... what if there's a jump, a dummy ret, and then the target of the jump? Your code will be fooled.
In general, it's impossible to do this with 100% accuracy because you have to predict all code paths, which is like solving the halting problem. You can get "pretty good" accuracy if you implement your own disassembler, but no solution will be nearly as easy as you imagine.
A "trick" would be to find out which function's code is after the function that you're looking for, which would give pretty good results assuming certain (dangerous) assumptions. But then you'd have to know what function comes after your function, which, after optimizations, is pretty hard to figure out.
Edit 1:
What if the function doesn't even end with a ret instruction at all? It could very well just jmp back to its caller (though it's unlikely).
Edit 2:
Don't forget that x86, at least, has variable-length instructions...
Update:
For those saying that flow analysis isn't the same as solving the halting problem:
Consider what happens when you have code like:
foo:
....
jmp foo
You will have to follow the jump each time to figure out the end of the function, and you cannot ignore it past the first time because you don't know whether or not you're dealing with self-modifying code. (You could have inline assembly in your C++ code that modifies itself, for instance.) It could very well extend to some other place of memory, so your analyzer will (or should) end in an infinite loop, unless you tolerate false negatives.
Isn't that like the halting problem?
I'm posting this to say two things:
1) Most of the answers given here are really bad and will break easily. If you use the C function pointer (using the function name), in a debug build of your executable, and possibly in other circumstances, it may point to a JMP shim that will not have the function body itself. Here's an example. If I do the following for the function I defined below:
FARPROC pfn = (FARPROC)some_function_with_possibility_to_get_its_size_at_runtime;
the pfn I get (for example: 0x7FF724241893) will point to this, which is just a JMP instruction:
Additionally, a compiler can nest several of those shims, or branch your function code so that it will have multiple epilogs, or ret instructions. Heck, it may not even use a ret instruction. Then, there's no guarantee that functions themselves will be compiled and linked in the order you define them in the source code.
You can do all that stuff in assembly language, but not in C or C++.
2) So that above was the bad news. The good news is that the answer to the original question is, yes, there's a way (or a hack) to get the exact function size, but it comes with the following limitations:
It works in 64-bit executables on Windows only.
It is obviously Microsoft specific and is not portable.
You have to do this at run-time.
The concept is simple -- utilize the way SEH is implemented in x64 Windows binaries. Compiler adds details of each function into the PE32+ header (into the IMAGE_DIRECTORY_ENTRY_EXCEPTION directory of the optional header) that you can use to obtain the exact function size. (In case you're wondering, this information is used for catching, handling and unwinding of exceptions in the __try/__except/__finally blocks.)
Here's a quick example:
//You will have to call this when your app initializes and then
//cache the size somewhere in the global variable because it will not
//change after the executable image is built.
size_t fn_size; //Will receive function size in bytes, or 0 if error
some_function_with_possibility_to_get_its_size_at_runtime(&fn_size);
and then:
#include <Windows.h>

//The function itself has to be defined for two types of a call:
// 1) when you call it just to get its size, and
// 2) for its normal operation
bool some_function_with_possibility_to_get_its_size_at_runtime(size_t* p_getSizeOnly = NULL)
{
    //This input parameter will define what we want to do:
    if(!p_getSizeOnly)
    {
        //Do this function's normal work
        //...
        return true;
    }
    else
    {
        //Get this function's size
        //INFO: Works only in 64-bit builds on Windows!
        size_t nFnSz = 0;

        //One of the reasons why we have to do this at run-time is
        //so that we can get the address of a byte inside
        //the function body... we'll get it as this thread context:
        CONTEXT context = {0};
        RtlCaptureContext(&context);

        DWORD64 ImgBase = 0;
        RUNTIME_FUNCTION* pRTFn = RtlLookupFunctionEntry(context.Rip, &ImgBase, NULL);
        if(pRTFn)
        {
            nFnSz = pRTFn->EndAddress - pRTFn->BeginAddress;
        }

        *p_getSizeOnly = nFnSz;
        return false;
    }
}
This can work in very limited scenarios. I use it in part of a code injection utility I wrote. I don't remember where I found the information, but I have the following (C++ in VS2005):
#pragma runtime_checks("", off)
static DWORD WINAPI InjectionProc(LPVOID lpvParameter)
{
    // do something
    return 0;
}

static DWORD WINAPI InjectionProcEnd()
{
    return 0;
}
#pragma runtime_checks("", restore)   // "restore" is the documented way to re-enable the checks
And then in some other function I have:
size_t cbInjectionProc = (size_t)InjectionProcEnd - (size_t)InjectionProc;
You have to turn off some optimizations and declare the functions as static to get this to work; I don't recall the specifics. I don't know if this is an exact byte count, but it is close enough. The size is only that of the immediate function; it doesn't include any other functions that may be called by that function. Aside from extreme edge cases like this, "the size of a function" is meaningless and useless.
The real solution to this is to dig into your compiler's documentation. The ARM compiler we use can be made to produce an assembly dump (code.dis), from which it's fairly trivial to subtract the offsets between a given mangled function label and the next mangled function label.
I'm not certain which tools you will need for this with a windows target, however. It looks like the tools listed in the answer to this question might be what you're looking for.
Also note that I (working in the embedded space) assumed you were talking about post-compile-analysis. It still might be possible to examine these intermediate files programmatically as part of a build provided that:
The target function is in a different object
The build system has been taught the dependencies
You know for sure that the compiler will build these object files
Note that I'm not sure entirely WHY you want to know this information. I've needed it in the past to be sure that I can fit a particular chunk of code in a very particular place in memory. I have to admit I'm curious what purpose this would have on a more general desktop-OS target.
In C++, there is no notion of function size. In addition to everything else mentioned, preprocessor macros also make for an indeterminate size. If you want to count the number of instruction words, you can't do that in C++, because the code doesn't exist until it has been compiled.
What do you mean "size of a function"?
If you mean a function pointer, then it is always just 4 bytes on 32-bit systems.
If you mean the size of the code than you should just disassemble generated code and find the entry point and closest ret call. One way to do it is to read the instruction pointer register at the beginning and at the end of your function.
If you want to figure out the number of instructions called in the average case for your function you can use profilers and divide the number of retired instructions on the number of calls.
I think it will work on Windows programs created with MSVC; as for branches, the 'ret' seems to always come at the end (even if there are branches that return early, it does a jne to go to the end).
However you will need some kind of disassembler library to figure the current opcode length as they are variable length for x86. If you don't do this you'll run into false positives.
I would not be surprised if there are cases this doesn't catch.
There are no facilities in Standard C++ to obtain the size or length of a function.
See my answer here: Is it possible to load a function into some allocated memory and run it from there?
In general, knowing the size of a function is used in embedded systems when copying executable code from a read-only source (or a slow memory device, such as a serial Flash) into RAM. Desktop and other operating systems load functions into memory using other techniques, such as dynamic or shared libraries.
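As an illustration of that embedded use case (the section and symbol names below are assumptions; in practice they come from your own linker script): the linker gives the RAM-resident code a load address in flash and a run address in RAM, and startup code copies between the two, so the "size" is just the difference of two linker symbols rather than something computed from the code bytes.

#include <string.h>
#include <stdint.h>

/* Hypothetical symbols provided by the linker script:
   __ramfunc_load  - where the code is stored in flash
   __ramfunc_start - where it must run in RAM
   __ramfunc_end   - end of the RAM-resident region            */
extern uint8_t __ramfunc_load[], __ramfunc_start[], __ramfunc_end[];

void copy_code_to_ram(void)
{
    size_t size = (size_t)(__ramfunc_end - __ramfunc_start);  /* "function size" via the linker */
    memcpy(__ramfunc_start, __ramfunc_load, size);
}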
Just set PAGE_EXECUTE_READWRITE at the address where you got your function, then read every byte. When you hit the byte 0xCC, it means that the end of the function is actual_reading_address - 1.
Using GCC, not so hard at all.
void do_something(void) {
    printf("%s!", "Hello your name is Cemetech");
    /* &&label is GCC's address-of-label extension and is only valid inside
       the function that contains the label, so the size is computed here. */
    printf("size of function do_something: %i",
           (int)((char *)&&do_something_END - (char *)do_something));
    do_something_END: ;   /* a label must be followed by a statement */
}
The code below gets the accurate function block size; it works fine in my test.
(runtime_checks disables _RTC_CheckEsp in debug mode.)
#pragma runtime_checks("", off)
DWORD __stdcall loadDll(char* pDllFullPath)
{
    OutputDebugStringA(pDllFullPath);
    //OutputDebugStringA("loadDll...................\r\n");
    return 0;
    //return test(pDllFullPath);
}
#pragma runtime_checks("", restore)

DWORD __stdcall getFuncSize_loadDll()
{
    DWORD maxSize = (PBYTE)getFuncSize_loadDll - (PBYTE)loadDll;
    PBYTE pTail = (PBYTE)getFuncSize_loadDll - 1;
    //0xC3       : ret
    //0xC2 04 00 : ret 4
    while (*pTail != 0xC2 && *pTail != 0xC3) --pTail;
    if (*pTail == 0xC2)
    {
        pTail += 3;
    }
    return pTail - (PBYTE)loadDll;
}
The non-portable, but API-based and correctly working approach is to use program-database readers - like dbghelp.dll on Windows or readelf on Linux. Using those is only possible if debug info is enabled/present along with the program. Here's an example of how it works on Windows:
SYMBOL_INFO symbol = { };
symbol.SizeOfStruct = sizeof(SYMBOL_INFO);
// Implies, that the module is loaded into _dbg_session_handle, see ::SymInitialize & ::SymLoadModule64
::SymFromAddr(_dbg_session_handle, address, 0, &symbol);
You will get the size of the function in symbol.Size, but you may also need additional logic identifying whether the given address is actually a function, a shim placed there by the incremental linker, or a DLL call thunk (same thing).
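A slightly more complete sketch of the dbghelp route for the current process (error handling kept minimal; the address argument is whatever function address you are interested in):

#include <Windows.h>
#include <DbgHelp.h>
#pragma comment(lib, "Dbghelp.lib")

ULONG64 GetFunctionSize(DWORD64 address)
{
    HANDLE hProcess = GetCurrentProcess();

    // TRUE: enumerate and load symbols for all modules already mapped into the process
    if (!SymInitialize(hProcess, NULL, TRUE))
        return 0;

    SYMBOL_INFO symbol = {};
    symbol.SizeOfStruct = sizeof(SYMBOL_INFO);

    ULONG64 size = 0;
    if (SymFromAddr(hProcess, address, NULL, &symbol))
        size = symbol.Size;       // 0 if the PDB does not record a size

    SymCleanup(hProcess);
    return size;
}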
I guess something similar can be done via readelf on Linux, but maybe you'll have to build a library on top of its source code...
You must bear in mind that although a disassembly-based approach is possible, you'll basically have to analyze a directed graph with endpoints in ret, halt, jmp (PROVIDED you have incremental linking enabled and you're able to read the jmp-table to identify whether the jmp you're facing in the function is internal to that function (missing in the image's jmp-table) or external (present in that table; such jmps frequently occur as part of tail-call optimization on x64, as far as I know)), any calls that are meant to be noreturn (like an exception-generating helper), etc.
It's an old question but still...
For Windows x64, functions all have a function table, which contains the offset and the size of the function. https://learn.microsoft.com/en-us/windows/win32/debug/pe-format . This function table is used for unwinding when an exception is thrown.
That said, this doesn't contain information like inlining, and all the other issues that people already noted...
#include <stdio.h>

int GetFuncSizeX86(unsigned char* Func)
{
    if (!Func)
    {
        printf("x86Helper : Function Ptr NULL\n");
        return 0;
    }
    for (int count = 0; ; count++)
    {
        if (Func[count] == 0xC3) // ret
        {
            // Heuristic: only accept the ret if the surrounding bytes look like a
            // typical epilogue, to reduce (not eliminate) false positives.
            unsigned char prevInstruc = count > 0 ? Func[count - 1] : 0;
            if (Func[count + 1] == 0xCC   // int3 padding after the ret
                || prevInstruc == 0x5D    // pop ebp
                || prevInstruc == 0x5B    // pop ebx
                || prevInstruc == 0x5E    // pop esi
                || prevInstruc == 0x5F    // pop edi
                || prevInstruc == 0xCC    // int3
                || prevInstruc == 0xC9)   // leave
                return count + 1;         // size including the ret byte
        }
    }
}
You could use this assuming you are on x86 or x86_64.
While looking through the assembly for a console "hello world" program (compiled using the visual c++ compiler), I came across this:
pre_c_init proc near
.text:00401AFE mov eax, 5A4Dh
.text:00401B03 cmp ds:400000h, ax
The code above seems to be accessing memory that isn't filled with anything in particular: All segments start at 0x401000 or even further down in the file. (The image base is at 0x400000, but the first segment is at 0x401000).
I used OllyDbg to see what the actual value at 0x400000 is, and every single time it's the same as in the code (0x5A4D). What's going on here?
5A4D is "MZ" in little-endian ASCII, and MZ is the signature of MS-DOS and, more recently, PE executables.
The comparison checks whether the executable has been mapped at the default base address, 0x400000. This, I believe, is used to determine whether it is necessary to perform relocation.
This is discussed further in the following thread: Why does PE need a relocation table?
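A hedged sketch of the same check done from C with the documented Win32 structures (rather than the compiler's startup stub): read your own DOS/NT headers and compare the actual mapped base to the preferred ImageBase.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Base address the image was actually mapped at.
    BYTE *base = (BYTE *)GetModuleHandle(NULL);

    // e_magic is 0x5A4D ("MZ"), exactly the constant the startup code compares against.
    IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
    IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);

    printf("e_magic        = 0x%04X\n", dos->e_magic);
    printf("mapped base    = %p\n", (void *)base);
    printf("preferred base = 0x%llX\n", (unsigned long long)nt->OptionalHeader.ImageBase);

    if ((unsigned long long)(ULONG_PTR)base != (unsigned long long)nt->OptionalHeader.ImageBase)
        printf("image was relocated\n");
    return 0;
}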
The last few days have been spent debugging a very strange problem. An application built for i386 running on Windows crashed, with the top of the callstack completely corrupted and the instruction pointer in a nonsense location.
After some effort, I rebuilt the callstack and was able to determine how the IP ended up in the nonsense location. An instruction in boost shared pointer code attempts to call a function defined in my DLLs import address table using an incorrect offset. The instruction looks like:
call dword ptr [nonsense offset into import address table]
As a result, execution ended up in a bad location that was, unfortunately, executable. Execution then proceeded, gobbling up the top of the stack until eventually crashing.
By launching the identical application on my PC and stepping into the problematic code, I can find the same call instruction and see that it's supposed to be calling msvc100's 'new' operator.
Further comparing the minidump from the client's PC to my PC, I found that my PC calls a function with an offset of 0x0254 into the address table. On the client's PC, the code is trying to invoke a function with an offset of 0x8254.
What's even more confusing is that this offset is not coming from a register or another memory location. The offset is a constant in the disassembly. So, the disassembly looks like:
call dword ptr [ 0x50018254 ]
not like:
call dword ptr [ edx ]
Does anyone know how this might happen?
That's a single bit flip:
0x0254 = 0b0000001001010100
0x8254 = 0b1000001001010100
Perhaps corrupt memory, corrupt disk, gamma ray from the sun...?
If this specific case is reproducible and their on-disk binary matches yours, I'd investigate further. If it's not specifically reproducible, I'd encourage the client to run some machine diagnostics.
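If you want to confirm programmatically that two values differ by exactly one flipped bit, the check is a one-liner (sketch):

#include <stdbool.h>
#include <stdint.h>

// True if a and b differ in exactly one bit position.
static bool single_bit_flip(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;                          // 0x0254 ^ 0x8254 == 0x8000
    return diff != 0 && (diff & (diff - 1)) == 0;   // power of two => exactly one bit set
}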
That seems to me like a hardware error for sure, mainly a memory error. As @Hostile_Fork pointed out, it's just a bit flip.
Does your memory have an ECC feature? If it does, make sure it is enabled. I would run a burn-in memory test with memtest86 to see what happens; I bet you have a faulty memory chip. It doesn't look like a bug.