disassemble c++ dll roughly to machine code - c++

I know that you cannot have a C++ dll and expect to have it as a source code, am I right? But at the same time, when I revert a C++ dll to raw data, using DUMPBIN, then there are some stuff which can be done to interpret it, right? For example, we know basic mappings for most popular operators and all.
Is there a tool that can roughly interpret that raw machine data to something that looks more to a code or instruction? The issue is that I do not have enough time to write it myself to scratch! so I am looking for a tool or something to do it.

You need a disassembler, which is a tool that takes a binary object (like a compiled file, a library, or even an executable) and tries to reinterpret its content as assembly.
With a tool like that you can usually retrieve the names of the internal symbols (unless of course the binary image is stripped, but this is not the case for dynamically liked libraries).
Also for C++ this is a bit difficult because of name mangling.
Try to give a glance to Objdump.

YOu can disassemble the code sections of the DUMPBIN /DISASS - but if you want "code" like C++, then you need a decompiler. However, they are far from great, and often make quite unreadable code - yes, it's something that you can feed into a compiler, but it's hardly what I'd call "human readable".

In my biased opinion, the best disassembler out there is IDA (Interactive Disassembler). It is somewhat expensive since it's targeted towards professional use, but there's a freeware (older) version you can try:
http://www.hex-rays.com/products/ida/support/download_freeware.shtml
The Hex-Rays Decompiler (an add-on for IDA) can produce C pseudocode from the disassembly, which can in many cases be recompiled again.

Related

C++ add new code during runtime

I am using C++ (in xcode and code::blocks), I don't know much.
I want to make something compilable during runtime.
for eg:
char prog []={"cout<<"helloworld " ;}
It should compile the contents of prog.
I read a bit about quines , but it didn't help me .
It's sort of possible, but not portably, and not simply.
Basically, you have to write the code out to a file, then
compile it to a dll (invoking the compiler with system), and
then load the dll. The first is simple, the last isn't too
difficult (but will require implementation specific code), but
the middle step can be challenging: obviously, it only works if
the compiler is installed on the system, but you have to find
where it is installed, verify that it is the same version (or
at least a version which generates binary compatible code),
invoke it with the same options that were used when your code
was compiled, and process any errors.
C++ wasn't designed for this. (Compiled languages generally
aren't.)
The short answer is "no, you can't do that". C and C++ were never designed to do this.
That's pretty much also the long answer to the actual question, but I'll expand a bit on a few ideas.
The code, as compiled by the compiler is pretty certainly not trivial to add things to. There are a few techniques that can be used to "add more code" to a program:
Add a dynamic shared library (DLL), which contains code that has been compiled separately to the existing code. You could of course also have code in your program to output some code, compile this code with the compiler, link it into a dynamic library, and load it in your code.
You could build your own little code-generator that generates machine code in a chunk of memory. Note that you probably need to call a "special" memory allocation function, as "normal" memory allocations are typically not allowed to be executed - you need to allocate "with execute permission" - VirtualAlloc in Windows does have such a flag, and mmap in Linux/Unix flavours does too. And of course, you pretty much have to "be a compiler" to achieve this.
You could naturally also invent your own interpreted language, which would allow your program to load in for example a text-file with commands/instructions to be executed, or contain text inside the program for execution with this language.
But like I said to start with, this is not what C and C++ (and most other compiled languages) were meant for, so it's not going to be as simple as "stick some C++ code in a string, and make it run".
It depends why you want to do this.
If it's for efficiency reasons - you know what a function does only at run time, but it has to be very efficient - then what was already suggested (writing to a file, compiling to a dll / so and dynamically loading it) is your best option.
BUT if the reason you want this is to allow for user-input behaviour, say a general function your read from a database (behaviour or a unit ingame? value of a field in a plot?) - or more generally you just want to change / augment behaviour at runtime with little concern for efficiency, I recommend using an outside scripting language like lua, which easily interacts with your compiled C++ code.
The C and C++ languages compile to binary machine code, unlike Java and C# which generate instructions for a 'virtual machine' or interpreted scripting languages such as JavaScript. The compilation of C++ is performed by a separate executable, the compiler, which is not incorporated into the resulting executable.
So the language does not have any built in "eval" capability to translate further code once compilation is finished.
It's not uncommon for new C/C++ programmers to think they need to do this, but they typically don't. Perhaps you could expand further on what you're actually looking to do.
But if you do actually need to be able to do this, your options are:
Write code to compile a new executable with the new code and then run the resulting program.
Write a simple parser and "virtual machine" of your own,
Look at incorporating an embedded scripting/interpreted language such as Lua,
Try and wrap your head around integrating CINT,
See also: Scripting language for C++

Simple C++ source instrumentation?

I want to use Shiny on a large-ish C++ code base, but I'd rather not add the required PROFILE_FUNC() calls to my source. I figure it's easy enough to write a script that for each source file, regex-searches for function definitions, adds a macro call just after the opening bracket and pipes the result to g++; but that seems an awfully obvious source-code instrumentation case, so much so I find it hard to believe no-one has come up with a better solution already.
Unfortunately, searching around I could only find references to LLVM / clang instrumentation and the odd research tool, which look like overly complicated solutions to my comparatively simple problem. In fact, there seems to be no straightforward way to perform simple automated code edits to C/C++ code just prior to compilation.
Is that so? Or am I missing something?
Update: I forgot to mention this "C++ code base" is a native application I am porting to Android. So I can use neither gprof (which isn't available on Android), Valgrind (which requires an older version of the NDK than what i'm using) nor the android-ndk-profiler (which is for dynamic libraries loaded by Android Activities, either Java or native, not plain executables). Hence my looking into Shiny.
Update 2: Despite previous claims I actually managed to build Valgrind on Android NDK r8e, so I settled on using it instead of Shiny. However I still think the original question is valid: isn't there any straightforward tool for effecting simple compile-time edits to C / C++ source files – some sort of macro preprocessor on steroids?
You can consider gprof or valgrind. If memory serves, gprof uses instrumentation and valgrind is a sampling-based profiler. Neither of them requires you to annotate source code.
You can use the android ndk profiler to profile C/C++ code
More info here
http://code.google.com/p/android-ndk-profiler/
You use gprof to analyse the results

Is it possible to decompile C++ Builder exe? Is C++ Builder exe safe?

Is it possible to decompile C++ Builder exe?
Is C++ Builder safe programming tools or anyone can decompile it and see the code?
The short answer, yes, it can be decompiled, and it's not "safe". Anything ran on a computer can be disassembled and from that inspected by reading the disassembly. Decompiling would mean restoring even some of the original compiled source code - which indeed is possible, to some extent. After all, it is "just" about writing a program which can translate assembly to the desired language. If a human can do that, then a program can do that too, because it is only about applying known rules and logic to translate the program from different representation/language to another. However, it is not just that simple...
Lots of information (like source files, variable names, some unused code, comments etc.) gets lost in the compilation process. This is further worsened by compiler optimizations which usually make the resulting disassembly near unreadable in some cases. As such, the decompiled source code can only give mere clues about the implementation details and mainly just the logic, not the actual source code used to build the project.
Please note that this has near nothing to do with any form of "safety" or security of a program itself. Any program can be disassembled in a way or another, any logic behind a working program can be inspected and reverse-engineered. There can be no secrets inside a program, nothing can be hidden if it can be run.
It is often much easier to disassemble a piece of executable and work through its logic in assembly, than trying to rely on very vague and usually broken reconstruct in high-level language such as C which many such decompilers still produce. Sometimes though, tools can produce readable and very clear high-level representations by disassembling, but they are often the simple cases and short excerpts of code.
The bottom line is, that you don't need a decompiler to inspect, reverse-engineer and understand a target program. All one needs is the access to the executable, a disassembler and understanding of assembly language. There is no way to avoid this fact, and it is very rarely a real problem.

print the code of a function in a DLL

I want to print the code of a function in a DLL.
I loaded the dll, I have the name of the desired function, what's next?
Thank you!
Realistically, next is getting the code. What you have in the DLL is object code -- binary code in the form ready for the processor to execute, not ready to be printed.
You can disassemble what's in the DLL. If you're comfortable working with assembly language, that may be useful, but it's definitely not the original source code (nor probably anything very close to it either). If you want to disassemble it, loading it in your program isn't (usually) a very good starting point. Try opening a VS command line and using dumpbin /disasm yourfile.dll. Be prepared for a lot of output unless the DLL in question is really tiny.
Your only option to retrieve hints about the actual implemented functionality of said function inside the DLL is to reverse engineer whatever the binary representation of assembly happens to be. What this means is that you pretty much have to use a disassembler(IDA Pro, or debugger, e.g. OllyDbg) to translate the opcodes to actual assembly mnemonics and then just work your way through it and try to understand the details of how it functions.
Note, that since it is compiled from C/C++ there is lots and lots of data lost in the process due to optimization and the nature of the process; the resulting assembly can(and probably will) seem cryptic and senseless, but it still does it's job the exact same way as the programmer programmed it in higher level language. It won't be easy. It will take time. You will need luck and nerves. But it IS doable. :)
Nothing. A DLL is compiled binary code; you can't get the source just by downloading it and knowing the name of the function.
If this was a .NET assembly, you might be able to get the source using reflection. However, you mentioned C++, so this is doubtful.
Check out this http://www.cprogramming.com/challenges/solutions/self_print.html and this Program that prints its own code? and this http://en.wikipedia.org/wiki/Quine_%28computing%29
I am not sure if it will do what you want, but i guess it may help you.

Learning to read GCC assembler output

I'm considering picking up some very rudimentary understanding of assembly. My current goal is simple: VERY BASIC understanding of GCC assembler output when compiling C/C++ with the -S switch for x86/x86-64.
Just enough to do simple things such as looking at a single function and verifying whether GCC optimizes away things I expect to disappear.
Does anyone have/know of a truly concise introduction to assembly, relevant to GCC and specifically for the purpose of reading, and a list of the most important instructions anyone casually reading assembly should know?
You should use GCC's -fverbose-asm option. It makes the compiler output additional information (in the form of comments) that make it easier to understand the assembly code's relationship to the original C/C++ code.
If you're using gcc or clang, the -masm=intel argument tells the compiler to generate assembly with Intel syntax rather than AT&T syntax, and the --save-temps argument tells the compiler to save temporary files (preprocessed source, assembly output, unlinked object file) in the directory GCC is called from.
Getting a superficial understanding of x86 assembly should be easy with all the resources out there. Here's one such resource: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html .
You can also just use disasm and gdb to see what a compiled program is doing.
I usually hunt down the processor documentation when faced with a new device, and then just look up the opcodes as I encounter ones I don't know.
On Intel, thankfully the opcodes are somewhat sensible. PowerPC not so much in my opinion. MIPS was my favorite. For MIPS I borrowed my neighbor's little reference book, and for PPC I had some IBM documentation in a PDF that was handy to search through. (And for Intel, mostly I guess and then watch the registers to make sure I'm guessing right! heh)
Basically, the assembly itself is easy. It basically does three things: move data between memory and registers, operate on data in registers, and change the program counter. Mapping between your language of choice and the assembly will require some study (e.g. learning how to recognize a virtual function call), and for this an "integrated" source and disassembly view (like you can get in Visual Studio) is very useful.
"casually reading assembly" lol (nicely)
I would start by following in gdb at run time; you get a better feel for whats happening. But then maybe thats just me. it will disassemble a function for you (disass func) then you can single step through it
If you are doing this solely to check the optimizations - do not worry.
a) the compiler does a good job
b) you wont be able to understand what it is doing anyway (nobody can)
Unlike higher-level languages, there's really not much (if any) difference between being able to read assembly and being able to write it. Instructions have a one-to-one relationship with CPU opcodes -- there's no complexity to skip over while still retaining an understanding of what the line of code does. (It's not like a higher-level language where you can see a line that says "print $var" and not need to know or care about how it goes about outputting it to screen.)
If you still want to learn assembly, try the book Assembly Language Step-by-Step: Programming with Linux, by Jeff Duntemann.
I'm sure there are introductory books and web sites out there, but a pretty efficient way of learning it is actually to get the Intel references and then try to do simple stuff (like integer math and Boolean logic) in your favorite high-level language and then look what the resulting binary code is.