Is there any way to decompile Linux .so? - c++

Is there any way to decompile Linux .so?

There are decompilers, but a decompiler might not emit code in the same language that the original program was written in.
There are also disassemblers, which translate the machine code back into assembly.
The Decompilation Wiki may be a good source of additional information.

You can disassemble the code with objdump(1), for example with its -d option.
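Before pointing a disassembler at a .so, it can help to confirm what the file actually contains. The following is a minimal sketch, not part of the original answers, that inspects the first bytes of an ELF file; the function name describe_elf is hypothetical, and the offsets used are the standard positions of the identification bytes and of the e_type and e_machine header fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Describes the start of an ELF file held in memory.
// Returns an empty string if the buffer does not begin with the ELF magic.
std::string describe_elf(const unsigned char* data, std::size_t size) {
    assert(data != nullptr);
    // The fields read below (up to and including e_machine) span the first 20 bytes.
    if (size < 20 || data[0] != 0x7f || data[1] != 'E' || data[2] != 'L' || data[3] != 'F')
        return "";

    // EI_CLASS (offset 4): 1 = 32-bit, 2 = 64-bit.
    std::string out = data[4] == 1 ? "ELF32" : data[4] == 2 ? "ELF64" : "ELF (unknown class)";

    // EI_DATA (offset 5): 1 = little-endian, 2 = big-endian.
    const bool little = (data[5] == 1);
    auto read16 = [&](std::size_t off) -> unsigned {
        return little ? static_cast<unsigned>(data[off] | (data[off + 1] << 8))
                      : static_cast<unsigned>((data[off] << 8) | data[off + 1]);
    };

    // e_type (offset 16): kind of object file.
    switch (read16(16)) {
        case 1: out += " relocatable"; break;
        case 2: out += " executable"; break;
        case 3: out += " shared object"; break;  // also used by position-independent executables
        case 4: out += " core"; break;
        default: out += " unknown type"; break;
    }

    // e_machine (offset 18): target architecture; only a few common values are named here.
    const unsigned machine = read16(18);
    switch (machine) {
        case 3: out += " x86"; break;
        case 40: out += " ARM"; break;
        case 62: out += " x86-64"; break;
        case 183: out += " AArch64"; break;
        default: out += " machine " + std::to_string(machine); break;
    }
    return out;
}
```

Reading the first 20 bytes of the library into a buffer and passing them to describe_elf tells you which architecture the disassembler or decompiler has to support.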

Related

Disassemble IAR 8051 with debug information

I am developing 8051 firmware in a project and have to use IAR as the toolchain. The build system is CMake. I cannot use the IAR IDE.
In order to optimize my source code I want to look at the disassembly of the resulting binary, preferably with labels. Is there any way to make xlink output something that I can analyse? I know that the IAR IDE debugger has a debug view, but I cannot use the IDE. It seems xlink can output a lot of file formats, but which ones allow debug information to be extracted on the command line?
The linker is usually not the tool that produces assembly listings. It generally only knows about bytes, sections, references to be resolved, addresses to be located, output formats to generate, and so on. Only a few linkers know how to patch compiler/assembler generated listings with the final addresses.
So you would need to look at the tool that knows about machine code, the compiler or the assembler. Unfortunately IAR does not seem to provide a disassembler.
A quick web search for "iar 8051 compiler option" turned up this compiler guide. A quick look at the table of contents led to the chapter "Descriptions of compiler options", which describes, among others, the option -l. The guide says:
Use this option to generate an assembler or C/C++ listing to a file. Note that this option can be used one or more times on the command line.
This should be enough to see the generated assembly. If you want the allocated addresses after linking, the linker can provide that.
Anyway, as a last resort you may want to check out a decent disassembler such as Ghidra.

Can Winelib link a DLL directly to the ELF executable?

There is a DLL (no source code, but no fancy stuff expected inside, hopefully). Going to write a Linux application to use it. So, GNU all the way: native Linux gcc/gdb/ELF, etc.
I've found some solutions here on SO: with Winelib it's possible to write code that has access to the win32 LoadLibrary function and still compiles into an ELF binary. A bit of API forwarding, and you have a *.so file that calls LoadLibrary on the DLL and exposes its functions.
Is it correct?
Is it possible to automate it? Is there an example with winedump and winegcc that are probably the tools for this job?
Sounds all perfectly reasonable. The DLL format is properly ancient, and not excessively complex (it had to work on the original 8086 CPU, and it got simpler with 32-bit Windows). The code is just x86 instructions, and the data may be even more boring.
However, it sounds also very specialized, which probably explains why I've never heard of an actual implementation of this idea.

How to make my GDB understand new instructions?

I implemented some additional x86 instructions in QEMU for research purposes.
To provide debugging support for these newly added instructions,
I want GDB to understand my new instructions when it debugs a binary.
(Right now they show up as bad instructions...)
Is there any way to do this without modifying the GDB source code?
Such as inserting modules... or whatever. Thanks:)
gdb relies on the opcodes library to know how to disassemble. So, to see your new instructions, you simply have to modify this library. opcodes lives in the gdb source tree.
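To illustrate why an unknown instruction shows up as bad, here is a toy table-driven decoder. It is only a sketch of the general idea and does not reflect the real data structures of the opcodes library; the names OpcodeTable, base_table, add_instruction and disassemble are hypothetical, and the mnemonic "myinsn" on byte 0x06 stands in for a newly added instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Maps a one-byte opcode to its mnemonic (real decoders handle prefixes and operands too).
using OpcodeTable = std::map<std::uint8_t, std::string>;

// A handful of genuine one-byte x86 opcodes.
OpcodeTable base_table() {
    return {{0x90, "nop"}, {0xC3, "ret"}, {0xCC, "int3"}, {0xF4, "hlt"}};
}

// Registers an extra instruction, the way a patched decoder table would.
void add_instruction(OpcodeTable& table, std::uint8_t opcode, const std::string& mnemonic) {
    assert(!mnemonic.empty());
    table[opcode] = mnemonic;
}

// Decodes each byte independently; bytes missing from the table are reported as "(bad)".
std::vector<std::string> disassemble(const std::vector<std::uint8_t>& code,
                                     const OpcodeTable& table) {
    std::vector<std::string> out;
    for (std::uint8_t byte : code) {
        auto it = table.find(byte);
        out.push_back(it != table.end() ? it->second : "(bad)");
    }
    return out;
}
```

With the base table, the byte sequence 0x90 0x06 0xC3 decodes to nop, (bad), ret; after add_instruction(table, 0x06, "myinsn") the middle entry becomes myinsn. The decoder can only print what its table describes, which is why the table itself has to change.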

binary generation from LLVM

How does one generate executable binaries from the c++ side of LLVM?
I'm currently writing a toy compiler, and I'm not quite sure how to do the final step of creating an executable from the IR.
The only solution I currently see is to write out the bitcode and then call llc using system or the like. Is there a way to do this from the c++ interface instead?
This seems like it would be a common question, but I can't find anything on it.
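The core LLVM libraries do not include the linker necessary to perform this task. They can write out assembly or a native object file, but producing an executable still means invoking a linker, typically the system linker. You can look at the source code of llvm-ld (shipped with older LLVM releases) to see how it's done.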
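The approach mentioned in the question, writing out bitcode and then calling external tools, can be sketched as follows. This is a hedged illustration, not an LLVM API: the helper names are hypothetical, it assumes llc and a system compiler driver called cc are on the PATH, and it assumes the paths contain no single quotes.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Wraps a path in single quotes for the shell (assumption: no single quote inside the path).
std::string quote(const std::string& path) {
    assert(path.find('\'') == std::string::npos);
    return "'" + path + "'";
}

// Command that asks llc to compile a bitcode file to a native object file.
std::string llc_command(const std::string& bitcode, const std::string& object) {
    return "llc -filetype=obj " + quote(bitcode) + " -o " + quote(object);
}

// Command that asks the system compiler driver to link the object file into an executable.
std::string link_command(const std::string& object, const std::string& executable) {
    return "cc " + quote(object) + " -o " + quote(executable);
}

// Runs both steps; returns true only if each command exits with status 0.
bool build_executable(const std::string& bitcode, const std::string& object,
                      const std::string& executable) {
    return std::system(llc_command(bitcode, object).c_str()) == 0 &&
           std::system(link_command(object, executable).c_str()) == 0;
}
```

Using the compiler driver for the final step, rather than calling ld directly, lets it supply the C runtime startup files and default libraries.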

Same binary code on Windows and Linux (x86)

I want to compile a bunch of C++ files into raw machine code and then run it with a platform-dependent starter written in C. Something like
fread(buffer, 1, len, file);
a = ((int(*)(int))buffer)(b);
How can I tell g++ to output raw code?
Will function calls work? How can I make it work?
I think the calling conventions of Linux and Windows differ. Is this a problem? How can I solve it?
EDIT: I know that PE and ELF prevent the DIRECT starting of the executable. But that's what I have the starter for.
There is one (relatively) simple way of achieving some of this, and that's called "position independent code". See your compiler documentation for this.
Meaning you can compile some sources into a binary which will execute no matter where in the address space you place it. If you have such a piece of x86 binary code in a file and mmap() it (or the Windows equivalent) it is possible to invoke it from both Linux and Windows.
The limitations already mentioned still apply: the binary code must restrict itself to a calling convention that is identical on, or can be represented on, both platforms (for 32-bit x86, that means passing arguments on the stack and returning values in EAX), and the code must be fully self-contained: no DLL function calls, since resolving these is system-dependent, and no system calls either.
I.e.:
You need position-independent code
You must create self-contained code without any external dependencies
You must extract the machine code from the object file.
Then mmap() that file, initialize a function pointer, and (*myblob)(someArgs) may do.
If you're using gcc, the -ffreestanding -nostdinc -fPIC options should give you most of what you want regarding the first two; afterwards, use objcopy (for example with -O binary) to extract the binary blob from the ELF object file.
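The "map it and call through a function pointer" step can be sketched as below. This is a minimal illustration under stated assumptions, not the answer's exact setup: it targets x86-64 Linux only (the answer discusses 32-bit x86), it embeds a six-byte blob instead of reading one from a file, and the name call_blob is hypothetical. On other platforms the function simply reports failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/mman.h>
#endif

// Copies a tiny self-contained machine-code blob into fresh memory and calls it.
// Returns false if the platform is unsupported or the mapping could not be set up.
bool call_blob(int arg, int& result) {
#if defined(__linux__) && defined(__x86_64__)
    // mov eax, edi ; add eax, 1 ; ret
    // (x86-64 System V convention: first int argument in EDI, return value in EAX)
    static const unsigned char blob[] = {0x89, 0xF8, 0x83, 0xC0, 0x01, 0xC3};
    const std::size_t size = 4096;
    assert(sizeof blob <= size);

    // Map writable memory first, then switch it to executable, so that it is
    // never writable and executable at the same time.
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    std::memcpy(mem, blob, sizeof blob);
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return false;
    }

    // Treat the start of the mapping as a function taking and returning an int.
    auto fn = reinterpret_cast<int (*)(int)>(mem);
    result = fn(arg);
    munmap(mem, size);
    return true;
#else
    (void)arg;
    (void)result;
    return false;
#endif
}
```

The blob works at any address because it contains no absolute addresses and calls nothing external, which is exactly the restriction described above. A Windows starter would need VirtualAlloc or a similar call in place of mmap, and the blob would have to follow a convention valid on both systems.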
Theoretically, some of this is achievable. However there are so many gotchas along the way that it's not really a practical solution for anything.
System call formats are totally incompatible
DEP will prevent data executing as code
Memory layouts are different
You need to effectively dynamically 'relink' the code before you can run it.
.. and so forth...
The same executable cannot be run on both Windows and Linux.
Write your code platform-independently (the STL, Boost and Qt can help with this), then compile it with g++ on Linux to produce a Linux binary, and with a compiler for the Windows platform to produce a Windows binary.
Why don't you take a look at Wine? It's for running Windows executables on Linux. Another solution is to use Java or .NET bytecode.
You can run .NET executables on Linux (this requires the Mono runtime).
Also have a look at Agner's objconv (disassembling, converting PE executable to ELF etc.)
http://www.agner.org/optimize/#objconv
Someone actually figured this out. It’s called αcτµαlly pδrταblε εxεcµταblε (APE) and you use the Cosmopolitan C library. The gist is that there’s a way to cause Windows PE executable headers to be ignored and treated as a shell script. The same goes for macOS, which lets you define a single executable. Additionally, they also figured out how to smuggle ZIP into it so that it can incrementally compress the various sections of the file / decompress on run.
https://justine.lol/ape.html
https://github.com/jart/cosmopolitan
Example of a single identical Lua binary running on Linux and Windows:
https://ahgamut.github.io/2021/02/27/ape-cosmo/
Doing such a thing would be rather complicated. It isn't just a matter of the CPU instructions being issued; the compiler also depends on many libraries that will be linked into the code. Those libraries will have to match at run-time or it won't work.
For example, the STL library is a series of templates and library functions. The compiler will inline some constructs and call the library for others. It'd have to be the exact same library to work.
Now, in theory you could avoid using any library and just write in fundamentals, but even there the compiler may make assumptions about how they work, what type of data alignment is involved, calling convention, etc.
Don't get me wrong, it can work. Look at the WINE project and other native drivers from windows being used on Linux. I'm just saying it isn't something you can quickly and easily do.
Far better would be to recompile on each platform.
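As a small illustration of the assumptions a compiler bakes in, the sketch below reports a few sizes and offsets that differ between common 64-bit Linux and 64-bit Windows toolchains (for instance, long is 8 bytes on the former and 4 on the latter). The struct Sample and the function abi_summary are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A struct whose layout depends on the platform's size and alignment of long.
struct Sample {
    char tag;
    long value;
};

// Summarises a few ABI-dependent properties of the current build.
std::string abi_summary() {
    // The compiler pads the struct so that `value` is correctly aligned.
    assert(offsetof(Sample, value) % alignof(long) == 0);
    return "sizeof(long)=" + std::to_string(sizeof(long)) +
           " sizeof(wchar_t)=" + std::to_string(sizeof(wchar_t)) +
           " sizeof(void*)=" + std::to_string(sizeof(void*)) +
           " offsetof(Sample,value)=" + std::to_string(offsetof(Sample, value)) +
           " sizeof(Sample)=" + std::to_string(sizeof(Sample));
}
```

Code compiled on one platform has these values hard-wired into every structure access, so a binary that passes such a struct to code built for the other platform reads the wrong bytes even though both run on the same CPU.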
That is achievable only if you have WINE available on your Linux system. Otherwise, the difference in the executable file format will prevent you from running Windows code on Linux.