How to determine which compiler has been used to compile an executable? - c++

From a compiled file, can I see which compiler has been used to generate the file?

There's also the good old 'strings' utility. Dumps all ascii-ish looking strings it finds in the binary. Different compilers embed different amounts of information in the binaries they produce, but many will actually include obviously identifying strings.

Many compilers/linkers insert a .comment section in the output file which identifies them. There are also a number of more subtle behaviors you could create compiler fingerprints off of, but I know of no existing tools for this.
If you have the source, the easiest solution would be to try compiling with each compiler in question until you get a binary that matches byte-for-byte (or even closely).

In some cases you can run ldd on the binary and find out which standard library it's linked against. For example on Solaris gcc vs Sun CC vs whatever.
For C++ code, you can also dump some of the symbols, find a mangled function name, and then figure out which demangler generates the correct original name.

Try, IDA Pro which identifies the libraries and tools used to build up executable file.

I was answering a quiz in a Blue Team website, and this was a question. I found the solution using a tool called PE Detective, he looks for signatures on the EXE, works really fine
https://www.softpedia.com/get/System/File-Management/PE-Detective.shtml

Related

How to find out which c++ standard used in a binary file?

For example, I have a "helloworld" cpp file named main.cpp.
If I compile it with flag -std=c++11. And I compile it again with flag -std=c++03.
How can I specify which is compiled with c++11 flag between this two?
extra: My specific problem is that I have a third-party lib file, I used it in my code, but I don't know which "-std" flag should I use.
If it is a third party library, then there must be some documentation stating the compilation steps to build from source. Please refer that.
If there is no such thing available, I am assuming that you at least have access to the source code, please look into the implementation (the header files or the source files), you will probably get more than enough information to figure out if it uses code conforming to C++ 11 standard.
#πάνταῥεῖ,I mean compile with different c++ standard won't leave something in the binary filie? - Riopho
If you want to figure out from binary, then I would probably use objdump and disassemble the binary with demangling turned on - objdump -dC <binary_name> - (assuming that you are on Linux, don't know much windows though). you should be able to get some hint from that.
I am not sure if the compiler leaves any traces in the binary though.

Finding all libraries and header files forming a C++ executable

If I have a C++ source file, gcc can give all its dependencies, in a tree structure, using the -H option. But given only the C++ executable, is it possible to find all libraries and header files that went into its compilation and linking?
If you've compiled the executable with debugging symbols, then yes, you can use the symbols to get the files.
If you have .pdb files (Visual studio creates them to store sebugging information separately) you can use all kinds of programs to open them and see the source files and methods.
You can even open it with a text editor and you'll see, among the gibrish, a list of the functions and source files.
If you're using linux (or GNU compilers in general), you can use gdb (again only if you have debug symbols enables in compilation time).
Run gdb on your executable, then run the command: info sources
That's an important reason why you should always remove that flag when going into production. You don't want clients to mess around with your sources, functions, and code.
You cannot do that, because that executable might have been build on a machine on which the header files (or the C++ code, or the libraries) are private or even generated. Also, if a static library is linked in, you have no reliable way to find out.
In practice however, on Linux, using nm or objdump or ldd on the executable will often (but not always) gives you a good clue about the needed libraries.
Also, some executables are dynamically loading a plugin e.g. using dlopen, so your question might not have any sense (since that plugin is known only at runtime).
Notice also that you might not know if an executable is obtained by compiling some C++ code (you might not be able to tell if it was obtained from C, C++, D, or Ocaml, ... source code, or a mixture of them).
On Linux, if you build an executable with static linking and stripping, people won't be able to easily guess the source programming language that you have used.
BTW, on Linux distributions, it is the role of the package management system to deal with such dependencies.
As answered by Yochai Timmer if the executable contains debug information (e.g. in DWARF format) you should be able to get a lot more information.

Utility for analysing symbols in a library file

I'm having some problems with a large static library (.lib) file, and am suspecting code bloat from indiscriminate use of template classes. I want to analyse the symbols in the library to confirm which are making up the bulk of the file size.
When I link my executable against this library, the resulting output is much more sensible, size-wise (about 20Mb), so the linker is obviously stripping out lots of redundant symbols. I want to find out what its removing..
I know I can use dumpbin to generate the symbols and headers, but, with the library in question being pretty large (900Mb), this dump is pretty much unusable without a utility for parsing and reporting on it.
Obviously I could write this myself, but was wondering if anyone can recommend any freeware already available for this?
Is this your own library? If so you can generate a link map that describes the layout of the code in the library, which would give you the info you need here in a more friendly form.
If you don't have source code access to do this, you could use Perl or other open-source scripting tools to crack the dumpbin output.
EDIT: you could also give LibDump a spin, it's downloadable from here. I have not used this myself.
I found one (SymbolSort) that works really well, gives me exactly what I need:

Same binary code on Windows and Linux (x86)

I want to compile a bunch of C++ files into raw machine code and the run it with a platform-dependent starter written in C. Something like
fread(buffer, 1, len, file);
a=((*int(*)(int))buffer)(b);
How can I tell g++ to output raw code?
Will function calls work? How can I make it work?
I think the calling conventions of Linux and Windows differ. Is this a problem? How can I solve it?
EDIT: I know that PE and ELF prevent the DIRECT starting of the executable. But that's what I have the starter for.
There is one (relatively) simple way of achieving some of this, and that's called "position independent code". See your compiler documentation for this.
Meaning you can compile some sources into a binary which will execute no matter where in the address space you place it. If you have such a piece of x86 binary code in a file and mmap() it (or the Windows equivalent) it is possible to invoke it from both Linux and Windows.
Limitations already mentioned are of course still present - namely, the binary code must restrict itself to using a calling convention that's identical on both platforms / can be represented on both platforms (for 32bit x86, that'd be passing args on the stack and returning values in EAX), and of course the code must be fully self-contained - no DLL function calls as resolving these is system dependent, no system calls either.
I.e.:
You need position-independent code
You must create self-contained code without any external dependencies
You must extract the machine code from the object file.
Then mmap() that file, initialize a function pointer, and (*myblob)(someArgs) may do.
If you're using gcc, the -ffreestanding -nostdinc -fPIC options should give you most of what you want regarding the first two, then use objdump to extract the binary blob from the ELF object file afterwards.
Theoretically, some of this is achievable. However there are so many gotchas along the way that it's not really a practical solution for anything.
System call formats are totally incompatible
DEP will prevent data executing as code
Memory layouts are different
You need to effectively dynamically 'relink' the code before you can run it.
.. and so forth...
The same executable cannot be run on both Windows and Linux.
You write your code platform independently (STL, Boost & Qt can help with this), then compile in G++ on Linux to output a linux-binary, and similarly on a compiler on the windows platform.
EDIT: Also, perhaps these two posts might help you:
One
Two
Why don't you take a look at wine? It's for using windows executables on Linux. Another solution for that is using Java or .NET bytecode.
You can run .NET executables on Linux (requires mono runtime)
Also have a look at Agner's objconv (disassembling, converting PE executable to ELF etc.)
http://www.agner.org/optimize/#objconv
Someone actually figured this out. It’s called αcτµαlly pδrταblε εxεcµταblε (APE) and you use the Cosmopolitan C library. The gist is that there’s a way to cause Windows PE executable headers to be ignored and treated as a shell script. Same goes for MacOS allowing you to define a single executable. Additionally, they also figured out how to smuggle ZIP into it so that it can incrementally compress the various sections of the file / decompress on run.
https://justine.lol/ape.html
https://github.com/jart/cosmopolitan
Example of a single identical Lua binary running on Linux and Windows:
https://ahgamut.github.io/2021/02/27/ape-cosmo/
Doing such a thing would be rather complicated. It isn't just a matter of the cpu commands being issued, the compiler has dependencies on many libraries that will be linked into the code. Those libraries will have to match at run-time or it won't work.
For example, the STL library is a series of templates and library functions. The compiler will inline some constructs and call the library for others. It'd have to be the exact same library to work.
Now, in theory you could avoid using any library and just write in fundamentals, but even there the compiler may make assumptions about how they work, what type of data alignment is involved, calling convention, etc.
Don't get me wrong, it can work. Look at the WINE project and other native drivers from windows being used on Linux. I'm just saying it isn't something you can quickly and easily do.
Far better would be to recompile on each platform.
That is achievable only if you have WINE available on your Linux system. Otherwise, the difference in the executable file format will prevent you from running Windows code on Linux.

g++ produces big binaries despite small project

Probably this is a common question. In fact I think I asked it years ago... but I can't remember the answer.
The problem is: I have a project that is composed of 6 source files. All of them no more than 200 lines of code. It uses many STL containers, stdlib.h and iostream. Now the executable is around 800kb in size.... I guess I shouldn't statically link libraries. How to do this with GCC? And in Eclipse CDT?
EDIT:
As I responses away from what I want I think it's the case for a clarification. What I want to know is why such a small program is so big in size and what is the relationship with static, shared libraries and their difference. If it's a too long story to tell feel free to give pointers to docs. Thank you
If you give g++ dynamic library names, and don't pass the -static flag, it should link dynamically.
To reduce size, you could of course strip the binary, and pass the -Os (optimize for size) optimization flag to g++.
One thing to remember is that using the STL results in having that extra code in your executable even if you are dynamically linking with the C++ library. This is by virtue of the fact that the STL is a bunch of templates that aren't actually compiled until you write and compile your code. Since the library can't anticipate what you might store in a container, there's no way for the library to already contain the code for that particular usage of the container. Same goes with algorithms and everything else in the STL.
I'm not saying this is definitely the reason your executable is so much larger than you expect. But it may be a factor.
Use -O3 and -s flags to produce the most optimized binary. Also see this link for some more information.
If you are building for Windows, consider using the Microsoft compiler. It always produces the smallest binary on that platform.
Eclipse should be linking dynamically by default, unless you've set the static flag on the linker in your makefile.
In response to your EDIT :
-when you link statically, the executable contains a full copy of each library you've linked to.
-when you link dynamically, the executable only contains references and hooks to the linked libraries, which is a much much smaller amount of code.
The executable has to contain more than just your code.
At the very least, it contains some startup code, setting up the environment and if necessary, loading any external libraries, before the program launches.
If you've statically linked the runtime library, you also get that included in your executable. Otherwise you only get a small stub, just big enough to redirect system calls to the external runtime.
It may, depending on compiler settings also include a lot of debugging info and other non-essential data. If optimizations are enabled, that may have increased code size as well.
The real question is why does this matter? 800KB still fits easily on a floppy disk!
Most of this is a one-time cost. it doesn't mean that if you write twice as much code, it'll take up 1600KB. More likely, it'll take 810KB or something like that.
Don't worry about one-time startup costs.
The size usually results in static libraries being linked into your application.
You can reduce the size of the compiled binary by compiling to RELEASE versions, with optimizations to binary size.
Another source of executable size are the libraries. You said that you don't use external libraries, except for STD, so I believe you're including the C Runtime with your executable, ie, linking statically. so check for dynamic linking.
IMO you shouldn't really worry about that, but if you're really paranoid, check this: Smallest x86 ELF Hello World
use Visual C++ 6.0
it supported with Windows 95 to Windows 7.
and can be compiled as x86 platforms but only for Windows.
so if you are a Windows user just stick with Windows Compilers other than GCC which is sux actually.most of people who say Visual C++ is sux cause they are Anti-Microsofters.
also remember use "Visual C++ 6.0" if you use a newer one probably you can't run your files on Windows 95. I have tested all those things that's why I said.
GCC produces largest binaries, but Visual C++ not ,Intel Compiler can use to save more than 30% of space but it demands a Intel processor unless performance would be horrible.
another thing u need to remember is when u use templates though you see small lines
when you compiles those functions would be expanded so the result is make larger binaries.
if you need smaller binaries I suggest move to C cause C is actually widely used but not OO
infact C is easy to use than C++
this does make sense then C++ example
cout << "Hello World" << endl;
printf("%s","Hello World");
second one say print field %s means you type a string so it's easy. :P