How to find out which C++ standard was used to compile a binary file?

For example, I have a "helloworld" C++ file named main.cpp.
I compile it once with the flag -std=c++11 and again with -std=c++03.
How can I tell which of the two binaries was compiled with the C++11 flag?
Extra: my specific problem is that I have a third-party library file that I use in my code, but I don't know which -std flag I should use.

If it is a third-party library, there should be some documentation describing the steps to build it from source. Please refer to that.
If no such documentation is available, I assume you at least have access to the source code. Look at the implementation (the header files or the source files); you will probably find more than enough information to tell whether it uses code conforming to the C++11 standard.
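Headers commonly gate C++11-only code on the predefined __cplusplus macro. The compiler sets that macro from the -std flag: 199711L for C++98/03, 201103L for C++11, 201402L for C++14, 201703L for C++17, and 202002L for C++20.

You can use the same macro to leave a trace in binaries you build yourself. This is a minimal sketch that assumes GCC or Clang. MSVC reports 199711L unless /Zc:__cplusplus is given. The names kCxxStandardTag, standard_name and compiled_cplusplus are hypothetical.

```cpp
#include <string>

// Two-step stringification so __cplusplus is expanded before being quoted.
#define CXX_STR2(x) #x
#define CXX_STR(x) CXX_STR2(x)

// Hypothetical marker string; extern gives it external linkage, so the
// compiler emits it even though nothing references it.
extern const char kCxxStandardTag[] = "cxx-standard=" CXX_STR(__cplusplus);

// Maps a __cplusplus value to a rough standard name (ranges are approximate:
// experimental modes such as -std=c++2a report in-between values).
std::string standard_name(long value) {
    if (value >= 202002L) return "C++20 or later";
    if (value >= 201703L) return "C++17";
    if (value >= 201402L) return "C++14";
    if (value >= 201103L) return "C++11";
    return "C++98/03";
}

// The value this translation unit was compiled with.
long compiled_cplusplus() { return __cplusplus; }
```

After building, something like `strings main | grep cxx-standard=` should show the recorded value. This only helps for code you compile yourself. It says nothing about a third-party library that never embedded such a marker.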
@πάνταῥεῖ, I mean: won't compiling with a different C++ standard leave something in the binary file? - Riopho
If you want to figure it out from the binary, I would probably use objdump to disassemble it with demangling turned on: objdump -dC <binary_name> (assuming you are on Linux; I don't know much about Windows). You should be able to get some hints from that.
I am not sure if the compiler leaves any traces in the binary though.
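A library built by GCC with debug information may contain a usable trace after all. GCC writes the compiler version into the DW_AT_producer attribute, and by default it also records the command-line options there. The language word in that attribute usually reflects the standard, for example "GNU C++11 ...". You can list it with readelf --debug-dump=info libfoo.so | grep DW_AT_producer.

A stripped library, or one built by another compiler, may carry nothing useful. The sketch below pulls the language word and any explicit -std= option out of such a line. The helper names are hypothetical, and the sample line format in the test is an assumption.

```cpp
#include <sstream>
#include <string>

// Returns the word after "GNU " (e.g. "C++11") in a DW_AT_producer line, or "" if absent.
std::string producer_language(const std::string& line) {
    std::string::size_type pos = line.find("GNU ");
    if (pos == std::string::npos) return "";
    std::istringstream in(line.substr(pos + 4));
    std::string language;
    in >> language;
    return language;
}

// Returns the last "-std=..." option recorded in the line, or "" if none was recorded.
std::string recorded_std_flag(const std::string& line) {
    std::istringstream in(line);
    std::string token, found;
    while (in >> token) {
        // The last -std= on a command line is the one the compiler honours.
        if (token.compare(0, 5, "-std=") == 0) found = token;
    }
    return found;
}
```

Different translation units of one library can report different values, so check every producer line, not only the first.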

Related

Making sense of .so file: trying to restore poorly versioned source files

If this question is too generic, please tell me so i can delete it.
I have software in operation that links against a .so file. The .so is generated by compiling a set of versioned .c and .cpp sources. The previous developer built the .so from a local copy of the sources that was modified in unknown ways, and the modified sources are god-knows where, if anywhere on the system at all. Fortunately it was compiled with debugging symbols, so reading it with gdb is easier.
The software is in operation and I need to modify it. Recompiling any known version of the sources will obviously produce results that differ from the current build in unknown ways. I want to dig as deep as possible into the current .so file to learn what it does, so that I can recompile sources that produce as similar a result as possible. What I have done so far:
readelf --debug-dump=info path/to/file | grep "DW_AT_producer" to see the compilation flags and reproduce them in new compilations.
(gdb) info functions to see which functions are defined and compare them with previous versions of the code.
Going through the functions listed by the previous command one by one and running: list <function>
Does anyone have more tips on how to get as much information as possible out of the .so file? Since I'm not an expert with gdb yet: am I missing something important?
Edit: by running strip on both files (the one compiled from the original source and the one compiled from the mysterious lost source), I found that most of the differences between them were just debug symbols (which is weird, because it seems both were compiled with the -g option).
There is now only one line of difference between them.
I just found out that list only reads the source file from the binary, so list doesn't help me.
You are confused: the source is never stored in the binary. GDB list command is showing the source as it exists in some file on disk.
The info sources command will show where on disk GDB is reading the sources from.
If you are lucky, that's the sources that were used to build the .so binary, and your task is trivial -- compare them to VCS sources to find modifications.
If you are unlucky, the sources GDB reads have been overwritten, and your task is much harder -- effectively you'll need to reverse-engineer the .so binary.
The way I would approach the harder task: build the library from VCS sources, and then for each function compare disas fn between the two versions of .so, and study differences (if any).
P.S. I hope you are also using the exact same version of the compiler that was used to compile the in-production .so, otherwise your task becomes much harder still.
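Before diffing disassembly function by function, it can help to find the functions that exist in only one of the two builds. This sketch assumes each library's symbol list was saved with nm -C --defined-only, whose lines have the form address, type letter, name. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Parses `nm -C --defined-only` output lines of the form "<address> <type> <name>".
std::set<std::string> defined_symbols(const std::string& nm_output) {
    std::set<std::string> names;
    std::istringstream in(nm_output);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type first = line.find(' ');
        if (first == std::string::npos) continue;
        std::string::size_type second = line.find(' ', first + 1);
        if (second == std::string::npos) continue;
        names.insert(line.substr(second + 1));  // demangled names may contain spaces
    }
    return names;
}

// Names present in `a` but missing from `b` (both sets are already sorted).
std::vector<std::string> only_in(const std::set<std::string>& a,
                                 const std::set<std::string>& b) {
    std::vector<std::string> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
    return out;
}
```

Functions common to both builds can then be compared one at a time, for example with something like gdb -batch -ex "disassemble fn" libfoo.so run against each .so.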

should I symlink the C++ include directory so that it also appears under /usr/include?

I'm using Cygwin32 on Win7 64. I have g++ and libstdc++ installed. The C++ includes are located at /usr/lib/gcc/i686-pc-cygwin/4.8.2/include/c++/tr1/ - but nowhere under /usr/include.
Is it reasonable to place them, via a symlink, under /usr/include? If not, why not? And if so, why isn't this done by default? And what should the symlink be? /usr/include/c++/? Something else?
Note: Yes, I know I can add them to the compiler flags; I'm asking whether it's reasonable to do more than that.
There shouldn't be any need, if you are talking about the standard C++ includes. The g++ version meant to use them knows about that location. Since you might have different gcc versions around (for example, MinGW's), it is better to leave things as they are so as not to confuse other compilers.
If your compiler is having troubles finding its own includes, well, that's entirely another matter.
If you are curious about how and why this location is determined, read here, specifically under the option --enable-version-specific-runtime-libs ... it says something about "using several gcc versions in parallel". You can also check the actual configure script under libstdc++-v3 source code directory...
In my personal experience, when you are creating a single library for a bunch of platforms, you want your (cross-)compilers to be as independent as possible. If every compiler puts its includes in /usr/include/c++... well, that can end badly. In fact, in that particular scenario, it could be reasonable for each compiler to hide its specific header and library files as well as possible...
Just add them to your environment variable CPPFLAGS (or in your makefile):
CPPFLAGS='-I/usr/lib/gcc/i686-pc-cygwin/4.8.2/include/c++/tr1 -I/whatev'

How to determine which compiler has been used to compile an executable?

From a compiled file, can I see which compiler has been used to generate the file?
There's also the good old strings utility. It dumps all ASCII-looking strings it finds in the binary. Different compilers embed different amounts of information in the binaries they produce, but many include obviously identifying strings.
Many compilers/linkers insert a .comment section in the output file which identifies them. There are also a number of more subtle behaviors you could create compiler fingerprints off of, but I know of no existing tools for this.
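The strings approach and the .comment marker can be combined in a few lines. This rough sketch assumes GCC's marker starts with "GCC:" and Clang's contains "clang version". Other toolchains use other markers, and MSVC-built executables generally carry neither. read_file, printable_runs and compiler_tags are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into memory as raw bytes (empty if it cannot be opened).
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Like `strings`: collects runs of printable ASCII at least min_len long.
std::vector<std::string> printable_runs(const std::string& bytes, std::size_t min_len) {
    std::vector<std::string> runs;
    std::string current;
    for (char c : bytes) {
        if (std::isprint(static_cast<unsigned char>(c))) {
            current += c;
        } else {
            if (current.size() >= min_len) runs.push_back(current);
            current.clear();
        }
    }
    if (current.size() >= min_len) runs.push_back(current);
    return runs;
}

// Keeps only the runs that look like compiler identification markers.
std::vector<std::string> compiler_tags(const std::vector<std::string>& runs) {
    std::vector<std::string> tags;
    for (const std::string& s : runs) {
        if (s.find("GCC:") != std::string::npos ||
            s.find("clang version") != std::string::npos)
            tags.push_back(s);
    }
    return tags;
}
```

On ELF files, readelf -p .comment prints that section directly, which is usually the quicker check.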
If you have the source, the easiest solution would be to try compiling with each compiler in question until you get a binary that matches byte-for-byte (or even closely).
In some cases you can run ldd on the binary and find out which standard library it's linked against, for example on Solaris, to tell gcc from Sun CC or others.
For C++ code, you can also dump some of the symbols, find a mangled function name, and then figure out which demangler generates the correct original name.
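On GCC and Clang, the Itanium C++ ABI demangler is available at run time through abi::__cxa_demangle in <cxxabi.h>. This sketch applies the idea heuristically:

- If a mangled name demangles there, it was most likely produced by a GCC/Clang-compatible compiler.
- If a name begins with '?', it looks like MSVC-style mangling.

The helper names are hypothetical, and <cxxabi.h> is not available with MSVC.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled name, or "" if the Itanium C++ ABI demangler rejects it.
std::string demangle_itanium(const std::string& mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        std::free(raw);  // free(nullptr) is a no-op
        return "";
    }
    std::string result(raw);
    std::free(raw);  // __cxa_demangle allocates with malloc
    return result;
}

// Rough heuristic: which mangling scheme does a symbol look like?
std::string mangling_family(const std::string& symbol) {
    if (!demangle_itanium(symbol).empty()) return "Itanium (GCC/Clang-style)";
    if (!symbol.empty() && symbol[0] == '?') return "MSVC-style";
    return "unknown";
}
```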
Try IDA Pro, which identifies the libraries and tools used to build an executable file.
I was answering a quiz on a Blue Team website, and this was one of the questions. I found the solution using a tool called PE Detective; it looks for signatures in the EXE and works really well.
https://www.softpedia.com/get/System/File-Management/PE-Detective.shtml

Same binary code on Windows and Linux (x86)

I want to compile a bunch of C++ files into raw machine code and then run it with a platform-dependent starter written in C. Something like
fread(buffer, 1, len, file);
a = ((int (*)(int))buffer)(b);
How can I tell g++ to output raw code?
Will function calls work? How can I make it work?
I think the calling conventions of Linux and Windows differ. Is this a problem? How can I solve it?
EDIT: I know that PE and ELF prevent the DIRECT starting of the executable. But that's what I have the starter for.
There is one (relatively) simple way of achieving some of this, and that's called "position independent code". See your compiler documentation for this.
Meaning you can compile some sources into a binary which will execute no matter where in the address space you place it. If you have such a piece of x86 binary code in a file and mmap() it (or the Windows equivalent) it is possible to invoke it from both Linux and Windows.
The limitations already mentioned still apply, of course. The binary code must restrict itself to a calling convention that is identical on both platforms, or can be represented on both; for 32-bit x86, that would be passing arguments on the stack and returning values in EAX. The code must also be fully self-contained: no DLL function calls, since resolving these is system dependent, and no system calls either.
I.e.:
You need position-independent code
You must create self-contained code without any external dependencies
You must extract the machine code from the object file.
Then mmap() that file, initialize a function pointer, and (*myblob)(someArgs) may do.
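Here is a sketch of the Linux side of the starter. It assumes the blob is already position-independent and self-contained as described above. It copies the bytes into an anonymous mapping, then switches the pages from writable to executable, so the mapping is never writable and executable at the same time. Some hardened systems may still refuse executable anonymous memory. load_blob and unload_blob are hypothetical names, and a Windows starter would use VirtualAlloc and VirtualProtect instead.

```cpp
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <vector>

// Copies machine code into a fresh mapping and makes it read+execute.
// Returns nullptr on failure. Linux/POSIX only.
void* load_blob(const std::vector<unsigned char>& code) {
    if (code.empty()) return nullptr;
    void* mem = mmap(nullptr, code.size(), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    std::memcpy(mem, code.data(), code.size());
    // Drop write permission before allowing execution.
    if (mprotect(mem, code.size(), PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, code.size());
        return nullptr;
    }
    return mem;
}

// Releases a mapping returned by load_blob.
void unload_blob(void* mem, std::size_t size) {
    if (mem != nullptr) munmap(mem, size);
}
```

Calling the blob would then look like `int (*fn)(int) = reinterpret_cast<int (*)(int)>(mem); int a = fn(b);`. On non-x86 CPUs you may also need to flush the instruction cache before the call, for example with GCC's __builtin___clear_cache.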
If you're using gcc, the -ffreestanding -nostdinc -fPIC options should give you most of what you want regarding the first two points. Afterwards, use objcopy (for example objcopy -O binary -j .text) to extract the raw binary blob from the ELF object file.
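On 64-bit x86 the default conventions differ (Windows x64 versus System V), so the blob and both starters have to agree on one. GCC and Clang let you pick the convention per function with the sysv_abi and ms_abi attributes. This sketch assumes GCC or Clang on both platforms; MSVC is not covered. BLOB_ABI, add_one and blob_fn are hypothetical names.

```cpp
// Pin the calling convention so the Linux- and Windows-built halves agree.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define BLOB_ABI __attribute__((sysv_abi))
#else
#define BLOB_ABI  // other targets: fall back to the platform default
#endif

// A function compiled into the blob; extern "C" avoids name mangling.
extern "C" BLOB_ABI int add_one(int x) { return x + 1; }

// The pointer type the starter casts the blob to; it must carry the same attribute.
typedef int (BLOB_ABI *blob_fn)(int);
```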
Theoretically, some of this is achievable. However there are so many gotchas along the way that it's not really a practical solution for anything.
System call formats are totally incompatible
DEP will prevent data executing as code
Memory layouts are different
You need to effectively dynamically 'relink' the code before you can run it.
.. and so forth...
The same executable cannot be run on both Windows and Linux.
You write your code in a platform-independent way (STL, Boost & Qt can help with this), then compile it with g++ on Linux to produce a Linux binary, and similarly with a compiler on Windows.
EDIT: Also, perhaps these two posts might help you:
One
Two
Why don't you take a look at Wine? It's for running Windows executables on Linux. Another solution is to use Java or .NET bytecode.
You can run .NET executables on Linux (requires mono runtime)
Also have a look at Agner's objconv (disassembling, converting PE executable to ELF etc.)
http://www.agner.org/optimize/#objconv
Someone actually figured this out. It's called αcτµαlly pδrταblε εxεcµταblε (APE), and you use it with the Cosmopolitan C library. The gist is that there's a way to make the Windows PE executable header also be interpreted as a shell script on other systems. The same goes for macOS, which lets you ship a single executable. They also figured out how to smuggle ZIP into it, so that the various sections of the file can be incrementally compressed and decompressed on run.
https://justine.lol/ape.html
https://github.com/jart/cosmopolitan
Example of a single identical Lua binary running on Linux and Windows:
https://ahgamut.github.io/2021/02/27/ape-cosmo/
Doing such a thing would be rather complicated. It isn't just a matter of the cpu commands being issued, the compiler has dependencies on many libraries that will be linked into the code. Those libraries will have to match at run-time or it won't work.
For example, the STL library is a series of templates and library functions. The compiler will inline some constructs and call the library for others. It'd have to be the exact same library to work.
Now, in theory you could avoid using any library and just write in fundamentals, but even there the compiler may make assumptions about how they work, what type of data alignment is involved, calling convention, etc.
Don't get me wrong, it can work. Look at the Wine project, and at native Windows drivers being used on Linux. I'm just saying it isn't something you can do quickly and easily.
Far better would be to recompile on each platform.
That is achievable only if you have WINE available on your Linux system. Otherwise, the difference in the executable file format will prevent you from running Windows code on Linux.

How to view source code of header file in C++?

Similar to iostream.h, conio.h, ...
The C++ standard library is largely templates, so you can just open the desired header and see how it's implemented†. Note that it's not <iostream.h>, it's <iostream>; the C++ standard library headers do not have .h extensions. C libraries like <string.h> can be included as <cstring> (though that generally just includes string.h).
That said, the run-time library (stuff like the C library, not-template-stuff) is compiled. You can search around your compiler install directory to find the source-code to the run-time library.
Why? If just to look, there you go. But it's a terrible way to try to learn, as the code may have non-standard extensions specific to the compiler, and most implementations are just generally ugly to read.
If you have a specific question about the inner-workings of a function, feel free to start a new question and ask how it works.
† I should mention that you may, on the off chance, have a compiler that supports export (exported templates, a C++03 feature removed in C++11). That would mean templated code could also be compiled; this is highly unlikely, though. I mention it only for completeness.
From a comment you added, it looks like you're looking for the source to the implementations of functions that aren't templates (or aren't in the header file for whatever reason). The more traditional runtime library support is typically separately compiled and in a library file that gets linked in to your program.
The majority of compilers provide the source code for the library (though it's not guaranteed to be available), but the source files might be installed anywhere on your system.
For the Microsoft compilers I have installed, I can find the source for the runtime in a directory under the Visual Studio installed location named something like:
vc\crt\src // VS2008
vc7\crt\src // VS2003
vc98\crt\src // VC6
If you're using some other compiler, poke around its installation directory (and make sure you asked for the runtime sources to be installed when you installed your compiler tools).
As mentioned, it is implementation specific but there is an easy way to view contents of header files.
Run only the preprocessor; for gcc and g++ this is the -E option.
This replaces each #include directive with the actual content of the header, so you can see it.
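The -E output also records where each piece came from, in linemarker lines of the form # <line> "<file>" <flags>. You can therefore list which header files were actually read and then open them directly. This sketch assumes GCC-style linemarkers; passing -P suppresses them. included_files is a hypothetical name.

```cpp
#include <set>
#include <sstream>
#include <string>

// Collects the file names mentioned in GCC-style linemarkers of `g++ -E` output.
std::set<std::string> included_files(const std::string& preprocessed) {
    std::set<std::string> files;
    std::istringstream in(preprocessed);
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() < 2 || line[0] != '#' || line[1] != ' ') continue;
        std::string::size_type open = line.find('"');
        if (open == std::string::npos) continue;
        std::string::size_type close = line.find('"', open + 1);
        if (close == std::string::npos) continue;
        std::string name = line.substr(open + 1, close - open - 1);
        // Skip pseudo-files such as "<built-in>" and "<command-line>".
        if (!name.empty() && name[0] != '<') files.insert(name);
    }
    return files;
}
```

For example, g++ -E main.cpp > main.ii produces the text to feed in; the resulting paths point at the header files on disk.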
On linux, you can find some of them in /usr/include
These files merely contain declarations and macro definitions. The actual implementation source files can be obtained from the library provider, e.g. the source code of the standard C++ library (libstdc++) is obtainable here.
According to the C++ language specification, implementors do not have to put standard headers into physical files. Implementors are allowed to have the headers hard coded in the translator's executable.
Thus, you may not be able to view the contents of standard header files.