How to retrieve flat binary from an executable file in C

How to retrieve flat binary from an executable file in C - c++

How to retrieve a block of binary from .text section in an executable?
I know objcopy can help by using:
objcopy --only-section=.text --output-target binary a.out a.out.bin
But it would be much better if I can realize the same goal within a function call using BFD library. Is there any way to call objcopy using function calls?

You probably are looking for function in binutils / bfd libs. You can find doc at http://www.delorie.com/gnu/docs/binutils/bfd_toc.html and I think the function you are looking for is:
boolean bfd_get_section_contents (bfd *abfd, asection *section,
PTR location, file_ptr offset,
bfd_size_type count);
whose doc can be find at http://www.delorie.com/gnu/docs/binutils/bfd_57.html

Related

How to generate dwarf information?

I'm new to dwarf format and trying to generate dwarf info.
I'm using windows.
I used this command gcc -c --debug=dwarf file_name and it is generating file in binary format.
So is this the right way?
is there any efficient way?
also how can I convert this binary file to readable format?
I need to access only function and variable details, So is there a way to only generate this info?

How to use dlopen() to get the executables path

I am trying to use dlopen() and dlinfo() to get the path my executable. I am able to get the path to a .so by using the handle returned by dlopen() but when I use the handle returned by dlopen(NULL,RTLD_LAZY); then the path I get back is empty.
void* executable_handle = dlopen(0, RTLD_LAZY);
if (nullptr != executable_handle)
{
char pp_linkmap[sizeof(link_map)];
int r = dlinfo(executable_handle, RTLD_DI_LINKMAP, pp_linkmap);
if (0 == r)
{
link_map* plink = *(link_map**)pp_linkmap;
printf("path: %s\n", plink->l_name);
}
}
Am I wrong in my assumption that the handle for the executable can be used in the dlinfo functions the same way a .so handle can be used?

Am I wrong in my assumption that the handle for the executable can be used in the dlinfo functions the same way a .so handle can be used?
Yes, you are.
The dynamic linker has no idea which file the main executable was loaded from. That's because the kernel performs all mmaps for the main executable, and only passes a file descriptor to the dynamic loader (who's job it is to load other required libraries and star the executable running).
I'm trying to replicate some of the functionality of GetModuleFileName() on linux
There is no reliable way to do that. In fact the executable may no longer exist anywhere on disk at all -- it's perfectly fine to run the executable and remove the executable file while the program is still running.
Also hard links mean that there could be multiple correct answers -- if a.out and b.out are hard linked, there isn't an easy way to tell whether a.out or b.out was used to start the program running.
Your best options probably are reading /proc/self/exe, or parsing /proc/self/cmdline and/or /proc/self/maps.

The BSD utility library has a function getprogname(3) that does exactly what you want. I'd suggest that is more portable and easier to use than procfs in this case.

Disable relocations when linking with LLD

Is there an option for lld that will tell it not to perform relocations. I don't want PIC code, I just want relocations not to be performed. (Yes I know this will result in an executable that doesn't work.)

Turns out to be an easy and fairly obvious solution - just pass -r or --relocatable. Then it won't apply relocations but will store them in the output file instead.
Edit: Unfortunately this does not quite have the effect I want, because you can't use --gc-sections and --relocatable at the same time.

C++ binary identification (manifest)

We have a large set of C++ projects (GCC, Linux, mostly static libraries) with many dependencies between them. Then we compile an executable using these libraries and deploy the binary on the front-end. It would be extremely useful to be able to identify that binary. Ideally what we would like to have is a small script that would retrieve the following information directly from the binary:
$ident binary
$binary : Product=PRODUCT_NAME;Version=0.0.1;Build=xxx;User=xxx...
$ dependency: Product=PRODUCT_NAME1;Version=0.1.1;Build=xxx;User=xxx...
$ dependency: Product=PRODUCT_NAME2;Version=1.0.1;Build=xxx;User=xxx...
So it should display all the information for the binary itself and for all of its dependencies.
Currently our approach is:
During compilation for each product we generate Manifest.h and Manifest.cpp and then inject Manifest.o into binary
ident script parses target binary, finds generated stuff there and prints this information
However this approach is not always reliable for different versions of gcc..
I would like to ask SO community - is there better approach to solve this problem?
Thanks for any advice

One of the catches with storing data in source code (your Manifest.h and .cpp) is the size limit for literal data, which is dependent on the compiler.
My suggestion is to use ld. It allows you to store arbitrary binary data in your ELF file (so does objcopy). If you prefer to write your own solution, have a look at libbfd.
Let us say we have a hello.cpp containing the usual C++ "Hello world" example. Now we have the following make file (GNUmakefile):
hello: hello.o hello.om
$(LINK.cpp) $^ $(LOADLIBES) $(LDLIBS) -o $#
%.om: %.manifest
ld -b binary -o $# $<
%.manifest:
echo "$#" > $#
What I'm doing here is to separate out the linking stage, because I want the manifest (after conversion to ELF object format) linked into the binary as well. Since I am using suffix rules this is one way to go, others are certainly possible, including a better naming scheme for the manifests where they also end up as .o files and GNU make can figure out how to create those. Here I'm being explicit about the recipe. So we have .om files, which are the manifests (arbitrary binary data), created from .manifest files. The recipe states to convert the binary input into an ELF object. The recipe for creating the .manifest itself simply pipes a string into the file.
Obviously the tricky part in your case isn't storing the manifest data, but rather generating it. And frankly I know too little about your build system to even attempt to suggest a recipe for the .manifest generation.
Whatever you throw into your .manifest file should probably be some structured text that can be interpreted by the script you mention or that can even be output by the binary itself if you implement a command line switch (and disregard .so files and .so files hacked into behaving like ordinary executables when run from the shell).
The above make file doesn't take into account the dependencies - or rather it doesn't help you create the dependency list in any way. You can probably coerce GNU make into helping you with that if you express your dependencies clearly for each goal (i.e. the static libraries etc). But it may not be worth it to take that route ...
Also look at:
C/C++ with GCC: Statically add resource files to executable/library and
Is there a Linux equivalent of Windows' "resource files"?
If you want particular names for the symbols generated from the data (in your case the manifest), you need to use a slightly different route and use the method described by John Ripley here.
How to access the symbols? Easy. Declare them as external (C linkage!) data and then use them:
#include <cstdio>
extern "C" char _binary_hello_manifest_start;
extern "C" char _binary_hello_manifest_end;
int main(int argc, char** argv)
{
const ptrdiff_t len = &_binary_hello_manifest_end - &_binary_hello_manifest_start;
printf("Hello world: %*s\n", (int)len, &_binary_hello_manifest_start);
}
The symbols are the exact characters/bytes. You could also declare them as char[], but it would result in problems down the road. E.g. for the printf call.
The reason I am calculating the size myself is because a.) I don't know whether the buffer is guaranteed to be zero-terminated and b.) I didn't find any documentation on interfacing with the *_size variable.
Side-note: the * in the format string tells printf that it should read the length of the string from the argument and then pick the next argument as the string to print out.

You can insert any data you like into a .comment section in your output binary. You can do this with the linker after the fact, but it's probably easier to place it in your C++ code like this:
asm (".section .comment.manifest\n\t"
".string \"hello, this is a comment\"\n\t"
".section .text");
int main() {
....
The asm statement should go outside any function, in this instance. This should work as long as your compiler puts normal functions in the .text section. If it doesn't then you should make the obvious substitution.
The linker should gather all the .comment.manifest sections into one blob in the final binary. You can extract them from any .o or executable with this:
objdump -j .comment.manfest -s example.o

Have you thought about using standard packaging system of your distro? In our company we have thousands of packages and hundreds of them are automatically deployed every day.
We are using debian packages that contain all the neccessary information:
Full changelog that includes:
authors;
versions;
short descriptions and timestamps of changes.
Dependency information:
a list of all packages that must be installed for the current one to work correctly.
Installation scripts that set up environment for a package.
I think you may not need to create manifests in your own way as soon as ready solution already exists. You can have a look at debian package HowTo here.

call stack for code compiled without -g option (gcc compiler)

How do I analyze the core dump (using gdb)
which is not compiled with -g GCC option ?

Generate a map file. The map file will tell you the address that each function starts at (as an offset from the start of the exe so you will need to know the base address its loaded too). So you then look at the instruction pointer and look up where it falls in the map file. This gives you a good idea of the location in a given function.
Manually unwinding a stack is a bit of a black art, however, as you have no idea what optimisations the compiler has performed. When you know, roughly, where you are in the code you can generally work out what ought to be on the stack and scan through memory to find the return pointer. its quite involved however. You effectively spend a lot of time reading memory data and looking for numbers that look like memory addresses and then checking that to see if its logical. Its perfectly doable and I, and I'm sure many others, have done it lots of times :)

With ELF binaries it is possible to separate the debug symbols into a separate file. Quoting from objcopy man pages:
Link the executable as normal (using the -g flag). Assuming that is is called foo then...
Run objcopy --only-keep-debug foo foo.dbg to create a file containing the debugging info.
Run objcopy --strip-debug foo to create a stripped executable.
Run objcopy --add-gnu-debuglink=foo.dbg foo to add a link to the debugging info into the stripped executable.

that should not be a problem , you can compile the source again with -g option and pass gdb the core and the new compiled debug binary, it should work without any problem.
BTW You can generate a map file with the below command in gcc
gcc -Wl,-Map=system.map file.c
The above line should generate the map file system.map, Once the map file is generated you can map the address as mentioned above but i am not sure how are you going to map the share library, this is very difficult

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js