Exporting functions from an executable using a def file - c++

There is plenty of information available about how to export functions from a dll (which I've done many times), but I heard that it's also possible to export functions from an executable, so that an external dll can call them.
Although I've managed to get this working, it seems as though there's some problem with the entry point:
If it is not explicitly set, then it defaults to the wrong "main" in an obscure sub-library.
If it is explicitly set, then its input arguments, argc and argv get corrupted (argc can be ~20000000 or ~-700000).
I'm having trouble finding any documentation on exporting functions from an executable - should I be taking the hint and not doing it?
[Context: This was part of an effort to make our process work on both Windows and Linux. The Linux version was accidentally picking up functions from the executable, rather than ones explicitly exported from an attendant dll (the functions had the same name, but different args). We decided to try to run with this, and export the functions from the executable on Windows as well.]

I'm posting this just to summarise my own learning on this, in absence of a better answer:
Immediately after adding the def file, the linker complained that it couldn't determine an entry point. It was for this reason that I added the /ENTRY reference. During rework, however, I removed the /ENTRY while removing the def file, and I could compile without error - I must have removed a subtly conflicting option in the meantime.
The def file does export the functions from the .exe successfully, and these can then be used in a dll of that process (if it links to DelayImp.lib and the executable's .lib).
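For anyone who prefers to keep the export next to the code, here is a minimal sketch of the source-side equivalent. It swaps the def file for `__declspec(dllexport)`, which as far as I know makes the linker produce an export table and an import .lib for the executable in the same way. `HOST_API` and `host_clamp` are hypothetical names, and the non-Windows branch is only there so the snippet compiles elsewhere.

```cpp
#include <cassert>

// Hypothetical macro: marks a function as exported from the module defining it.
#if defined(_WIN32)
#  define HOST_API extern "C" __declspec(dllexport)
#else
#  define HOST_API extern "C" __attribute__((visibility("default")))
#endif

// Defined in the executable. extern "C" keeps the exported name unmangled,
// so a dll can refer to it by its plain name.
HOST_API int host_clamp(int value, int low, int high)
{
    assert(low <= high); // precondition of this illustrative function
    if (value < low) return low;
    if (value > high) return high;
    return value;
}
```

The dll then declares `host_clamp` as imported and links against the executable's .lib, as described above.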
I was never able to get the /ENTRY option to work satisfactorily, and combined with the mildly discouraging comments on the MSDN item [https://msdn.microsoft.com/en-us/library/f9t8842e.aspx], I see no reason to use it in this case.
I hope that this is of some use to anybody else attempting to do something similar. I will be happy to re-designate a more technical answer as "the solution", should one appear...

Related

Retrieve functions (name, return type, parameters) from unknown dll (c++)

I need to work with a .dll (c++ 64bit) which is made by one of my company's vendors, and they no longer exist. I don't have the header file (.h).
I follow this:
Get the function prototypes from an unknown .dll
I was able to get the function name, but I don't know what the function return type and parameters are.
I tried dependency walker too, same result.
This dll is not a public dll, it is customized for our company to use, so the functions inside cannot be searched online.
Are there any tools or methods that can help me retrieve this info?
I need to work with a .dll (c++ 64bit) which is made by one of my company's vendors, and they no longer exist. I don't have the header file (.h).
In a professional setting, that is a sign that you should immediately give up on using that library, and find and use something else. Yes, that costs money and time (so advise your manager and/or client). You might prefer using some free software library instead next time (because then you could at least continue maintaining that library, even if its vendor disappeared).
You could consider that C++ usually does name mangling and try to reverse-engineer something (but you won't be able to reliably guess the fields inside classes or other type-related information). I don't recommend trying, since that could take a lot of time and money and you won't be able to guess everything (so you don't know if you'll guess enough information to use that stuff).
(what happens to you is a project mismanagement mistake, not a developer's mistake)
Using a C++ library as a developer (even for a few minutes) without full headers and documentation for it is just crazy. Don't do that.
So there is no technical solution to your issue. Decompilation is generally impossible (it costs a lot and usually doesn't work well).
Dependency walker should work for you. Here is an overview of how the functions, return types and arguments are displayed:
[dependency walker overview] https://kb.froglogic.com/download/attachments/2457645/dependency_walker_provided_information.png?version=1&modificationDate=1341338967000
Remember that you have to use a different version of dependency walker for 32bit or 64bit dll.
What is more, if you see some strange symbols in your function parameters like #,# etc..., then you are missing some dependent dll.

C++: Include vs LoadLibrary()

I am having some trouble understanding why both #include and LoadLibrary() are needed in C++. In C++ "#include" forces the pre-processor to replace the #include line with the contents of the file you are including (usually a header file containing declarations). As far as I understand, this enables me to use the routines I might want in the external libraries the headers belong to.
Why do I then need LoadLibrary()? Can't I just #include the library itself?
Just as a side note: In C#, which I am more familiar with, I just Add a Reference to a DLL if I want to use types or routines from that DLL in my program. I do not have to #include anything, as the .NET framework apparently automatically searches all the referenced assemblies for the routines I want to use (as specified by the namespace)
Thank you very much in advance.
Edit: Used the word "definitions", but meant "declarations". Now fixed.
Edit 2: Tough to pick one answer, many good replies. Thanks for all contributions.
C++ uses a full separate compilation model; you can even compile
against code which hasn't been written. (This often occurs in
large projects.) When you include a file, all you are doing is
telling the compiler that the functions, etc. exist. You do not
provide an implementation (except for inline functions and
templates). In order to execute the code, you have to provide
the implementation, by linking it into your application. This
can occur in several different ways:
You have the source files; you compile them along with your
sources, and link in the resulting objects.
You have a static library; you must link against it.
You have a dynamic library. Here, what you must do will
depend on the implementation: under Windows, you must link
against a .lib stub, and put the .dll somewhere where the
runtime will find it when you execute. (Putting it in the same
directory as your application is usually a good solution.)
I don't quite understand your need to call LoadLibrary. The
only time I've needed this is when I've intentionally avoided
using anything in the library directly, and wanted to load it
conditionally, using GetProcAddress to get the addresses of the
functions I need.
EDIT:
Since I was asked to clarify "linking": program translation
(from the source to an executable) takes place in a number of
steps. In traditional terms, each translation unit is
"compiled" into an object file, which contains an image of the
machine instructions, but with unfilled spaces for external
references. For example, if you have:
extern void function();
in your source (probably via inclusion of a header), and you
call function, the compiler will leave the address field of
the call instruction blank, since it doesn't know where the
function will be located. Linking is the process of taking all
of the object files, and filling in these blanks. One of the
object files will define function, and the linker will
establish the actual address in the memory image, and fill in
the blank referring to function with the address of function
in that image. The result is a complete memory image of the
executable. On the early systems I worked on: literally. The
OS would simply copy the executable file directly into memory,
and then jump into it. Things like virtual memory and shared,
write protected code segments make this a little more
complicated today, but for statically linked libraries or object
files (my first two cases above), the differences aren't that
great.
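A small sketch of the declaration/definition split described above. The name `answer` is made up for illustration, and both halves sit in one file only so that the snippet is self-contained; normally the definition would live in another object file or library.

```cpp
#include <cassert>

// What a header would contribute: a declaration only. The compiler learns the
// name and signature, but no code for the function is generated here.
extern int answer();

// The caller compiles against the declaration alone; the call target is one of
// the "blanks" that the linker fills in later.
int callThroughDeclaration()
{
    return answer() + 1;
}

// What another object file or library would contribute: the definition.
int answer()
{
    return 41;
}
```

If the definition were missing from every object file and library given to the linker, the compile step would still succeed and the failure would only show up as an unresolved external at link time.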
Modern system technologies have blurred the lines somewhat. For
example, most Java (and I think C#) compilers don't generate
classical object files, with machine code, but rather byte code,
and the compile and link phases, above, don't take place until
runtime. Some C++ compilers also only generate byte code, which
will be compiled when the code is "linked". This is done to
permit cross-module optimizations. And all modern systems
support dynamic linking: some of the blank addresses are left
blank until execution time. And dynamic linking can be implicit
or explicit: when it is implicit, the link phase will insert
information into the executable concerning the libraries it
needs, and where to find them, and the OS will link them,
implicitly, either when the executable is loaded, or on demand,
triggered by the code attempting to use one of the unfilled
address slots. When it is explicit, you normally don't have any
explicit references to the name in your code. In the case of
function, above, for example, you wouldn't have any code which
directly called function. Your code would, however, load the
dynamic library using LoadLibrary (or dlopen under Unix),
then request the address of a name, using GetProcAddress (or
dlsym), and call the function indirectly through the pointer
it received.
The #include directive is, like all preprocessor functionality, merely a text replacement. The text "#include <file>" is replaced with the contents of that file.
Typically (but not necessarily), this is used to include a header file which declares the functions that you want to use, i.e. you tell the compiler (which runs after the preprocessor) how some functions that you intend to use are named, what parameters they take, and what the return type is. It does not define what the function is actually doing.
You also need an implementation of these functions. Usually, if you do not implement them in your program, you leave this task to the link stage. You give the linker a list of libraries that your program depends on, and the linker works out in some implementation-defined way (such as an "import library") what it needs to do to "make it work". The linker will produce some glue code and write some information into the executable that will make the loader automatically load the required libraries. Everything "just works" without you having to do anything special.
In some cases, however, you want to postpone the linker stage and do the loading "fully dynamically" by hand rather than automatically. This is when you have to call LoadLibrary() and GetProcAddress. The former brings the DLL into memory and does some setup (e.g. relocation), the latter gives you the address of a function that you want to call.
The #include in your code is still necessary so the compiler knows what to do with that pointer. Otherwise, you could of course call the obtained function via its address, but it would not be possible to call the function in a meaningful way.
One reason why one would want to load a library manually (using LoadLibrary) is that it is more failsafe. If you link a program against a library and the library cannot be found (or a symbol cannot be found), then your application will not start up and the user will see a more or less obscure error message.
If LoadLibrary fails or GetProcAddress doesn't work, your program can in principle still run, albeit with reduced functionality.
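A rough sketch of that fallback pattern, under some assumptions: `loadUnary` and `applyOrDefault` are hypothetical helper names, the function signature is one I picked for illustration, and the `dlopen`/`dlsym` branch is only there so the same code builds outside Windows. The function pointer type plays the role that the header normally plays: it tells the compiler how to call the address it gets back.

```cpp
#include <cassert>
#include <string>
#if defined(_WIN32)
#  include <windows.h>
#else
#  include <dlfcn.h>
#endif

// Signature the caller expects; in real code this comes from the library's header.
using UnaryFn = double (*)(double);

// Tries to load `library` and look up `symbol`. Returns nullptr on any failure.
// The module is deliberately never unloaded in this sketch.
UnaryFn loadUnary(const std::string& library, const std::string& symbol)
{
#if defined(_WIN32)
    HMODULE module = LoadLibraryA(library.c_str());
    if (module == nullptr) return nullptr;
    return reinterpret_cast<UnaryFn>(GetProcAddress(module, symbol.c_str()));
#else
    void* module = dlopen(library.c_str(), RTLD_NOW);
    if (module == nullptr) return nullptr;
    return reinterpret_cast<UnaryFn>(dlsym(module, symbol.c_str()));
#endif
}

// Uses the loaded function if available, otherwise carries on with a
// built-in default (here: return the input unchanged).
double applyOrDefault(const std::string& library, const std::string& symbol, double x)
{
    UnaryFn fn = loadUnary(library, symbol);
    return fn != nullptr ? fn(x) : x;
}
```

With implicit linking, a missing library would stop the program before `main` even runs; here the caller just gets the default behaviour.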
Another example for using LoadLibrary might be to load an alternative version of a function from a different library (some programs implement "plugins" that way). The function "looks" the same to the compiler, as declared in the include file, but may behave differently, depending on what is in the loaded binary.
#include brings in source code only: symbol declarations for the compiler. A library (or a DLL) is object code: Use either LoadLibrary or link to a lib file to bring in object code.
LoadLibrary() causes the code module to be loaded from disk into your application's memory space for execution. This allows for dynamically loading code at runtime. You would not use LoadLibrary(), for example, if the code you want to use is compiled into a statically linked library. In that case you would provide the name of the .lib file that contains the code to the linker and it gets resolved at link time: the code is linked into your .exe, and the .lib does not need to be distributed with the .exe for it to execute.
LoadLibrary() creates a dependency on an external DLL which must be present on the path provided to the method call in order for the .exe to properly execute. If LoadLibrary() fails, you must ensure your code will handle it appropriately, by either exiting gracefully or providing some other execution alternative. If you instead link to the DLL implicitly, you provide a .lib file to the linker the same as you would for the static library above. This .lib file however does not contain code, just entry points for the actual code that resides in the .dll.
In both cases you must #include the headers for the code you wish to execute. This is required by the compiler in order to build function call signatures properly based on the type information provided by the header.
C# assemblies contain both type information and IL. A single reference is sufficient to satisfy the need for header information and binding to the code itself.
#include is static, the substitution is done at compile time. LoadLibrary() lets you load a DLL at runtime, for example based on user input.

How much source information is stored in c++ executables

Some days ago I accidentally opened a C++ executable of a commercial application in Notepad++ and found out that there's quite a lot information about the original source code stored in the executable.
Inside the executable I could find file names (app.c, dlgstat.c, ...), function names (GetTickCount, DispatchMessageA, ...) and small pieces of source code, mostly conditions (szChar != TEXT('\0'), iRow < XTGetRows( hwndList )). After that I checked another QT executable and: yes again source file names and method signatures.
Because of that I am wondering how much source code information is really stored in a C/C++ executable (e.g., compiled using QT or MinGW). Is this probably some kind of debug build still containing the original source? Is this information used for some reflection stuff? Is there any reason why publishers don't remove this stuff?
How much source code information is really stored in a C/C++ executable?
In practice, not much. The source code is not required at runtime. The strings you name come from two things:
The function names (e.g. GetTickCount) are the names of functions imported from other modules. The names are required at runtime because the functions are resolved dynamically (by calling GetProcAddress with the function name).
The conditions are likely assertions: the assert macro stringizes its argument so that when it fires you know what condition was not met.
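To illustrate the second point, a minimal sketch of how an assert-like macro puts the condition's text into the binary. `CONDITION_TEXT` is a hypothetical macro, not the real `assert` from `<cassert>`, but the stringizing step (`#cond`) is the same mechanism.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical assert-like helper: #cond turns the condition's source text
// into a string literal, which the compiler stores in the executable.
#define CONDITION_TEXT(cond) #cond

// The condition is only stringized here, never evaluated, so iRow does not
// even need to exist as a variable.
const char* failedConditionMessage()
{
    return CONDITION_TEXT(iRow < 10);
}
```

Opening the resulting executable in a text or hex editor would show the condition text among the other string literals.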
If you build a DLL, it will also contain the names of all of the functions it exports, so they can be resolved at runtime (the same is likely true for other shared object formats).
Debug symbols may also contain some of the original source code, though it depends on the format used by the debug symbols. These symbols may be contained either in the binary itself or in an auxiliary file (for example, .pdb files used on Windows).
Windows function names: they probably are there just because they are being accessed dynamically - somewhere in your program there's a GetProcAddress to get their address. Still, no reason to worry, every application uses WinAPIs, so there's not much to discover about your executable from that information.
Conditions: probably from some assert-like macro; they are included to allow assert to print what failed condition triggered the failed assertion. Anyhow, in release mode assertions should be removed automatically.
Source file names and method signatures: probably from some usage of __FILE__ and __func__ macros; probably, again, from assert.
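A short sketch of that last case; the function names are made up for illustration. Both `__FILE__` and `__func__` expand to character data that the compiler has to store in the executable.

```cpp
#include <cassert>
#include <cstring>

// __func__ is a predefined static array holding the enclosing function's name.
const char* currentFunctionName()
{
    return __func__;
}

// __FILE__ expands to a string literal with the source file's name as the
// compiler saw it (often including part of the build path).
const char* currentFileName()
{
    return __FILE__;
}
```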
Another source of information about the inner structure of your program is RTTI, which has to provide some representation for every type that typeid could be working on. If you don't need its functionality, you can disable it (but I don't know if that is possible in Qt projects).
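As a rough illustration of the RTTI point (the class names are hypothetical): for a polymorphic type, `typeid` must be able to produce a name at runtime, so some representation of each class name ends up in the binary. The exact text returned by `name()` is implementation-defined, often a mangled form.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

struct Widget { virtual ~Widget() = default; };
struct Button : Widget {};

// Looks up the dynamic type of the object through RTTI and returns its
// implementation-defined name string.
const char* dynamicTypeName(const Widget& widget)
{
    return typeid(widget).name();
}
```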
Mixed into the binary of a C++ app you will find the names of most global symbols (and debugging symbols if enabled in the compiler), but with extra 'decoration text' that encodes the calling signature of the symbol if it is a function or method. Likewise, the literals of character strings are embedded in clear text. But nowhere will you find anything like the actual source code that the compiler used to create the binary executable. That information is lost during the compilation process, and it is especially hard to reverse engineer if C++ templates are employed in the build.

Hide or remove unwanted strings from windows executable release

I have this habit: whenever a C++ project is compiled and the release is built, I open the .EXE with a hexadecimal editor (usually HxD) and have a look at the binary information.
What I hate most and try to find a solution for is the fact that somewhere in the string table, relevant (at least, from my point of view) information is offered. Maybe for other people this sounds like a schizophrenia obsession but I just don't like when my executable contains, for example, the names of all the Windows functions used in the application.
I have tried many compilers to see which of them published the least information. For example, GCC leaves all this in all of its produced final exe
libgcj_s.dll._Jv_RegisterClasses....\Data.ald.rb.Error.Data file is corrupt!
....Data for the application not found!.€.#.ř.#.0.#.€.#.°.#.p.#.p.#.p.#.p.#.
¸.#.$.#.€.#°.#.std::bad_alloc..__gnu_cxx::__concurrence_lock_error.__gnu_cxx
::__concurrence_unlock_error...std::exception.std::bad_exception...pure virt
ual method called..../../runtime/pseudo-reloc.c....VirtualQuery (addr, &b, s
ize of(b))............................/../../../gcc-4.4.1/libgcc/../gcc/conf
ig/i386/cygming-shared-data.c...0 && "Couldn't retrieve name of GCClib share
d data atom"....ret->size == sizeof(__cygming_shared) && "GCClib shared data
size mismatch".0 && "Couldn't add GCClib shared data atom".....-GCCLIBCYGMI
NG-EH-TDM1-SJLJ-GTHR-MINGW32........
Here, you can see what compiler I used, and what version. Now, a few lines below you can see a list with every Windows function I used, like CreateMainWindow, GetCurrentThreadId, etc.
I wonder if there are ways of not displaying this, or encrypting, obfuscating it.
With Visual C++ this information is not published. On the other hand, it is not as portable as GCC, whose executables run even across Windows systems such as 7 and XP without the C++ run-time, frameworks or whatever else programs compiled with VC++ need. Moreover, the VC++ executables also contain the entry points to the Windows functions used in the application.
I know that even NASM, for example, saves the name of the called Windows functions, so it looks like it's a Windows issue. But maybe they can be encrypted or there's some trick to not show them.
I will have a look over the GCC source code to see where are those strings specified to be saved in the executables - maybe that instruction can be skipped or something.
Well, this is one of my last paranoia and maybe it can be treated some way. Thanks for your opinions and answers.
If you compile with -nostdlib then the GCC stuff should go away but you also lose some of the C++ support and std::*.
On Windows you can create an application that only links to LoadLibrary and GetProcAddress and at runtime it can get the rest of the functions you need (The names of the functions can be stored in encrypted form and you decrypt the string before passing it to GetProcAddress) Doing this is a lot of work and the Windows loader is probably faster at this than your code is going to be so it seems pointless to me to obfuscate the fact that you are calling simple functions like GetLastError and CreateWindow.
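A minimal sketch of the "store the name in encoded form" idea, assuming a single-byte XOR with a made-up key. This is obfuscation rather than real encryption, and `kKey`, `kEncodedName` and the helper names are all hypothetical. The byte array holds "GetLastError" XORed with 0x5A, so the plain name does not appear as a string in the executable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical single-byte key.
constexpr unsigned char kKey = 0x5A;

// "GetLastError" with every byte XORed with kKey.
constexpr unsigned char kEncodedName[] = {
    0x1D, 0x3F, 0x2E, 0x16, 0x3B, 0x29, 0x2E, 0x1F, 0x28, 0x28, 0x35, 0x28
};

// XOR is its own inverse, so applying the key again restores the text.
std::string decodeName(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr);
    std::string out;
    out.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
        out.push_back(static_cast<char>(data[i] ^ kKey));
    }
    return out;
}

std::string hiddenApiName()
{
    return decodeName(kEncodedName, sizeof kEncodedName);
}
```

The decoded string would then be passed to GetProcAddress via `c_str()`. Anyone stepping through the program in a debugger still sees the plain name at that point, which is part of why the effort rarely pays off.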
Windows API functions are loaded from dlls, like kernel32.dll. In order to get the loaded API function's memory address, a table of exported function names from the dll is searched. Thus the presence of these names.
You could manually load any Windows API functions you reference with LoadLibrary. Then you could look up the functions' addresses with GetProcAddress and function names stored in some obfuscated form. Alternately, you could use each function's "ordinal" (a numeric value that identifies each function in a dll). This way, you could create a set of function pointers that you will use to call API functions.
But, to really make it clean, you would probably have to turn off linking of default libraries and replace components of the C Runtime library that are implicitly used by the compiler. Doing this is a hassle, though.

Multiple Definitions of _Unwind_Resume

For a while, I've been using a small collection of files I wrote making it easier to interface with WinAPI. Although, it's become a pain to keep moving the files around when I want to reuse them, waiting for them to recompile, etc. I finally decided to just throw them in a DLL, and be done with it, but I'm getting an odd link error every time I try to use the library.
The error is really as specific as the title, providing little information about where the definition actually originates (considering that kind of information can't really be collected from a DLL, as far as I'm aware). Could someone please explain exactly what would cause this error, as well as providing some possible fixes to the problem?
I'm using MinGW (the same version provided by the SFML site, 4.4) along with Code::Blocks, if that information helps any. If any more information is required, I'll do my best to provide it.
The problem is that there are multiple definitions for a symbol (function or variable) named _Unwind_Resume.
The DLL is exporting such a name. Rebuild it so that it only exposes desired symbols. Apparently, it is now built with all public symbols being exported.