Hide or remove unwanted strings from windows executable release

Hide or remove unwanted strings from windows executable release - c++

I have this habit always a C++ project is compiled and the release is built up. I always open the .EXE with a hexadecimal editor (usually HxD) and have a look at the binary information.
What I hate most and try to find a solution for is the fact that somewhere in the string table, relevant (at least, from my point of view) information is offered. Maybe for other people this sounds like a schizophrenia obsession but I just don't like when my executable contains, for example, the names of all the Windows functions used in the application.
I have tried many compilers to see which of them published the least information. For example, GCC leaves all this in all of its produced final exe
libgcj_s.dll._Jv_RegisterClasses....\Data.ald.rb.Error.Data file is corrupt!
....Data for the application not found!.€.#.ř.#.0.#.€.#.°.#.p.#.p.#.p.#.p.#.
¸.#.$.#.€.#°.#.std::bad_alloc..__gnu_cxx::__concurrence_lock_error.__gnu_cxx
::__concurrence_unlock_error...std::exception.std::bad_exception...pure virt
ual method called..../../runtime/pseudo-reloc.c....VirtualQuery (addr, &b, s
ize of(b))............................/../../../gcc-4.4.1/libgcc/../gcc/conf
ig/i386/cygming-shared-data.c...0 && "Couldn't retrieve name of GCClib share
d data atom"....ret->size == sizeof(__cygming_shared) && "GCClib shared data
size mismatch".0 && "Couldn't add GCClib shared data atom".....-GCCLIBCYGMI
NG-EH-TDM1-SJLJ-GTHR-MINGW32........
Here, you can see what compiler I used, and what version. Now, a few lines below you can see a list with every Windows function I used, like CreateMainWindow, GetCurrentThreadId, etc.
I wonder if there are ways of not displaying this, or encrypting, obfuscating it.
With Visual C++ this information is not published. Instead, it is not so cross-platform as GCC, which even between two Windows systems like 7 and XP, doesn't need C++ run-time, frameworks or whatever programs compiled with VC++ need. Moreover, the VC++ executables also contain those procedures entry points to the Windows functions used in the application.
I know that even NASM, for example, saves the name of the called Windows functions, so it looks like it's a Windows issue. But maybe they can be encrypted or there's some trick to not show them.
I will have a look over the GCC source code to see where are those strings specified to be saved in the executables - maybe that instruction can be skipped or something.
Well, this is one of my last paranoia and maybe it can be treated some way. Thanks for your opinions and answers.

If you compile with -nostdlib then the GCC stuff should go away but you also lose some of the C++ support and std::*.
On Windows you can create an application that only links to LoadLibrary and GetProcAddress and at runtime it can get the rest of the functions you need (The names of the functions can be stored in encrypted form and you decrypt the string before passing it to GetProcAddress) Doing this is a lot of work and the Windows loader is probably faster at this than your code is going to be so it seems pointless to me to obfuscate the fact that you are calling simple functions like GetLastError and CreateWindow.

Windows API functions are loaded from dlls, like kernel32.dll. In order to get the loaded API function's memory address, a table of exported function names from the dll is searched. Thus the presence of these names.
You could manually load any Windows API functions you reference with LoadLibrary. The you could look up the functions' addresses with GetProcAddress and functions names stored in some obfuscated form. Alternately, you could use each function's "ordinal" -- a numeric value that identifies each function in a dll). This way, you could create a set of function pointers that you will use to call API functions.
But, to really make it clean, you would probably have to turn off linking of default libraries and replace components of the C Runtime library that are implicitly used by the compiler. Doing this is a hasslse, though.

Related

Get available library functions at runtime

I am working with dynamic linked libraries (.dll) on windows or shared objects (.so) on Linux.
My goal is to write some code, that can - give the absolute path of the library on the disk - return a list of all exported functions (the export table) of that library and ultimately be able to call those functions. This should work on windows (wit dll) as well as on Linux (with so).
I am writing kind of a wrapper that delgates function calls to the respective library. Therefore I receive a path, a function name, and a list of parameters which I then want to forward. the thing is: I want to know whether the given function exists before trying to call it
From here I found a platform-independent way of opening and closing the library as well as getting a pointer to the function with the given name.
So the thing that remains is getting the names of available functions in the first place.
On this topic, I found this question dealing with the same kind of problem only that it asks for a Linux specific solution. In the given answer it is said
There is no libc function to do that. However, you can write one yourself (or copy/paste the code from a tool like readelf).
This clearly indicates that there are tools to do what I am looking for. The only question is, is there one that can work on windows as well as on Linux? If not how would I go about this on my own?
Here is a C# implementation (actually this is the code I want to port to C++) doing what I want (though windows only). To me, this appears as if the library structure is handled manually. If this is the way to go then where can I find the necessary information on the library structure?

So, on unixoids (and both Linux and WinNT have a posix subsytem), the dlopen function can be used to load a dynamic library and get function pointers to known symbols by name of that symbol.
Getting a list of symbols as far as I know was never an aspect that POSIX bothered to specify, so long story short, the functions that can do that for you on Linux are specific to the libc used there (GNU libc, mostly), and on Windows to the libc used there. Portable code means having to different codebases for two different libcs!
If you don't want to depend on your libc, you'd have to have a binary object parser (for ELF shared libraries on Linux, PE on Windows) to read out symbol names from the files. There's actually plenty of those – obviously, WINE has one for PE that is portable (especially works on Linux as well), and every linker (including glibc's runtime linker) under Linux can parse ELF files.
Personally, radare2 is a fine reverse-engineering framework with plenty of language bindings that is actually meant to analyze binary files, and give you exported symbols (as well as being capable of extracting non-exported functions, constructing call-graphs etc). It does have debugger, i.e. jumping-into-functions capabilities, too.

So, knowing now that
I am writing kind of a wrapper that delgates function calls to the respective library. Therefore I receive a path, a function name, and a list of parameters which I then want to forward. the thing is: I want to know whether the given function exists before trying to call it
things become way easier: you don't actually need to get the list of exports. It's easier and just as fast to simply try.
So, on any POSIX system (and that includes both Windows' POSIX subsystem and Linux), dlopen will open the library and load the symbol table, and dlsym will look up a symbol from that table. If that symbol is not in the table, it simply returns NULL. So, you already have all the tables you think you need; just not explicitly, but queryable.

Retrieve functions (name, return type, parameters) from unknown dll (c++)

I need to work with a .dll (c++ 64bit) which is made by one of my company's vendors, and they no longer exist. I don't have the header file (.h).
I follow this:
Get the function prototypes from an unknown .dll
I was able to get the function name, but I don't know what the function return type and parameters are.
I tried dependency walker too, same result.
This dll is not a public dll, it is customized for our company to use, so the functions inside cannot be searched online.
Are there any tools or methods that can help me retrieve this info?

I need to work with a .dll (c++ 64bit) which is made by one of my company's vendors, and they no longer exist. I don't have the header file (.h).
In a professional setting, that is a symptom to give up immediately using that library, and find and use something else. Yes, that costs money and time (so advise your manager and/or client). You might prefer using some free software library instead next time (because then you could at least continue maintaining that library, even if its vendor disappeared).
You could consider that C++ usually do name mangling and try to reverse-engineer something (but you won't be able to guess the fields inside class-es reliably or other information related to types). I don't recommend trying, since that could take a lot of time & money and you won't be able to guess everything (so you don't know if you'll guess enough information to use that stuff).
(what happens to you is a project mismanagement mistake, not a developer's mistake)
Using (even a few minutes) as a developer a C++ library without full headers and documentation for it is just crazy. Don't do that.
So there is no technical solution to your issue. Decompilation is generally impossible (and costs a big lot and usually don't work well)

Dependency walker should work for you. Here is an overview of how the functions, return types and arguments are displayed:
[dependency walker overview] https://kb.froglogic.com/download/attachments/2457645/dependency_walker_provided_information.png?version=1&modificationDate=1341338967000
Remember that you have to use a different version of dependency walker for 32bit or 64bit dll.
What is more, if you see some strange symbols in your function parameters like #,# etc..., then you are missing some dependent dll.

C++ Load DLL From a Subdirectory?

I'm new to the "hidden/dark places" of C++ and I was wondering how to load a .dll file from a different directory or a sub-directory inside the one where my current executable is running
Ex:
./MyAppDirectory
/MyApp.exe
/SomeDLL.dll
/AnotherDLL.dll
/SubDirectory
/SomeDLL2.dll
/AnotherDLL2.dll
/YetAnotherDLL.dll
/...
So "MyApp.exe" automatically loads "SomeDLL.dll" and "AnotherDLL.dll" from it's root folder "MyAppDirectory" however I also want to be able to load "SomeDLL2.dll", "AnotherDLL2.dll", "YetAnotherDLL.dll" etc. from the "SubDirectory" folder inside the "MyAppDirectory" folder.
I've been doing some searches and from what I've found the only solutions are:
1) Modify the working directory of the executable.
2) Put the DLL files inside the Windows root.
3) Modify the PATH environment variable.
But all of them have some bad sides (not worth mentioning here) and it's not what I actually need. Also another solution is through "Application Specific Paths!" which involves working with Windows registry and seems the be slightly better then the ones mentioned before.
However I need to be able to do this inside "MyApp.exe" using C++ without the need to use external methods.
I'm using MinGW 4.7.2 and my IDE is Code::Blocks 12.11 also my OS is Windows XP SP3 Pro x86.
Any reference, tutorial, documentation, example etc. is accepted and thank you for your time :D

If you are NOT explicitly loading the DLL ("manually", in your code using LoadLibrary(...)), then you HAVE to have the .dll in a place that Windows will look for DLL's, whihc pretty much means one of the three options you are talking about in your question.
When using LoadLibrary, you can specify a relative or absolute path to the DLL.
Note that it's completely different to load DLL's explicitly and implicitly - in the explicit case, you have to use the LoadLibrary, and then use GetProcAddress to find the address of the function, and you will have to use function pointers to call the functions - this is typically only used for plug-ins or similar functionality where the DLL provides a small number of functions (quite often just a factory function to create a objects to do something that has a generic interface class, and each DLL has the same type of function to create an object to do whatever it is supposed to do).
In the implicit load, you don't need to do anything in your code to use the DLL, and the functions from the DLL just appear to be there as if they were hard-linked into the application.

You should use
LoadLibrary("subFolder\\dynamicLibrary.dll");
That's the explicit link to DLLs, it's a bit tougher than the implicit linking(that I think it's what you are using), but I believe that this is the correct way to achieve your purpose
explicit
implicit

How much source information is stored in c++ executables

Some days ago I accidentally opened a C++ executable of a commercial application in Notepad++ and found out that there's quite a lot information about the original source code stored in the executable.
Inside the executable I could find file names (app.c, dlgstat.c, ...), function names (GetTickCount, DispatchMessageA, ...) and small pieces of source code, mostly conditions (szChar != TEXT('\0'), iRow < XTGetRows( hwndList )). After that I checked another QT executable and: yes again source file names and method signatures.
Because of that I am wondering how much source code information is really stored in a C/C++ executable (e.g., compiled using QT or MinGW). Is this probably some kind of debug build still containing the original source? Is this information used for some reflection stuff? Is there any reason why publishers don't remove this stuff?

How much source code information is really stored in a C/C++ executable?
In practice, not much. The source code is not required at runtime. The strings you name come from two things:
The function names (e.g. GetTickCount) are the names of functions imported from other modules. The names are required at runtime because the functions are resolved dynamically (by calling GetProcAddress with the function name).
The conditions are likely assertions: the assert macro stringizes its argument so that when it fires you know what condition was not met.
If you build a DLL, it will also contain a names of all of the functions it exports, so they can be resolved at runtime (the same is likely true for other shared object formats).
Debug symbols may also contain some of the original source code, though it depends on the format used by the debug symbols. These symbols may be contained either in the binary itself or in an auxiliary file (for example, .pdb files used on Windows).

Windows function names: they probably are there just because they are being accessed dynamically - somewhere in your program there's a GetProcAddress to get their address. Still, no reason to worry, every application uses WinAPIs, so there's not much to discover about your executable from that information.
Conditions: probably from some assert-like macro; they are included to allow assert to print what failed condition triggered the failed assertion. Anyhow, in release mode assertions should be removed automatically.
Source file names and method signatures: probably from some usage of __FILE__ and __func__ macros; probably, again, from assert.
Other sources of information about the inner structure of your program is RTTI, that has to provide some representation for every type that typeid could be working on. If you don't need its functionality, you can disable it (but I don't know if that is possible in Qt projects).

Mixed into the binary of a C++ app you will find the names of most global symbols (and debugging symbols if enabled in the compiler), but with extra 'decoration text' that encodes the calling signature of the symbol if it is a function or method. Likewise, the literals of character strings are embedded in clear text. But no where will you find anything like the actual source code that the compiler used to create the binary executable. That information is lost during the compilation process, and it is especially hard to reverse engineer if C++ templates are employed in the build.

Difference between GetModuleHandle and including header

maybe stupid question, but i dont know answer. What is difference between using GetModuleHandle or LoadLibrary to load dll(and then to use function of that dll) and to include directly desired header. For example, with using GetModuleHandle:
typedef void (WINAPI *PGNSI)(LPSYSTEM_INFO);
// Call GetNativeSystemInfo if supported or GetSystemInfo otherwise.
PGNSI pGNSI;
SYSTEM_INFO si;
ZeroMemory(&si, sizeof(SYSTEM_INFO));
pGNSI = (PGNSI) GetProcAddress(
GetModuleHandle(TEXT("kernel32.dll")),
"GetNativeSystemInfo");
if(NULL != pGNSI)
pGNSI(&si); //calling function through pointer
else GetSystemInfo(&si);
But I can include windows.h header to directly call that function from my code:
#include <windows.h>
SYSTEM_INFO si;
ZeroMemory(&si, sizeof(SYSTEM_INFO));
GetNativeSystemInfo(&si);
The same applies for example to opengl32.dll, i dont know if it is better to include header file for opengl functions in my project or to use Getmodulehandle and GetProcAdress to invoke desired functions. What is the difference? Is first method with using getmodulehandle benefit in some way? Thanks for answers.

First, make sure you understand that GetModuleHandle and LoadLibrary are not exactly equivalent. But since that's not directly part of your question, I'll leave off a big explanation and just suggest you make sure to understand the documentation in those two links.
To use a dll function directly as if it were like any other function, you don't just include the header. In addition to the header, somewhere in your project it is being told to link against a corresponding lib file. In your example, it would be kernel32.lib. This might be done through various means, such as linker settings in the project or having a #pragma comment (lib, ...) in a file.
When a program is built like that, code is written by the compiler to load that dll when the program starts. If the dll in question cannot be found when you actually try to run the program, then it will fail with an error message. You have no way to write code to catch the failure and take some alternative action.
For dlls that are part of the operating system (like kernel32.dll) or at least typically supplied with it, this immediate loading behavior isn't an issue since you can safely assume the dll will always be there. On the other hand, if you were to build against a dll which is not typically present, then you would have more of a concern. Either you would have to make sure such a dll is distributed with your program or else somehow try to make sure the user installs whatever other package is necessary to get that dll on their system.
Also, if the dll loads but any of the functions you're trying to use from that dll don't actually exist in it, then it will immediately fail with an error message. (It doesn't wait until your program tries to call the function. It finds this discrepancy when the program starts and aborts then.) This can be an issue if different versions of the dll exist out in the world.
Now, when you use LoadLibrary/GetProcAddress, you are asking for the dll to be loaded at the time of your choosing and asking to find a particular function provided by that dll. If either of those steps fails, you have the ability to write code to deal with it in some reasonable manner.
This can be used for various purposes. For instance, you could make a plugin mechanism where the program searches for and loads plugin dlls from some particular folder on the fly. Since the program doesn't know ahead of time which plugins will be present, LoadLibrary is the only way to do this.
Another thing LoadLibrary/GetProcAddress can be used for is to load a dll and call a function from it even when you don't have the proper header and lib files. If you know the name of the dll, the name of the function, and exact signature of the function (parameter types, return type, calling convention) then you have enough to write the code to load that dll and call the function successfully. Occasionally this can be useful. For instance, it's one way that people are able to use certain "undocumented" functions provided by Windows dlls.
Lastly, LoadLibrary/GetModuleHandle/GetProcAddress can be used to let you use functionality that doesn't necessarily exist on all the operating systems you want to support. This appears to be the reason for the code snippet you have that calls either GetNativeSystemInfo or GetSystemInfo. The former is only available from WinXP/2003 on up while the latter is available on Win2000. If the code had just been written as a direct call to GetNativeSystemInfo, then the program would fail to run on Windows 2000. But instead, what you have there checks to see if GetNativeSystemInfo exists on the current operation system and only uses it if so, otherwise it falls back on the more widely supported GetSystemInfo.
So in the case of your example, which technique you choose to call that function depends on which operating systems you intend to support. If your software does not need to run on Windows 2000, then just directly calling GetNativeSystemInfo is a lot easier (and most likely the preferable way to go).

Some functions don't exist in older versions of Windows and when you call a function directly, that function ends up in the import table of your program. If Windows fails to find one of the functions in the import table, the program will not run at all.
In your specific example, GetNativeSystemInfo was added in Windows XP and if you call it directly then your program will not run on Windows 2000 or older.
LoadLibrary is also useful if you want to support 3rd-party plug-ins etc.

GetModuleHandle enables you to load dlls dynamically, what can be used for instance for implementing plug-ins or loading some resources on-demand.
Underneath, there is no difference between the two methods -- static library that you link just contains code that does dynamic linking when program starts (in C).

It depends, in this case I'd say its being used so that the dll/exe won't get loading errors from the windows LDR if the system(kernel32.dll etc) doesn't have exports that the binary might use, onr good example of this is the DEP functions for windows XP, which only exist in SP 2+, but its not a good idea to force a required SP, as it can cut down on the avaiable audience for the program.
OpenGL uses the same sorta principle, due to the fact one can't predict the API support (and extensions), so one must check for it, then import it or an alternative with wglGetProcAddress
Another reason, which is more 'cheap' is so that one doesn't not have to link to certain libs and include certain headers, especially if your using say only 1 function from a very large SDK, this prevents other developers wasting time tring to get the huge SDK(they ofc will need the binary contianing the export)

LoadLibrary and GetModulehandle both will work for the same stuff i.e; they will map the module on to the process at run time but in case of LoadLibrary it increases the reference count in the kernel perspective where as the later will not do so..

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js