Distributing DLLs Inside an EXE (C++)

How can I include my program's dependency DLLs inside the EXE file (so I only have to distribute that one file)? I am using C++, so I can't use ILMerge like I usually do for C#. Is there an easy way to do this automatically in Visual Studio?
I know this is possible (that's why installers work); I just need to be pointed to the best way to do this.
Thank you for your time.

There are many problems with this approach. For one example, see this post from REAL Software. Their “REALbasic” product used to do this and had problems including:
When writing the DLLs out at run-time, it would trigger anti-virus warnings.
Problems with machines where the user doesn’t have write permissions or is low on disk space.
Their attempt to fix the problem caused more problems, including crashes. Eventually they relented and now distribute DLLs side-by-side with apps.
If you really need a single-EXE deployment, and can’t use an installer for some reason, the reliable way is to statically link all dependencies. This assumes that you have true static libraries (.lib) for them, and not just import libraries that merely link in the DLL.

There exist two options, both of which are far from ideal:
write a temporary file somewhere
load the DLL into memory "by hand", i.e. allocate a memory block, copy the DLL image into it, then process relocations and external references.
The downside of the first approach is described above by Nate. The second approach is possible, but it is complicated (it requires deep knowledge of certain low-level details) and doesn't allow the DLL code to access the DLL's resources (the OS has no record of the DLL image, so it doesn't know where to find the resources).
One more option usable in some scenarios: create a virtual disk whose contents are stored in your EXE file resources, and load the DLL from there. This is possible using our SolFS product (OS edition), but creation of the virtual disk itself requires use of kernel-mode drivers which must be written to disk before use.
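The "process relocations" step of the second option can be sketched roughly as follows. This is a minimal illustration, not a complete loader: it assumes the image has already been copied into a buffer laid out as it would be in memory (so RVAs are plain offsets), and the names applyBaseRelocations, loadLE and storeLE are made up for this example. Only the two common relocation types (HIGHLOW = 3 and DIR64 = 10) are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read an n-byte little-endian integer at offset off (caller checks bounds).
static uint64_t loadLE(const std::vector<uint8_t>& b, size_t off, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i) v |= static_cast<uint64_t>(b[off + i]) << (8 * i);
    return v;
}

// Write the low n bytes of v in little-endian order at offset off.
static void storeLE(std::vector<uint8_t>& b, size_t off, size_t n, uint64_t v) {
    for (size_t i = 0; i < n; ++i) b[off + i] = static_cast<uint8_t>(v >> (8 * i));
}

// Walk the base relocation table located at [relocRva, relocSize) inside the
// mapped image and add delta (actual base minus preferred base) to every
// address it lists. Returns false on malformed input; the image may then be
// partially patched and should be discarded.
bool applyBaseRelocations(std::vector<uint8_t>& image, size_t relocRva,
                          size_t relocSize, uint64_t delta) {
    if (relocRva > image.size() || relocSize > image.size() - relocRva) return false;
    size_t pos = relocRva;
    const size_t end = relocRva + relocSize;
    while (end - pos >= 8) {
        // Block header: page RVA (4 bytes), total block size (4 bytes).
        const size_t pageRva = static_cast<size_t>(loadLE(image, pos, 4));
        const size_t blockSize = static_cast<size_t>(loadLE(image, pos + 4, 4));
        if (blockSize < 8 || blockSize > end - pos) return false;
        for (size_t e = pos + 8; e + 2 <= pos + blockSize; e += 2) {
            // Each entry: high 4 bits = type, low 12 bits = offset in page.
            const unsigned entry = static_cast<unsigned>(loadLE(image, e, 2));
            const unsigned type = entry >> 12;
            const size_t target = pageRva + (entry & 0x0FFFu);
            if (type == 0) continue;  // ABSOLUTE: padding entry, nothing to do
            const size_t width = (type == 10) ? 8 : (type == 3) ? 4 : 0;
            if (width == 0) return false;  // type not handled by this sketch
            if (target > image.size() || width > image.size() - target) return false;
            storeLE(image, target, width, loadLE(image, target, width) + delta);
        }
        pos += blockSize;
    }
    return true;
}
```

A real loader would take relocRva and relocSize from the base relocation entry of the PE data directory, and would still have to resolve imports and set page protections afterwards.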

Most installers use a zip file (or something similar) to hold whatever files are needed. When you run the installer, it decompresses the data and puts the individual files where needed (and typically adds registry entries, registers any COM controls it installed, etc.)

Related

c++ separate files?

My question is, when you compile your C++ program, why is it all put into one exe file? The file could become too large. Would you use DLL libraries to shrink the size, or are there other files you can make? I just want to know how to make a program that uses separate files to run.
(EDIT) I just don't want it all in a single file. Files could eventually become too large for the computer to handle, right? There must be a way to separate the files. Like in Java, everything is in a class file, which just seems easier and more efficient. Some file systems like FAT32 can't hold a file bigger than 4 gigabytes, so they need a more broken-down program. I looked at my game called Portal: its exe is 100KB and it has about 100 dll files!

To answer your question, yes. You absolutely can split your program into separate DLL files if you'd like.
I've seen some developers compile utility functions into a separate common DLL file, which can then be included in other projects as a reference. This way its objects and methods can be called from those projects.
In practice, compiled code is relatively small. Binary data is really what consumes the most space: videos, images, models, sounds, etc. Although it is possible and common for smaller programs to pack these resources directly into the executable, it generally isn't a good idea.
Finally, large executables aren't a huge problem with today's technology. For smaller programs, I wouldn't sweat it. How you split things up becomes more of a design question as the project gets larger.

Too large for what? If it was due to storage space restrictions, splitting it into multiple files wouldn't buy you anything. Unless you are somehow overflowing the maximum size for a file on a platform (like a 2GB limit on some 32-bit platforms), which seems very unlikely, you are probably worrying about a non-issue.
You can reduce the size of the generated executable by turning off debug options in the compiler, "stripping" it on various platforms, setting optimization settings to optimize for code size rather than execution speed, etc.

The way to split an exe into several files (on Windows) is, as the OP suggested, to use dlls.
The commenters are correct that this will not actually take any less space on disk, nor save any memory, but there are other reasons to split an application into multiple files. For example, to share code (a single dll can be used by several applications), or to help an application load more quickly (only load the dll when it is actually needed).
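The "only load the dll when it is actually needed" idea can be sketched as a small wrapper that defers loading until the first symbol lookup. This is an illustrative sketch only: the class name LazyLibrary is made up, it uses LoadLibraryA/GetProcAddress on Windows and dlopen/dlsym elsewhere, and error reporting is reduced to returning a null pointer. On some older Linux toolchains you may need to link with -ldl.

```cpp
#include <cassert>
#include <string>
#include <utility>
#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Hypothetical helper: the library is opened on the first call to symbol(),
// not when the object is constructed.
class LazyLibrary {
public:
    explicit LazyLibrary(std::string path) : path_(std::move(path)) {}
    LazyLibrary(const LazyLibrary&) = delete;
    LazyLibrary& operator=(const LazyLibrary&) = delete;

    ~LazyLibrary() {
        if (!handle_) return;
#ifdef _WIN32
        FreeLibrary(static_cast<HMODULE>(handle_));
#else
        dlclose(handle_);
#endif
    }

    bool loaded() const { return handle_ != nullptr; }

    // Returns the address of the named export, or nullptr if the library
    // cannot be loaded or does not export that name.
    void* symbol(const char* name) {
        if (!handle_) {
#ifdef _WIN32
            handle_ = static_cast<void*>(LoadLibraryA(path_.c_str()));
#else
            handle_ = dlopen(path_.c_str(), RTLD_NOW);
#endif
            if (!handle_) return nullptr;
        }
#ifdef _WIN32
        return reinterpret_cast<void*>(
            GetProcAddress(static_cast<HMODULE>(handle_), name));
#else
        return dlsym(handle_, name);
#endif
    }

private:
    std::string path_;
    void* handle_ = nullptr;
};
```

The caller casts the returned pointer to the expected function type; an application that never reaches the feature never pays for loading the library.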

Virtual Files for dynamic linking

My problem is pretty complicated and potentially impossible, but here we go:
Using C++,
I'm currently working on a universal server engine for a game project of mine. Universal, because every part of the engine will be loaded dynamically after startup. Game objects will also inherit from a base object and have overridden "Simulate" functions. That way, every object has its own specific behavior, and I can do something I call "C++ scripting", which is a lot faster than interpreted Lua script files. It's also more dynamic.
(Please no solutions which would kill the c++ "scripting" part, like "forget the dynamic linking, that's insane". This performance boost is totally necessary, since I'm working with large voxel maps)
My Problem:
That makes for a lot of .dll/.so files, and I wanted to pack those into a simple archive so I can use zlib on them and maybe pack everything together with textures and sounds in little "object packages".
Now the Windows DLL API and the Linux SO API won't allow me to load a dll/so file from a memory address, which is a shame. (Am I right there, or can I bypass that? :) ) I don't want to unzip those files and save them temporarily on the filesystem, because there are hundreds to thousands of them and that would increase the loading time a lot.
Also I'm not interested in more external dependencies like boost.
So here are my Questions:
Is there a cross platform-method to create virtual files IN memory with a real path?
That way I could bypass the slow IO speeds of HDDs.
Or is it really not such a big deal to use temp files, because the file buffers of modern operating systems are fast/intelligent enough to NOT write all those files to disc?
(Actually Linux supports virtual file systems, but windows does not...)
I hope you guys can help me there :)

Not with the WinAPI, that's for sure, but you can do it manually. You can load the DLL into memory, fill in its import table, and call its exported functions (after you have called DllMain). I once saw a program where someone actually created a new process with that method ... See the PE documentation for details, but it works.
It's also relatively easy to do, since you mainly need to find the PE import tables and do what the dynamic linker does: fill them with jumps and addresses. Note, however, that Windows DLLs are generally not position-independent, so if the image does not end up at its preferred base address you also have to apply its base relocations.
It should be the same on Linux (only using the ELF structure), but if you have a better solution with virtual file systems, you should use that.
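Finding the import table starts with the PE headers. The sketch below only locates the import directory entry in a DLL file held in memory; it does not resolve any imports. The names ImportDirectory, readLE and findImportDirectory are made up for this example, and the header offsets are the documented ones for PE32 and PE32+ images.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ImportDirectory {
    bool is64Bit;   // true for PE32+ images
    uint32_t rva;   // RVA of the import descriptor array
    uint32_t size;  // size of that array in bytes
};

// Bounds-checked little-endian read of n bytes at offset off.
static bool readLE(const std::vector<uint8_t>& b, size_t off, size_t n, uint64_t& out) {
    if (off > b.size() || n > b.size() - off) return false;
    out = 0;
    for (size_t i = 0; i < n; ++i) out |= static_cast<uint64_t>(b[off + i]) << (8 * i);
    return true;
}

bool findImportDirectory(const std::vector<uint8_t>& file, ImportDirectory& out) {
    uint64_t v = 0;
    // DOS header: "MZ" magic at offset 0, e_lfanew (PE header offset) at 0x3C.
    if (!readLE(file, 0, 2, v) || v != 0x5A4D) return false;
    if (!readLE(file, 0x3C, 4, v)) return false;
    const size_t pe = static_cast<size_t>(v);
    // "PE\0\0" signature, followed by the 20-byte COFF file header.
    if (!readLE(file, pe, 4, v) || v != 0x00004550) return false;
    const size_t opt = pe + 4 + 20;
    // Optional header magic: 0x10B = PE32, 0x20B = PE32+.
    if (!readLE(file, opt, 2, v)) return false;
    bool is64 = false;
    if (v == 0x20B) is64 = true;
    else if (v != 0x10B) return false;
    // NumberOfRvaAndSizes sits right before the data directory array.
    const size_t countOff = opt + (is64 ? 108 : 92);
    if (!readLE(file, countOff, 4, v) || v < 2) return false;
    // Directory entries are 8 bytes each; index 0 is exports, index 1 imports.
    const size_t entry = countOff + 4 + 8;
    uint64_t rva = 0, size = 0;
    if (!readLE(file, entry, 4, rva) || !readLE(file, entry + 4, 4, size)) return false;
    out.is64Bit = is64;
    out.rva = static_cast<uint32_t>(rva);
    out.size = static_cast<uint32_t>(size);
    return true;
}
```

The returned value is an RVA, so when working on the raw file rather than a mapped image it still has to be translated to a file offset through the section table.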

C++ internal code reuse: compile everything or share the library / dynamic library?

General question:
For unmanaged C++, what's better for internal code sharing?
Reuse code by sharing the actual source code? OR
Reuse code by sharing the library / dynamic library (+ all the header files)
Whichever it is: what's your strategy for reducing duplicate code (copy-paste syndrome), code bloat?
Specific example:
Here's how we share the code in my organization:
We reuse code by sharing the actual source code.
We develop on Windows using VS2008, though our project actually needs to be cross-platform. We have many projects (.vcproj) committed to the repository; some might have its own repository, some might be part of a repository. For each deliverable solution (.sln) (e.g. something that we deliver to the customer), it will svn:externals all the necessary projects (.vcproj) from the repository to assemble the "final" product.
This works fine, but I'm quite worried that the code size for each solution could eventually get quite large (right now our total code size is about 75K SLOC).
Also, one thing to note is that we prevent all transitive dependencies. That is, each project (.vcproj) that is not an actual solution (.sln) is not allowed to svn:externals any other project, even if it depends on it. This is because you could have 2 projects (.vcproj) that depend on the same library (e.g. Boost) or project (.vcproj), so when you svn:externals both projects into a single solution, svn:externals will fetch it twice. So we carefully document all dependencies for each project, and it's up to the person who creates the solution (.sln) to ensure that all dependencies (including transitive ones) are pulled in via svn:externals as part of the solution.
If we reuse code by using .lib or .dll files instead, this would obviously reduce the code size for each solution, as well as eliminate the transitive dependency problem mentioned above where applicable (exceptions are, for example, third-party libraries/frameworks that use DLLs, like Intel TBB and the default Qt).
Addendum: (read if you wish)
Another motivation to share source code might be summed up best by Dr. GUI:
"On top of that, what C++ makes easy is not creation of reusable binary components; rather, C++ makes it relatively easy to reuse source code. Note that most major C++ libraries are shipped in source form, not compiled form. It's all too often necessary to look at that source in order to inherit correctly from an object—and it's all too easy (and often necessary) to rely on implementation details of the original library when you reuse it. As if that isn't bad enough, it's often tempting (or necessary) to modify the original source and do a private build of the library. (How many private builds of MFC are there? The world will never know . . .)"
Maybe this is why when you look at libraries like Intel Math Kernel library, in their "lib" folder, they have "vc7", "vc8", "vc9" for each of the Visual Studio version. Scary stuff.
Or how about this assertion:
"C++ is notoriously non-accommodating when it comes to plugins. C++ is extremely platform-specific and compiler-specific. The C++ standard doesn't specify an Application Binary Interface (ABI), which means that C++ libraries from different compilers or even different versions of the same compiler are incompatible. Add to that the fact that C++ has no concept of dynamic loading and each platform provide its own solution (incompatible with others) and you get the picture."
What are your thoughts on the above assertion? Does something like Java or .NET face these kinds of problems? E.g. if I produce a JAR file from NetBeans, will it work if I import it into IntelliJ, as long as I ensure that both have a compatible JRE/JDK?

People seem to think that C specifies an ABI. It doesn't, and I'm not aware of any standardised compiled language that does. To answer your main question, use of libraries is of course the way to go - I can't imagine doing anything else.

One good reason to share the source code: Templates are one of C++'s best features because they are an elegant way around the rigidity of static typing, but by their nature are a source-level construct. If you focus on binary-level interfaces instead of source-level interfaces, your use of templates will be limited.
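A small illustration of the difference, with made-up names (largest, largest_int): the template works for any comparable type, but only because the caller compiles its definition, whereas a binary-level interface has to commit to fixed signatures ahead of time.

```cpp
#include <cassert>
#include <string>

// Source-level interface: the definition must be visible to the caller
// (typically in a header), and it is instantiated for whatever type is used.
template <typename T>
const T& largest(const T& a, const T& b) {
    return (a < b) ? b : a;
}

// Binary-level interface: a single fixed signature that could be exported
// from a library. Supporting another type means adding another function.
extern "C" int largest_int(int a, int b) {
    return largest(a, b);
}
```

A library that ships only largest_int can be swapped without recompiling its clients, but it cannot serve std::string or a user-defined type the way the template can.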

We do the same. Trying to use binaries can be a real problem if you need to use shared code on different platforms or build environments, or even if you need different build options such as static vs. dynamic linking to the C runtime, different structure packing settings, etc.
I typically set projects up to build as much from source on-demand as possible, even with third-party code such as zlib and libpng. For those things that must be built separately, e.g. Boost, I typically have to build 4 or 8 different sets of binaries for the various combinations of settings needed (debug/release, VS7.1/VS9, static/dynamic), and manage the binaries along with the debugging information files in source control.
Of course, if everyone sharing your code is using the same tools on the same platform with the same options, then it's a different story.

I never saw shared libraries as a way to reuse code from an old project into a new one. I always thought it was more about sharing a library between different applications that you're developing at about the same time, to minimize bloat.
As far as copy-paste syndrome goes, if I copy and paste it in more than a couple places, it needs to be its own function. That's independent of whether the library is shared or not.
When we reuse code from an old project, we always bring it in as source. There's always something that needs tweaking, and it's usually safer to tweak a project-specific version than to tweak a shared version, which can wind up breaking the previous project. Going back and fixing the previous project is out of the question because 1) it worked (and shipped) already, 2) it's no longer funded, and 3) the test hardware needed may no longer be available.
For example, we had a communication library that had an API for sending a "message", a block of data with a message ID, over a socket, pipe, whatever:
void Foo::Send(unsigned messageID, const void* buffer, size_t bufSize);
But in a later project, we needed an optimization: the message needed to consist of several blocks of data in different parts of memory concatenated together, and we couldn't (and didn't want to, anyway) do the pointer math to create the data in its "assembled" form in the first place, and the process of copying the parts together into a unified buffer was taking too long. So we added a new API:
void Foo::SendMultiple(unsigned messageID, const void** buffer, size_t* bufSize);
Which would assemble the buffers into a message and send it. (The base class's method allocated a temporary buffer, copied the parts together, and called Foo::Send(); subclasses could use this as a default or override it with their own, e.g. the class that sent the message on a socket would just call send() for each buffer, eliminating a lot of copies.)
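The pattern described above could look roughly like this. It is a sketch under assumptions, not the actual library: the explicit count parameter is added here because the signature shown does not say how the number of buffers is conveyed, and RecordingFoo and ScatterFoo are hypothetical stand-ins for real transports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

class Foo {
public:
    virtual ~Foo() = default;
    virtual void Send(unsigned messageID, const void* buffer, size_t bufSize) = 0;

    // Default: copy all parts into one temporary buffer and call Send() once.
    // (count is an assumed extra parameter.)
    virtual void SendMultiple(unsigned messageID, const void* const* buffers,
                              const size_t* bufSizes, size_t count) {
        size_t total = 0;
        for (size_t i = 0; i < count; ++i) total += bufSizes[i];
        std::vector<unsigned char> assembled(total);
        size_t offset = 0;
        for (size_t i = 0; i < count; ++i) {
            if (bufSizes[i] != 0)
                std::memcpy(assembled.data() + offset, buffers[i], bufSizes[i]);
            offset += bufSizes[i];
        }
        Send(messageID, assembled.data(), assembled.size());
    }
};

// Hypothetical transport that just records what was sent.
class RecordingFoo : public Foo {
public:
    std::vector<unsigned char> sent;
    unsigned lastID = 0;
    int sendCalls = 0;

    void Send(unsigned messageID, const void* buffer, size_t bufSize) override {
        const unsigned char* p = static_cast<const unsigned char*>(buffer);
        lastID = messageID;
        ++sendCalls;
        sent.insert(sent.end(), p, p + bufSize);
    }
};

// Hypothetical stream-like transport: sends each part directly, skipping the
// temporary buffer and the extra copy.
class ScatterFoo : public RecordingFoo {
public:
    void SendMultiple(unsigned messageID, const void* const* buffers,
                      const size_t* bufSizes, size_t count) override {
        for (size_t i = 0; i < count; ++i) Send(messageID, buffers[i], bufSizes[i]);
    }
};
```

Callers see the same SendMultiple() either way; only the subclass decides whether the copy is needed.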
Now, by doing this, we have the option of backporting (copying, really) the changes to the older version, but we're not required to backport. This gives the managers flexibility, based on the time and funding constraints they have.
EDIT: After reading Neil's comment, I thought of something that we do that I need to clarify.
In our code, we do lots of "libraries". LOTS of them. One big program I wrote had something like 50 of them. Because, for us and with our build setup, they're easy.
We use a tool that auto-generates makefiles on the fly, taking care of dependencies and almost everything. If there's anything strange that needs to be done, we write a file with the exceptions, usually just a few lines.
It works like this: The tool finds everything in the directory that looks like a source file, generates dependencies if the file changed, and spits out the needed rules. Then it makes a rule to take everything and ar/ranlib it into a libxxx.a file, named after the directory. All the objects and the library are put in a subdirectory that is named after the target platform (this makes cross-compilation easy to support). This process is then repeated for every subdirectory (except the object file subdirs). Then the top-level directory gets linked with all the subdirs' libraries into the executable, and a symlink is created, again named after the top-level directory.
So directories are libraries. To use a library in a program, make a symbolic link to it. Painless. Ergo, everything's partitioned into libraries from the outset. If you want a shared lib, you put a ".so" suffix on the directory name.
To pull in a library from another project, I just use a Subversion external to fetch the needed directories. The symlinks are relative, so as long as I don't leave something behind it still works. When we ship, we lock the external reference to a specific revision of the parent.
If we need to add functionality to a library, we can do one of several things. We can revise the parent (if it's still an active project and thus testable), tell Subversion to use the newer revision and fix any bugs that pop up. Or we can just clone the code, replacing the external link, if messing with the parent is too risky. Either way, it still looks like a "library" to us, but I'm not sure that it matches the spirit of a library.
We're in the process of moving to Mercurial, which has no "externals" mechanism so we have to either clone the libraries in the first place, use rsync to keep the code synced between the different repositories, or force a common directory structure so you can have hg pull from multiple parents. The last option seems to be working pretty well.

Profiling DLL/LIB Bloat

I've inherited a fairly large C++ project in VS2005 which compiles to a DLL of about 5MB. I'd like to cut down the size of the library so it loads faster over the network for clients who use it from a slow network share.
I know how to do this by analyzing the code, includes, and project settings, but I'm wondering if there are any tools available which could make it easier to pinpoint what parts of the code are consuming the most space. Is there any way to generate a "profile" of the DLL layout? A report of what is consuming space in the library image and how much?

When you build your DLL, you can pass /MAP to the linker to have it generate a map file containing the addresses of all symbols in the resulting image. You will probably have to do some scripting to calculate the size of each symbol.
Using a "strings" utility to scan your DLL might reveal unexpected or unused printable strings (e.g. resources, RCS IDs, __FILE__ macros, debugging messages, assertions, etc.).
Also, if you're not already compiling with /Os enabled, it's worth a try.
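The scripting mentioned above can be as simple as sorting the symbols by address and taking the gap to the next symbol as an estimate of each symbol's size. The sketch below assumes the usual layout of the map file's symbol rows (a section:offset field, the symbol name, then the Rva+Base address in hex) and skips every line that does not match it; SymbolSize and estimateSymbolSizes are names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct SymbolSize {
    std::string name;
    uint64_t address;
    uint64_t size;  // estimated: distance to the next symbol, 0 if unknown
};

static bool isHex(const std::string& s) {
    if (s.empty() || s.size() > 16) return false;
    for (unsigned char c : s)
        if (!std::isxdigit(c)) return false;
    return true;
}

// Returns the symbols found in the map text, largest estimated size first.
std::vector<SymbolSize> estimateSymbolSizes(std::istream& map) {
    std::vector<SymbolSize> syms;
    std::string line;
    while (std::getline(map, line)) {
        std::istringstream fields(line);
        std::string segOff, name, addr;
        if (!(fields >> segOff >> name >> addr)) continue;
        // Keep only rows shaped like "0001:00000010  name  00401010 ...".
        const size_t colon = segOff.find(':');
        if (colon == std::string::npos) continue;
        if (!isHex(segOff.substr(0, colon)) || !isHex(segOff.substr(colon + 1))) continue;
        if (!isHex(addr)) continue;
        syms.push_back(SymbolSize{name, std::stoull(addr, nullptr, 16), 0});
    }
    std::stable_sort(syms.begin(), syms.end(),
                     [](const SymbolSize& a, const SymbolSize& b) { return a.address < b.address; });
    // The gap to the next symbol approximates the size; the last one stays 0.
    for (size_t i = 0; i + 1 < syms.size(); ++i)
        syms[i].size = syms[i + 1].address - syms[i].address;
    std::stable_sort(syms.begin(), syms.end(),
                     [](const SymbolSize& a, const SymbolSize& b) { return a.size > b.size; });
    return syms;
}
```

The estimate includes any padding between symbols and cannot size the last symbol of a section, so treat the numbers as a rough ranking rather than exact sizes.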

If your end goal is only to trim the size of the DLL, then after tweaking compiler settings, you'll probably get the quickest results by running your DLL through UPX. UPX is an excellent compression utility for DLLs and EXEs; it's also open-source with a non-viral license, so it's okay to use in commercial/closed-source products.
I've only had it turn up a virus warning on the highest compression setting (the brute-force option), so you'll probably be fine if you use a lower setting than that.

While I don't know of any binary size profilers, you could alternatively look at which object files (.obj) are the biggest. That gives you at least an idea of where your problem spots are.
Of course this requires a sufficiently modularized project.

You can also try to link statically instead of using a DLL. When the library is linked statically, the linker only pulls in the code that is actually referenced and can drop unused functions from the final exe. Sometimes the final exe is only slightly bigger, and you no longer have a DLL to ship.

If your DLL is this big because it's exporting C++ function with exceptionally long mangled names, an alternative is to use a .DEF file to export the functions by ordinal, without name (using NONAME in the .DEF file). Somewhat brittle, but it reduces the DLL size, EXE size and load times.
See e.g. http://home.hiwaay.net/~georgech/WhitePapers/Exporting/Exp.htm

Given that all your .obj files are about the same size, assuming that you're using precompiled headers, try creating an empty obj file and see how large it is. That will give you an idea of the proportion of each .obj that's due to the PCH compilation. The linker will be able to remove all the duplicates there, incidentally. Alternatively you could try disabling PCH so that the obj files will give you a better indication of where the main culprits are.

All good suggestions. What I do is get the map file and then just eyeball it. The kind of thing I've found in the past is that a large part of the space is taken by one or more class libraries brought in by the fact that some variable somewhere was declared as having a type that sounded like it would save some coding effort but wasn't really necessary.
Like in MFC (remember that?) they have a wrapper class to go around every thing like controls, fonts, etc. that Win32 provides. Those take a ton of space and you don't always need them.
Another thing that can take a ton of space is collection classes you could manage without. Another is cout I/O routines you don't use.

I would recommend one of the following:
coverage - you can run a coverage tool in the hope of detecting some dead code
caching - cache the dll on the client side on the initial activation
splitting - split the dll into several smaller dlls, start the application with the bootstrap dll and download the other dlls after the application starts
compilation and linking - use smaller run time library, compile with size optimization, etc. see this link for more suggestions.
compression - if you have data or large resources within the dll, you can compress them and decompress only after the download or at runtime.

Loading DLL from a location in memory

As the question says, I want to load a DLL from a location in memory instead of a file, similarly to LoadLibrary(Ex). I'm no expert in WinAPI, so I googled a little and found this article together with the MemoryModule library, which pretty much meets my needs.
On the other hand the info there is quite old and the library hasn't been updated for a while too. So I wanted to know if there are different, newer and better ways to do it. Also if somebody has used the library mentioned in the article, could they provide insight on what I might be facing when using it?
Just for the curious ones, I'm exploring the concept of encrypting some plug-ins for applications without storing the decrypted version on disk.

Implementing your own DLL loader can get really hairy really fast. Reading this article it's easy to miss what kind of crazy edge cases you can get yourself into. I strongly recommend against it.
Just for a taste - consider you can't use any conventional debugging tools for the code in the DLL you're loading since the code you're executing is not listed in the region of any DLL known by the OS.
Another serious issue is dealing with DEP in windows.

Well, you can create a RAM drive according to these instructions, then copy the DLL you have in memory to a file there and then use LoadLibrary().
Of course this is not very practical if you plan to deploy this as some kind of product, because people are going to notice a driver being installed, a reboot after the installation, and a new drive letter under My Computer. Also, this does nothing to actually hide the DLL, since it's just sitting there in the RAM drive for everybody to see.
Another thing I'm interested in is why you actually want to do this. Perhaps your end result can be achieved by some means other than loading the DLL from memory. For instance, when using a binary packer such as UPX, the DLL that you have on disk is different from the one that is eventually executed. Immediately after the DLL is loaded normally with LoadLibrary, the unpacker kicks in and overwrites the memory the DLL was loaded into with the uncompressed binary (the DLL header makes sure that enough space is allocated).

A similar question was raised here:
Load native C++ .dll from RAM in debugger friendly manner
One of the answers proposes dllloader sample application shown in github:
https://github.com/tapika/dllloader
It supports .dll debugging out of box.
