c++ separate files? - c++

My question is, when you compile your c++ program, why is it all put into one exe file? The file could become to large. Would you use dll libraries to shrink the size, or are there other files you can make? I just want to know how to make a program that uses separate files to run.
(EDIT) I just don't want it all in a single file. Files could become too large eventually for the computer to handle it, right? There must be a way to separate the files. Like in java, everything is in a class file, which just seems easier and more efficient. Some drives like FAT32 can't have a file bigger than 4 gigabytes, so they need a more broken down program. I looked at my game called portal, its exe is 100KB and it has about 100 dll files!

To answer your question, yes. You absolutely can split your program into separate DLL files if you'd like.
I've seen some developers compile utility functions into a separate common DLL files which can be included in other projects as references. This way its objects and methods can be be called from it.
In hindsight, compiled code is relatively small. Binary data is really what consumes the most space: videos, images, models, sounds, etc. Although it is possible and common for smaller programs to pack these resources directly into the executable, it generally isn't a good idea for many obvious reasons.
Finally, large executables aren't a huge problem with today's technology. For smaller programs, I wouldn't sweat it. It's more about the design development the larger the project gets.

Too large for what? If it was due to storage space restrictions, splitting it into multiple files wouldn't buy you anything. Unless you are somehow overflowing the maximum size for a file on a platform (like a 2GB limit on some 32-bit platforms), which seems very unlikely, you are probably worrying about a non-issue.
You can reduce the size of the generated executable by turning off debug options in the compiler, "stripping" it on various platforms, setting optimization settings to optimize for code size rather than execution speed, etc.

The way to split an exe into several files (on Windows) is, as the OP suggested, to use dlls.
The commenters are correct that this will not actually take any less space on disk, nor save any memory, but there are other reasons to split an application into multiple files. For example, to share code (a single dll can be used by several applications), or to help an application load more quickly (only load the dll when it is actually needed).

Related

Given a dll/exe (with or without .pdb), can I see what .obj files contribute to its size and how much?

I compiled a dll file with a whole bunch of cpp files. I want to see how much each cpp contributes to the final size of the dll, in order to cut down its size (say by excluding some libraries). Is there any way to do that? Thank you!
This ranges from quite difficult (which object do you charge library functions against) to impossible (when whole program optimization is used to inline across compilation unit boundaries).
I also suggest that it's not very useful. You need to know which functions to target for slimming down, not just which files.
Generating a map file during the build (pass /MAP to LINK.EXE) is probably the best you can do. The documentation also mentions something about symbol groups, which you might be able to use to your advantage as well.

Distributing DLLs Inside an EXE (C++)

How can I include my programs dependency DLLs inside the EXE file (so I only have to distribute that one file)? I am using C++ so I can't use ILMerge like I usually do for C#, but is there an easier way to automatically do this in Visual Studio?
I know this is possible (thats why installers work), I just need some help being pointed to the best way to this.
Thank you for your time.
There are many problems with this approach. For one example, see this post from REAL Software. Their “REALbasic” product used to do this and had problems including:
When writing the DLLs out at run-time, it would trigger anti-virus warnings.
Problems with machines where the user doesn’t have write permissions or is low on disk space.
Their attempt to fix the problem caused more problems, including crashes. Eventually they relented and now distribute DLLs side-by-side with apps.
If you really need a single-EXE deployment, and can’t use an installer for some reason, the reliable way is to static-link all dependencies. This assumes that you have the correct .libs (and not just .libs that link in the DLL).
There exist two options, both of which are far from ideal:
write a temporary file somewhere
load the DLL to memory "by hand", i.e. create a memory block, put DLL image to memory, then process relocations and external references.
The downside of the first approach is described above by Nate. Second approach is possible, but is complicated (requires deep knowledge of certain low-level things) and doesn't allow the DLL code to access DLL resources (this is obvious - there's no image of the DLL so the OS doesn't know where to take resources).
One more option usable in some scenarios: create a virtual disk whose contents are stored in your EXE file resources, and load the DLL from there. This is possible using our SolFS product (OS edition), but creation of the virtual disk itself requires use of kernel-mode drivers which must be written to disk before use.
Most installers use a zip file (or something similar) to hold whatever files are needed. When you run the installer, it decompresses the data and puts the individual files where needed (and typically adds registry entries, registers any COM controls it installed, etc.)

Is it bad practice to have an application which ships with lots of DLL's?

Is it better to have lots of DLL dependencies or better to static link as mich as possible?
Thanks
No, it is not bad practice to ship with lots of DLLs; it is bad practice, though, to put them in %System32%. Actually, it is usually good to use DLLs instead of statically linking; for one thing, you can easily swap out just the DLL that you need to update, rather than having to replace the entire binary, and for another, if your program eventually needs multiple executables that work together, you only pay for one copy of the DLL code (whereas, with static linking, you would end up duplicating the code that was common).
Having static link gives your app a large memory footprint, therefore having DLL's is better from that POV i.e. you only load what you need. Nowadays installations are normally done by an installer so it doesn't matter if you have lots of DLL's.
I don't think it's a bad practice. Look at Office or Adobe or any large-scale application. They end up with lots of DLLs -- because they otherwise would have to pack everything into a 100M+ exe.
Break things into DLL when you don't absolutely need them.
Generally speaking it is not a bad practice. It is better to split the code of a program into separated dynamic libraries, especially if the functions provided are used from more than one executable.
That doesn't mean that every program should have its code split in more dynamic libraries; for simple utilities, that is not probably needed.
As mentioned by others, lots of DLLs is not a bad practice. Put some thought into what to put in each one. I like to keep the DLLs as 'tiny-island-ish' as I can. If these will be distributed, I like to have a specific naming convention that reflects the product and/or company name and/or initials of some sort.
Just wanted to add another observation from other programs that use many dynamically loaded DLLs. For example, the GIMP and its plug-ins. The way you load your DLLs will affect your client's perceived application speed, if that's a factor among the other very good ones (updates, reuse, etc.). I'm sure there's some overhead for the OS to load a DLL and you might run into process limits (like open file handles). Having very many very small DLLs might not be as desired as "smaller than that" number of "larger than that"-sized DLLs.

How can I get my very large program to link?

Our next product has grown too large to link on a machine running 32-bit Windows. The sum total of all the lib files exceeds 2Gb and can only be linked on a 64-bit Windows machine. Eventually we will exceed that boundary, since our software tends to grow rather than contract and we are using a 32-bit linker (MS Visual Studio 2005): we expect to hit trouble when our lib size total exceeds 3Gb.
How can I reduce the size of the .lib files, or the .obj files without trimming code? For example, we use a lot of templates: is there any way of reducing their footprint? Is there any way of finding out what's causing the bloat from examining the .lib/.obj files? Can this be automated rather than inspected by eye? 2.5Gb is a lot of text to peer through and compare.
External constraints prevent us from shipping as anything other than a single .exe, so a DLL solution is not available.
I had once been working on a project with several MLoC. While ours would still link on a 32bit machine, link times where abysmal and became a major problem, because developers were reduced to only get a dozen edit-compile-test cycles done per workday. (Compile times were handled pretty well by doing distributed compilation.)
We switched to dynamic linking. That increased startup time, but this could be managed by delay-loading of DLLs.
First, of course, make sure you compile with the 'Optimize for Size' option.
If you do that, I wouldn't expect inlining, at least, to contribute significantly to the code size. The compiler makes a tradeoff for every inlining candidate regarding how much (if at all) it'd increase code size, compared to the performance boost it'd give. And if you're optimizing for size, the compiler won't risk bloating the code much. (Note that inlining very small functions can actually decrease code size)
Second, have you considered unity builds? That'd pretty much eliminate the linker entirely, and with only one translation unit, there'd be much less duplicate work and hopefully, a smaller memory footprint.
Finally, I know Visual Studio (or possibly the Windows SDK) has a 64-bit compiler (that is, a compiler that is itself a 64-bit application, not just a compiler producing 64-bit code). Consider using that. (I don't know if there is also a 64-bit linker)
I don't know i the linker is built with the LARGEADDRESSAWARE flag set. If so, running it on a 64-bit machine will let the process consume a full 4GB of memory instead of the 2 GB it normally gets. (if necessary, you can add the flag yourself by modifying the PE header)
Perhaps limiting the linkage of various symbols could help as well. If you know that a symbol won't be needed outside of the current translation unit, put it in an anonymous namespace. That might allow the compiler to trim down unused symbols before passing everything on to the linker
Try using the Symbol Sort program to show you where the main bits of bloat are in your code. Also just looking at the size of the raw .obj files will give you a reasonable idea of where to target.
OMFG!!!!! That's huuuuuge!
Apart from the fact I think it's too big to be rational... can't you use dynamic linking to avoid linking all the mess in compile time and only link in runtime what's necesary (I mean, loading dlls in demand)?
Does it need to be one big app?
One option is to split various modules into DLLs and load/unload them as needed.
Alternatively, you might be able to split into several apps and share data using mapped memory, pipes a DBMS or even simple data files.
First of all, find out how to measure the size which is used by various features. Don't go ahead and try to play replace template usage or other things because you suspect that it makes a significant difference.
Run
dumpbin /HEADERS <somebinary>
to find out which sections in your binary are causing the huge size. Do you have a huge Debug Directory section? Strip symbols then. Is the Import Address Table large? Check the table and locate symbols which you don't need (a problem with templates is that symbols of template instantiations tend to be very very large). Similiar analysis can be done for the Exception Directory, COM Descriptor Directory etc..
I do not think there is any single tool that can give you statistics that you want/need. Using either .map files or the dumpbin utility with /SYMBOLS parameter plus some post-processing of the created log might help you get what you want.
If the statistics confirm your suspicion of template bloat, or even without the confirmation, it might be a good idea to do several things with the source:
Try using explicit instantiations and move template definitions into .cpp files. Of course this works only if you have limited and well known set of types/values that you use as arguments to the templates.
Add more abstraction and/or indirection. Factor code that does not depend on your template parameters into their own base classes or free functions. If you have several template type parameters, see if you cannot split the single class template into several base classes without overlapping template parameters. (See http://www2.research.att.com/~bs/SCARY.pdf.)
Try using the pimpl idiom; avoid instantiating templates in headers if you can, instantiate them only in .cpp files.
Templates are nice but sometimes ordinary classes work as well; e.g. avoid passing integer constants as non-type template parameters if you can pass them as parameter to ctor.
#hatcat and #jalf: There is indeed a full set of 64bit tools. For example, you can set an environment variable:
set PreferredToolArchitecture=x64
and then run Visual Studio (from de developer console).

Profiling DLL/LIB Bloat

I've inherited a fairly large C++ project in VS2005 which compiles to a DLL of about 5MB. I'd like to cut down the size of the library so it loads faster over the network for clients who use it from a slow network share.
I know how to do this by analyzing the code, includes, and project settings, but I'm wondering if there are any tools available which could make it easier to pinpoint what parts of the code are consuming the most space. Is there any way to generate a "profile" of the DLL layout? A report of what is consuming space in the library image and how much?
When you build your DLL, you can pass /MAP to the linker to have it generate a map file containing the addresses of all symbols in the resulting image. You will probably have to do some scripting to calculate the size of each symbol.
Using a "strings" utility to scan your DLL might reveal unexpected or unused printable strings (e.g. resources, RCS IDs, __FILE__ macros, debugging messages, assertions, etc.).
Also, if you're not already compiling with /Os enabled, it's worth a try.
If your end goal is only to trim the size of the DLL, then after tweaking compiler settings, you'll probably get the quickest results by running your DLL through UPX. UPX is an excellent compression utility for DLLs and EXEs; it's also open-source with a non-viral license, so it's okay to use in commercial/closed-source products.
I've only had it turn up a virus warning on the highest compression setting (the brute-force option), so you'll probably be fine if you use a lower setting than that.
While i don't know about any binary size profilers, you could alternatively look for what object files (.obj) are the biggest - that gives you at least an idea of where your problematic spots are.
Of course this requires a sufficiently modularized project.
You can also try to link statically instead of using a dll. Indeed, when the library is linked statically the linker removes all unused functions from the final exe. Sometime the final exe is only slightly bigger and you don't have any more dll.
If your DLL is this big because it's exporting C++ function with exceptionally long mangled names, an alternative is to use a .DEF file to export the functions by ordinal, without name (using NONAME in the .DEF file). Somewhat brittle, but it reduces the DLL size, EXE size and load times.
See e.g. http://home.hiwaay.net/~georgech/WhitePapers/Exporting/Exp.htm
Given that all your .obj files are about the same size, assuming that you're using precompiled headers, try creating an empty obj file and see how large it is. That will give you an idea of the proportion of each .obj that's due to the PCH compilation. The linker will be able to remove all the duplicates there, incidentally. Alternatively you could try disabling PCH so that the obj files will give you a better indication of where the main culprits are.
All good suggestions. What I do is get the map file and then just eyeball it. The kind of thing I've found in the past is that a large part of the space is taken by one or more class libraries brought in by the fact that some variable somewhere was declared as having a type that sounded like it would save some coding effort but wasn't really necessary.
Like in MFC (remember that?) they have a wrapper class to go around every thing like controls, fonts, etc. that Win32 provides. Those take a ton of space and you don't always need them.
Another thing that can take a ton of space is collection classes you could manage without. Another is cout I/O routines you don't use.
i would recommend one of the following:
coverage - you can run a coverage tool in the hope of detecting some dead code
caching - cache the dll on the client side on the initial activatio
splitting - split the dll into several smaller dlls, start the application with the bootstrap dll and download the other dlls after the application starts
compilation and linking - use smaller run time library, compile with size optimization, etc. see this link for more suggestions.
compression - if you have data or large resources within the dll, you can compress them and decompress only after the download or at runtime.