How to figure out which methods increase the size of an 'exe' - C++

I'm trying to write my first 'demoscene' application in MS Visual Studio Express 2010. Suddenly I realized that my binary had expanded from 16 KB to ~100 KB in the fully-optimized-for-size release version. My target size is 64 KB. Is there any way to somehow "browse" the binary to figure out which methods consume a lot of space, and which I should rewrite? I really want to know what my binary consists of.
From what I found on the web, VS2010 is not the best compiler for demoscene work, but I still want to understand what's happening inside my .exe file.

I think you should have MSVC generate a map file for you. This is a file that lists the addresses of most of the functions in your executable; the difference between consecutive addresses tells you roughly how much space each function takes. To generate a map file, add the /MAP linker option. For more info, see:
http://msdn.microsoft.com/en-us/library/k7xkk3e2(v=VS.100).aspx
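As a rough sketch of the post-processing this takes (not part of the linked documentation; the column layout assumed below is the VS2010 default, so adjust the parsing if your map file differs): sort the symbols by their Rva+Base address and treat the gap to the next symbol as its approximate size.

// mapsize.cpp - approximate per-function sizes from a /MAP file
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

int main(int argc, char* argv[])
{
    if (argc < 2) { std::cerr << "usage: mapsize file.map\n"; return 1; }

    std::ifstream in(argv[1]);
    std::vector<std::pair<unsigned long, std::string> > symbols;
    std::string line;
    while (std::getline(in, line))
    {
        // symbol lines look like: "0001:00000000  _main  00401000 f  main.obj"
        std::istringstream row(line);
        std::string sectOfs, name, rvaBase;
        if (!(row >> sectOfs >> name >> rvaBase)) continue;
        if (sectOfs.find(':') == std::string::npos) continue; // skip header lines
        unsigned long addr = std::strtoul(rvaBase.c_str(), 0, 16);
        if (addr) symbols.push_back(std::make_pair(addr, name));
    }

    // sort by address; the gap to the next symbol approximates the size
    std::sort(symbols.begin(), symbols.end());
    for (std::size_t i = 0; i + 1 < symbols.size(); ++i)
        std::cout << symbols[i + 1].first - symbols[i].first
                  << "\t" << symbols[i].second << "\n";
    return 0;
}

Run it as mapsize myapp.map and sort the output to see the biggest offenders first. (The last symbol's size is unknown with this method, which is fine for a rough survey.)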

You can strip off lots of unnecessary stuff from the executable and compress it with utilities such as mew.

I've found this useful for examining executable sizes (although not for demoscene type things): http://aras-p.info/projSizer.html
I will say this: if you are using the standard library at all, then stop immediately. It is a huge code bloater. For example, each unique usage of std::sort adds around 5 KB, and there are similar numbers for many of the standard containers (of course, it depends on which functions you use, but in general they add lots of code).
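To make that concrete (a common size-coding trick, not something from the answer above): for the small arrays typical of a 64k intro, a hand-rolled sort compiles to a few dozen bytes instead of a full template instantiation.

// Tiny insertion sort as a size-conscious stand-in for std::sort.
// O(n^2), so only sensible for small arrays.
static void sortFloats(float* a, int n)
{
    for (int i = 1; i < n; ++i)
    {
        float key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key)
        {
            a[j + 1] = a[j];
            --j;
        }
        a[j + 1] = key;
    }
}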
Also, I'm not into the demo scene, but I believe people use Crinkler to compress their executables.

Use your version control system to see what caused the increase. Going forward, I'd log the built exe size during the nightly builds. And don't forget you can optimize for minimal size with the compiler settings.

Related

What's the best way to make DLL size as small as possible?

I am using LoadLibraryA to load my DLLs into my project. I've just started to notice that their sizes are getting large as I keep adding more functions, etc. Are there any options in my project settings that can help reduce the size of my DLLs?
As every other person has mentioned, you can use compiler options to reduce the size. Try tweaking these options first for a better result; they normally affect the size of your code.
But if you have a lot of resources in your EXE/DLL, you will not see much difference. If you really need a small size in that case, I suggest you use a PE packer. A very good free PE packer is UPX.
UPX is an advanced executable file compressor. UPX will typically reduce the file size of programs and DLLs by around 50%-70%, thus reducing disk space, network load times, download times and other distribution and storage costs.
You need to run upx as a post build process to pack your EXE/DLL file with a command like this:
upx --best mydll.dll
PE packers compress your code and resources and encapsulate them in another EXE/DLL file. These files are then unpacked at runtime automatically, so you can use them like a normal EXE/DLL. Even though PE packers compress code too, they are especially effective when you have a lot of resources in your EXE/DLL.
The size of a DLL depends mostly on what code is reachable from the exported functions. During the link phase, everything not reachable from any exported function is dropped, but you still end up storing everything that is reachable, even the parts you never actually use from the outside.
This behaves differently from a static library, which also includes everything at first, but where linking is deferred until the library is consumed, so you never end up with dead code in the linked output.
LTCG + function-level sections + aggressive inlining + code folding + string folding + /O2 is likely going to cut away a lot of the file size - at the cost of any chance of proper debugging, though. (With aggressive inlining and folding of redundant code, call stacks stop resolving correctly.)
Forget what you may have read about /O1 being smaller. That is only true initially (pre-linking), since it mostly prevents inlining and loop unrolling; with advances in compiler technology, those restrictions nowadays hinder exhaustive compile-time constant expression evaluation, which often saves not only computational cost at runtime (we are often talking a factor of 2-10x!) but frequently also compiles into something more compact.
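For reference, my reading of the MSVC switches that combination maps to (double-check against your toolchain version):

cl /O2 /GL /Gy /GF main.cpp /link /LTCG /OPT:REF /OPT:ICF

/Gy puts each function in its own COMDAT so /OPT:REF can discard unreferenced ones, /GF pools identical string literals, /GL plus /LTCG enables whole-program optimization, and /OPT:ICF folds identical code.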

Given a dll/exe (with or without .pdb), can I see what .obj files contribute to its size and how much?

I compiled a DLL from a whole bunch of .cpp files. I want to see how much each .cpp contributes to the final size of the DLL, in order to cut down its size (say, by excluding some libraries). Is there any way to do that? Thank you!
This ranges from quite difficult (which object do you charge library functions against) to impossible (when whole program optimization is used to inline across compilation unit boundaries).
I also suggest that it's not very useful. You need to know which functions to target for slimming down, not just which files.
Generating a map file during the build (pass /MAP to LINK.EXE) is probably the best you can do. The documentation also mentions something about symbol groups, which you might be able to use to your advantage as well.

How can I get my very large program to link?

Our next product has grown too large to link on a machine running 32-bit Windows. The sum total of all the .lib files exceeds 2 GB, so the product can only be linked on a 64-bit Windows machine. Eventually we will exceed that boundary too, since our software tends to grow rather than shrink, and we are using a 32-bit linker (MS Visual Studio 2005): we expect to hit trouble when our total .lib size exceeds 3 GB.
How can I reduce the size of the .lib files, or the .obj files, without trimming code? For example, we use a lot of templates: is there any way of reducing their footprint? Is there any way of finding out what's causing the bloat by examining the .lib/.obj files? Can this be automated rather than inspected by eye? 2.5 GB is a lot of text to peer through and compare.
External constraints prevent us from shipping as anything other than a single .exe, so a DLL solution is not available.
I once worked on a project with several MLoC. While ours would still link on a 32-bit machine, link times were abysmal and became a major problem, because developers could only get a dozen edit-compile-test cycles done per workday. (Compile times were handled pretty well by distributed compilation.)
We switched to dynamic linking. That increased startup time, but it could be managed by delay-loading the DLLs.
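For the record (heavy.dll is a placeholder here, not from the original answer), delay-loading in MSVC is a linker feature: you link against the import library as usual, add delayimp.lib, and the DLL is only mapped on the first call into it:

cl app.cpp /link /DELAYLOAD:heavy.dll delayimp.lib heavy.lib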
First, of course, make sure you compile with the 'Optimize for Size' option.
If you do that, I wouldn't expect inlining, at least, to contribute significantly to the code size. The compiler makes a tradeoff for every inlining candidate regarding how much (if at all) it'd increase code size, compared to the performance boost it'd give. And if you're optimizing for size, the compiler won't risk bloating the code much. (Note that inlining very small functions can actually decrease code size)
Second, have you considered unity builds? That would pretty much eliminate the linker's work, and with only one translation unit there'd be much less duplicate effort and, hopefully, a smaller memory footprint.
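For illustration (the file names are placeholders), a unity build boils down to one master file that #includes every other .cpp, and that master file is the only one handed to the compiler:

// unity.cpp - the single translation unit the project compiles
#include "renderer.cpp"
#include "physics.cpp"
#include "audio.cpp"
#include "main.cpp"

One caveat: file-scope statics and using-directives from all the files now share a single translation unit, so name clashes may need cleaning up first.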
Finally, I know Visual Studio (or possibly the Windows SDK) has a 64-bit compiler (that is, a compiler that is itself a 64-bit application, not just a compiler producing 64-bit code). Consider using that. (I don't know if there is also a 64-bit linker)
I don't know if the linker is built with the LARGEADDRESSAWARE flag set. If so, running it on a 64-bit machine will let the process consume a full 4 GB of memory instead of the 2 GB it normally gets. (If necessary, you can add the flag yourself by modifying the PE header.)
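For what it's worth, the editbin utility that ships with Visual Studio can set that flag without hand-editing the PE header (back up the original executable first):

editbin /LARGEADDRESSAWARE link.exe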
Limiting the linkage of various symbols could help as well. If you know that a symbol won't be needed outside the current translation unit, put it in an anonymous namespace. That might allow the compiler to trim down unused symbols before passing everything on to the linker.
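A minimal sketch of what that looks like (the file and function names are made up for illustration):

// brightness.cpp (hypothetical file)
namespace
{
    // internal linkage: invisible outside this translation unit,
    // so the compiler can inline it or drop it entirely if unused
    int clamp(int v, int lo, int hi)
    {
        return v < lo ? lo : (v > hi ? hi : v);
    }
}

int scaleBrightness(int value)
{
    return clamp(value * 2, 0, 255);
}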
Try using the Symbol Sort program to show you where the main bits of bloat are in your code. Also just looking at the size of the raw .obj files will give you a reasonable idea of where to target.
OMFG!!!!! That's huuuuuge!
Apart from the fact that I think it's too big to be rational... can't you use dynamic linking to avoid linking all the mess at compile time and only link at runtime what's necessary (I mean, loading DLLs on demand)?
Does it need to be one big app?
One option is to split various modules into DLLs and load/unload them as needed.
Alternatively, you might be able to split it into several apps and share data using mapped memory, pipes, a DBMS, or even simple data files.
First of all, find out how to measure the size used by various features. Don't go ahead and replace template usage or other things just because you suspect they make a significant difference.
Run
dumpbin /HEADERS <somebinary>
to find out which sections in your binary are causing the huge size. Do you have a huge Debug Directory section? Then strip the symbols. Is the Import Address Table large? Check the table and locate symbols which you don't need (a problem with templates is that the symbols of template instantiations tend to be very, very long). Similar analysis can be done for the Exception Directory, COM Descriptor Directory, etc.
I do not think there is any single tool that can give you the statistics you want/need. Using either .map files or the dumpbin utility with the /SYMBOLS parameter, plus some post-processing of the resulting log, might help you get what you want.
If the statistics confirm your suspicion of template bloat, or even without the confirmation, it might be a good idea to do several things with the source:
Try using explicit instantiations and move the template definitions into .cpp files (see the sketch after this list). Of course, this works only if you have a limited and well-known set of types/values that you use as template arguments.
Add more abstraction and/or indirection. Factor code that does not depend on your template parameters into its own base classes or free functions. If you have several template type parameters, see if you can't split the single class template into several base classes without overlapping template parameters. (See http://www2.research.att.com/~bs/SCARY.pdf.)
Try using the pimpl idiom; avoid instantiating templates in headers if you can, and instantiate them only in .cpp files.
Templates are nice, but sometimes ordinary classes work as well; e.g., avoid passing integer constants as non-type template parameters if you can pass them as constructor parameters instead.
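Here is a minimal sketch of the explicit-instantiation idea (Table is a placeholder class, not from the original answer); only the instantiations listed in the .cpp file are ever emitted, once, instead of being regenerated in every translation unit that uses the header:

// table.h - declaration only; the definitions stay out of the header
#include <vector>

template <typename T>
class Table
{
public:
    void insert(const T& value);
private:
    std::vector<T> items_;
};

// table.cpp - definitions plus the only instantiations we allow
#include "table.h"

template <typename T>
void Table<T>::insert(const T& value)
{
    items_.push_back(value);
}

template class Table<int>;    // the complete set of instantiations
template class Table<float>;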
@hatcat and @jalf: There is indeed a full set of 64-bit tools. For example, you can set an environment variable:
set PreferredToolArchitecture=x64
and then run Visual Studio (from the developer console).

Profiling DLL/LIB Bloat

I've inherited a fairly large C++ project in VS2005 which compiles to a DLL of about 5 MB. I'd like to cut down the size of the library so it loads faster over the network for clients who use it from a slow network share.
I know how to do this by analyzing the code, includes, and project settings, but I'm wondering if there are any tools available which could make it easier to pinpoint what parts of the code are consuming the most space. Is there any way to generate a "profile" of the DLL layout? A report of what is consuming space in the library image and how much?
When you build your DLL, you can pass /MAP to the linker to have it generate a map file containing the addresses of all symbols in the resulting image. You will probably have to do some scripting to calculate the size of each symbol.
Using a "strings" utility to scan your DLL might reveal unexpected or unused printable strings (e.g. resources, RCS IDs, __FILE__ macros, debugging messages, assertions, etc.).
Also, if you're not already compiling with /Os enabled, it's worth a try.
If your end goal is only to trim the size of the DLL, then after tweaking the compiler settings you'll probably get the quickest results by running your DLL through UPX. UPX is an excellent compression utility for DLLs and EXEs; it's also open source with a non-viral license, so it's okay to use in commercial/closed-source products.
I've only had it trigger a virus warning on the highest compression setting (the brute-force option), so you'll probably be fine if you use a lower setting than that.
While I don't know of any binary size profilers, you could alternatively look at which object files (.obj) are the biggest - that gives you at least an idea of where your problematic spots are.
Of course this requires a sufficiently modularized project.
You can also try linking statically instead of using a DLL. When the library is linked statically, the linker removes all unused functions from the final exe. Sometimes the final exe is only slightly bigger, and you no longer have a DLL to ship at all.
If your DLL is this big because it exports C++ functions with exceptionally long mangled names, an alternative is to use a .DEF file to export the functions by ordinal, without names (using NONAME in the .DEF file). Somewhat brittle, but it reduces the DLL size, EXE size, and load times.
See e.g. http://home.hiwaay.net/~georgech/WhitePapers/Exporting/Exp.htm
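A minimal sketch of such a .DEF file (the library and function names are placeholders); pass it to the linker with /DEF:exports.def, and callers then resolve the functions by ordinal, e.g. GetProcAddress(hDll, MAKEINTRESOURCEA(2)):

; exports.def - export by ordinal only, so the mangled names never
; enter the DLL's export table
LIBRARY MyEngine
EXPORTS
    InitEngine     @1 NONAME
    RenderFrame    @2 NONAME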
Given that all your .obj files are about the same size, and assuming that you're using precompiled headers, try creating an empty .obj file and see how large it is. That will give you an idea of the proportion of each .obj that's due to the PCH compilation. (The linker will remove all those duplicates, incidentally.) Alternatively, you could disable PCH so that the .obj files give you a better indication of where the main culprits are.
All good suggestions. What I do is get the map file and then just eyeball it. The kind of thing I've found in the past is that a large part of the space was taken by one or more class libraries, brought in by the fact that some variable somewhere was declared with a type that sounded like it would save some coding effort but wasn't really necessary.
Like in MFC (remember that?): they have a wrapper class for everything Win32 provides - controls, fonts, etc. Those take a ton of space and you don't always need them.
Another thing that can take a ton of space is collection classes you could manage without. Another is cout I/O routines you don't use.
I would recommend one of the following:
coverage - run a coverage tool in the hope of detecting some dead code
caching - cache the DLL on the client side upon the initial activation
splitting - split the DLL into several smaller DLLs, start the application with a bootstrap DLL, and download the other DLLs after the application starts
compilation and linking - use a smaller runtime library, compile with size optimization, etc.; see this link for more suggestions
compression - if you have data or large resources within the DLL, you can compress them and decompress only after the download or at runtime

MAP file analysis - where does my code size come from?

I am looking for a tool to simplify analysing a linker map file for a large C++ project (VC6).
During maintenance, the binaries grow steadily and I want to figure out where that growth comes from. I suspect some overzealous template expansion in a library shared between different DLLs, but just browsing the map file doesn't give good clues.
Any suggestions?
This is a wonderful map-file analysis/explorer/viewer tool. Check whether you can explore gcc-generated map files with it.
amap: a tool to analyze .MAP files produced by the 32-bit Visual Studio compiler and report the amount of memory being used by data and code.
This app can also read and analyze MAP files produced by the Xbox 360, Wii, and PS3 compilers.
The map file contains the size of each section, and you can write a quick tool to sort symbols by size. There's also a command-line tool that comes with MSVC (undname.exe) which you can use to demangle the symbols.
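For example (the mangled name here is only an illustration):

undname ?sortFloats@@YAXPAMH@Z

which prints something like void __cdecl sortFloats(float *,int).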
Once you have the symbols sorted by size, you can generate that report weekly or daily, as you like, and compare how the size of each symbol changes over time.
The map file from any single build may not tell you much on its own, but a historical record of map files can tell you quite a bit.
Have you tried using dumpbin.exe on your .obj files?
Stuff to look for:
Using a lot of STL?
A lot of C++ classes with inline methods?
A lot of constants?
If any of the above applies to you, check whether they have wide visibility, i.e. whether they are used/seen in large parts of your application.
No suggestion for a tool, but a guess as to a possible cause: do you have incremental linking enabled? This can cause expansion during subsequent builds...
The linker will strip unused symbols if you link with /OPT:REF, so if you're using that and not using incremental linking, I would expect expansion of the binaries to be only a result of actual new code being added. That's as far as I know... hope it helps a little.
Templates, macros, and the STL in general all use a tremendous amount of space. Heralded as a great universal library, Boost adds much space to projects. BOOST_FOREACH is an example of this: hundreds of lines of templated code that could simply be avoided by writing a plain loop by hand, which in general is only a few more keystrokes.
Get Visual AssistX to save typing instead of using templates. Also consider owning the code you use. Macros and inline function expansion will not necessarily show up.
Also, if you can, move away from a DLL architecture toward statically linking everything into one executable which runs in different "modes". There is absolutely nothing wrong with using the same executable image as many times as you want, just passing in a different command-line parameter depending on what you want it to do.
DLLs are the worst culprit for wasting space and slowing down a project's running time. People think they are space savers, when in fact they tend to have the opposite effect, sometimes increasing project size by a factor of ten! Plus they increase swapping. Use fixed code sections (no relocation section) for performance.