How to compress C++ code?

I need to compress some C++ code; the executable must be as small as possible. It's for Zero Robotics, and my code-size usage is currently 139% of the limit, so I need to reduce it. Are there tools to compress the code?

Since all you can do is edit your source code, the only way to reduce the size of your executable is to find ways to consolidate stuff.
Things to look out for:
Find dead code and resources. Delete all functions/methods/variables that are not used.
Find duplicate code and data. For example, if you have a function/method that has been copied and pasted into several files, refactor your code so that you only need one version of it.
Maybe try to reduce the amount of string constants and other resources you're using.
If you're using any 3rd-party code/libraries, see whether you can do without them or whether there's a more lightweight alternative.
There is no automated way to do this I'm aware of. You really have to look through your source yourself and clean it up by hand.

Supposing you want to reduce the executable file size, you can check your compiler options to reduce the object size. If you are using GCC, check the manual for the -s and -Os options.
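For example, with GCC you might compare the two (a minimal sketch; main.cpp and app are placeholder names):

    g++ -Os -o app main.cpp              # -Os optimizes for size rather than speed
    g++ -Os -s -o app-small main.cpp     # -s additionally strips symbol information
    ls -l app app-small                  # compare the resulting file sizes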

A few years ago, I used the UPX packer for executables, which compresses an exe file: http://upx.sourceforge.net/
Maybe this is what you are looking for.
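A minimal usage sketch, assuming your executable is called app.exe:

    upx --best app.exe    # compress in place with the strongest settings
    upx -t app.exe        # test the compressed file
    upx -d app.exe        # decompress again if something goes wrong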

Related

What modifications will lead to a reduction of the binary size in C++ code

I have been working on the very big code base of a C++ project: about 2k files comprising about 200k lines of code.
The code includes heavy usage of templates
There is a lot of inlining in the code
Currently compiling with clang++ and the -O2 option
The final executable size is about 50 MB
For some reason, I want to reduce the binary size still further.
Steps already taken
1. Replaced templates with non-template code where possible
2. Replaced the XML library (Xerces) with Expat
Any suggestions in this regard are welcome
The following methods are commonly used to reduce the size of programs:
Use your compiler-specific techniques to reduce the size.
Compile using gcc -S program.c to get the assembler file; you can then perform assembler-based space optimizations (see the sketch after this list).
Reduce the number of global variables.
Instead of complex algorithms that give you only very small improvements in execution time, use simple algorithms. For example, use bubble sort instead of merge sort if the number of elements in the list is not very large.
Remove simple functions which are used just once or twice.
Eliminate dead code; large projects often contain quite a bit of it.
Be careful about the library functions you include in your program.
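As a starting point for the assembler-based approach mentioned above, a minimal sketch (program.c is a placeholder name):

    gcc -Os -S program.c    # emit program.s instead of an object file
    wc -l program.s         # rough measure of how much code this file generates
    less program.s          # inspect which functions dominate the output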
Run strip on the finished executable to remove debugging information (n.b. you can keep the unstripped file too in case you later need the debugging info); see the sketch after this list.
Make sure you link system-provided libraries dynamically, not statically.
Move nontrivial functions from header files to .cpp files (this can even apply to some template functions if, for example, they're only used in the same .cpp file as an implementation detail).
Hunt down and eliminate dead code. Many projects have quite a bit of this. Consider using a code coverage analyzer to help you find candidates for removal. Hopefully you have some tests to help.
Consider compressing the actual binary. How big is it if you run it through gzip or bzip2?
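A sketch of the strip workflow from the list above, using the GNU binutils tools (app is a placeholder name):

    objcopy --only-keep-debug app app.debug    # keep the debug info in a side file
    strip --strip-all app                      # remove symbols and debug info from the shipped binary
    objcopy --add-gnu-debuglink=app.debug app  # optional: lets gdb find the symbols later
    gzip -k -9 app && ls -l app app.gz         # how compressible is the result?

(gzip -k, which keeps the original file, needs a reasonably recent GNU gzip; otherwise make a copy first.)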
UPX is a good tool for reducing the size of executables. It supports many platforms and executable formats.
The executable is decompressed on startup, and the code for doing so is included in the executable itself. The performance loss is minimal.

How to figure out which methods increase the size of the 'exe'

I'm trying to write my first 'demoscene' application in MS Visual Studio Express 2010. Suddenly I realized that my binary had expanded from 16 kB to ~100 kB in the fully-optimized-for-size release version. My target size is 64k. Is there any way to somehow "browse" the binary to figure out which methods consume a lot of space, and which I should rewrite? I really want to know what my binary consists of.
From what I found on the web, VS2010 is not the best compiler for demoscenes, but I still want to understand what's happening inside my .exe file.
I think you should have MSVC generate a map file for you. This is a file that will tell you the addresses of most of the different functions in your executable. The difference between consecutive addresses should tell you how much space the function takes. To generate a map file, add the /MAP linker option. For more info, see:
http://msdn.microsoft.com/en-us/library/k7xkk3e2(v=VS.100).aspx
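A minimal sketch for a single-file project (main.cpp is a placeholder name); /O1 asks MSVC to minimize for size, and /MAP names the map file:

    cl /O1 main.cpp /link /MAP:main.map

Consecutive addresses in main.map then show roughly how much space each function takes.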
You can strip off lots of unnecessary stuff from the executable and compress it with utilities such as MEW.
I've found this useful for examining executable sizes (although not for demoscene type things): http://aras-p.info/projSizer.html
I will say this: if you are using the standard library at all, then stop immediately. It is a huge code bloater. For example, each unique use of std::sort adds around 5 KB, and the numbers are similar for many of the standard containers (of course, it depends which functions you use, but in general they add lots of code).
Also, I'm not into the demo scene, but I believe people use Crinkler to compress their executables.
Use your version control system to see what caused the increase. Going forward, I'd log the built exe size during the nightly builds. And don't forget you can optimize for minimal size with the compiler settings.
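A hypothetical nightly-build step for such a log (the paths and file names are placeholders; on Windows you'd use the equivalent dir or PowerShell command):

    echo "$(date +%F) $(stat -c %s build/app.exe)" >> exe-size.log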

GNU tool to analyze and reduce compile time for my application

I am using SUSE10 (64-bit)/AIX (5.1) and HP I64 (11.3) to compile my application. Just to give some background, my application has around 200 KLOC (2 lakh) lines of code (without templates). It is purely C++ code. From measurements, I see that compile time ranges from 45 minutes (SUSE) to around 75 minutes (AIX).
Question 1 : Is this time normal (acceptable)?
Question 2 : I want to re-engineer the code arrangement and reduce the compile time. Is there any GNU tool which can help me to do this?
PS :
a. Most of the question in stackoverflow was related to Visual Studio, so I had to post a separate question.
b. I use gcc version 4.1.2.
c. Another piece of info (which might be useful) is that the code is spread across around 130 .cpp files, but the code distribution varies from 1 KLOC to 8 KLOC per file.
Thanks in advance for your help!
Edit 1 (after comments)
@PaulR: "Are you using makefiles for this? Do you always do a full (clean) build or just build incrementally?"
Yes, we are using makefiles for building the project.
Sometimes we are forced to do a full build (e.g. an overnight build/run, an automated run, or a complete refresh of the code because many members have changed many files), so I have posted in a general sense.
Excessive (or seemingly excessive) compilation times are often caused by an overly complicated include file hierarchy.
While not exactly a tool for this purpose, doxygen could be quite helpful: among other charts it can display the include file hierarchy for every source file in the project. I have found many interesting and convoluted include dependencies in my projects.
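A minimal sketch for enabling those charts (the option names are from the standard Doxyfile; the graphs require Graphviz to be installed):

    doxygen -g               # generate a default Doxyfile
    # then set in the Doxyfile:
    #   HAVE_DOT          = YES
    #   INCLUDE_GRAPH     = YES
    #   INCLUDED_BY_GRAPH = YES
    doxygen Doxyfile         # the HTML output now shows include graphs per file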
Read John Lakos's Large-Scale C++ Software Design for some very good methods of analysing and re-organising the structure of the project in order to minimise dependencies. Ultimately the time taken to build a large project increases as the amount of code increases, but also as the dependencies increase (or at least the impact of changes to header files increases as the dependencies increase). So minimising those dependencies is one thing to aim for. Lakos's concept of levelization is very helpful in working out how to split several large monolithic inter-dependent libraries into something with a much better structure.
I can't address your specific questions but I use ccache to help with compile times, which caches object files and will use the same ones if source files do not change. If you are using SuSE, it should come with your distribution.
In addition to the already mentioned ccache, have a look at distcc. Throwing more hardware at such a scalable problem is cheap and simple.
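A sketch of combining the two (the host names are hypothetical): ccache serves repeated compilations from its cache and hands actual cache misses to distcc via CCACHE_PREFIX:

    export CC="ccache gcc" CXX="ccache g++"
    export CCACHE_PREFIX=distcc
    export DISTCC_HOSTS="localhost buildbox1 buildbox2"
    make -j8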
Long compile times in large C++ projects are almost always caused by inappropriate use of header files. Section 9.3.2 of The C++ Programming Language provides some useful points on this. Precompiling header files can considerably reduce the compile time of large projects. See the GNU documentation on Precompiled Headers for more information.
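A minimal GCC sketch, assuming a common header called all.h; GCC automatically uses all.h.gch when it exists, as long as the compile flags match:

    g++ -O2 -x c++-header all.h -o all.h.gch   # precompile the header once
    g++ -O2 -c main.cpp                        # includes of all.h now hit the .gch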
Make sure that your main make targets can be executed in parallel (make -j <CPU_COUNT+1>), and of course try to use ccache. In addition, we experimented with ccache and RAM disks: if you export CCACHE_DIR and point it to a RAM disk, this will speed up your compilation process as well.
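For example (on most Linux systems /dev/shm is a RAM-backed tmpfs, and nproc reports the CPU count):

    export CCACHE_DIR=/dev/shm/ccache
    make -j$(( $(nproc) + 1 ))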

MAP file analysis - where does my code size come from?

I am looking for a tool to simplify analysing a linker map file for a large C++ project (VC6).
During maintenance, the binaries grow steadily and I want to figure out where the growth comes from. I suspect some overzealous template expansion in a library shared between different DLLs, but just browsing the map file doesn't give good clues.
Any suggestions?
amap is a wonderful compiler-generated map file analysis/explorer/viewer tool: it analyzes .MAP files produced by the 32-bit Visual Studio compiler and reports the amount of memory being used by data and code. Check whether it can also read a GCC-generated map file.
This app can also read and analyze MAP files produced by the Xbox 360, Wii, and PS3 compilers.
The map file should have the size of each section, and you can write a quick tool to sort the symbols by size. There's also a command-line tool that comes with MSVC (undname.exe) which you can use to demangle the symbols.
Once you have the symbols sorted by size, you can generate this weekly or daily as you like and compare how the size of each symbol has changed over time.
The map file alone from any single build may not tell much, but a historical report of compiled map files can tell you quite a bit.
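A rough gawk sketch of such a quick tool, assuming a VC-style map where functions are flagged with ' f ' and the third column is the Rva+Base address; the exact column layout varies between toolchains, so treat this only as a starting point:

    grep ' f ' app.map \
      | awk '{ print strtonum("0x" $3), $2 }' | sort -n \
      | awk 'prev { print $1 - prev, name } { prev = $1; name = $2 }' \
      | sort -rn | head -20    # the 20 largest symbols, sizes estimated from address gaps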
Have you tried using dumpbin.exe on your .obj files? (See the sketch after this list.)
Stuff to look for:
Using a lot of STL?
A lot of C++ classes with inline methods?
A lot of constants?
If any of the above applies to you, check whether they have wide visibility, i.e. whether they are used/seen in large parts of your application.
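For example (foo.obj is a placeholder name):

    dumpbin /HEADERS foo.obj     # section sizes
    dumpbin /SYMBOLS foo.obj     # symbols, which undname.exe can demangle

For a linked image, dumpbin /DISASM or /SECTION:.text can also show where the bytes go.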
No suggestion for a tool, but a guess as to a possible cause: do you have incremental linking enabled? This can cause expansion during subsequent builds...
The linker will strip unused symbols if you're compiling with /opt:ref, so if you're using that and not using incremental linking, I would expect expansion of the binaries to be only a result of actual new code being added. That's as far as I know... hope it helps a little.
Templates, macros, and the STL in general all use a tremendous amount of space. Heralded as a great universal library, Boost adds much space to projects. BOOST_FOREACH is an example of this: it is hundreds of lines of templated code that could simply be avoided by writing a proper loop by hand, generally at the cost of only a few more keystrokes.
Get Visual AssistX to save typing instead of relying on templates. Also consider owning the code you use. Macros and inline function expansion are not necessarily going to show up in the map file.
Also, if you can, move away from a DLL architecture to statically linking everything into one executable which runs in different "modes". There is absolutely nothing wrong with using the same executable image as many times as you want, just passing in a different command-line parameter depending on what you want it to do.
DLLs are the worst culprit for wasting space and slowing down the running time of a project. People think they are space savers, when in fact they tend to have the opposite effect, sometimes increasing project size by ten times! Plus they increase swapping. Use fixed code sections (no relocation section) for performance.

Calculate SLOC GCC C/C++ Linux

We have a quite large (280 binaries) software project under Linux, and currently it has a very dispersed code structure: one can't work out which code from the source tree is valid (builds into deployable binaries) and which is deprecated. But the Makefiles are good. We need to calculate the C/C++ SLOC for the entire project.
Here's the question: can I find out which SLOC GCC has compiled? Or maybe I can get this information from the binary (debug info, probably)? Or maybe I can find out which source files the binary was compiled from and use this info to calculate the SLOC?
Thanks
Bogdan
It depends on what you mean by SLOC that GCC has compiled. If you mean tracking the source files from your project that GCC used, then you'd probably use the dependency-tracking options, which list source files and headers; that's -M and various related options. Beware of including system-provided headers. A technique I sometimes use is to replace the standard C compiler with an appropriate variation; for example, to ensure a 64-bit compilation, I use CC="gcc -m64" to guarantee that when the C compiler is used, it will compile in 64-bit mode. Obviously, with a list of files, you can use wc to calculate the number of lines, and sort -u to eliminate duplicated headers.
One obvious gotcha is if you find that everything is included with relative path names - then you have to work out more carefully where each file is.
If you have some other definition of SLOC, then you will need to specify what you have in mind. Sometimes, people are looking for non-blank, non-comment SLOC, for example - but you still need the list of source files, which I think the -M options will help you determine.
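A rough sketch of that pipeline (main.c is a placeholder name; adjust the /usr prefix if your system headers live elsewhere):

    gcc -M main.c | tr ' \\' '\n' | grep -v '^$' \
      | grep -v ':$' | grep -v '^/usr' \
      | sort -u | xargs wc -l    # per-file and total line counts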
The first thing you want is an accurate list of what you actually compiled. You can achieve this by using a wrapper script instead of gcc.
The second list you want is the list of files that were used for this. For this, consult the dependency list (as you said it was correct). (It seems you'd need make --print-data-base.)
Then sort and deduplicate the list of files, throw out the system headers, and determine the SLOC count of each remaining file using your preferred tool.
What you can do is a preprocessor-only compilation using gcc's -E flag: the output is the actual code being compiled. Do a simple line count (wc -l) or something more advanced.
It might include extra code from macros etc., but especially if you compare it with a previous instance of your code, it is a good comparison.
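For example (main.cpp is a placeholder name; -P suppresses the # linemarkers that -E emits, so they don't inflate the count):

    g++ -E -P main.cpp | wc -l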
Here you can find a free (GPL) tool called sloccount, dedicated to estimating SLOC in projects of any size:
http://www.dwheeler.com/sloccount/
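Usage is a one-liner; point it at your source tree and it reports SLOC per language and per directory:

    sloccount src/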
I've used the following approach to get a dirty metric value in 2 hours. Even though the precision was far from ideal, it was enough to make the decision.
We took around 40 kB of code and calculated the SLOC for it using gcov. Then we calculated a "source lines per byte" metric and used it, together with the C source code size of the whole project, to get an approximate SLOC number.
It worked out just fine for our needs.
Thanks
You may want to try Resource Standard Metrics, as it calculates effective lines of code, which excludes standalone braces and other stylistic lines that artificially inflate SLOC counts by 10 to 33%. Ask them for a free timed license to give it a try.
Their web page is http://msquaredtechnologies.com