We have a quite large software project under Linux (280 binaries), and currently it has a very dispersed code structure - that means one can't work out which code in the source tree is valid (builds into deployable binaries) and which is deprecated. But the Makefiles are good. We need to calculate C/C++ SLOC for the entire project.
Here's the question - can I find out how many SLOC GCC has compiled? Or maybe I can get this information from the binary (debug info, probably)? Or maybe I can find out which source files the binary was compiled from and use that information to calculate SLOC?
Thanks
Bogdan
It depends on what you mean by SLOC that GCC has compiled. If you mean tracking the source files from your project that GCC used, then you'd probably use the dependency tracking options, which list source files and headers. That's -M and various related options. Beware of including system-provided headers. A technique I sometimes use is to replace the standard C compiler with an appropriate variation - for example, to ensure a 64-bit compilation, I use 'CC="gcc -m64"' to guarantee that when the C compiler is used, it will compile in 64-bit mode. Obviously, with a list of files, you can use wc to calculate the number of lines. Use 'sort -u' to eliminate duplicated headers.
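For example, a rough sketch of that pipeline for a single translation unit (the file name is a placeholder; note that -MM, unlike -M, already omits system headers):

    # List the headers one source file pulls in, strip the target name and
    # line continuations, deduplicate, and count physical lines.
    gcc -MM main.c | tr -d '\\' | tr ' ' '\n' \
        | grep -v ':$' | grep -v '^$' \
        | sort -u | xargs wc -l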
One obvious gotcha is if you find that everything is included with relative path names - then you have to work out more carefully where each file is.
If you have some other definition of SLOC, then you will need to specify what you have in mind. Sometimes, people are looking for non-blank, non-comment SLOC, for example - but you still need the list of source files, which I think the -M options will help you determine.
The first thing you want is an accurate list of what you actually compiled. You can achieve this by using a wrapper script instead of gcc.
The second list you want is the list of files that were used to build them. For this, consult the dependency lists (you said the Makefiles were correct). It seems you'd need make --print-data-base.
Then, sort and deduplicate the list of files, and throw out system headers. For each remaining file, determine the SLOC count using your preferred tool.
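A minimal sketch of such a wrapper (the log path is an assumption; point CC/CXX at this script in your Makefiles):

    #!/bin/sh
    # Log every C/C++ source file handed to the compiler, then run the real gcc.
    for arg in "$@"; do
        case "$arg" in
            *.c|*.cc|*.cpp|*.cxx) echo "$arg" >> /tmp/compiled-files.log ;;
        esac
    done
    exec /usr/bin/gcc "$@"

Afterwards, sort -u /tmp/compiled-files.log gives the deduplicated list of files that were actually compiled, ready to feed into wc -l or your SLOC counter of choice.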
What you can do is a preprocessor-only compilation, using gcc's -E flag: this results in output that is the actual code being compiled. Do a simple line count (wc -l) or something more advanced.
It might include extra code from macros, etc., but it is a good comparison, especially if you compare it with a previous version of your code.
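For instance, something along these lines for one file (-P suppresses the '# line' markers so they don't inflate the count, and the grep drops blank lines):

    # Count non-blank lines of the preprocessed output.
    gcc -E -P main.c | grep -vc '^[[:space:]]*$'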
Here you can find a free (GPL) tool called sloccount dedicated to estimating SLOC in projects of any size:
http://www.dwheeler.com/sloccount/
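Basic usage is just pointing it at the directories you care about, for example only the parts of the tree your dependency list says are live (the directory names here are placeholders):

    # Count SLOC per language for the listed directories.
    sloccount src/live-module-a src/live-module-b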
I've used the following approach to get a rough metric value in two hours. Even though the precision was far from ideal, it was enough to make the decision.
We took around 40 KB of code and calculated SLOC for that sample using gcov. Then we calculated a "source lines per byte" metric and used it, together with the total C source size, to get an approximate SLOC number for the whole project.
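In shell terms the estimate looked roughly like this (all numbers below are made up for illustration):

    # Suppose the 40 KB sample measured 1500 SLOC.
    sample_bytes=40960
    sample_sloc=1500
    # Total bytes of C/C++ source in the whole tree.
    total_bytes=$(find . \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) | xargs cat | wc -c)
    # Approximate project SLOC by scaling the sample's lines-per-byte ratio.
    echo $(( total_bytes * sample_sloc / sample_bytes ))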
It worked out just fine for our needs.
Thanks
You may want to try Resource Standard Metrics, as it calculates effective lines of code, which excludes standalone braces and similar programmer style choices that artificially inflate SLOC counts by 10 to 33%. Ask them for a free timed license to give it a try.
Their web page is http://msquaredtechnologies.com
Related
I need to compress some C++ code. The exe must be as small as possible. It's for Zero Robotics, and the code size usage is now at 139%, so I need to reduce it. Are there tools to compress the code?
Since all you can do is edit your source code, the only way to reduce the size of your executable is to find ways to consolidate stuff.
Things to look out for:
Find dead code and resources. Delete all functions/methods/variables that are not used.
Find duplicate code and data. For example, if you have a function/method that is copy&pasted into several files, refactor your code so that you only need one version of it.
Maybe try to reduce the amount of string constants and other resources you're using.
If you're using any 3rd-party code/libraries, see if you can do without them or if there's a more lightweight alternative.
There is no automated way to do this I'm aware of. You really have to look through your source yourself and clean it up by hand.
Supposing you want to reduce the executable file size, you can check your compiler options to reduce the object size. If you are using GCC, check the manual for the options -s and -Os.
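For example (a single-file build shown just for illustration):

    # Optimize for size and strip symbol information from the result.
    g++ -Os -s -o myprog main.cpp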
A few years ago, I used the UPX packer for executables, which compresses an exe file.
http://upx.sourceforge.net/
Maybe this is what you are looking for.
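Typical usage is a single command over the finished binary (the file name is a placeholder):

    # Compress an already-built executable in place.
    upx --best myprog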
I'm trying to write my first 'demoscene' application in MS Visual Studio Express 2010. Suddenly I realized that my binary had expanded from 16 KB to ~100 KB in the fully-optimized-for-size release version. My target size is 64k. Is there any way to somehow "browse" the binary to figure out which methods consume a lot of space and which I should rewrite? I really want to know what my binary consists of.
From what I found on the web, VS2010 is not the best compiler for demoscenes, but I still want to understand what's happening inside my .exe file.
I think you should have MSVC generate a map file for you. This is a file that will tell you the addresses of most of the different functions in your executable. The difference between consecutive addresses should tell you how much space the function takes. To generate a map file, add the /MAP linker option. For more info, see:
http://msdn.microsoft.com/en-us/library/k7xkk3e2(v=VS.100).aspx
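From the command line that looks roughly like this (file names are placeholders); in the IDE the same option lives in the linker settings:

    cl /O1 demo.cpp /link /MAP:demo.map

The resulting demo.map lists the functions with their addresses; the difference between consecutive addresses gives a rough size for each one.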
You can strip off lots of unnecessary stuff from the executable and compress it with utilities such as mew.
I've found this useful for examining executable sizes (although not for demoscene type things): http://aras-p.info/projSizer.html
I will say this: if you are using the standard library at all, then stop immediately. It is a huge code bloater. For example, each unique use of std::sort adds around 5 KB, and there are similar numbers for many of the standard containers (of course, it depends on which functions you use, but in general they add lots of code).
Also, I'm not into the demo scene, but I believe people use Crinkler to compress their executables.
Use your version control system to see what caused the increase. Going forward, I'd log the built exe size during the nightly builds. And don't forget you can optimize for minimal size with the compiler settings.
I am using SUSE 10 (64-bit), AIX 5.1 and HP I64 (11.3) to compile my application. Just to give some background, my application has around 200 KLOC (200,000 lines of code), excluding templates. It is purely C++ code. From measurements, I see that compile time ranges from 45 minutes (SUSE) to around 75 minutes (AIX).
Question 1 : Is this time normal (acceptable)?
Question 2 : I want to re-engineer the code arrangement and reduce the compile time. Is there any GNU tool which can help me to do this?
PS:
a. Most of the questions on Stack Overflow were related to Visual Studio, so I had to post a separate question.
b. I use gcc version 4.1.2.
c. Another piece of info (which might be useful) is that the code is spread across around 130 .cpp files, but the distribution varies from 1 KLOC to 8 KLOC per file.
Thanks in advance for your help!
Edit 1 (after comments)
@PaulR: "Are you using makefiles for this? Do you always do a full (clean) build or just build incrementally?"
Yes, we are using makefiles to build the project.
Sometimes we are forced to do a full build (for example, the overnight build/run, automated runs, or refreshing the complete code base after many members have changed many files). So I have posted in a general sense.
Excessive (or seemingly excessive) compilation times are often caused by an overly complicated include file hierarchy.
While not exactly a tool for this purpose, doxygen could be quite helpful: among other charts it can display the include file hierarchy for every source file in the project. I have found many interesting and convoluted include dependencies in my projects.
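A quick way to get those graphs (the config keys are standard doxygen options; Graphviz must be installed for the diagrams):

    # Generate a default config file...
    doxygen -g Doxyfile
    # ...then in Doxyfile set:
    #   HAVE_DOT          = YES
    #   INCLUDE_GRAPH     = YES
    #   INCLUDED_BY_GRAPH = YES
    # and run doxygen over the project.
    doxygen Doxyfile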
Read John Lakos's Large-Scale C++ Design for some very good methods of analysing and re-organising the structure of the project in order to minimise dependencies. Ultimately the time taken to build a large project increases as the amount of code increases, but also as the dependencies increase (or at least the impact of changes to header files increases as the dependencies increase). So minimising those dependencies is one thing to aim for. Lakos's concept of Levelization is very helpful in working out how to split several large monolithic inter-dependent libraries into something with a much better structure.
I can't address your specific questions but I use ccache to help with compile times, which caches object files and will use the same ones if source files do not change. If you are using SuSE, it should come with your distribution.
In addition to the already mentioned ccache, have a look at distcc. Throwing more hardware at such a scalable problem is cheap and simple.
Long compile times in large C++ projects are almost always caused by inappropriate use of header files. Section 9.3.2 of The C++ Programming Language provides some useful points on this. Precompiling header files can considerably reduce the compile time of large projects. See the GNU documentation on Precompiled Headers for more information.
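With GCC that boils down to compiling the header once; the header name below is a placeholder, and the flags must match the ones used for the real compilation:

    # Produce common.h.gch; GCC then uses it automatically in any translation
    # unit whose first include is common.h and whose flags match.
    g++ -x c++-header -O2 common.h -o common.h.gch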
Make sure that your main make targets can be executed in parallel (make -j <CPU_COUNT+1>) and of course try to use ccache. In addition, we experimented with ccache and RAM disks: if you export CCACHE_DIR and point it to a RAM disk, this will speed up your compilation process as well.
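A sketch of that setup on Linux (assumes a tmpfs at /dev/shm, GNU coreutils' nproc, and Makefiles that honour CC/CXX):

    # Keep the ccache store on a RAM disk and build with cores+1 parallel jobs.
    export CCACHE_DIR=/dev/shm/ccache
    make -j"$(($(nproc) + 1))" CC="ccache gcc" CXX="ccache g++"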
I am looking for a tool to simplify analysing a linker map file for a large C++ project (VC6).
During maintenance, the binaries grow steadily and I want to figure out where that comes from. I suspect some overzealous template expansion in a library shared between different DLLs, but just browsing the map file doesn't give good clues.
Any suggestions?
Amap is a wonderful analysis/explorer/viewer tool for compiler-generated map files. Check whether you can explore a gcc-generated map file with it.
amap : A tool to analyze .MAP files produced by 32-bit Visual Studio compiler and report the amount of memory being used by data and code.
This app can also read and analyze MAP files produced by the Xbox360, Wii, and PS3 compilers.
The map file should have the size of each section; you can write a quick tool to sort symbols by this size. There's also a command-line tool that comes with MSVC (undname.exe) which you can use to demangle the symbols.
Once you have the symbols sorted by size, you can generate this weekly or daily as you like and compare how the size of each symbol has changed over time.
The map file alone from any single build may not tell much, but a historical report of compiled map files can tell you quite a bit.
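A rough sketch of such a quick tool, assuming gawk and the usual MSVC map layout (symbol lines of the form 'seg:offset  name  Rva+Base ...'); each symbol's size is estimated as the gap to the next address, so the last symbol of each section will be off:

    # Print the 40 largest symbols by estimated size.
    grep -E '^ +[0-9a-fA-F]{4}:[0-9a-fA-F]+ ' app.map \
      | gawk '$3 ~ /^[0-9a-fA-F]+$/ { print strtonum("0x" $3), $2 }' \
      | sort -n \
      | gawk 'NR > 1 { print $1 - addr, sym } { addr = $1; sym = $2 }' \
      | sort -rn | head -40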
Have you tried using dumpbin.exe on your .obj files?
Stuff to look for:
Using a lot of STL?
A lot of c++ classes with inline methods?
A lot of constants?
If any of the above applies to you, check whether they have wide visibility, i.e. whether they are used/seen in large parts of your application.
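To try the dumpbin suggestion above, a couple of starting points (the object file name is a placeholder):

    REM List the symbols defined and referenced in an object file.
    dumpbin /symbols foo.obj
    REM Show file and section headers, including section sizes.
    dumpbin /headers foo.obj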
No suggestion for a tool, but a guess as to a possible cause: do you have incremental linking enabled? This can cause expansion during subsequent builds...
The linker will strip unused symbols if you're compiling with /opt:ref, so if you're using that and not using incremental linking, I would expect expansion of the binaries to be only a result of actual new code being added. That's as far as I know... hope it helps a little.
Templates, macros, and the STL in general all use a tremendous amount of space. Heralded as a great universal library, Boost adds much space to projects. BOOST_FOREACH is an example of this: it's hundreds of lines of templated code which could simply be avoided by writing the loop by hand, which in general is only a few more keystrokes.
Get Visual Assist X to save typing instead of relying on templates. Also consider owning the code you use. Macros and inline function expansion are not necessarily going to show up.
Also, if you can, move away from a DLL architecture to statically linking everything into one executable which runs in different "modes". There is absolutely nothing wrong with using the same executable image as many times as you want, just passing in a different command line parameter depending on what you want it to do.
DLLs are the worst culprits for wasting space and slowing down the running time of a project. People think they are space savers, when in fact they tend to have the opposite effect, sometimes increasing project size by ten times! Plus they increase swapping. Use fixed code sections (no relocation section) for performance.
We have a project which uses gcc and makefiles. The project also consists of one big subproject (an SDK) and a lot of relatively small subprojects which use that SDK and a shared framework.
We use precompiled headers, but that only helps make recompilation faster.
Are there any known techniques and tools to help with build-time optimization? Or maybe you know some articles/resources about this or related topics?
You can tackle the problem from two sides: refactor the code to reduce the complexity the compiler is seeing, or speed up the compiler execution.
Without touching the code, you can throw more compilation power at it. Use ccache to avoid recompiling files you have already compiled, and distcc to distribute the build among more machines. Use make -j N, where N is the number of cores + 1 if you compile locally, or a bigger number for distributed builds. That flag will run more than one compiler in parallel.
Refactoring the code. Prefer forward declarations to includes (simple). Decouple as much as you can to avoid dependencies (use the PIMPL idiom).
Template instantiation is expensive; templates are recompiled in every compilation unit that uses them. If you can, refactor your templates so that they are forward declared and then instantiated in only one compilation unit.
The best I can think of with make is the -j option. This tells make to run as many jobs as possible in parallel:
make -j
If you want to limit the number of concurrent jobs to n you can use:
make -j n
Make sure the dependencies are correct so make doesn't run jobs it doesn't have to.
Another thing to take into account is the optimization that gcc does with the -O switch. You can specify various levels of optimization. The higher the optimization, the longer the compile and link times. A project I work with takes 2 minutes to link with -O3, and half a minute with -O1. You should make sure you're not optimizing more than you need to. You could build without optimization for development builds and with optimization for deployment builds.
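One way to wire that up, assuming GNU make, the conventional CXXFLAGS variable in your Makefiles, and coreutils' nproc (command-line variable assignments override the Makefile's own):

    # Developer build: quick compile, debug info, no optimization.
    make -j"$(nproc)" CXXFLAGS="-O0 -g"
    # Deployment build: full optimization.
    make -j"$(nproc)" CXXFLAGS="-O3"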
Compiling with debug info (gcc -g) will probably increase the size of your executable and may impact your build time. If you don't need it, try removing it to see if it affects you.
The type of linking (static vs. dynamic) should make a difference. As far as I understand static linking takes longer (though I may be wrong here). You should see if this affects your build.
From the description of the project I guess that you have one Makefile per directory and are using recursive make a lot. In that case techniques from "Recursive Make Considered Harmful" should help very much.
If you have multiple computers available gcc is well distributed by distcc.
You can also use ccache in addition.
All this works with very few changes to the makefiles.
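A sketch of combining distcc and ccache (host names are placeholders; CCACHE_PREFIX tells ccache to hand cache misses to distcc):

    # Compile across several machines, caching results locally first.
    export DISTCC_HOSTS="localhost/2 buildbox1 buildbox2"
    export CCACHE_PREFIX=distcc
    make -j16 CC="ccache gcc" CXX="ccache g++"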
Also, you'll probably want to keep your source code files as small and self-contained as possible/feasible, i.e. prefer many smaller object files over one huge single object file.
This will also help avoid unnecessary recompilations, in addition you can have one static library with object files for each source code directory or module, basically allowing the compiler to reuse as much previously compiled code as possible.
Something else, which wasn't yet mentioned in any of the previous responses, is making symbol linkage as 'private' as possible, i.e. prefer static linkage (functions, variables) for your code if it doesn't have to be visible externally.
In addition, you may also want to look into using the GNU gold linker, which is much more efficient at linking large C++ programs for ELF targets.
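On a toolchain new enough to support it, that is a one-flag change (on older setups you can instead point the ld in your PATH at ld.gold); -fuse-ld=gold requires a fairly recent GCC:

    # Link with gold instead of the classic BFD ld.
    g++ -fuse-ld=gold -o myapp *.o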
Basically, I'd advise you to carefully profile your build process and check where the most time is spent; that'll give you some hints as to how to optimize your build process or your project's source code structure.
You could consider switching to a different build system (which obviously won't work for everyone), such as SCons. SCons is much smarter than make. It automatically scans header dependencies, so you always have the smallest set of rebuild dependencies. By adding the line Decider('MD5-timestamp') to your SConstruct file, SCons will first look at the time stamp of a file, and if it's newer than the previously built time stamp, it will use the MD5 of the file to make sure you actually changed something. This works not just on source files but object files as well. This means that if you change a comment, for instance, you don't have to re-link.
The automatic scanning of header files has also ensured that I never have to type scons --clean. It always does the right thing.
If you have a LAN with developer machines, perhaps you should try implementing a distributed compiler solution, such as distcc.
This might not help if all of the time during the build is spent analyzing dependencies, or doing some single serial task. For the raw crunch of compiling many source files into object files, parallel building obviously helps, as suggested (on a single machine) by Nathan. Parallelizing across multiple machines can take it even further.
http://ccache.samba.org/ speeds things up big time.
I work on a medium-sized project, and that's the only thing we do to speed up the compile time.
You can use distcc distributed compiler to reduce the build time if you have access to several machines.
Here's an article from IBM developerWorks about distcc and how you can use it:
http://www.ibm.com/developerworks/linux/library/l-distcc.html
Another method to reduce build time is to use precompiled headers. Here's a starting point for gcc.
Also, don't forget to use -j when building with make if your machine has more than one CPU/core (2x the number of cores/CPUs is just fine).
Using small files may not always be a good recommendation. A disk has a 32 or 64 KB minimum allocation unit, with a file taking at least one unit. So 1024 files of 3 KB each (small code inside) will actually take 32 or 64 MB on disk instead of the expected 3 MB, and all of that needs to be read by the drive. If files are dispersed around the disk, you increase read time even more with seek time. This is helped by the disk cache, obviously, up to a limit. Precompiled headers can also be a good help in alleviating this.
So, with due respect to coding guidelines, there is no point in going outside them just to place each struct, typedef or utility class into a separate file.