Dynamic dead code elimination tools for complex C++ projects

We have a project with a lot of code, part of which is legacy.
As part of the workflow, every once in a while all the functionality of the product is checked.
I wonder if there is a way to use this fact to dynamically check which parts of the code were never used? (The difficult part is the C++ code; the .NET and Java parts are more under control and have less legacy.)
Also: what dynamic dead code elimination tools are there that can work with lots of code and complex projects (i.e. ~1M lines)?
All the similar questions I found talked about static analysis, which we already do.
Thank you!

You might want to look at the code coverage tools that are used in testing. The idea of these tools is that they instrument the code, and after running the set of tests you know which lines of code were executed at least once and which lines were never executed. After that you can improve the tests.
The same approach can be used to identify dead code, if you have a diverse enough execution environment.

I don't know what platform you are on, but we have used Gcov with success if you're compiling with the GNU toolchain:
http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
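A minimal sketch of that workflow, assuming the GNU toolchain (the file and function names here are invented for illustration):

    // dead.cpp - build and inspect with:
    //   g++ --coverage -O0 dead.cpp -o dead   (instrumented build)
    //   ./dead                                (writes dead.gcda on exit)
    //   gcov dead.cpp                         (produces dead.cpp.gcov)
    // In dead.cpp.gcov, lines that never executed are marked '#####',
    // so legacyPath() below shows up as never run.
    #include <iostream>

    void legacyPath() { std::cout << "never called\n"; }
    void activePath() { std::cout << "called\n"; }

    int main() {
        activePath();
        return 0;
    }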

Related

Why is Code Contract analysis not installed by default

We're just beginning a new project and we're keen to include testing from the ground up.
While we were looking at which unit test solution to use, I came across Code Contracts, which seem to offer an easier way to check things like null parameter passing without having to write independent unit test methods.
One thing I am a little confused about, and which makes me wary of investing heavily in Code Contract checks, is the fact that the analysis tool needs to be downloaded from DevLabs and isn't included in VS2012 by default.
What is the reason for this?
Additionally: it seems people are reporting that VS2012 support for Code Contract analysis is flaky. Why would we use Code Contracts if the analysis capabilities aren't very good?
I can't speak for VS2012, but it works perfectly fine for me in VS2010. There is a very minor conflict with Code Analysis in that a false alert is raised, but you simply switch off that Code Analysis rule and rely on the Code Contracts static checking (which is a lot more comprehensive).
Probably the best place to get the answer about your IDE integration is via the email address located on the Code Contracts website. However, I suspect that because it's a research project, not having it as an "official" part of the IDE allows for more regular updates.

Do you recommend Enabling Code Analysis for C/C++ on Build?

I'm using Visual Studio 2010, and in my C++/CLI project there are two Code Analysis settings:
Enable Code Analysis on Build
Enable Code Analysis for C/C++ on Build
My question is about the second setting.
I've enabled it; it takes a long time to run and it doesn't find much.
Do you recommend enabling this feature? Why?
The two options you specify control the automatic execution of Code Analysis on managed and native C++ respectively.
Code Analysis of managed code is performed by the FxCop engine analyzing the generated IL.
Code Analysis of native code is performed during compilation by the PREFast engine analyzing the C++ source code.
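As a rough illustration of what the native analyzer looks for, here is a contrived sketch (the function name is invented) of the defect class PREFast reports as warning C6011, dereferencing a null pointer:

    // Flagged by Code Analysis for C/C++ (roughly: warning C6011,
    // "Dereferencing NULL pointer 'p'"):
    int ReadValue(int* p) {
        if (p == nullptr) {
            return *p;  // null dereference on this path
        }
        return *p;      // fine: p is known non-null here
    }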
I strongly encourage you to require your developers to have run CA on their code before checking it in. If you don't, you're:
Delaying the process of ensuring that your code has no known vulnerabilities and issues that could otherwise have been systematically removed from your product's source.
Denying your developers their right to improve their skills by learning incrementally what code they should not be writing and why.
Selling your customers short because they're the ones who will suffer from crashes and security issues when they're using your product.
Further, if you're writing native C++ and have not already planned to start adorning your code with SAL Annotations, then, frankly, someone at your place of work deserves to be dragged out into the street and humiliated! There's some great stuff coming down the pipe shortly in the next version of the SAL annotations - get on it now and be way ahead of the curve compared to your competitors! :)
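For readers who haven't met SAL: the annotations state a function's buffer contract so the analyzer can check both callers and implementations. A minimal sketch using the SAL 2.0-style macros from <sal.h> (the function name is invented):

    #include <sal.h>
    #include <cstddef>
    #include <cstring>

    // _In_reads_/_Out_writes_ tell the analyzer how many elements must
    // be readable/writable, so an undersized buffer at a call site can
    // be flagged at analysis time instead of surfacing as a crash.
    void CopyInts(_Out_writes_(count) int* dest,
                  _In_reads_(count) const int* src,
                  std::size_t count)
    {
        std::memcpy(dest, src, count * sizeof(int));
    }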
Never did anything for me. In theory, it's supposed to help catch logical errors, but I've never found it to report anything.
We are using LINT to do a static code analysis for plain C++ applications (no .Net, no C++/CLI).
This is different from what you are using but probably the same principles can be applied.
We execute LINT like this:
During a build, only the changed sources (CPP files) are run through LINT. Possibly many more files are being recompiled (if a header file is changed), but only the changed .CPP files are run through LINT.
Run the static code analysis on all files on a Continuous Integration server. If it finds something, have it mail the errors to the developers who most recently committed changes to the versioning system, or to the main developer.
What you could do additionally is to perform a static code analysis on all files that are committed to your versioning system. E.g. in Subversion you could do this in a commit-trigger.

Tool for analyzing C++ sources (MSVC)

I need a tool which analyzes C++ sources and says what code isn't used. The size of the sources is ~500 MB.
PC-Lint is good. If it needs to be free/open source, your choices dwindle. Cppcheck is free and will check for unused private functions, but I don't think it looks for things like uninstantiated classes the way PC-Lint does.
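For instance, Cppcheck's style checks report unused private member functions; a tiny sketch of the kind of code it flags (the names are invented):

    // cppcheck --enable=style widget.cpp reports something like:
    //   (style) Unused private function: 'Widget::legacyHelper'
    class Widget {
    public:
        void draw() {}
    private:
        void legacyHelper();  // declared and defined, but never called
    };

    void Widget::legacyHelper() { /* dead */ }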
Once again, I'll throw AQTime into the discussion. It has static code analysis for most, if not all, of the supported languages. I didn't really go into that part, though; I mainly used the dynamic profilers (memory, performance and so on).
You could use a code coverage tool (dynamic analysis) to get an idea of what code isn't being executed, and then hand-analyze to see if that code is really useless.

If you want a static analysis, you need a tool that can read the entire 500 MB of source code (est. 20 million lines? Wow!) and compute a conservative estimate of what is used. This requires doing a points-to analysis over the entire system.

Here's why: if you leave out any module Z, and decide that FOO is unused, you might find out later that Z happened to be the one that used FOO, or, more subtly, Z copied a pointer value that happened to have &FOO in it to a third module M that in turn called the "unused" function through the pointer.

What this means is that no static analysis tool that reads just single modules (compilation units) can answer this question safely. And at your scale, you can't afford to make dumb mistakes.
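To make that concrete, here is a sketch of the scenario just described, spread across three translation units (FOO, Z and M follow the naming above; the glue code is invented):

    // foo.cpp - FOO looks unused if this file is analyzed in isolation
    void FOO() { /* ... */ }

    // z.cpp - module Z never calls FOO; it only takes its address
    void FOO();
    void M_invoke(void (*fn)());
    void Z_register() { M_invoke(&FOO); }  // &FOO escapes to module M

    // m.cpp - module M calls the "unused" function through the pointer
    void M_invoke(void (*fn)()) { fn(); }  // FOO is reached at run time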
My company, Semantic Designs, has done points-to analysis for 35-million-line systems of C code using our DMS Software Reengineering Toolkit. DMS can read very large systems of source code. It required a custom tool, not so much because the source code was in an odd (archaic) dialect of C++ (systems in extremely modern dialects can't be this big; there hasn't been enough time to code them!), but rather because in very large systems there are other peculiar factors at play. For the C system we did, there was a custom dynamic linker, and that affected the points-to analysis, which in turn had to be customized.

Because systems of the scale you are discussing always have surprises like this (BIBSEH: "Because In Big Systems, Everything Happens"), you will likely need a custom tool to answer the question. DMS is designed to be customized.

See http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html
and http://www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html
A code coverage tool is what you need, but you will have to run your program through all its functionality and see what is reported as unused. Since the code could include DLL-exported functions, you will have to make sure nothing uses them externally. Purify and CTC++ are code coverage tools; BoundsChecker may have code coverage functionality too if I remember right, and there are a bunch of other tools.
Be very careful about removing any function that may have been exported without knowing what external program may be linking/using it.
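A small sketch of why that caution matters (the declspec and function name are illustrative):

    // Nothing inside this project calls LegacyCompute, so both static
    // analysis and in-house coverage runs will report it as unused, yet
    // an external program linked against the DLL may still depend on it.
    extern "C" __declspec(dllexport) int LegacyCompute(int x) {
        return x * 2;
    }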

Automated Dead code detection in native C++ application on Windows?

Background
I have an application written in native C++ over the course of several years that is around 60 KLOC. There are many, many functions and classes that are dead (probably 10-15%, as in the similar Unix-based question linked below). We recently began doing unit testing on all new code, and we apply it to modified code whenever possible. However, I would make a SWAG that we have less than 5% test coverage at the present moment.
Assumptions/Constraints
The method and/or tools must support:
Native (i.e. unmanaged) C++
Windows XP
Visual Studio 2005
Must not require user-supplied test cases for coverage (e.g. it can't depend on unit tests to generate code coverage).
If the methods support more than these requirements, then great.
NOTE: We currently use the Professional edition of Visual Studio 2005, not Team System. Using Team System might therefore be a valid suggestion (I don't know; I've never used it), but I'm hoping it is not the only solution.
Why using unit tests for code coverage is problematic
I believe that it is impossible for a generic tool to find all the dead (e.g. unreachable) code in any arbitrary application with zero false positives (I think this would be equivalent to the halting problem). However, I also believe it is possible for a generic tool to find many types of dead code that are highly likely to in fact be dead, like classes or functions which are never referenced in the code by anything else.
By using unit tests to provide this coverage, you are no longer using a generic algorithm and are thus increasing both the percentage of dead code you can detect and the probability that any hits are not false positives. Conversely, using unit tests could result in false negatives, since the unit tests themselves might be the only thing exercising a given piece of code. Ideally, I would have regression testing that exercises all externally available methods, APIs, user controls, etc., which would serve as a baseline measurement for code coverage analysis to rule out certain methods from being false positives. Sadly, however, I do not have this automated testing at the present time.
Since I have such a large code base with such a low test case coverage percentage however, I'm looking for something that could help without requiring huge amounts of time invested in writing test cases.
Question
How do you go about detecting dead code in an automated or semi-automated fashion in a native C++ application on the Windows platform with the Visual Studio 2005 development environment?
See Also
Dead code detection in legacy C/C++ project
I want to tell the VC++ compiler to compile all code. Can it be done?
Ask the linker to remove unreferenced objects (/OPT:REF). If you use function-level linking and verbose linker output, the linker will list every function it can prove is unused. This list may be far from complete, but you already have the tools you need.
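A sketch of that setup with the MSVC toolchain (the file and function names are invented):

    // main.cpp - build roughly as:
    //   cl /Gy /c main.cpp              (/Gy: one COMDAT per function)
    //   link /OPT:REF /VERBOSE:REF main.obj
    // /OPT:REF removes functions that are never referenced, and
    // /VERBOSE:REF prints a "Discarded ..." line for each one, so
    // deadHelper below should show up in that list.
    int deadHelper(int x) { return x + 1; }  // never referenced

    int main() { return 0; }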
We use Bullseye, and I can recommend it. It doesn't need to be run from a unit test environment, although that's what we do.
Use a code coverage tool against your unit test suite.

Dead code detection in legacy C/C++ project [closed]

How would you go about dead code detection in C/C++ code? I have a pretty large code base to work with, and at least 10-15% is dead code. Is there any Unix-based tool to identify these areas? Some pieces of code still use a lot of the preprocessor; can an automated process handle that?
You could use a code coverage analysis tool for this and look for unused spots in your code.
A popular tool for the gcc toolchain is gcov, together with the graphical frontend lcov (http://ltp.sourceforge.net/coverage/lcov.php).
If you use gcc, you can compile with gcov support, which is enabled by the '--coverage' flag. Next, run your application or your test suite with this gcov-enabled build.
Basically, gcc will emit some extra files during compilation, and the application will also emit some coverage data while running. You have to collect all of these (.gcno and .gcda files). I'm not going into full detail here, but you probably need to set two environment variables to collect the coverage data in a sane way: GCOV_PREFIX and GCOV_PREFIX_STRIP...
After the run, you can put all the coverage data together and run it through the lcov toolsuite. Merging of all the coverage files from different test runs is also possible, albeit a bit involved.
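A hedged sketch of those collection and merge steps (paths and file names invented; check your lcov version's manual for the exact flags):

    lcov --capture --directory . --output-file run1.info
    lcov --capture --directory . --output-file run2.info
    lcov --add-tracefile run1.info --add-tracefile run2.info --output-file merged.info
    genhtml merged.info --output-directory coverage-html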
Anyhow, you end up with a nice set of webpages showing some coverage information, pointing out the pieces of code that have no coverage and hence, were not used.
Of course, you need to double-check that the portions of code are not used in any situation, and a lot depends on how well your tests exercise the codebase. But at least this will give an idea about possible dead-code candidates...
Compile it under gcc with -Wunreachable-code.
I think that the more recent the version, the better results you'll get, but I may be wrong in my impression that it's something they've been actively working on. Note that this does flow analysis, but I don't believe it tells you about code which is already dead by the time it leaves the preprocessor, because that is never parsed by the compiler. It also won't detect, e.g., exported functions which are never called, or special-case handling code which just so happens to be impossible because nothing ever calls the function with that parameter - you need code coverage for that (and run the functional tests, not the unit tests; unit tests are supposed to have 100% code coverage, and hence execute code paths which are 'dead' as far as the application is concerned). Still, with these limitations in mind, it's an easy way to get started finding the most completely bollixed routines in the code base.
This CERT advisory lists some other tools for static dead code detection
For C code only, and assuming that the source code of the whole project is available, launch an analysis with the open-source tool Frama-C. Any statement of the program that is displayed in red in the GUI is dead code.

If you have "dead code" problems, you may also be interested in removing "spare code": code that is executed but does not contribute to the end result. This requires you to provide an accurate model of the I/O functions (you wouldn't want to remove a computation that appears to be "spare" but whose result is used as an argument to printf). Frama-C has an option for pointing out spare code.
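As a tiny illustration (the function is made up), here is the kind of statement Frama-C's value analysis would display in red:

    int sign(int x) {
        if (x >= 0)
            return 1;
        if (x >= 0)      /* always false at this point... */
            return 2;    /* ...so this statement is dead code */
        return -1;
    }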
Your approach depends on the availability of (automated) tests. If you have a test suite that you trust to cover a sufficient amount of functionality, you can use a coverage analysis, as previous answers already suggested.
If you are not so fortunate, you might want to look into source code analysis tools like SciTools' Understand, which can help you analyse your code using a lot of built-in analysis reports. My experience with that tool dates from 2 years ago, so I can't give you much detail, but what I do remember is that they had impressive support, with very fast turnaround times for bug fixes and answers to questions.
I found a page on static source code analysis that lists many other tools as well.
If that doesn't help you sufficiently either, and you're specifically interested in finding out the preprocessor-related dead code, I would recommend you post some more details about the code. For example, if it is mostly related to various combinations of #ifdef settings you could write scripts to determine the (combinations of) settings and find out which combinations are never actually built, etc.
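For example (the macro names are invented): if no build configuration ever defines the combination below, the guarded block is dead, yet no compiler ever parses it, so only a script that enumerates the configurations you actually build can find it:

    #if defined(PLATFORM_LEGACY) && defined(USE_NEW_RENDERER)
    // Never compiled if no build defines both macros - invisible to
    // compiler warnings and to coverage, hence the scripted approach.
    void migrateRendererState();
    #endif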
Both Mozilla and Open Office have home-grown solutions.
g++ 4.01 -Wunreachable-code warns about code that is unreachable within a function, but does not warn about unused functions.
    int foo() {
        return 21; // point a
    }

    int bar() {
        int a = 7;
        return a;
        a += 9; // point b
        return a;
    }

    int main(int, char **) {
        return bar();
    }
g++ 4.01 will issue a warning about point b, but say nothing about foo() (point a) even though it is unreachable in this file. This behavior is correct although disappointing, because a compiler cannot know that function foo() is not declared extern in some other compilation unit and invoked from there; only a linker can be sure.
Dead code analysis like this requires a global analysis of your entire project. You can't get this information by analyzing translation units individually (well, you can detect dead entities if they are entirely within a single translation unit, but I don't think that's what you are really looking for).
We've used our DMS Software Reengineering Toolkit to implement exactly this for Java code, by parsing all the compilation-units involved at once, building symbol tables for everything and chasing down all the references. A top level definition with no references and no claim of being an external API item is dead. This tool also automatically strips out the dead code, and at the end you can choose what you want: the report of dead entities, or the code stripped of those entities.
DMS also parses C++ in a variety of dialects (EDIT Feb 2014: including MS and GCC versions of C++14 [EDIT Nov 2017: now C++17]) and builds all the necessary symbol tables. Tracking down the dead references would be straightforward from that point. DMS could also be used to strip them out. See http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html
The Bullseye coverage tool would help. It is not free, though.