Stripping a C++ library of code not required by an application [closed] - c++

I have an application dependent on many libraries. I am building everything from source on an Ubuntu machine. I want to remove any function/class that is not required by the application. Is there any tool to help with that?
P.S. I want to remove source code from the library, not just symbols from the object files.

The standard strip utility was created for exactly this.

I have now researched this a bit in the context of my own project and decided this was worth a full answer rather than just a comment. This answer is based on Apple's toolchain on macOS (which uses clang, rather than gcc), but I think things work in much the same way for both.
The key to this is enabling 'link-time optimization' when building your libraries and executable(s). The mechanics are very simple: just pass -flto to gcc and ld on the command line. This has two effects (a minimal sketch follows below):
Code (functions / methods) in object files or archives that is never called is omitted from the final executable.
The linker performs the sort of optimisations that the compiler can perform (such as function inlining), but with knowledge that extends across compilation unit boundaries.
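To illustrate the first effect, here is a minimal sketch (hypothetical file names; g++-style commands shown in the comments) of the kind of thing LTO can drop:

// maths.cpp -- imagine this is built into one of your libraries
int used_everywhere(int x) { return x * 2; }
int never_called(int x)    { return x * 3; }   // nothing references this

// main.cpp -- the application only uses one of the two functions
int used_everywhere(int);
int main() { return used_everywhere(21); }

// Possible build commands (adjust for clang++ / your toolchain):
//   g++ -O2 -flto -c maths.cpp main.cpp
//   g++ -O2 -flto maths.o main.o -o app
// With -flto the linker sees the whole program, so never_called() can be
// optimised out of the final binary; without it, it may well survive.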
It won't help you if you are linking against a shared library, but it might help if that shared library links with other (static) libraries which contain code that the shared library never calls.
On the upside, this reduced the size of my final executable by about 5%, which I'm pleased about. YMMV.
On the downside, my object files roughly doubled in size and sometimes link times increased dramatically (by something like a factor of 100). Then, if I re-linked, it was much faster. This behaviour might be a peculiarity of Apple's toolchain however. Perhaps it is stashing away some build intermediates somewhere on the first link. In any case, if you only enable this option for release builds it should not be a major issue.
There are more details of the full set of gcc command line options that control optimisation here: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html. Search that page for flto to narrow down your search.
And for a glimpse behind the scenes, see: https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
Edit:
A bit more information about link times. Apple's linker creates some huge files in a directory called LTOCache when you link. I've not seen these before today so these look to be the build intermediates that speed up linking second time around. As for my initial link being so slow, this may in part be due to the fact that, in my case, these are created on an SMB server. But then again, the CPU was maxed out so maybe not.

OK, now that I understand the OP's requirements better I have another answer for this that I think might better suit his needs. I think the way to tackle this is with a code coverage tool. After all, the problem is identifying what you can safely get rid of; actually stripping it out is easy.
My IDE (Visual Studio) has one of these built in but I think the OP is using gcc so the first port of call appears to be gcov. There are a number of commercial options, but they are expensive. There's also a potentially useful post here.
The other thing you need, of course, is a program that exercises all the parts of the library that you want to keep to give you a coverage report to work from, but it sounds like the OP already has that. A good IDE will also help as it makes navigating around the code so much easier. In Visual Studio, I find Jump to Definition and quick and easy 'bookmarking' to be key features.

Related

Creating a C++ compiler/linker for a homemade opcode list [closed]

Is there a way to tell a C++ compiler/linker to compile the source code into my own homemade opcode list? I need it for my virtual machine, which will run on a microcontroller.
I don't want to create a C++ compiler from scratch; I only want to change the opcodes, the addresses of the CPU status register, stack pointer and GPIO registers, and the program and data memory in an existing open-source compiler, so that people writing programs for it don't have to rewrite their whole code, but can just port it using libraries that are compatible with my own compiler's libraries.
An example is the avr-gcc compiler.
The compiler and its libraries must not be proprietary in the sense that I or any programmer would have to pay for it, and I don't want it to be GPL-licensed in such a way that a programmer must reveal the source of their own projects. I want all my programmers to be able to use my compiler freely, license their work however they want, and choose whether to make it open source or proprietary.
Let's consider the steps involved:
Retargeting an existing C++ compiler: Several production-quality, retargetable C++ compilers are freely available today. For instance, the LLVM platform (clang++) provides some pointers on writing a backend for a new hardware architecture (this naturally applies to VMs as well!). Unfortunately, up-to-date documentation on porting the GNU compilers is harder to come by. It's entirely possible that many of the older documents remain relevant today, but I know far too little about GCC to say.
Note that the effort required to retarget either compiler is likely to depend on how well the instruction set of your virtual machine matches the compiler's low-level intermediate representation. Since they often (at least semantically) take the form of three-address code ― that is, instructions with two source operands and one destination ― writing a code generator for, say, a stack machine (in which all operands are implicitly addressed) could prove to be a bit more difficult.
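To make that contrast concrete, here is a purely illustrative sketch (not taken from any real compiler or VM) of how a statement like a = b + c might be encoded on each kind of target:

#include <cstdint>

// Three-address form: every instruction names two sources and a destination,
// so "a = b + c" is a single instruction such as  ADD a, b, c
struct ThreeAddressInsn { uint8_t opcode, dst, src1, src2; };

// Stack-machine form: operands are implicit, so the same statement becomes
// a sequence (PUSH b; PUSH c; ADD; POP a) and each instruction carries at
// most one explicit operand.
struct StackInsn { uint8_t opcode, operand; };

// A backend targeting the stack machine therefore has to lower the
// compiler's register-like IR values onto stack slots, which is the extra
// work the paragraph above hints at.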
From this point on, you really have two options. You could stick to the conventional way in which C++ programs are compiled, i.e., from source, to assembly, to object files, to linked executable or library. That involves going through the steps I have outlined below. But since you are targeting a virtual machine, it may have requirements that are radically different from those of modern hardware architectures. In that case, you may want to steer clear of existing software like binutils and roll your own assembler and linker.
Writing or porting an assembler: Unless your chosen compiler is able to directly generate machine code, you will most likely also need to write an assembler for your virtual machine, or port an existing one. If your virtual machine's instruction set looks anything like that of a modern machine, and if you want to use the standard C++ compilation/linking pipeline, you could look into porting binutils, specifically gas, the GNU assembler.
Writing or porting a linker: The object files produced by your assembler are not in themselves executable programs. Addresses must be assigned to symbols and segments, and references between object files must be resolved. This means that the linker needs some understanding of your instruction set. In particular, it must be able to find and patch locations in code and data that address memory. The binutils porting guide I linked above is relevant here, too; you may also enjoy reading Linkers and Loaders.
As #Mat noted in the comment section above, the GPL doesn't usually "infect" the output of a program licensed under it. See this section. Notably:
The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work.
I am not a lawyer, but I take this to mean that an exception would be made for, say, compiling the compiler with itself ― the output would still be subject to the terms of the GPL.

How do you combine components of large c++ projects? [closed]

I have a question about how large C++ projects with many components are supposed to be managed (I guess that's the best term). For all intents and purposes I'm a beginning programmer. I understand the basics of compiling, header files, etc., but I've never really worked on anything bigger than homework assignments.
So, let's take something like a game engine that has various components like a memory manager, renderer, physics simulation, and so on. How would one work on these components separately, but in a way that makes it easy to integrate back into the whole? For example, would you make a separate Visual Studio project for each piece with its own main? If you have one big project for everything, how would you work on one component without another, unfinished component potentially making every compile fail? I feel like I'm missing some major concept.
Like, for projects with multiple programmers that have to check out portions to work on... do they grab all the code so they can compile, or do they set up their own temporary project to work on their bit? Both options sound wrong. You have to have a main function to compile, right?
I would very much appreciate anyone educating me on this topic, as I feel this is something I should already know and just somehow missed completely.
When you are working with larger programs it is customary to have one source file containing the main program, with the rest (there can be many source files) called from main. Then you need a build strategy. You can write a script file that compiles each of your source files and then links them all together. Unfortunately this can lead to long build times, so professional programmers use make files, which rebuild only the files that changed.
As a further refinement, you can organize groups of sources into libraries and build the libraries separately and then link them with your remaining compiled source files.
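As a rough illustration (the file names are made up), a small project split this way might look like:

// physics.h -- the declarations other files need
int step_simulation(int dt);

// physics.cpp -- the implementation, compiled on its own
// #include "physics.h"
int step_simulation(int dt) { return dt * 2; /* placeholder body */ }

// main.cpp -- the only file with a main(); it just calls into the rest
// #include "physics.h"
int main() { return step_simulation(16); }

// Each .cpp is compiled separately and the results linked together, e.g.:
//   g++ -c physics.cpp
//   g++ -c main.cpp
//   g++ physics.o main.o -o game
// A makefile automates exactly these steps and rebuilds only what changed.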
Try looking up gmake (for Linux) to see how to build larger projects. I guess you are using Microsoft VC++, in which case compiled files have .obj extensions and libraries .lib extensions. Microsoft have their own way of building libraries which is slightly more complicated than using gmake.
When you look further you'll come across shared libraries (dynamic link libraries on Windows - DLLs).
This isn't really a great question for Stack Overflow's format. C++ does support language facilities for managing large code bases, like namespaces, classes, and header files. But your question seems to suggest a lack of perspective as to what they are for, or a limited understanding of the technical framework and process for contributing code to a software project, which isn't a C++-specific issue.
When working on a living project, a primary concern is dealing with complexity - in other words, reducing the number of things you have to think about at any one point in time. What that means is that if another programmer is working on the user interface, ideally the code in your physics engine shouldn't have to change to accommodate their work. So interfaces, for forming abstractions and hiding information, are essential.
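For instance, a minimal sketch of that kind of boundary (hypothetical names) could look like this:

// renderer.h -- an abstract interface; physics code never sees the UI's internals
class Renderer {
public:
    virtual ~Renderer() = default;
    virtual void draw_box(float x, float y, float w, float h) = 0;
};

// Physics-side code depends only on the interface, so changes to the
// concrete user-interface implementation don't ripple into it.
void debug_draw(Renderer& r) { r.draw_box(0.f, 0.f, 1.f, 1.f); }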
Granted I'm pretty green as well, so I can't give any real solid advice. I only mention this point to give some perspective as to how vague your question is. If I understand your question correctly, you might enjoy a book like Code Complete 2 by McConnell.
Large projects are separated into pieces. Normally, you should have the ability to compile each piece separately. The best practice that I know is to declare the interfaces between the various components, minimizing dependencies as close to zero as possible, and then to build 'test' programs, which are small and serve two purposes: they test a small piece of code, and they have a main().
The directory structure is usually:
yourlib/
    lib/
    ext-inc/
    test/
    other dirs/
    ...
The lib directory contains the output library object (.a, .so).
The ext-inc directory contains the headers that external code will use (sometimes called 'public' or just 'inc').
The test directory usually has a main.c (or .cpp) file and might have more, as needed; a minimal sketch of such a test driver follows below.
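As a sketch (the header and function names are made up), such a test program can be as small as:

// test/main.cpp -- a tiny driver that exercises one piece of the library
#include "yourlib.h"   // the public header from ext-inc/

int main() {
    // Call into the library and report pass/fail through the exit code.
    return yourlib_do_something(42) == 84 ? 0 : 1;
}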
When you check out (svn) / clone (git) / sync (p4) / etc., you take everything, but work only on your area. Once done, you merge/submit your changes into the main branch.

Tool to parse C++ source and move in-header inline methods to the .cpp source file? [closed]

The source code of our application is hundreds of thousands of lines, thousands of files, and in places very old - the app was first written in 1995 or 1996. Over the past few years my team has greatly improved the quality of the source, but one issue remains that particularly bugs me: a lot of classes have a lot of methods fully defined in their header file.
I have no problem with methods declared inline in a header in some cases - a struct's constructor, a simple method where inlining measurably makes it faster (we have some math functions like this), etc. But the liberal use of inlined methods for no apparent reason is:
Messy
Makes it hard to find the implementation of a method (especially searching through a tree of classes for a virtual function, only to find one class had its version declared in the header...)
Probably increases the compiled code size
Probably causes issues for our linker, which is notoriously flaky for large codebases. To be fair, it has got much better in the past few years, but it's not perfect.
That last reason may now be causing problems for us and it's a good reason to go through the codebase and move most definitions to the source file.
Our codebase is huge. Is there an automated tool that can do (most of) this for us?
Notes:
We use Embarcadero RAD Studio 2010. In other words, the dialect of C++ includes VCL and other extensions, etc.
A few headers are standalone, but most are paired with a corresponding .cpp file, as you normally would. Apart from the extension the filename is the same, i.e., if there are methods defined in X.h, they can be moved to X.cpp. This also means the tool doesn't have to handle parsing the whole project - it could probably just parse individual pairs of .cpp/.h files, ignore the includes, etc., so long as it could reliably recognise a method with a body defined in a class declaration and move it (see the sketch below for the kind of move we mean).
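For concreteness, here is a hand-written sketch of the kind of move being described (hypothetical class and file names):

// Before (in X.h), the body lives inside the class declaration:
//
//     class X {
//     public:
//         int value() const { return 42; }
//     };
//
// After, X.h keeps only the declaration...
class X {
public:
    int value() const;
};

// ...and the definition moves to X.cpp (which #includes "X.h"):
int X::value() const { return 42; }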
You might try Lazy C++. I have not used it, but I believe it is a command line tool to do just what you want.
If the code is working then I would vote against any major automated rewrite.
Lots of work could be involved fixing it up.
Small iterative improvements over time is a better technique as you will be able to test each change in isolation (and add unit tests). Anyway your major complaint about not being able to find the code is not a real problem and is already solved. There are already tools that will index your code base so your editor will jump to the correct function definition without you having to search for it. Take a look at ctags or the equivalent for your editor.
Re "Messy": Subjective.
Re "Makes it hard to find the implementation of a method": There are already tools available for finding the function. ctags will make a file that allows you to jump directly to the function from any decent editor (vim/emacs). I am sure your editor, if not one of these, has an equivalent tool.
Re "Probably increases the compiled code size": Unlikely. The compiler will choose to inline or not based on internal metrics, not whether it is marked inline in the source.
Re "Probably causes issues for our linker": Unlikely. If your linker is flaky then it is flaky; it is not going to make much difference where the functions are defined, as this has no bearing on whether they are inlined anyway.
XE2 includes a new static analyzer. It might be worthwhile to give the new version of C++Builder's trial a spin.
You have a number of problems to solve:
How to regroup the source and header files ideally
How to automate the code modifications to carry this out
In both cases, you need a robust C++ parser with full name resolution to determine the dependencies accurately.
Then you need machinery that can reliably modify the C++ source code.
Our DMS Software Reengineering Toolkit with its C++ Front End could be used for this. DMS has been used for large-scale C++ code restructuring; see http://www.semdesigns.com/Company/Publications/ and track down the first paper "Case Study: Re-engineering C++ Component Models Via Automatic Program Transformation". (There's an older version of this paper you can download from there, but the published one is better). AFAIK, DMS is the only tool to have ever been applied to transforming C++ on large scale.
This SO discussion on reorganizing code addresses the problem of grouping directly.

Visually marking conditional compilation [closed]

We have a large amount of C/C++ code that's compiled for multiple targets, separated by #ifdefs. One of the targets is very different from the others and it's often important to know if the code you're editing is compiled for that target. Unfortunately the #ifdefs can be very spread out, so it's not always obvious which code is compiled for which targets.
Visual Studio's #ifdef highlighting can be helpful for visually identifying which code is compiled for which target, but changing the highlighting apparently requires modifications to the project file.
I'm interested in finding a tool or method that can help coders quickly recognize which targets are using each line of code. Even if it requires some sort of manual in-source annotation, I think it could still be helpful. Best case it's automated, not tied to a specific editor or IDE, and it could be configured to warn in certain conditions (e.g., "you modified some code on Target X, be sure to test your code on that platform!").
If your code is getting so big that you can't tell which #ifdef you're in, then it's time to refactor your code. I would recommend that you refactor it into separate cpp files per platform.
I normally only use #ifdef when the code is only one or two lines long; any longer and I refactor it into its own function or class in its own cpp file. That makes it simple to figure out where you are.
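As a rough sketch of that refactoring (names are made up), the scattered #ifdef blocks collapse into one shared header plus one .cpp per platform:

// platform.h -- a single declaration shared by every target
void flush_cache();

// platform_target_x.cpp -- compiled only when building for Target X
void flush_cache() { /* Target X specific implementation */ }

// platform_default.cpp -- compiled for all the other targets
// void flush_cache() { /* generic implementation */ }

// The build system picks exactly one of these .cpp files per target, so a
// reader always knows which platform's code they are looking at, with no
// #ifdefs inside the function bodies.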
Check out Visual SlickEdit. The "Selective Display" option might be what you are looking for. I can't find any on-line documentation on it, but it will allow you to essentially apply a set of macro definitions to the code. So you can tell it to show you the code as the compiler will see it with a set of macros defined. This is a lot more than preprocessor output since it literally hides blocks of code that would be excluded based on the macro definitions.
This doesn't give you the ability to answer the question "Under what preprocessor conditions is this line of code included in compilation" though. The nice thing is that it applies the selective display filter to searches and printing.
I know for a fact that Eclipse CDT does it. It has other nice features and some not-so-nice features for an IDE. Now, I code with vi, so I might be biased.
I don't know if there is a tool for this already, but I'd guess it would be fairly easy to roll your own using the preprocessor. Preprocess your file with a specific set of #defines and the output is what the compiler sees for that platform. I reckon this is not the same as highlighting the current file, but it can be automated and integrated into your IDE: push a button and get a temp file showing the currently edited file under a specific set of #defines. I didn't try it myself; it's just an idea.
P.S. Yes, I had to read your post a couple of times, searching for where exactly 'code coverage' is involved, lol.
Check XRefactory and Cscout.

Dead code detection in legacy C/C++ project [closed]

How would you go about dead code detection in C/C++ code? I have a pretty large code base to work with and at least 10-15% is dead code. Is there any Unix-based tool to identify these areas? Some pieces of code still use a lot of preprocessor; can an automated process handle that?
You could use a code coverage analysis tool for this and look for unused spots in your code.
A popular tool for the gcc toolchain is gcov, together with the graphical frontend lcov (http://ltp.sourceforge.net/coverage/lcov.php).
If you use gcc, you can compile with gcov support, which is enabled by the '--coverage' flag. Next, run your application or run your test suite with this gcov enabled build.
Basically gcc will emit some extra files during compilation and the application will also emit some coverage data while running. You have to collect all of these (.gcno and .gcda files). I'm not going into full detail here, but you probably need to set two environment variables to collect the coverage data in a sane way: GCOV_PREFIX and GCOV_PREFIX_STRIP...
After the run, you can put all the coverage data together and run it through the lcov toolsuite. Merging of all the coverage files from different test runs is also possible, albeit a bit involved.
Anyhow, you end up with a nice set of webpages showing some coverage information, pointing out the pieces of code that have no coverage and hence were not used.
Of course, you need to double-check whether those portions of code are really not used in any situation, and a lot depends on how well your tests exercise the codebase. But at least this will give an idea about possible dead-code candidates...
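For reference, a minimal sketch of that workflow (file names are made up; the gcc/lcov invocations in the comments are the commonly documented ones, so check them against your installed versions):

// candidate.cpp -- a function the test run never touches will show zero hits
int used(int x)   { return x + 1; }
int unused(int x) { return x - 1; }   // expect an execution count of 0 if truly dead

// Roughly:
//   g++ --coverage -O0 -c candidate.cpp ...    # compilation emits .gcno files
//   ./your_test_suite                          # running emits .gcda files
//   lcov --capture --directory . --output-file coverage.info
//   genhtml coverage.info --output-directory coverage-report
// Lines with an execution count of 0 in the report are dead-code candidates.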
Compile it under gcc with -Wunreachable-code.
I think that the more recent the version, the better results you'll get, but I may be wrong in my impression that it's something they've been actively working on. Note that this does flow analysis, but I don't believe it tells you about "code" which is already dead by the time it leaves the preprocessor, because that's never parsed by the compiler. It also won't detect e.g. exported functions which are never called, or special-case handling code which just so happens to be impossible because nothing ever calls the function with that parameter - you need code coverage for that (and run the functional tests, not the unit tests. Unit tests are supposed to have 100% code coverage, and hence execute code paths which are 'dead' as far as the application is concerned). Still, with these limitations in mind it's an easy way to get started finding the most completely bollixed routines in the code base.
This CERT advisory lists some other tools for static dead code detection
For C code only, and assuming that the source code of the whole project is available, launch an analysis with the open-source tool Frama-C. Any statement of the program that displays red in the GUI is dead code.
If you have "dead code" problems, you may also be interested in
removing "spare code", code that is executed but does not
contribute to the end result. This requires you to provide
an accurate modelization of I/O functions (you wouldn't want
to remove a computation that appears to be "spare" but
that is used as an argument to printf). Frama-C has an option for pointing out spare code.
Your approach depends on the availability of (automated) tests. If you have a test suite that you trust to cover a sufficient amount of functionality, you can use a coverage analysis, as previous answers already suggested.
If you are not so fortunate, you might want to look into source code analysis tools like SciTools' Understand that can help you analyse your code using a lot of built in analysis reports. My experience with that tool dates from 2 years ago, so I can't give you much detail, but what I do remember is that they had an impressive support with very fast turnaround times of bug fixes and answers to questions.
I found a page on static source code analysis that lists many other tools as well.
If that doesn't help you sufficiently either, and you're specifically interested in finding out the preprocessor-related dead code, I would recommend you post some more details about the code. For example, if it is mostly related to various combinations of #ifdef settings you could write scripts to determine the (combinations of) settings and find out which combinations are never actually built, etc.
Both Mozilla and Open Office have home-grown solutions.
g++ 4.01 with -Wunreachable-code warns about code that is unreachable within a function, but does not warn about unused functions.
int foo() {
    return 21; // point a
}

int bar() {
    int a = 7;
    return a;
    a += 9; // point b
    return a;
}

int main(int, char **) {
    return bar();
}
g++ 4.01 will issue a warning about point b, but say nothing about foo() (point a) even though it is unreachable in this file. This behavior is correct although disappointing, because a compiler cannot know that function foo() is not declared extern in some other compilation unit and invoked from there; only a linker can be sure.
Dead code analysis like this requires a global analysis of your entire project. You can't get this information by analyzing translation units individually (well, you can detect dead entities if they are entirely within a single translation unit, but I don't think that's what you are really looking for).
We've used our DMS Software Reengineering Toolkit to implement exactly this for Java code, by parsing all the compilation-units involved at once, building symbol tables for everything and chasing down all the references. A top level definition with no references and no claim of being an external API item is dead. This tool also automatically strips out the dead code, and at the end you can choose what you want: the report of dead entities, or the code stripped of those entities.
DMS also parses C++ in a variety of dialects (EDIT Feb 2014: including MS and GCC versions of C++14 [EDIT Nov 2017: now C++17]) and builds all the necessary symbol tables. Tracking down the dead references would be straightforward from that point. DMS could also be used to strip them out. See http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html
The Bullseye coverage tool would help. It is not free, though.