Capture all compiler invocations and command-line parameters during a build

I want to run static analysis tools over the C/C++ (and possibly Python, Java, etc.) code of a large software project built with make. As is well known, make (or any other build tool) invokes the compiler and similar tools for the specified source files. It is also possible to control compilation by defining environment variables that are later passed to the compiler as arguments.
The key to accurate static analysis is to provide the defines and include paths exactly as they were passed to the compiler (basically all of its -D and -I arguments). That way the tool can follow the same code paths the compiler followed.
The problem is that the project's complexity makes it impossible to determine such an environment statically: different files are built with different sets of defines, include paths, and other compilation flags.
The idea is that it should somehow be possible to capture each individual invocation of the compiler, with all arguments passed to it, for each input file. Given that information, after straightforward filtering (e.g., there is no need to keep -O optimization levels or -W warning settings), it should be possible to invoke the static analyzer for each input file with exactly the set of defines/includes that was used for that file.
The question is: are there existing tools/workflows that implement the idea I've described? I am mostly interested in a solution for POSIX systems, but ideas for Windows are also welcome.
A few ideas I've come up with on my own.
The most trivial solution would be to collect make's output and process it afterwards. However, some projects have makefile rules that print very concise output instead of verbose output, so this might require tinkering with the Makefiles, which is not always desirable. Parallel builds may also produce interleaved console output that is impossible to parse. Adapting this to other build systems (e.g., CMake) would not be trivial either, so it is far from the most convenient way.
Running make under ptrace and recording all exec* system calls that correspond to starting new programs, including compiler invocations. One would then need to parse the trace output. This approach is build-system and language agnostic (it will catch every invocation of any compiler for any language) and should work for parallel builds. However, it seems more technically complex, and it is unclear how much the tracing overhead would slow the build down. It would also be harder to port to Windows, where the process-tracing API is quite different.
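A minimal sketch of this tracing idea, assuming GNU strace and a Make-based build (the compiler-name pattern and the -s string limit will need tuning per project):

    import re
    import subprocess

    # Run the whole build under strace, following child processes and
    # logging every execve call; -s raises the string limit so long
    # argument lists are not truncated.
    subprocess.run(
        ["strace", "-f", "-e", "trace=execve", "-s", "4096",
         "-o", "build.trace", "make", "-j8"],
        check=True,
    )

    exec_re = re.compile(r'execve\("(?P<path>[^"]+)", \[(?P<argv>.*?)\]')
    with open("build.trace") as trace:
        for line in trace:
            m = exec_re.search(line)
            # Keep only invocations that look like a compiler.
            if m and re.search(r"(?:^|/)(?:cc|gcc|g\+\+|clang(?:\+\+)?)$",
                               m.group("path")):
                argv = re.findall(r'"((?:[^"\\]|\\.)*)"', m.group("argv"))
                print(m.group("path"), argv)

From here, filtering each argv list down to its -D/-I flags and feeding those to an analyzer is straightforward.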
PVS-Studio, a proprietary static analyzer for C++ on Windows (and recently Linux, AFAIK), seems to implement the second approach; details on how they do it are welcome. If other IDEs/tools already have something similar to what I need, please share information on them.

There are several ways to gather information about compilation parameters on Linux:
Override the CC/CXX environment variables. This is what the scan-build utility from Clang Analyzer does. The method works reliably only with simple Make-based projects.
procfs: all information on running processes is exposed under /proc/PID/.... Polling it is slow, so you may miss short-lived compiler processes and fail to capture every process of a build.
The strace utility (built on ptrace). Its output contains a lot of useful information but requires complicated parsing, because output from concurrently running processes is interleaved. If the project is not built with many parallel jobs, it is a fairly reliable way to gather information about the processes. It is used in PVS-Studio.
JSON Compilation Database in CMake. You can get all the compilation parameters by configuring with -DCMAKE_EXPORT_COMPILE_COMMANDS=On. It is a reliable method as long as the project does not depend on non-standard environment variables. A CMake project may also contain mistakes and emit an incorrect JSON database even though the build itself succeeds. It is supported in PVS-Studio; a sketch of replaying such a database into an analyzer follows this list.
The Bear utility (function interposition via LD_PRELOAD). It can produce a JSON Compilation Database for any project. However, since it does not capture environment variables, the analyzer cannot be run on some projects from its output alone. It also cannot be used with projects that already rely on LD_PRELOAD during the build. It is supported in PVS-Studio.
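Whichever method produces the compilation database, replaying it into an analyzer is then simple. A hedged sketch (my-analyzer is a placeholder for whatever static analyzer you run; note that some databases use an "arguments" array instead of a "command" string):

    import json
    import shlex
    import subprocess

    # Load the JSON Compilation Database produced by CMake
    # (-DCMAKE_EXPORT_COMPILE_COMMANDS=On) or by Bear.
    with open("compile_commands.json") as f:
        database = json.load(f)

    for entry in database:
        args = shlex.split(entry["command"])
        # Keep only the defines and include paths; drop -O, -W and the rest.
        flags = [a for a in args if a.startswith(("-D", "-I"))]
        subprocess.run(["my-analyzer", *flags, entry["file"]],
                       cwd=entry["directory"], check=True)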
On Windows, PVS-Studio collects compilation information via:
Visual Studio API to get the compilation parameters of standard projects;
MSBuild API to get the compilation parameters of standard projects;
the Windows API, to obtain information on arbitrary compilation processes, much as Windows Task Manager does.

Running make with VERBOSE=1 (or VERBOSE=true) is a common convention for displaying every command with all of its parameters; Makefiles generated by CMake, for instance, honor it.
You might want to look at Coverity. They attach their tool to the compiler to capture everything the compiler receives. Alternatively, you can override the CC or CXX environment variables to point at a wrapper that first records everything and then calls the real compiler as usual; a sketch of such a wrapper follows.
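The log path and the real compiler location below are assumptions; mark the script executable and build with make CC=/path/to/ccwrap.py:

    #!/usr/bin/env python3
    """Log every compiler invocation, then compile as usual."""
    import subprocess
    import sys

    REAL_COMPILER = "/usr/bin/cc"  # adjust to the real compiler

    # Append this invocation's arguments to a log; single-line writes in
    # append mode are safe enough for parallel builds in practice.
    with open("/tmp/compile-invocations.log", "a") as log:
        log.write(" ".join(sys.argv[1:]) + "\n")

    # Delegate to the real compiler with the arguments unchanged.
    sys.exit(subprocess.call([REAL_COMPILER, *sys.argv[1:]]))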

Related

Set output path for CMake-generated files

My question is the following:
Is there a way to tell CMake where to generate its files, such as cmake_install.cmake, CMakeCache.txt, etc.?
More specifically, is there a way to put some commands in the CMake files that specify where to output these generated files? I have searched around the web for answers; most people say there is no explicit way of doing this, while others say it might be possible using custom commands. Sadly, I'm not very strong in CMake, so I couldn't figure this out.
I'm currently using the CLion IDE, where you can set the output path through the settings, but for flexibility I would like as much as possible to be done through the CMake files, so that compiling on different computers isn't a big hassle.
I would also like to avoid explicitly adding additional command line arguments etc.
I hope someone might have an answer for me, thanks in advance!
You can't (easily) do this and you shouldn't try to do it.
The build tree is CMake's territory. It allows you a tiny amount of customization there (for instance, you can specify where the final build artifacts are placed through the *_OUTPUT_DIRECTORY target properties), but it does not give you any direct control over where intermediate files, like object files or the internal make scripts used for bookkeeping, are placed.
This is a feature. You have no idea how all the build systems supported by CMake work internally. Maybe you can move that internal file to a different location in your build process, which is based on Unix Makefiles. But maybe that will also horribly break my build process, which is using Visual Studio. The bottom line is: You shouldn't have to care about this. CMake should take care of it, and by taking some freedom away from you, it ensures that it can actually do that job on all supported build toolchains.
But this might still be an unsatisfactory answer to you. You're the developer, shouldn't you be in full control of the results produced by your build? Of course you should, which is why CMake again grants you full control over what goes into the install tree. That is, whatever ends up in the install directory when you call make install (or whatever is the equivalent of installing in your build toolchain) is again under your control.
So you do control everything that matters: The source tree, the install tree, and that tiny portion of the build tree where the final build artifacts go. The rest of the build tree is off-limits for you and for good reasons.

How to make building / compilation more comfortable

My current workflow when developing apps or programs in Java or C/C++ is as follows:
I don't use any IDE like IntelliJ, Visual Studio, ...
Using Linux or OS X, I use vim as my code editor. When I build with a makefile or (in Java) Gradle, I run :!make and wait for the compiler and linker to create the executable, which is then run automatically.
In case of compilation errors, the compiler's output can get very long, and the lines exceed the width of the console. Everything gets messy, and it sometimes takes too long to find the first error (which often causes all of the following compile errors).
My question is: what is your workflow as a C++ developer? For example, is there a way to generate a nicely formatted local HTML file that you can view/update in your browser window? Or other ideas?
Yes, I know I could use Xcode or any other IDE. But I just don't want to.
Compiling in vim with :!make instead of :make doesn't make any sense; :make is even one of vim's early features. The former expects us to have good eyes. The latter displays compilation errors in the quickfix window, which we can navigate. In other words, there is no need for an auxiliary log file: we can navigate compilation errors even in (a couple of) editors that run in a console.
I did expand on a related topic in https://stackoverflow.com/a/35702919/15934.
Regarding compilation, there are a few plugins that permit compiling in the background. I've added this facility to build-tool-wrapper lately (it requires vim 7.4-1980, and it's still in a development branch at this time). This plugin also lets me easily filter errors in the standard library with the venerable STLfilt, and manage several build configurations (each in a separate directory).

Eclipse CDT: Managing conditional compile (#ifdef) in one codebase

I am working in a very large code base that has conditional compilation flags to build the code for several different embedded hardware platforms. A large part of the code is common, and there is a hardware adaptation layer that is supposed to be h/w independent but also has a lot of common code, with calls to hardware-specific functions wrapped in #ifdef/#else for conditional compilation. This is unfortunately the paradigm imposed on us for how we work across several teams, so I need to live with it, i.e., moving to truly hardware-independent files is not an option. I develop and debug for all three (so far) of these platforms, and I keep having to add/delete the compiler flags from my Symbols and rebuild my CDT index each time I context-switch from developing/debugging an issue on one platform to another. Rebuilding the index can take a long time (up to an hour), even with aggressive resource filtering.
We work with Perforce as our VCS, and I want to work within a single Perforce workspace so I don't get out of sync with which files are checked out. I tried to create separate Eclipse projects for each of these platforms, but I get an error message that the resource (the Perforce workspace code) is already in use by another project.
Does anyone have any suggestions?
I am using Eclipse Luna with CDT.
Thanks
Regarding the need to add and delete Symbols and change build options in the Project Properties: this is what Configurations are for. Assuming the settings are fairly static for a given configuration (i.e., a specific hardware platform), define one configuration per platform and set the options accordingly. Then simply switching configurations will switch the whole set of build options.
This also covers file-specific settings, like "exclude from build": you can have a different set of files to build for each platform.
I don't know if Eclipse will re-index every time you switch configurations.

What is the difference between compile code and executable code?

I always use the terms compile and build interchangeably.
What exactly do these terms stand for?
Compiling is the act of turning source code into object code.
Linking is the act of combining object code with libraries into a raw executable.
Building is the sequence composed of compiling and linking, with possibly other tasks such as installer creation.
Many compilers handle the linking step automatically after compiling source code.
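To make the distinction concrete, here is a toy build script in Python (assuming a single source file main.c and a cc on the PATH):

    import subprocess

    # Compile: turn source code into object code.
    subprocess.run(["cc", "-c", "main.c", "-o", "main.o"], check=True)

    # Link: combine object code (and any libraries) into an executable.
    subprocess.run(["cc", "main.o", "-o", "app"], check=True)

    # "Building" is the whole sequence; a real build would add steps such
    # as tests, packaging, or installer creation.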
From Wikipedia:
In the field of computer software, the term software build refers either to the process of converting source code files into standalone software artifact(s) that can be run on a computer, or the result of doing so. One of the most important steps of a software build is the compilation process where source code files are converted into executable code.
While for simple programs the process consists of a single file being compiled, for complex software the source code may consist of many files and may be combined in different ways to produce many different versions.
A build could be seen as a script comprising many steps, the primary one being to compile the code.
Others could be
running tests
reporting (e.g. coverage)
static analysis
pre- and post-build steps
running custom tools over certain files
creating installs
labelling them and deploying/copying them to a repository
They are often used to mean the same thing. However, "build" may also mean the full process of compiling and linking a whole application (in the case of, e.g., C and C++), or even more, including, among others:
packaging
automatic (unit and/or integration) testing
installer generation
installation/deployment
documentation/site generation
report generation (e.g. test results, coverage).
There are systems like Maven that generalize this with the concept of a lifecycle consisting of several stages, producing different artifacts, possibly using results and artifacts from previous stages.
From my experience, I would say that "compiling" refers to the conversion of one or several human-readable source files into byte code (object files in C), while "building" denotes the whole process of compiling, linking, and whatever else needs to be done for an entire package or project.
Most people would probably use the terms interchangeably.
You could see one nuance: compiling is only the step where you pass some source file through the compiler (gcc, javac, whatever).
Building can be understood as the more general process of checking out the source, creating a target folder for the compiled artifacts, checking dependencies, choosing what has to be compiled, running automated tests, creating a tar/zip distribution, pushing to an FTP server, etc.

How to locate a compiler in a path with a version number in it?

I'm trying to design an SConstruct file for an embedded system project. The compiler on my machine is at "C:\Program Files\IAR Systems\Embedded Workbench 5.4\arm\bin". I would like the build system to locate the toolchain even if another version of Embedded Workbench is installed, or if the user has chosen to install it elsewhere.
I'd also be interested in strategies used in makefiles or Ant files, since they are probably useful here as well.
What are some strategies for doing this? Do I have options other than searching the Windows registry or looking for "C:\Program Files\IAR Systems\Embedded Workbench *\arm\bin"?
The simplest solution is to use an environment variable. You still have to set that up manually for each build host, but the build system need only refer to the environment variable, so can be common for all build hosts.
For example in your case you might have:
EWBARM_V0504="C:\Program Files\IAR Systems\Embedded Workbench 5.4\arm\bin"
And similarly for other versions installed; then in your build system you would use %EWBARM_V0504% in place of the path. The worst that can happen is that the build fails if the variable does not exist, which is preferable to using the wrong compiler, and is easily fixed.
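In an SConstruct, that might look like the following sketch (iccarm is an assumption for the IAR ARM compiler's executable name; Environment and Exit are SCons globals):

    import os

    ewb_bin = os.environ.get("EWBARM_V0504")
    if not ewb_bin:
        # Failing loudly beats silently picking up the wrong compiler.
        print("error: EWBARM_V0504 is not set; point it at the IAR arm/bin directory")
        Exit(1)

    env = Environment()
    env.PrependENVPath("PATH", ewb_bin)
    env["CC"] = "iccarm"  # assumed IAR compiler executable name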
Since different versions of toolchains may have different bugs and/or features, silently falling back on a different set of tools is probably a bad idea. When I've supported multiple tool versions on a single project, I usually have the version number assigned via a makefile or the environment. Then you can pass -D TOOLS_VERSION=$(TOOLS_VERSION) to your compiler and use that value to key the bugfixes and workarounds you need for particular versions of the tools. This scheme makes it clear which tools you want to support, while still making it easy for other developers to switch tool versions with a single edit.
The nice thing about SCons is that you have all of Python at your disposal. You can use the winreg module to look in the registry, or glob around in likely sets of paths, whatever works for you. And of course you can have a command-line option or an options file to override the autodetection. Once you've found your tool of choice, there are basically two ways to make SCons use it: either prepend the tool's directory to env['ENV']['PATH'] (you can use env.PrependENVPath for that), or use the tool's full path as the value of $CC (and set $LINK, $SHLINK, etc. appropriately too). A sketch of the glob-based detection follows.
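An SConstruct sketch of that detection (the version pick is naive, and iccarm.exe is an assumed executable name):

    import glob
    import os

    candidates = glob.glob(
        r"C:\Program Files\IAR Systems\Embedded Workbench *\arm\bin")
    if not candidates:
        print("error: no Embedded Workbench installation found")
        Exit(1)

    # Lexicographic sort: good enough for 5.3 vs 5.4, wrong for 5.4 vs 5.10.
    toolchain_bin = sorted(candidates)[-1]
    env = Environment()
    env.PrependENVPath("PATH", toolchain_bin)
    env["CC"] = os.path.join(toolchain_bin, "iccarm.exe")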
I usually make a TOOL_MYCOMPILER function that takes an env and sets it all up for use with the compiler and its toolchain (cpp, linker, whatever). It keeps things cleaner in your SConstruct/SConscript.
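A sketch of such a helper (the executable names and the TOOLS_VERSION define are illustrative, echoing the version-keying idea from the previous answer):

    def TOOL_MYCOMPILER(env, toolchain_bin):
        """Point an SCons environment at one specific toolchain install."""
        env.PrependENVPath("PATH", toolchain_bin)
        env["CC"] = "iccarm"      # assumed compiler executable
        env["LINK"] = "ilinkarm"  # assumed linker executable
        # Key version-specific workarounds in the source off this define.
        env.Append(CPPDEFINES={"TOOLS_VERSION": 540})

    env = Environment()
    TOOL_MYCOMPILER(env, r"C:\Program Files\IAR Systems\Embedded Workbench 5.4\arm\bin")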