So I'm working on modifying a project to build with clang++ and sanitizers (for fuzzing), as opposed to just g++. It builds using bazel.
The project currently downloads some of its build dependencies (m4/bison/flex) and builds them using the configure_make rule from https://github.com/bazelbuild/rules_foreign_cc. Importantly, these are only used for code generation, and aren't compiled or linked against.
Unfortunately, those very dependencies happen to have various sanitizer issues. That means if we build with --copt="-fsanitize=address", we can't then use them for code generation and the build fails! Now, if we ran into sanitizer issues with linked dependencies, that would be unavoidable and something we'd need to work with the maintainers to fix, but right now we'd really just prefer to work around these, since they don't directly affect the security and reliability of the actual target we're compiling.
Is there an easy way to specify, more or less, "please ignore the compiler flags/linker options/etc. passed on the command line for just this target and use this other set instead" for a rule? It seems like linkopts/copts/cxxopts that are passed via the command line (or via a global config) are additive in most cases, and we'd like to avoid that. If there isn't, what's the best way to approach solving that? Saving/unsetting/resetting all of the variables as we go in a custom rule that wraps the actual build rule?
Thanks,
Everett
So, found my answer eventually.
There are a few approaches that will probably work here. One is a custom Starlark rule that overrides the tools provided by the command line. That has the potential advantage of being more "future proof" against other command-line options that may not play nicely with some options passed to the compiler/linker.
However, what wound up being easier was learning that the gcc/clang family of compilers has a fun behavior that plays nicely with bazel: if you specify "conflicting" command-line parameters, the later one wins. Therefore, all we had to do was ensure that "-fno-sanitize=address" was passed into the build rule for the targets we wanted to turn ASAN off for, while passing "-fsanitize=address" into the global build.
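In case it helps, here is a minimal sketch for checking that "last flag wins" behavior before wiring it into a build (the file name and the exact clang++ invocation below are just illustrative):

// overflow.cc -- small check of the flag-ordering behavior, assuming a clang++
// or g++ with ASan support. Build it with both flags, the opt-out last:
//
//   clang++ -fsanitize=address -fno-sanitize=address overflow.cc -o overflow
//
// With the trailing -fno-sanitize=address, the out-of-bounds write below goes
// unreported; drop that flag and ASan aborts with a heap-buffer-overflow report.
#include <cstdio>

int main() {
    int *p = new int[4];
    p[4] = 42;  // writes one element past the end of the allocation
    std::printf("%d\n", p[4]);
    delete[] p;
    return 0;
}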
Hope this helps anyone else who gets stuck on a problem like this.
--Everett
"The Correct Way" would be to define your own toolchain(s) and therein add a feature, e.g.:
asan_feature = feature(
    name = "address_sanitizer",
    enabled = True,
    flag_sets = [
        flag_set(
            actions = all_compile_actions + all_link_actions,
            flag_groups = [
                flag_group(flags = ["-fsanitize=address"]),
            ],
        ),
    ],
)
that you pass, along with your other features, via the features argument to create_cc_toolchain_config_info().
You can make this feature enabled by default (or not) for any given toolchain, and you can control which features are used build-wide on the CLI, per package, and per target (e.g. cc_binary):
cc_binary(
    ...
    features = ["-address_sanitizer"],
)
The obvious downside is that there is some initial work involved.
The upside is that compiler configuration and behavior really is (or should be) a function of toolchain configuration, and not something for each target to fiddle with. You get an abstract structure that is independent of the specific compiler underlying any given toolchain.
OK, so: I have successfully integrated the first working Halide generator into the cmake build system for my little image-processing project.
The generator implements an image-resizing and -resampling algorithm, based on the example code from the Halide codebase (Halide/apps/resize/resize.cpp). I adapted the sample to leverage generator parameters, and tied the generator's compilation and invocation into my CMake script using the functions defined in HalideGenerator.cmake, just as the Halide project does in its own build script.
All this works great, so far – but my domain expertise is lacking in the realm of code-generation nuances. For example, I tweaked the scheduling method to get the best observed empirical speed on my laptop – but despite taking many long tinkering sessions and code-reading sojourns into the depths of Halide’s many generator-related tools and scripts, I have only the most superficial understanding of the code-generation process.
Specifically, I don’t know how to approach this. Is it best to use defaults or try to turn on specific options for my target platform – and if the latter, do I have to have conditional code somewhere, or can the binary include fallbacks?
Here’s what I am talking about: in the source for Halide tutorial lesson #15, there’s a complex script that invokes a generator with various options. Here’s a snippet from code comments in this script:
# If you're compiling and linking multiple Halide pipelines, then the
# multiple copies of the runtime should combine into a single copy
# (via weak linkage). If you're compiling and linking for multiple
# different targets (e.g. avx and non-avx), then the runtimes might be
# different, and you can't control which copy of the runtime the
# linker selects.
# You can control this behavior explicitly by compiling your pipelines
# with the no_runtime target flag. Let's generate and link several
# different versions of the first pipeline for different x86 variants: [snip]
… from this it is hard to separate what must be done from what should be done, or what may be done at one's discretion. Comparatively, one doesn't have to deal with these issues when setting up C++ or Objective-C projects (even far more Byzantine ones), as the compiler and linker make most of these decisions for you and need at most a flag or two.
My question is: how can I integrate the Halide generator’s output library binaries into my existing project – such that the generator output is as fast as possible (e.g. uses GPU, SSE2/3, AVX2 etc) without further constraining portability (e.g. it won’t mysteriously segfault on a slightly different machine)?
Specifically, what should my process be – as in, should I only target the lowest-common-denominator at first, and then leverage more exotic processor features incrementally?
EDIT: As I mentioned in comments below, this is what my GenGen binary outputs to stdout when invoked with no options:
As of recently, AOT compilation supports generating specializations for multiple CPU feature sets with runtime detection. Just invoke GenGen with a comma-separated list of targets and static_library as the output, e.g.
GenGen -e static_library,h target=x86-64-linux-sse41-avx,x86-64-linux-sse41,x86-64-linux
This will produce a .a file that contains:
3 versions of your code, compiled with specializations for AVX+SSE4.1, SSE4.1, and plain-old-x86-64
a thin wrapper that uses the halide_can_use_target_features() runtime call to choose the right one for the actual target machine
See Func::compile_to_multitarget_static_library() and multitarget_generator.cpp/multitarget_aottest.cpp for more information.
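For reference, here is a hedged sketch of the equivalent C++ API call; the Func and its (empty) argument list are placeholders, not the actual resize pipeline:

// Emits per-target object code plus the runtime-dispatch wrapper into a
// single static library, mirroring the GenGen invocation above.
#include "Halide.h"
#include <vector>
using namespace Halide;

int main() {
    Var x("x"), y("y");
    Func resize("resize");
    resize(x, y) = cast<float>(x + y);  // stand-in for the real algorithm

    std::vector<Target> targets = {
        Target("x86-64-linux-sse41-avx"),
        Target("x86-64-linux-sse41"),
        Target("x86-64-linux"),
    };
    // No extra Arguments here because the placeholder pipeline takes none.
    resize.compile_to_multitarget_static_library("resize", {}, targets);
    return 0;
}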
For the case of pre-generating your binaries (AOT), it sounds like you want runtime dispatch. Your program will examine the CPU/GPU environment at startup and decide what features (AVX, OpenCL, etc.) should be used. This is not Halide specific.
Select a set of advanced features to target (high powered desktop GPU) as a best case and a set of minimal features that will work on every machine (SSE2 only).
Build a DLL/dylib/so for each of these feature sets that contains every performance hungry function. These can be scheduled differently or even built with completely different Func definitions. You can have both sets in the same source code file and test the Target object at generation time to choose between them.
At program startup, see if your best case features are present and, if so, load that library and use it. If any features are missing, default to the most compatible version.
You are free to choose how many feature sets and libraries you want to support.
The alternative is to compile your Halide code at program startup (JIT). I recommend using the Target object returned by get_jit_target_from_environment(), which uses the contents of the environment variable HL_JIT_TARGET or "host" if that variable is not set. The "host" target string is the same as get_host_target() and means Halide will examine the CPU/GPU environment and set whatever features it finds. You can then dynamically test the Target object and use GPU or CPU scheduling.
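As a rough sketch of the JIT route (the pipeline and the schedule choices below are invented, and it assumes a reasonably recent Halide):

// Pick a schedule from the Target reported by get_jit_target_from_environment(),
// then JIT-compile and run. The same has_feature() tests also work on a Target
// you construct yourself at AOT generation time.
#include "Halide.h"
#include <cstdio>
using namespace Halide;

int main() {
    Var x("x"), y("y");
    Func f("f");
    f(x, y) = cast<float>(x * y);  // placeholder pipeline

    Target t = get_jit_target_from_environment();  // HL_JIT_TARGET or "host"
    if (t.has_gpu_feature()) {
        Var xo, yo, xi, yi;
        f.gpu_tile(x, y, xo, yo, xi, yi, 16, 16);  // GPU schedule
    } else if (t.has_feature(Target::AVX2)) {
        f.vectorize(x, 8).parallel(y);             // wider CPU vectors
    } else {
        f.vectorize(x, 4).parallel(y);             // conservative fallback
    }

    Buffer<float> out = f.realize({256, 256}, t);
    out.copy_to_host();  // in case the schedule ran on the GPU
    std::printf("out(1, 2) = %f\n", out(1, 2));
    return 0;
}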
Most build systems, like autoconf/automake, allow the user to specify a target directory into which the various files needed to run a program are installed. Usually this includes binaries, configuration files, auxiliary scripts, etc.
At the same time, many executables often need to read from a configuration file in order to allow a user to modify runtime settings.
Ultimately, a program (let's say a compiled C or C++ program) needs to know where to look to read in a configuration file. A lot of the time I will just hardcode the path as something like /etc/MYPROGRAM/myprog.conf, which of course is not a great idea.
But in the autoconf world, the user might specify an install prefix, meaning that the C/C++ code needs to somehow be aware of this.
One solution would be to write a C header file template with a .in suffix, which is simply used to define the location of the config file, like:
const char* config_file_path = "@CONFIG_FILE_PATH@"; // `CONFIG_FILE_PATH` is defined in `configure.ac`.
This file would be named something like constants.h.in, and it would have to be processed by configure (generated from configure.ac) to output an actual header file, which could then be included by whatever .c or .cpp files need it.
Is that the usual way this sort of thing is handled? It seems a bit cumbersome, so I wonder if there is a better solution.
There are basically two choices for how to handle this.
One choice is to do what you've mentioned -- compile the relevant paths into the resulting executable or library. Here it's worth noting that if files are installed in different sub-parts of the prefix, then each such thing needs its own compile-time path. That's because the user might specify --prefix separately from --bindir, separately from --libexecdir, etc. Another wrinkle here is that if there are multiple installed programs that refer to each other, then this process probably should take into account the program name transform (see docs on --program-transform-name and friends).
That's all if you want full generality of course.
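As a concrete (if simplified) sketch of that first choice, assuming the build system passes the configured directory in as a preprocessor define (for example, something along the lines of AM_CPPFLAGS = -DSYSCONFDIR='"$(sysconfdir)"' in Makefile.am):

// The C++ side only sees the macro; the configure-chosen directory is baked in
// at compile time, with a fallback in case the build system did not define it.
#include <cstdio>
#include <string>

#ifndef SYSCONFDIR
#define SYSCONFDIR "/etc"
#endif

std::string default_config_path() {
    return std::string(SYSCONFDIR) + "/myprog/myprog.conf";
}

int main() {
    std::printf("%s\n", default_config_path().c_str());
    return 0;
}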
The other approach is to have the program be relocatable at runtime. Many GNU projects (at least gdb and gcc) take this approach. The idea here is for the program to attempt to locate its data in the filesystem at runtime. In the projects I'm most familiar with, this is done with the libiberty function make_relative_prefix; but I'm sure there are other ways.
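To make the relocation idea concrete, here is a minimal, Linux-only sketch (this is not the libiberty make_relative_prefix API; it simply assumes the layout <prefix>/bin/<program> and <prefix>/etc/myprog.conf):

// Derive the install prefix from the running binary's path and look for the
// config file relative to it, falling back to a hardcoded location.
#include <unistd.h>
#include <climits>
#include <cstdio>
#include <string>

std::string locate_config() {
    char buf[PATH_MAX];
    ssize_t n = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
    if (n <= 0) return "/etc/myprog/myprog.conf";             // fallback
    buf[n] = '\0';
    std::string exe(buf);                                     // e.g. /opt/foo/bin/myprog
    std::string bindir = exe.substr(0, exe.rfind('/'));       // /opt/foo/bin
    std::string prefix = bindir.substr(0, bindir.rfind('/')); // /opt/foo
    return prefix + "/etc/myprog.conf";
}

int main() {
    std::printf("config: %s\n", locate_config().c_str());
    return 0;
}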
The relocation approach is often touted as being nicer because it allows the program's install tree to be tarred up and delivered to users; but in the days of distros it seems to me that it isn't as useful as it once was. I think the primary drawback of this approach is that it makes it very hard, if not impossible, to support both relocation and the full suite of configure install-time options.
Which one you pick depends, I think, on what your users want.
Also, to answer the above comment: I think changing the prefix between configure time and build time is not really supported, though it may work with some packages. Instead, the usual way to handle this is either to require the choice at configure time, or to support the somewhat more limited DESTDIR feature.
TL;DR
How do you protect against binary incompatibility resulting from compiler-argument typos affecting the preprocessor directives that control conditional compilation in shared, possibly templated, headers used by different compilation units?
Ex.
g++ ... -DYOUR_NORMAl_FLAG ... -o libA.so
/**Another compilation unit, or even project. **/
g++ ... -DYOUR_NORMA1_FLAG ... -o libB.so
/**Another compilation unit, or even project. **/
g++ ... -DYOUR_NORMAI_FLAG ... main.cpp libA.so //The possibilities!
The Basic Story
Recently, I ran into a strange bug: the symptom was a single SIGSEGV, which always seemed to occur at the same location after recompiling. This led me to believe there was some kind of memory corruption going on, and that the actual underlying pointer was not a pointer at all, but some data section.
I'll spare you the long and strenuous journey; it took almost two otherwise perfectly good work days to track down the problem. Suffice it to say, Valgrind, GDB, nm, readelf, Electric Fence, GCC's stack-smashing protection, and then some more measures/methods/approaches all failed.
In utter devastation, my attention turned to the finest details in the build process, which was analogous to:
Build one small library.
Build one large library, which uses the small one.
Build the test suite of the large library.
There was only a problem when the large library was used as a static or dynamic library dependency (i.e. the dynamic linker loaded it automatically, no dlopen).
In the test case where all the code of the library was simply compiled into the tests, everything worked: this was the most important clue.
The "Solution"
In the end, it turned out to be the simplest thing: a single (!) typo.
It turns out the compilation flags of the test suite and the large library differed by a single character: a define controlling the behavior of the small library was misspelled.
Critical information morsel: the small library had some templates. These were used directly in every case, without explicit instantiation in advance. The contents of one of the templated classes changed when the flag was toggled: some data fields were simply not present when the flag was defined!
The linker noticed nothing of this. (Since the class was templated, the resultant symbols were weak.)
The code used dynamic casts, and the class affected by this problem inherited from the mismatched class, so things went south.
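To make the failure mode concrete, here is a hedged reconstruction (the flag and class names are invented): a templated class whose layout depends on a preprocessor flag.

// If the small library is built with -DSMALL_LIB_EXTENDED and another
// compilation unit instantiates the same template with the flag misspelled,
// the two Buffer<float> layouts disagree; because the template symbols are
// weak, the linker silently keeps one copy and the mismatch surfaces later
// as memory corruption.
#include <cstdio>

#ifdef SMALL_LIB_EXTENDED
template <typename T>
struct Buffer {
    T *data = nullptr;
    long extra_bookkeeping[4] = {};  // present only when the flag is defined
};
#else
template <typename T>
struct Buffer {
    T *data = nullptr;
};
#endif

int main() {
    // The layout difference is easy to see: sizeof changes with the flag.
    std::printf("sizeof(Buffer<float>) = %zu\n", sizeof(Buffer<float>));
    return 0;
}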
My question is as follows: how would you protect against this kind of problem? Are there any tools or solutions which address this specific issue?
Future Proofing
I've thought of two things, and believe no protection can be built on the object file level:
1: Save the options implemented as preprocessor symbols in some well-defined place, preferably extracted by a separate build step. Provide a check script which uses this to verify all compiler defines, as well as the defines in user code. Integrate this check into the build process. Possibly use Levenshtein distance or similar to check for misspellings. Expensive, and the script/solution can get complicated. There is a possible problem with similar flags (but why have them?), and additional files must accompany the compiled library code. (Well, maybe with DWARF 2 this is untrue, but let's assume we don't want that.)
2: Centralize the build options: cheap, and it leaves customization open (think makefile.local), but it produces monolithic monstrosities and strong couplings between projects.
I'd like to go ahead and quench a few likely flame inducing embers possibly flaring up in some readers: "do not use preprocessor symbols" is not an option here.
Conditional compilation does have its place in high-performance code, and doing everything with templates and enable_if-s would needlessly overcomplicate things. While the above situation is usually not desirable, it can arise from the development process.
Please assume you have no control over the situation, assume you have legacy code, assume everything you can to force yourself to avoid side-stepping.
If those won't do, generalize into ABI incompatibility detection, though this might escalate the scope of the question too much for SO.
I'm aware of:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
DT_SONAME is not applicable.
Other version schemes therein are not applicable either - they were designed to protect a package which is in itself not faulty.
Mixing C++ ABIs to build against legacy libraries
Static analysis tool to detect ABI breaks in C++
If it matters, don't have a default case.
#ifdef YOUR_NORMAL_FLAG
// some code
#elif defined(YOUR_SPECIAL_FLAG)
// some other code
#else
// in case of a typo, this is a compilation error
#error "No flag specified"
#endif
This may lead to a long list of compiler options if conditional compilation is overused, but there are ways around this, like defining config files
flag=normal
flag2=special
which get parsed by build scripts that generate the options and can possibly check for typos, or which could be parsed directly from the Makefile.
Hi, I am trying to find out when a full build is required and when a partial build is sufficient.
There are many articles, but I am not able to find specific answers.
Below are my thoughts.
Full build is required when:
1. There is a change in how dependent modules are built
--- a change in build options or in the optimization techniques used.
2. There are changes in the object layout (see the sketch after these lists):
--- any change in a header file, such as adding or deleting methods in a class;
--- changing the object size by adding or removing variables or virtual functions;
--- data alignment changes using pragma pack.
3. There is any change in global variables.
Partial build is sufficient when:
1. There is a change in the logic, as long as it does not alter the specified interface.
2. There is a change in a stack variable.
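As a small illustration of point 2 above (a hedged example with an invented class): changing a class's members changes its size and layout, so every translation unit compiled against the old header disagrees with the new one and must be rebuilt.

#include <cstdio>

struct Widget {
    int id;
    // double weight;  // uncommenting this changes sizeof(Widget) and the
                       // offsets of members in any derived or containing type
};

int main() {
    std::printf("sizeof(Widget) = %zu\n", sizeof(Widget));
    return 0;
}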
In an ideal world a full build should never be necessary, because the build tools would automatically detect when one of their dependencies has changed.
But this is true only in the ideal world. In practice, build tools are written by humans, and humans
make mistakes, so the tools may not take every possible change into account,
are lazy, so the tools may not take any change into account at all.
For you this means you have to have some experience with your build tools. A well-written makefile may take everything into account, so you rarely have to do a full build. But in the 21st century a makefile is not really state of the art any more, and they become complex very quickly. Today's development environments do a fairly good job of finding dependencies, but for larger projects you may have dependencies which are hard to fit into the concepts of your development environment, and you will end up writing a script.
So there is no real answer to your question. In practice it is good to do a full rebuild for every release, and that rebuild should be doable by pressing just one button. Do a partial build for daily work, since nobody wants to wait two hours to see whether the code compiles or not. But even in daily work a full rebuild is sometimes necessary, because the linker/compiler/(your choice of tool here) did not recognize even the simplest change.
I'm just starting to explore C++, so forgive the newbiness of this question. I also beg your indulgence on how open ended this question is. I think it could be broken down, but I think that this information belongs in the same place.
(FYI -- I am working predominantly with the QT SDK and mingw32-make right now and I seem to have configured them correctly for my machine.)
I knew that there was a lot in the language which is compiler-driven -- I've heard about pre-compiler directives, but it seems like someone could write books about the different C++ compilers and their respective parameters. In addition, there are commands which apparently precede make (like qmake, for example -- is that something specific to Qt?).
I would like to know if there is any place which gives me an overview of what compilers are out there, and what their different options are. I'd also like to know how each of them views Makefiles (it seems that there is a difference in syntax between them?).
If there is no website regarding, "Everything you need to know about C++ compilers but were afraid to ask," what would be the best way to go about learning the answers to these questions?
Concerning the "numerous options of the various compilers"
A piece of good news: you needn't worry about the detail of most of these options. You will, in due time, delve into this, only for the very compiler you use, and maybe only for the options that pertain to a particular set of features. But as a novice, generally trust the default options or the ones supplied with the make files.
The broad categories of these features (and I may be missing a few) are:
pre-processor defines (now, you may need a few of these)
code generation (target CPU, FPU usage...)
optimization (hints for the compiler to favor speed over size and such)
inclusion of debug info (which is extra data left in the object/binary and which enables the debugger to know where each line of code starts, what the variables names are etc.)
directives for the linker
output type (exe, library, memory maps...)
C/C++ language compliance and warnings (compatibility with previous version of the compiler, compliance to current and past C Standards, warning about common possible bug-indicative patterns...)
compile-time verbosity and help
Concerning an inventory of compilers with their options and features
I know of no such list, but I'm sure something like it probably exists on the web. However, I suggest that, as a novice, you worry little about these "details" and use whatever free compiler you can find (gcc is certainly a great choice), and build experience with the language and the build process. C professionals may argue, with good reason and at length, about the merits of various compilers and their associated runtimes etc., but for generic purposes (and then some) the free stuff is all that is needed.
Concerning the build process
The most trivial applications, such as those made of a single unit of compilation (read: a single C/C++ source file), can be built with a simple batch file where the various compiler and linker options are hardcoded and the name of the file is specified on the command line.
For all other cases, it is very important to codify the build process so that it can be done
a) automatically and
b) reliably, i.e. with repeatability.
The "recipe" associated with this build process is often encapsulated in a make file or as the complexity grows, possibly several make files, possibly "bundled together in a script/bat file.
This (make file syntax) you need to get familiar with, even if you use alternatives to make/nmake, such as Apache Ant; the reason is that many (most?) source code packages include a make file.
In a nutshell, make files are text files that allow defining targets and the associated commands to build each target. Each target is associated with its dependencies, which allows the make logic to decide which targets are out of date and should be rebuilt, and, before rebuilding them, which of their dependencies should also be rebuilt. That way, when you modify, say, an include file (and if the make file is properly configured), any C file that uses this header will be recompiled, and any binary which links with the corresponding obj file will be rebuilt as well. make also includes options to force all targets to be rebuilt, and this is sometimes handy to be sure that you truly have a current build (for example, in case some dependencies of a given object are not declared in the make file).
On the Pre-processor:
The pre-processor is the first step toward compiling, although it is technically not part of the compilation. The purposes of this step are:
to remove comments and extraneous whitespace
to substitute any macro reference with the relevant C/C++ syntax. Some macros, for example, are used to define constant values, such as, say, some email address used in the program; during pre-processing, any reference to this constant (by convention such constants are named in ALL_CAPS_AND_UNDERSCORES) is replaced by the actual C string literal containing the email address.
to exclude all conditional-compilation branches that are not relevant (#ifdef and the like)
What's important to know about the pre-processor is that its directives are NOT part of the C language proper; they serve several important functions, such as the conditional compilation mentioned earlier (used, for example, to have multiple versions of the program, say for different operating systems, or indeed for different compilers).
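A tiny illustration of those two jobs, macro substitution and conditional compilation (the macro name and email address are made up for the example):

#include <cstdio>

#define SUPPORT_EMAIL "help@example.com"  // substituted wherever it is referenced

int main() {
#ifdef _WIN32
    std::printf("Report problems to %s (Windows build)\n", SUPPORT_EMAIL);
#else
    std::printf("Report problems to %s\n", SUPPORT_EMAIL);
#endif
    return 0;
}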
Taking it from there...
After this manifesto of mine... I encourage you to read just a little more, and then to dive into programming and building binaries. It is a very good idea to try to get a broad picture of the framework etc., but this can be overdone, a bit akin to the exchange student who stays in his/her room reading the Webster dictionary to be "prepared" for meeting native speakers, rather than just "doing it!".
Ideally you shouldn't need to care which C++ compiler you are using. Conformance to the standard has gotten much better in recent years (even from Microsoft).
Compiler flags obviously differ, but the same features are generally available; it's just a differently named option to, e.g., set the warning level on GCC and ms-cl.
The build system is independent of the compiler; you can use any make with any compiler.
That is a lot of questions in one.
C++ compilers are a lot like hammers: They come in all sizes and shapes, with different abilities and features, intended for different types of users, and at different price points; ultimately they all are for doing the same basic task as the others.
Some are intended for highly specialized applications, like high-performance graphics, and have numerous extensions and libraries to assist the engineer with those types of problems. Others are meant for general purpose use, and aren't necessarily always the greatest for extreme work.
The technique for using each type of hammer varies from model to model—and version to version—but they all have a lot in common. The macro preprocessor is a standard part of C and C++ compilers.
A brief comparison of many C++ compilers is here. Also check out the list of C compilers, since many programs don't use any C++ features and can be compiled by an ordinary C compiler.
C++ compilers don't "view" makefiles. The rules of a makefile may invoke a C++ compiler, but also may "compile" assembly language modules (assembling), process other languages, build libraries, link modules, and/or post-process object modules. Makefiles often contain rules for cleaning up intermediate files, establishing debug environments, obtaining source code, etc., etc. Compilation is one link in a long chain of steps to develop software.
Also, many development environments abstract the makefile into a "project file" which is used by an integrated development environment (IDE) in an attempt to simplify or automate many programming tasks. See a comparison here.
As for learning: choose a specific problem to solve and dive in. The target platform (Linux/Windows/etc.) and problem space will narrow the choices pretty well. Which you choose is often linked to other considerations, such as working for a particular company, or being part of a team. C++ has something like 95% commonality among all its flavors. Learn any one of them well, and learning the next is a piece of cake.