How can I use ccache with gcc -march=native across multiple architectures?

The -march=native option to gcc generates different code depending on the architecture of the host. ccache does not store the machine architecture in its hash, which means that if you change the architecture of the machine, for example to switch to a high-performance VPS node, the cached object files may be incompatible.
How can I make sure I get the correct object files while still taking advantage of caching?

ccache does not store the architecture, but it does store the compiler flags that were used when building an object for the first time. Therefore a potential solution to your problem could be to use a thin wrapper script that expands -march=native to the actual set of flags (e.g. by asking the GCC driver to print the expanded command line), before passing them to ccache.
(I will, of course, leave a robust implementation as an exercise to the reader, but a rough sketch follows.)
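For illustration, a minimal sketch of such a wrapper, assuming the documented GCC behaviour that a verbose dry run prints the cc1 invocation with -march=native already expanded (the script itself is untested and its name is made up):

#!/bin/sh
# gcc-native-wrapper: expand -march=native into explicit flags before caching.
# The -E -v dry run makes the driver print the cc1 command line with the
# native flags expanded; we keep only the -m* options from that line.
NATIVE_FLAGS=$(gcc -march=native -E -v - </dev/null 2>&1 \
    | grep cc1 | tr ' ' '\n' | grep '^-m' | tr '\n' ' ')
exec ccache gcc $NATIVE_FLAGS "$@"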

You can make sure to use machines with identical architectures in your build farm. Apart from that, I don't see how you can solve that problem.
Also remember that if you use -march=native, then anyone who wants to run your binary needs to have the same (or possibly a newer, backwards-compatible) machine architecture, which may or may not be a problem.

Related

Are there any downsides to compiling with -g flag?

GDB documentation tells me that in order to compile for debugging, I need to ask my compiler to generate debugging symbols. This is done by specifying a '-g' flag.
Furthermore, the GDB documentation recommends that I always compile with the '-g' flag. This sounds good, and I'd like to do that.
But first, I'd like to find out about downsides. Are there any penalties involved with compiling-for-debugging in production code?
I am mostly interested in:
GCC as the compiler of choice
Red Hat Linux as target OS
C and C++ languages
(Although information about other environments is welcome as well)
Many thanks!
If you use -g (which on recent GCC or Clang can be used with optimization flags like -O2):
compilation is slower (and linking will use a lot more memory)
the executable is a bigger file (see elf(5) and use readelf(1)...)
the executable carries a lot of information about your source code.
you can use GDB easily
some interesting libraries, like Ian Taylor's libbacktrace, require DWARF information (i.e. -g)
If you don't use -g, it will be harder (but still possible) to use the GDB debugger.
So if you transmit the binary executable to a partner who should not be able to work out how your source code was written, you need to avoid -g.
See also the strip(1) and strace(1) commands.
Notice that using the -g flag for debugging information is also valid for OCaml and Rust.
PS. Recent GCC (e.g. GCC 10 or GCC 11 in 2021) accepts many debug-related flags. With -g3 your executable carries more debug information (e.g. descriptions of C++ macros and their expansion) than with -g or -g1. Of course, compilation time and executable size increase accordingly. In principle, a GCC plugin (perhaps Bismon in 2021, or those inside the source code of the Linux kernel) could add even more debug information. In practice, you won't do that unless you can improve your debugger. However, a GCC plugin (or some #pragmas) can remove some debug information (e.g. remove debug information for a selected set of functions).
Generally, adding debug information increases the size of the binary files (or creates extra files for the debug information). That's usually not a problem nowadays, unless you're distributing them over slow networks. And of course this debug information may help others analyze your code, if they want to do that.

Typically, the -g flag is used together with -O0 (the default), which disables compiler optimization and generates code that is as close as possible to the source, so that debugging is easier. You can use debug information together with optimizations enabled, but this is really tricky, because variables may no longer exist, or the sequence of instructions may be different from the source. This is generally only done when an error needs to be analyzed that only happens after optimizations are enabled. Of course, the downside of -O0 is poorer performance.
So as a conclusion: Typically one uses -g -O0 during development, and for distribution or production code just -O3.
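As a concrete illustration of that workflow (the file names are placeholders):

g++ -g -O0 main.cpp -o app-debug     # development: debug info, no optimization
g++ -g3 -O0 main.cpp -o app-debug3   # even more debug info (macro expansions etc.)
g++ -O3 main.cpp -o app-release      # production: optimized, no debug info
readelf -S app-debug | grep debug    # the .debug_* sections explain the size
strip app-debug                      # removes symbols and debug information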

Disable std::map::at()

I'm compiling some code with gcc 4.7, which was written for C++11, but I'd like it to be compatible with gcc 4.4. The weird thing is that code using std::map::at() (which is only supposed to be defined in C++11) doesn't seem to give me compile errors, even after I remove the -std=c++11 flag. I'd like to be getting compiler errors, since this code has to be shared with colleagues who may not be using gcc 4.7. Is this normal? Is there some way to restrict the behavior of std::map?
Apparently it is not possible to achieve this with a new gcc and new libraries, at least without compiling them yourself.
As a practical solution, assuming you have a relatively modern PC (6+GB of memory, perhaps 4GB will do), I suggest you
Install an older Linux distro in a virtual machine which has both the desired old gcc and the matching old standard libraries. This is far less hassle than trying to set up an alternative compiler and library environment in your main development OS.
Keep your sources in version control, if you don't already.
Either set up a script in the old VM to check out and build the software manually, or go the extra mile and set up Jenkins on the VM, with a job that polls your version control repo and does a test build automatically whenever you commit from your main development environment.
The good thing about this is that you can easily set up as many different environments and OSes as you want to stay compatible with, while still keeping the main development OS up to date with the latest versions.
Original answer for the ideal world where things work right:
To get strict C++03, use these flags:
-std=c++03 -pedantic
Also, if you only want to support gcc, you may want the -std=gnu++03 "standard" instead, but unless there is some specific feature, say C99-style VLAs, which you really want to use, I'd recommend against that. You never know what compiler you or someone else may want to use in the future.
As a side note, also recommended (at least if you want to fix the warnings too): -Wall -Wextra
The sad reality is that selecting the C++ standard indeed does not solve the problem. As far as I can tell, this is not really a problem in the gcc compiler; it is a problem in the GNU C++ standard library, which evidently does not check the desired C++ standard version (with #ifdefs in its header files). If it bothers you, you might consider filing a bug report (if there isn't one already; I did not find one with a quick search).
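If you want to see the surprising behaviour for yourself, here is a small reproduction (the file name is made up); on a new libstdc++ it compiles even in strict C++03 mode, whereas gcc 4.4's library would reject it:

cat > map_at_test.cpp <<'EOF'
#include <map>
int main() {
    std::map<int, int> m;
    m[1] = 2;
    return m.at(1);   // at() is C++11-only according to the standard
}
EOF
g++ -std=c++03 -pedantic -c map_at_test.cpp   # no error on a new libstdc++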

Why isn't ccache used with gcc more often?

I've been wondering...
Are there some limitations with ccache?
If the difference in later compile times is so large, why aren't more Linux developers using ccache more often?
I guess the simple answer is that ccache is great when the build system is broken (i.e. the dependencies are not correctly tracked, and to get everything built correctly you might need make clean; make). On the other hand, if dependencies are correctly tracked, then ccache will not yield any advantage over plain make, and will actually incur the cost of maintaining and updating the cache (which might grow huge depending on the size of the project).
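For what it's worth, trying ccache and measuring whether it helps is cheap (the masquerade directory below is the Debian convention; your distribution may differ):

export PATH=/usr/lib/ccache:$PATH   # make "gcc"/"g++" resolve to ccache wrappers
make clean && make                  # cold build populates the cache
make clean && make                  # warm build should be mostly cache hits
ccache -s                           # statistics show the hit/miss ratio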

Intermediate code from C++

I want to compile a C++ program to an intermediate code. Then, I want to compile the intermediate code for the current processor with all of its resources.
The first step is to compile the C++ program with optimizations (-O2), run the linker and do most of the compilation procedure. This step must be independent of operating system and architecture.
The second step is to compile the result of the first step, without the original source code, for the operating system and processor of the current computer, with optimizations and special instructions of the processor (-march=native). The second step should be fast and with minimal software requirements.
Can I do it? How to do it?
Edit:
I want to do it because I want to distribute a platform-independent program that can use all the resources of the processor, without distributing the original source code, instead of shipping a separate compilation for each platform and operating system. It would be good if the second step were fast and easy.
Processors of the same architecture may have different features. x86 processors may have SSE1, SSE2 or other extensions, and they can be 32- or 64-bit. If I compile for generic x86, the program will lack the SSE optimizations. Years from now, processors will have new features, and the program will need to be compiled for those new processors.
Just a suggestion - google clang and LLVM.
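To sketch what that pipeline could look like (note that LLVM bitcode is not fully platform-independent in practice, since type sizes and ABI details are baked in during the first step):

clang++ -O2 -emit-llvm -c prog.cpp -o prog.bc   # step 1: C++ to LLVM bitcode
llc -O2 -mcpu=native prog.bc -o prog.s          # step 2: lower for the host CPU
clang++ prog.s -o prog                          # assemble and link locally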
How much do you know about compilers? You seem to treat "-O2" as some magical flag.
For instance, register assignment is a typical optimization. You definitely need to know how many registers are available. There is no point in assigning foo to register 16, only to discover in phase 2 that you're targeting an x86.
And those architecture-dependent optimizations can be quite complex. Inlining depends critically on call cost, and that in turn depends on architecture.
Once you get to "processor-specific" optimizations, things get really tricky. It's really tough for a platform-specific compiler to be truly "generic" in its generation of object or "intermediate" code at an appropriate "level": Unless it's something like "IL" (intermediate language) code (like the C#-IL code, or Java bytecode), it's really tough for a given compiler to know "where to stop" (since optimizations occur all over the place at different levels of the compilation when target platform knowledge exists).
Another thought: What about compiling to "preprocessed" source code, typically with a "*.i" extension, and then compiling in a distributed manner on different architectures?
For example, most (if not all) C and C++ compilers support something like:
cc /P MyFile.cpp
gcc -E MyFile.cpp
...each generates MyFile.i, which is the preprocessed file. Now that the file has all the headers and other #defines included, you can compile that *.i file to the target object file (or executable) after distributing it to other systems. (You might need to get clever if your preprocessor macros are specific to the target platform, but this should be quite straightforward with your build system, which should generate the command line to do this pre-processing.)
This is the approach used by distcc to preprocess the file locally, so remote "build farms" need not have any headers or other packages installed. (You are guaranteed to get the same build product, no matter how the machines in the build farm are configured.)
Thus, it would similarly have the effect of centralizing the "configuration/pre-processing" for a single machine, but provide cross-compiling, platform-specific compiling, or build-farm support in a distributed manner.
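A sketch of that flow, with a made-up host name (preprocessed C++ conventionally gets a .ii extension, and -x tells the compiler the input is already preprocessed):

g++ -E MyFile.cpp -o MyFile.ii     # preprocess locally; headers are resolved here
scp MyFile.ii buildbox:            # ship only the self-contained .ii file
ssh buildbox 'g++ -x c++-cpp-output -c MyFile.ii -o MyFile.o -march=native'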
FYI -- I really like the distcc concept, but the last update for that particular project was in 2008. So I'd be interested in other similar tools/products if you find them. (In the meantime, I'm writing a similar tool.)

Optimization and flags for making a static library with g++

I am just starting with the g++ compiler on Linux and have some questions about the compiler flags. Here they are:
Optimizations
I read about the optimization flags -O1, -O2 and -O3 in the g++ manual page. I didn't understand when to use these flags. What optimization level do you usually use? The g++ manual says the following about -O2:
Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. The compiler does not perform loop unrolling or function inlining when you specify -O2. As compared to -O, this option increases both compilation time and the performance of the generated code.
If it is not doing inlining and loop unrolling, how are the said performance benefits achieved, and is this option recommended?
Static Library
How do I create a static library using g++? In Visual Studio, I can choose a class library project and it will be compiled into a "lib" file. What is the equivalent in g++?
The rule of thumb:
When you need to debug, use -O0 (and -g to generate debugging symbols.)
When you are preparing to ship it, use -O2.
When you use Gentoo, use -O3...!
When you need to put it on an embedded system, use -Os (optimize for size, not for efficiency.)
The gcc manual lists all the options implied by every optimization level. At -O2, you get things like constant folding, branch prediction and so on, which can significantly change the speed of your application, depending on your code. The exact options are version-dependent, but they are documented in great detail.
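For example, you can ask gcc itself which optimizations a given level enables, and diff two levels against each other:

gcc -O2 -Q --help=optimizers | grep enabled   # options turned on at -O2
gcc -O1 -Q --help=optimizers > o1.txt
gcc -O2 -Q --help=optimizers > o2.txt
diff o1.txt o2.txt                            # what -O2 adds on top of -O1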
To build a static library, you use ar as follows:
ar rc libfoo.a foo.o foo2.o ....
ranlib libfoo.a
ranlib is not always necessary, but there is no reason not to use it.
Regarding when to use what optimization option - there is no single correct answer.
Certain optimization levels may, at times, decrease performance. It depends on the kind of code you are writing and the execution pattern it has, and depends on the specific CPU you are running on.
(To give a simple canonical example - the compiler may decide to use an optimization that makes your code slightly larger than before. This may cause a certain part of the code to no longer fit into the instruction cache, at which point many more accesses to memory would be required - in a loop, for example).
It is best to measure and optimize for whatever you need. Try, measure and decide.
One important rule of thumb: the more optimizations are performed on your code, the harder it is to debug it using a debugger (or read its disassembly), because the C/C++ source view gets further away from the generated binary. For this reason, it is wise to work with fewer optimizations when developing / debugging.
There are many optimizations that a compiler can perform, other than loop unrolling and inlining. Loop unrolling and inlining are specifically mentioned there since, although they make the code faster, they also make it larger.
To make a static library, use 'g++ -c' to generate the .o files and 'ar' to archive them into a library.
Regarding the static library question, the answer given by David Cournapeau is correct, but you can alternatively use the 's' flag with 'ar' rather than running ranlib on your static library file. The 'ar' manual page states that
Running ar s on an archive is equivalent to running ranlib on it.
Whichever method you use is just a matter of personal preference.
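Putting the whole thing together, an end-to-end sketch with placeholder file names:

g++ -O2 -c foo.cpp foo2.cpp     # compile translation units to object files
ar rcs libfoo.a foo.o foo2.o    # 'r' inserts, 'c' creates, 's' writes the index
g++ main.cpp -L. -lfoo -o app   # link the static library into a program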