Does C/C++ program performance depend on compiler? - c++

I read an article in which different compilers were compared to infer which is the best in different circumstances. It gave me a thought. Even though I tried to google, I didn't manage to find a clear and lucid answer: will the program run faster or slower if I use different compilers to compile it? Suppose, it's some uncommon complicated algorithm that is used along with templating.

Yes. The compiler is what writes a program that implements the behavior you've described with your C or C++ code. Different compilers (or even the same compiler, given different options) can come up with vastly different programs that implement the same behavior.
Remember, your CPU does not execute C or C++ code. It only executes machine code. There is no defined standard for how the former gets transformed into the latter.

It may depend on the compiler, compiler version, compiler optimization settings, C++ language version used when compiling, the linker used, linker optimization options and much more. So in short, the answer to your question is Yes.

Related

C++ code compiles differently on different OS

I was wondering so why does c++ code compile differently on different version of the OS. Such as when the same code is complied on the OS no warning or anything will be brought up, but when the same code is complied on a different OS, there will be warnings or errors.
So why does this happen. Is the difference between gcc versions or what actually makes the c++ code unique when its complied on two different OS such Ubuntu 14 and Ubuntu 16. I am just trying to understand how the c++ code is unique to the OS compilation.
C++ as a language is defined by its standard. The standard is an enormous, lawyer-lingo document that defines the language's syntax, rules, standard library, and some guidelines for how compilers should correctly process source code. Compilers, the bridge between the abstract language and real, executable programs, are implemented by different vendors or organizations, and should adhere to that standard as closely as possible. In practice, their correctness varies[1].
Many compiler errors are part of the standard (diagnostics in standardese), and so should in principle be essentially the same across compilers[2]. Compiler warnings generally are less technical, and are often ways that compiler vendors try to help you catch common programming errors that aren't technically ill-formed programs. A program may be ill-formed according to the standard, meaning that it is syntactically invalid and does not represent a real program. Compilers are required by the standard to issue a diagnostic for an ill-formed program.
There are however lesser, more subtle ways that programs can be incorrect though, for example by using what the standard refers to as undefined behavior (UB) and implementation-defined behavior. These are situations where the standard doesn't specify how a compiler should correctly translate source code into a program, and compiler vendors are legally allowed to proceed how they please. While many compilers are likely to produce code that does approximately what you expect it to, invoking undefined behavior in a program is generally a very bad idea because there's no guarantee of any kind how your program will behave. Code with UB that compiles quietly and passes tests on one compiler may fail tests or fail to compile altogether, or encounter a bug at the worst possible time, on a different compiler. The situation gets hairy too if you're using compiler-specific language extensions.
When faced with potential UB, some compilers may offer very helpful advice and others may be misleadingly silent. The best practice would be to be familiar with causes of UB by learning C++ from a good source and reading documentation carefully, both C++ language documentation and that of any libraries you may be using.
[1] Take a look at the 'Standard conformance' columns of the list of C++ compilers at https://en.wikipedia.org/wiki/List_of_compilers#C++_compilers
[2] A comparison of error messages and warnings from three very popular compilers: https://easyaspi314.github.io/gcc-vs-clang.html

Fortran: differences between generated code compiled using two different compilers

I have to work on a fortran program, which used to be compiled using Microsoft Compaq Visual Fortran 6.6. I would prefer to work with gfortran but I have met lots of problems.
The main problem is that the generated binaries have different behaviours. My program takes an input file and then has to generate an output file. But sometimes, when using the binary compiled by gfortran, it crashes before its end, or gives different numerical results.
This a program written by researchers which uses a lot of float numbers.
So my question is: what are the differences between these two compilers which could lead to this kind of problem?
edit:
My program computes the values of some parameters and there are numerous iterations. At the beginning, everything goes well. After several iterations, some NaN values appear (only when compiled by gfortran).
edit:
Think you everybody for your answers.
So I used the intel compiler which helped me by giving some useful error messages.
The origin of my problems is that some variables are not initialized properly. It looks like when compiling with compaq visual fortran these variables take automatically 0 as a value, whereas with gfortran (and intel) it takes random values, which explain some numerical differences which add up at the following iterations.
So now the solution is a better understanding of the program to correct these missing initializations.
There can be several reasons for such behaviour.
What I would do is:
Switch off any optimization
Switch on all debug options. If you have access to e.g. intel compiler, use ifort -CB -CU -debug -traceback. If you have to stick to gfortran, use valgrind, its output is somewhat less human-readable, but it's often better than nothing.
Make sure there are no implicit typed variables, use implicit none in all the modules and all the code blocks.
Use consistent float types. I personally always use real*8 as the only float type in my codes. If you are using external libraries, you might need to change call signatures for some routines (e.g., BLAS has different routine names for single and double precision variables).
If you are lucky, it's just some variable doesn't get initialized properly, and you'll catch it by one of these techniques. Otherwise, as M.S.B. was suggesting, a deeper understanding of what the program really does is necessary. And, yes, it might be needed to just check the algorithm manually starting from the point where you say 'some NaNs values appear'.
Different compilers can emit different instructions for the same source code. If a numerical calculation is on the boundary of working, one set of instructions might work, and another not. Most compilers have options to use more conservative floating point arithmetic, versus optimizations for speed -- I suggest checking the compiler options that you are using for the available options. More fundamentally this problem -- particularly that the compilers agree for several iterations but then diverge -- may be a sign that the numerical approach of the program is borderline. A simplistic solution is to increase the precision of the calculations, e.g., from single to double. Perhaps also tweak parameters, such as a step size or similar parameter. Better would be to gain a deeper understanding of the algorithm and possibly make a more fundamental change.
I don't know about the crash but some differences in the results of numerical code in an Intel machine can be due to one compiler using 80-doubles and the other 64-bit doubles, even if not for variables but perhaps for temporary values. Moreover, floating-point computation is sensitive to the order elementary operations are performed. Different compilers may generate different sequence of operations.
Differences in different type implementations, differences in various non-Standard vendor extensions, could be a lot of things.
Here are just some of the language features that differ (look at gfortran and intel). Programs written to fortran standard work on every compiler the same, but a lot of people don't know what are the standard language features, and what are the language extensions, and so use them ... when compiled with a different compiler troubles arise.
If you post the code somewhere I could take a quick look at it; otherwise, like this, 'tis hard to say for certain.

C++ : In how many ways compiler optimizes away our code?

.
I would like to know all the possible ways (or atleast the popular ones) in which compilers can/do optimize away our code written in C++? I also would like to know how exactly the optimization is done (in each case)!
So far, I'm aware of two optimizations, viz. Empty Base Optimization (EBO) and Return Value Optimization (RVO). What else are there? I've heard of "const" optimization,"unused variable" optimization. What are they?
.
All possible ways? Surely you're joking. For that, look through years of compiler research and practice.
For concrete examples, look up each of the options here: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
From Standard docs., Section 1.9,
4) This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard
as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance,
an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the
observable behavior of the program are produced.
So actually the standard compliant compiler can perform any kind of optimizations as long as it produces the desired result.
Incredibly broad, because there are many optimizations and compiler writers are always thinking up more. There are tons of them, some optimizing for run time, others optimizing for binary size. Many aren't specifically C++ either, the general compiler optimization techniques are implemented for many compilers/interpreters for many different languages.
Just a handful:
Dead code removal
Loop unrolling
Constants folding
More info:
Optimizing C++ Applications with Intel C++ Compiler
New Optimizations in VC++ 2010
Generic Compiler Optimizations Wiki

Is a C++ compiler allowed to emit different machine code compiling the same program?

Consider a situation. We have some specific C++ compiler, a specific set of compiler settings and a specific C++ program.
We compile that specific programs with that compiler and those settings two times, doing a "clean compile" each time.
Should the machine code emitted be the same (I don't mean timestamps and other bells and whistles, I mean only real code that will be executed) or is it allowed to vary from one compilation to another?
The C++ standard certainly doesn't say anything to prevent this from happening. In reality, however, a compiler is normally deterministic, so given identical inputs it will produce identical output.
The real question is mostly what parts of the environment it considers as its inputs -- there are a few that seem to assume characteristics of the build machine reflect characteristics of the target, and vary their output based on "inputs" that are implicit in the build environment instead of explicitly stated, such as via compiler flags. That said, even that is relatively unusual. The norm is for the output to depend on explicit inputs (input files, command line flags, etc.)
Offhand, I can only think of one fairly obvious thing that changes "spontaneously": some compilers and/or linkers embed a timestamp into their output file, so a few bytes of the output file will change from one build to the next--but this will only be in the metadata embedded in the file, not a change to the actual code that's generated.
According to the as-if rule in the standard, as long as a conforming program (e.g., no undefined behavior) cannot tell the difference, the compiler is allowed to do whatever it wants. In other words, as long as the program produces the same output, there is no restriction in the standard prohibiting this.
From a practical point of view, I wouldn't use a compiler that does this to build production software. I want to be able to recompile a release made two years ago (with the same compiler, etc) and produce the same machine code. I don't want to worry that the reason I can't reproduce a bug is that the compiler decided to do something slightly different today.
There is no guarantee that they will be the same. Also according to http://www.mingw.org/wiki/My_executable_is_sometimes_different
My executable is sometimes different, when I compile and recompile the same source. Is this normal?
Yes, by default, and by design, ~MinGW's GCC does not produce ConsistentOutput, unless you patch it.
EDIT: Found this post that seems to explain how to make them the same.
I'd bet it would vary every time due to some metadata compiler writes (for instance, c# compiled dlls always vary in some bytes even if I do "build" twice in a row without changing anything). But anyways, I would never rely on that it would not vary.

Are there conclusive studies/experiments on C compilation using a C++ compiler?

I've seen a lot of arguments over the general performance of C code compiled with a C++ compiler -- I'm curious as to whether there are any solid experimental studies buried beneath all the anecdotal flame wars you find in web searches. I'm particularly interested in the GCC suite, but any data points would be interesting. (Comparing the assembly of "Hello, World!" is not as robust as I'd like. :-)
I'm generally assuming you use the "embedded style" flags -- no exceptions or RTTI. I also wouldn't mind knowing if there are studies on the compilation time itself. TIA!
Adding a datapoint (or at least an anecdote):
We were recently writing a math library for a small embedded-like target, and started writing it in C. About halfway through the project, we switched some of the files to C++, largely in order to use templates for some of the functions where we'd otherwise be writing many nearly-identical pieces of code (or else embedding 40-line functions in preprocessor macros).
At the point where we started switching over, we had a very careful look at the generated assembly code (using GCC) on a number of the functions, and confirmed that it was in fact essentially identical whether the file was compiled as C or C++ -- where by "essentially identical" I mean the differences were in things like symbol names and the stuff at the beginning and end of the assembly file; the actual instructions in the middle of the functions were exactly identical.
Sorry that I don't have a more solid answer.
Edit to add, 2013-03-24: Recently I came across an article where Rusty Russell compared performance on GCC compiled with a C compiler and compiled with a C++ compiler, in response to the recent switch to compiling GCC as C++: http://rusty.ozlabs.org/?p=330. The conclusions are interesting: The version compiled with a C++ compiler was very slightly slower; the difference was about 0.3%. However, that was entirely explained by load time differences caused by larger debug info; when he stripped the binaries and removed the debug info, the differences were less than 0.1% -- i.e., essentially indistinguishable from measurement noise.
I don't know of any studies off-hand, but given the C++ philosophy that you don't pay the price for features you don't use, I doubt there'd be any significant difference between compiling C code with the C compiler and with the C++ compiler.
I don't know of any studies and I doubt that anyone will spend the time to do them. Basically, when compiling with a C++ compiler, the code has the same semantic as when compiling with a C compiler, so it's down to optimization and code generation. But IMO these are much too much compiler-specifc in order to allow any general statements about C vs. C++.
What you mainly gain when you compile C code with a C++ compiler is a much stricter checking (function declarations etc.). IMO this would make compiling C code with a C++ compiler quite attractive. But note that, if you have a large C code base that's never run through a C++ compiler, you're likely facing a very steep up-hill battle until the code compiles clean enough to be able to see any meaningful warnings.
The GCC project is currently under a transition from C to C++ - that is, GCC may be implemented in C++ in the future, it is currently written in C. The next release of GCC will be written in the subset of C which is also valid C++.
Some performance tests were performed on g++ vs gcc, on GCC's codebase. They compared the "bootstrap" time, which means compiling gcc with the sysmem compiler, then compiling it with the resulting compiler, then repeating and checking the results are the same.
Summary: Using g++ was 20% slower. The compiler versions were slightly different, but it was thought that this wouldn't cause there 20% difference.
Note that this measures different programs, gcc vs g++, which although they mostly use the same code, have different front-ends.
I've not tried it from a performance standpoint, but I think compiling C applications with the C++ compiler is a good idea, as it will prevent you from doing "naughty" things such as using functions not declared.
However, the output won't be the same - at the very least, you'll get different symbols, which will render it (mostly) unlinkable with code from the C compiler.
So I think what you really mean is "Is it ok from a performance standpoint, to write C++ code which is very C-like and compile it with the C++ compiler" ?
You would also have to not be using some C99 things such as bool_t which C++ doesn't support on the grounds of having its own ones.
Don't do it if the code has not been designed for. The same valid language constructs can lead to different behavior if interpreted as C or as C++. You would potentially introduce very difficult to understand bugs. Less problematic but still a maintainability nightmare; some C constructs (especially from C99) are not valid in C++.
In the past I have done things like look at the size of the binary which for C++ was huge, that doesnt mean they simply linked in a bunch of unusued libraries. The easiest might be to use gcc -S myprog.cpp vs gcc -S myprog.c and diff the assembler output.