Compiling via C/C++ and compiler bugs/limits - c++

I'm looking at the possibility of implementing a high-level (Lisp-like) language by compiling via C (or possibly C++ if exceptions turn out to be useful enough); this is a strategy a number of projects have previously used. Of course, this would generate C code unlike, and perhaps in some dimensions exceeding the complexity of, anything you would write by hand.
Modern C compilers are highly reliable in normal usage, but it's hard to know what bugs might be lurking in edge cases under unusual stresses, particularly if you go over some "well no programmer would ever write an X bigger than Y" hidden limit.
It occurs to me the coincidence of these facts might lead to unhappiness.
Are there any known cases, or is there a good way to find cases, of generated code tripping over edge case bugs/limits in reasonably recent versions of major compilers (e.g. GCC, Microsoft C++, Clang)?

This may not quite be the answer you were looking for, but quite a few years ago, I worked on a project, where parts of the system was written in some higher level language, where it was easy to build state machines in processes. This language produced C-code that was then compiled. The compiler we were using for this project was gcc (version around 2.95 - don't quote me on that, but pre-3.0 for sure). We did run into a couple of code generation bugs, but that was, from my memory more to do with using a not-so-popular processor [revealing which processor may reveal something I shouldn't about the project, so I'd rather not say what it was, even if it was a very long time ago].
A colleague close to me was investigating one of those code generation bugs, which was in a function of around 200k lines, all of the function a big switch-statement, with each case in the switch statement being around 50-1000 lines each (with several layers of sub-switch statements inside it).
From my memory of it, the code was crashing because it produced an invalid operation or stored something in a register already occupied for something else, so once you hit the right bit of code, it would fail due to an illegal memory access - and it had nothing to do with the long size of the code, because my colleague managed to get it down to about 30 lines of code eventually (after a lot of "lets cut this out and see if it still goes wrong"), and after a few days we had a new version of the compiler with a fix. Nice to know that your many thousands of dollars to pay for the compiler service contract is worth having at least sometimes...
My point here is that modern compilers tolerate a lot of large code. There are also minimum limits that "a compliant compiler must support at least of ". For example, I believe (from memory, again), that the compiler needs to support 127 levels of nested statements (that is, a combination of 127 if, switch, while and do-while) within a function. And, from a discussion somewhere (which is where the "the compiler should support 127 levels of nested statements" comes from), we found that MSVC and GCC both support a whole lot more (enough that we gave up on finding it...)

Short answer:
You have no choice if you need the performance of a compiler rather than the ease of life of an interpreter (or pre-compiler + interpreter). You will be compiling into some lower level language, and C is the assembly language of today, with C++ being about as available and as apt for the task as C. There is no reason why you should fear this route. It is actually, in some sense, a very common route.
Long answer:
Generated code is in no way unusual. Even relatively modest uses of generated code are resulting in C source that is "unlike what any programmer would ever write", in terms of quantity (trivial code patterns repeated with small variations millions of times), or quality (combinations of language features that a human would never use but that are still, say, legal C++). There are also quite a few compilers that compile into C or C++, some famous and well known to people who wrote the C and C++ language standards.
The most common C and C++ compilers cope well with generated code. Yes, they are never perfect.
Compilers may have various simple limitations, such as maximum code line
length; these tend to be documented and easy to comply with in your generator once you run into them.
Compilers may have defects.
Some kinds of defects are actually less of concern for generated code than for hand written code. Your code generator actually gives you a degree of freedom to deal with many situations in a systematic way as soon as you start understanding the pattern and caring about the problem.
Defects that actually result in the compiler not compiling valid code correctly are quickly discovered, given enough users of the compiler. They are treated by compiler vendors as particularly high priority defects. Even if the compiler is essentially dead or serving a niche market, so that no fix is available, plenty of information tends to be available on the Internet, including people's existing experience of working around the defect (different compiler, different code construction, different compiler switches... the solution varies a lot and may feel awkward but nobody gives up on their job just because some software is buggy, right)? So there is usually a choice of searchable solutions.
It's often good striving for portability across compilers, but also knowing and tracking the limits of portability. If you have not tested a particular C or C++ compiler very well, do not claim that it would work as part of your toolset.
You are asking an implied question between C and C++. Well, there are more shades of grey here. C++ is a very rich language. You can use almost all C++ features for good purposes within your generator, but in some cases you should ask yourself whether a particular major feature could become a liability, costing you more than it brings to you. For example, different compilers use different strategies for template instantiation. Implicit instantiation can lead to unforeseen complexity with portable generated code. If templates do help you design the generator tremendously, don't hesitate to use them; but if you only have a marginal use case for them, remember that you have a better reason than most people to restrict the language you generate into.

There are all sorts of implementation defined limits in C. Some well defined and visible to the programmer (think numeric limits), others are not. In my copy of the draft standard, section 5.2.4.1 details the lower bounds on these limits:
5.2.4 Environmental limits
Both the translation and execution environments constrain the implementation of
language translators and libraries. The following summarizes the language-related
environmental limits on a conforming implementation; the library-related limits are
discussed in clause 7.
5.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:18)
— 127 nesting levels of blocks
— 63 nesting levels of conditional inclusion
— 12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or void type in a declaration
[...]
18) Implementations should avoid imposing fixed translation limits whenever possible.
I can't say for sure if your translator is likely to hit any of these or if the C compiler(s) you're targeting will have a problem even if you do, but I think you'll probably be fine.
As for bugs:
Clang - http://llvm.org/bugs/buglist.cgi?bug_status=all&product=clang
GCC - http://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=all&product=gcc
Visual Studio - https://connect.microsoft.com/VisualStudio/feedback

It may sound obvious but the only way to really know is by testing. If you can't do all the effort yourself, at least make your product cross-platform so that people can easily test for you! If people like your project, they are usually willing to submit bug reports or even patches for free :)

Related

Why is C++ template use not recommended in a space/radiated environment?

By reading this question, I understood, for instance, why dynamic allocation or exceptions are not recommended in environments where radiation is high, like in space or in a nuclear power plant.
Concerning templates, I don't see why. Could you explain it to me?
Considering this answer, it says that it is quite safe to use.
Note: I'm not talking about complex standard library stuff, but purpose-made custom templates.
Notice that space-compatible (radiation-hardened, aeronautics compliant) computing devices are very expensive (including to launch in space, since their weight exceeds kilograms), and that a single space mission costs perhaps hundred million € or US$. Losing the mission because of software or computer concerns has generally a prohibitive cost so is unacceptable and justifies costly development methods and procedures that you won't even dream using for developing your mobile phone applet, and using probabilistic reasoning and engineering approaches is recommended, since cosmic rays are still somehow an "unusual" event. From a high-level point of view, a cosmic ray and the bit flip it produces can be considered as noise in some abstract form of signal or of input. You could look at that "random bit-flip" problem as a signal-to-noise ratio problem, then randomized algorithms may provide a useful conceptual framework (notably at the meta level, that is when analyzing your safety-critical source code or compiled binary, but also, at critical system run-time, in some sophisticated kernel or thread scheduler), with an information theory viewpoint.
Why C++ template use is not recommended in space/radiated environment?
That recommendation is a generalization, to C++, of MISRA C coding rules and of Embedded C++ rules, and of DO178C recommendations, and it is not related to radiation, but to embedded systems. Because of radiation and vibration constraints, the embedded hardware of any space rocket computer has to be very small (e.g. for economical and energy-consumption reasons, it is more -in computer power- a Raspberry Pi-like system than a big x86 server system). Space hardened chips cost 1000x much as their civilian counterparts. And computing the WCET on space-embedded computers is still a technical challenge (e.g. because of CPU cache related issues). Hence, heap allocation is frowned upon in safety-critical embedded software-intensive systems (how would you handle out-of-memory conditions in these? Or how would you prove that you have enough RAM for all real run time cases?)
Remember that in the safety-critical software world, you not only somehow "guarantee" or "promise", and certainly assess (often with some clever probabilistic reasoning), the quality of your own software, but also of all the software tools used to build it (in particular: your compiler and your linker; Boeing or Airbus won't change their version of GCC cross-compiler used to compile their flight control software without prior written approval from e.g. FAA or DGAC). Most of your software tools need to be somehow approved or certified.
Be aware that, in practice, most C++ (but certainly not all) templates internally use the heap. And standard C++ containers certainly do. Writing templates which never use the heap is a difficult exercise. If you are capable of that, you can use templates safely (assuming you do trust your C++ compiler and its template expansion machinery, which is the trickiest part of the C++ front-end of most recent C++ compilers, such as GCC or Clang).
I guess that for similar (toolset reliability) reasons, it is frowned upon to use many source code generation tools (doing some kind of metaprogramming, e.g. emitting C++ or C code). Observe, for example, that if you use bison (or RPCGEN) in some safety critical software (compiled by make and gcc), you need to assess (and perhaps exhaustively test) not only gcc and make, but also bison. This is an engineering reason, not a scientific one. Notice that some embedded systems may use randomized algorithms, in particular to cleverly deal with noisy input signals (perhaps even random bit flips due to rare-enough cosmic rays). Proving, testing, or analyzing (or just assessing) such random-based algorithms is a quite difficult topic.
Look also into Frama-Clang and CompCert and observe the following:
C++11 (or following) is an horribly complex programming language. It has no complete formal semantics. The people
expert enough in C++ are only a few dozens worldwide (probably, most
of them are in its standard committee). I am capable of coding in
C++, but not of explaining all the subtle corner cases of move
semantics, or of the C++ memory model. Also, C++ requires in practice many optimizations to be used efficiently.
It is very difficult to make an error-free C++ compiler, in particular because C++ practically requires tricky optimizations, and because of the complexity of the C++ specification. But current
ones (like recent GCC or Clang) are in practice quite good, and they have few (but still some)
residual compiler bugs. There is no CompCert++ for C++ yet, and making one requires several millions of € or US$ (but if you can collect such an amount of money, please contact me by email, e.g. to basile.starynkevitch#cea.fr, my work email). And the space software industry is extremely conservative.
It is difficult to make a good C or C++ heap memory allocator. Coding
one is a matter of trade-offs. As a joke, consider adapting this C heap allocator to C++.
proving safety properties (in particular, lack of race conditions or undefined behavior such as buffer overflow at run-time) of template-related C++ code is still, in 2Q2019, slightly ahead of the state of the art of static program analysis of C++ code. My draft Bismon technical report (it is a draft H2020 deliverable, so please skip pages for European bureaucrats) has several pages explaining this in more details. Be aware of Rice's theorem.
a whole system C++ embedded software test could require a rocket launch (a la Ariane 5 test flight 501, or at least complex and heavy experimentation in lab). It is very expensive. Even testing, on Earth, a Mars rover takes a lot of money.
Think of it: you are coding some safety-critical embedded software (e.g. for train braking, autonomous vehicles, autonomous drones, big oil platform or oil refinery, missiles, etc...). You naively use some C++ standard container, e.g. some std::map<std::string,long>. What should happen for out of memory conditions? How do you "prove", or at least "convince", to the people working in organizations funding a 100M€ space rocket, that your embedded software (including the compiler used to build it) is good enough? A decade-year old rule was to forbid any kind of dynamic heap allocation.
I'm not talking about complex standard library stuff but purposed-made custom templates.
Even these are difficult to prove, or more generally to assess their quality (and you'll probably want to use your own allocator inside them). In space, the code space is a strong constraint. So you would compile with, for example, g++ -Os -Wall or clang++ -Os -Wall. But how did you prove -or simply test- all the subtle optimizations done by -Os (and these are specific to your version of GCC or of Clang)? Your space funding organization will ask you that, since any run-time bug in embedded C++ space software can crash the mission (read again about Ariane 5 first flight failure - coded in some dialect of Ada which had at that time a "better" and "safer" type system than C++17 today), but don't laugh too much at Europeans. Boeing 737 MAX with its MACS is a similar mess).
My personal recommendation (but please don't take it too seriously. In 2019 it is more a pun than anything else) would be to consider coding your space embedded software in Rust. Because it is slightly safer than C++. Of course, you'll have to spend 5 to 10 M€ (or MUS$) in 5 or 7 years to get a fine Rust compiler, suitable for space computers (again, please contact me professionally, if you are capable of spending that much on a free software Compcert/Rust like compiler). But that is just a matter of software engineering and software project managements (read both the Mythical Man-Month and Bullshit jobs for more, be also aware of Dilbert principle: it applies as much to space software industry, or embedded compiler industry, as to anything else).
My strong and personal opinion is that the European Commission should fund (e.g. through Horizon Europe) a free software CompCert++ (or even better, a Compcert/Rust) like project (and such a project would need more than 5 years and more than 5 top-class, PhD researchers). But, at the age of 60, I sadly know it is not going to happen (because the E.C. ideology -mostly inspired by German policies for obvious reasons- is still the illusion of the End of History, so H2020 and Horizon Europe are, in practice, mostly a way to implement tax optimizations for corporations in Europe through European tax havens), and that after several private discussions with several members of CompCert project. I sadly expect DARPA or NASA to be much more likely to fund some future CompCert/Rust project (than the E.C. funding it).
NB. The European avionics industry (mostly Airbus) is using much more formal methods approaches that the North American one (Boeing). Hence some (not all) unit tests are avoided (since replaced by formal proofs of source code, perhaps with tools like Frama-C or Astrée - neither have been certified for C++, only for a subset of C forbidding C dynamic memory allocation and several other features of C). And this is permitted by DO-178C (not by the predecessor DO-178B) and approved by the French regulator, DGAC (and I guess by other European regulators).
Also notice that many SIGPLAN conferences are indirectly related to the OP's question.
The argumentation against the usage of templates in safety code is that they are considered to increase the complexity of your code without real benefit. This argumentation is valid if you have bad tooling and a classic idea of safety. Take the following example:
template<class T> fun(T t){
do_some_thing(t);
}
In the classic way to specify a safety system you have to provide a complete description of each and every function and structure of your code. That means you are not allowed to have any code without specification. That means you have to give a complete description of the functionality of the template in its general form. For obvious reasons that is not possible. That is BTW the same reason why function-like macros are also forbidden. If you change the idea in a way that you describe all actual instantiations of this template, you overcome this limitation, but you need proper tooling to prove that you really described all of them.
The second problem is that one:
fun(b);
This line is not a self-contained line. You need to look up the type of b to know which function is actually called. Proper tooling which understands templates helps here. But in this case it is true that it makes the code harder to check manually.
This statement about templates being a cause of vulnerability seems completely surrealistic to me. For two main reasons:
templates are "compiled away", i.e. instantiated and code-generated like any other function/member, and there is no behavior specific to them. Just as if they never existed;
no construction in any language is neither safe or vulnerable; if an ionizing particule changes a single bit of memory, be it in code or in data, anything is possible (from no noticeable problem occurring up to processor crash). The way to shield a system against this is by adding hardware memory error detection/correction capabilities. Not by modifying the code !

Why allow function declarations inside function bodies

Why does the syntax allow function declarations inside function bodies?
It does create a lot of questions here on SO where function declarations are mistaken for variable initializations and the like.
a object();
Not to mention most vexing parse.
Is there any use-case that is not easily achieved by the more common scope hiding means like namespaces and members?
Is it for historical reasons?
Addendum: If for historical reasons, inherited from C to limit scope, what is the problem banning them?
Whilst many C++ applications are written solely in C++ code, there are also a lot of code that is some mixture of C and C++ code. This mixture is definitely something of C++'s important part of its usefulness (including the "easy" interfacing to existing API's, anything from OpenGL or cURL to custom hardware drivers that are written in C, can pretty much be used directly with very little effort, where trying to interface your custom hardware C driver into a Basic interpreter is pretty diffcult)
If we start breaking that compatibility by removing things "for no particular value", then C++ is no longer as useful. Aside from giving better error messages in a condition that is confusing, which of course is useful in itself, it's hard to see how it's useful to REMOVE this - and that's of course assuming NONE of the C++ is using this in itself - and I wouldn't be surprised if it DOES happen at times even in modern code (for whatever good or bad reasons).
In general, C++ tries very hard to not break backwards compatibility - and this, in my mind, is a good thing. That's why the keyword static is used for a bunch of different things, rather than adding a new keyword, and auto means something different now than it used to in C, but it's not a "new" keyword that could break existing code that happened to use whatever other word chosen (and that is a small break, but nobody really used it for the past 20 years anyway).
Well, the ability to declare functions within function bodies is inherited from C so, by definition, there is a reason involving historical and backward-compatibility reasons. When there is likely to be real-world code which uses a feature, the argument to remove that feature from the language is weakened.
People - particularly those who only use the latest version of the language, and are not required to maintain legacy code - do tend to under-estimate how strong an argument backward compatibility is in C++. The original C++ standard was specifically required to maintain backward compatibility with C. As a rough rule, standards discourage removing old features if doing so is likely to break existing code. It can be done, however, if the only possible usage causes a danger that cannot be prevented (which is reason for removal of gets(), for example).
When maintaining legacy code there are often significant costs with updating a code base to replace all instances of an old construct with some modern replacement. A coding change that may be insignificant for a hobbyist programmer may be extremely costly when maintaining large-scale code bases in regulatory environments, where it is necessary to provide formal evidence and audit trail that the change of code does not affect ability to meet its original requirement.
There are certain programming styles where it is useful to be able to limit the scope of any declarations. Not everyone uses such programming styles, but the reason such features are in the language is to allow the programmer the choice of programming technique. And, whether advocates of removing such features like it or not, there is a certain amount of code which uses such constructs usefully. That significantly weakens the case for removing the feature from the language.
These sorts of arguments will tend to come up for languages that are used in large-scale development, to develop systems in regulatory environments, etc etc. C and C++ (and a number of other languages) are used in such settings, so will tend to accumulate some set of features for "historical" or "backward compatibility" reasons. It is possible to make a case for removing such features by providing evidence that the feature is not in real-world use. But, since the argument is about justifying a negative claim, that is difficult (all it needs is someone to provide ONE example of continuing real-world beneficial usage and suddenly a counter-example exists which supports the case for keeping the feature).

How do I find how C++ compiler implements something except inspecting emitted machine code?

Suppose I crafted a set of classes to abstract something and now I worry whether my C++ compiler will be able to peel off those wrappings and emit really clean, concise and fast code. How do I find out what the compiler decided to do?
The only way I know is to inspect the disassembly. This works well for simple code, but there're two drawbacks - the compiler might do it different when it compiles the same code again and also machine code analysis is not trivial, so it takes effort.
How else can I find how the compiler decided to implement what I coded in C++?
I'm afraid you're out of luck on this one. You're trying to find out "what the compiler did". What the compiler did is to produce machine code. The disassembly is simply a more readable form of the machine code, but it can't add information that isn't there. You can't figure out how a meat grinder works by looking at a hamburger.
I was actually wondering about that.
I have been quite interested, for the last few months, in the Clang project.
One of Clang particular interests, wrt optimization, is that you can emit the optimized LLVM IR code instead of machine code. The IR is a high-level assembly language, with the notion of structure and type.
Most of the optimizations passes in the Clang compiler suite are indeed performed on the IR (the last round is of course architecture specific and performed by the backend depending on the available operations), this means that you could actually see, right in the IR, if the object creation (as in your linked question) was optimized out or not.
I know it is still assembly (though of higher level), but it does seem more readable to me:
far less opcodes
typed objects / pointers
no "register" things or "magic" knowledge required
Would that suit you :) ?
Timing the code will directly measure its speed and can avoid looking at the disassembly entirely. This will detect when compiler, code modifications or subtle configuration changes have affected the performance (either for better or worse). In that way it's better than the disassembly which is only an indirect measure.
Things like code size can also serve as possible indicators of problems. At the very least they suggest that something has changed. It can also point out unexpected code bloat when the compiler should have boiled down a bunch of templates (or whatever) into a concise series of instructions.
Of course, looking at the disassembly is an excellent technique for developing the code and helping decide if the compiler is doing a sufficiently good translation. You can see if you're getting your money's worth, as it were.
In other words, measure what you expect and then dive in if you think the compiler is "cheating" you.
You want to know if the compiler produced "clean, concise and fast code".
"Clean" has little meaning here. Clean code is code which promotes readability and maintainability -- by human beings. Thus, this property relates to what the programmer sees, i.e. the source code. There is no notion of cleanliness for binary code produced by a compiler that will be looked at by the CPU only. If you wrote a nice set of classes to abstract your problem, then your code is as clean as it can get.
"Concise code" has two meanings. For source code, this is about saving the scarce programmer eye and brain resources, but, as I pointed out above, this does not apply to compiler output, since there is no human involved at that point. The other meaning is about code which is compact, thus having lower storage cost. This can have an impact on execution speed, because RAM is slow, and thus you really want the innermost loops of your code to fit in the CPU level 1 cache. The size of the functions produced by the compiler can be obtained with some developer tools; on systems which use GNU binutils, you can use the size command to get the total code and data sizes in an object file (a compiled .o), and objdump to get more information. In particular, objdump -x will give the size of each individual function.
"Fast" is something to be measured. If you want to know whether your code is fast or not, then benchmark it. If the code turns out to be too slow for your problem at hand (this does not happen often) and you have some compelling theoretical reason to believe that the hardware could do much better (e.g. because you estimated the number of involved operations, delved into the CPU manuals, and mastered all the memory bandwidth and cache issues), then (and only then) is it time to have a look at what the compiler did with your code. Barring these conditions, cleanliness of source code is a much more important issue.
All that being said, it can help quite a lot if you have a priori notions of what a compiler can do. This requires some training. I suggest that you have a look at the classic dragon book; but otherwise you will have to spend some time compiling some example code and looking at the assembly output. C++ is not the easiest language for that, you may want to begin with plain C. Ideally, once you know enough to be able to write your own compiler, then you know what a compiler can do, and you can guess what it will do on a given code.
You might find a compiler that had an option to dump a post-optimisation AST/representation - how readable it would be is another matter. If you're using GCC, there's a chance it wouldn't be too hard, and that someone might have already done it - GCCXML does something vaguely similar. Of little use if the compiler you want to build your production code on can't do it.
After that, some compiler (e.g. gcc with -S) can output assembly language, which might be usefully clearer than reading a disassembly: for example, some compilers alternate high-level source as comments then corresponding assembly.
As for the drawbacks you mentioned:
the compiler might do it different when it compiles the same code again
absolutely, only the compiler docs and/or source code can tell you the chance of that, though you can put some performance checks into nightly test runs so you'll get alerted if performance suddenly changes
and also machine code analysis is not trivial, so it takes effort.
Which raises the question: what would be better. I can image some process where you run the compiler over your code and it records when variables are cached in registers at points of use, which function calls are inlined, even the maximum number of CPU cycles an instruction might take (where knowable at compile time) etc. and produces some record thereof, then a source viewer/editor that colour codes and annotates the source correspondingly. Is that the kind of thing you have in mind? Would it be useful? Perhaps some more than others - e.g. black-and-white info on register usage ignores the utility of the various levels of CPU cache (and utilisation at run-time); the compiler probably doesn't even try to model that anyway. Knowing where inlining was really being done would give me a warm fuzzy feeling. But, profiling seems more promising and useful generally. I fear the benefits are more intuitively real than actually, and compiler writers are better off pursuing C++0x features, run-time instrumentation, introspection, or writing D "on the side" ;-).
The answer to your question was pretty much nailed by Karl. If you want to see what the compiler did, you have to start going through the assembly code it produced--elbow grease is required. As to discovering the "why" behind the "how" of how it implemented your code...every compiler (and every build, potentially), as you mentioned, is different. There are different approaches, different optimizations, etc. However, I wouldn't worry about whether it's emitting clean, concise machine code--cleanliness and concision should be left to the source code. Speed, on the other hand, is pretty much the programmer's responsibility (profiling ftw). More interesting concerns are correctness, maintainability, readability, etc. If you want to see if it made a specific optimization, the compiler docs might help (if they're available for your compiler). You can also just trying searching to see if the compiler implements a known technique for optimizing whatever. If those approaches fail, though, you're right back to reading assembly code. Keep in mind that the code that you're checking out might have little to no impact on performance or executable size--grab some hard data before diving into any of this stuff.
Actually, there is a way to get what you want, if you can get your compiler to
produce DWARF debugging information. There will be a DWARF description for each
out-of-line function and within that description there will (hopefully) be entries
for each inlined function. It's not trivial to read DWARF, and sometimes compilers
don't produce complete or accurate DWARF, but it can be a useful source of information
about what the compiler actually did, that's not tied to any one compiler or CPU.
Once you have a DWARF reading library there are all sorts of useful tools you can
build around it.
Don't expect to use it with Visual C++ as that uses a different debugging format.
(But you might be able to do similar queries through the debug helper library
that comes with it.)
If your compiler manages to translate your "wrappings and emit really clean, concise and fast code" the effort to follow-up the emitted code should be reasonable.
Contrary to another answer I feel that emitted assembly code may well be "clean" if it is (relatively) easily mappable to the original source code, if it doesn't consist of calls all over the place and that the system of jumps is not too complex. With code scheduling and re-ordering an optimized machine code which is also readable is, alas, a thing of the past.

Is it worth writing part of code in C instead of C++ as micro-optimization?

I am wondering if it is still worth with modern compilers and their optimizations to write some critical code in C instead of C++ to make it faster.
I know C++ might lead to bad performance in case classes are copied while they could be passed by reference or when classes are created automatically by the compiler, typically with overloaded operators and many other similar cases; but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
I'm going to agree with a lot of the comments. C syntax is supported, intentionally (with divergence only in C99), in C++. Therefore all C++ compilers have to support it. In fact I think it's hard to find any dedicated C compilers anymore. For example, in GCC you'll actually end up using the same optimization/compilation engine regardless of whether the code is C or C++.
The real question is then, does writing plain C code and compiling in C++ suffer a performance penalty. The answer is, for all intents and purposes, no. There are a few tricky points about exceptions and RTTI, but those are mainly size changes, not speed changes. You'd be so hard pressed to find an example that actually takes a performance hit that it doesn't seem worth it do write a dedicate module.
What was said about what features you use is important. It is very easy in C++ to get sloppy about copy semantics and suffer huge overheads from copying memory. In my experience this is the biggest cost -- in C you can also suffer this cost, but not as easily I'd say.
Virtual function calls are ever so slightly more expensive than normal functions. At the same time forced inline functions are cheaper than normal function calls. In both cases it is likely the cost of pushing/popping parameters from the stack that is more expensive. Worrying about function call overhead though should come quite late in the optimization process -- as it is rarely a significant problem.
Exceptions are costly at throw time (in GCC at least). But setting up catch statements and using RAII doesn't have a significant cost associated with it. This was by design in the GCC compiler (and others) so that truly only the exceptional cases are costly.
But to summarize: a good C++ programmer would not be able to make their code run faster simply by writing it in C.
measure! measure before thinking about optimizing, measure before applying optimization, measure after applying optimization, measure!
If you must run your code 1 nanosecond faster (because it's going to be used by 1000 people, 1000 times in the next 1000 days and that second is very important) anything goes.
Yes! it is worth ...
changing languages (C++ to C; Python to COBOL; Mathlab to Fortran; PHP to Lisp)
tweaking the compiler (enable/disable all the -f options)
use different libraries (even write your own)
etc
etc
What you must not forget is to measure!.
pmg nailed it. Just measure instead of global assumptions. Also think of it this way, compilers like gcc separate the front, middle, and back end. so the frontend fortran, c, c++, ada, etc ends up in the same internal middle language if you will that is what gets most of the optimization. Then that generic middle language is turned into assembler for the specific target, and there are target specific optimizations that occur. So the language may or may not induce more code from the front to middle when the languages differ greatly, but for C/C++ I would assume it is the same or very similar. Now the binary size is another story, the libraries that may get sucked into the binary for C only vs C++ even if it is only C syntax can/will vary. Doesnt necessarily affect execution performance but can bulk up the program file costing storage and transfer differences as well as memory requirements if the program loaded as a while into ram. Here again, just measure.
I also add to the measure comment compile to assembler and/or disassemble the output and compare the results of your different languages/compiler choices. This can/will supplement the timing differences you see when you measure.
The question has been answered to death, so I won't add to that.
Simply as a generic question, assuming you have measured, etc, and you have identified that a certain C++ (or other) code segment is not running at optimal speed (which generally means you have not used the right tool for the job); and you know you can get better performance by writing it in C, then yes, definitely, it is worth it.
There is a certain mindset that is common, trying to do everything from one tool (Java or SQL or C++). Not just Maslow's Hammer, but the actual belief that they can code a C construct in Java, etc. This leads to all kinds of performance problems. Architecture, as a true profession, is about placing code segments in the appropriate architectural location or platform. It is the correct combination of Java, SQL and C that will deliver performance. That produces an app that does not need to be re-visited; uneventful execution. In which case, it will not matter if or when C++ implements this constructors or that.
I am wondering if it is still worth with modern compilers and their optimizations to write some critical code in C instead of C++ to make it faster.
no. keep it readable. if your team prefers c++ or c, prefer that - especially if it is already functioning in production code (don't rewrite it without very good reasons).
I know C++ might lead to bad performance in case classes are copied while they could be passed by reference
then forbid copying and assigning
or when classes are created automatically by the compiler, typically with overloaded operators and many other similar cases
could you elaborate? if you are referring to templates, they don't have additional cost in runtime (although they can lead to additional exported symbols, resulting in a larger binary). in fact, using a template method can improve performance if (for example) a conversion would otherwise be necessary.
but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
in my experience, an expert c++ developer can create a faster, more maintainable program.
you have to be selective about the language features that you use (and do not use). if you break c++ features down to the set available in c (e.g., remove exceptions, virtual function calls, rtti) then you're off to a good start. if you learn to use templates, metaprogramming, optimization techniques, avoid type aliasing (which becomes increasingly difficult or verbose in c), etc. then you should be on par or faster than c - with a program which is more easily maintained (since you are familiar with c++).
if you're comfortable using the features of c++, use c++. it has plenty of features (many of which have been added with speed/cost in mind), and can be written to be as fast as c (or faster).
with templates and metaprogramming, you could turn many runtime variables into compile-time constants for exceptional gains. sometimes that goes well into micro-optimization territory.

Languages faster than C++ [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
It is said that Blitz++ provides near-Fortran performance.
Does Fortran actually tend to be faster than regular C++ for equivalent tasks?
What about other HL languages of exceptional runtime performance? I've heard of a few languages suprassing C++ for certain tasks... Objective Caml, Java, D...
I guess GC can make much code faster, because it removes the need for excessive copying around the stack? (assuming the code is not written for performance)
I am asking out of curiosity -- I always assumed C++ is pretty much unbeatable barring expert ASM coding.
Fortran is faster and almost always better than C++ for purely numerical code. There are many reasons why Fortran is faster. It is the oldest compiled language (a lot of knowledge in optimizing compilers). It is still THE language for numerical computations, so many compiler vendors make a living of selling optimized compilers. There are also other, more technical reasons. Fortran (well, at least Fortran77) does not have pointers, and thus, does not have the aliasing problems, which plague the C/C++ languages in that domain. Many high performance libraries are still coded in Fortran, with a long (> 30 years) history. Neither C or C++ have any good array constructs (C is too low level, C++ has as many array libraries as compilers on the planet, which are all incompatible with each other, thus preventing a pool of well tested, fast code).
Whether fortran is faster than c++ is a matter of discussion. Some say yes, some say no; I won't go into that. It depends on the compiler, the architecture you're running it on, the implementation of the algorithm ... etc.
Where fortran does have a big advantage over C is the time it takes you to implement those algorithms. And that makes it extremely well suited for any kind of numerical computing. I'll state just a few obvious advantages over C:
1-based array indexing (tremendously helpful when implementing larger models, and you don't have to think about it, but just FORmula TRANslate
has a power operator (**) (God, whose idea was that a power function will do ? Instead of an operator?!)
it has, I'd say the best support for multidimensional arrays of all the languages in the current market (and it doesn't seem that's gonna change so soon) - A(1,2) just like in math
not to mention avoiding the loops - A=B*C multiplies the arrays (almost like matlab syntax with compiled speed)
it has parallelism features built into the language (check the new standard on this one)
very easily connectible with languages like C, python, so you can make your heavy duty calculations in fortran, while .. whatever ... in the language of your choice, if you feel so inclined
completely backward compatible (since whole F77 is a subset of F90) so you have whole century of coding at your disposal
very very portable (this might not work for some compiler extensions, but in general it works like a charm)
problem oriented solving community (since fortran users are usually not cs, but math, phy, engineers ... people with no programming, but rather problem solving experience whose knowledge about your problem can be very helpful)
Can't think of anything else off the top of my head right now, so this will have to do.
What Blitz++ is competing against is not so much the Fortran language, but the man-centuries of work going into Fortran math libraries. To some extent the language helps: an older language has had a lot more time to get optimizing compilers (and , let's face it, C++ is one of the most complex languages). On the other hand, high level C++ libraries like Blitz++ and uBLAS allows you to state your intentions more clearly than relatively low-level Fortran code, and allows for whole new classes of compile-time optimizations.
However, using any library effectively all the time requires developers to be well acquainted with the language, the library and the mathematics. You can usually get faster code by improving any one of the three...
FORTAN is typically faster than C++ for array processing because of the different ways the languages implement arrays - FORTRAN doesn't allow aliasing of array elements, whereas C++ does. This makes the FORTRAN compilers job easier. Also, FORTRAN has many very mature mathematical libraries which have been worked on for nearly 50 years - C++ has not been around that long!
This will depend a lot on the compiler, programmers, whether it has gc and can vary too much. If it is compiled directly to machine code then expect to have better performance than interpreted most of the time but there is a finite amount of optimization possible before you have asm speed anyway.
If someone said fortran was slightly faster would you code a new project in that anyway?
the thing with c++ is that it is very close to the hardware level. In fact, you can program at the hardware level (via assembly blocks). In general, c++ compilers do a pretty good job at optimisations (for a huge speed boost, enable "Link Time Code Generation" to allow the inlining of functions between different cpp files), but if you know the hardware and have the know-how, you can write a few functions in assembly that work even faster (though sometimes, you just can't beat the compiler).
You can also implement you're own memory managers (which is something a lot of other high level languages don't allow), thus you can customize them for your specific task (maybe most allocations will be 32 bytes or less, then you can just have a giant list of 32-byte buffers that you can allocate/deallocate in O(1) time). I believe that c++ CAN beat any other language, as long as you fully understand the compiler and the hardware that you are using. The majority of it comes down to what algorithms you use more than anything else.
You must be using some odd managed XML parser as you load this page then. :)
We continously profile code and the gain is consistently (and this is not naive C++, it is just modern C++ with boos). It consistensly paves any CLR implementation by at least 2x and often by 5x or more. A bit better than Java days when it was around 20x times faster but you can still find good instances and simply eliminate all the System.Object bloat and clearly beat it to a pulp.
One thing managed devs don't get is that the hardware architecture is against any scaling of VM and object root aproaches. You have to see it to believe it, hang on, fire up a browser and go to a 'thin' VM like Silverlight. You'll be schocked how slow and CPU hungry it is.
Two, kick of a database app for any performance, yes managed vs native db.
It's usually the algorithm not the language that determines the performance ballpark that you will end up in.
Within that ballpark, optimising compilers can usually produce better code than most assembly coders.
Premature optimisation is the root of all evil
This may be the "common knowledge" that everyone can parrot, but I submit that's probably because it's correct. I await concrete evidence to the contrary.
D can sometimes be faster than C++ in practical applications, largely because the presence of garbage collection helps avoid the overhead of RAII and reference counting when using smart pointers. For programs that allocate large amounts of small objects with non-trivial lifecycles, garbage collection can be faster than C++-style memory management. Also, D's builtin arrays allow the compiler to perform better optimizations in some cases than C++'s STL vector, which the compiler doesn't understand. Furthermore, D2 supports immutable data and pure function annotations, which recent versions of DMD2 optimize based on. Walter Bright, D's creator, wrote a JavaScript interpreter in both D and C++, and according to him, the D version is faster.
C# is much faster than C++ - in C# I can write an XML parser and data processor in a tenth the time it takes me to write it C++.
Oh, did you mean execution speed?
Even then, if you take the time from the first line of code written to the end of the first execution of the code, C# is still probably faster than C++.
This is a very interesting article about converting a C++ program to C# and the effort required to make the C++ faster than the C#.
So, if you take development speed into account, almost anything beats C++.
OK, to address tht OP's runtime only performance requirement: It's not the langauge, it's the implementation of the language that determines the runtime performance. I could write a C++ compiler that produces the slowest code imaginable, but it's still C++. It is also theoretically possible to write a compiler for Java that targets IA32 instructions rather than the Java VM byte codes, giving a runtime speed boost.
The performance of your code will depend on the fit between the strengths of the language and the requirements of the code. For example, a program that does lots of memory allocation / deallocation will perform badly in a naive C++ program (i.e. use the default memory allocator) since the C++ memory allocation strategy is too generalised, whereas C#'s GC based allocator can perform better (as the above link shows). String manipulation is slow in C++ but quick in languages like php, perl, etc.
It all depends on the compiler, take for example the Stalin Scheme compiler, it beats almost all languages in the Debian micro benchmark suite, but do they mention anything about compile times?
No, I suspect (I have not used Stalin before) compiling for benchmarks (iow all optimizations at maximum effort levels) takes a jolly long time for anything but the smallest pieces of code.
if the code is not written for performance then C# is faster than C++.
A necessary disclaimer: All benchmarks are evil.
Here's benchmarks that in favour of C++.
The above two links show that we can find cases where C++ is faster than C# and vice versa.
Performance of a compiled language is a useless concept: What's important is the quality of the compiler, ie what optimizations it is able to apply. For example, often - but not always - the Intel C++ compiler produces better performing code than g++. So how do you measure the performance of C++?
Where language semantics come in is how easy it is for the programmer to get the compiler to create optimal output. For example, it's often easier to parallelize Fortran code than C code, which is why Fortran is still heavily used for high-performance computation (eg climate simulations).
As the question and some of the answers mentioned assembler: the same is true here, it's just another compiled language and thus not inherently 'faster'. The difference between assembler and other languages is that the programmer - who ideally has absolute knowledge about the program - is responsible for all of the optimizations instead of delegating some of them to the 'dumb' compiler.
Eg function calls in assembler may use registers to pass arguments and don't need to create unnecessary stack frames, but a good compiler can do this as well (think inlining or fastcall). The downside of using assembler is that better performing algorithms are harder to implement (think linear search vs. binary seach, hashtable lookup, ...).
Doing much better than C++ is mostly going to be about making the compiler understand what the programmer means. An example of this might be an instance where a compiler of any language infers that a region of code is independent of its inputs and just computes the result value at compile time.
Another example of this is how C# produces some very high performance code simply because the compiler knows what particular incantations 'mean' and can cleverly use the implementation that produces the highest performance, where a transliteration of the same program into C++ results in needless alloc/delete cycles (hidden by templates) because the compiler is handling the general case instead of the particular case this piece of code is giving.
A final example might be in the Brook/Cuda adaptations of C designed for exotic hardware that isn't so exotic anymore. The language supports the exact primitives (kernel functions) that map to the non von-neuman hardware being compiled for.
Is that why you are using a managed browser? Because it is faster. Or managed OS because it is faster. Nah, hang on, it is the SQL database.. Wait, it must be the game you are playing. Stop, there must be a piece of numerical code Java adn Csharp frankly are useless with. BTW, you have to check what your VM is written it to slag the root language and say it is slow.
What a misconecption, but hey show me a fast managed app so we can all have a laugh. VS? OpenOffice?
Ahh... The good old question - which compiler makes faster code?
It only matters in code that actually spends much time at the bottom of the call stack, i.e. hot spots that don't contain function calls, such as matrix inversion, etc.
(Implied by 1) It only matters in code the compiler actually sees. If your program counter spends all its time in 3rd-party libraries you don't build, it doesn't matter.
In code where it does matter, it all comes down to which compiler makes better ASM, and that's largely a function of how smartly or stupidly the source code is written.
With all these variables, it's hard to distinguish between good compilers.
However, as was said, if you've got a lot of Fortran code to compile, don't re-write it.