Compiler optimizations on map/set in c++ - c++

Does compiler make optimzation on data structures when input size is small ?
unordered_set<int>TmpSet;
TmpSet.insert(1);
TmpSet.insert(2);
TmpSet.insert(3);
...
...
Since since is small using hashing would be not required we can simply store this in 3variables. Do optimization like this happen ? If yes who is responsible for them ?
edit: Replaced set with unordered_set as former doesn't do hasing.

Possible in theory to totally replace whole data structures with a different implementation. (As long as escape analysis can show that a reference to it couldn't be passed to separately-compiled code).
But in practice what you've written is a call to a constructor, and then three calls to template functions which aren't simple. That's all the compiler sees, not the high-level semantics of a set.
Unless it's going to optimize stuff away, real compilers aren't going to use a different implementation which would have different object representations.
If you want to micro-optimize like this, in C++ you should do it yourself, e.g. with a std::bitset<32> and set bits to indicate set membership.
It's far too complex a problem for a compiler to reliably do a good job. And besides, there could be serious quality-of-implementation issues if the compiler starts inventing different data-structures that have different speed/space tradeoffs, and it guesses wrong and uses one that doesn't suit the use-case well.
Programmers have a reasonable expectation that their compiled code will work somewhat like what they wrote, at least on a large scale. Including calling those standard header template functions with their implementation of the functions.
Maybe there's some scope for a C++ implementation adapting to the use-case, but I that would need much debate and probably a special mechanism to allow it in practice. Perhaps name-recognition of std:: names could be enough, making those into builtins instead of header implementations, but currently the implementation strategy is to write those functions in C++ in .h headers. e.g. in libstdc++ or libc++.

Related

In general, does using C++ templates produce larger executables than doing the same code with macros?

In C, when you'd like to do generic programming, your only language-supported option is macros. They work great and are widely used, but are discouraged if you can get by with inline functions or regular functions instead. (If using gcc, you can also use gcc statement expressions, which avoid the double-evaluation "bug". Example.)
C++, however, has done away with the so-called "evils" of macros by creating templates. I'm still somewhat new to the full-blown gigantic behemoth of a language that is C++ (I assess it must have like 4 or 5x as many features and language constructs as C), and generally have favored macros or gcc statement expressions, but am being pressured more and more to use templates in their place. This begs the question: in general, when doing generic programming in C++, which will produce smaller executables: macros or templates?
If you say, "size doesn't matter, choose safety over size", I'm going to go ahead and stop you right there. For large computers and application programming, this may be true, but for microcontroller programming on an Arduino, ATTiny85 with 8KB Flash space for the program, or other small devices, that's hogwash. Size matters too, so tradeoffs must be made.
Which produces smaller executables for the same code when doing generic programming? Macros or templates? Any additional insight is welcome.
Related:
Do c++ templates make programs slow?
Side note:
Some things can only be done with macros, NOT templates. Take, for example, non-name-mangled stringizing/stringifying and X macros. More on X macros:
Real-world use of X-Macros
https://www.geeksforgeeks.org/x-macros-in-c/
https://en.wikipedia.org/wiki/X_Macro
At this time of history, 2020, this is only the job of the optimizer. You can achieve better speed with assembly, the point is that it's not worth in both size and speed. With proper C++ programming your code will be fast enough and small enough. Getting faster or smaller by messing the readability of the code is not worth the trouble.
That said, macros replace stuff at the preprocessor level, templates do that at the compile level. You may get faster compilation time with macros, but a good compiler will optimize them more that macros. This means that you can have the same exe size, or possibly less with templates.
The vast, 99%, troubles of speed or size in an application comes from the programmers errors, not from the language. Very often I discover that some photo resources are PNG instead of proper JPG in my executable and voila, I have a bloat. Or that I accidentally forgot to use weak_ptr to break a reference and now I have two shared pointers that share 100MB of memory that will not be freed. It's almost always the human error.
... in general, when doing generic programming in C++, which will produce smaller executables: macros or templates?
Measure it. There shouldn't be a significant difference assuming you do a good job writing both versions (see the first point above), and your compiler is decent, and your code is equally sympathetic to both.
If you write something that is much bigger with templates - ask a specific question about that.
Note that the linked question's answer is talking about multiple non-mergeable instantiations. IME function templates are very often inlined, in which case they behave very much like type-safe macros, and there's no reason for the inlining site to be larger if it's otherwise the same code. If you start taking the addresses of function template instantiations, for example, that changes.
... C++ ... generally have favored macros ... being pressured more and more to use templates in their place.
You need to learn C++ properly. Perhaps you already have, but don't use templates much: that still leaves you badly-placed to write a good comparison.
This begs the question
No, it prompts the question.

Are the Optimization Keywords in C and C++ Reasonable?

So we've all heard the don't-use-register line, the reasoning being that trying to out-optimize a compiler is a fool's errand.
register, from what I know, doesn't actually state anything about CPU registers, just that a given variable can't be referenced indirectly. I'll hazard a guess that it's often referred to as obsolete because compilers can detect a lack of addressing automatically thus making such optimizations transparent.
But if we're firm on that argument, can't it be levelled at every optimization-driven keyword in C? Why do we use inline and C99's restrict for example?
I suppose that some things like aliasing make deducing some optimizations hard or even impossible, so where is the line drawn before we start venturing into Sufficiently Smart Compiler territory?
Where should the line should be drawn in C and C++ between spoon-feeding a compiler optimization information and assuming it knows what it's doing?
EDIT: Jens Gustedt pointed out that my conflating of C and C++ isn't right since two of the keywords have semantic differences and one doesn't exist in standard C++. I had a good link about register in C++ which I'll add if I find it...
I would agree that register and inline are somewhat similar in this respect. If the compiler can see the body of the callee while compiling a call site, it should be able to make a good decision on inlining. The use of the inline keyword in both C and C++ has more to do with the mechanics of making the body of the function visible than with anything else.
restrict, however, is different. When compiling a function, the compiler has no idea of what the call sites are going to be. Being able to assume no aliasing can enable optimizations that would otherwise be impossible.
inline is used in the scenario where you implement a non-templated function within the header then include it from multiple compilation units.
This ensures that the compiler should create just one instance of the function as though it were inlined, so you do not get a link error for multiply defined symbol. It does not however require the compiler to actually inline it.
There are GNU flags I think force-inline or similar but that is a language extension.
register doesn't even say that you can't reference the
variable indirectly (at least in C++). It said that in the
original C, but that has been dropped.
Whether trying to out-optimize the compiler is a fool's errand
depends on the optimization. Not many compilers, for example,
will convert sin(x) * sin(x) + cos(x) * cos(x) into 1.
Today, most compilers ignore register, and no one uses it,
because compilers have become good enough at register allocation
to do a better job than you can with register. In fact,
respecting register would typically make the generated code
slower. This is not the case for inline or restrict: in
both cases, there exist techniques, at least theoretically,
which could result in the compiler doing a better job than you
can. Such techniques are not widespread, however, and (as far
as I know, at least), have a very high compile time overhead,
with in some cases compile times which grow exponentially with
the size of the program (which makes them more or less unusable
on most real programs—compile times which are measured in
years really aren't acceptable).
As to where to draw the line... it changes in time. When
I first started programming in C, register made a significant
difference, and was widely used. Today, no. I imagine that in
time, the same may happen with inline or restrict—some
experimental compilers are very close with inline already.
This is a flame-bait question but I will dive in anyway.
Compilers are a lot better at optimising that your average programmer. There was a time I programmed on a 25MHz 68030 and I got some advantage from the use of register because the compiler's optimizer was so poor. But that was back in 1990.
I see inline as just as bad as register.
In general, measure first before you modify. If you find that you code performs so poorly you want to use register or inline, take a deep breath, stand back and look for a better algorithm first.
In recent times (i.e. the last 5 years) I have gone through code bases and removed inline functions galore with no perceptible change in performance being visible. Code size, however, always benefits from the removal of inline methods. That isn't a big issue for your standard x86-style monster multicore marvel of the modern age but it does matter if you work in the embedded space.
It is a moving target, because compiler technology is improving. (Well, sometimes it is more changing than improving, but that has some of the same effect of rendering your optimization attempts moot, or worse.)
Generally, you should not guess at whether an optimization keyword or other optimization technique is good or not. One has to learn quite a bit about how computers work, including the particular platform you are targeting, and how compilers work.
So a rule about using various optimization techniques is to ask do I know the compiler will not do the best job here? Am I willing to commit to that for a while—will the compiler remain stable while this code is in use, am I willing to rewrite the code when the compiler changes this situation? Typically, you have to be an experienced and knowledgeable software engineer to know when you can do better than the compiler. It also helps if you can talk to the compiler developers.
This means people cannot give you an answer here that has a definite guideline. It depends on what compiler you are using, what your project is, what your resources are, and what your goals are, and so on.
Although some people say not to try to out-optimize the compiler, there are various areas of software engineering where people do better than a compiler and in which it is worth the expense of paying people for this.
The difference is as follows:
register is very local optimization (i.e. inside one function). The register allocation is a relatively solved problem both by smarter compilers and by larger number of register (mostly the former but say x86-64 have more registers then x86 and both have larger number then say 8-bit processor)
inline is harder as it is inter-procedure optimization. However as it involves relatively small depth of recursion and small number of procedures (if inlined procedure is too big there is no sense of inlining it) it may be safely left to the compiler.
restrict is much harder. To fully know the that two pointers don't alias you would need to analyse whole program (including libraries, system, plug-ins etc.) - and even then run into problems. However the information is clearer for programmer AND it is part of specification.
Consider very simple code:
void my_memcpy(void *dst, const void *src, size_t size) {
for (size_t i = 0; i < size; i++) {
((char *)dst)[i] = ((const char *)str)[i];
}
}
Is there a benefit to making this code efficient? Yes - memcpy tend to be very useful (say for copying GC). Can this code be vectorized (here - moved by words - say 128b instead of 8b)? Compiler would have to deduce that dst and src does not alias in any way and regions pointed by them are independent. size may depend on user input or runtime behaviour or other elements which makes the analysis practically impossible - similar problems to Halting Problem - in general we cannot analyse everything without running it. Or it might be part of C library (I assume shared libraries) and is called by program hence all call sites are not even known at compile time. Without such analysis the program would exhibit different behaviour with optimization on. On the other hand programmer might ensure that they are different objects simply by knowing the (even higher-level) design instead of need for bottom-up analysis.
restrict can also be part of documentation as it might be programmer who wrote the procedure in a way that it cannot handle 2 aliasing pointers. For example if we want to copy memory from aliasing locations the above code is incorrect.
So to sum up - Sufficiently Smart Compiler would not be able to deduce the restrict (unless we move to compilers understending the meaning of code) without knowing the whole program. Even then the it would be close to undecidability. However for local optimization the compilers are already sufficiently smart. My guess it that Sufficiently Smart Compiler with whole program analysis would be able to deduce in many interesting cases however.
PS. By local I mean single function. So local optimization cannot assume anything about arguments, global variables etc.
One thing that hasn't been mentioned is that many non-x86 compilers aren't nearly as good at optimizing as gcc and other "modern" C-compilers are.
For instance, the compilers for PIC are absolutely terrible at optimizing. Also, the optimizer for cicc (the CUDA compiler), though much better, still seems to miss a lot of fairly simple optimizations.
For these cases, I've found optimization hints like register, inline, and #pragma unroll to be extremely useful.
From what I have seen back in the days I was more involved with C/C++, these are merely orders directly given to the compiler. Compiler may try to inline a function even if it is not given the direct order to do so. That really depends on the compiler and may even raise some cross-compiler issues. As an example, visual studio provides different levels of optimization which correspond to the different intelligence levels of the compiler. I have read that all class functions are implicitly inline to give compiler a hint to minimize function call overhead. In any case, these directives are extremely helpful when you are using a less intelligent compiler while in intelligent cases, they may be very obvious for the compiler to do some optimization.
Also, be sure that these keywords are guaranteed to be safe. Some compiler optimizations may not work with some libraries such as OpenGL (as I have seen it myself). So in cases where you feel that compiler optimization may be harmful, you can use these keywords to make sure it is done the way you want it to.
The compilers such as g++ these days optimize the code very well. You might as well search for optimization elsewhere, maybe in the methods and algorithm you use or by using TBB or CUDA to make your code parallel.

Why doesn't compiler help us with type traits instead of resorting to language quirks?

Type traits are cool and I've used them since they originated in boost a few years ago. However, when you look at their implementation (check out "How is_base_of works?" StackOverflow thread).
Why won't compiler help here? For example, if you want to check if some class is base of another, compiler already knows that, why can't it tell us? This would make things like concepts so much easier to implement and use. You could use language constructs right there.
I am not sure, but I am guessing that it would increase general performance. It is like asking compiler for help, instead of C++ language.
I suspect that the primary reason will sound something like "we need to maintain backwards compatibility" and I agree, but why won't the compiler be more active in generating generic templated code?
Actually... some do.
The thing is that if something can be implemented in pure C++ code, there is no reason, other than simplifying the code, to hard-wire them in the compiler. It is then a matter of trade-off, is the value brought by the code simplification worth the hard-wiring ?
This depends on several points:
correctness (some times software may only partially emulate the trait)
complexity of the code ~~ maintenance burden
performance
...
Once all those points have been weighted, then you can determine whether it's more advantageous to put things in the library or the compiler; and the more likely situation is that you will end up with a mixed strategy: a couple intrinsics in the compiler used as building blocks to provide the required interface in the library.
Note that the maintenance burden is much more significant in a compiler: any C++ developer sufficiently acquainted with the language can delve into a library implementation, whereas the compiler code is a black-box. Therefore, it'll be much easier to debug and patch the library than the compiler, so there is incentive not to put things in the compiler unless you have a very good reason to.
It's hard to give an objective answer here, but I suspect the following.
The code using language quirks to find out this stuff has often already been written (Boost, etc).
The compiler does not have to be changed to implement this if it can be done with language quirks (which saves a lot of time in writing, compiling, debugging and testing).
It's basically a "don't fix what isn't broken" mentality.
Compiler help for type traits has always been a design goal. TR1 formally introduced type traits, and included a section that described acceptable incorrect results in some cases to enable writing the type traits in straight C++. When those type traits were added to C++11 (with some name changes that don't affect their implementation) the allowance for incorrect results was removed, effectively requiring compiler help to implement some of them. And even for those that can be implemented in straight C++, compiler writers prefer intrinsics to complicated templates so that you don't have to put a drip pan under your computer to catch the slag as the overworked compiler causes the computer to melt down.

Few questions about C++ compilers : GCC, MSVC, Clang, Comeau etc

I've few questions about C++ compilers
Are C++ compilers required to be one-pass compiler? Does the Standard talk about it anywhere?
In particular, is GCC one-pass compiler? If it is, then why does it generate the following error twice in this example (though the template argument is different in each error message)?
error: declaration of ‘adder<T> item’ shadows a parameter
error: declaration of ‘adder<char [21]> item’ shadows a parameter
A more general question
What are the advantages and disadvantages of one-pass compiler and multi-pass compiler?
Useful links:
A List of C/C++ compilers (wikipedia)
An incomplete list of C++ compilers (Bjarne Stroustrup's site)
The standard sets no requirements what so ever with regards to
how a compiler is implemented. But what do you mean by
"one-pass"? Most compilers today do only read the input file
once. They create an in memory representation (often in the
form of some sort of parse tree), and may make multiple passes
over that. And almost certainly make multiple passes over parts
of it. The compiler must make a "pass" over the internal
representation of a template each time it is instantiated, for
example; there's no way of avoiding that. G++ also makes
a "pass" over the template when it is defined, before any
instantiation, and reports some errors then. (The standard
committee expressedly designed templates to allow a maximum of
error detection at the point of definition. This is the
motivation behind the requirement for typename in certain
places, for example.) Even without templates, a compiler will
generally have to make two passes over a class definition if
there are functions defined in it.
With regards to the more general question, again, I think you'd
have to define exactly what you mean by "one-pass". I don't
know of any compiler today which reads the source file several
times, but almost all will visit some or all of the nodes in the
parse tree more than once. Is this one-pass or multi-pass? The
distinction was more significant in the past, when memory wasn't
sufficient to maintain much of the source code in an internal
representation. Languages like Pascal and, to a lesser degree
C, were sometimes designed to be easy to implement with a single
pass compiler, since a single pass compiler would be
significantly faster. Today, this issue is largely irrelevant,
and modern languages, including C++, tend to ignore it; where
C++ seems to conform to the needs of a one-pass compiler, it's
largely for reasons of C compatibility, and where
C compatibility is not an issue (e.g. in a class definition), it
often makes order of declaration irrelevant.
From what I know, 30 years ago it was important for a compiler to be one-pass, because reads and writes to disk (or magnetic tape) were very slow and there was not enough memory to hold whole code (thanks James Kanze). Also, a single-pass is a requirement for scripting/interactive languages.
Nowdays compilers are usually not one-pass, there are several intermediate representations (e.g Abstract Syntax Tree or Static Single Assignment Form) that the code is transformed into and then analised/optimised.
Some elements in C++ cannot be solved without some intermediate steps, e.g. in a class you can reference members which are defined only later in the class body. Also, all templates need to be somehow remembered for further access during instantiation.
What does not happen usually, is that the source code is not parsed several times --- there is no need for that. So you should not experience same syntactic error being reported several times.
No, I would be surprised if you found a heavily used C++ single pass compiler.
No, it does multiple passes and even different optimizations based on the flags you pass it.
Advantages (single-pass): fast! Since all the source only needs to be examined once the compilation phase (and thus beginning of execution) can happen very quickly. It is also a model that is attractive because it makes the compiler easy to understand and often times "easier" to implement. (I worked on a single pass Pascal compiler once, but don't encounter them often, whereas single pass interpreters are common)
Disadvantages (sinlge-pass): Optimization, semantic/syntactic analysis. Sometimes a single code look lets things through that are easily caught by simple mechanisms in multiple passes. (kind of why we have things like JSLint)
Advantages (multi-pass): optimizations, semantic/syntactic analysis. Even pseudo interpreted languages like "JRuby" go through a pipeline compilation process to get to java/jvm bytecode before execution, you could consider this multi-pass and the multiple looks at the varying representations (and consequently the resulting optimizations) of code can make it very fast.
Disadvantages (multi-pass): complexity, sometimes time (depending on if AOT/JIT is being used as your compilation method)
Also, single-pass is pretty common in academia to help learn the aspects of compiler design.
Walter Bright, the developer of the first C++ compiler, has stated that he believes it is not possible to compile C++ without at least 3 passes. And, yes, that means 3 full text-transforming passes over the source, not just traversals through an internal tree representation. See his Dr. Dobb's magazine article, "Why is C++ compilation so slow?" So any hope of finding a true one-pass compiler seems doomed. (I think this was part of the motivation Bright had to develop D, his C++ alternative.)
The compiler only needs to look at the sources once top down, but that does not mean that it does not have to process the parsed contents more than once. In particular with templates, it has to instantiate the templated code with the type, and that cannot happen until the template is used (or explicitly instantiated by the user), which is the reason for your duplicate errors:
When the template is defined, the compiler detects an error and at that point the type has not been substituted. When the actual instantiation occurs it substitutes the template arguments and processes the result, which is what triggers the second error. Note that if the template was specialized after the first definition, and before the instantiation, for that particular type, the second error need not occur.

Why STL implementation is so unreadable? How C++ could have been improved here?

For instance why does most members in STL implementation have _M_ or _ or __ prefix?
Why there is so much boilerplate code ?
What features C++ is lacking that would allow make vector (for instance) implementation clear and more concise?
Implementations use names starting with an underscore followed by an uppercase letter or two underscores to avoid conflicts with user-defined macros. Such names are reserved in C++.
For example, one could define a macro called Type and then #include <vector>. If vector implementations used Type as a template parameter name, it would break.
However, one is not allowed to define macros called _Type (or __type, type__ etc.). Therefore, vector can safely use such names.
Lots of STL implementations also include checking for debug builds, such as verifying that two iterators are from the same container when comparing them, and watching for iterators going out of bounds. This involves fairly complex code to track the container and validity of every iterator created, but is invaluable for finding bugs. This code is also all interwoven with the standard release code with #ifdefs - even in the STL algorithms. So it's never going to be as clear as their most basic operation. Sites like this one show the most basic functionality of STL algorithms, stating their functionality is "equivalent to" the code they show. You won't see that in your header files though.
In addition to the good reasons robson and AshleysBrain have already given, one reason that C++ standard library implementations have such terse names and compact code is that virtually every C++ program (compilation unit, really) includes a large number of the standard library headers, and they are thus repeatedly recompiled (remember that they're largely inlined and template-based, whereas the C standard library headers only contain a handful of function declarations). A standard library written to "industry standard" style guidelines would take longer to compile and thus lead to the perception that a particular compiler was "slow". By minimizing whitespace and using short identifier names, the lexer and parser have less work to do, and the whole compilation process completes a little bit faster.
Another reason worth mentioning is that many standard library implementations (e.g. Dinkumware, Rogue Wave (old), etc.) can be used with several different compilers with widely different standards compliance and quirks. There's frequently a lot of macro hackery aimed at satisfying each supported platform.