Will a compiler remove effectless references? [duplicate]

This question already has answers here:
What exactly is the "as-if" rule?
(3 answers)
Closed 3 years ago.
Situation is that I'd like to use descriptive variable names as member variables so they are well understandable in headers (for example: min_distance_to_polygon). Yet in complex algorithms I'd find it smoother to have much shorter variable names because the context is clear (for example min_dist in that case).
So in the definition of a method I'd just write:
int & min_dist = min_distance_to_polygon;
Does this cause an overhead after compilation and would this be acceptable coding style?
EDIT: Would this be better (as it prevents a possible copy)?
int & min_dist{min_distance_to_polygon};

Does this cause an overhead after compilation
Not with an optimizing compiler, no. This is bread-and-butter optimization for a compiler. In fact, even copying the value would likely not cause any overhead (assuming that it remains unchanged) due to the compiler's value tracking and/or how CPU registers actually work behind the scenes (see Register Renaming).
and would this be acceptable coding style?
That's opinion-based and debatable. I posit that there exists code where this is a reasonable choice, but that such code is rare. In the end it's up to you to judge whether future readers will find either version easier to read and comprehend.
Would this be better (as it prevents a possible copy)?
The two code snippets you show are exactly identical in their semantics - both are initialization. No operator= is invoked (even conceptually) in X x = y;.

Will a compiler remove effectless references?
It depends. It may, if it can. The language doesn't mandate optimisation; it is permitted, not required.
Does this cause an overhead after compilation and would this be acceptable coding style?
Overhead would be highly unlikely in this case, assuming the compiler optimises the program.
In general, you can verify that a reference is optimised away by comparing the generated assembly with and without the reference. If the generated assembly is identical, then there is no overhead.
More generally, you can verify whether any change has significant overhead by measuring.
Would this be better
int & min_dist{min_distance_to_polygon};
It would be effectively identical.

Are these allowed optimizations in C++? [duplicate]

This question already has answers here:
Why don't compilers merge redundant std::atomic writes?
(9 answers)
Can atomic loads be merged in the C++ memory model?
(2 answers)
Closed 2 years ago.
Let std::atomic<std::int64_t> num{0}; be defined somewhere accessible/visible in the code.
Is the C++ compiler allowed to replace each of the following two codes with an empty code (something that does nothing)? Similarly, are these optimizations allowed to happen at runtime?
I am just trying to get a better understanding of how things work.
num.fetch_add(1,std::memory_order_relaxed);
num.fetch_sub(1,std::memory_order_relaxed);
and
num.fetch_add(1,std::memory_order_relaxed);
std::this_thread::yield();
num.fetch_sub(1,std::memory_order_relaxed);
In theory, yes, and even yield does not help.
In practice, no: compilers do not perform this optimization today, but they might in the future.
See:
N4455 No Sane Compiler Would Optimize Atomics
P0062R1: When should compilers optimize atomics?
"Runtime optimization" may happen if the modifications coalesce. I don't know whether this situation can occur in practice, but in any case it is not really distinguishable from "no other thread managed to observe the modified value before it changed back".
In fact, the effect of the optimization is equivalent to exactly that, whether it is a compiler optimization or a runtime one. Yield is not guaranteed to help, at least because it merely "gives an opportunity to reschedule", which an implementation may decide to ignore. And in theory there is no synchronization between yield and an atomic operation.
On the other hand what do you expect to achieve with this?

Does using the "this" keyword in C++ have any impact on performance? [duplicate]

This question already has answers here:
Is there any overhead using this-> for accessing a member?
(5 answers)
Closed 7 years ago.
For quite a long time I've been using JavaScript, where the this keyword is mandatory. Now I'm programming in C++, but the habit of using this has remained. The real question is: does using the this keyword have any negative impact on performance (such as an unnecessary memory access)? I mean, is code that omits this more optimization-friendly for the compiler, or does it not matter at all? Strictly theoretically, referring to this is a kind of pointer dereference, like opcode $reg0, [$reg1] in assembler, which could add one more memory reference to the code; but I guess the compiler handles it more cleverly than a typical pointer. Am I right?
Personally I prefer using this, because I feel a bit lost in code that doesn't use it, as I don't know whether variables are members, locals, or globals. But if it caused performance issues, I could probably force myself to avoid it.
No. An optimizing compiler (for C++11 or later) will very probably generate the same binary code for this->field and field, and likewise for this->memberfun(2,3) and memberfun(2,3).
(Probably the same, inefficient, code would be produced even without optimization, but I am not sure of that.)
Sometimes (notably when writing templates) this is required, because omitting it changes the meaning: unqualified names are not looked up in dependent base classes, so you must write this->member to refer to a member inherited from a template base.
Of course, the compilation time might be slightly different. But you should not care.
Some coding conventions require, or (more often) on the contrary forbid, using this for readability reasons. Choose whatever convention you like, but be consistent.
See also this answer explaining how this is handled in practice in the Linux x86-64 ABI.
But the real question is - Does using this keyword have any negative impact on performance (as in unnecessary memory access)?
No.
which could add one more memory reference in code but i guess it should be handled by compiler in more clever way than just typical pointer, am i right?
Yes :-)

When should you not use [[carries_dependency]]?

I've found questions (like this one) asking what [[carries_dependency]] does, and that's not what I'm asking here.
I want to know when you shouldn't use it, because the answers I've read all make it sound like you can plaster this code everywhere and magically you'd get equal or faster code. One comment said the code can be equal or slower, but the poster didn't elaborate.
I imagine appropriate places to use this are any function return or parameter that is a pointer or reference and that will be passed or returned within the calling thread, and that it shouldn't be used on callbacks or thread entry points.
Can someone comment on my understanding and elaborate on the subject in general, of when and when not to use it?
EDIT: I know there's this tome on the subject, should any other reader be interested; it may contain my answer, but I haven't had the chance to read through it yet.
In modern C++ you should generally not use std::memory_order_consume or [[carries_dependency]] at all. They're essentially deprecated while the committee comes up with a better mechanism that compilers can practically implement.
And that hopefully doesn't require sprinkling [[carries_dependency]] and kill_dependency all over the place.
2016-06 P0371R1: Temporarily discourage memory_order_consume
It is widely accepted that the current definition of memory_order_consume in the standard is not useful. All current compilers essentially map it to memory_order_acquire. The difficulties appear to stem both from the high implementation complexity, from the fact that the current definition uses a fairly general definition of "dependency", thus requiring frequent and inconvenient use of the kill_dependency call, and from the frequent need for [[carries_dependency]] annotations. Details can be found in e.g. P0098R0.
Notably, in C++ x - x still carries a dependency, but most compilers would naturally break the dependency and replace that expression with the constant 0. Compilers also sometimes turn data dependencies into control dependencies when they can prove something about value ranges after a branch.
On modern compilers that just promote mo_consume to mo_acquire, fully aggressive optimizations can always happen; there's never anything to gain from [[carries_dependency]] and kill_dependency even in code that uses mo_consume, let alone in other code.
This strengthening to mo_acquire has potentially-significant performance cost (an extra barrier) for real use-cases like RCU on weakly-ordered ISAs like POWER and ARM. See this video of Paul E. McKenney's CppCon 2015 talk C++ Atomics: The Sad Story of memory_order_consume. (Link includes a summary).
If you want real dependency-ordering read-only performance, you have to "roll your own", e.g. by using mo_relaxed and checking the asm to verify it compiled to asm with a dependency. (Avoid doing anything "weird" with such a value, like passing it across functions.) DEC Alpha is basically dead and all other ISAs provide dependency ordering in asm without barriers, as long as the asm itself has a data dependency.
If you don't want to roll your own and live dangerously, it might not hurt to keep using mo_consume in "simple" use-cases where it should be able to work; perhaps some future mo_consume implementation will have the same name and work in a way that's compatible with C++11.
There is ongoing work on making a new consume, e.g. 2018's http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html
because the answers I've read all make it sound like you can plaster
this code everywhere and magically you'd get equal or faster code
The only way you can get faster code is when that annotation allows the omission of a fence.
So the only case where it could possibly be useful is:
your program uses consume ordering on an atomic load operation, in important, frequently executed code;
the "consume value" isn't just used immediately and locally, but also passed to other functions;
the target CPU gives specific guarantees for consuming operations (as strong as a given fence before that operation, just for that operation);
the compiler writers take their job seriously: they manage to translate high level language consuming of a value to CPU level consuming, to get the benefit from CPU guarantees.
That's a bunch of necessary conditions to possibly get measurably faster code.
(And the latest trend in the C++ community is to give up on inventing a proper compilation scheme that is safe in all cases, and instead to come up with a completely different way for the user to instruct the compiler to produce code that "consumes" values, with much more explicit, naively translatable C++ code.)
One comment said the code can be equal or slower, but the poster
didn't elaborate.
Of course, annotations that you can freely scatter over a program cannot make code more efficient in general! That would be too easy, and also self-contradictory.
Either an annotation specifies a constraint on your code, that is, a promise to the compiler, and you can't put it anywhere it doesn't correspond to a guarantee in the code (like noexcept in C++, or restrict in C), or it breaks the code in various ways: an exception in a noexcept function stops the program, and aliasing of restricted pointers can cause funny miscompilation and bad behavior (formerly, the behavior was simply undefined in that case). The compiler can then use the annotation to optimize the code in specific ways.
Or the annotation doesn't constrain the code in any way, in which case the compiler can't count on anything, and the annotation creates no additional optimization opportunity.
If an annotation gets you more efficient code in some cases at no risk of breaking the program, then you must potentially get less efficient code in other cases. That's true in general, and specifically true of consume semantics, which impose the previously described constraints on the translation of C++ constructs.
I imagine appropriate places to use this is on any function return or
parameter that is a pointer or reference and that will be passed or
returned within the calling thread
No, the one and only case where it might be useful is when the intended calling function will probably use consume memory order.

Is it worth writing part of code in C instead of C++ as micro-optimization?

I am wondering if it is still worth with modern compilers and their optimizations to write some critical code in C instead of C++ to make it faster.
I know C++ might lead to bad performance in case classes are copied while they could be passed by reference or when classes are created automatically by the compiler, typically with overloaded operators and many other similar cases; but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
I'm going to agree with a lot of the comments. C syntax is intentionally supported in C++ (with divergence only in C99). Therefore every C++ compiler has to support it. In fact, it's hard to find a dedicated C compiler anymore: in GCC, for example, you end up using the same optimization/compilation engine regardless of whether the code is C or C++.
The real question, then, is whether writing plain C code and compiling it as C++ incurs a performance penalty. The answer is, for all intents and purposes, no. There are a few tricky points around exceptions and RTTI, but those are mainly size changes, not speed changes. You'd be so hard-pressed to find an example that actually takes a performance hit that it doesn't seem worth it to write a dedicated C module.
What was said about which features you use is important. It is very easy in C++ to get sloppy about copy semantics and suffer huge overheads from copying memory. In my experience this is the biggest cost; in C you can also suffer it, but not as easily, I'd say.
Virtual function calls are ever so slightly more expensive than normal function calls. At the same time, forced inline functions are cheaper than normal function calls. In both cases it is likely the cost of pushing/popping parameters on the stack that dominates. Worrying about function-call overhead, though, should come quite late in the optimization process, as it is rarely a significant problem.
Exceptions are costly at throw time (in GCC at least). But setting up catch statements and using RAII doesn't have a significant cost associated with it. This was by design in the GCC compiler (and others) so that truly only the exceptional cases are costly.
But to summarize: a good C++ programmer would not be able to make their code run faster simply by writing it in C.
Measure! Measure before thinking about optimizing, measure before applying an optimization, measure after applying it. Measure!
If you must make your code run 1 nanosecond faster (because it's going to be used by 1000 people, 1000 times in each of the next 1000 days, and that time is very important), anything goes.
Yes! It is worth ...
changing languages (C++ to C; Python to COBOL; Matlab to Fortran; PHP to Lisp)
tweaking the compiler (enabling/disabling the -f options)
using different libraries (even writing your own)
etc., etc.
What you must not forget is to measure!
pmg nailed it: just measure instead of relying on global assumptions. Also think of it this way: compilers like GCC separate the front, middle, and back end, so the front ends (Fortran, C, C++, Ada, etc.) all feed into the same internal middle language, if you will, and that is where most of the optimization happens. That generic middle language is then turned into assembler for the specific target, and target-specific optimizations occur there. So a language may or may not induce more code between the front end and the middle end when the languages differ greatly, but for C versus C++ I would assume it is the same or very similar.
Binary size is another story: the libraries that may get pulled into the binary for C-only code versus C++ (even if it is only C syntax) can and will vary. That doesn't necessarily affect execution performance, but it can bulk up the program file, costing storage and transfer, as well as memory if the program is loaded as a whole into RAM. Here again, just measure.
I would also add to the "measure" comment: compile to assembler and/or disassemble the output, and compare the results of your different language/compiler choices. This can supplement the timing differences you see when you measure.
The question has been answered to death, so I won't add to that.
Simply as a generic answer: assuming you have measured, etc., and have identified that a certain C++ (or other) code segment is not running at optimal speed (which generally means you have not used the right tool for the job), and you know you can get better performance by writing it in C, then yes, definitely, it is worth it.
There is a common mindset of trying to do everything with one tool (Java or SQL or C++). It is not just Maslow's Hammer, but the actual belief that a C construct can be coded in Java, and so on. This leads to all kinds of performance problems. Architecture, as a true profession, is about placing code segments in the appropriate architectural location or platform. It is the correct combination of Java, SQL, and C that will deliver performance. That produces an app that does not need to be revisited; uneventful execution. In which case, it will not matter if or when C++ implements this construct or that.
I am wondering if it is still worth with modern compilers and their optimizations to write some critical code in C instead of C++ to make it faster.
No. Keep it readable. If your team prefers C++ or C, prefer that, especially if it is already functioning in production code (don't rewrite it without very good reasons).
I know C++ might lead to bad performance in case classes are copied while they could be passed by reference
Then forbid copying and assignment.
or when classes are created automatically by the compiler, typically with overloaded operators and many other similar cases
Could you elaborate? If you are referring to templates, they don't have additional runtime cost (although they can lead to additional exported symbols, resulting in a larger binary). In fact, using a template method can improve performance if (for example) a conversion would otherwise be necessary.
but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
In my experience, an expert C++ developer can create a faster, more maintainable program.
You have to be selective about the language features you use (and do not use). If you restrict C++ features to the set available in C (e.g., removing exceptions, virtual function calls, RTTI), then you're off to a good start. If you learn to use templates, metaprogramming, and optimization techniques, and avoid type aliasing (which becomes increasingly difficult or verbose in C), etc., then you should be on par with or faster than C, with a program that is more easily maintained (since you are familiar with C++).
If you're comfortable using the features of C++, use C++. It has plenty of features (many of which were added with speed/cost in mind), and it can be written to be as fast as C (or faster).
With templates and metaprogramming, you can turn many runtime variables into compile-time constants for exceptional gains. Sometimes that goes well into micro-optimization territory.

Complicated code for obvious operations

Sometimes, mainly for optimization purposes, very simple operations are implemented as complicated and clumsy code.
One example is this integer initialization function:
void assign( int* arg )
{
    __asm__ __volatile__ ( "mov %%eax, %0" : "=m" (*arg));
}
Then:
int a;
assign ( &a );
But I actually don't understand why it is written this way...
Have you seen any example with real reasons to do so?
In the case of your example, I think it is a result of the fallacious assumption that writing code in assembly is automatically faster.
The problem is that the person who wrote this didn't understand WHY assembly can sometimes run faster. That is, you know more than the compiler what you are trying to do and can sometimes use this knowledge to write code at a lower level that is more performant based on not having to make assumptions that the compiler will.
In the case of a simple variable assignment, I seriously doubt that holds true and the code is likely to perform slower because it has the additional overhead of managing the assign function on the stack. Mind you, it won't be noticeably slower, the main cost here is code that is less readable and maintainable.
This is a textbook example of why you shouldn't implement optimizations without understanding WHY it is an optimization.
It seems the intent of the assembly code was to ensure that the assignment to the *arg int location is performed every time, deliberately preventing any compiler optimization in this regard.
Usually the volatile keyword is used in C++ (and C...) to tell the compiler that a value should not be kept in a register (for instance) and reused from that register (an optimization to get the value faster), since it can be changed asynchronously (by an external module, an assembly program, an interrupt, etc.).
For instance, in a function:
    int a = 36;
    g(a);
    a = 21;
    f(a);
In this case the compiler knows that the variable a is local to the function and is not modified outside it (no pointer to a is passed to any call, for instance). It may therefore use a processor register to store and work with a.
In conclusion, that asm instruction seems to have been injected into the C++ code in order to prevent certain optimizations on that variable.
While there are several reasonable justifications for writing something in assembly, in my experience those are rarely the actual reason. Where I've been able to study the rationale, it boils down to:
Age: The code was written so long ago that it was the most reasonable option for dealing with compilers of the era. Typically, anything before about 1990 can be justified, IMHO.
Control freak: Some programmers have trust issues with the compiler, but aren't inclined to investigate its actual behavior.
Misunderstanding: A surprisingly widespread and persistent myth is that anything written in assembly language is inherently more efficient than what a "clumsy" compiler produces, with all its mysterious function entry/exit code, etc. Certainly a few compilers deserved this reputation.
To be "cool": When time and money are not factors, what better way to strut a programmer's significantly elevated hormone levels than some macho, preferably inscrutable, assembly language?
The example you give seems flawed in that the assign() function is liable to be slower than directly assigning the variable: calling a function with arguments involves stack usage, whereas a plain int a = x is liable to compile to efficient code without touching the stack.
The only times I have benefited from using assembler were when hand-optimising the assembler output produced by the compiler, and that was in the days when processor speeds were often in the single-megahertz range. Algorithmic optimisation tends to give a better return on investment, as you can gain orders of magnitude of improvement rather than small multiples. As others have already said, the only other time you go to assembler is when the compiler or language doesn't do something you need done. With C and C++ this is very rarely the case any more.
It could well be someone showing off that they know how to write some trivial assembler code, making the next programmer's job more difficult, and possibly as a half-baked measure to protect their own job. For the example given, the code is confusing, possibly slower than native C, less portable, and should probably be removed. Certainly, if I saw any inline assembler in modern C code, I'd expect copious comments explaining why it is absolutely necessary.
Let compilers optimize for you. There's no possible way this kind of "optimization" will ever help anything... ever!