Consider the following code snippet (of course this piece of code is not useful at all; I've simplified it just to demonstrate my question):
#include <array>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::array<const char*, 5> params_name
{
    "first_param",
    "second_param",
    "third_param",
    "fourth_param",
    "fifth_param"
};

int main()
{
    std::vector<std::string> a_vector;
    for (std::size_t i = 0; i < params_name.size(); ++i) {
        a_vector.push_back(params_name[i]);
    }
}
I would like to be sure I understand what happens to the for loop during compilation. Is the loop unrolled so that it becomes:
a_vector.push_back("first_param")
a_vector.push_back("second_param")
a_vector.push_back("third_param")
a_vector.push_back("fourth_param")
a_vector.push_back("fifth_param")
If that is the case, is the behaviour identical regardless of the number of elements contained in the params_name array? If so, I'm wondering whether it would be better just to store those values in a regular array built at run time, to avoid code expansion.
Thanks in advance for your help.
One problem with your code is that, before C++14/C++17, std::array isn't fully constexpr-enabled (its operator[] only became constexpr in C++14 for const access, and in C++17 for non-const access). You can work around this by simply using a regular array such as
constexpr char const * const my_array[5] = { /* ... */ };
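For illustration, here is the original snippet with that workaround applied (a minimal sketch, assuming C++11; the range-for is just one convenient way to iterate):

#include <string>
#include <vector>

constexpr char const* const params_name[] =
{
    "first_param",
    "second_param",
    "third_param",
    "fourth_param",
    "fifth_param"
};

int main()
{
    std::vector<std::string> a_vector;
    // Range-for iterates the array without any index bookkeeping.
    for (char const* name : params_name)
        a_vector.push_back(name);
}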
As for your question:
All constexpr really means is "this value is known at compile time".
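For example (a trivial sketch):

constexpr int answer = 6 * 7;   // evaluated during compilation
static_assert(answer == 42, "the value is known at compile time");
int some_array[answer];         // usable anywhere a constant expression is required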
Is the loop unrolled so that it becomes...?
I don't know. It depends on your compiler, architecture, standard library implementation, and optimization settings. I wouldn't think about this too much. You can be confident that at reasonable optimization levels (-O1 and -O2) your compiler will weigh the benefits and drawbacks of doing this vs not, and pick a good option.
If that is the case, is the behaviour identical regardless of the number of elements contained in the params_name array?
Yes! It doesn't matter whether the compiler unrolls the loop. When your code runs, it will appear to behave exactly like what you wrote. This is called the "as-if" rule, meaning that no matter what optimizations the compiler does, the resulting program must behave "as-if" it does what you wrote (assuming your code doesn't invoke undefined behavior).
would it be better just to store those values in a regular array built at run time to avoid code expansion?
From where would these values come if you did? From standard input? From a file? If yes, then the compiler can't know what they will be or how many there will be, so it has little choice but to make a runtime loop. If no, then even if the array is not constexpr, the compiler is likely smart enough to figure out what you mean and optimize the program to be the same as with a constexpr array.
To summarize: don't worry about things like loop unrolling or code duplication. Modern compilers are pretty smart and will usually generate the right code for your situation. The amount of extra memory spent on loop unrolling like this is usually more than offset by the performance improvements. Unless you're on an embedded system where every byte matters, just don't worry about it.
I understand that at runtime, const char text[] = "some char array" is the same as const char text[16] = "some char array".
Is there any difference in compile time? I reckon there would be, as telling the compiler how many elements there are in the array ought to be faster than it counting the number of elements. I understand that, even if it makes a difference, it will be inconceivably small, but curiosity got the better of me, so I thought I'd ask.
You should favour readable code that reduces cognitive load over code that might compile faster. Your assumption that something compiles faster could as well be the other way round (see below). You simply do not know the compiler implementation and in general, the difference in compile speeds is negligible.
In the case of a constant string (that you never reassign a value to), I would omit the length, as it adds clutter and the compiler is perfectly able to determine the length it needs.
You can also reason that adding the number is slower. After all, the compiler needs to parse the string and thus knows its length anyway. Adding the 16 in the declaration forces the compiler to also parse a number and check whether the string ain't too long. That might make compilation slower. Who knows? But again: the difference is likely negligible, compared to all the other wonders that compilers do (and quite efficiently). So don't worry about it.
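For what it's worth, both forms declare an identically sized object; 15 characters plus the terminating null make 16 elements, and a static_assert can confirm it:

const char text_deduced[] = "some char array";     // length deduced by the compiler
const char text_explicit[16] = "some char array";  // length spelled out by hand

static_assert(sizeof(text_deduced) == sizeof(text_explicit),
              "both declarations produce the same array size");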
I know that templatized types such as the one below cost nothing in compiled binary size:
template<auto value>
struct ValueHolder{};
I'm making a program that will use a LOT of such wrapped types, and I don't think I want to be using integral_constants for that reason, since they have a ::value member. I can get away with something more like:
template<typename ValHolder>
struct Deducer;
template<auto value>
struct Deducer<ValueHolder<value>> {
using Output = ValueHolder<value+1>;
};
But it's definitely a bit more work, so I want to make sure I'm not doing it for nothing. Note that we're talking TONS of such values (I'd explain, but I don't want to go on too far a tangent; I'd probably get more comments about "should I do the project" than the question!).
So the question is: Do [static] constexpr values take any size at all in the compiled binary, or are the values substituted at compile-time, as if they were typed-in literally? I'm pretty sure they DO take size in the binary, but I'm not positive.
I did a little test at godbolt to look at the assembly of a constexpr vs non-constexpr array side-by-side, and everything looked pretty similar to me: https://godbolt.org/z/9hecfq
#include <cstddef>
#include <iostream>
using namespace std;

int main()
{
    // Non-constexpr large array
    size_t arr[0xFFFF] = {};
    // Constexpr large array
    constexpr size_t cArr[0xFFF] = {};
    // Just to avoid unused variable optimizations / warnings
    cout << arr[0] << cArr[0] << endl;
    return 0;
}
This depends entirely on:
How much the compiler feels like optimizing the variable away.
How you use the variable.
Consider the code you posted. You created a constexpr array. As this is an array, it is 100% legal to index it with a runtime value. This would require the compiler to emit code that accesses the array at that index, which would require that array to actually exist in memory. So if you use it in such a way, it must have storage.
However, since your code only indexes this array with a constant expression index, a compiler that wants to think a bit more than -O0 would allow would realize that it knows the value of all of the elements in that array. So it knows exactly what cArr[0] is. And that means the compiler can just convert that expression into the proper value and just ignore that cArr exists.
Such a compiler could do the same with arr, BTW; it doesn't have to be a constant expression for the compiler to detect a no-op.
Also, note that since both arrays are non-static, neither will take up storage "in the compiled binary". If runtime storage for them is needed, it will be stack space, not executable space.
Broadly speaking, a constexpr variable will take up storage at any reasonable optimization level if you do something that requires it to take up storage. This could be something as innocuous as passing it to a (un-inlined) function that takes the parameter by const&.
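A minimal sketch of that last point (the names table, by_ref and use are illustrative, not from the question):

#include <cstddef>

constexpr std::size_t table[4] = {1, 2, 3, 5};

// A const& parameter means the argument needs an address.
std::size_t by_ref(const std::size_t& x) { return x; }

std::size_t use(std::size_t i)
{
    // Runtime index: 'table' must actually exist in memory here,
    // unless the compiler inlines everything and folds the call away.
    return by_ref(table[i]);
}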
Ask your linker :) There is nothing anywhere in the C++ standard that has any bearing on the answer. So you absolutely, positively, must build your code in release mode, and check if in the particular use scenario it does increase the size or not.
Any general results you obtain on other platforms, different compilers (1), other compile options, other modules added/removed to your project, or even any changes to the code, will not have much relevance.
You have a specific question that depends on so many factors that general answers are IMHO useless.
But moreover, if you actually care about the binary size, then it should be already in your test/benchmark suite, you should have integration builds fail when things grow when they shouldn’t, etc. No measurement and no automation are prima facie evidence that you don’t actually care.
So, since you presumably do care about the binary size, just write the code you had in mind and look in your CI dashboard at the binary size metric. Oh, you don’t have it? Well, that’s the first thing to get done before you go any further. I’m serious.
(1): Same compiler = same binary. I’m crazy, you say? No. It bit me once too many. If the compiler binary is different (other than time stamps), it’s not the same compiler, end of story.
I recently ran into a situation where I wrote the following code:
for(int i = 0; i < (size - 1); i++)
{
// do whatever
}
// Assume 'size' will be constant during the duration of the for loop
When looking at this code, it made me wonder how exactly the for loop condition is evaluated on each iteration. Specifically, I'm curious as to whether or not the compiler would 'optimize away' any additional arithmetic that has to be done on each iteration. In my case, would this code get compiled such that (size - 1) would have to be evaluated for every loop iteration? Or is the compiler smart enough to realize that the 'size' variable won't change, so it could precalculate the result once before the loop?
This then got me thinking about the general case where you have a conditional statement that may specify more operations than necessary.
As an example, how would the following two pieces of code compile:
if(6)
if(1+1+1+1+1+1)
int foo = 1;
if(foo + foo + foo + foo + foo + foo)
How smart is the compiler? Will the 3 cases listed above be converted into the same machine code?
And while I'm at it, why not list another example. What does the compiler do if you are doing an operation within a conditional that won't have any effect on the end result? Example:
if(2*(val))
// Assume val is an int that can take on any value
In this example, the multiplication is completely unnecessary. While this case seems a lot stupider than my original case, the question still stands: will the compiler be able to remove this unnecessary multiplication?
Question:
How much optimization is involved with conditional statements?
Does it vary based on compiler?
Short answer: the compiler is exceptionally clever, and will generally optimise those cases that you have presented (including utterly ignoring irrelevant conditions).
One of the biggest hurdles language newcomers face in terms of truly understanding C++, is that there is not a one-to-one relationship between their code and what the computer executes. The entire purpose of the language is to create an abstraction. You are defining the program's semantics, but the computer has no responsibility to actually follow your C++ code line by line; indeed, if it did so, it would be abhorrently slow as compared to the speed we can expect from modern computers.
Generally speaking, unless you have a reason to micro-optimise (game developers come to mind), it is best to almost completely ignore this facet of programming, and trust your compiler. Write a program that takes the inputs you want, and gives the outputs you want, after performing the calculations you want… and let your compiler do the hard work of figuring out how the physical machine is going to make all that happen.
Are there exceptions? Certainly. Sometimes your requirements are so specific that you do know better than the compiler, and you end up optimising. You generally do this after profiling and determining what your bottlenecks are. And there's also no excuse to write deliberately silly code. After all, if you go out of your way to ask your program to copy a 50MB vector, then it's going to copy a 50MB vector.
But, assuming sensible code that means what it looks like, you really shouldn't spend too much time worrying about this. Modern compilers are so good at optimising that you'd be a fool to try to keep up.
The C++ language specification permits the compiler to make any optimization that results in no observable changes to the expected results.
If the compiler can determine that size is constant and will not change during execution, it can certainly make that particular optimization.
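The transformation is equivalent to hoisting the invariant by hand, something like this (a sketch; 'work' is an illustrative name, and no particular compiler is guaranteed to emit exactly this):

void work(int size)
{
    const int limit = size - 1;  // the invariant, computed once before the loop
    for (int i = 0; i < limit; i++)
    {
        // do whatever
    }
}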
Alternatively, if the compiler can also determine that i is not used in the loop (and its value is not used afterwards), i.e. that it is used only as a counter, it might very well rewrite the loop to:
for(int i = 1; i < size; i++)
because that might produce smaller code. Even if this i is used in some fashion, the compiler can still make this change and then adjust all other usage of i so that the observable results are still the same.
To summarize: anything goes. The compiler may or may not make any optimization change as long as the observable results are the same.
Yes, there is a lot of optimization, and it is very complex.
It varies based on the compiler, and it also varies based on the compiler options.
Check https://meta.stackexchange.com/questions/25840/can-we-stop-recommending-the-dragon-book-please for some book recommendations if you really want to understand what a compiler may do. It is a very complex subject.
You can also compile to assembly with the -S option (gcc / g++) to see what the compiler is really doing. Use -O0 / -O1 / -O2 / -O3 to experiment with different optimization levels.
This is a general efficiency question for C++. I am not familiar with the inner workings of compilers, so suppose I have several nested loops with an if statement inside, e.g.:
for(int i=0; ...)
{
for(int j=0; ...)
{
if( ... )
{
...
}
else
{
... (slightly different)
}
}
}
However, this if-statement is independent of the loops. Is there a significant speed difference if I instead define the if/else statement outside of the loops with the loops inside? E.g.:
if( ... )
{
for(int i=0; ...)
{
for(int j=0; ...)
{
...
}
}
}
else
{
for(int i=0; ...)
{
for(int j=0; ...)
{
... (slightly different)
}
}
}
If so, or if not, why is that? I have some notion that a compiler will recognize the same if statement being evaluated over and over, but this is quite unfamiliar territory to me.
I examined the response to this question:
Would compiler optimize conditional statement in loop by moving it ouside the loop?
and he discusses the different levels of optimization in gcc, and how -O3 (I think) would do that. But is anything done like this automatically? If not, how big of a cost is an if-statement like this inside of a loop?
The only real answer is maybe. If the condition is a loop invariant, then the transposition you suggest is legal, and if the compiler can recognize the loop invariance, then it can make the transposition. Whether it does or not depends on the compiler: g++ -O3 does, at least in 64-bit mode; cl /Ox /Os doesn't, at least in 32-bit mode; g++ also unrolls the two loops. In my tests, at least; I more or less guaranteed that the compiler could determine that the condition was a loop invariant by wrapping the loop in a function, with the condition a function argument of type bool const; depending on the condition, it may be more or less difficult for the compiler to prove loop invariance. And of course, the fact that the compiler has more registers to play with in 64-bit mode could also affect its optimizations.
Also: although I'd instinctively expect the g++ version to be faster, it is significantly larger; in some cases, this may negatively affect the various memory caches, resulting in the code actually running slower.
In the end, I'd write the first, always. If the profiler later shows it to be a bottleneck, there's no issue about going back and rewriting it along the lines of the second, then measuring to see if it makes a difference, one way or the other, and how much difference it makes. And be aware that the best results may depend on the compiler and the architecture you are targeting.
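A minimal sketch of the kind of test harness described above ('process' is an illustrative name): the condition arrives as a bool const parameter, so the compiler can easily prove it loop-invariant and unswitch the loops:

void process(int* data, int rows, int cols, bool const flag)
{
    for (int i = 0; i < rows; ++i)
    {
        for (int j = 0; j < cols; ++j)
        {
            if (flag)
                data[i * cols + j] += 1;   // one variant
            else
                data[i * cols + j] -= 1;   // slightly different variant
        }
    }
}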
It's impossible to say which is quicker.
Just remember the 80-20 rule (http://en.wikipedia.org/wiki/Pareto_principle#In_software) and find the code that actually matters by profiling.
Anyway, write readable and maintainable code in the first place. If you have performance problems, profile the code.
It depends on the case, but I would bet the second option is faster. This is because no branching happens inside the loops, and the compiler has a better chance of replacing a lot of code with some MMX/SSE group of instructions.
Also, at least theoretically, in the first case the CPU has to evaluate the same if() on every for() iteration. In the second case the if() is outside the loops, which should be faster. But again, modern compilers can often find this problem and solve it magically.
But yes, usually it is important to write code for readability, unless performance is a really big concern.
I have a code segment which is as simple as this:
for( int i = 0; i < n; ++i)
{
if( data[i] > c && data[i] < r )
{
--data[i];
}
}
It's part of a large function and project. This is actually a rewrite of a different loop, which proved to be time-consuming (long loops), but I was surprised by two things:
When data[i] was temporarily stored like this:
for( int i = 0; i < n; ++i)
{
const int tmp = data[i];
if( tmp > c && tmp < r )
{
--data[i];
}
}
It became much slower. I don't claim this should be faster, but I cannot understand why it should be so much slower; the compiler should be able to figure out whether tmp should be used or not.
But more importantly, when I moved the code segment into a separate function it became around four times slower. I wanted to understand what was going on, so I looked in the opt-report, and in both cases the loop is vectorized and seems to get the same optimization.
So my question is: what can make such a difference for a function which is not called a million times, but is time-consuming in itself? What should I look for in the opt-report?
I could avoid it by just keeping it inlined, but the why is bugging me.
UPDATE:
I should underline that my main concern is to understand why it became slower when moved to a separate function. The code example given with the tmp variable was just a strange case I encountered during the process.
You're probably register starved, and the compiler is having to load and store. I'm pretty sure that native x86 instructions can operate directly on memory addresses, i.e., the compiler can keep those registers free. But by making the value local, you may be changing the behaviour with respect to aliasing, and the compiler may not be able to prove that the faster version has the same semantics, especially if there is some form of multithreading in here that allows the data to change.
The function was likely slower when moved into a separate function because function calls not only can break the pipeline, but can also create poor instruction-cache performance (there's extra code for parameter push/pop/etc.).
Lesson: Let the compiler do the optimizing; it's smarter than you. I don't mean that as an insult, it's smarter than me too. But really, especially the Intel compiler, those guys know what they're doing when targeting their own platform.
Edit: More importantly, you need to recognize that compilers are targeted at optimizing unoptimized code. They're not targeted at recognizing half-optimized code. Specifically, the compiler will have a set of triggers for each optimization, and if you happen to write your code in such a way that they're not hit, you can avoid optimizations being performed even if the code is semantically identical.
And you also need to consider implementation cost. Not every function ideal for inlining can be inlined, simply because inlining that logic is too complex for the compiler to handle. I know that VC++ will rarely inline with loops, even if the inlining yields benefit. You may be seeing this in the Intel compiler: the compiler writers simply decided that it wasn't worth the time to implement.
I encountered this when dealing with loops in VC++: the compiler would produce different assembly for two loops in slightly different formats, even though they both achieved the same result. Of course, their Standard library used the ideal format. You may observe a speedup by using std::for_each and a function object.
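As a hedged sketch of that suggestion applied to the loop from the question (DecrementInRange and decrement_all are illustrative names):

#include <algorithm>

// Function object equivalent of the original loop body.
struct DecrementInRange
{
    int c, r;
    void operator()(int& x) const
    {
        if (x > c && x < r)
            --x;
    }
};

void decrement_all(int* data, int n, int c, int r)
{
    std::for_each(data, data + n, DecrementInRange{c, r});
}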
You're right that the compiler should be able to identify that as unused code and remove it (or not compile it). That doesn't mean it actually does identify and remove it.
Your best bet is to look at the generated assembly and check exactly what is going on. Remember, just because a clever compiler could figure out how to perform an optimization doesn't mean it actually does.
If you do check, and see that the code is not removed, you might want to report that to the Intel compiler team. It sounds like they might have a bug.