Difference between std::vector::back() and vector::operator[](vector::size() - 1) - c++

Is there a difference between:
mvt_act_idx = openCloseList.size()-1;
openCloseList[mvt_act_idx].A += a;
and
openCloseList.back().A += a;
Besides readability?

mvt_act_idx = openCloseList.size()-1;
openCloseList[mvt_act_idx].A += a;
If openCloseList is empty the unsigned subtraction will produce a huge value, which then is used to index the vector. The indexing operator may assert, or not.
openCloseList.back().A += a;
If openCloseList is empty the back operation may assert, or not.
In the case of such error, a fault in back is probably easier to understand.
And anyway, the first code snippet can be in conflict with some guideline to not use unsigned integers as numbers (except where the modulo behavior simplifies and clarifies the code), while the call to back can not be in conflict with any such guideline.
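To make the empty-vector case concrete, here is a minimal sketch. Entry, last_index and add_to_last are hypothetical names introduced only for illustration; Entry stands in for whatever element type openCloseList really holds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type standing in for the question's entries.
struct Entry {
    int A = 0;
};

// Index of the last element, computed the way the first snippet does.
// For an empty vector, size() is 0 and the unsigned subtraction wraps
// around to the largest value std::size_t can hold.
std::size_t last_index(const std::vector<Entry>& list) {
    return list.size() - 1;
}

// Adds `a` to the last element's A. back() requires a non-empty vector,
// so that precondition is made explicit with an assert here.
void add_to_last(std::vector<Entry>& list, int a) {
    assert(!list.empty() && "add_to_last needs a non-empty vector");
    list.back().A += a;
}
```

Calling last_index on an empty vector is itself harmless; the problem only appears when the wrapped value is used as an index. With back(), the failure (if the library checks at all) is reported at the call that actually has the violated precondition.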

No useful difference.
Using back() directly might be fractionally more efficient, but I doubt it. You could have a look at the generated assembler if you felt really keen. Might make the difference of an instruction or two, depending on how clever your compiler is.
I do note that the former way, which uses only size() and [], would be a lot more familiar to people who understand arrays or use similar constructs in any other language; back() is a bit more C++ specific (though it is hardly a cryptic idiom).

Related

Constexpr and array

Consider the following code snippet (of course this piece of code is not useful at all but I've simplified it just to demonstrate my question) :
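#include <array>
#include <string>
#include <vector>
// String literals are const, so the element type must be const char*.
constexpr std::array<const char*, 5> params_name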
{
"first_param",
"second_param",
"third_param",
"fourth_param",
"fifth_param"
};
int main()
{
std::vector<std::string> a_vector;
for (int i = 0; i < params_name.size(); ++i) {
a_vector.push_back(params_name[i]);
}
}
I would like to be sure I understand what happens to the for loop during compilation. Is the loop unrolled, so that it becomes:
a_vector.push_back("first_param");
a_vector.push_back("second_param");
a_vector.push_back("third_param");
a_vector.push_back("fourth_param");
a_vector.push_back("fifth_param");
If that is the case, is the behaviour identical regardless of the number of elements in the params_name array? If yes, I'm wondering whether it would be better to store those values in a regular array built at run time, to avoid code expansion.
Thanks in advance for your help.
One problem with your code is that at present std::array isn't constexpr-enabled. You can work around this by simply using a regular array such as
constexpr char const * const my_array[5] = { /* ... */ };
As for your question:
All constexpr really means is "this value is known at compile time".
Is the loop unrolled and becomes ?
I don't know. It depends on your compiler, architecture, standard library implementation, and optimization settings. I wouldn't think about this too much. You can be confident that at reasonable optimization levels (-O1 and -O2) your compiler will weigh the benefits and drawbacks of doing this vs not, and pick a good option.
If it's the case, is the behaviour identical regardless the number of elements contained in the params_name array ?
Yes! It doesn't matter whether the compiler unrolls the loop. When your code runs, it will appear to behave exactly like what you wrote. This is called the "as-if" rule, meaning that no matter what optimizations the compiler does, the resulting program must behave "as-if" it does what you wrote (assuming your code doesn't invoke undefined behavior).
could be more interesting just to store those values in a regular array built at run time to avoid code expansion?
From where would these values come if you did? From standard input? From a file? If yes, then the compiler can't know what they will be or how many there will be, so it has little choice but to make a runtime loop. If no, then even if the array is not constexpr, the compiler is likely smart enough to figure out what you mean and optimize the program to be the same as with a constexpr array.
To summarize: don't worry about things like loop unrolling or code duplication. Modern compilers are pretty smart and will usually generate the right code for your situation. The amount of extra memory spent on loop unrolling like this is usually more than offset by the performance improvements. Unless you're on an embedded system where every byte matters, just don't worry about it.
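As a sketch of the plain-array workaround mentioned above, the following keeps the names in a constexpr array and copies them into a vector with an ordinary loop. The names params_count and make_params are assumptions made for this example. Whether the loop gets unrolled is left to the compiler; under the as-if rule the result is the same either way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Compile-time table of names, written as a plain array.
constexpr const char* const params_name[] = {
    "first_param", "second_param", "third_param",
    "fourth_param", "fifth_param"
};

// Number of entries, derived from the array so the two cannot drift apart.
constexpr std::size_t params_count =
    sizeof(params_name) / sizeof(params_name[0]);

// Copies the names into a vector at run time. The loop is written once;
// the compiler decides whether unrolling it is worthwhile.
std::vector<std::string> make_params() {
    std::vector<std::string> result;
    result.reserve(params_count);
    for (std::size_t i = 0; i < params_count; ++i) {
        result.push_back(params_name[i]);
    }
    assert(result.size() == params_count);
    return result;
}
```

Compiling this with -S at different optimization levels is one way to see what a particular compiler actually does with the loop.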

How much do C/C++ compilers optimize conditional statements?

I recently ran into a situation where I wrote the following code:
for(int i = 0; i < (size - 1); i++)
{
// do whatever
}
// Assume 'size' will be constant during the duration of the for loop
When looking at this code, it made me wonder how exactly the for loop condition is evaluated for each loop. Specifically, I'm curious as to whether or not the compiler would 'optimize away' any additional arithmetic that has to be done for each loop. In my case, would this code get compiled such that (size - 1) would have to be evaluated for every loop iteration? Or is the compiler smart enough to realize that the 'size' variable won't change, so that it can calculate (size - 1) once instead of on every iteration?
This then got me thinking about the general case where you have a conditional statement that may specify more operations than necessary.
As an example, how would the following three pieces of code compile:
if(6)
if(1+1+1+1+1+1)
int foo = 1;
if(foo + foo + foo + foo + foo + foo)
How smart is the compiler? Will the 3 cases listed above be converted into the same machine code?
And while I'm at it, why not list another example? What does the compiler do if you are doing an operation within a conditional that won't have any effect on the end result? Example:
if(2*(val))
// Assume val is an int that can take on any value
In this example, the multiplication is completely unnecessary. While this case seems a lot stupider than my original case, the question still stands: will the compiler be able to remove this unnecessary multiplication?
Question:
How much optimization is involved with conditional statements?
Does it vary based on compiler?
Short answer: the compiler is exceptionally clever, and will generally optimise those cases that you have presented (including utterly ignoring irrelevant conditions).
One of the biggest hurdles language newcomers face in terms of truly understanding C++, is that there is not a one-to-one relationship between their code and what the computer executes. The entire purpose of the language is to create an abstraction. You are defining the program's semantics, but the computer has no responsibility to actually follow your C++ code line by line; indeed, if it did so, it would be abhorrently slow as compared to the speed we can expect from modern computers.
Generally speaking, unless you have a reason to micro-optimise (game developers come to mind), it is best to almost completely ignore this facet of programming, and trust your compiler. Write a program that takes the inputs you want, and gives the outputs you want, after performing the calculations you want… and let your compiler do the hard work of figuring out how the physical machine is going to make all that happen.
Are there exceptions? Certainly. Sometimes your requirements are so specific that you do know better than the compiler, and you end up optimising. You generally do this after profiling and determining what your bottlenecks are. And there's also no excuse to write deliberately silly code. After all, if you go out of your way to ask your program to copy a 50MB vector, then it's going to copy a 50MB vector.
But, assuming sensible code that means what it looks like, you really shouldn't spend too much time worrying about this, because modern compilers are so good at optimising that you'd be a fool to try to keep up.
The C++ language specification permits the compiler to make any optimization that results in no observable changes to the expected results.
If the compiler can determine that size is constant and will not change during execution, it can certainly make that particular optimization.
Alternatively, if the compiler can also determine that i is not used in the loop (and its value is not used afterwards), that it is used only as a counter, it might very well rewrite the loop to:
for(int i = 1; i < size; i++)
because that might produce smaller code. Even if this i is used in some fashion, the compiler can still make this change and then adjust all other usage of i so that the observable results are still the same.
To summarize: anything goes. The compiler may or may not make any optimization change as long as the observable results are the same.
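For illustration, here is what hoisting the loop bound would look like if done by hand. This is only a sketch of a transformation the compiler is allowed to make, not a claim about what any particular compiler emits; the function names are made up for the example.

```cpp
#include <cassert>

// Loop written as in the question: (size - 1) appears in the condition.
int count_with_expression(int size) {
    int iterations = 0;
    for (int i = 0; i < (size - 1); i++) {
        ++iterations;
    }
    return iterations;
}

// Hand-hoisted version: the bound is computed once before the loop.
int count_with_hoisted_bound(int size) {
    const int bound = size - 1;
    int iterations = 0;
    for (int i = 0; i < bound; i++) {
        ++iterations;
    }
    return iterations;
}

// Both versions have the same observable result, which is exactly what
// permits the compiler to pick whichever form it prefers.
void check_equivalence(int size) {
    assert(count_with_expression(size) == count_with_hoisted_bound(size));
}

// Constant expressions such as the question's 1+1+1+1+1+1 are evaluated
// by the compiler; static_assert can only check compile-time values.
static_assert(1 + 1 + 1 + 1 + 1 + 1 == 6, "folded at compile time");
```

Since both functions are equivalent, the readable form is normally the one to keep in source code.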
Yes, there is a lot of optimization, and it is very complex.
It varies based on the compiler, and it also varies based on the compiler options.
Check
https://meta.stackexchange.com/questions/25840/can-we-stop-recommending-the-dragon-book-please
for some book recommendations if you really want to understand what a compiler may do. It is a very complex subject.
You can also compile to assembly with the -S option (gcc / g++) to see what the compiler is really doing. Use -O3 / ... / -O0 / -O to experiment with different optimization levels.

Making unsigned integer underflow throw an exception

I understand that there are applications in which using unsigned integer over/underflow is a good way to get cheap modular arithmetic.
In my code, I use uint exclusively for indices to containers, so I never want this behaviour.
Is this a bad idea? Should I be using int everywhere instead? I do have to do some unsavoury things to get a for loop to count down to 0.
Is there a commonly used implementation of a less unsafe unsigned integer type? Something that throws an exception?
Do compilers (for me gcc, clang) provide a mechanism for less unsafe behaviour in the given compilation unit?
First, a terminology quibble: there is no such thing as unsigned integer underflow, precisely because unsigned integers wrap around (using modulo arithmetic); "wrap-around" is probably the term you meant.
Second, is this a common scenario to be in? Yes, it is a bit. You're not the only one doing "unsavoury things" with loops for reverse counting, and I bet there are a ton of bugs out there where people haven't done "unsavoury things" and, as a result, their code has an unsavoury infinite loop hidden in it. Mind you, I'm not sure I'd go so far as to call unsigneds "unsafe" as a result; like anything, they are the right tool for a subset of infinite possible jobs, and within that subset they are perfectly safe.
There is debate over whether unsigned integers should be used for array indexes at all. Some standard committee members believe that their use in the standard library was a mistake; I know that several members of the c++ community here on Stack Overflow also hate unsigned values and wish they'd go away.
Personally I think having access to the full range of the integer by default is absolutely crucial (and losing that is not worth it for a single "-1" sentinel value or whatever), so I think that — while you're not alone in this requirement, and it's a sensible requirement — using unsigned array indexes by default is a good thing. (And what the heck is a negative array index? Semantics, people!)
But that doesn't help you in this scenario. So, what can you do about it? No, there's no trapping unsigned integer implementation (at least, not one that I'm aware of, let alone widespread) because that would literally violate the rules of the type as defined by C++: it would introduce well-defined underflow/overflow semantics to a type for which underflow/overflow shouldn't even be possible.
You will have to use signed integers and check for "logical underflow" (i.e. going out of your desired range, say -1) yourself. You could wrap this behaviour in a class.
I suppose you could actually just wrap an unsigned integer while you're at it, adding some extra logic to operator-- and operator-= to detect a wrap-around and throw.
But I guess my point is that, whatever you do, it's going to be in your "code space" and thus subject to decreased performance. You can't eke out this behaviour from the platform itself.
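A minimal sketch of the wrapper idea described above might look like this. CheckedIndex and count_down_steps are hypothetical names, not part of the standard library or of any library I know of, and the choice of std::out_of_range as the exception type is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical wrapper: an unsigned index whose decrement operations
// throw instead of wrapping around.
class CheckedIndex {
public:
    explicit CheckedIndex(std::size_t value) : value_(value) {}

    std::size_t value() const { return value_; }

    // Pre-decrement: throws if the index is already 0.
    CheckedIndex& operator--() {
        if (value_ == 0) {
            throw std::out_of_range("CheckedIndex: decrement below zero");
        }
        --value_;
        return *this;
    }

    // Subtraction: throws if the result would fall below zero.
    CheckedIndex& operator-=(std::size_t amount) {
        if (amount > value_) {
            throw std::out_of_range("CheckedIndex: subtraction below zero");
        }
        value_ -= amount;
        return *this;
    }

private:
    std::size_t value_;
};

// Counts down from `start` to 0 and returns the number of steps taken.
// The loop tests before decrementing, so the wrapper never throws here.
std::size_t count_down_steps(std::size_t start) {
    CheckedIndex i(start);
    std::size_t steps = 0;
    while (i.value() > 0) {
        --i;
        ++steps;
    }
    assert(i.value() == 0);
    return steps;
}
```

As noted above, every check lives in user code, so each decrement pays for a comparison and a branch.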

Is ++(a = b); faster than a = b + 1;?

Is it faster to use ++(a = b); instead of a = b + 1;?
For my understanding, the first approach consists of the operations:
move the value of b to a
increment a in memory
while the second approach does:
push b and 1 to the stack
call add
pop the result to a register
move the register to a
Does it actually take less cycles? Or does the compiler (gcc for example) do an optimization so it does not make a difference?
edit: TIL that ++(a=b) is UB, at least in pre-C++11. Nevertheless, I'll discuss this assuming it's either legal or the compiler does what you expect.
Generally speaking, a = b + 1; is faster.
The optimizer will almost certainly turn both into the same code. If not, it is more likely to optimize the second version, because it is a very common thing to write, and optimizers are more likely to recognize common things than weird corner cases.
Why do I say it should be the same after optimization, but the second is faster? Because of the fellow developers. Everyone recognizes a = b + 1; immediately. No one really has to think about it. The other case is more likely to trigger a reaction along the lines of "wtf is he doing there, and why?". Many people will figure out eventually what you did there. Some will not. Some might even introduce bugs because of it. Few people will find out why you did it and nevertheless stumble each time they have to read that line. Everyone will lose time wondering while reading that line. That's why the other is faster.
Caveat: all this is written silently assuming that you are talking of builtin types, like ints or pointers. Your interpretation of what the two do supports that. If we're talking of UDTs, the two lines are not even guaranteed to do the same. It then depends completely on how operator=, operator++ and operator+ and maybe the conversion from int are implemented. Nevertheless, if the implementations make you consider writing ++(a=b), they are most likely bad implementations and should be improved rather than hacked around.
tl;dr: if I'd catch you doing ++(a=b) in any codebase I work on, we'd have to have a serious talk ;-)
There is no simple answer to this question. The question has been flagged with C++ so we have no way of knowing what this code is actually doing without knowing the precise type of all the operands. Also, the context within which the code appears will make a difference to the way the optimiser generates code - the compiler could alias the variables and move the increment into instructions further down the program, for example, into effective address calculations for the two variables.
But the real question is, why do you care? As Arne said above, readability is far more important and you've not posted a scenario whereby any difference would have a measurable effect.
Only worry about it if it is actually causing a problem.
With optimizations on, they generate exactly the same code for me so they will perform exactly the same. This shouldn't be a surprise as the effects of both statements are exactly the same.
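To compare the generated code yourself, a pair of small functions like the following can be compiled with -S. This is a sketch for int only. The second function spells the question's idea as two separate statements, since whether the single expression ++(a = b) is well defined is disputed in this thread; the function names are made up.

```cpp
#include <cassert>
#include <climits>

// Conventional spelling.
int add_one_plain(int b) {
    assert(b < INT_MAX);  // signed overflow would be undefined behaviour
    int a = b + 1;
    return a;
}

// Same effect as the question intends with ++(a = b), written as two
// statements so the order of the two stores to `a` is unambiguous.
int add_one_two_steps(int b) {
    assert(b < INT_MAX);  // signed overflow would be undefined behaviour
    int a;
    a = b;
    ++a;
    return a;
}
```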
++(a = b); is undefined behaviour because there are two unsequenced modifications to a.
Although the value computation of a in a = b is sequenced before the modification of a due to ++, the side-effect of a = b (storage to a) is unsequenced relative to the side-effect of ++ (again, storage to a).

Assignment vs memcpy - which will be faster in this case

Which of the two is faster?
1.
char* _pos ..;
short value = ..;
*((short*)_pos) = value;
2.
char* _pos ..;
short value = ..;
memcpy(_pos, &value, sizeof(short));
As with all "which is faster?" questions, you should benchmark it to see for yourself. And if it matters, then ask why and pick which you want.
In any case, your first example is technically undefined behavior since you are violating strict-aliasing. So if you had to choose without benchmarking, go with the second one.
To answer the actual question, which is faster will probably depend on the alignment of _pos. If it's aligned properly, then 1 will probably be faster. If not, then 2 might be faster depending on how it's optimized by the compiler. (1 might even crash if the hardware doesn't support misaligned access.)
But this is all guess-work. You really need to benchmark it to know for sure.
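A minimal sketch of the memcpy route, which sidesteps both the aliasing and the alignment concern, could look like this. The helper names store_short and load_short are assumptions made for the example.

```cpp
#include <cassert>
#include <cstring>

// Stores `value` at `pos` without assuming that `pos` is suitably
// aligned for short or that a short object lives there.
void store_short(char* pos, short value) {
    assert(pos != nullptr);
    std::memcpy(pos, &value, sizeof value);
}

// Reads a short back from `pos` the same way.
short load_short(const char* pos) {
    assert(pos != nullptr);
    short value;
    std::memcpy(&value, pos, sizeof value);
    return value;
}
```

Because the size is a small compile-time constant, compilers commonly replace such a memcpy call with a plain store or load, but that is an optimization to confirm in the generated assembly rather than something to rely on.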
At the very least, you should look at the compiled assembly:
; *(short *)_pos = value;
mov WORD PTR [rcx], dx
vs.
; memcpy(_pos, &value, sizeof(short));
mov WORD PTR [rcx], dx
Which in this case (in MSVC) shows the exact same assembly with default optimizations. So you can expect the performance to be the same.
With gcc at an optimization level of -O1 or higher, the following two functions compile to exactly the same machine code on x86:
#include <string.h>
void foo(char *_pos, short value)
{
memcpy(_pos, &value, sizeof(short));
}
void bar(char *_pos, short value)
{
*(short *)_pos = value;
}
The compiler might implement them both the same way.
If it does it naively, assignment will be faster.
For any practical purpose, they'll both be done in no time, and you don't need to worry.
Also note that you may have alignment problems (_pos may not be aligned on 2 bytes, which may crash on some processors), and type punning problems (the compiler may assume that what _pos points to isn't changed, because you wrote using a short *).
Does it matter? It might be that the first case will save you some cycles (depends on the compiler sophistication and optimizations). But is it worth the readability and maintainability hit?
Many bugs are introduced because of premature optimization. You should first identify the bottleneck, and if this assignment is that bottleneck - benchmark each of the options (taking care of alignment and other issues mentioned here by others already).
The question is implementation-dependent. In practice, for doing nothing but copying sizeof(short) bytes, if one is going to be slower, it's going to be memcpy. For considerably larger data sets, if one is going to be faster, it's generally going to be memcpy.
As pointed out, #1 invokes undefined behavior.
We can see that simple assignment is certainly easier to read and write and less error prone than both. Clarity and correctness should come first, even in performance-critical areas for the simple reason that it's easier to optimize correct code than it is to fix optimized, incorrect code. If this is really a C++ question, the need for such code (casts or memcpy that bulldoze over the type system to x-ray and copy around bits) should be very, very rare.
If you are certain that there won't be an alignment issue, and you really find this is a bottleneck situation then go ahead and do the first.
If you are unhappy calling memcpy then do something like:
*pos = static_cast<char>(value & 0xff );
*(pos+1) = static_cast<char>(value >> 8 );
although if you are going to do that then use unsigned values.
The above code ensures you get little-endian too. (Obviously reverse the order of the assignments if you want big-endian). You might want a consistent endian-ness if the data is passed around as some kind of binary blob, which is, I assume, what you are trying to create.
You might wish to use something like google protocol buffers if you want to create binary blobs. There is also boost::serialize which includes binary serialization.
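The byte-wise approach from this answer, written with unsigned values as it recommends, might be sketched as follows. write_le16 and read_le16 are hypothetical helper names introduced for the example.

```cpp
#include <cassert>
#include <cstdint>

// Writes `value` to pos[0..1] in little-endian order, regardless of the
// byte order of the machine running the code.
void write_le16(unsigned char* pos, std::uint16_t value) {
    assert(pos != nullptr);
    pos[0] = static_cast<unsigned char>(value & 0xffu);  // low byte first
    pos[1] = static_cast<unsigned char>(value >> 8);     // then high byte
}

// Reads a little-endian 16-bit value back from pos[0..1].
std::uint16_t read_le16(const unsigned char* pos) {
    assert(pos != nullptr);
    return static_cast<std::uint16_t>(pos[0] | (pos[1] << 8));
}
```

Swapping the two indices in both functions would give the big-endian variant.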
You can avoid breaking aliasing rules and calling a function by using a union:
union {
char* c;
short* s;
} _pos;
short value = ...;
*_pos.s = value; // _pos is a union object, not a pointer: use '.' and dereference the short* member