Which is the best way to index through a for loop? - c++

I am trying to compute the product of the values inside a vector. The code is a huge mess. I posted it previously, but no one was able to help, so I just want to confirm the correct way to do a single part of it. I currently have:
vector<double> taylorNumerator;
for (a = 0; a <= (constant); a++) {
    double Number = /* equation involving a to get numerous values */;
    taylorNumerator.push_back(Number);
}
double NewNumber = 1.0;
for (b = 0; b <= (constant); b++) {
    NewNumber *= taylorNumerator[b];
}
This is a snapshot; my actual code is much longer. Someone told me it is better to use vector.at(index) instead. Which is the correct or best way to accomplish this? If you like, I can paste all of the code: it runs, but the values I get are wrong.

When possible, you should probably avoid using indexes at all. Your options are:
A range-based for loop:
for (auto numerator : taylorNumerators) { ... }
An iterator-based loop:
for (auto it = taylorNumerators.begin(); it != taylorNumerators.end(); ++it) { ... }
A standard algorithm, perhaps with a lambda:
#include <algorithm>
std::for_each(taylorNumerators.begin(), taylorNumerators.end(), [&](double numerator) { ... });
In particular, note that some algorithms let you specify a number of iterations, like std::generate_n, so you can create exactly n items without counting to n yourself.
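As a rough sketch of that idea (not the asker's actual formula): the hypothetical term() below stands in for "equation involving a", std::generate_n fills the vector with exactly count values, and std::accumulate with std::multiplies forms the product, so no index is written by hand.

```cpp
#include <algorithm>   // std::generate_n
#include <cassert>
#include <cstddef>
#include <functional>  // std::multiplies
#include <iterator>    // std::back_inserter
#include <numeric>     // std::accumulate
#include <vector>

// Hypothetical stand-in for "equation involving a": 1 / (a + 1).
double term(std::size_t a) {
    return 1.0 / static_cast<double>(a + 1);
}

// Builds `count` terms, then multiplies them together.
double product_of_terms(std::size_t count) {
    std::vector<double> taylorNumerators;
    taylorNumerators.reserve(count);

    std::size_t a = 0;
    // generate_n calls the lambda exactly `count` times; no manual index loop.
    std::generate_n(std::back_inserter(taylorNumerators), count,
                    [&a] { return term(a++); });
    assert(taylorNumerators.size() == count);

    // Product of all elements, starting from 1.0 (the multiplicative identity).
    return std::accumulate(taylorNumerators.begin(), taylorNumerators.end(),
                           1.0, std::multiplies<double>());
}
```

Starting the accumulation at 1.0 means an empty vector yields 1.0, the usual convention for an empty product.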
If you need the index in the calculation, then it can be appropriate to use a traditional for loop. You have to watch for a couple of pitfalls: std::vector<T>::size() returns a std::vector<T>::size_type, which is typically identical to std::size_t, which is (1) unsigned and (2) quite possibly larger than an int.
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) { ... }
Your calculations probably deal with doubles or some numerical type other than std::size_t, so you have to consider the best way to convert it. Many programmers would rely on implicit conversions, but that can be dangerous unless you know the conversion rules very well. I'd generally start by doing a static cast of the index to the type I actually need. For example:
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) {
const auto x = static_cast<double>(i);
/* calculation involving x */
}
In C++, it's probably far more common to make sure the index is in range and then use operator[] rather than to use at(). Many projects disable exceptions, so the safety guarantee of at() wouldn't really be available. And, if you can check the range once yourself, then it'll be faster to use operator[] than to rely on the range-check built into at() on each index operation.
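A minimal sketch of "check the range once, then use operator[]", shown next to at()'s per-access check. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Multiplies the first n elements. The range is validated once up front,
// so the loop itself can use the unchecked operator[].
double product_of_first(const std::vector<double>& v, std::size_t n) {
    if (n > v.size()) {
        throw std::out_of_range("n exceeds vector size");
    }
    double result = 1.0;
    for (std::size_t i = 0; i != n; ++i) {
        result *= v[i];  // no per-access bounds check
    }
    return result;
}

// at() performs a bounds check on every call and throws std::out_of_range.
bool at_throws(const std::vector<double>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

In a build with exceptions disabled, the up-front check would have to report the error some other way (for example, a return code); the loop body stays the same.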

What you have is fine. Modern compilers can optimize the heck out of the above such that the code is just as fast as the equivalent C code accessing the items directly.
The only optimization for using vector I recommend is to invoke taylorNumerator.reserve(constant) to allocate the needed storage upfront instead of the vector resizing itself as new items are added.
About the only worthwhile optimization after that is to not use vector at all and use a fixed-size array instead, provided constant is a compile-time constant and small enough that it doesn't blow up the stack (or the binary size, if the array is global).
double taylorNumerator[constant];
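If constant is a compile-time constant, std::array gives the same fixed-size, no-heap storage as the raw array while still knowing its own size. A sketch, assuming a hypothetical kConstant and a placeholder formula; note that a loop running a = 0..constant needs constant + 1 slots.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical compile-time bound. The original loop runs a = 0..constant,
// so it needs kConstant + 1 slots.
constexpr std::size_t kConstant = 4;

std::array<double, kConstant + 1> make_terms() {
    std::array<double, kConstant + 1> taylorNumerator{};  // fixed size, no heap allocation
    for (std::size_t a = 0; a < taylorNumerator.size(); ++a) {
        taylorNumerator[a] = static_cast<double>(a + 1);  // placeholder formula
    }
    return taylorNumerator;
}
```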

Related

Using nested [ ] operations for std::vector

I am quite new to C++, and I have tried searching for an answer to this and running tests, but I often have trouble figuring out what causes specific behaviors.
My question relates to using nested [ ] operators to access or modify elements in a loop - example:
//Declare
std::vector<int> a1 {10,20,30,40} ;
std::vector<int> a2 {2,3} ;
int S2 = a2.size() ;
//Loop
for(int i = 0 ; i < S2 ; i++){
a1[a2[i]] = a1[a2[i]] + 5000 ;
}
Is this considered ok? I'm asking not only in terms of common practice, but also in terms of efficiency and any other potential factor I need to consider.
Am I supposed to first store a2[i] in a temporary variable inside the loop and then use it to modify my element in vector a1?
I know this is probably not the best structure and that I should be using some other data structure for this kind of thing, but I just want to understand whether this is OK or whether it might cause undefined behavior.
I am a developer of finite element calculation software.
We use this technique to access the values inside an element. It helps us save a lot of memory.
BUT: be aware that it spoils your cache locality. Don't use it in heavy loops if you can avoid it.
If you need range checks and performance is not important, you can consider using the at() member function of std::vector:
for(const auto & index :a2) {
a1.at(index) += 5000;
}
The at() function automatically checks whether its argument is within the bounds of valid elements in the vector, throwing a std::out_of_range exception if it is not (i.e., if it is greater than or equal to size()). This is in contrast with operator[], which does not check bounds.
Moreover, consider using a range-based loop:
//Loop
for(const auto & index :a2) {
a1[index] += 5000;
}
This is perfectly correct.
But in fact, you just want to iterate over the elements of a standard container. C++ provides the range-based for statement for that use case:
for (auto index : a2) {
a1[index] += 5000;
}
I find it more readable even if it is mainly a matter of taste...
Disclaimer: this code does not check that the elements of a2 are valid indices into a1.
Looks okay to me. There is no need to create an explicit copy of a2[i].
The only issue I see with something like this is that the argument inside [] should be of type std::size_t instead of int. These integer types cover different ranges of values: std::size_t is an unsigned integer type, whereas int is signed. Beware: using a negative index or an index past the last element results in undefined behavior due to out-of-bounds access. But if you can guarantee that the values in a2 are always valid indices for a1, then these int values will be implicitly converted to std::size_t and things work properly (which seems to be the case in the code example in your question).
I also suggest changing the loop variable i to std::size_t (and using ++i instead of i++ if you want to be perfect :).
In modern C++, you can also use a range-based for loop, so you don't have to use an explicit index variable for accessing a2 values at all:
for (auto indexFromA2 : a2)
a1[indexFromA2] += 5000;
This is less error-prone, because you have to write less logic for managing the element access (and don't have to spell out the types).
I would make sure that the elements of a1 referenced by a2 really exist before trying to access them; otherwise you will go out of bounds.
But with regard to nested [], this is fine, and there's no need to create a copy of a2[i] to access a1. The compiler simply evaluates your expression from the inside out.
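You can still simplify your code a bit:
//Declare
std::vector<int> a1 {10,20,30,40};
std::vector<int> a2 {2,3};
//Loop
for (std::size_t i = 0; i < a2.size(); ++i) {
    // stop at the first index that is negative or past the end of a1
    if (a2[i] < 0 || static_cast<std::size_t>(a2[i]) >= a1.size()) { break; }
    a1[a2[i]] += 5000;
}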
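Along the same lines, here is a sketch combining the range-based loop with an explicit validity check. Unlike a loop that breaks, it skips bad indices and keeps going, and it tests the sign before converting to std::size_t, because a negative int would otherwise wrap around to a huge unsigned value. The function name add_at_indices and the skipped-count return value are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds delta to a1[index] for every index stored in a2.
// Indices that are negative or >= a1.size() are skipped instead of causing
// undefined behavior; the return value counts how many were skipped.
std::size_t add_at_indices(std::vector<int>& a1, const std::vector<int>& a2, int delta) {
    std::size_t skipped = 0;
    for (const int index : a2) {
        // Check the sign first: converting a negative int to std::size_t
        // would wrap around to a very large value.
        if (index < 0 || static_cast<std::size_t>(index) >= a1.size()) {
            ++skipped;
            continue;
        }
        a1[static_cast<std::size_t>(index)] += delta;  // nested lookup, known to be in range
    }
    return skipped;
}
```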

Is it possible to micro-optimize "x = max(a,b); y = min(a,b);"?

I had an algorithm that started out like
int sumLargest2 ( int * arr, size_t n )
{
int largest(max(arr[0], arr[1])), secondLargest(min(arr[0],arr[1]));
// ...
and I realized that the first is probably not optimal because calling max and then min is repetitious when you consider that the information required to know the minimum is already there once you've found the maximum. So I figured out that I could do
int largest = max(arr[0], arr[1]);
int secondLargest = arr[0] == largest ? arr[1] : arr[0];
to shave off the useless invocation of min, but I'm not sure that actually saves any number of operations. Are there any fancy bit-shifting algorithms that can do the equivalent of
int largest(max(arr[0], arr[1])), secondLargest(min(arr[0],arr[1]));
?
In C++, you can use std::minmax to produce a std::pair of the minimum and the maximum. This is particularly easy in combination with std::tie:
#include <algorithm>  // std::minmax
#include <tuple>      // std::tie
#include <utility>    // std::pair
int largest, secondLargest;
std::tie(secondLargest, largest) = std::minmax(arr[0], arr[1]);
GCC, at least, is capable of optimizing the call to minmax into a single comparison, identical to the result of the C code below.
In C, you could write the test out yourself:
int largest, secondLargest;
if (arr[0] < arr[1]) {
largest = arr[1];
secondLargest = arr[0];
} else {
largest = arr[0];
secondLargest = arr[1];
}
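With C++17 structured bindings, the std::minmax version can be written without pre-declared variables. A sketch (the function name largest_two is hypothetical). One caveat: std::minmax returns a pair of references, so binding its result when the arguments are temporaries, as in auto [lo, hi] = std::minmax(f(), g()), leaves dangling references; here a and b are named parameters that outlive the binding.

```cpp
#include <algorithm>  // std::minmax
#include <cassert>
#include <utility>    // std::pair

// Returns {largest, secondLargest} for two values.
// std::minmax yields a pair of const references ordered (smaller, larger);
// the structured binding copies that pair, whose references still refer to
// the parameters a and b, which stay alive for the whole function.
std::pair<int, int> largest_two(int a, int b) {
    const auto [smaller, larger] = std::minmax(a, b);
    return {larger, smaller};
}
```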
How about:
int largestIndex = arr[1] > arr[0];
int largest = arr[largestIndex];
int secondLargest = arr[1 - largestIndex];
The first line relies on the implicit conversion of a boolean result to 1 for true and 0 for false.
I'm going to assume that you'd rather solve the larger problem... That is, getting the sum of the largest two numbers in an array.
What you are trying to do is a std::partial_sort().
Let's implement it.
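#include <algorithm>   // std::partial_sort
#include <cstddef>     // size_t
#include <functional>  // std::greater
int sumLargest2(int * arr, size_t n) {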
int * first = arr;
int * middle = arr + 2;
int * last = arr + n;
std::partial_sort(first, middle, last, std::greater<int>());
return arr[0] + arr[1];
}
And if you're unable to modify arr, then I'd recommend looking into std::partial_sort_copy().
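A sketch of the std::partial_sort_copy variant, assuming arr holds at least two elements: only the two largest values are copied, in descending order, into a small local buffer, and the input is left untouched.

```cpp
#include <algorithm>   // std::partial_sort_copy
#include <cassert>
#include <cstddef>
#include <functional>  // std::greater

// Sums the two largest values without modifying arr.
// partial_sort_copy copies the top two elements, largest first, into `top`.
int sumLargest2(const int* arr, std::size_t n) {
    assert(n >= 2);
    int top[2];
    std::partial_sort_copy(arr, arr + n, top, top + 2, std::greater<int>());
    return top[0] + top[1];
}
```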
x = max(a, b);
y = a + b - x;
It won't necessarily be faster, but it will be different.
Also beware of overflows.
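One way to sidestep the overflow is to do the addition in a wider type. This sketch assumes long long is wider than int, which holds on common platforms but is not guaranteed by the standard; the function name is hypothetical.

```cpp
#include <algorithm>  // std::max
#include <cassert>
#include <climits>    // INT_MAX, INT_MIN

// y = a + b - max(a, b) yields min(a, b), but a + b can overflow int,
// which is undefined behavior for signed types. Doing the arithmetic in
// long long avoids that, assuming long long is wider than int.
int min_via_sum(int a, int b) {
    const int x = std::max(a, b);
    const long long y = static_cast<long long>(a) + b - x;
    return static_cast<int>(y);  // y equals min(a, b), so it fits in int
}
```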
If your intention is to reduce the number of function calls needed to find the min and max, you can try std::minmax_element, which is available since C++11 (the range must be non-empty before you dereference the result).
auto result = std::minmax_element(arr, arr+n);
std::cout<< "min:"<< *result.first<<"\n";
std::cout<< "max :" <<*result.second << "\n";
If you just want to find the bigger of two values go:
if(a > b)
{
largest = a;
second = b;
}
else
{
largest = b;
second = a;
}
No function calls, one comparison, two assignments.
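The same one-comparison idea extends to the larger problem of summing the two largest values of an array in a single pass, without modifying it. A sketch, assuming n >= 2 and that the sum itself does not overflow int:

```cpp
#include <cassert>
#include <cstddef>

// One pass over the array, keeping the two largest values seen so far.
// Each later element costs at most two comparisons; arr is not modified.
int sumLargest2(const int* arr, std::size_t n) {
    assert(n >= 2);
    int largest, second;
    // Seed with the first two elements using a single comparison.
    if (arr[0] > arr[1]) {
        largest = arr[0];
        second = arr[1];
    } else {
        largest = arr[1];
        second = arr[0];
    }
    for (std::size_t i = 2; i < n; ++i) {
        if (arr[i] > largest) {
            second = largest;
            largest = arr[i];
        } else if (arr[i] > second) {
            second = arr[i];
        }
    }
    return largest + second;
}
```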
I'm assuming C++...
Short answer: use std::minmax and compile with the right optimizations and the right instruction set flags.
Long ugly answer: the compiler cannot make all the assumptions necessary to make it really, really fast. You can. In this case, you can change the algorithm to process all data first, and you can force alignment on the data. Having done all this, you can use intrinsics to make it faster.
Although I haven't tested it in this particular case, I've seen enormous performance improvements using these guidelines.
Since you're not passing 2 integers to the function, I'm assuming you're using an array and want to iterate over it somehow. You now have a choice to make: make 2 arrays and use min/max, or use 1 array with both a and b. This decision alone can already influence performance.
If you have 2 arrays, these can be allocated on 32-byte boundaries with aligned mallocs and then processed using intrinsics. If you are going for real, raw performance, this is the way to go.
For example, let's assume you have AVX2. (NOTE: I'm not sure if you do, and you SHOULD check this using CPUID!) Go to the cheat sheet here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/ and pick your poison.
The intrinsics you're looking for are in this case probably:
_mm256_min_epi32
_mm256_max_epi32
_mm256_stream_load_si256
If you have to do this for the entire array, you probably want to keep everything in __m256i registers before merging the individual items. E.g.: do a min/max per 256-bit vector, and when the loop is done, extract the 32-bit items and do a min/max on those.
Long nicer answer: So ... as for the compiler. Compilers do attempt to optimize these kinds of things, but run into problems.
If you have 2 different arrays that you process, the compiler has to know that they are different in order to be able to optimize it. This is the reason why stuff like restrict exists, which tells the compiler exactly this little thing you probably already knew while writing the code.
Also, the compiler doesn't know your memory is aligned, so it has to check this and branch... for each call. We don't want this, which means we want it to inline the code. So, add inline, put it in a header file, and that's that. You can also use an aligned attribute to give it a hint.
Your compiler also doesn't know that the int* won't change over time. If it cannot change, it's a good idea to tell the compiler so using the const keyword.
A compiler targets an instruction set when compiling. Normally, compilers already use SSE, but AVX2 can help a lot (as I've shown with the intrinsics above). If you can compile with those flags, make sure to use them; they help a lot.
Run in release mode, compile with optimizations set to 'fast', and see what happens under the hood. If you do all this, you should see vpmax... instructions appearing in the inner loops, which means that the compiler is vectorizing the code just fine.
I don't know what else you want to do in the loop... if you use all these instructions you should hit the memory speed on big arrays.
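For reference, a plain scalar min/max reduction of the kind the compiler may auto-vectorize when built with optimizations and AVX2 enabled (for example -O3 -mavx2 on GCC or Clang). Whether vpmaxsd/vpminsd actually appear is compiler-dependent, so inspect the assembly rather than assuming it. The function name min_max is hypothetical and the code assumes n > 0.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>  // std::pair

// Plain min/max reduction over an int array. The branch-free selects
// give the optimizer a simple loop shape that it may vectorize.
std::pair<int, int> min_max(const int* data, std::size_t n) {
    assert(n > 0);
    int lo = data[0];
    int hi = data[0];
    for (std::size_t i = 1; i < n; ++i) {
        lo = data[i] < lo ? data[i] : lo;
        hi = data[i] > hi ? data[i] : hi;
    }
    return {lo, hi};
}
```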
How about a time-space trade-off?
#include <utility>
template<typename T>
std::pair<T, T>
minmax(T const& a, T const& b)
{ return b < a ? std::make_pair(b, a) : std::make_pair(a, b); }
//main
std::pair<int, int> values = minmax(a[0], a[1]);
int largest = values.second;
int secondLargest = values.first;

The simple task of iterating through an array. Which of these solutions is the most efficient?

Recently, I've been thinking about all the ways that one could iterate through an array and wondered which of these is the most (and least) efficient. I've written a hypothetical problem and five possible solutions.
Problem
Given an int array arr with len number of elements, what would be the most efficient way of assigning an arbitrary number 42 to every element?
Solution 0: The Obvious
for (unsigned i = 0; i < len; ++i)
arr[i] = 42;
Solution 1: The Obvious in Reverse
for (unsigned i = len; i-- > 0; )  // i >= 0 would always be true for an unsigned i
arr[i] = 42;
Solution 2: Address and Iterator
for (unsigned i = 0; i < len; ++i)
{ *arr = 42;
++arr;
}
Solution 3: Address and Iterator in Reverse
for (unsigned i = len; i; --i)
{ *arr = 42;
++arr;
}
Solution 4: Address Madness
int* end = arr + len;
for (; arr < end; ++arr)
*arr = 42;
Conjecture
The obvious solutions are almost always used, but I wonder whether the subscript operator could result in a multiplication instruction, as if the byte address were computed as (char*)arr + i * sizeof(int).
The reverse solutions try to take advantage of how comparing i to 0 instead of len might avoid a subtraction operation. Because of this, I prefer Solution 3 over Solution 2. Also, I've read that arrays are optimized to be accessed forwards because of how they're stored in the cache, which could present an issue with Solution 1.
I don't see why Solution 4 would be any less efficient than Solution 2. Solution 2 increments the address and the iterator, while Solution 4 only increments the address.
In the end, I'm not sure which of these solutions I prefer. I think the answer also varies with the target architecture and the optimization settings of your compiler.
Which of these do you prefer, if any?
Just use std::fill.
std::fill(arr, arr + len, 42);
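For the pointer-plus-length case in the question, std::fill and std::fill_n express the same thing; when the array's size is part of its type, std::begin/std::end can supply the bounds. The wrapper names below are only for illustration.

```cpp
#include <algorithm>  // std::fill, std::fill_n
#include <cassert>
#include <cstddef>
#include <iterator>   // std::begin, std::end

// Pointer + length, as in the question: the [arr, arr + len) form.
void fill_42(int* arr, std::size_t len) {
    std::fill(arr, arr + len, 42);
}

// Same effect using a start pointer and an element count.
void fill_42_n(int* arr, std::size_t len) {
    std::fill_n(arr, len, 42);
}

// When the array's size is part of its type, std::begin/std::end
// let the compiler supply the bounds.
template <std::size_t N>
void fill_42_array(int (&arr)[N]) {
    std::fill(std::begin(arr), std::end(arr), 42);
}
```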
Of your proposed solutions, on a good compiler, none should be faster than the others.
The ISO standard doesn't mandate the efficiency of the different ways of doing things in code (other than certain big-O type stuff for some collection algorithms), it simply mandates how it functions.
Unless your arrays are billions of elements in size, or you're wanting to set them millions of times per minute, it generally won't make the slightest difference which method you use.
If you really want to know (and I still maintain it's almost certainly unnecessary), you should benchmark the various methods in the target environment. Measure, don't guess!
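A minimal timing sketch using std::chrono::steady_clock, with a hypothetical time_fill helper. It is deliberately bare: a trustworthy benchmark also needs warm-up runs, several samples, the target's compiler flags, and a check that the optimizer has not removed the work.

```cpp
#include <algorithm>  // std::fill, std::count
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Applies fill_fn to `buffer` `repeats` times and returns the elapsed
// wall-clock time. The caller owns the buffer and can inspect it afterwards,
// which keeps the writes observable.
template <typename FillFn>
std::chrono::microseconds time_fill(FillFn fill_fn, std::vector<int>& buffer, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        fill_fn(buffer.data(), buffer.size());
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
}
```

To compare the candidates, call it once per solution with a lambda wrapping that loop, using the same buffer size on the target machine.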
As to which I prefer, my first inclination is to optimise for readability. Only if there's a specific performance problem do I then consider other possibilities. That would be simply something like:
for (size_t idx = 0; idx < len; idx++)
arr[idx] = 42;
I don't think performance is an issue here. These are micro-optimizations that are hardly ever necessary, if they matter at all (I could imagine the compiler producing identical assembly for most of them).
Go with the most readable solution: the standard library provides std::fill, or, for more complex assignments:
for(unsigned k = 0; k < len; ++k)
{
// whatever
}
so it is obvious to other people looking at your code what you are doing. With C++11, if arr is an actual array or container rather than a pointer, you could also write:
for(auto & elem : arr)
{
// whatever
}
Just don't obfuscate your code unnecessarily.
For nearly all meaningful cases, the compiler will optimize all of the suggested ones to the same thing, and it's very unlikely to make any difference.
There used to be a trick where you could avoid the automatic prefetching of data if you ran the loop backwards, which under some bizarre set of circumstances actually made it more efficient. I can't recall the exact circumstances, but I expect modern processors will identify backwards loops as well as forwards loops for automatic prefetching anyway.
If it's REALLY important for your application to do this over a large number of elements, then looking at blocked access and using non-temporal storage will be the most efficient. But before you do that, make sure you have identified the filling of the array as an important performance point, and then make measurements for the current code and the improved code.
I may come back with some actual benchmarks to prove that "it makes little difference" in a bit, but I've got an errand to run before it gets too late in the day...

Are std::fill, std::copy specialized for std::vector<bool>?

While thinking about this question, I started wondering whether std::copy() and/or std::fill are specialized (I really mean optimized) for std::vector<bool>.
Is this required by the C++ standard, or is it perhaps a common approach among C++ standard library vendors?
Simply put, I'd like to know whether the following code:
std::vector<bool> v(10, false);
std::fill(v.begin(), v.end(), true);
is in any way better/different than that:
std::vector<bool> v(10, false);
for (auto it = v.begin(); it != v.end(); ++it) *it = true;
To be very strict: can, let's say, std::fill<std::vector<bool>::iterator>() access the internal representation of std::vector<bool> and set entire bytes instead of single bits? I assume making std::fill a friend of std::vector<bool> is not a big problem for a library vendor?
[UPDATE]
Next related question: can I (or anybody else :) specialize such algorithms for, let's say, std::vector<bool>, if they are not already specialized? Is this allowed by the C++ standard? I know this will be non-portable, but what about for just one selected standard C++ library, assuming I (or anybody else) find a way to get at std::vector<bool>'s private parts?
The standard library's templates are implemented in headers that ship with your compiler, so you can look into those headers yourself. In GCC, the vector<bool> implementation is in stl_bvector.h; other implementations organize their headers differently. And yes, there is a specialized fill (look near __fill_bvector).
Optimizations are nowhere mandated in the standard. Whether an optimization is applied is considered a "quality of implementation" issue. The asymptotic complexity of most algorithms is, however, restricted.
Optimizations are allowed as long as a correct program behaves according to what the standard mandates. The examples you ask about, i.e., optimizations involving standard algorithms using iterators on std::vector<bool>, can achieve their objective pretty much in any way the implementation sees fit because there is no way to monitor how they are implemented. This said, I doubt very much that there is any standard library implementation optimizing operations on std::vector<bool>. Most people seem to think that this specialization is an abomination in the first place and that it should go away.
A user is only allowed to create specializations of library templates if the specialization involves at least one user-defined type. I don't think a user is allowed to provide any function in namespace std at all: there isn't any need, because all such functions would involve a user-defined type and would thus be found in the user's namespace. Put differently, I think you are out of luck with respect to getting algorithms optimized for std::vector<bool> for the time being. You might, however, consider contributing optimized versions to the open source implementations (e.g., libstdc++ and libc++).
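Within those rules, the portable options are the public interface of std::vector<bool>. A sketch of three of them (the wrapper names are hypothetical); which one is fastest depends on the standard library, so measure before choosing.

```cpp
#include <algorithm>  // std::fill, std::all_of, std::none_of
#include <cassert>
#include <cstddef>
#include <vector>

// Sets every bit via the generic algorithm; whether it is word-optimized
// is up to the standard library implementation.
void set_all(std::vector<bool>& v, bool value) {
    std::fill(v.begin(), v.end(), value);
}

// Replaces the contents with v.size() copies of value (size is unchanged).
void assign_all(std::vector<bool>& v, bool value) {
    v.assign(v.size(), value);
}

// Inverts every bit using the vector<bool>-specific flip() member.
void invert_all(std::vector<bool>& v) {
    v.flip();
}
```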
There is no specialization for it, but you can still use it (even though it's slow).
But here is a trick I found that enables std::fill on std::vector<bool>, using the implementation's internal word type std::_Vbase.
(WARNING: I've tested it only with MSVC 2013, so it may not work on other compilers.)
int num_bits = 100000;
std::vector<bool> bit_set(num_bits , true);
int bitsize_elem = sizeof(std::_Vbase) * 8; // 1byte = 8bits
int num_elems = static_cast<int>(std::ceil(num_bits / static_cast<double>(bitsize_elem)));
Here, since you need all the bits of an element if you use any bit of it, the number of elements must be rounded up.
Using this information, we will build a vector of pointers to the original elements underlying the bits.
std::vector<std::_Vbase*> elem_ptrs(num_elems, nullptr);
std::vector<bool>::iterator bitset_iter = bit_set.begin();
for (int i = 0; i < num_elems; ++i)
{
std::_Vbase* elem_ptr = const_cast<std::_Vbase*>((*bitset_iter)._Myptr);
elem_ptrs[i] = elem_ptr;
std::advance(bitset_iter, bitsize_elem);
}
(*bitset_iter)._Myptr: by dereferencing an iterator of std::vector<bool>, you get the proxy class reference and can access its member _Myptr.
Since _Myptr is a const std::_Vbase*, remove its constness with const_cast.
Now we have a pointer to the original element underlying those bits, std::_Vbase* elem_ptr.
elem_ptrs[i] = elem_ptr: record this pointer, ...
std::advance(bitset_iter, bitsize_elem): ... and continue our journey to the next element by skipping over the bits held by the previous one.
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, 0); // set every bit to false
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, -1); // set every bit to true
Now we can use std::fill on the underlying words themselves (starting at elem_ptrs[0], which assumes the words are stored contiguously), rather than on the vector of bits.
Some may feel uncomfortable using the proxy class from outside and even removing its constness.
But if you don't care about that and want something fast, this is the fastest way I found.
I did some comparisons below (new project, default configuration, Release, x64):
int it_max = 10; // do it 10 times ...
int num_bits = std::numeric_limits<int>::max(); // 2147483647
std::vector<bool> bit_set(num_bits, true);
for (int it_count = 0; it_count < it_max; ++it_count)
{
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, 0);
} // Elapsed time: 0.397 sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
std::fill(bit_set.begin(), bit_set.end(), false);
} // Elapsed time: 18.734 sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
for (int i = 0; i < num_bits; ++i)
{
bit_set[i] = false;
}
} // Elapsed time: 21.498 sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
bit_set.assign(num_bits, false);
} // Elapsed time: 21.779 sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
std::vector<bool>(num_bits, false).swap(bit_set); // You can no longer use elem_ptrs
} // Elapsed time: 1.3 sec
There is one caveat: when you swap() the original vector with another one, the vector of pointers becomes useless!
23.2.5 Class vector from the C++ International Standard goes so far as to tell us
To optimize space allocation, a specialization of vector for bool elements is provided:
after which the vector<bool> specialization is given. That's as far as the standard goes regarding vector<bool>: vendors are expected to implement it as a bitset to optimize for space. Optimizing for space comes at a cost here, since it is not optimized for speed.
It's easier to get a book from a library than it would be to find one if all the library's books were stapled tightly together in containers...
Take your example: you're trying to do a std::fill or std::copy from begin to end. But that's not always the case; sometimes the range doesn't map neatly onto whole bytes, which is a bit of a problem in terms of speed optimization. It's easy when you have to set every bit to one (that's just setting the bytes to 0xFF), but it becomes much harder if you only change certain bits of a byte. Then you need to actually compute what the byte will be; that's not a trivial thing to do*, or at least not as an atomic operation on current hardware.
It's the premature optimization story, it's nice in terms of space but horrible in terms of performance.
Is having an "is a multiple of 8 bits" check worth the overhead? I doubt it.
* We're talking about multiple bits here; for a single bit you can of course use a bit operation.
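If the number of bits is known at compile time, std::bitset is an alternative with whole-container set() and reset() members; how they work internally (for instance, word by word) is up to the implementation. A sketch with a hypothetical fill_bits helper:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Sets (value == true) or clears (value == false) every bit in one call.
// The size N is fixed at compile time, unlike std::vector<bool>.
template <std::size_t N>
void fill_bits(std::bitset<N>& bits, bool value) {
    if (value) {
        bits.set();    // all bits become 1
    } else {
        bits.reset();  // all bits become 0
    }
}
```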

c++ variable declaration

I'm wondering if this code:
int main(){
int p;
for(int i = 0; i < 10; i++){
p = ...;
}
return 0;
}
is exactly the same as that one
int main(){
for(int i = 0; i < 10; i++){
int p = ...;
}
return 0;
}
in terms of efficiency?
I mean, will the p variable be recreated 10 times in the second example?
It's the same in terms of efficiency.
It's not the same in terms of readability. The second is better in this aspect, isn't it?
It's a semantic difference which the code keeps hidden because it's not making a difference for int, but it makes a difference to the human reader. Do you want to carry the value of whatever calculation you do in ... outside of the loop? You don't, so you should write code that reflects your intention.
A human reader will need to search the function for other uses of p to convince themselves that what you did was just premature "optimization" and had no deeper purpose.
Assuming it does make a difference for the type you use, you can help the human reader by commenting your code:
/* p is only used inside the for-loop, to keep it from reallocating */
std::vector<int> p;
p.reserve(10);
for(int i = 0; i < 10; i++){
p.clear();
/* ... */
}
In this case, it's the same. Use the smallest scope possible for the most readable code.
If int were a class with a significant constructor and destructor, then the first (declaring it outside the loop) can be a significant savings - but inside you usually need to recreate the state anyway... so oftentimes it ends up being no savings at all.
One instance where it might make a difference is containers. A string or vector uses internal storage that grows to fit the data it stores. You may not want to reconstruct such a container each time through the loop; instead, just clear its contents, and it may not need as many reallocations inside the loop. This can (in some cases) result in a significant performance improvement.
The bottom-line is write it clearly, and if profiling shows it matters, move it out :)
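A sketch of the container case, with hypothetical names: the scratch vector is declared once outside the loop, reserve() sizes it up front, and clear() empties it each iteration while leaving its capacity unchanged, so push_back does not reallocate inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds `rows` rows of `width` values and returns the total of all values.
// `row` is reused across iterations instead of being reconstructed each time.
long long sum_rows(int rows, int width) {
    assert(rows >= 0 && width >= 0);
    std::vector<int> row;
    row.reserve(static_cast<std::size_t>(width));  // one allocation up front
    long long total = 0;
    for (int r = 0; r < rows; ++r) {
        row.clear();  // size becomes 0, capacity is kept
        for (int c = 0; c < width; ++c) {
            row.push_back(r * width + c);  // placeholder computation
        }
        for (int value : row) {
            total += value;
        }
    }
    return total;
}
```

For a built-in type like int there is nothing to gain from this, which is why the answers above recommend the narrowest scope by default.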
They are equal in terms of efficiency - you should trust your compiler to get rid of the immeasurably small difference. The second is better design.
Edit: This isn't necessarily true for custom types, especially those that manage memory. If I were writing a loop for an arbitrary T, I'd use the first form just in case. But if I know it's a built-in type, like int, a pointer, char, float, bool, etc., I'd go for the second.
In the second example, p is visible only inside the for loop; you cannot use it later in your code.
In terms of efficiency they are equal.