Are std::fill, std::copy specialized for std::vector<bool>? - c++

When thinking about this question I start to wondering if std::copy() and/or std::fill are specialized (I really mean optimized) for std::vector<bool>.
Is this required by C++ standard or, perhaps, it is common approach by C++ std library vendors?
Simple speaking, I wonder to know if the following code:
std::vector<bool> v(10, false);
std::fill(v.begin(), v.end(), true);
is in any way better/different than that:
std::vector<bool> v(10, false);
for (auto it = v.begin(); it != v.end(); ++it) *it = true;
To be very strict - can, let say: std::fill<std::vector<bool>::iterator>() go into internal representation of std::vector<bool> and sets their entire bytes instead of single bits? I assume making std::fill friend of std::vector<bool> is not a big problem for library vendor?
[UPDATE]
Next related question: can I (or anybody else :) specialize such algorithms for let say std::vector<bool>, if not already specialized? Is this allowed by C++ standard? I know this will be non portable - but just for one selected std C++ library? Assuming I (or anybody else) find a way to get to std::vector<bool> private parts.

STD is headers only library and it is shipped with your compiler. You can look into those headers yourself. For GCC's vector<bool> impelemtation is in stl_bvector.h. It probably will be the same file for other compilers too. And yes, there is specialized fill (look near __fill_bvector).

Optimizations are nowhere mandated in the standard. It is assumed to be a "quality of implementation" issue if an optimization could applied. The asymptotic complexity of most algorithms is, however, restricted.
Optimizations are allowed as long as a correct program behaves according to what the standard mandates. The examples you ask about, i.e., optimizations involving standard algorithms using iterators on std::vector<bool>, can achieve their objective pretty much in any way the implementation sees fit because there is no way to monitor how they are implemented. This said, I doubt very much that there is any standard library implementation optimizing operations on std::vector<bool>. Most people seem to think that this specialization is an abomination in the first place and that it should go away.
A user is only allowed to create specializations of library types if the specialization involves at least one user defined type. I don't think a user is allowed to provide any function in namespace std at all: There isn't any needs because all such functions would involve a user defined type and would, thus, be found in the user's namespace. Formulated differently: I think you are out of luck with respect to getting algoritms optimized for std::vector<bool> for the time being. You might consider contributing optimized versions to the open source implementations (e.g., libstdc++ and libc++), however.

There is no specialization for it, but you can still use it. (even though it's slow)
But here is a trick I found which enables std::fill on std::vector<bool>, using proxy class std::_Vbase.
(WARNING: I've tested it only for MSVC2013, so it may not work on other compilers.)
int num_bits = 100000;
std::vector<bool> bit_set(num_bits , true);
int bitsize_elem = sizeof(std::_Vbase) * 8; // 1byte = 8bits
int num_elems = static_cast<int>(std::ceil(num_bits / static_cast<double>(bitsize_elem)));
Here, since you need the whole bits of an element if you use any bit of it, the number of elements must be rounded up.
Using this information, we will build a vector of pointers that pointing the original elements underlying the bits.
std::vector<std::_Vbase*> elem_ptrs(num_elems, nullptr);
std::vector<bool>::iterator bitset_iter = bit_set.begin();
for (int i = 0; i < num_elems; ++i)
{
std::_Vbase* elem_ptr = const_cast<std::_Vbase*>((*bitset_iter)._Myptr);
elem_ptrs[i] = elem_ptr;
std::advance(bitset_iter, bitsize_elem);
}
(*bitset_iter)._Myptr : By dereferencing the iterator of std::vector<bool>, you can access the proxy class reference and its member _Myptr.
Since the return type of std::vector<bool>::iterator::operator*() is const std::_Vbase*, remove the constness of it by const_cast.
Now we get the pointer which is pointing the original element underlying those bits, std::_Vbase* elem_ptr.
elem_ptrs[i] = elem_ptr : Record this pointer,...
std::advance(bitset_iter, bitsize_elem) : ...and continue our journey to find the next element, by jumping bits held by the previous element.
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, 0); // fill every bits "false"
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, -1); // fill every bits "true"
Now, we can use std::fill on the vector of pointers, rather than vector of bits.
Perhaps some may feel uncomfortable using the proxy class externally and even remove the constness of it.
But if you don't care about that and want something fast, this is the fastest way.
I did some comparisons below. (made new project, nothing changed config, release, x64)
int it_max = 10; // do it 10 times ...
int num_bits = std::numeric_limits<int>::max(); // 2147483647
std::vector<bool> bit_set(num_bits, true);
for (int it_count = 0; it_count < it_max; ++it_count)
{
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, 0);
} // Elapse Time : 0.397sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
std::fill(bit_set.begin(), bit_set.end(), false);
} // Elapse Time : 18.734sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
for (int i = 0; i < num_bits; ++i)
{
bit_set[i] = false;
}
} // Elapse Time : 21.498sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
bit_set.assign(num_bits, false);
} // Elapse Time : 21.779sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
bit_set.swap(std::vector<bool>(num_bits, false)); // You can not use elem_ptrs anymore
} // Elapse Time : 1.3sec
There is one caveat. When you swap() the original vector with another one, then the vector of pointers becomes useless!

23.2.5 Class vector from the C++ International Standard goes as far as to tell us
To optimize space allocation, a specialization of vector for bool elements is provided:
after which the bitset specialization is provided. That's as far as the standard goes regarding vector<bool>, vendors need to implement it using a bitset to optimize for space. Optimizing for space comes with a cost here, as to not optimize for speed.
It's easier to get a book from the library than it is to find a book if it were between all the library books stapled closely together in containers....
Take your example, you're trying to do a std::fill or std::copy from begin to end. But that's not always the case, sometimes it doen't just simply map to an entire byte. So, that's a bit of a problem in terms of speed optimization. It's easy for the case you'd have to change every bit to one, that's just changing the bytes to 0xF, but that's not the case here; it becomes much harder if you were to only changes certain bits of a byte. Then you'll need to actually compute what the byte will be; that's not a trivial thing to do*, or at least not as an atomic operation on current hardware.
It's the premature optimization story, it's nice in terms of space but horrible in terms of performance.
Is having a "is a multiple of 8 bits" check worth the overhead? I doubt it.
* We're talking about multiple bits here, for the case it's just one bit you can of course do a bit operation.

Related

Which is the best way to index through a for loop?

I am trying to do a product operand on the values inside of a vector. It is a huge mess of code.. I have posted it previously but no one was able to help. I just wanna confirm which is the correct way to do a single part of it. I currently have:
vector<double> taylorNumerator;
for(a = 0; a <= (constant); a++) {
double Number = equation involving a to get numerous values;
taylorNumerator.push_back(Number);
for(b = 0; b <= (constant); b++) {
double NewNumber *= taylorNumerator[b];
}
This is what I have as a snapshot, it is very short from what I actually have. Someone told me it is better to do vector.at(index) instead. Which is the correct or best way to accomplish this? If you so desire I can paste all of the code, it works but the values I get are wrong.
When possible, you should probably avoid using indexes at all. Your options are:
A range-based for loop:
for (auto numerator : taylorNumerators) { ... }
An iterator-based loop:
for (auto it = taylorNumerators.begin(); it != taylorNuemrators.end(); ++it) { ... }
A standard algorithm, perhaps with a lambda:
#include <algorithm>
std::for_each(taylorNumerators, [&](double numerator) { ... });
In particular, note that some algorithms let you specify a number of iterations, like std::generate_n, so you can create exactly n items without counting to n yourself.
If you need the index in the calculation, then it can be appropriate to use a traditional for loop. You have to watch for a couple pitfalls: std::vector<T>::size() returns a std::vector<T>::size_type which is typically identical to std::size_type, which is (1) unsigned and (2) quite possibly larger than an int.
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) { ... }
Your calculations probably deal with doubles or some numerical type other than std::size_t, so you have to consider the best way to convert it. Many programmers would rely on implicit conversions, but that can be dangerous unless you know the conversion rules very well. I'd generally start by doing a static cast of the index to the type I actually need. For example:
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) {
const auto x = static_cast<double>(i);
/* calculation involving x */
}
In C++, it's probably far more common to make sure the index is in range and then use operator[] rather than to use at(). Many projects disable exceptions, so the safety guarantee of at() wouldn't really be available. And, if you can check the range once yourself, then it'll be faster to use operator[] than to rely on the range-check built into at() on each index operation.
What you have is fine. Modern compilers can optimize the heck out of the above such that the code is just as fast as the equivalent C code of accessing items direclty.
The only optimization for using vector I recommend is to invoke taylorNumerator.reserve(constant) to allocate the needed storage upfront instead of the vector resizing itself as new items are added.
About the only worthy optimization after that is to not use vector at all and just use a static array - especially if constant is small enough that it doesn't blow up the stack (or binary size if global).
double taylorNumerator[constant];

What is the performance of std::bitset?

I recently asked a question on Programmers regarding reasons to use manual bit manipulation of primitive types over std::bitset.
From that discussion I have concluded that the main reason is its comparatively poorer performance, although I'm not aware of any measured basis for this opinion. So next question is:
what is the performance hit, if any, likely to be incurred by using std::bitset over bit-manipulation of a primitive?
The question is intentionally broad, because after looking online I haven't been able to find anything, so I'll take what I can get. Basically I'm after a resource that provides some profiling of std::bitset vs 'pre-bitset' alternatives to the same problems on some common machine architecture using GCC, Clang and/or VC++. There is a very comprehensive paper which attempts to answer this question for bit vectors:
http://www.cs.up.ac.za/cs/vpieterse/pub/PieterseEtAl_SAICSIT2010.pdf
Unfortunately, it either predates or considered out of scope std::bitset, so it focuses on vectors/dynamic array implementations instead.
I really just want to know whether std::bitset is better than the alternatives for the use cases it is intended to solve. I already know that it is easier and clearer than bit-fiddling on an integer, but is it as fast?
Update
It's been ages since I posted this one, but:
I already know that it is easier and clearer than bit-fiddling on an
integer, but is it as fast?
If you are using bitset in a way that does actually make it clearer and cleaner than bit-fiddling, like checking for one bit at a time instead of using a bit mask, then inevitably you lose all those benefits that bitwise operations provide, like being able to check to see if 64 bits are set at one time against a mask, or using FFS instructions to quickly determine which bit is set among 64-bits.
I'm not sure that bitset incurs a penalty to use in all ways possible (ex: using its bitwise operator&), but if you use it like a fixed-size boolean array which is pretty much the way I always see people using it, then you generally lose all those benefits described above. We unfortunately can't get that level of expressiveness of just accessing one bit at a time with operator[] and have the optimizer figure out all the bitwise manipulations and FFS and FFZ and so forth going on for us, at least not since the last time I checked (otherwise bitset would be one of my favorite structures).
Now if you are going to use bitset<N> bits interchangeably with like, say, uint64_t bits[N/64] as in accessing both the same way using bitwise operations, it might be on par (haven't checked since this ancient post). But then you lose many of the benefits of using bitset in the first place.
for_each method
In the past I got into some misunderstandings, I think, when I proposed a for_each method to iterate through things like vector<bool>, deque, and bitset. The point of such a method is to utilize the internal knowledge of the container to iterate through elements more efficiently while invoking a functor, just as some associative containers offer a find method of their own instead of using std::find to do a better than linear-time search.
For example, you can iterate through all set bits of a vector<bool> or bitset if you had internal knowledge of these containers by checking for 64 elements at a time using a 64-bit mask when 64 contiguous indices are occupied, and likewise use FFS instructions when that's not the case.
But an iterator design having to do this type of scalar logic in operator++ would inevitably have to do something considerably more expensive, just by the nature in which iterators are designed in these peculiar cases. bitset lacks iterators outright and that often makes people wanting to use it to avoid dealing with bitwise logic to use operator[] to check each bit individually in a sequential loop that just wants to find out which bits are set. That too is not nearly as efficient as what a for_each method implementation could do.
Double/Nested Iterators
Another alternative to the for_each container-specific method proposed above would be to use double/nested iterators: that is, an outer iterator which points to a sub-range of a different type of iterator. Client code example:
for (auto outer_it = bitset.nbegin(); outer_it != bitset.nend(); ++outer_it)
{
for (auto inner_it = outer_it->first; inner_it != outer_it->last; ++inner_it)
// do something with *inner_it (bit index)
}
While not conforming to the flat type of iterator design available now in standard containers, this can allow some very interesting optimizations. As an example, imagine a case like this:
bitset<64> bits = 0x1fbf; // 0b1111110111111;
In that case, the outer iterator can, with just a few bitwise iterations ((FFZ/or/complement), deduce that the first range of bits to process would be bits [0, 6), at which point we can iterate through that sub-range very cheaply through the inner/nested iterator (it would just increment an integer, making ++inner_it equivalent to just ++int). Then when we increment the outer iterator, it can then very quickly, and again with a few bitwise instructions, determine that the next range would be [7, 13). After we iterate through that sub-range, we're done. Take this as another example:
bitset<16> bits = 0xffff;
In such a case, the first and last sub-range would be [0, 16), and the bitset could determine that with a single bitwise instruction at which point we can iterate through all set bits and then we're done.
This type of nested iterator design would map particularly well to vector<bool>, deque, and bitset as well as other data structures people might create like unrolled lists.
I say that in a way that goes beyond just armchair speculation, since I have a set of data structures which resemble the likes of deque which are actually on par with sequential iteration of vector (still noticeably slower for random-access, especially if we're just storing a bunch of primitives and doing trivial processing). However, to achieve the comparable times to vector for sequential iteration, I had to use these types of techniques (for_each method and double/nested iterators) to reduce the amount of processing and branching going on in each iteration. I could not rival the times otherwise using just the flat iterator design and/or operator[]. And I'm certainly not smarter than the standard library implementers but came up with a deque-like container which can be sequentially iterated much faster, and that strongly suggests to me that it's an issue with the standard interface design of iterators in this case which come with some overhead in these peculiar cases that the optimizer cannot optimize away.
Old Answer
I'm one of those who would give you a similar performance answer, but I'll try to give you something a bit more in-depth than "just because". It is something I came across through actual profiling and timing, not merely distrust and paranoia.
One of the biggest problems with bitset and vector<bool> is that their interface design is "too convenient" if you want to use them like an array of booleans. Optimizers are great at obliterating all that structure you establish to provide safety, reduce maintenance cost, make changes less intrusive, etc. They do an especially fine job with selecting instructions and allocating the minimal number of registers to make such code run as fast as the not-so-safe, not-so-easy-to-maintain/change alternatives.
The part that makes the bitset interface "too convenient" at the cost of efficiency is the random-access operator[] as well as the iterator design for vector<bool>. When you access one of these at index n, the code has to first figure out which byte the nth bit belongs to, and then the sub-index to the bit within that. That first phase typically involves a division/rshifts against an lvalue along with modulo/bitwise and which is more costly than the actual bit operation you're trying to perform.
The iterator design for vector<bool> faces a similar awkward dilemma where it either has to branch into different code every 8+ times you iterate through it or pay that kind of indexing cost described above. If the former is done, it makes the logic asymmetrical across iterations, and iterator designs tend to take a performance hit in those rare cases. To exemplify, if vector had a for_each method of its own, you could iterate through, say, a range of 64 elements at once by just masking the bits against a 64-bit mask for vector<bool> if all the bits are set without checking each bit individually. It could even use FFS to figure out the range all at once. An iterator design would tend to inevitably have to do it in a scalar fashion or store more state which has to be redundantly checked every iteration.
For random access, optimizers can't seem to optimize away this indexing overhead to figure out which byte and relative bit to access (perhaps a bit too runtime-dependent) when it's not needed, and you tend to see significant performance gains with that more manual code processing bits sequentially with advanced knowledge of which byte/word/dword/qword it's working on. It's somewhat of an unfair comparison, but the difficulty with std::bitset is that there's no way to make a fair comparison in such cases where the code knows what byte it wants to access in advance, and more often than not, you tend to have this info in advance. It's an apples to orange comparison in the random-access case, but you often only need oranges.
Perhaps that wouldn't be the case if the interface design involved a bitset where operator[] returned a proxy, requiring a two-index access pattern to use. For example, in such a case, you would access bit 8 by writing bitset[0][6] = true; bitset[0][7] = true; with a template parameter to indicate the size of the proxy (64-bits, e.g.). A good optimizer may be able to take such a design and make it rival the manual, old school kind of way of doing the bit manipulation by hand by translating that into: bitset |= 0x60;
Another design that might help is if bitsets provided a for_each_bit kind of method, passing a bit proxy to the functor you provide. That might actually be able to rival the manual method.
std::deque has a similar interface problem. Its performance shouldn't be that much slower than std::vector for sequential access. Yet unfortunately we access it sequentially using operator[] which is designed for random access or through an iterator, and the internal rep of deques simply don't map very efficiently to an iterator-based design. If deque provided a for_each kind of method of its own, then there it could potentially start to get a lot closer to std::vector's sequential access performance. These are some of the rare cases where that Sequence interface design comes with some efficiency overhead that optimizers often can't obliterate. Often good optimizers can make convenience come free of runtime cost in a production build, but unfortunately not in all cases.
Sorry!
Also sorry, in retrospect I wandered a bit with this post talking about vector<bool> and deque in addition to bitset. It's because we had a codebase where the use of these three, and particularly iterating through them or using them with random-access, were often hotspots.
Apples to Oranges
As emphasized in the old answer, comparing straightforward usage of bitset to primitive types with low-level bitwise logic is comparing apples to oranges. It's not like bitset is implemented very inefficiently for what it does. If you genuinely need to access a bunch of bits with a random access pattern which, for some reason or other, needs to check and set just one bit a time, then it might be ideally implemented for such a purpose. But my point is that almost all use cases I've encountered didn't require that, and when it's not required, the old school way involving bitwise operations tends to be significantly more efficient.
Did a short test profiling std::bitset vs bool arrays for sequential and random access - you can too:
#include <iostream>
#include <bitset>
#include <cstdlib> // rand
#include <ctime> // timer
inline unsigned long get_time_in_ms()
{
return (unsigned long)((double(clock()) / CLOCKS_PER_SEC) * 1000);
}
void one_sec_delay()
{
unsigned long end_time = get_time_in_ms() + 1000;
while(get_time_in_ms() < end_time)
{
}
}
int main(int argc, char **argv)
{
srand(get_time_in_ms());
using namespace std;
bitset<5000000> bits;
bool *bools = new bool[5000000];
unsigned long current_time, difference1, difference2;
double total;
one_sec_delay();
total = 0;
current_time = get_time_in_ms();
for (unsigned int num = 0; num != 200000000; ++num)
{
bools[rand() % 5000000] = rand() % 2;
}
difference1 = get_time_in_ms() - current_time;
current_time = get_time_in_ms();
for (unsigned int num2 = 0; num2 != 100; ++num2)
{
for (unsigned int num = 0; num != 5000000; ++num)
{
total += bools[num];
}
}
difference2 = get_time_in_ms() - current_time;
cout << "Bool:" << endl << "sum total = " << total << ", random access time = " << difference1 << ", sequential access time = " << difference2 << endl << endl;
one_sec_delay();
total = 0;
current_time = get_time_in_ms();
for (unsigned int num = 0; num != 200000000; ++num)
{
bits[rand() % 5000000] = rand() % 2;
}
difference1 = get_time_in_ms() - current_time;
current_time = get_time_in_ms();
for (unsigned int num2 = 0; num2 != 100; ++num2)
{
for (unsigned int num = 0; num != 5000000; ++num)
{
total += bits[num];
}
}
difference2 = get_time_in_ms() - current_time;
cout << "Bitset:" << endl << "sum total = " << total << ", random access time = " << difference1 << ", sequential access time = " << difference2 << endl << endl;
delete [] bools;
cin.get();
return 0;
}
Please note: the outputting of the sum total is necessary so the compiler doesn't optimise out the for loop - which some do if the result of the loop isn't used.
Under GCC x64 with the following flags: -O2;-Wall;-march=native;-fomit-frame-pointer;-std=c++11;
I get the following results:
Bool array:
random access time = 4695, sequential access time = 390
Bitset:
random access time = 5382, sequential access time = 749
Not a great answer here, but rather a related anecdote:
A few years ago I was working on real-time software and we ran into scheduling problems. There was a module which was way over time-budget, and this was very surprising because the module was only responsible for some mapping and packing/unpacking of bits into/from 32-bit words.
It turned out that the module was using std::bitset. We replaced this with manual operations and the execution time decreased from 3 milliseconds to 25 microseconds. That was a significant performance issue and a significant improvement.
The point is, the performance issues caused by this class can be very real.
In addition to what the other answers said about the performance of access, there may also be a significant space overhead: Typical bitset<> implementations simply use the longest integer type to back their bits. Thus, the following code
#include <bitset>
#include <stdio.h>
struct Bitfield {
unsigned char a:1, b:1, c:1, d:1, e:1, f:1, g:1, h:1;
};
struct Bitset {
std::bitset<8> bits;
};
int main() {
printf("sizeof(Bitfield) = %zd\n", sizeof(Bitfield));
printf("sizeof(Bitset) = %zd\n", sizeof(Bitset));
printf("sizeof(std::bitset<1>) = %zd\n", sizeof(std::bitset<1>));
}
produces the following output on my machine:
sizeof(Bitfield) = 1
sizeof(Bitset) = 8
sizeof(std::bitset<1>) = 8
As you see, my compiler allocates a whopping 64 bits to store a single one, with the bitfield approach, I only need to round up to eight bits.
This factor eight in space usage can become important if you have a lot of small bitsets.
Rhetorical question: Why std::bitset is written in that inefficacy way?
Answer: It is not.
Another rhetorical question: What is difference between:
std::bitset<128> a = src;
a[i] = true;
a = a << 64;
and
std::bitset<129> a = src;
a[i] = true;
a = a << 63;
Answer: 50 times difference in performance http://quick-bench.com/iRokweQ6JqF2Il-T-9JSmR0bdyw
You need be very careful what you ask for, bitset support lot of things but each have it own cost. With correct handling you will have exactly same behavior as raw code:
void f(std::bitset<64>& b, int i)
{
b |= 1L << i;
b = b << 15;
}
void f(unsigned long& b, int i)
{
b |= 1L << i;
b = b << 15;
}
Both generate same assembly: https://godbolt.org/g/PUUUyd (64 bit GCC)
Another thing is that bitset is more portable but this have cost too:
void h(std::bitset<64>& b, unsigned i)
{
b = b << i;
}
void h(unsigned long& b, unsigned i)
{
b = b << i;
}
If i > 64 then bit set will be zero and in case of unsigned we have UB.
void h(std::bitset<64>& b, unsigned i)
{
if (i < 64) b = b << i;
}
void h(unsigned long& b, unsigned i)
{
if (i < 64) b = b << i;
}
With check preventing UB both generate same code.
Another place is set and [], first one is safe and mean you will never get UB but this will cost you a branch. [] have UB if you use wrong value but is fast as using var |= 1L<< i;. Of corse if std::bitset do not need have more bits than biggest int available on system because other wise you need split value to get correct element in internal table. This mean for std::bitset<N> size N is very important for performance. If is bigger or smaller than optimal one you will pay cost of it.
Overall I find that best way is use something like that:
constexpr size_t minBitSet = sizeof(std::bitset<1>)*8;
template<size_t N>
using fasterBitSet = std::bitset<minBitSet * ((N + minBitSet - 1) / minBitSet)>;
This will remove cost of trimming exceeding bits: http://quick-bench.com/Di1tE0vyhFNQERvucAHLaOgucAY

Iterating over a vector in C++ [duplicate]

Take the following two lines of code:
for (int i = 0; i < some_vector.size(); i++)
{
//do stuff
}
And this:
for (some_iterator = some_vector.begin(); some_iterator != some_vector.end();
some_iterator++)
{
//do stuff
}
I'm told that the second way is preferred. Why exactly is this?
The first form is efficient only if vector.size() is a fast operation. This is true for vectors, but not for lists, for example. Also, what are you planning to do within the body of the loop? If you plan on accessing the elements as in
T elem = some_vector[i];
then you're making the assumption that the container has operator[](std::size_t) defined. Again, this is true for vector but not for other containers.
The use of iterators bring you closer to container independence. You're not making assumptions about random-access ability or fast size() operation, only that the container has iterator capabilities.
You could enhance your code further by using standard algorithms. Depending on what it is you're trying to achieve, you may elect to use std::for_each(), std::transform() and so on. By using a standard algorithm rather than an explicit loop you're avoiding re-inventing the wheel. Your code is likely to be more efficient (given the right algorithm is chosen), correct and reusable.
It's part of the modern C++ indoctrination process. Iterators are the only way to iterate most containers, so you use it even with vectors just to get yourself into the proper mindset. Seriously, that's the only reason I do it - I don't think I've ever replaced a vector with a different kind of container.
Wow, this is still getting downvoted after three weeks. I guess it doesn't pay to be a little tongue-in-cheek.
I think the array index is more readable. It matches the syntax used in other languages, and the syntax used for old-fashioned C arrays. It's also less verbose. Efficiency should be a wash if your compiler is any good, and there are hardly any cases where it matters anyway.
Even so, I still find myself using iterators frequently with vectors. I believe the iterator is an important concept, so I promote it whenever I can.
because you are not tying your code to the particular implementation of the some_vector list. if you use array indices, it has to be some form of array; if you use iterators you can use that code on any list implementation.
Imagine some_vector is implemented with a linked-list. Then requesting an item in the i-th place requires i operations to be done to traverse the list of nodes. Now, if you use iterator, generally speaking, it will make its best effort to be as efficient as possible (in the case of a linked list, it will maintain a pointer to the current node and advance it in each iteration, requiring just a single operation).
So it provides two things:
Abstraction of use: you just want to iterate some elements, you don't care about how to do it
Performance
I'm going to be the devils advocate here, and not recommend iterators. The main reason why, is all the source code I've worked on from Desktop application development to game development have i nor have i needed to use iterators. All the time they have not been required and secondly the hidden assumptions and code mess and debugging nightmares you get with iterators make them a prime example not to use it in any applications that require speed.
Even from a maintence stand point they're a mess. Its not because of them but because of all the aliasing that happen behind the scene. How do i know that you haven't implemented your own virtual vector or array list that does something completely different to the standards. Do i know what type is currently now during runtime? Did you overload a operator I didn't have time to check all your source code. Hell do i even know what version of the STL your using?
The next problem you got with iterators is leaky abstraction, though there are numerous web sites that discuss this in detail with them.
Sorry, I have not and still have not seen any point in iterators. If they abstract the list or vector away from you, when in fact you should know already what vector or list your dealing with if you don't then your just going to be setting yourself up for some great debugging sessions in the future.
You might want to use an iterator if you are going to add/remove items to the vector while you are iterating over it.
some_iterator = some_vector.begin();
while (some_iterator != some_vector.end())
{
if (/* some condition */)
{
some_iterator = some_vector.erase(some_iterator);
// some_iterator now positioned at the element after the deleted element
}
else
{
if (/* some other condition */)
{
some_iterator = some_vector.insert(some_iterator, some_new_value);
// some_iterator now positioned at new element
}
++some_iterator;
}
}
If you were using indices you would have to shuffle items up/down in the array to handle the insertions and deletions.
Separation of Concerns
It's very nice to separate the iteration code from the 'core' concern of the loop. It's almost a design decision.
Indeed, iterating by index ties you to the implementation of the container. Asking the container for a begin and end iterator, enables the loop code for use with other container types.
Also, in the std::for_each way, you TELL the collection what to do, instead of ASKing it something about its internals
The 0x standard is going to introduce closures, which will make this approach much more easy to use - have a look at the expressive power of e.g. Ruby's [1..6].each { |i| print i; }...
Performance
But maybe a much overseen issue is that, using the for_each approach yields an opportunity to have the iteration parallelized - the intel threading blocks can distribute the code block over the number of processors in the system!
Note: after discovering the algorithms library, and especially foreach, I went through two or three months of writing ridiculously small 'helper' operator structs which will drive your fellow developers crazy. After this time, I went back to a pragmatic approach - small loop bodies deserve no foreach no more :)
A must read reference on iterators is the book "Extended STL".
The GoF have a tiny little paragraph in the end of the Iterator pattern, which talks about this brand of iteration; it's called an 'internal iterator'. Have a look here, too.
Because it is more object-oriented. if you are iterating with an index you are assuming:
a) that those objects are ordered
b) that those objects can be obtained by an index
c) that the index increment will hit every item
d) that that index starts at zero
With an iterator, you are saying "give me everything so I can work with it" without knowing what the underlying implementation is. (In Java, there are collections that cannot be accessed through an index)
Also, with an iterator, no need to worry about going out of bounds of the array.
Another nice thing about iterators is that they better allow you to express (and enforce) your const-preference. This example ensures that you will not be altering the vector in the midst of your loop:
for(std::vector<Foo>::const_iterator pos=foos.begin(); pos != foos.end(); ++pos)
{
// Foo & foo = *pos; // this won't compile
const Foo & foo = *pos; // this will compile
}
Aside from all of the other excellent answers... int may not be large enough for your vector. Instead, if you want to use indexing, use the size_type for your container:
for (std::vector<Foo>::size_type i = 0; i < myvector.size(); ++i)
{
Foo& this_foo = myvector[i];
// Do stuff with this_foo
}
I probably should point out you can also call
std::for_each(some_vector.begin(), some_vector.end(), &do_stuff);
STL iterators are mostly there so that the STL algorithms like sort can be container independent.
If you just want to loop over all the entries in a vector just use the index loop style.
It is less typing and easier to parse for most humans. It would be nice if C++ had a simple foreach loop without going overboard with template magic.
for( size_t i = 0; i < some_vector.size(); ++i )
{
T& rT = some_vector[i];
// now do something with rT
}
'
I don't think it makes much difference for a vector. I prefer to use an index myself as I consider it to be more readable and you can do random access like jumping forward 6 items or jumping backwards if needs be.
I also like to make a reference to the item inside the loop like this so there are not a lot of square brackets around the place:
for(size_t i = 0; i < myvector.size(); i++)
{
MyClass &item = myvector[i];
// Do stuff to "item".
}
Using an iterator can be good if you think you might need to replace the vector with a list at some point in the future and it also looks more stylish to the STL freaks but I can't think of any other reason.
The second form represents what you're doing more accurately. In your example, you don't care about the value of i, really - all you want is the next element in the iterator.
After having learned a little more on the subject of this answer, I realize it was a bit of an oversimplification. The difference between this loop:
for (some_iterator = some_vector.begin(); some_iterator != some_vector.end();
some_iterator++)
{
//do stuff
}
And this loop:
for (int i = 0; i < some_vector.size(); i++)
{
//do stuff
}
Is fairly minimal. In fact, the syntax of doing loops this way seems to be growing on me:
while (it != end){
//do stuff
++it;
}
Iterators do unlock some fairly powerful declarative features, and when combined with the STL algorithms library you can do some pretty cool things that are outside the scope of array index administrivia.
Indexing requires an extra mul operation. For example, for vector<int> v, the compiler converts v[i] into &v + sizeof(int) * i.
During iteration you don't need to know number of item to be processed. You just need the item and iterators do such things very good.
No one mentioned yet that one advantage of indices is that they are not become invalid when you append to a contiguous container like std::vector, so you can add items to the container during iteration.
This is also possible with iterators, but you must call reserve(), and therefore need to know how many items you'll append.
If you have access to C++11 features, then you can also use a range-based for loop for iterating over your vector (or any other container) as follows:
for (auto &item : some_vector)
{
//do stuff
}
The benefit of this loop is that you can access elements of the vector directly via the item variable, without running the risk of messing up an index or making a making a mistake when dereferencing an iterator. In addition, the placeholder auto prevents you from having to repeat the type of the container elements,
which brings you even closer to a container-independent solution.
Notes:
If you need the the element index in your loop and the operator[] exists for your container (and is fast enough for you), then better go for your first way.
A range-based for loop cannot be used to add/delete elements into/from a container. If you want to do that, then better stick to the solution given by Brian Matthews.
If you don't want to change the elements in your container, then you should use the keyword const as follows: for (auto const &item : some_vector) { ... }.
Several good points already. I have a few additional comments:
Assuming we are talking about the C++ standard library, "vector" implies a random access container that has the guarantees of C-array (random access, contiguos memory layout etc). If you had said 'some_container', many of the above answers would have been more accurate (container independence etc).
To eliminate any dependencies on compiler optimization, you could move some_vector.size() out of the loop in the indexed code, like so:
const size_t numElems = some_vector.size();
for (size_t i = 0; i
Always pre-increment iterators and treat post-increments as exceptional cases.
for (some_iterator = some_vector.begin(); some_iterator != some_vector.end(); ++some_iterator){ //do stuff }
So assuming and indexable std::vector<> like container, there is no good reason to prefer one over other, sequentially going through the container. If you have to refer to older or newer elemnent indexes frequently, then the indexed version is more appropropriate.
In general, using the iterators is preferred because algorithms make use of them and behavior can be controlled (and implicitly documented) by changing the type of the iterator. Array locations can be used in place of iterators, but the syntactical difference will stick out.
I don't use iterators for the same reason I dislike foreach-statements. When having multiple inner-loops it's hard enough to keep track of global/member variables without having to remember all the local values and iterator-names as well. What I find useful is to use two sets of indices for different occasions:
for(int i=0;i<anims.size();i++)
for(int j=0;j<bones.size();j++)
{
int animIndex = i;
int boneIndex = j;
// in relatively short code I use indices i and j
... animation_matrices[i][j] ...
// in long and complicated code I use indices animIndex and boneIndex
... animation_matrices[animIndex][boneIndex] ...
}
I don't even want to abbreviate things like "animation_matrices[i]" to some random "anim_matrix"-named-iterator for example, because then you can't see clearly from which array this value is originated.
If you like being close to the metal / don't trust their implementation details, don't use iterators.
If you regularly switch out one collection type for another during development, use iterators.
If you find it difficult to remember how to iterate different sorts of collections (maybe you have several types from several different external sources in use), use iterators to unify the means by which you walk over elements. This applies to say switching a linked list with an array list.
Really, that's all there is to it. It's not as if you're going to gain more brevity either way on average, and if brevity really is your goal, you can always fall back on macros.
Even better than "telling the CPU what to do" (imperative) is "telling the libraries what you want" (functional).
So instead of using loops you should learn the algorithms present in stl.
For container independence
I always use array index because many application of mine require something like "display thumbnail image". So I wrote something like this:
some_vector[0].left=0;
some_vector[0].top =0;<br>
for (int i = 1; i < some_vector.size(); i++)
{
some_vector[i].left = some_vector[i-1].width + some_vector[i-1].left;
if(i % 6 ==0)
{
some_vector[i].top = some_vector[i].top.height + some_vector[i].top;
some_vector[i].left = 0;
}
}
Both the implementations are correct, but I would prefer the 'for' loop. As we have decided to use a Vector and not any other container, using indexes would be the best option. Using iterators with Vectors would lose the very benefit of having the objects in continuous memory blocks which help ease in their access.
I felt that none of the answers here explain why I like iterators as a general concept over indexing into containers. Note that most of my experience using iterators doesn't actually come from C++ but from higher-level programming languages like Python.
The iterator interface imposes fewer requirements on consumers of your function, which allows consumers to do more with it.
If all you need is to be able to forward-iterate, the developer isn't limited to using indexable containers - they can use any class implementing operator++(T&), operator*(T) and operator!=(const &T, const &T).
#include <iostream>
template <class InputIterator>
void printAll(InputIterator& begin, InputIterator& end)
{
for (auto current = begin; current != end; ++current) {
std::cout << *current << "\n";
}
}
// elsewhere...
printAll(myVector.begin(), myVector.end());
Your algorithm works for the case you need it - iterating over a vector - but it can also be useful for applications you don't necessarily anticipate:
#include <random>
class RandomIterator
{
private:
std::mt19937 random;
std::uint_fast32_t current;
std::uint_fast32_t floor;
std::uint_fast32_t ceil;
public:
RandomIterator(
std::uint_fast32_t floor = 0,
std::uint_fast32_t ceil = UINT_FAST32_MAX,
std::uint_fast32_t seed = std::mt19937::default_seed
) :
floor(floor),
ceil(ceil)
{
random.seed(seed);
++(*this);
}
RandomIterator& operator++()
{
current = floor + (random() % (ceil - floor));
}
std::uint_fast32_t operator*() const
{
return current;
}
bool operator!=(const RandomIterator &that) const
{
return current != that.current;
}
};
int main()
{
// roll a 1d6 until we get a 6 and print the results
RandomIterator firstRandom(1, 7, std::random_device()());
RandomIterator secondRandom(6, 7);
printAll(firstRandom, secondRandom);
return 0;
}
Attempting to implement a square-brackets operator which does something similar to this iterator would be contrived, while the iterator implementation is relatively simple. The square-brackets operator also makes implications about the capabilities of your class - that you can index to any arbitrary point - which may be difficult or inefficient to implement.
Iterators also lend themselves to decoration. People can write iterators which take an iterator in their constructor and extend its functionality:
template<class InputIterator, typename T>
class FilterIterator
{
private:
InputIterator internalIterator;
public:
FilterIterator(const InputIterator &iterator):
internalIterator(iterator)
{
}
virtual bool condition(T) = 0;
FilterIterator<InputIterator, T>& operator++()
{
do {
++(internalIterator);
} while (!condition(*internalIterator));
return *this;
}
T operator*()
{
// Needed for the first result
if (!condition(*internalIterator))
++(*this);
return *internalIterator;
}
virtual bool operator!=(const FilterIterator& that) const
{
return internalIterator != that.internalIterator;
}
};
template <class InputIterator>
class EvenIterator : public FilterIterator<InputIterator, std::uint_fast32_t>
{
public:
EvenIterator(const InputIterator &internalIterator) :
FilterIterator<InputIterator, std::uint_fast32_t>(internalIterator)
{
}
bool condition(std::uint_fast32_t n)
{
return !(n % 2);
}
};
int main()
{
// Rolls a d20 until a 20 is rolled and discards odd rolls
EvenIterator<RandomIterator> firstRandom(RandomIterator(1, 21, std::random_device()()));
EvenIterator<RandomIterator> secondRandom(RandomIterator(20, 21));
printAll(firstRandom, secondRandom);
return 0;
}
While these toys might seem mundane, it's not difficult to imagine using iterators and iterator decorators to do powerful things with a simple interface - decorating a forward-only iterator of database results with an iterator which constructs a model object from a single result, for example. These patterns enable memory-efficient iteration of infinite sets and, with a filter like the one I wrote above, potentially lazy evaluation of results.
Part of the power of C++ templates is your iterator interface, when applied to the likes of fixed-length C arrays, decays to simple and efficient pointer arithmetic, making it a truly zero-cost abstraction.

The simple task of iterating through an array. Which of these solutions is the most efficient?

Recently, I've been thinking about all the ways that one could iterate through an array and wondered which of these is the most (and least) efficient. I've written a hypothetical problem and five possible solutions.
Problem
Given an int array arr with len number of elements, what would be the most efficient way of assigning an arbitrary number 42 to every element?
Solution 0: The Obvious
for (unsigned i = 0; i < len; ++i)
arr[i] = 42;
Solution 1: The Obvious in Reverse
for (unsigned i = len - 1; i >= 0; --i)
arr[i] = 42;
Solution 2: Address and Iterator
for (unsigned i = 0; i < len; ++i)
{ *arr = 42;
++arr;
}
Solution 3: Address and Iterator in Reverse
for (unsigned i = len; i; --i)
{ *arr = 42;
++arr;
}
Solution 4: Address Madness
int* end = arr + len;
for (; arr < end; ++arr)
*arr = 42;
Conjecture
The obvious solutions are almost always used, but I wonder whether the subscript operator could result in a multiplication instruction, as if it had been written like *(arr + i * sizeof(int)) = 42.
The reverse solutions try to take advantage of how comparing i to 0 instead of len might mitigate a subtraction operation. Because of this, I prefer Solution 3 over Solution 2. Also, I've read that arrays are optimized to be accessed forwards because of how they're stored in the cache, which could present an issue with Solution 1.
I don't see why Solution 4 would be any less efficient than Solution 2. Solution 2 increments the address and the iterator, while Solution 4 only increments the address.
In the end, I'm not sure which of these solutions I prefer. I'm think the answer also varies with the target architecture and optimization settings of your compiler.
Which of these do you prefer, if any?
Just use std::fill.
std::fill(arr, arr + len, 42);
Out of your proposed solutions, on a good compiler, neither should be faster than the others.
The ISO standard doesn't mandate the efficiency of the different ways of doing things in code (other than certain big-O type stuff for some collection algorithms), it simply mandates how it functions.
Unless your arrays are billions of elements in size, or you're wanting to set them millions of times per minute, it generally won't make the slightest difference which method you use.
If you really want to know (and I still maintain it's almost certainly unnecessary), you should benchmark the various methods in the target environment. Measure, don't guess!
As to which I prefer, my first inclination is to optimise for readability. Only if there's a specific performance problem do I then consider other possibilities. That would be simply something like:
for (size_t idx = 0; idx < len; idx++)
arr[idx] = 42;
I don't think that performance is an issue here - those are, if at all (I could imagine the compiler producing the identical assembly for most of them), micro optimizations hardly ever necessary.
Go with the solution that is most readable; the standard library provides you with std::fill, or for more complex assignments
for(unsigned k = 0; k < len; ++k)
{
// whatever
}
so it is obvious to other people looking at your code what you are doing. With C++11 you could also
for(auto & elem : arr)
{
// whatever
}
just don't try to obfuscate your code without any necessity.
For nearly all meaningful cases, the compiler will optimize all of the suggested ones to the same thing, and it's very unlikely to make any difference.
There used to be a trick where you could avoid the automatic prefetching of data if you ran the loop backwards, which under some bizarre set of circumstances actually made it more efficient. I can't recall the exact circumstances, but I expect modern processors will identify backwards loops as well as forwards loops for automatic prefetching anyway.
If it's REALLY important for your application to do this over a large number of elements, then looking at blocked access and using non-temporal storage will be the most efficient. But before you do that, make sure you have identified the filling of the array as an important performance point, and then make measurements for the current code and the improved code.
I may come back with some actual benchmarks to prove that "it makes little difference" in a bit, but I've got an errand to run before it gets too late in the day...

C++ STL: Which method of iteration over a STL container is better?

This may seem frivolous to some of you, but which of the following 2 methods of iteration over a STL container is better? Why?
class Elem;
typedef vector<Elem> ElemVec;
ElemVec elemVec;
// Method 0
for (ElemVec::iterator i = elemVec.begin(); i != elemVec.end(); ++i)
{
Elem& e = *i;
// Do something
}
// Method 1
for (int i = 0; i < elemVec.size(); ++i)
{
Elem& e = elemVec.at(i);
// Do something
}
Method 0 seems like cleaner STL, but Method 1 achieves the same with lesser code. Simple iteration over a container is what appears all over the place in any source code. So, I'm inclined to pick Method 1 which seems to reduce visual clutter and code size.
PS: I know iterators can do much more than a simple index. But, please keep the reply/discussion focused on simple iteration over a container like shown above.
The first version works with any container and so is more useful in template functions that take any container a s a parameter. It is also conceivably slightly more efficient, even for vectors.
The second version only works for vectors and other integer-indexed containers. It'd somewhat more idiomatic for those containers, will be easily understood by newcomers to C++, and is useful if you need to do something else with the index, which is not uncommon.
As usual, there is no simple "this one is better" answer, I'm afraid.
If you don't mind a (very?) small loss of efficiency, i'd recommend using Boost.Foreach
BOOST_FOREACH( Elem& e, elemVec )
{
// Your code
}
Method 0 is faster and therefore recommended.
Method 1 uses size() which is allowed to be O(1), depending on the container and the stl implementation.
The following method of iteration over a standard library container is best.
Use c++11 (and beyond)'s range-based for-loop with the auto specifier:
// Method 2
for (auto& e: elemVec)
{
// Do something with e...
}
This is similar to your Method 0 but requires less typing, less maintenence and works with any container compatible with std::begin() and std::end(), including plain-old arrays.
Some more advantages of method 0:
if you move from vector to another
container the loop remains the same,
easy to move from iterator to
const_iterator if you need,
when c++0x will arive, auto
typing will reduce some of the code clutter.
The main disadvantage is that in many cases you scan two containers, in which case an index is cleaner than keeping two iterators.
Method 0, for several reasons.
It better expresses your intent, which aids compiler optimizations as well as readability
Off-by-one errors are less likely
It works even if you replace the vector with a list, which doesn't have operator[]
Of course, the best solution will often be solution 2: One of the std algorithms. std::for_each, std::transform, std::copy or whatever else you need. This has some further advantages:
It expresses your intent even better, and allows some significant compiler optimizations (MS's secure SCL performs bounds checking on your methods 0 and 1, but will skip it on std algorithms)
It's less code (at the call site, at least. Of course you have to write a functor or something to replace the loop body, but at the use site, the code gets cleaned up quite a bit, which is probably where it matters most.
In general, avoid overspecifying your code. Specify exactly what you want done, and nothing else. The std algorithms are usually the way to go there, but even without them, if you don't need the index i, why have it? Use iterators instead, in that case.
Coincidentally I made a speed test recently (filling 10 * 1024 * 1024 ints with rand() ).
These are 3 runs, time in nano-seconds
vect[i] time : 373611869
vec.at(i) time : 473297793
*it = time : 446818590
arr[i] time : 390357294
*ptr time : 356895778
UPDATE : added stl-algorithm std::generate, which seems to run the fastest, because of special iterator-optimizing (VC++2008). time in micro-seconds.
vect[i] time : 393951
vec.at(i) time : 551387
*it = time : 596080
generate = time : 346591
arr[i] time : 375432
*ptr time : 334612
Conclusion : Use standard-algorithms, they might be faster than a explicit loop ! (and also good practice)
Update : the above times were in a I/O-bound situation, I did the same tests with a CPU-bound (iterate over a relatively short vector, which should fit in cache repeatedly, multiply each element by 2 and write back to vector)
//Visual Studio 2008 Express Edition
vect[i] time : 1356811
vec.at(i) time : 7760148
*it = time : 4913112
for_each = time : 455713
arr[i] time : 446280
*ptr time : 429595
//GCC
vect[i] time : 431039
vec.at(i) time : 2421283
*it = time : 381400
for_each = time : 380972
arr[i] time : 363563
*ptr time : 365971
Interestingly iterators and operator[] is considerably slower in VC++ compared to for_each (which seems to degrade the iterators to pointers through some template-magic for performance).
In GCC access times are only worse for at(), which is normal, because it's the only range-checked function of the tests.
It depends on which type of container. For a vector, it probably doesn't matter which you use. Method 0 has become more idiomatic, but their isn't a big difference, as everyone says.
If you decided to use a list, instead, method 1 would, in principle, be O(N), but in fact there is no list at() method, so you can't even do it that way. (So at some level your question stacks the deck.)
But that in itself is another argument for method 0: it uses the same syntax for different containers.
A possibility not considered above: depending on the details of "Do something", one can have method 0 and method 1 simultaneously, you don't have to choose:
for (auto i = elemVec.begin(), ii = 0; ii < elemVec.size(); ++i, ++ii)
{
// Do something with either the iterator i or the index ii
}
This way, finding the index or accessing the corresponding member are both obtained with trivial complexity.