How can I get the fastest possible iteration for some computation-intensive code? - c++

I'm using a QLinkedList to store instances of a class I wrote.
The fact is I must iterate over this list a lot.
By a lot I mean the program I'm writing runs its computation indefinitely (well, you can still stop it manually), and I need to go through that QLinkedList on each iteration.
The problem is not whether I'm iterating too much over this list.
It's that when I profile my code, I see that 1/4 of the time is spent in the QLinkedList::end() and QLinkedList::begin() functions.
Sample code
My code is the following:
typedef QLinkedList<Particle*> ParticlesList; // Particle is a custom class
ParticlesList* parts = // assign a QLinkedList
for (ParticlesList::const_iterator itp = parts->begin(); itp != parts->end(); ++itp) {
    // do some computation
}
Like I said, this code is called so often that it spends a lot of time in parts->begin() and parts->end().
So, the question is: how can I reduce the time spent iterating over this list?
Possible solutions
Here are some solutions I've thought of; please help me choose the best one or suggest another :)
Use of a classic C array: // sorry for this mistake
Particle** parts = // assign it something
for (int n = 0; n < LENGTH; n++) {
    // access by index
    // do some computation
}
This should be quick, right?
Maybe use a Java-style iterator?
Maybe use another container?
Asm? Just kidding... or maybe?
Thank you for your future answers!
PS : I have read stackoverflow posts about when to profile so don't worry about that ;)
Edit :
The list is modified
I'm sorry, I think I forgot the most important part. Here is the whole function, unabridged:
typedef std::vector<Cell*> Neighbours;
typedef QLinkedList<Particle*> ParticlesList;
Neighbours neighbours = m_cell->getNeighbourhood();
Neighbours::const_iterator it;
for (it = neighbours.begin(); it != neighbours.end(); ++it) {
    ParticlesList* parts = (*it)->getParticles();
    for (ParticlesList::const_iterator itp = parts->begin(); itp != parts->end(); ++itp) {
        double d = distanceTo(*itp); // computes sqrt(x^2 + y^2)
        if (d >= 0 && d <= m_maxForceRange) {
            particleIsClose(d, *itp); // just changes
        }
    }
}
And just to make sure I'm complete, this whole code is called in a loop ^^.
So yes, the list is modified and it is in an inner loop. So there's no way to precompute its beginning and end.
Moreover, the list needs to be rebuilt at each big iteration (I mean in the topmost loop) by inserting elements one by one.
Debug mode
Yes indeed I profiled in Debug mode. And I think the remark was judicious because the code went 2x faster in Release. And the problem with lists disappeared.
Thanks to all for your answers and sorry for this ^^

If you are profiling in debug mode, be aware that a lot of compilers disable inlining there. The high begin() and end() times may not be "real": an actual function call costs much more than the equivalent inlined operation.
Something else I noticed in the full code: you're doing a sqrt in the inner loop. That can be fairly expensive, depending on the hardware architecture. I would consider replacing the following code:
double d = distanceTo(*itp); // computes sqrt(x^2 + y^2)
if (d >= 0 && d <= m_maxForceRange)
with:
double d = distanceToSquared(*itp); // computes x^2 + y^2
if (d >= 0 && d <= m_maxForceRangeSquared)
I've done this in code where I was doing collision detection, and it sometimes makes a noticeable improvement. The tests are equivalent (both sides are non-negative, so squaring preserves the comparison), and this saves a lot of calls to sqrt. As always with optimization, measure to verify that it actually improves the speed.
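As a minimal sketch of the squared-distance test, something like the following could work. The Point struct and the function names are hypothetical stand-ins, not the asker's actual Particle class:

```cpp
// Hypothetical stand-in for a particle position; not the asker's real class.
struct Point {
    double x;
    double y;
};

// Squared Euclidean distance between two points: no sqrt needed.
double distanceSquared(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// True when b lies within maxRange of a. Both sides of the comparison are
// non-negative, so comparing squares gives the same answer as comparing distances.
bool withinRange(const Point& a, const Point& b, double maxRange) {
    return distanceSquared(a, b) <= maxRange * maxRange;
}
```

If the range never changes, maxRange * maxRange can be computed once and stored, which is what m_maxForceRangeSquared stands for above.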

Pre-computing the end iterator will help if your compiler isn't smart enough to realise it is const, and is hence computing it each time through the loop. You can do that like below:
const ParticlesList::const_iterator itp_end = parts->end();
for (ParticlesList::const_iterator itp = parts->begin(); itp != itp_end; ++itp) {
    // do some computation
}
I can't understand why parts->begin() is taking so long; it should only be called once per loop. However, if this loop is inside another loop, you could do something like this:
const ParticlesList::const_iterator itp_begin = parts->begin();
const ParticlesList::const_iterator itp_end = parts->end();
for (...) {
    for (ParticlesList::const_iterator itp = itp_begin; itp != itp_end; ++itp) {
        // do some computation
    }
}
I can't imagine this will make much difference (unless your inner list is really short), but it shouldn't hurt much either.
On a further note, a linked list possibly isn't the fastest data structure for your purposes. Linked lists are most useful when you frequently need to insert items into the middle of the list. If the list is built and then fixed, you're probably better off with a std::vector. A std::vector may also be better even if you occasionally only need to add/remove items from the end (not the beginning or middle). If you have to add/remove from the beginning/end (but not middle) consider a std::deque.

If you absolutely need raw speed you should measure each possible choice you encounter, and keep the fastest.
It sounds like the list remains unchanged while you iterate over it. I'd try storing the end of the list in a local variable.
typedef QLinkedList<Particle*> ParticlesList; // Particle is a custom class
ParticlesList* parts = // assign a QLinkedList
ParticlesList::const_iterator end = parts->end();
for (ParticlesList::const_iterator itp = parts->begin(); itp != end; ++itp) {
    // do some computation
}

Qt containers are compatible with STL algorithms like std::for_each.
Try something like this:
std::for_each( parts->begin(), parts->end(), MyParticleCalculus );
where MyParticleCalculus is a functor that contains your calculus.
Qt also has its own foreach, but it's apparently just a macro to hide the iterators, so it probably won't give you any performance benefit.
(Edit: I'm recommending std::for_each per Scott Meyers's advice in "Effective STL": "Prefer algorithm calls to hand-written loops.")
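To make the functor idea concrete, here is a rough sketch. It assumes a std::list in place of QLinkedList so that it compiles without Qt, and the Particle struct and AccumulateX functor are made up for illustration:

```cpp
#include <algorithm>
#include <list>

// Hypothetical stand-in for the asker's Particle class.
struct Particle {
    double x;
    double y;
};

// A functor in the role of MyParticleCalculus: it sums the x coordinates.
struct AccumulateX {
    double total;
    AccumulateX() : total(0.0) {}
    void operator()(const Particle* p) { total += p->x; }
};

double sumX(const std::list<Particle*>& parts) {
    // std::for_each returns the function object after it has visited every element.
    return std::for_each(parts.begin(), parts.end(), AccumulateX()).total;
}
```

Note that begin() and end() are each evaluated exactly once here, which is the same effect as hoisting the end iterator by hand.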


Why is inserting into a set<vector<string>> so slow?

For a class project we are making a simple compiler / relational database. Mine produces the correct answers, but too slowly on large queries. I ran Visual Studio's performance analysis, and my program is spending 80% of its time inserting my tuples (rows in a table) into a set. The function is part of computing a cross product, so the result has lots and lots of rows, but I need suggestions for a faster way to insert my tuples into the set.
for (set<vector<string>>::iterator it = tuples.begin(); it != tuples.end(); ++it) {
    for (set<vector<string>>::iterator it2 = tuples2.begin(); it2 != tuples2.end(); ++it2) {
        vector<string> f(*it);
        f.insert(f.end(), it2->begin(), it2->end());
        newTuples.insert(f); //This is the line that takes all the processing time
    }
}
You are copying a big vector by value for no reason. You should move it: newTuples.insert(std::move(f));
A set might be the wrong container. A set is ordered, and keeps only unique elements. There might be many string comparisons happening when you insert a new vector.
Use a list or a vector instead (if you can).
...and avoid needless copying, as SergeyA already pointed out in his answer
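Putting both suggestions together, a sketch could look like the one below. It assumes the result may live in a std::vector, and the Row alias and both function names are invented for this example. If uniqueness is still required, sorting and deduplicating once at the end is one possible substitute for the set:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string> Row;  // hypothetical alias for one tuple

// Builds every left+right combination, moving each finished row into the result.
std::vector<Row> crossProduct(const std::vector<Row>& left, const std::vector<Row>& right) {
    std::vector<Row> result;
    result.reserve(left.size() * right.size());  // one allocation for the outer vector
    for (const Row& l : left) {
        for (const Row& r : right) {
            Row combined;
            combined.reserve(l.size() + r.size());
            combined.insert(combined.end(), l.begin(), l.end());
            combined.insert(combined.end(), r.begin(), r.end());
            result.push_back(std::move(combined));  // move, don't copy
        }
    }
    return result;
}

// Optional: restores set-like behaviour (sorted, no duplicates) in one pass.
void sortAndDeduplicate(std::vector<Row>& rows) {
    std::sort(rows.begin(), rows.end());
    rows.erase(std::unique(rows.begin(), rows.end()), rows.end());
}
```

Whether this beats the set depends on the data, so it is worth profiling both versions on a large query.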
We might as well go C++11 (totally untested code):
for (const auto& it : tuples) {
    for (const auto& it2 : tuples2) {
        vector<string> vect(it); // elements of a set are const, so build the full row first
        vect.insert(vect.end(), it2.begin(), it2.end());
        auto where = newTuples.insert(std::move(vect)); // returns {position, inserted?}
    }
}
Note that on a collision the duplicate row is dropped from the result; is that really what you want?
You're using the vector as the key; will there ever be a collision? Add
if (!where.second) {
    ; // collision
}
to check.
This should remove the redundant copying (if the compiler doesn't optimize it away anyway).

A couple of performance questions (one bigger vector vs. several smaller vectors), and is it worth storing the iteration index for jump access into a vector?

I am a bit curious about vector optimization and have a couple of questions about it. (I am still a beginner in programming.)
struct GameInfo {
    EnumType InfoType;
    // Other info...
};
int _lastPosition;
// _gameInfoV is sorted beforehand
std::vector<GameInfo> _gameInfoV;
// The tick function is called every game frame (in "perfect" condition it's every 1.0/60 second)
void BaseClass::tick()
{
    for (unsigned int i = _lastPosition; i < _gameInfoV.size(); i++) {
        auto & info = _gameInfoV[i];
        if (!info.bhasbeenAdded) {
            if (DoWeNeedNow()) {
                info.bhasbeenAdded = true;
                // Do something more with "info"....
            }
            else return; // Break the cycle since we don't need other "info" now
        }
    }
}
The _gameInfoV vector size can be between 2000 and 5000.
My two main questions are:
Is it better to leave it the way it is, or to split it into smaller chunks, one per GameInfo.InfoType, each checked separately?
Is it worth the hassle of storing the last start position index of the vector instead of iterating from the beginning?
Note that if I use smaller vectors, there will be about 3 to 6 of them.
The third thing is that I am not using vector iterators; would it be safe to use them like this?
std::vector<GameInfo>::iterator it = _gameInfoV.begin() + _lastPosition;
for (; it != _gameInfoV.end(); ++it) {
    // Do something
}
Note: This will be used on smartphones, so every optimization will be appreciated when targeting weaker phones.
-Thank you
Don't, except if you frequently move memory around.
It is no hassle if you do it correctly:
std::vector<GameInfo>::iterator _lastPosition(_gameInfoV.begin());
// ...
for (std::vector<GameInfo>::iterator info = _lastPosition; info != _gameInfoV.end(); ++info) {
    if (!info->bhasbeenAdded) {
        if (DoWeNeedNow()) {
            // Do something more with "info"....
        }
        else return; // Break the cycle since we don't need other "info" now
    }
}
Breaking one vector up into several smaller vectors in general doesn't improve performance. It could even slightly degrade performance because the compiler has to manage more variables, which take up more CPU registers etc.
I don't know about gaming so I don't understand the implication of GameInfo.InfoType. Your processing time and CPU resource requirements are going to increase if you do more total iterations through loops (where each loop iteration performs the same type of operation). So if separating the vectors causes you to avoid some loop iterations because you can skip entire vectors, that's going to increase performance of your app.
Iterators are the safest way to iterate through containers. But for a vector I often just use the index operator [] and my own index (a plain old unsigned integer).
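As a sketch of the "remember where we stopped" approach with a plain index: the Scheduler class, the triggerFrame field and the frame-number condition below are all assumptions made for illustration, standing in for the asker's BaseClass and DoWeNeedNow(). An index also stays valid if the vector ever reallocates, which a stored iterator would not.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified GameInfo: entries are sorted by triggerFrame.
struct GameInfo {
    int triggerFrame;
    bool added;
};

class Scheduler {
public:
    explicit Scheduler(std::vector<GameInfo> infos) : infos_(std::move(infos)), next_(0) {}

    // Processes every entry that is due, resuming where the last tick stopped.
    // Returns how many entries were handled during this call.
    std::size_t tick(int currentFrame) {
        std::size_t processed = 0;
        while (next_ < infos_.size() && infos_[next_].triggerFrame <= currentFrame) {
            infos_[next_].added = true;
            ++next_;
            ++processed;
        }
        return processed;
    }

    std::size_t position() const { return next_; }

private:
    std::vector<GameInfo> infos_;
    std::size_t next_;  // index of the first entry that has not been handled yet
};
```

With 2000 to 5000 entries and 60 ticks per second, each tick then touches only the entries that became due, instead of rescanning from the beginning.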

Iterating over a vector in C++ [duplicate]

Take the following two lines of code:
for (int i = 0; i < some_vector.size(); i++) {
    //do stuff
}
And this:
for (some_iterator = some_vector.begin(); some_iterator != some_vector.end(); some_iterator++) {
    //do stuff
}
I'm told that the second way is preferred. Why exactly is this?
The first form is efficient only if vector.size() is a fast operation. This is true for vectors, but not for lists, for example. Also, what are you planning to do within the body of the loop? If you plan on accessing the elements as in
T elem = some_vector[i];
then you're making the assumption that the container has operator[](std::size_t) defined. Again, this is true for vector but not for other containers.
The use of iterators brings you closer to container independence. You're not making assumptions about random-access ability or a fast size() operation, only that the container has iterator capabilities.
You could enhance your code further by using standard algorithms. Depending on what it is you're trying to achieve, you may elect to use std::for_each(), std::transform() and so on. By using a standard algorithm rather than an explicit loop you're avoiding re-inventing the wheel. Your code is likely to be more efficient (given the right algorithm is chosen), correct and reusable.
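For instance, a small sketch with std::transform might look like this. The doubled function is made up for this example, and it only assumes that the container offers begin() and end():

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <vector>

// Returns a vector holding twice each element of the input container.
// Nothing here relies on operator[] or size().
template <class Container>
std::vector<int> doubled(const Container& input) {
    std::vector<int> out;
    std::transform(input.begin(), input.end(), std::back_inserter(out),
                   [](int x) { return 2 * x; });
    return out;
}
```

The same function body serves a std::vector<int> and a std::list<int> alike, which is the container independence mentioned above.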
It's part of the modern C++ indoctrination process. Iterators are the only way to iterate most containers, so you use it even with vectors just to get yourself into the proper mindset. Seriously, that's the only reason I do it - I don't think I've ever replaced a vector with a different kind of container.
Wow, this is still getting downvoted after three weeks. I guess it doesn't pay to be a little tongue-in-cheek.
I think the array index is more readable. It matches the syntax used in other languages, and the syntax used for old-fashioned C arrays. It's also less verbose. Efficiency should be a wash if your compiler is any good, and there are hardly any cases where it matters anyway.
Even so, I still find myself using iterators frequently with vectors. I believe the iterator is an important concept, so I promote it whenever I can.
Because you are not tying your code to the particular implementation of the some_vector list. If you use array indices, it has to be some form of array; if you use iterators, you can use that code with any list implementation.
Imagine some_vector is implemented with a linked-list. Then requesting an item in the i-th place requires i operations to be done to traverse the list of nodes. Now, if you use iterator, generally speaking, it will make its best effort to be as efficient as possible (in the case of a linked list, it will maintain a pointer to the current node and advance it in each iteration, requiring just a single operation).
So it provides two things:
Abstraction of use: you just want to iterate some elements, you don't care about how to do it
I'm going to be the devil's advocate here and not recommend iterators. The main reason is that in all the source code I've worked on, from desktop application development to game development, I have never needed to use iterators. They have simply never been required, and on top of that, the hidden assumptions, code mess and debugging nightmares you get with iterators make them a prime example of what not to use in any application that requires speed.
Even from a maintenance standpoint they're a mess. It's not because of the iterators themselves but because of all the aliasing that happens behind the scenes. How do I know that you haven't implemented your own virtual vector or array list that does something completely different from the standard ones? Do I know what type it actually is at runtime? Did you overload an operator? I didn't have time to check all your source code. Hell, do I even know what version of the STL you're using?
The next problem you get with iterators is leaky abstraction; there are numerous web sites that discuss this in detail.
Sorry, I have not seen, and still do not see, any point in iterators. They abstract the list or vector away from you, when in fact you should already know what vector or list you're dealing with; if you don't, you're just setting yourself up for some great debugging sessions in the future.
You might want to use an iterator if you are going to add/remove items to the vector while you are iterating over it.
some_iterator = some_vector.begin();
while (some_iterator != some_vector.end()) {
    if (/* some condition */) {
        some_iterator = some_vector.erase(some_iterator);
        // some_iterator now positioned at the element after the deleted element
    } else {
        if (/* some other condition */) {
            some_iterator = some_vector.insert(some_iterator, some_new_value);
            // some_iterator now positioned at new element
        }
        ++some_iterator;
    }
}
If you were using indices you would have to shuffle items up/down in the array to handle the insertions and deletions.
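A compilable version of the erase case, as a small sketch (the eraseOdds function and its "is odd" condition are just an example condition chosen here):

```cpp
#include <vector>

// Removes every odd value in place, using the iterator returned by erase().
void eraseOdds(std::vector<int>& values) {
    std::vector<int>::iterator it = values.begin();
    while (it != values.end()) {
        if (*it % 2 != 0) {
            // erase() invalidates 'it' but returns the position after the removed element.
            it = values.erase(it);
        } else {
            ++it;
        }
    }
}
```

The key point is that the loop only advances the iterator itself when nothing was erased.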
Separation of Concerns
It's very nice to separate the iteration code from the 'core' concern of the loop. It's almost a design decision.
Indeed, iterating by index ties you to the implementation of the container. Asking the container for a begin and end iterator, enables the loop code for use with other container types.
Also, in the std::for_each way, you TELL the collection what to do, instead of ASKing it something about its internals
The 0x standard is going to introduce closures, which will make this approach much easier to use - have a look at the expressive power of e.g. Ruby's [1..6].each { |i| print i; }...
But an often overlooked point is that the for_each approach opens up the opportunity to have the iteration parallelized - the Intel Threading Building Blocks can distribute the code block over the processors in the system!
Note: after discovering the algorithms library, and especially for_each, I went through two or three months of writing ridiculously small 'helper' operator structs, which will drive your fellow developers crazy. After that, I went back to a pragmatic approach - small loop bodies deserve no for_each any more :)
A must read reference on iterators is the book "Extended STL".
The GoF have a tiny little paragraph in the end of the Iterator pattern, which talks about this brand of iteration; it's called an 'internal iterator'. Have a look here, too.
Because it is more object-oriented. If you are iterating with an index, you are assuming:
a) that those objects are ordered
b) that those objects can be obtained by an index
c) that the index increment will hit every item
d) that that index starts at zero
With an iterator, you are saying "give me everything so I can work with it" without knowing what the underlying implementation is. (In Java, there are collections that cannot be accessed through an index)
Also, with an iterator, no need to worry about going out of bounds of the array.
Another nice thing about iterators is that they better allow you to express (and enforce) your const-preference. This example ensures that you will not be altering the vector in the midst of your loop:
for (std::vector<Foo>::const_iterator pos = foos.begin(); pos != foos.end(); ++pos) {
    // Foo & foo = *pos; // this won't compile
    const Foo & foo = *pos; // this will compile
}
Aside from all of the other excellent answers... int may not be large enough for your vector. Instead, if you want to use indexing, use the size_type for your container:
for (std::vector<Foo>::size_type i = 0; i < myvector.size(); ++i) {
    Foo& this_foo = myvector[i];
    // Do stuff with this_foo
}
I probably should point out you can also call
std::for_each(some_vector.begin(), some_vector.end(), &do_stuff);
STL iterators are mostly there so that the STL algorithms like sort can be container independent.
If you just want to loop over all the entries in a vector just use the index loop style.
It is less typing and easier to parse for most humans. It would be nice if C++ had a simple foreach loop without going overboard with template magic.
for (size_t i = 0; i < some_vector.size(); ++i) {
    T& rT = some_vector[i];
    // now do something with rT
}
I don't think it makes much difference for a vector. I prefer to use an index myself as I consider it to be more readable and you can do random access like jumping forward 6 items or jumping backwards if needs be.
I also like to make a reference to the item inside the loop like this so there are not a lot of square brackets around the place:
for (size_t i = 0; i < myvector.size(); i++) {
    MyClass &item = myvector[i];
    // Do stuff to "item".
}
Using an iterator can be good if you think you might need to replace the vector with a list at some point in the future and it also looks more stylish to the STL freaks but I can't think of any other reason.
The second form represents what you're doing more accurately. In your example, you don't care about the value of i, really - all you want is the next element in the iterator.
After having learned a little more on the subject of this answer, I realize it was a bit of an oversimplification. The difference between this loop:
for (some_iterator = some_vector.begin(); some_iterator != some_vector.end(); some_iterator++) {
    //do stuff
}
And this loop:
for (int i = 0; i < some_vector.size(); i++) {
    //do stuff
}
Is fairly minimal. In fact, the syntax of doing loops this way seems to be growing on me:
while (it != end) {
    //do stuff
    ++it;
}
Iterators do unlock some fairly powerful declarative features, and when combined with the STL algorithms library you can do some pretty cool things that are outside the scope of array index administrivia.
Indexing requires an extra mul operation. For example, for vector<int> v, the compiler converts v[i] into *(&v[0] + i), that is, the address of the first element plus sizeof(int) * i bytes.
During iteration you don't need to know the number of items to be processed. You just need the item, and iterators do this very well.
No one has mentioned yet that one advantage of indices is that they do not become invalid when you append to a contiguous container like std::vector, so you can add items to the container during iteration.
This is also possible with iterators, but you must call reserve(), and therefore need to know how many items you'll append.
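A small sketch of appending while iterating by index (appendEvens is a made-up example function; the "is even" test is only there to have something to append):

```cpp
#include <cstddef>
#include <vector>

// Appends a copy of every even value found among the original elements.
// push_back may reallocate the storage, which would invalidate iterators,
// but the index i stays meaningful throughout.
void appendEvens(std::vector<int>& values) {
    const std::size_t originalSize = values.size();  // do not visit the appended items
    for (std::size_t i = 0; i < originalSize; ++i) {
        if (values[i] % 2 == 0) {
            const int copy = values[i];  // copy first, then grow the vector
            values.push_back(copy);
        }
    }
}
```

The loop bound is captured before the loop so that the newly appended items are not visited in turn.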
If you have access to C++11 features, then you can also use a range-based for loop for iterating over your vector (or any other container) as follows:
for (auto &item : some_vector) {
    //do stuff
}
The benefit of this loop is that you can access elements of the vector directly via the item variable, without running the risk of messing up an index or making a mistake when dereferencing an iterator. In addition, the placeholder auto saves you from having to repeat the type of the container elements, which brings you even closer to a container-independent solution.
If you need the element index in your loop and operator[] exists for your container (and is fast enough for you), then you are better off with your first way.
A range-based for loop cannot be used to add or delete elements to or from a container. If you want to do that, stick to the solution given by Brian Matthews.
If you don't want to change the elements in your container, then you should use the keyword const as follows: for (auto const &item : some_vector) { ... }.
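Both flavours side by side, as a short sketch (scaleAll and total are names invented for this example):

```cpp
#include <vector>

// Mutating loop: 'auto&' binds to each element, so the vector itself is changed.
void scaleAll(std::vector<double>& values, double factor) {
    for (auto& item : values) {
        item *= factor;
    }
}

// Read-only loop: 'auto const&' avoids copies and forbids modification.
double total(const std::vector<double>& values) {
    double sum = 0.0;
    for (auto const& item : values) {
        sum += item;
    }
    return sum;
}
```

Writing item *= factor inside the second loop would be rejected by the compiler, which is the point of the const.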
Several good points already. I have a few additional comments:
Assuming we are talking about the C++ standard library, "vector" implies a random access container that has the guarantees of a C array (random access, contiguous memory layout etc). If you had said 'some_container', many of the above answers would have been more accurate (container independence etc).
To eliminate any dependencies on compiler optimization, you could move some_vector.size() out of the loop in the indexed code, like so:
const size_t numElems = some_vector.size();
for (size_t i = 0; i < numElems; i++) { //do stuff }
Always pre-increment iterators and treat post-increments as exceptional cases.
for (some_iterator = some_vector.begin(); some_iterator != some_vector.end(); ++some_iterator){ //do stuff }
So, assuming an indexable, std::vector<>-like container, there is no good reason to prefer one over the other when going through the container sequentially. If you have to refer to older or newer element indexes frequently, then the indexed version is more appropriate.
In general, using the iterators is preferred because algorithms make use of them and behavior can be controlled (and implicitly documented) by changing the type of the iterator. Array locations can be used in place of iterators, but the syntactical difference will stick out.
I don't use iterators for the same reason I dislike foreach-statements. When having multiple inner-loops it's hard enough to keep track of global/member variables without having to remember all the local values and iterator-names as well. What I find useful is to use two sets of indices for different occasions:
for (int i = 0; i < anims.size(); i++) {
    for (int j = 0; j < bones.size(); j++) {
        int animIndex = i;
        int boneIndex = j;
        // in relatively short code I use indices i and j
        ... animation_matrices[i][j] ...
        // in long and complicated code I use indices animIndex and boneIndex
        ... animation_matrices[animIndex][boneIndex] ...
    }
}
I don't even want to abbreviate things like "animation_matrices[i]" to some random "anim_matrix"-named-iterator for example, because then you can't see clearly from which array this value is originated.
If you like being close to the metal / don't trust their implementation details, don't use iterators.
If you regularly switch out one collection type for another during development, use iterators.
If you find it difficult to remember how to iterate different sorts of collections (maybe you have several types from several different external sources in use), use iterators to unify the means by which you walk over elements. This applies to say switching a linked list with an array list.
Really, that's all there is to it. It's not as if you're going to gain more brevity either way on average, and if brevity really is your goal, you can always fall back on macros.
Even better than "telling the CPU what to do" (imperative) is "telling the libraries what you want" (functional).
So instead of using loops you should learn the algorithms present in the STL.
For container independence
I always use an array index because many of my applications require something like "display thumbnail images". So I wrote something like this:
some_vector[0].top = 0;
for (int i = 1; i < some_vector.size(); i++) {
    some_vector[i].left = some_vector[i-1].width + some_vector[i-1].left;
    if (i % 6 == 0) {
        some_vector[i].top = some_vector[i-1].height + some_vector[i-1].top;
        some_vector[i].left = 0;
    }
}
Both the implementations are correct, but I would prefer the 'for' loop. As we have decided to use a Vector and not any other container, using indexes would be the best option. Using iterators with Vectors would lose the very benefit of having the objects in continuous memory blocks which help ease in their access.
I felt that none of the answers here explain why I like iterators as a general concept over indexing into containers. Note that most of my experience using iterators doesn't actually come from C++ but from higher-level programming languages like Python.
The iterator interface imposes fewer requirements on consumers of your function, which allows consumers to do more with it.
If all you need is to be able to forward-iterate, the developer isn't limited to using indexable containers - they can use any class implementing operator++(T&), operator*(T) and operator!=(const T&, const T&).
#include <iostream>
template <class InputIterator>
void printAll(InputIterator begin, InputIterator end) // by value, so temporaries like myVector.begin() are accepted
{
    for (auto current = begin; current != end; ++current) {
        std::cout << *current << "\n";
    }
}
// elsewhere...
printAll(myVector.begin(), myVector.end());
Your algorithm works for the case you need it - iterating over a vector - but it can also be useful for applications you don't necessarily anticipate:
#include <cstdint>
#include <random>
class RandomIterator {
private:
    std::mt19937 random;
    std::uint_fast32_t current;
    std::uint_fast32_t floor;
    std::uint_fast32_t ceil;
public:
    RandomIterator(
        std::uint_fast32_t floor = 0,
        std::uint_fast32_t ceil = UINT_FAST32_MAX,
        std::uint_fast32_t seed = std::mt19937::default_seed
    ) :
        floor(floor),
        ceil(ceil)
    {
        random.seed(seed);
        ++(*this); // draw the first value
    }
    RandomIterator& operator++() {
        current = floor + (random() % (ceil - floor));
        return *this;
    }
    std::uint_fast32_t operator*() const {
        return current;
    }
    bool operator!=(const RandomIterator &that) const {
        return current != that.current;
    }
};
int main() {
    // roll a 1d6 until we get a 6 and print the results
    RandomIterator firstRandom(1, 7, std::random_device()());
    RandomIterator secondRandom(6, 7);
    printAll(firstRandom, secondRandom);
    return 0;
}
Attempting to implement a square-brackets operator which does something similar to this iterator would be contrived, while the iterator implementation is relatively simple. The square-brackets operator also makes implications about the capabilities of your class - that you can index to any arbitrary point - which may be difficult or inefficient to implement.
Iterators also lend themselves to decoration. People can write iterators which take an iterator in their constructor and extend its functionality:
template<class InputIterator, typename T>
class FilterIterator {
private:
    InputIterator internalIterator;
public:
    FilterIterator(const InputIterator &iterator):
        internalIterator(iterator)
    {
    }
    virtual bool condition(T) = 0;
    FilterIterator<InputIterator, T>& operator++() {
        do {
            ++internalIterator;
        } while (!condition(*internalIterator));
        return *this;
    }
    T operator*() {
        // Needed for the first result
        if (!condition(*internalIterator))
            ++(*this);
        return *internalIterator;
    }
    virtual bool operator!=(const FilterIterator& that) const {
        return internalIterator != that.internalIterator;
    }
};
template <class InputIterator>
class EvenIterator : public FilterIterator<InputIterator, std::uint_fast32_t> {
public:
    EvenIterator(const InputIterator &internalIterator) :
        FilterIterator<InputIterator, std::uint_fast32_t>(internalIterator)
    {
    }
    bool condition(std::uint_fast32_t n) {
        return !(n % 2);
    }
};
int main() {
    // Rolls a d20 until a 20 is rolled and discards odd rolls
    EvenIterator<RandomIterator> firstRandom(RandomIterator(1, 21, std::random_device()()));
    EvenIterator<RandomIterator> secondRandom(RandomIterator(20, 21));
    printAll(firstRandom, secondRandom);
    return 0;
}
While these toys might seem mundane, it's not difficult to imagine using iterators and iterator decorators to do powerful things with a simple interface - decorating a forward-only iterator of database results with an iterator which constructs a model object from a single result, for example. These patterns enable memory-efficient iteration of infinite sets and, with a filter like the one I wrote above, potentially lazy evaluation of results.
Part of the power of C++ templates is your iterator interface, when applied to the likes of fixed-length C arrays, decays to simple and efficient pointer arithmetic, making it a truly zero-cost abstraction.
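To illustrate that last point, here is a sketch of an iterator-pair function (sumAll is a name made up for this example) that accepts raw pointers into a C array just as readily as container iterators:

```cpp
#include <list>
#include <vector>

// Sums a range given only ++, * and != on the iterator type.
template <class InputIterator>
long sumAll(InputIterator begin, InputIterator end) {
    long total = 0;
    for (InputIterator current = begin; current != end; ++current) {
        total += *current;
    }
    return total;
}
```

Called as sumAll(raw, raw + 4) on an int raw[4], InputIterator is deduced as int*, so the loop is ordinary pointer arithmetic.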

The fastest way to count differing elements in two long vectors

I'm trying to compare two large vectors of integers, i.e. at each index, check whether the two vectors hold the same element or not. I've tried a couple of things: using an iterator to do the comparison, and a simple for loop. Both work, but I need something faster, as I have to compare a lot of vectors. What's the best way to do that in C++? Many thanks in advance!
typedef vector<int> fingerprint;
double aakernel(fingerprint a, fingerprint b, double h) {
    double diff = 0;
    vector<int>::iterator dd = a.begin();
    vector<int>::iterator ee = b.begin();
    for (; dd != a.end() && ee != b.end(); ++dd, ++ee) { /*option one*/
        if (*dd != *ee) {
            diff++;
        }
    }
    for (int dd = 0; dd < int(a.size()); dd++) { /*option two*/
        if (a[dd] != b[dd]) {
            diff++;
        }
    }
    double due = (h / (1 - h));
    double q = -log(due) * diff;
    double K = exp(q);
    return (K);
}
If the vectors are otherwise arbitrary, you cannot get asymptotically better than sequentially comparing all elements, the way you do now. So you're left with micro-optimisations which may or may not improve performance (depending on how your compiler's optimiser handles them).
The only one I can think of is taking the non-changing evaluations out of the loop. (And perhaps also not using ++ on type double, but I believe the compiler will handle this optimally anyway):
double diff = 0;
for (
    auto itA = a.begin(), itB = b.begin(), endA = a.end();
    itA != endA;
    ++itA, ++itB
) {
    if (*itA != *itB) {
        diff += 1.0;
    }
}
1) You could speed this up by dividing the work into pieces and using a different thread for each.
2) You could also explore the parallel processing machine opcodes, such as MMX, to see if they're applicable.
3) Depending on your compiler, its optimiser, CPU etc. you may or may not find significant performance benefits just from eliminating the branching: instead of...
if (*dd != *ee){
...try perhaps...
diff += bool(*dd - *ee);
It might be worth checking the assembly language of the if () version first to see if the optimiser is already doing this. If bool(*dd - *ee) still has branches you could try a few other things, falling back on inline assembly if necessary.
4) assuming you'll end up comparing the same vector to many others, you could store checksums/hashes of ranges within the data, such that when the same vector is compared to different alternatives only the regions with differing hashes are considered: this could miss some differences - about 1 in 2^bits for a good hash - but if this is for fingerprints I assume it's probabilistic anyway and this will be insignificant.
5) if you're doing this for the NSA, I recommend recoding in VBA.
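As a sketch of the branch-free idea from point 3, the result of the comparison can be added directly to an integer counter. The countDiffering name and the equal-length precondition are assumptions made here, and whether this is actually faster than the if version has to be measured:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the positions at which a and b hold different values.
std::size_t countDiffering(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());  // assumption: fingerprints have equal length
    std::size_t diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        diff += (a[i] != b[i]);  // the bool converts to 0 or 1, no explicit branch
    }
    return diff;
}
```

The vectors are taken by const reference here, which also avoids the copy that passing them by value would make on every call.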
If the two fingerprints are usually the same, it may help to first do a
memcmp(&a[0], &b[0], a.size() * sizeof(int))
to test whether there is any difference between the two arrays at all. Only if there is one do you go on to count how many differences there are.
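A sketch of that fast path in front of the counting loop could look like this. The function name is invented, and equal lengths are assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Returns the number of differing positions, skipping the element-wise loop
// entirely when the two buffers are byte-for-byte identical.
std::size_t countDifferingWithFastPath(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());  // assumption: fingerprints have equal length
    if (a.empty() || std::memcmp(a.data(), b.data(), a.size() * sizeof(int)) == 0) {
        return 0;  // identical: nothing to count
    }
    std::size_t diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        diff += (a[i] != b[i]);
    }
    return diff;
}
```

This only pays off if identical pairs are common; otherwise the memcmp is extra work on top of the loop.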
You don't need to write it yourself, since the STL already has functions that can do this.
You can find more algorithms in the standard <algorithm> and <numeric> headers.
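One standard-library candidate (my suggestion, not necessarily the function the answer had in mind) is std::inner_product with custom operations: std::not_equal_to compares each pair and std::plus adds up the results. A sketch, assuming equal-length vectors:

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Counts differing positions with a standard algorithm instead of a hand-written loop.
int countDifferingStd(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());  // assumption: fingerprints have equal length
    return std::inner_product(a.begin(), a.end(), b.begin(), 0,
                              std::plus<int>(),           // how results are accumulated
                              std::not_equal_to<int>());  // applied to each pair of elements
}
```

It does the same linear amount of work as the explicit loop, so any speed difference comes down to the optimiser.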
Thanks a lot for all the different solutions! Much appreciated. I used diff as a double because at the end of the calculation it needs to be put in a kernel function and coming from a Python background I thought it would be better to assign it double in the first place but I might be wrong here but thanks for the comment!
Also, to elaborate on the fingerprint (which I should have done in the first place, my apologies) or maybe bitstring is a better word for it, each bit contains 1 or 0 in my case and I need to compare at each index whether the two bitstring are the same or not. Many thanks again for the solutions I'll try and see which one would help speed things up! Thanks a lot guys!
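Given that each entry is only ever 0 or 1, one option worth measuring is to pack the fingerprint into a std::bitset, so that the comparison becomes an XOR followed by a bit count. This is only a sketch and it assumes the fingerprint length N is known at compile time; countDifferingBits is a made-up name:

```cpp
#include <bitset>
#include <cstddef>

// Number of positions at which two fixed-length bitstrings differ.
template <std::size_t N>
std::size_t countDifferingBits(const std::bitset<N>& a, const std::bitset<N>& b) {
    return (a ^ b).count();  // XOR marks the differing bits, count() tallies them
}
```

The resulting count can then be converted to double for the kernel formula, so diff does not need to be a double while counting.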

