A D range that is RandomAccess and hasLength but not hasSlicing?

In the implementation of findSplit in Phobos, we have this special case:
static if (isSomeString!R1 && isSomeString!R2
    || (isRandomAccessRange!R1 && hasSlicing!R1 && hasLength!R1 && hasLength!R2))
{
    auto balance = find!pred(haystack, needle);
    immutable pos1 = haystack.length - balance.length;
    immutable pos2 = balance.empty ? pos1 : pos1 + needle.length;
    return Result!(typeof(haystack[0 .. pos1]),
        typeof(haystack[pos2 .. haystack.length]))(haystack[0 .. pos1],
            haystack[pos1 .. pos2],
            haystack[pos2 .. haystack.length]);
}
Most of the constraint makes sense here. I understand that we need a range that is random access and that both the haystack and the needle need a size. But the hasSlicing check is surprising to me.
I would expect any range that is both RandomAccess and hasLength to be able to support slicing. Is there an example of a range that fundamentally cannot support slicing despite being RandomAccess and hasLength?
Or is this more a matter of a user potentially providing a range that simply chose not to implement that particular operation for whatever reason?

I went directly to the source with this and asked Andrei Alexandrescu on Twitter. He responded:
I don't think there's an interesting case of a random access range without slicing (or vice versa). When we introduced ranges we wanted to be as general as possible, but that turned out to be overengineering.

Related

Is there a technical reason to use > (<) instead of != when incrementing by 1 in a 'for' loop?

I almost never see a for loop like this:
for (int i = 0; 5 != i; ++i)
{}
Is there a technical reason to use > or < instead of != when incrementing by 1 in a for loop? Or this is more of a convention?
while (time != 6:30pm) {
Work();
}
It is 6:31pm... Damn, now my next chance to go home is tomorrow! :)
This shows that the stronger restriction mitigates risk and is probably more intuitive to understand.
There is no technical reason. But there is mitigation of risk, maintainability and better understanding of code.
< or > are stronger restrictions than != and fulfill the exact same purpose in most cases (I'd even say in all practical cases).
Yes, there is a reason. If you write a (plain old index-based) for loop like this
for (int i = a; i < b; ++i){}
then it works as expected for any values of a and b (i.e. zero iterations when a > b, instead of an infinite loop if you had used i != b).
On the other hand, for iterators you'd write
for (auto it = begin; it != end; ++it)
because any iterator should implement an operator!=, but not for every iterator it is possible to provide an operator<.
Also, range-based for loops
for (auto e : v)
are not just fancy sugar; they measurably reduce the chance of writing wrong code.
You can have something like
for (int i = 0; i < 5; ++i) {
    ...
    if (...) i++;
    ...
}
If your loop variable is written to by the inner code, a check like i != 5 might not break that loop. It is safer to check with an inequality like <.
Edit about readability: the < form is far more frequently used. Therefore it is very fast to read, as there is nothing special to understand (brain load is reduced because the task is common). So it helps readers when you make use of these habits.
And last but not least, this is called defensive programming, meaning to always take the strongest case to avoid current and future errors influencing the program.
The only case where defensive programming is not needed is where states have been proven by pre- and post-conditions (but then, proving this is the most defensive of all programming).
I would argue that an expression like
for ( int i = 0 ; i < 100 ; ++i )
{
...
}
is more expressive of intent than is
for ( int i = 0 ; i != 100 ; ++i )
{
...
}
The former clearly calls out that the condition is a test for an exclusive upper bound on a range; the latter is a binary test of an exit condition. And if the body of the loop is non-trivial, it may not be apparent that the index is only modified in the for statement itself.
Iterators are an important case when you most often use the != notation:
for(auto it = vector.begin(); it != vector.end(); ++it) {
// do stuff
}
Granted: in practice I would write the same relying on a range-for:
for(auto & item : vector) {
// do stuff
}
but the point remains: one normally compares iterators using == or !=.
The loop condition is an enforced loop invariant.
Suppose you don't look at the body of the loop:
for (int i = 0; i != 5; ++i)
{
// ?
}
in this case, you know at the start of the loop iteration that i does not equal 5.
for (int i = 0; i < 5; ++i)
{
// ?
}
in this case, you know at the start of the loop iteration that i is less than 5.
The second is much, much more information than the first, no? Now, the programmer intent is (almost certainly) the same, but if you are looking for bugs, having confidence from reading a line of code is a good thing. And the second enforces that invariant, which means some bugs that would bite you in the first case just cannot happen (or don't cause memory corruption, say) in the second case.
You know more about the state of the program, from reading less code, with < than with !=. And on modern CPUs, both comparisons take the same amount of time, so there is no performance cost.
If your i was not manipulated in the loop body, was always increased by 1, and started less than 5, there would be no difference. But in order to know whether it was manipulated, you'd have to confirm each of these facts.
Some of these facts are relatively easy to confirm, but easy to get wrong. Checking the entire body of the loop is, however, a pain.
In C++ you can write an indexes type such that:
for( const int i : indexes(0, 5) )
{
// ?
}
does the same thing as either of the two above for loops, even down to the compiler optimizing it down to the same code. Here, however, you know that i cannot be manipulated in the body of the loop, as it is declared const, without the code corrupting memory.
The more information you can get out of a line of code without having to understand the context, the easier it is to track down what is going wrong. < in the case of integer loops gives you more information about the state of the code at that line than != does.
As already said by Ian Newson, you can't reliably loop over a floating variable and exit with !=. For instance,
for (double x=0; x!=1; x+=0.1) {}
will actually loop forever, because 0.1 can't exactly be represented in floating point, hence the counter narrowly misses 1. With < it terminates.
(Note however that it's basically undefined behaviour whether you get 0.9999... as the last accepted number – which kind of violates the less-than assumption – or already exit at 1.0000000000000001.)
Yes; OpenMP doesn't parallelize loops with the != condition.
It may happen that the variable i is set to some large value and if you just use the != operator you will end up in an endless loop.
As you can see from the other numerous answers, there are reasons to use < instead of != which will help in edge cases, initial conditions, unintended loop counter modification, etc...
Honestly though, I don't think you can stress the importance of convention enough. For this example it will be easy enough for other programmers to see what you are trying to do, but it will cause a double-take. One of the jobs while programming is making it as readable and familiar to everyone as possible, so inevitably when someone has to update/change your code, it doesn't take a lot of effort to figure out what you were doing in different code blocks. If I saw someone use !=, I'd assume there was a reason they used it instead of < and if it was a large loop I'd look through the whole thing trying to figure out what you did that made that necessary... and that's wasted time.
I take the adjectival "technical" to mean language behavior/quirks and compiler side effects such as performance of generated code.
To this end, the answer is: no(*). The (*) is "please consult your processor manual". If you are working with some edge-case RISC or FPGA system, you may need to check what instructions are generated and what they cost. But if you're using pretty much any conventional modern architecture, then there is no significant processor level difference in cost between lt, eq, ne and gt.
If you are using an edge case you could find that != requires three operations (cmp, not, beq) vs two (cmp, blt). Again, RTM in that case.
For the most part, the reasons are defensive/hardening, especially when working with pointers or complex loops. Consider
// highly contrived example
size_t count_chars(char c, const char* str, size_t len) {
    size_t count = 0;
    bool quoted = false;
    const char* p = str;
    while (p != str + len) {
        if (*p == '"') {
            quoted = !quoted;
            ++p;  // p advances twice for quotes and can step past str + len
        }
        if (*(p++) == c && !quoted)
            ++count;
    }
    return count;
}
A less contrived example would be where you are using return values to perform increments, accepting data from a user:
#include <iostream>
int main() {
    size_t len = 5, step;
    for (size_t i = 0; i != len; ) {
        std::cout << "i = " << i << ", step? " << std::flush;
        std::cin >> step;
        i += step; // here for emphasis, it could go in the for(;;)
    }
}
Try this and input the values 1, 2, 10, 999.
You could prevent this:
#include <iostream>
int main() {
    size_t len = 5, step;
    for (size_t i = 0; i != len; ) {
        std::cout << "i = " << i << ", step? " << std::flush;
        std::cin >> step;
        if (step + i > len)
            std::cout << "too much.\n";
        else
            i += step;
    }
}
But what you probably wanted was
#include <iostream>
int main() {
    size_t len = 5, step;
    for (size_t i = 0; i < len; ) {
        std::cout << "i = " << i << ", step? " << std::flush;
        std::cin >> step;
        i += step;
    }
}
There is also something of a convention bias towards <, because ordering in standard containers relies on operator<; for instance, the ordered STL containers (like std::map and std::set) determine equivalence by saying
if (lhs < rhs) // T.operator<
    lessthan
else if (rhs < lhs) // T.operator< again
    greaterthan
else
    equal
If lhs and rhs are of a user-defined class, writing this code as
if (lhs < rhs) // requires T.operator<
    lessthan
else if (lhs > rhs) // requires T.operator>
    greaterthan
else
    equal
would require the implementor to provide two comparison functions. So < has become the favored operator.
There are several ways to write any kind of code (usually), there just happens to be two ways in this case (three if you count <= and >=).
In this case, people prefer > and < to make sure that even if something unexpected happens in the loop (like a bug), it won't loop infinitely (BAD). Consider the following code, for example.
for (int i = 1; i != 3; i++) {
    // More code
    i = 5; // OOPS! MISTAKE!
    // More code
}
If we used (i < 3), we would be safe from an infinite loop because it places a stronger restriction.
It's really your choice whether you want a mistake in your program to shut the whole thing down or to keep functioning with the bug present.
Hope this helped!
The most common reason to use < is convention. More programmers think of loops like this as "while the index is in range" rather than "until the index reaches the end." There's value in sticking to convention when you can.
On the other hand, many answers here are claiming that using the < form helps avoid bugs. I'd argue that in many cases this just helps hide bugs. If the loop index is supposed to reach the end value, and, instead, it actually goes beyond it, then there's something happening you didn't expect which may cause a malfunction (or be a side effect of another bug). The < will likely delay discovery of the bug. The != is more likely to lead to a stall, hang, or even a crash, which will help you spot the bug sooner. The sooner a bug is found, the cheaper it is to fix.
Note that this convention is peculiar to array and vector indexing. When traversing nearly any other type of data structure, you'd use an iterator (or pointer) and check directly for an end value. In those cases you have to be sure the iterator will reach and not overshoot the actual end value.
For example, if you're stepping through a plain C string, it's generally more common to write:
for (char *p = foo; *p != '\0'; ++p) {
    // do something with *p
}
than
int length = strlen(foo);
for (int i = 0; i < length; ++i) {
    // do something with foo[i]
}
For one thing, if the string is very long, the second form will be slower because the strlen is another pass through the string.
With a C++ std::string, you'd use a range-based for loop, a standard algorithm, or iterators, even though the length is readily available. If you're using iterators, the convention is to use != rather than <, as in:
for (auto it = foo.begin(); it != foo.end(); ++it) { ... }
Similarly, iterating a tree or a list or a deque usually involves watching for a null pointer or other sentinel rather than checking if an index remains within a range.
One reason not to use this construct is floating point numbers. != is a very dangerous comparison to use with floats as it'll rarely evaluate to true even if the numbers look the same. < or > removes this risk.
There are two related reasons for following this practice that both have to do with the fact that a programming language is, after all, a language that will be read by humans (among others).
(1) A bit of redundancy. In natural language we usually provide more information than is strictly necessary, much like an error correcting code. Here the extra information is that the loop variable i (see how I used redundancy here? If you didn't know what 'loop variable' means, or if you forgot the name of the variable, after reading "loop variable i" you have the full information) is less than 5 during the loop, not just different from 5. Redundancy enhances readability.
(2) Convention. Languages have specific standard ways of expressing certain situations. If you don't follow the established way of saying something, you will still be understood, but the effort for the recipient of your message is greater because certain optimisations won't work. Example:
Don't talk around the hot mash. Just illuminate the difficulty!
The first sentence is a literal translation of a German idiom. The second is a common English idiom with the main words replaced by synonyms. The result is comprehensible but takes a lot longer to understand than this:
Don't beat around the bush. Just explain the problem!
This is true even in case the synonyms used in the first version happen to fit the situation better than the conventional words in the English idiom. Similar forces are in effect when programmers read code. This is also why 5 != i and 5 > i are weird ways of putting it unless you are working in an environment in which it is standard to swap the more normal i != 5 and i < 5 in this way. Such dialect communities do exist, probably because consistency makes it easier to remember to write 5 == i instead of the natural but error prone i == 5.
Using relational comparisons in such cases is more of a popular habit than anything else. It gained its popularity back in the times when such conceptual considerations as iterator categories and their comparability were not considered high priority.
I'd say that one should prefer to use equality comparisons instead of relational comparisons whenever possible, since equality comparisons impose fewer requirements on the values being compared. Being EqualityComparable is a lesser requirement than being LessThanComparable.
Another example that demonstrates the wider applicability of equality comparison in such contexts is the popular conundrum with implementing unsigned iteration down to 0. It can be done as
for (unsigned i = 42; i != -1; --i)
...
Note that the above is equally applicable to both signed and unsigned iteration, while the relational version breaks down with unsigned types.
Besides the examples where the loop variable will (unintentionally) change inside the body, there are other reasons to use the less-than or greater-than operators:
Negations make code harder to understand
< or > is only one char, but != is two
In addition to the various people who have mentioned that it mitigates risk, it also reduces the number of function overloads necessary to interact with various standard library components. As an example, if you want your type to be storable in a std::set, or used as a key for std::map, or used with some of the searching and sorting algorithms, the standard library usually uses std::less to compare objects as most algorithms only need a strict weak ordering. Thus it becomes a good habit to use the < comparisons instead of != comparisons (where it makes sense, of course).
There is no problem from a syntax perspective, but the logic behind the expression 5 != i is not sound.
In my opinion, using != to set the bounds of a for loop is not logically sound, because a for loop either increments or decrements the iteration index, so setting the loop to iterate until the index becomes out of bounds (!= to something) is not a proper implementation.
It will work, but it is prone to misbehavior, since the boundary handling is lost when using != for an incremental problem (you know from the start whether it increments or decrements); that's why <, >, <=, and >= are used instead of !=.

how to make more expressive python iterators? just like c++ iterator

Firstly, I review the C++-style iterators quickly. For example:
//--- Iterating over vector with iterator.
vector<int> v;
. . .
for (vector<int>::iterator it = v.begin(); it!=v.end(); ++it) {
cout << *it << endl;
}
It is flexible. It is easy to change the underlying container type. For example, you might decide later that the number of insertions and deletions is so high that a list would be more efficient than a vector. It also has many useful member functions. Many of the member functions of vector use iterators, for example assign, insert, or erase. Moreover, we can use an iterator (if supported) bidirectionally, with ++ and --. This is useful for parsing stream-like objects.
The problems with Python are:
1: Currently, Python's for loop syntax is less flexible than the C++ for (well, safer).
2: Rather than the "it != iter.end()" style, Python throws an exception when next() has no more items. That is less flexible.
Question 1: Is my idea above correct?
OK. Here comes my question: how do I implement a Python iterator as powerful as C++ iterators? Currently, Python's for loop syntax is less flexible than the C++ for. I also found some possible solutions, such as http://www.velocityreviews.com/forums/t684406-pushback-iterator.html, but it asks the user to push_back an item rather than asking the iterator to --.
Question 2: What is the best to implement a Bidirectional Iterator in python? Just like http://www.cplusplus.com/reference/std/iterator/BidirectionalIterator/.
The pseudo-code is the following:
it = v.begin();
while (it != v.end()) {
    // do sth here
    if (condition1)
        ++it; // suppose this iterator supports ++
    if (condition2)
        --it; // suppose this iterator supports --
}
The key features are: 1) bidirectional , 2) simpler "end" checking. The "++" or "--" operators or common functions do not matter (it has no semantic difference anyway).
Thanks,
Update:
I got some possible solutions from the answers:
i = 0
while i < len(sequence):  # or i < len and some_other_condition
    star_it = sequence[i]
    if condition_one(star_it):
        i += 1
    if condition_two(star_it):
        i = max(i - 1, 0)
However, unlike an array, random access in a list should be O(n). I suppose the "list" object in Python is internally implemented as something like a linked list. Thus, this while-loop solution is not efficient. However, in C++ we have "random access iterators" and "bidirectional iterators". How should I get a better solution? Thanks.
For the majority of situations, Python's for and iterators are the simplest thing around. That is their goal and they shouldn't compromise it for flexibility -- their lack of flexibility isn't a problem.
For a few situations where you couldn't use a for loop, C++ iterators might be simpler. But there is always a way to do it in Python that isn't much more complex than using a C++ iterator.
If you need to separate advancing the iterator from looping, just use a while loop:
it = iter(obj)
try:
    while True:  # or some secondary break condition other than StopIteration
        star_it = next(it)
        if condition_one(star_it):
            star_it = next(it)
except StopIteration:
    pass  # exhausted the iterator
I can think of only two situations where --it makes sense in Python.
The first is you're iterating over a sequence. In that case, if you need to go backwards, don't use an iterator at all -- just use a counter with a while loop:
i = 0
while i < len(sequence):  # or i < len and some_other_condition
    star_it = sequence[i]
    if condition_one(star_it):
        i += 1
    if condition_two(star_it):
        i = max(i - 1, 0)
The second is if you're iterating over a doubly linked list. In that case, again, don't use an iterator -- just traverse the nodes normally:
current = node
while current:  # or any break condition
    if condition_one(current):
        current = current.next
    if condition_two(current):
        current = current.prev
A situation where you might think it makes sense, but you can't use either of the above methods, is with an unordered collection like a set or dict. However, --it doesn't make sense in that case. As the collection is unordered, semantically, any of the items previously reached would be appropriate -- not just the actual previous item.
So, in order to know the right object to go back to, you need memory, either by iterating over a sequence like mydict.values() or tuple(myset) and using a counter, or by assembling a sequence of previous values as you go and using a while loop and next as above instead of a for loop.
Solutions for a few situations you mentioned:
You want to replace objects in the underlying container. For dictionaries, iterate over the keys or items, not only the values:
for key, value in my_dict.iteritems():
    if condition(value):
        my_dict[key] = new_value
For lists use enumerate():
for index, item in enumerate(my_list):
    if condition(item):
        my_list[index] = new_item
You want an iterator with one "look-ahead" value. You probably would use something tailored to a specific situation, but here's a recipe for general situations:
import itertools
from itertools import izip_longest  # zip_longest in Python 3

def iter_with_look_ahead(iterable, sentinel=None):
    iterable, it_ahead = itertools.tee(iterable)
    next(it_ahead, None)
    return izip_longest(iterable, it_ahead, fillvalue=sentinel)

for current, look_ahead in iter_with_look_ahead(tokens):
    # whatever
    pass
You want to iterate in reverse. Use reversed() for containers that support it.
You want random access. Just turn your iterable into a list and use indices:
my_list = list(my_iterable)
Actually, the C++ iterator system is not so great. Iterators are akin to pointers, and they have their woes:
singular values: v.end() cannot be dereferenced safely
inversion issues: std::for_each(end, begin, func);
mismatch issues: std::for_each(v0.begin(), v2.end(), func);
The Python approach is much better in this regard (though the use of an exception can be quite surprising at first, it really helps in defining nested iterators), because contrary to its name, a Python iterator is more akin to a range.
The concept of a range is so much better that C++11 introduced the range-for loop construct:
for (Object& o: range) {
}
Anything that is possible with an iterator is also possible with a range, though it may take some time to realize it, and some translations seem surreal at first for those of us who were educated with C++ pointer-like iterators. For example, subranges can perfectly be expressed:
for (Object& o: slice(range, 2, 9)) {
}
where slice would take all elements in position [2, 9) within range.
So, instead of fighting your language (Python) you should delve further into it and embrace its style. Fighting against a language is generally a losing battle, learn its idioms, become efficient.
You could implement something similar to C++ iterators using Python objects:
class Iterable(object):
    class Iterator(object):
        def __init__(self, father, pos=0):
            self.father = father
            self.pos = pos
        def __getitem__(self, pos=0):
            return self.father[self.pos + pos]
        def __setitem__(self, pos, value):
            self.father[self.pos + pos] = value
        def __iadd__(self, increment):
            self.pos += increment
            return self
        def __isub__(self, decrement):
            self.pos -= decrement
            return self
        def __ne__(self, other):
            return self.father != other.father or self.pos != other.pos
        def __eq__(self, other):
            return not (self != other)
    def begin(self):
        return self.Iterator(self)
    def end(self):
        return self.Iterator(self, len(self))

class Vector(list, Iterable):
    pass

v = Vector([54, 43, 32, 21])
counter = 0
it = v.begin()
print it, it[0]
while it != v.end():
    counter += 1
    print it[0]
    if counter == 2:
        it += 1  # suppose this iterator supports ++
    if counter == 1:
        it -= 1  # suppose this iterator supports --
    it += 1
This replaces *it by it[0] (also analog to C++) and it++ by it += 1, but in effect it stays pretty much the same.
You leave the Pythonic ways if you do this, though ;-)
Note that the list object in Python is an array, so the efficiency concern mentioned in the question is actually a non-issue.

how can I get the fastest iteration possible for some calculus intensive code?

Context
I'm using a QLinkedList to store instances of a class I wrote.
The fact is, I must iterate over this list a lot.
By a lot, I mean the program runs an endless computation (well, you can still stop it manually), and I need to traverse the QLinkedList on each iteration.
Problem
The problem is not that I'm iterating too much over this list.
It's that when I profile my code, I see that 1/4 of the time is spent in the QLinkedList::end() and QLinkedList::begin() functions.
Sample code
My code is the following:
typedef QLinkedList<Particle*> ParticlesList; // Particle is a custom class
ParticlesList* parts = // assign a QLinkedList
for (ParticlesList::const_iterator itp = parts->begin(); itp != parts->end(); ++itp)
{
    // make some calculus
}
Like I said, this code is called so often that it spends a lot of time on parts->begin() and parts->end().
Question
So, the question is how can I reduce the time spent on the iteration of this list ?
Possible solutions
Here are some solutions I've thought of; please help me choose the best, or propose another one :)
Use of a classic C array:
Particle** parts = // assign it something
for (int n = 0; n < LENGTH; n++)
{
    // access by index
    // make some calculus
}
This should be quick right ?
Maybe use Java style iterator ?
Maybe use another container ?
Asm ? Just kidding... or maybe ?
Thank you for your future answers !
PS : I have read stackoverflow posts about when to profile so don't worry about that ;)
Edit :
The list is modified
I'm sorry, I think I forgot the most important part. I'll write the whole function without stripping:
typedef std::vector<Cell*> Neighbours;
typedef QLinkedList<Particle*> ParticlesList;

Neighbours neighbours = m_cell->getNeighbourhood();
Neighbours::const_iterator it;
for (it = neighbours.begin(); it != neighbours.end(); ++it)
{
    ParticlesList* parts = (*it)->getParticles();
    for (ParticlesList::const_iterator itp = parts->begin(); itp != parts->end(); ++itp)
    {
        double d = distanceTo(*itp); // computes sqrt(x^2 + y^2)
        if (d >= 0 && d <= m_maxForceRange)
        {
            particleIsClose(d, *itp); // just changes
        }
    }
}
And just to make sure I'm complete: this whole code is called in a loop ^^.
So yes, the list is modified, and it is in an inner loop. So there's no way to precompute its beginning and end.
Moreover, the list needs to be reconstructed at each big iteration (I mean in the topmost loop) by inserting elements one by one.
Debug mode
Yes, indeed, I profiled in Debug mode. And I think the remark was judicious, because the code ran 2x faster in Release, and the problem with the lists disappeared.
Thanks to all for your answers and sorry for this ^^
If you are profiling in debug mode, many compilers disable inlining. The begin() and end() times being high may not be "real": the method call times would be much higher than the equivalent inlined operations.
Something else I noticed in the full code: you're doing a sqrt in the inner loop. That can be fairly expensive depending on the hardware architecture. I would consider replacing the following code:
double d = distanceTo(*itp); // computes sqrt(x^2 + y^2)
if(d >= 0 && d <= m_maxForceRange)
with:
double d = distanceToSquared(*itp); // computes x^2 + y^2
if(d >= 0 && d <= m_maxForceRangeSquared)
I've done this in code where I was doing collision detection, and it sometimes makes a noticeable improvement. The tests are equivalent, and this saves a lot of calls to sqrt. As always with optimization, measure to verify that it improves the speed.
Pre-computing the end iterator will help if your compiler isn't smart enough to realise it is const, and is hence computing it each time through the loop. You can do that like below:
const ParticlesList::const_iterator itp_end = parts->end();
for (ParticlesList::const_iterator itp = parts->begin(); itp != itp_end; ++itp)
{
//make some calculus
}
I can't understand why parts->begin() is taking so long; it should only be called once. However, if this loop is inside another loop, you could do something like this:
const ParticlesList::const_iterator itp_begin = parts->begin();
const ParticlesList::const_iterator itp_end = parts->end();
for (...)
{
    for (ParticlesList::const_iterator itp = itp_begin; itp != itp_end; ++itp)
    {
        // make some calculus
    }
}
But I can't imagine this will make too much difference (unless your inner list is really short), but it shouldn't hurt much either.
On a further note, a linked list possibly isn't the fastest data structure for your purposes. Linked lists are most useful when you frequently need to insert items into the middle of the list. If the list is built and then fixed, you're probably better off with a std::vector. A std::vector may also be better even if you occasionally only need to add/remove items from the end (not the beginning or middle). If you have to add/remove from the beginning/end (but not middle) consider a std::deque.
If you absolutely need raw speed you should measure each possible choice you encounter, and keep the fastest.
Sounds like the list remains unchanged while you iterate over it. I'd try storing the end of the list in a local variable.
typedef QLinkedList<Particle*> ParticlesList; // Particle is a custom class
ParticlesList* parts = // assign a QLinkedList
ParticlesList::const_iterator end = parts->end();
for (ParticlesList::const_iterator itp = parts->begin(); itp != end; ++itp)
{
// make some calculus
}
Qt containers are compatible with STL algorithms like std::for_each.
Try something like this:
std::for_each( parts->begin(), parts->end(), MyParticleCalculus );
where MyParticleCalculus is a functor that contains your calculus.
Qt also has its own foreach, but it's apparently just a macro to hide the iterators, so it probably won't give you any performance benefit.
(Edit: I'm recommending std::for_each per Scott Meyer's recommendation in "Effective STL": "Prefer algorithm calls to hand-written loops.")

Floating point keys in std:map

The following code is supposed to find the key 3.0 in a std::map, where it exists. But due to floating point precision, it won't be found.
map<double, double> mymap;
mymap[3.0] = 1.0;

double t = 0.0;
for (int i = 0; i < 31; i++)
{
    t += 0.1;
    bool contains = (mymap.count(t) > 0);
}
In the above example, contains will always be false.
My current workaround is to compute t as 0.1 * i instead of accumulating 0.1, like this:
for (int i = 0; i < 31; i++)
{
    t = 0.1 * i;
    bool contains = (mymap.count(t) > 0);
}
Now the question:
Is there a way to introduce a fuzzyCompare to the std::map if I use double keys?
The common solution for floating point comparison is usually something like fabs(a - b) < epsilon. But I don't see a straightforward way to do this with std::map.
Do I really have to encapsulate the double type in a class and overload operator<(...) to implement this functionality?
So there are a few issues with using doubles as keys in a std::map.
First, NaN is a problem: it is never less than anything, including itself, which breaks the strict weak ordering the container depends on. If there is any chance of NaN being inserted, use this:
struct safe_double_less {
    bool operator()(double left, double right) const {
        bool leftNaN = std::isnan(left);
        bool rightNaN = std::isnan(right);
        if (leftNaN != rightNaN)
            return leftNaN < rightNaN;
        return left < right;
    }
};
but that may be overly paranoid. Do not, I repeat do not, include an epsilon threshold in the comparison operator you pass to a std::set or the like: this will violate the ordering requirements of the container and result in undefined behavior.
(I placed NaN as greater than all doubles, including +inf, in my ordering, for no good reason. Less than all doubles would also work).
So either use the default operator<, or the above safe_double_less, or something similar.
Next, I would advise using a std::multimap or std::multiset, because you should be expecting multiple values for each lookup. You might as well make handling multiple matches an everyday thing instead of a corner case, to increase the test coverage of your code. (I would rarely recommend these containers otherwise.) Plus this blocks operator[], which is not advisable with floating point keys.
The point where you want to use an epsilon is when you query the container. Instead of using the direct interface, create a helper function like this:
// works on both `const` and non-`const` associative containers:
template<class Container>
auto my_equal_range( Container&& container, double target, double epsilon = 0.00001 )
    -> decltype( container.equal_range(target) )
{
    auto lower = container.lower_bound( target - epsilon );
    auto upper = container.upper_bound( target + epsilon );
    return std::make_pair(lower, upper);
}
which works on both std::map and std::set (and multi versions).
(In a more modern code base, I'd expect a range<?> object that is a better thing to return from an equal_range function. But for now, I'll make it compatible with equal_range).
This finds a range of things whose keys are "sufficiently close" to the one you are asking for, while the container maintains its ordering guarantees internally and doesn't execute undefined behavior.
To test for existence of a key, do this:
template<typename Container>
bool key_exists( Container const& container, double target, double epsilon = 0.00001 ) {
    auto range = my_equal_range(container, target, epsilon);
    return range.first != range.second;
}
and if you want to delete/replace entries, you should deal with the possibility that there might be more than one entry hit.
The shorter answer is "don't use floating point values as keys for std::set and std::map", because it is a bit of a hassle.
If you do use floating point keys for std::set or std::map, almost never do a .find or a [] on them, as that is highly likely to be a source of bugs. You can use them for an automatically sorted collection of stuff, so long as exact order doesn't matter (i.e., whether one particular 1.0 is ahead of, behind, or exactly on the same spot as another 1.0). Even then, I'd go with a multimap/multiset, as relying on collisions, or the lack thereof, is not something I'd rely upon.
Reasoning about the exact value of IEEE floating point values is difficult, and fragility of code relying on it is common.
Here's a simplified example of how using soft-compare (aka epsilon or almost equal) can lead to problems.
Let epsilon = 2 for simplicity. Put 1 and 4 into your map. It now might look like this:
1
 \
  4
So 1 is the tree root.
Now put in the numbers 2, 3, 4 in that order. Each will replace the root, because it compares equal to it. So then you have
4
 \
  4
which is already broken. (Assume no attempt to rebalance the tree is made.) We can keep going with 5, 6, 7:
7
 \
  4
and this is even more broken, because now if we ask whether 4 is in there, it will say "no", and if we ask for an iterator for values less than 7, it won't include 4.
Though I must say that I've used maps based on this flawed fuzzy compare operator numerous times in the past, and whenever I dug up a bug, it was never due to this. This is because datasets in my application areas never actually stress-tested this problem.
As Naszta says, you can implement your own comparison function. What he leaves out is the key to making it work - you must make sure that the function always returns false for any values that are within your tolerance for equivalence.
return (std::abs(left - right) > epsilon) && (left < right);
Edit: as pointed out in many comments to this answer and others, there is a possibility for this to turn out badly if the values you feed it are arbitrarily distributed, because you can't guarantee that !(a<b) and !(b<c) results in !(a<c). This would not be a problem in the question as asked, because the numbers in question are clustered around 0.1 increments; as long as your epsilon is large enough to account for all possible rounding errors but is less than 0.05, it will be reliable. It is vitally important that the keys to the map are never closer than 2*epsilon apart.
You could implement your own compare function.
#include <cmath>   // std::abs(double)

class own_double_less   // std::binary_function base omitted: it was removed in C++17
{
public:
    own_double_less( double arg_ = 1e-7 ) : epsilon(arg_) {}
    bool operator()( const double &left, const double &right ) const
    {
        // you can choose another way to make the decision
        // (the original version is: return left < right;)
        return (std::abs(left - right) > epsilon) && (left < right);
    }
    double epsilon;
};
// your map:
std::map<double, double, own_double_less> mymap;
Updated: see Item 40 in Effective STL!
Updated based on suggestions.
Using doubles as keys is rarely useful. As soon as you do any arithmetic on the keys, you are not sure what exact values they have and hence cannot use them for indexing the map. The only sensible usage is when the keys are constants.

The relationship between iterators and containers in STL

Good Day,
Assume that I am writing a Python-like range in C++. It provides all the characteristics of Random Access containers (immutable, of course). A question arose in my mind about the following situation:
I have two iterators that point into two different instances of the range container. The thing is, these two ranges are equal, i.e. they represent the same range. Would you allow the following situation:
fact: range1 == range2 e.g.
---------------------------
range range1(10, 20, 1), range2(10, 20, 1);
range::iterator i = range1.begin(), j = range2.begin();
assert(i == j); // would you allow this?
Sorry if I am missing a simple design rule in STL :)
By default, in the STL, two iterators from two different containers are not comparable; the behavior is undefined. So you can do whatever you want, though nobody should even try.
edit
After looking carefully at the standard, section 24.1, paragraph 6 states:
An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to the same container.
Which means that if you allow i == j with i and j in two different containers, you really consider both containers as being the same. As they are immutable, this is perfectly fine. Just a question of semantics.
You might want to check boost::counting_iterator. Combined with boost::iterator_range you'll get something analogous to your range class (except that it will only allow a step-size of 1):
auto rng = boost::make_iterator_range(boost::make_counting_iterator(0),
boost::make_counting_iterator(10));
for (auto it = rng.begin(), e = rng.end(); it != e; ++it)
    std::cout << *it << " "; // prints 0 1 2 3 ... 9
For this class two iterators are considered equal provided that they contain the same number. But admittedly the situation is different than yours because here each iterator doesn't know to which range it belongs.
In the STL, comparison rules are driven by the container's elements and not the container itself, so in my opinion you shouldn't be performing the dereference yourself in your == operator overload.