The initial question was why with the following code,
std::vector<int> coll{1,4,7,10};
auto iseven = [](auto&& i){return i % 2 == 0; };
auto colleven = coll | std::views::filter(iseven);
// first view materialization
for(int& i : colleven)
{
i += 1;
}
for(auto i : coll)
std::cout << i << ' ';
std::cout << std::endl;
// second view materialization
for(int& i : colleven)
{
i += 1;
}
for(auto i : coll)
std::cout << i << ' ';
and by materializing the view twice, we get two different results. This is definitely weird at first sight. The output:
1 5 7 11
1 6 7 11
After doing some research and looking into potential duplicates, I understand that this is a case of undefined behavior per https://eel.is/c++draft/range.filter#iterator-1.
Basically, std::ranges::filter_view (like other similar views) caches its begin iterator in order to achieve "laziness", which means it maintains internal state (in range-v3, filter_view is derived from remove_if_view). In this specific example, the standard dictates that "even after modification of the view element the user should take care that the predicate remains true." So my question now becomes:
Isn't that a weird requirement? Asking from the user not to do something that otherwise would feel natural, materializing a filter view twice, that is. What are the compromises that we would have to make in order to alleviate this restriction and why didn't we make them?
Note: My question regards standard views and I know the code I linked is from range-v3. I presume the reference implementation corresponds to the standard in this case.
Isn't that a weird requirement? Asking from the user not to do something that otherwise would feel natural [...]
I don't think so. I think the code in the example is actually exceedingly weird to begin with, and it's not especially surprising that it doesn't work.
Views are intended to be ephemeral. You construct the view that you want, you use it, and then you throw it away. The view is (probably) going to have its own reference dependencies, and you should not touch them for the lifetime of the view. In Rust's terms, the view is borrowing the containers that it's constructed from.
With this in mind, it makes no sense to construct a filter, do something with it, then mutate the underlying container, then re-use the original filter. Just construct a new one.
What are the compromises that we would have to make in order to alleviate this restriction and why didn't we make them?
Uh, none. This restriction is fairly fundamental to even the iterator model and has nothing really to do with caching or any range-specific design choices.
The model of forward iterators is that if you copy a forward iterator, and then advance both, that both copies are valid and refer to the same element (assuming that they weren't originally end() so that advancing is actually valid). That remains true of filter as well:
vector<int> v = {1, 2, 3, 4};
auto f = v | views::filter(iseven);
auto it = f.begin(); // this is the 2
auto it2 = it; // this is also the 2
++it; // this is the 4
*it = 5;
++it2; // oops: this is v.end()
assert(it == it2); // nope
That assertion holding is an important part of the C++ iterator model, and it cannot possibly hold if you allow arbitrary mutation to occur.
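The guarantee being described can be written down as a small check. This is only an illustrative sketch (the names copies_stay_in_sync and forward_list_iterators_agree are made up here), and it assumes the iterator passed in is not end().

```cpp
#include <forward_list>

// Copies a forward iterator, advances the original and the copy once each,
// and reports whether they still compare equal (the multi-pass guarantee).
// Precondition (assumed): `it` is not the end iterator.
template <class ForwardIt>
bool copies_stay_in_sync(ForwardIt it) {
    ForwardIt copy = it;
    ++it;
    ++copy;
    return it == copy;
}

// Example use with a container whose iterators are only forward iterators.
inline bool forward_list_iterators_agree() {
    std::forward_list<int> values{2, 4, 6};
    return copies_stay_in_sync(values.begin());
}
```

For an unmodified container this always returns true; the filter example above breaks it only because an element is changed between the two increments.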
Now the original iteration in the example:
for (int& i : colleven) {
i += 1;
}
This does mutation, of the kind that breaks guarantees. But this is kind of OK - we're mutating, but we're mutating in a way that happens to not have any ill-effects in this context. Reusing colleven after this is definitely not okay (because of the mutation breaking iterator guarantees). It's very difficult to actually articulate exactly what situations lead to undefined behavior.
But the fact that looping over colleven twice after internal mutation doesn't work in C++20 ranges isn't just specifically a consequence of caching begin() - it's a consequence of the fact that you just cannot allow doing this sort of thing and maintain any iterator guarantees. It's not a weird requirement - the code itself is problematic. It's just problematic in a way that is impossible to diagnose in C++.
The short version is: views are not intended to be long-lived, so don't use them that way.
I want to execute a single operation multiple times, without defining a counter. For example, like this:
do(10)
{
//do something
}
I think this would be useful in several different scenarios. For example:
Deleting several consecutive items from std::list by a beginning index.
Emitting some signal several times, either over time or at some specific time.
Adding the same data to the list for custom initialization
Many scenes are not limited to those listed above.
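For what it is worth, two of the scenarios listed above can already be written without a visible counter by using existing std::list member functions. The sketch below is only an illustration; the helper names erase_run and append_copies are hypothetical, and clamping out-of-range requests is an assumption about the desired behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical helper: erases up to `count` consecutive elements,
// starting at zero-based position `index`. Out-of-range requests are clamped.
inline void erase_run(std::list<int>& items, std::size_t index, std::size_t count) {
    if (index >= items.size()) {
        return;  // nothing to erase
    }
    const std::size_t available = items.size() - index;
    auto first = std::next(items.begin(), static_cast<std::ptrdiff_t>(index));
    auto last = std::next(first, static_cast<std::ptrdiff_t>(std::min(count, available)));
    items.erase(first, last);  // one range erase, no hand-written counter
}

// Hypothetical helper: appends `count` copies of `value`.
inline void append_copies(std::list<int>& items, std::size_t count, int value) {
    items.insert(items.end(), count, value);
}
```

For example, erase_run(items, 1, 3) on a list holding 0 1 2 3 4 5 leaves 0 4 5.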
Other languages have syntax similar to this that allows repeatedly executing the same command without having to explicitly define a counter variable. In my opinion, defining a counter is completely inconsistent with how humans think.
Simulate how we think:
In reality, we always just do something a few times directly.
But now the syntax looks like this:
Uh... I am going to do something three times.
Okay, ready, I started.
Soul torture: Why doesn't C++ provide a concise syntax? Although I am a fan of C++, I can’t help but wonder why some people don’t like C++ because C++ rarely considers how people think. I hope C++ can advance with the times and become the programming language of the future.
This is different from "Modern C++ way to repeat code for set number of times": I gave my plan, application scenarios, and even emotional appeals.
The evolution of C++ is by committee. In the simplest terms folk propose stuff and the committee accepts or rejects it.
Interestingly your suggestion
do (integral_expression)
{
}
would not be a breaking change. Note there's no while after the loop body or while adjacent to do. integral_expression is almost a production rule in C++ in case labels of switch blocks, although it could be run-time evaluable in this case. It could even lend itself to clean code in the sense that the equivalent
for (int i = 0; i < integral_expression; ++i)
{
}
introduces i into the loop body which can be inconvenient as it can shadow an existing i.
That said, thought is needed for the case where integral_expression is negative. Perhaps introduce unsigned_integral_expression not unlike what needs to be written as the size expression when declaring a variable length array in a reasonably common extension to standard C++?
If you want this feature in C++, then why not propose it?
Here is an example of C++ code that will do something a given number of times. Could also be done in C, but the functor would have to be a function pointer.
There may be a use case for this, but I'd think that for C++ programs the standard loop syntax would be preferable.
#include <iostream>
#include <functional>
static void Do(int count, std::function<void(int)> fn)
{
while(count)
{
if (count > 0) --count;
else if (count < 0) ++count;
fn(count);
}
}
int main()
{
Do(10, [](int count) { std::cout << "Loop is at " << count << "\n"; });
}
Lambdas make a clean, pure-library implementation possible if needed:
#include <cstddef>
#include <iostream>
template<typename F>
constexpr void repeat(std::size_t const n, F const& f) {
    for (std::size_t i = 0; i < n; ++i)
        f();
}
int main() {
    int x{};
    repeat(5, [&] {
        std::cout << ++x << std::endl;
    });
}
Such a proposal is likely to be rejected by the committee unless stronger reasons support it.
I almost never see a for loop like this:
for (int i = 0; 5 != i; ++i)
{}
Is there a technical reason to use > or < instead of != when incrementing by 1 in a for loop? Or is this more of a convention?
while (time != 6:30pm) {
Work();
}
It is 6:31pm... Damn, now my next chance to go home is tomorrow! :)
This is to show that the stronger restriction mitigates risks and is probably more intuitive to understand.
There is no technical reason. But there is mitigation of risk, maintainability and better understanding of code.
< or > are stronger restrictions than != and fulfill the exact same purpose in most cases (I'd even say in all practical cases).
There is a duplicate question here, with one interesting answer.
Yes there is a reason. If you write a (plain old index based) for loop like this
for (int i = a; i < b; ++i){}
then it works as expected for any values of a and b (i.e. zero iterations when a > b, instead of an endless loop if you had used i != b).
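A small sketch makes the zero-iterations case concrete. The function name iterations_with_less_than is made up for this illustration; it simply counts how often the loop body runs.

```cpp
// Counts how many times the body of `for (int i = a; i < b; ++i)` runs.
inline int iterations_with_less_than(int a, int b) {
    int count = 0;
    for (int i = a; i < b; ++i) {
        ++count;
    }
    return count;
}
```

With a = 4 and b = 1 the condition is false on the first test, so the body never runs. The same loop written with != would keep incrementing past b.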
On the other hand, for iterators you'd write
for (auto it = begin; it != end; ++it)
because every iterator should implement operator!=, but it is not possible to provide operator< for every iterator.
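As a hedged illustration, a loop that relies only on != works for both std::vector and std::list, whereas std::list iterators offer no operator< at all. The names sum_range, sum_list and sum_vector are hypothetical.

```cpp
#include <list>
#include <vector>

// Sums a range using only operations every iterator provides: !=, ++ and *.
template <class Iterator>
int sum_range(Iterator first, Iterator last) {
    int total = 0;
    for (Iterator it = first; it != last; ++it) {
        total += *it;
    }
    return total;
}

// std::list iterators are bidirectional: they have no operator<,
// so the `it < last` form would not compile here.
inline int sum_list(const std::list<int>& values) {
    return sum_range(values.begin(), values.end());
}

// std::vector iterators are random access, but != works just as well.
inline int sum_vector(const std::vector<int>& values) {
    return sum_range(values.begin(), values.end());
}
```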
Also range-based for loops
for (auto e : v)
are not just syntactic sugar: they measurably reduce the chance of writing wrong code.
You can have something like
for(int i = 0; i<5; ++i){
...
if(...) i++;
...
}
If your loop variable is written by the inner code, i != 5 might never become false, so the loop would not end. It is safer to check for inequality with <.
Edit about readability.
The inequality form (<) is far more frequently used. It is therefore very fast to read, as there is nothing special to understand (brain load is reduced because the pattern is common). So it is helpful to readers when code follows these habits.
And last but not least, this is called defensive programming, meaning to always take the strongest case to avoid current and future errors influencing the program.
The only case where defensive programming is not needed is where states have been proven by pre- and post-conditions (but then, proving this is the most defensive of all programming).
I would argue that an expression like
for ( int i = 0 ; i < 100 ; ++i )
{
...
}
is more expressive of intent than is
for ( int i = 0 ; i != 100 ; ++i )
{
...
}
The former clearly calls out that the condition is a test for an exclusive upper bound on a range; the latter is a binary test of an exit condition. And if the body of the loop is non-trivial, it may not be apparent that the index is only modified in the for statement itself.
Iterators are an important case when you most often use the != notation:
for(auto it = vector.begin(); it != vector.end(); ++it) {
// do stuff
}
Granted: in practice I would write the same relying on a range-for:
for(auto & item : vector) {
// do stuff
}
but the point remains: one normally compares iterators using == or !=.
The loop condition is an enforced loop invariant.
Suppose you don't look at the body of the loop:
for (int i = 0; i != 5; ++i)
{
// ?
}
in this case, you know at the start of the loop iteration that i does not equal 5.
for (int i = 0; i < 5; ++i)
{
// ?
}
in this case, you know at the start of the loop iteration that i is less than 5.
The second is much, much more information than the first, no? Now, the programmer intent is (almost certainly) the same, but if you are looking for bugs, having confidence from reading a line of code is a good thing. And the second enforces that invariant, which means some bugs that would bite you in the first case just cannot happen (or don't cause memory corruption, say) in the second case.
You know more about the state of the program, from reading less code, with < than with !=. And on modern CPUs the two comparisons take the same amount of time, so there is no performance difference.
If your i was not manipulated in the loop body, and it was always increased by 1, and it started less than 5, there would be no difference. But in order to know if it was manipulated, you'd have to confirm each of these facts.
Some of these facts are relatively easy to check, but you can still get them wrong. Checking the entire body of the loop is, however, a pain.
In C++ you can write an indexes type such that:
for( const int i : indexes(0, 5) )
{
// ?
}
does the same thing as either of the two above for loops, even down to the compiler optimizing it down to the same code. Here, however, you know that i cannot be manipulated in the body of the loop, as it is declared const, without the code corrupting memory.
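One possible shape for such an indexes type is sketched below. This is an assumption about what the author has in mind, not a standard facility: the names index_range and indexes are hypothetical, and clamping a reversed range to empty is a choice made for this illustration.

```cpp
// A minimal half-open integer range [first, last) usable in a range-based for.
struct index_range {
    struct iterator {
        int value;
        int operator*() const { return value; }  // yields a copy, never a reference
        iterator& operator++() { ++value; return *this; }
        bool operator!=(const iterator& other) const { return value != other.value; }
    };

    int first;
    int last;

    iterator begin() const { return {first}; }
    // If last < first, end() equals begin(), so the range is empty
    // instead of running away past `last`.
    iterator end() const { return {last < first ? first : last}; }
};

// Hypothetical factory matching the spelling used in the text.
inline index_range indexes(int first, int last) { return {first, last}; }
```

Because operator* returns by value, the loop variable is a private copy in each iteration, and declaring it const makes any attempt to modify it in the body a compile error.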
The more information you can get out of a line of code without having to understand the context, the easier it is to track down what is going wrong. < in the case of integer loops gives you more information about the state of the code at that line than != does.
As already said by Ian Newson, you can't reliably loop over a floating variable and exit with !=. For instance,
for (double x=0; x!=1; x+=0.1) {}
will actually loop forever, because 0.1 can't exactly be represented in floating point, hence the counter narrowly misses 1. With < it terminates.
(Note, however, that it is hard to predict whether you get 0.9999... as the last accepted number, which kind of violates the less-than assumption, or already exit at 1.0000000000000001.)
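A common way to sidestep both problems, sketched here as a swapped-in technique rather than anything the answer prescribes, is to drive the loop with an integer counter and derive the floating-point value from it. The function name sample_points is hypothetical.

```cpp
#include <vector>

// Generates `count` samples: start, start + step, start + 2 * step, ...
// The loop is driven by an integer counter and the floating-point value is
// derived from it, so termination never depends on comparing doubles.
inline std::vector<double> sample_points(double start, double step, int count) {
    std::vector<double> points;
    for (int i = 0; i < count; ++i) {
        points.push_back(start + i * step);
    }
    return points;
}
```

The number of iterations is then exactly count, and rounding error does not accumulate from one step to the next.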
Yes; OpenMP doesn't parallelize loops with the != condition.
It may happen that the variable i is set to some large value and if you just use the != operator you will end up in an endless loop.
As you can see from the other numerous answers, there are reasons to use < instead of != which will help in edge cases, initial conditions, unintended loop counter modification, etc...
Honestly though, I don't think you can stress the importance of convention enough. For this example it will be easy enough for other programmers to see what you are trying to do, but it will cause a double-take. One of the jobs while programming is making it as readable and familiar to everyone as possible, so inevitably when someone has to update/change your code, it doesn't take a lot of effort to figure out what you were doing in different code blocks. If I saw someone use !=, I'd assume there was a reason they used it instead of < and if it was a large loop I'd look through the whole thing trying to figure out what you did that made that necessary... and that's wasted time.
I take the adjectival "technical" to mean language behavior/quirks and compiler side effects such as performance of generated code.
To this end, the answer is: no(*). The (*) is "please consult your processor manual". If you are working with some edge-case RISC or FPGA system, you may need to check what instructions are generated and what they cost. But if you're using pretty much any conventional modern architecture, then there is no significant processor level difference in cost between lt, eq, ne and gt.
If you are using an edge case you could find that != requires three operations (cmp, not, beq) vs two (cmp, blt xtr myo). Again, RTM in that case.
For the most part, the reasons are defensive/hardening, especially when working with pointers or complex loops. Consider
// highly contrived example
size_t count_chars(char c, const char* str, size_t len) {
size_t count = 0;
bool quoted = false;
const char* p = str;
while (p != str + len) {
if (*p == '"') {
quoted = !quoted;
++p;
}
if (*(p++) == c && !quoted)
++count;
}
return count;
}
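One way to harden the contrived example above is sketched below, under the assumption that the intent is to count occurrences of c outside double quotes. The name count_unquoted is made up for this illustration. The pointer is advanced in exactly one place and the bound is checked with <, so it cannot step over the end unnoticed.

```cpp
#include <cstddef>

// Counts occurrences of `c` in the first `len` characters of `str`,
// ignoring anything between double quotes. The quote characters themselves
// are never counted.
inline std::size_t count_unquoted(char c, const char* str, std::size_t len) {
    std::size_t count = 0;
    bool quoted = false;
    for (const char* p = str; p < str + len; ++p) {  // single increment per pass
        if (*p == '"') {
            quoted = !quoted;
            continue;  // move on without a second, hidden ++p
        }
        if (*p == c && !quoted) {
            ++count;
        }
    }
    return count;
}
```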
A less contrived example would be where you are using return values to perform increments, accepting data from a user:
#include <iostream>
int main() {
size_t len = 5, step;
for (size_t i = 0; i != len; ) {
std::cout << "i = " << i << ", step? " << std::flush;
std::cin >> step;
i += step; // here for emphasis, it could go in the for(;;)
}
}
Try this and input the values 1, 2, 10, 999.
You could prevent this:
#include <iostream>
int main() {
size_t len = 5, step;
for (size_t i = 0; i != len; ) {
std::cout << "i = " << i << ", step? " << std::flush;
std::cin >> step;
if (step + i > len)
std::cout << "too much.\n";
else
i += step;
}
}
But what you probably wanted was
#include <iostream>
int main() {
size_t len = 5, step;
for (size_t i = 0; i < len; ) {
std::cout << "i = " << i << ", step? " << std::flush;
std::cin >> step;
i += step;
}
}
There is also something of a convention bias towards <, because ordering in standard containers often relies on operator<. For instance, several STL containers determine equality using nothing but operator<, by saying
if (lhs < rhs) // T.operator <
lessthan
else if (rhs < lhs) // T.operator < again
greaterthan
else
equal
If lhs and rhs are of a user-defined class, then with this code written as
if (lhs < rhs) // requires T.operator<
lessthan
else if (lhs > rhs) // requires T.operator>
greaterthan
else
equal
the implementor has to provide two comparison functions. So < has become the favored operator.
There are (usually) several ways to write any kind of code; there just happen to be two ways in this case (three if you count <= and >=).
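To make the point concrete, here is a hedged sketch with a user-defined type that supplies only operator<. The type Version and the helpers equivalent and distinct_versions are hypothetical; std::set and its default std::less comparison are the real standard library pieces.

```cpp
#include <cstddef>
#include <set>

// Hypothetical user-defined type that provides only operator<.
struct Version {
    int major_part;
    int minor_part;
};

inline bool operator<(const Version& lhs, const Version& rhs) {
    if (lhs.major_part != rhs.major_part) {
        return lhs.major_part < rhs.major_part;
    }
    return lhs.minor_part < rhs.minor_part;
}

// Two values are treated as equivalent when neither is less than the other.
template <class T>
bool equivalent(const T& lhs, const T& rhs) {
    return !(lhs < rhs) && !(rhs < lhs);
}

// std::set compares with std::less by default, which calls operator<,
// so duplicates are detected without operator== or operator>.
inline std::size_t distinct_versions() {
    std::set<Version> versions{{1, 0}, {1, 2}, {1, 0}};
    return versions.size();
}
```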
In this case, people prefer > and < to make sure that even if something unexpected happens in the loop (like a bug), it won't loop infinitely (BAD). Consider the following code, for example.
for (int i = 1; i != 3; i++) {
//More Code
i = 5; //OOPS! MISTAKE!
//More Code
}
If we used (i < 3), we would be safe from an infinite loop because it placed a bigger restriction.
It's really your choice whether you want a mistake in your program to shut the whole thing down or keep functioning with the bug there.
Hope this helped!
The most common reason to use < is convention. More programmers think of loops like this as "while the index is in range" rather than "until the index reaches the end." There's value in sticking to convention when you can.
On the other hand, many answers here are claiming that using the < form helps avoid bugs. I'd argue that in many cases this just helps hide bugs. If the loop index is supposed to reach the end value, and, instead, it actually goes beyond it, then there's something happening you didn't expect which may cause a malfunction (or be a side effect of another bug). The < will likely delay discovery of the bug. The != is more likely to lead to a stall, hang, or even a crash, which will help you spot the bug sooner. The sooner a bug is found, the cheaper it is to fix.
Note that this convention is peculiar to array and vector indexing. When traversing nearly any other type of data structure, you'd use an iterator (or pointer) and check directly for an end value. In those cases you have to be sure the iterator will reach and not overshoot the actual end value.
For example, if you're stepping through a plain C string, it's generally more common to write:
for (char *p = foo; *p != '\0'; ++p) {
// do something with *p
}
than
int length = strlen(foo);
for (int i = 0; i < length; ++i) {
// do something with foo[i]
}
For one thing, if the string is very long, the second form will be slower because the strlen is another pass through the string.
With a C++ std::string, you'd use a range-based for loop, a standard algorithm, or iterators, even though the length is readily available. If you're using iterators, the convention is to use != rather than <, as in:
for (auto it = foo.begin(); it != foo.end(); ++it) { ... }
Similarly, iterating a tree or a list or a deque usually involves watching for a null pointer or other sentinel rather than checking if an index remains within a range.
One reason not to use this construct is floating-point numbers. != is a very dangerous comparison to use with floats, as it will rarely evaluate to false even when the numbers look the same. < or > removes this risk.
There are two related reasons for following this practice that both have to do with the fact that a programming language is, after all, a language that will be read by humans (among others).
(1) A bit of redundancy. In natural language we usually provide more information than is strictly necessary, much like an error correcting code. Here the extra information is that the loop variable i (see how I used redundancy here? If you didn't know what 'loop variable' means, or if you forgot the name of the variable, after reading "loop variable i" you have the full information) is less than 5 during the loop, not just different from 5. Redundancy enhances readability.
(2) Convention. Languages have specific standard ways of expressing certain situations. If you don't follow the established way of saying something, you will still be understood, but the effort for the recipient of your message is greater because certain optimisations won't work. Example:
Don't talk around the hot mash. Just illuminate the difficulty!
The first sentence is a literal translation of a German idiom. The second is a common English idiom with the main words replaced by synonyms. The result is comprehensible but takes a lot longer to understand than this:
Don't beat around the bush. Just explain the problem!
This is true even in case the synonyms used in the first version happen to fit the situation better than the conventional words in the English idiom. Similar forces are in effect when programmers read code. This is also why 5 != i and 5 > i are weird ways of putting it unless you are working in an environment in which it is standard to swap the more normal i != 5 and i < 5 in this way. Such dialect communities do exist, probably because consistency makes it easier to remember to write 5 == i instead of the natural but error prone i == 5.
Using relational comparisons in such cases is more of a popular habit than anything else. It gained its popularity back in the times when such conceptual considerations as iterator categories and their comparability were not considered high priority.
I'd say that one should prefer to use equality comparisons instead of relational comparisons whenever possible, since equality comparisons impose fewer requirements on the values being compared. Being EqualityComparable is a lesser requirement than being LessThanComparable.
Another example that demonstrates the wider applicability of equality comparison in such contexts is the popular conundrum with implementing unsigned iteration down to 0. It can be done as
for (unsigned i = 42; i != -1; --i)
...
Note that the above is equally applicable to both signed and unsigned iteration, while the relational version breaks down with unsigned types.
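The unsigned countdown can be checked with a small sketch. The function name count_down is hypothetical, and the cast just spells out the conversion that i != -1 performs implicitly.

```cpp
#include <vector>

// Visits from, from - 1, ..., 0 with an unsigned index. After 0, --i wraps
// around to the largest unsigned value, which is what -1 converts to,
// so the loop stops right after visiting 0.
// Caveat: if `from` is already that largest value, the body never runs.
inline std::vector<unsigned> count_down(unsigned from) {
    std::vector<unsigned> visited;
    for (unsigned i = from; i != static_cast<unsigned>(-1); --i) {
        visited.push_back(i);
    }
    return visited;
}
```

A relational test such as i >= 0 would always be true for an unsigned i, which is why the relational version never terminates.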
Besides the examples where the loop variable is (unintentionally) changed inside the body, there are other reasons to use the less-than or greater-than operators:
Negations make code harder to understand
< or > is only one character, but != is two
In addition to the various people who have mentioned that it mitigates risk, it also reduces the number of function overloads necessary to interact with various standard library components. As an example, if you want your type to be storable in a std::set, or used as a key for std::map, or used with some of the searching and sorting algorithms, the standard library usually uses std::less to compare objects as most algorithms only need a strict weak ordering. Thus it becomes a good habit to use the < comparisons instead of != comparisons (where it makes sense, of course).
There is no problem from a syntax perspective, but the logic behind that expression 5!=i is not sound.
In my opinion, using != to set the bounds of a for loop is not logically sound because a for loop either increments or decrements the iteration index, so setting the loop to iterate until the iteration index becomes out of bounds (!= to something) is not a proper implementation.
It will work, but it is prone to misbehavior, since the boundary handling is lost when using != for an incremental problem (meaning that you know from the start whether it increments or decrements). That's why <, >, <= and >= are used instead of !=.
Why is there a need for 3 different loops ("while", "do-while", and "for") to exist in C/C++, especially when each of them gives you the power to do almost anything that the other 2 can do? Other languages lack one or the other.
Is it just for ease of use or to make the code look better in cases, or are there any special purposes that are served by any one of them specifically that can't be accomplished so easily with the other two? If yes, then please mention.
P.S. - In general, does a language support multiple iteration syntaxes just to enhance readability?
It's not just readability, it's also the closely-related but distinct maintainability, and concision, and scoping (esp. for files, locks, smart pointers etc.), and performance....
If we consider the for loop, it:
allows some variables to be defined - in the for loop's own scope - and initialised,
tests a control expression before entering the loop each time (including the first), and
has a statement that gets executed after each iteration and before re-testing the control expression, assuming no break/return/throw/exit/failed assert etc., and regardless of whether the last statement in the body executed or whether a continue statement executed; this statement is traditionally reserved for logically "advancing" some state "through" the processing, such that the next test of the control expression is meaningful.
That's very flexible and given the utility of more localised scopes to ensure earlier destructor invocation, can help ensure locks, files, memory etc. are released as early as possible - implicitly when leaving the loop.
If we consider a while loop...
while (expression to test)
...
...it's functionally exactly equivalent to...
for ( ; expression to test; )
...
...but it also implies to the programmer that there are no control variables that should be local to the loop, and that either the control "expression to test" inherently "progresses" through a finite number of iterations, or the loop runs forever because the test expression is hardcoded true, or some more complicated management of "progress" has to be controlled and coordinated by the statements the while controls.
In other words, a programmer seeing while is automatically aware that they need to study the control expression more carefully, then possibly look more widely at both the surrounding scope/function and the contained statements, to understand the loop behaviour.
So, do-while? Well, writing code like this is painful and less efficient:
bool first_time = true;
while (first_time || ...)
{
first_time = false;
...
}
// oops... first_time still hanging around...
...compared to...
do
...
while (...);
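A classic case where the body must run at least once is counting decimal digits, since 0 still has one digit. The sketch below is an illustration only, and the function name count_digits is made up.

```cpp
// Counts the decimal digits of `value`. The body runs once before the
// test, so 0 correctly reports one digit without a special case.
inline int count_digits(unsigned value) {
    int digits = 0;
    do {
        ++digits;
        value /= 10;
    } while (value != 0);
    return digits;
}
```

Written with a plain while loop, the same function would need either a first_time flag or an explicit check for 0.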
Examples
While loop:
int i = 23;
while (i < 99)
{
if (f(i)) { ++i; continue; }
if (g(i)) break;
++i;
}
// oops... i is hanging around
For loop:
for (int i = 23; i < 99; ++i)
{
if (f(i)) continue;
if (g(i)) break;
}
Well, C++ has goto and you can use it to implement all three loops, but it doesn't mean that they should be removed. Actually it just increases readability. Of course you could implement any of them yourself.
Some loops are easiest to write using for, some are easiest to write using while, and some are easiest to write using do-while. So the language provides all three.
We have things like the += operator for the same reason; += doesn't do anything that you can't do with plain +, but using it (where appropriate) can make your code a bit more readable.
In general, when presented with different language constructs that accomplish similar purposes, you should choose the one that more clearly communicates the intended purpose of the code you are writing. It is a benefit that C provides four distinct structured iteration devices to use, as it provides a high chance you can clearly communicate the intended purpose.
for ( initialization ; condition ; iteration-step ) body
This form communicates how the loop will start, how it will adjust things for the next iteration, and what is the condition to stay within the loop. This construct lends itself naturally for doing something N times.
for (int i = 0; i < N; ++i) {
/* ... */
}
while ( condition ) body
This form communicates simply that you wish to continue to perform the loop while the condition remains true. For loops where the iteration-step is implicit to the way the loop works, it can be a more natural way to communicate the intention of the code:
while (std::cin >> word) {
/* ... */
}
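As a self-contained sketch of this pattern (the function names count_words and count_words_in are hypothetical), the extraction itself serves as both the iteration step and the loop condition:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts whitespace-separated words. `in >> word` both advances the stream
// and yields the condition: it converts to false once extraction fails.
inline std::size_t count_words(std::istream& in) {
    std::size_t words = 0;
    std::string word;
    while (in >> word) {
        ++words;
    }
    return words;
}

// Convenience wrapper so the loop can be exercised on a plain string.
inline std::size_t count_words_in(const std::string& text) {
    std::istringstream in(text);
    return count_words(in);
}
```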
do body while ( condition )
This form communicates that the loop body will execute at least once, and then continues while the condition remains true. This is useful for situations where you have already determined that you need to execute the body, so you avoid a redundant looking test.
if (count > 0) {
do {
/* ... */
} while (--count > 0);
} else {
puts("nothing to do");
}
The fourth iteration device is ... recursion!
Recursion is another form of iteration that expresses that the same function can be used to work on a smaller part of the original problem. It is a natural way to express a divide and conquer strategy to a problem (like binary searching, or sorting), or to work on data structures that are self-referential (such as lists or trees).
#include <string.h> /* strcmp, NULL */
struct node {
struct node *next;
char name[32];
char info[256];
};
struct node * find (struct node *list, char *name)
{
if (list == NULL || strcmp(name, list->name) == 0) {
return list;
}
return find(list->next, name);
}
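The divide and conquer case mentioned above can be sketched the same way. This is an illustrative C++ version of a recursive binary search, not code from the answer; the name find_sorted and the convention of returning -1 when the value is absent are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Recursive binary search over the half-open index range [low, high) of a
// sorted vector. Returns the index of `target`, or -1 if it is not present.
inline std::ptrdiff_t find_sorted(const std::vector<int>& sorted, int target,
                                  std::size_t low, std::size_t high) {
    if (low >= high) {
        return -1;  // empty range: nothing left to search
    }
    const std::size_t mid = low + (high - low) / 2;
    if (sorted[mid] == target) {
        return static_cast<std::ptrdiff_t>(mid);
    }
    if (sorted[mid] < target) {
        return find_sorted(sorted, target, mid + 1, high);  // search right half
    }
    return find_sorted(sorted, target, low, mid);  // search left half
}

// Convenience overload that searches the whole vector.
inline std::ptrdiff_t find_sorted(const std::vector<int>& sorted, int target) {
    return find_sorted(sorted, target, 0, sorted.size());
}
```

Each call works on a range half the size of the previous one, which is the "smaller part of the original problem" that the recursion relies on.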