GCC, std::array and for vs ranged-based for - c++

I'm trying to use (large-ish) arrays of structs containing only a single small std::array each.
Especially the K == 1 case should be well supported (see code).
However, GCC seems to be unable to properly optimize these arrays in most cases, especially when using ranged-based for.
Clang produces code that I don't fully understand but seems to be well optimized (and uses SSE, which GCC doesn't).
#include <vector>
#include <array>
template<int K>
struct Foo {
std::array<int, K> vals;
int bar() const {
int sum = 0;
#if 0 // Foo Version A
for (auto f : vals)
sum += f;
#else // Foo Version B
for (auto i = 0; i < K; ++i)
sum += vals[i];
#endif
return sum;
}
};
int test1(std::vector<Foo<1>> const& foos)
{
int sum = 0;
for (auto const& f : foos)
sum += f.bar();
return sum;
}
// Version C
int test2(std::vector<std::array<int, 1>> const& foos)
{
int sum = 0;
for (auto const& f : foos)
for (auto const& v : f)
sum += v;
return sum;
}
// Version D
int test3(std::vector<std::array<int, 1>> const& foos)
{
int sum = 0;
for (auto const& f : foos)
for (auto i = 0; i < f.size(); ++i)
sum += f[i];
return sum;
}
Godbolt Code, gcc 7.2, flags -O2 -std=c++11 -march=native. Older gcc versions behave similarly.
If I'm not mistaken, then all four versions should have the same semantic.
Furthermore, I would expect all versions to compile to about the same assembly.
The assembly should only have one frequently used conditional jump (for iterating over the vector).
However, the following happens:
Version A (ranged-based for, array-in-struct): 3 conditional jumps, one at the beginning for handling zero-length vectors. Then one for the vector (this is ok). But then another for iterating over the array? Why? It has constant size 1.
Version B (manual for, array-in-struct): Here, GCC actually recognizes that the array of length 1 can be optimized, the assembly looks good.
Version C (ranged-based for, direct array): The loop over the array is not optimized away, so again two conditional jumps for the looping. Also: this version contains more memory access that I thought would be required.
Version D (manual for, direct array): This one is the only version that looks sane to me. 11 instructions.
Clang creates way more assembly code (same flags) but it's pretty much the same for all versions and contains a lot of loop unrolling and SSE.
Is this a GCC-related problem? Should I file a bug? Is this something in my C++ code that I should/can fix?
EDIT: updated the godbolt url due to a fix in Version B. Behavior is now the same as Version D, which makes this a pure manual-for vs. ranged-for issue.

Related

Is there an efficient way to slice a C++ vector given a vector containing the indexes to be sliced

I am working to implement a code which was written in MATLAB into C++.
In MATLAB you can slice an Array with another array, like A(B), which results in a new array of the elements of A at the indexes specified by the values of the element in B.
I would like to do a similar thing in C++ using vectors. These vectors are of size 10000-40000 elements of type double.
I want to be able to slice these vectors using another vector of type int containing the indexes to be sliced.
For example, I have a vector v = <1.0, 3.0, 5.0, 2.0, 8.0> and a vector w = <0, 3, 2>. I want to slice v using w such that the outcome of the slice is a new vector (since the old vector must remain unchanged) x = <1.0, 2.0, 5.0>.
I came up with a function to do this:
template<typename T>
std::vector<T> slice(std::vector<T>& v, std::vector<int>& id) {
std::vector<T> tmp;
tmp.reserve(id.size());
for (auto& i : id) {
tmp.emplace_back(v[i]);
}
return tmp;
}
I was wondering if there was potentially a more efficient way to do such a task. Speed is the key here since this slice function will be in a for-loop which has approximately 300000 iterations. I heard the boost library might contain some valid solutions, but I have not had experience yet with it.
I used the chrono library to measure the time it takes to call this slice function, where the vector to be sliced was length 37520 and the vector containing the indexes was size 1550. For a single call of this function, the time elapsed = 0.0004284s. However, over ~300000 for-loop iterations, the total elapsed time was 134s.
Any advice would be much appreicated!
emplace_back has some overhead as it involves some internal accounting inside std::vector. Try this instead:
template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {
std::vector<T> tmp;
tmp.resize (id.size ());
size_t n = 0;
for (auto i : id) {
tmp [n++] = v [i];
}
return tmp;
}
Also, I removed an unnecessary dereference in your inner loop.
Edit: I thought about this some more, and inspired by #jack's answer, I think that the inner loop (which is the one that counts) can be optimised further. The idea is to put everything used by the loop in local variables, which gives the compiler the best chance to optimise the code. So try this, see what timings you get. Make sure that you test a Release / optimised build:
template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {
size_t id_size = id.size ();
std::vector<T> tmp (id_size);
T *tmp_data = tmp.data ();
const int *id_data = id.data ();
const T* v_data = v.data ();
for (size_t i = 0; i < id_size; ++i) {
tmp_data [i] = v_data [id_data [i]];
}
return tmp;
}
The performance seems a bit slow; are you building with compiler optimizations (eg. g++ main.cpp -O3 or if using an IDE, switching to release mode). This alone sped up computation time around 10x.
If you are using optimizations already, by using basic for loop iteration (for int i = 0; i < id.size(); i++) computation time was sped up around 2-3x on my machine, the idea being, the compiler doesn't have to resolve what type auto refers to, and since basic for loops have been in C++ forever, the compiler is likely to have lots of tricks to speed it up.
template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id){
// #Jan Schultke's suggestion
std::vector<T> tmp(id.size ());
size_t n = 0;
for (int i = 0; i < id.size(); i++) {
tmp [n++] = v [i];
}
return tmp;
}

Performance penalty of using boost::irange over raw loop

A few answers and discussions and even the source code of boost::irange mention that there should be a performance penalty to using these ranges over raw for loops.
However, for example for the following code
#include <boost/range/irange.hpp>
int sum(int* v, int n) {
int result{};
for (auto i : boost::irange(0, n)) {
result += v[i];
}
return result;
}
int sum2(int* v, int n) {
int result{};
for (int i = 0; i < n; ++i) {
result += v[i];
}
return result;
}
I see no differences in the generated (-O3 optimized) code (Compiler Explorer). Does anyone see an example where using such an integer range could lead to worse code generation in modern compilers?
EDIT: Clearly, debug performance might be impacted, but that's not my aim here.
Concerning the strided (step size > 1) example, I think it might be possible to modify the irange code to more closely match the code of a strided raw for-loop.
Does anyone see an example where using such an integer range could lead to worse code generation in modern compilers?
Yes. It is not stated that your particular case is affected. But changing the step to anything else than 1:
#include <boost/range/irange.hpp>
int sum(int* v, int n) {
int result{};
for (auto i : boost::irange(0, n, 8)) {
result += v[i]; //^^^ different steps
}
return result;
}
int sum2(int* v, int n) {
int result{};
for (int i = 0; i < n; i+=8) {
result += v[i]; //^^^ different steps
}
return result;
}
Live.
While sum now looks worse (the loop did not get unrolled) sum2 still benefits from loop unrolling and SIMD optimization.
Edit:
To comment on your edit, it's true that it might be possible to modify the irange code to more closely. But:
To fit how range-based for loops are expanded, boost::irange(0, n, 8) must create some sort of temporary, implementing begin/end iterators and a prefix operator++ (which is crearly not as trivial as an int += operation). Compilers are using pattern matching for optimization, which is trimmed to work with standard C++ and standard libraries. Thus, any result from irange; if it is slightly different than a pattern the compiler knows to optimize, optimization won't kick in. And I think, these are the reason why the author of the library mentions performance penalties.

Why isn't a for-loop a compile-time expression?

If I want to do something like iterate over a tuple, I have to resort to crazy template metaprogramming and template helper specializations. For example, the following program won't work:
#include <iostream>
#include <tuple>
#include <utility>
constexpr auto multiple_return_values()
{
return std::make_tuple(3, 3.14, "pi");
}
template <typename T>
constexpr void foo(T t)
{
for (auto i = 0u; i < std::tuple_size<T>::value; ++i)
{
std::get<i>(t);
}
}
int main()
{
constexpr auto ret = multiple_return_values();
foo(ret);
}
Because i can't be const or we wouldn't be able to implement it. But for loops are a compile-time construct that can be evaluated statically. Compilers are free to remove it, transform it, fold it, unroll it or do whatever they want with it thanks to the as-if rule. But then why can't loops be used in a constexpr manner? There's nothing in this code that needs to be done at "runtime". Compiler optimizations are proof of that.
I know that you could potentially modify i inside the body of the loop, but the compiler can still be able to detect that. Example:
// ...snip...
template <typename T>
constexpr int foo(T t)
{
/* Dead code */
for (auto i = 0u; i < std::tuple_size<T>::value; ++i)
{
}
return 42;
}
int main()
{
constexpr auto ret = multiple_return_values();
/* No error */
std::array<int, foo(ret)> arr;
}
Since std::get<>() is a compile-time construct, unlike std::cout.operator<<, I can't see why it's disallowed.
πάντα ῥεῖ gave a good and useful answer, I would like to mention another issue though with constexpr for.
In C++, at the most fundamental level, all expressions have a type which can be determined statically (at compile-time). There are things like RTTI and boost::any of course, but they are built on top of this framework, and the static type of an expression is an important concept for understanding some of the rules in the standard.
Suppose that you can iterate over a heterogenous container using a fancy for syntax, like this maybe:
std::tuple<int, float, std::string> my_tuple;
for (const auto & x : my_tuple) {
f(x);
}
Here, f is some overloaded function. Clearly, the intended meaning of this is to call different overloads of f for each of the types in the tuple. What this really means is that in the expression f(x), overload resolution has to run three different times. If we play by the current rules of C++, the only way this can make sense is if we basically unroll the loop into three different loop bodies, before we try to figure out what the types of the expressions are.
What if the code is actually
for (const auto & x : my_tuple) {
auto y = f(x);
}
auto is not magic, it doesn't mean "no type info", it means, "deduce the type, please, compiler". But clearly, there really need to be three different types of y in general.
On the other hand, there are tricky issues with this kind of thing -- in C++ the parser needs to be able to know what names are types and what names are templates in order to correctly parse the language. Can the parser be modified to do some loop unrolling of constexpr for loops before all the types are resolved? I don't know but I think it might be nontrivial. Maybe there is a better way...
To avoid this issue, in current versions of C++, people use the visitor pattern. The idea is that you will have an overloaded function or function object and it will be applied to each element of the sequence. Then each overload has its own "body" so there's no ambiguity as to the types or meanings of the variables in them. There are libraries like boost::fusion or boost::hana that let you do iteration over heterogenous sequences using a given vistior -- you would use their mechanism instead of a for-loop.
If you could do constexpr for with just ints, e.g.
for (constexpr i = 0; i < 10; ++i) { ... }
this raises the same difficulty as heterogenous for loop. If you can use i as a template parameter inside the body, then you can make variables that refer to different types in different runs of the loop body, and then it's not clear what the static types of the expressions should be.
So, I'm not sure, but I think there may be some nontrivial technical issues associated with actually adding a constexpr for feature to the language. The visitor pattern / the planned reflection features may end up being less of a headache IMO... who knows.
Let me give another example I just thought of that shows the difficulty involved.
In normal C++, the compiler knows the static type of every variable on the stack, and so it can compute the layout of the stack frame for that function.
You can be sure that the address of a local variable won't change while the function is executing. For instance,
std::array<int, 3> a{{1,2,3}};
for (int i = 0; i < 3; ++i) {
auto x = a[i];
int y = 15;
std::cout << &y << std::endl;
}
In this code, y is a local variable in the body of a for loop. It has a well-defined address throughout this function, and the address printed by the compiler will be the same each time.
What should be the behavior of similar code with constexpr for?
std::tuple<int, long double, std::string> a{};
for (int i = 0; i < 3; ++i) {
auto x = std::get<i>(a);
int y = 15;
std::cout << &y << std::endl;
}
The point is that the type of x is deduced differently in each pass through the loop -- since it has a different type, it may have different size and alignment on the stack. Since y comes after it on the stack, that means that y might change its address on different runs of the loop -- right?
What should be the behavior if a pointer to y is taken in one pass through the loop, and then dereferenced in a later pass? Should it be undefined behavior, even though it would probably be legal in the similar "no-constexpr for" code with std::array showed above?
Should the address of y not be allowed to change? Should the compiler have to pad the address of y so that the largest of the types in the tuple can be accommodated before y? Does that mean that the compiler can't simply unroll the loops and start generating code, but must unroll every instance of the loop before-hand, then collect all of the type information from each of the N instantiations and then find a satisfactory layout?
I think you are better off just using a pack expansion, it's a lot more clear how it is supposed to be implemented by the compiler, and how efficient it's going to be at compile and run time.
Here's a way to do it that does not need too much boilerplate, inspired from http://stackoverflow.com/a/26902803/1495627 :
template<std::size_t N>
struct num { static const constexpr auto value = N; };
template <class F, std::size_t... Is>
void for_(F func, std::index_sequence<Is...>)
{
using expander = int[];
(void)expander{0, ((void)func(num<Is>{}), 0)...};
}
template <std::size_t N, typename F>
void for_(F func)
{
for_(func, std::make_index_sequence<N>());
}
Then you can do :
for_<N>([&] (auto i) {
std::get<i.value>(t); // do stuff
});
If you have a C++17 compiler accessible, it can be simplified to
template <class F, std::size_t... Is>
void for_(F func, std::index_sequence<Is...>)
{
(func(num<Is>{}), ...);
}
In C++20 most of the std::algorithm functions will be constexpr. For example using std::transform, many operations requiring a loop can be done at compile time. Consider this example calculating the factorial of every number in an array at compile time (adapted from Boost.Hana documentation):
#include <array>
#include <algorithm>
constexpr int factorial(int n) {
return n == 0 ? 1 : n * factorial(n - 1);
}
template <typename T, std::size_t N, typename F>
constexpr std::array<std::result_of_t<F(T)>, N>
transform_array(std::array<T, N> array, F f) {
auto array_f = std::array<std::result_of_t<F(T)>, N>{};
// This is a constexpr "loop":
std::transform(array.begin(), array.end(), array_f.begin(), [&f](auto el){return f(el);});
return array_f;
}
int main() {
constexpr std::array<int, 4> ints{{1, 2, 3, 4}};
// This can be done at compile time!
constexpr std::array<int, 4> facts = transform_array(ints, factorial);
static_assert(facts == std::array<int, 4>{{1, 2, 6, 24}}, "");
}
See how the array facts can be computed at compile time using a "loop", i.e. an std::algorithm. At the time of writing this, you need an experimental version of the newest clang or gcc release which you can try out on godbolt.org. But soon C++20 will be fully implemented by all the major compilers in the release versions.
This proposal "Expansion Statements" is interesting and I will provide the link for you to read further explanations.
Click this link
The proposal introduced the syntactic sugar for... as similar to the sizeof... operator. for... loop statement is a compile-time expression which means it has nothing to do in the runtime.
For example:
std::tuple<int, float, char> Tup1 {5, 3.14, 'K'};
for... (auto elem : Tup1) {
std::cout << elem << " ";
}
The compiler will generate the code at the compile-time and this is the equivalence:
std::tuple<int, float, char> Tup1 {5, 3.14, 'K'};
{
auto elem = std::get<0>(Tup1);
std::cout << elem << " ";
}
{
auto elem = std::get<1>(Tup1);
std::cout << elem << " ";
}
{
auto elem = std::get<2>(Tup1);
std::cout << elem << " ";
}
Thus, the expansion statement is not a loop but a repeated version of the loop body as it was said in the document.
Since this proposal isn't in C++'s current version or in the technical specification (if it's accepted). We can use the alternative version from the boost library specifically <boost/hana/for_each.hpp> and use the tuple version of boost from <boost/hana/tuple.hpp>. Click this link.
#include <boost/hana/for_each.hpp>
#include <boost/hana/tuple.hpp>
using namespace boost;
...
hana::tuple<int, std::string, float> Tup1 {5, "one", 5.55};
hana::for_each(Tup1, [](auto&& x){
std::cout << x << " ";
});
// Which will print:
// 5 "one" 5.55
The first argument of boost::hana::for_each must be a foldable container.
Why isn't a for-loop a compile-time expression?
Because a for() loop is used to define runtime control flow in the c++ language.
Generally variadic templates cannot be unpacked within runtime control flow statements in c++.
std::get<i>(t);
cannot be deduced at compile time, since i is a runtime variable.
Use variadic template parameter unpacking instead.
You might also find this post useful (if this not even remarks a duplicate having answers for your question):
iterate over tuple
Here are two examples attempting to replicate a compile-time for loop (which isn't part of the language at this time), using fold expressions and std::integer_sequence. The first example shows a simple assignment in the loop, and the second example shows tuple indexing and uses a lambda with template parameters available in C++20.
For a function with a template parameter, e.g.
template <int n>
constexpr int factorial() {
if constexpr (n == 0) { return 1; }
else { return n * factorial<n - 1>(); }
}
Where we want to loop over the template parameter, like this:
template <int N>
constexpr auto example() {
std::array<int, N> vals{};
for (int i = 0; i < N; ++i) {
vals[i] = factorial<i>(); // this doesn't work
}
return vals;
}
One can do this:
template <int... Is>
constexpr auto get_array(std::integer_sequence<int, Is...> a) -> std::array<int, a.size()> {
std::array<int, a.size()> vals{};
((vals[Is] = factorial<Is>()), ...);
return vals;
}
And then get the result at compile time:
constexpr auto x = get_array(std::make_integer_sequence<int, 5>{});
// x = {1, 1, 2, 6, 24}
Similarly, for a tuple:
constexpr auto multiple_return_values()
{
return std::make_tuple(3, 3.14, "pi");
}
int main(void) {
static constexpr auto ret = multiple_return_values();
constexpr auto for_constexpr = [&]<int... Is>(std::integer_sequence<int, Is...> a) {
((std::get<Is>(ret)), ...); // std::get<i>(t); from the question
return 0;
}
// use it:
constexpr auto w = for_constexpr(std::make_integer_sequence<int, std::tuple_size_v<decltype(ret)>>{});
}

std::array vs array performance

If I want to build a very simple array like:
int myArray[3] = {1,2,3};
Should I use std::array instead?
std::array<int, 3> a = {{1, 2, 3}};
What are the advantages of using std::array over usual ones? Is it more performant? Just easier to handle for copy/access?
What are the advantages of using std::array over usual ones?
It has friendly value semantics, so that it can be passed to or returned from functions by value. Its interface makes it more convenient to find the size, and use with STL-style iterator-based algorithms.
Is it more performant ?
It should be exactly the same. By definition, it's a simple aggregate containing an array as its only member.
Just easier to handle for copy/access ?
Yes.
A std::array is a very thin wrapper around a C-style array, basically defined as
template<typename T, size_t N>
struct array
{
T _data[N];
T& operator[](size_t);
const T& operator[](size_t) const;
// other member functions and typedefs
};
It is an aggregate, and it allows you to use it almost like a fundamental type (i.e. you can pass-by-value, assign etc, whereas a standard C array cannot be assigned or copied directly to another array). You should take a look at some standard implementation (jump to definition from your favourite IDE or directly open <array>), it is a piece of the C++ standard library that is quite easy to read and understand.
std::array is designed as zero-overhead wrapper for C arrays that gives it the "normal" value like semantics of the other C++ containers.
You should not notice any difference in runtime performance while you still get to enjoy the extra features.
Using std::array instead of int[] style arrays is a good idea if you have C++11 or boost at hand.
Is it more performant ?
It should be exactly the same. By definition, it's a simple aggregate containing an array as its only member.
The situation seems to be more complicated, as std::array does not always produce identical assembly code compared to C-array depending on the specific platform.
I tested this specific situation on godbolt:
#include <array>
void test(double* const C, const double* const A,
const double* const B, const size_t size) {
for (size_t i = 0; i < size; i++) {
//double arr[2] = {0.e0};//
std::array<double, 2> arr = {0.e0};//different to double arr[2] for some compiler
for (size_t j = 0; j < size; j++) {
arr[0] += A[i] * B[j];
arr[1] += A[j] * B[i];
}
C[i] += arr[0];
C[i] += arr[1];
}
}
GCC and Clang produce identical assembly code for both the C-array version and the std::array version.
MSVC and ICPC, however, produce different assembly code for each array version. (I tested ICPC19 with -Ofast and -Os; MSVC -Ox and -Os)
I have no idea, why this is the case (I would indeed expect exactly identical behavior of std::array and c-array). Maybe there are different optimization strategies employed.
As a little extra:
There seems to be a bug in ICPC with
#pragma simd
for vectorization when using the c-array in some situations
(the c-array code produces a wrong output; the std::array version works fine).
Unfortunately, I do not have a minimal working example for that yet, since I discovered that problem while optimizing a quite complicated piece of code.
I will file a bug-report to intel when I am sure that I did not just misunderstood something about C-array/std::array and #pragma simd.
std::array has value semantics while raw arrays do not. This means you can copy std::array and treat it like a primitive value. You can receive them by value or reference as function arguments and you can return them by value.
If you never copy a std::array, then there is no performance difference than a raw array. If you do need to make copies then std::array will do the right thing and should still give equal performance.
You will get the same perfomance results using std::array and c array
If you run these code:
std::array<QPair<int, int>, 9> *m_array=new std::array<QPair<int, int>, 9>();
QPair<int, int> *carr=new QPair<int, int>[10];
QElapsedTimer timer;
timer.start();
for (int j=0; j<1000000000; j++)
{
for (int i=0; i<9; i++)
{
m_array->operator[](i).first=i+j;
m_array->operator[](i).second=j-i;
}
}
qDebug() << "std::array<QPair<int, int>" << timer.elapsed() << "milliseconds";
timer.start();
for (int j=0; j<1000000000; j++)
{
for (int i=0; i<9; i++)
{
carr[i].first=i+j;
carr[i].second=j-i;
}
}
qDebug() << "QPair<int, int> took" << timer.elapsed() << "milliseconds";
return 0;
You will get these results:
std::array<QPair<int, int> 5670 milliseconds
QPair<int, int> took 5638 milliseconds
Mike Seymour is right, if you can use std::array you should use it.

C++ range/xrange equivalent in STL or boost?

Is there C++ equivalent for python Xrange generator in either STL or boost?
xrange basically generates incremented number with each call to ++ operator.
the constructor is like this:
xrange(first, last, increment)
was hoping to do something like this using boost for each:
foreach(int i, xrange(N))
I. am aware of the for loop. in my opinion they are too much boilerplate.
Thanks
my reasons:
my main reason for wanting to do so is because i use speech to text software, and programming loop usual way is difficult, even if using code completion. It is much more efficient to have pronounceable constructs.
many loops start with zero and increment by one, which is default for range. I find python construct more intuitive
for(int i = 0; i < N; ++i)
foreach(int i, range(N))
functions which need to take range as argument:
Function(int start, int and, int inc);
function(xrange r);
I understand differences between languages, however if a particular construct in python is very useful for me and can be implemented efficiently in C++, I do not see a reason not to use it. For each construct is foreign to C++ as well however people use it.
I put my implementation at the bottom of the page as well the example usage.
in my domain i work with multidimensional arrays, often rank 4 tensor. so I would often end up with 4 nested loops with different ranges/increments to compute normalization, indexes, etc. those are not necessarily performance loops, and I am more concerned with correctness readability and ability to modify.
for example
int function(int ifirst, int ilast, int jfirst, int jlast, ...);
versus
int function(range irange, range jrange, ...);
In the above, if different strids are needed, you have to pass more variables, modify loops, etc. eventually you end up with a mass of integers/nearly identical loops.
foreach and range solve my problem exactly. familiarity to average C++ programmer is not high on my list of concerns - problem domain is a rather obscure, there is a lot of meta-programming, SSE intrinsic, generated code.
Boost irange should really be the answer (ThxPaul Brannan)
I'm adding my answer to provide a compelling example of very valid use-cases that are not served well by manual looping:
#include <boost/range/adaptors.hpp>
#include <boost/range/algorithm.hpp>
#include <boost/range/irange.hpp>
using namespace boost::adaptors;
static int mod7(int v)
{ return v % 7; }
int main()
{
std::vector<int> v;
boost::copy(
boost::irange(1,100) | transformed(mod7),
std::back_inserter(v));
boost::sort(v);
boost::copy(
v | reversed | uniqued,
std::ostream_iterator<int>(std::cout, ", "));
}
Output: 6, 5, 4, 3, 2, 1, 0,
Note how this resembles generators/comprehensions (functional languages) and enumerables (C#)
Update I just thought I'd mention the following (highly inflexible) idiom that C++11 allows:
for (int x : {1,2,3,4,5,6,7})
std::cout << x << std::endl;
of course you could marry it with irange:
for (int x : boost::irange(1,8))
std::cout << x << std::endl;
Boost has counting_iterator as far as I know, which seems to allow only incrementing in steps of 1. For full xrange functionality you might need to implement a similar iterator yourself.
All in all it could look like this (edit: added an iterator for the third overload of xrange, to play around with boost's iterator facade):
#include <iostream>
#include <boost/iterator/counting_iterator.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/foreach.hpp>
#include <boost/iterator/iterator_facade.hpp>
#include <cassert>
template <class T>
boost::iterator_range<boost::counting_iterator<T> > xrange(T to)
{
//these assertions are somewhat problematic:
//might produce warnings, if T is unsigned
assert(T() <= to);
return boost::make_iterator_range(boost::counting_iterator<T>(0), boost::counting_iterator<T>(to));
}
template <class T>
boost::iterator_range<boost::counting_iterator<T> > xrange(T from, T to)
{
assert(from <= to);
return boost::make_iterator_range(boost::counting_iterator<T>(from), boost::counting_iterator<T>(to));
}
//iterator that can do increments in steps (positive and negative)
template <class T>
class xrange_iterator:
public boost::iterator_facade<xrange_iterator<T>, const T, std::forward_iterator_tag>
{
T value, incr;
public:
xrange_iterator(T value, T incr = T()): value(value), incr(incr) {}
private:
friend class boost::iterator_core_access;
void increment() { value += incr; }
bool equal(const xrange_iterator& other) const
{
//this is probably somewhat problematic, assuming that the "end iterator"
//is always the right-hand value?
return (incr >= 0 && value >= other.value) || (incr < 0 && value <= other.value);
}
const T& dereference() const { return value; }
};
template <class T>
boost::iterator_range<xrange_iterator<T> > xrange(T from, T to, T increment)
{
assert((increment >= T() && from <= to) || (increment < T() && from >= to));
return boost::make_iterator_range(xrange_iterator<T>(from, increment), xrange_iterator<T>(to));
}
int main()
{
BOOST_FOREACH(int i, xrange(10)) {
std::cout << i << ' ';
}
BOOST_FOREACH(int i, xrange(10, 20)) {
std::cout << i << ' ';
}
std::cout << '\n';
BOOST_FOREACH(int i, xrange(0, 46, 5)) {
std::cout << i << ' ';
}
BOOST_FOREACH(int i, xrange(10, 0, -1)) {
std::cout << i << ' ';
}
}
As others are saying, I don't see this buying you much over a normal for loop.
std::iota (not yet standardized) is kinda like range. Doesn't make things any shorter or clearer than an explicit for loop, though.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric>
#include <vector>
int main() {
std::vector<int> nums(5);
std::iota(nums.begin(), nums.end(), 1);
std::copy(nums.begin(), nums.end(),
std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
return 0;
}
Compile with g++ -std=c++0x; this prints "1 2 3 4 5 \n".
well, here is what i wrote, since there does not seem to be one.
the generator does not use any internal storage besides single integer.
range object can be passed around and used in nested loops.
there is a small test case.
#include "iostream"
#include "foreach.hpp"
#include "boost/iterator/iterator_categories.hpp"
struct range {
struct iterator_type {
typedef int value_type;
typedef int difference_type;
typedef boost::single_pass_traversal_tag iterator_category;
typedef const value_type* pointer;
typedef const value_type & reference;
mutable value_type value;
const difference_type increment;
iterator_type(value_type value, difference_type increment = 0)
: value(value), increment(increment) {}
bool operator==(const iterator_type &rhs) const {
return value >= rhs.value;
}
value_type operator++() const { return value += increment; }
operator pointer() const { return &value; }
};
typedef iterator_type iterator;
typedef const iterator_type const_iterator;
int first_, last_, increment_;
range(int last) : first_(0), last_(last), increment_(1) {}
range(int first, int last, int increment = 1)
: first_(first), last_(last), increment_(increment) {}
iterator begin() const {return iterator(first_, increment_);}
iterator end() const {return iterator(last_);}
};
int test(const range & range0, const range & range1){
foreach(int i, range0) {
foreach(int j, range1) {
std::cout << i << " " << j << "\n";
}
}
}
int main() {
test(range(6), range(3, 10, 3));
}
my main reason for wanting to do so is because i use speech to text software, and programming loop usual way is difficult, even if using code completion. It is much more efficient to have pronounceable constructs.
That makes sense. But couldn't a simple macro solve this problem? #define for_i_to(N, body) for (int i = 0; i < N; ++i) { body }
or something similar. Or avoid the loop entirely and use the standard library algorithms. (std::for_each(range.begin(), rang.end(), myfunctor()) seems easier to pronounce)
many loops start with zero and increment by one, which is default for range. I find python construct more intuitive
You're wrong. The Python version is more intuitive to a Python programmer. And it may be more intuitive to a non-programmer. But you're writing C++ code. Your goal should be to make it intuitive to a C++ programmer. And C++ programmer know for-loops and they know the standard library algorithms. Stick to using those. (Or stick to writing Python)
functions which need to take range as argument:
Function(int start, int and, int inc);
function(xrange r);
Or the idiomatic C++ version:
template <typename iter_type>
void function(iter_type first, iter_type last);
In C++, ranges are represented by iterator pairs. Not integers.
If you're going to write code in a new language, respect the conventions of that language. Even if it means you have to adapt and change some habits.
If you're not willing to do that, stick with the language you know.
Trying to turn language X into language Y is always the wrong thing to do. It own't work, and it'll confuse the language X programmers who are going to maintain (or just read) your code.
Since I've started to use BOOST_FOREACH for all my iteration (probably a misguided idea, but that's another story), here's another use for aaa's range class:
std::vector<int> vec;
// ... fill the vector ...
BOOST_FOREACH(size_t idx, make_range(0, vec.size()))
{
// ... do some stuff ...
}
(yes, range should be templatized so I can use user-defined integral types with it)
And here's make_range():
template<typename T>
range<T> make_range(T const & start, T const & end)
{
return range<T>(start, end);
}
See also:
http://groups.google.com/group/boost-list/browse_thread/thread/3e11117be9639bd
and:
https://svn.boost.org/trac/boost/ticket/3469
which propose similar solutions.
And I've just found boost::integer_range; with the above example, the code would look like:
using namespace boost;
std::vector<int> vec;
// ... fill the vector ...
BOOST_FOREACH(size_t idx, make_integer_range(0, vec.size()))
{
// ... do some stuff ...
}
C++ 20's ranges header has iota_view which does this:
#include <ranges>
#include <vector>
#include <iostream>
int main()
{
for (int i : std::views::iota{1, 10})
std::cout << i << ' ';
std::cout << '\n';
for (int i : std::views::iota(1) | std::views::take(9))
std::cout << i << ' ';
}
Output:
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
Since we don't really know what you actually want to use this for, I'm assuming your test case is representative. And then plain simple for loops are a whole lot simpler and more readable:
int main() {
for (int i = 0; i <= 6; ++i){
for (int j = 3; j <= 10; j += 3){
std::cout << i << " " << j << "\n";
}
}
}
A C++ programmer can walk in from the street and understand this function without having to look up complex classes elsewhere. And it's 5 lines instead of your 60. Of course if you have 400 loops exactly like these, then yes, you'd save some effort by using your range object. Or you could just wrap these two loops inside a helper function, and call that whenever you needed.
We don't really have enough information to say what's wrong with simple for loops, or what would be a suitable replacement. The loops here solve your problem with far less complexity and far fewer lines of code than your sample implementation. If this is a bad solution, tell us your requirements (as in what problem you need to solve, rather than "I want python-style loops in C++")
Keep it simple, make a stupid macro;
#define for_range(VARNAME, START, STOP, INCREMENT) \
for(int VARNAME = START, int STOP_ = STOP, INCREMENT_ = INCREMENT; VARNAME != STOP_; VARNAME += INCREMENT_)
and use as;
for_range(i, 10, 5, -1)
cout << i << endl;
You're trying to bring a python idiom into C++. That's unncessary. Use
for(int i=initVal;i<range;i+=increment)
{
/*loop body*/
}
to achieve this. In Python, the for(i in xrange(init, rng, increment)) form is necessary because Python doesn't provide a simple for loop, only a for-each type construct. So you can iterate only over a sequence or a generator. This is simply unnecessary and almost certainly bad practice in a language that provides a for(;;) syntax.
EDIT: As a completely non-recommended aside, the closest I can get to the for i xrange(first, last, inc) syntax in C++ is:
#include <cstdio>
using namespace std;
int xrange(unsigned int last, unsigned int first=0, unsigned int inc=1)
{
static int i = first;
return (i<last)?i+=inc:i=0;
}
int main()
{
while(int i=xrange(10, 0, 1))
printf("in loop at i=%d\n",i);
}
Not that while this loops the correct number of times, i varies from first+inc to last and NOT first to last-inc as in Python. Also, the function can only work reliably with unsigned values, as when i==0, the while loop will exit. Do not use this function. I only added this code here to demonstrate that something of the sort is indeed possible. There are also several other caveats and gotchas (the code won't really work for first!=0 on subsequent function calls, for example)