Say I have a nested for loop like
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
function_doing_stuff(std::make_tuple(x, y, z));
}
}
}
and would like to transform it into
MyRange range(xstart,xend,ystart,yend, zstart,zend);
for (auto point : range){
function_doing_stuff(point);
}
How would I write the MyRange class to be as efficient as the nested for loops?
The motivation for this is to be able to use std algorithms (such as transform, accumulate, etc), and to create code that is largely dimension agnostic.
By having an iterator, it would be easy to create templated functions that operate over a range of 1d, 2d or 3d points.
Code base is currently C++14.
EDIT:
Writing clear questions is hard. I'll try to clarify.
My problem is not writing an iterator, that I can do. Instead, the problem is one of performance: Is it possible to make an iterator that is as fast as the nested for loops?
With range/v3, you may do
auto xs = ranges::view::iota(xstart, xend);
auto ys = ranges::view::iota(ystart, yend);
auto zs = ranges::view::iota(zstart, zend);
for (const auto& point : ranges::view::cartesian_product(xs, ys, zs)){
function_doing_stuff(point);
}
You can introduce your own class as
class myClass {
public:
myClass (int x, int y, int z):m_x(x) , m_y(y), m_z(z){}
private:
int m_x, m_y, m_z;
};
and then initialize a std::vector<myClass> with your triple loop
std::vector<myClass> myVec;
myVec.reserve((xend-xstart)*(yend-ystart)*(zend-zstart)); // alloc memory only once;
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
myVec.push_back({x,y,z});
}
}
}
Finally, you can use all the nice std algorithms with the std::vector<myClass> myVec. With the syntactic sugar
using MyRange = std::vector<myClass>;
and
MyRange makeMyRange(int xstart, int xend, int ystart, int yend, int zstart,int zend) {
MyRange myVec;
// loop from above
return myVec;
}
you can write
const MyRange range = makeMyRange(xstart, xend, ystart, yend, zstart, zend);
for (auto point : range){
function_doing_stuff(point);
}
With move semantics this won't create unneeded copies. Please note that the interface of this function is rather bad; it would be better to take three pairs of int denoting the x, y and z intervals.
You may also want to change the names to something meaningful (e.g. myClass could be Point).
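As a rough sketch of the interval-based interface suggested above (the names Interval and Point are made up for illustration, and the points are still materialised in a std::vector as in this answer):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types; neither name comes from the original answer.
struct Interval {
    int first;  // inclusive lower bound
    int last;   // exclusive upper bound
    std::size_t size() const { return static_cast<std::size_t>(last - first); }
};

struct Point {
    int x, y, z;
};

using MyRange = std::vector<Point>;

// Materialise every point of the box; z varies fastest, as in the nested loops.
MyRange makeMyRange(Interval xs, Interval ys, Interval zs) {
    assert(xs.first <= xs.last && ys.first <= ys.last && zs.first <= zs.last);
    MyRange points;
    points.reserve(xs.size() * ys.size() * zs.size());  // allocate only once
    for (int x = xs.first; x < xs.last; ++x)
        for (int y = ys.first; y < ys.last; ++y)
            for (int z = zs.first; z < zs.last; ++z)
                points.push_back(Point{x, y, z});
    return points;
}
```

A call then reads makeMyRange({xstart, xend}, {ystart, yend}, {zstart, zend}), which makes it harder to mix up the six bounds.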
Another option, which directly transplants whatever looping code, is to use a Coroutine. This emulates yield from Python or C#.
using point = std::tuple<int, int, int>;
using coro = boost::coroutines::asymmetric_coroutine<point>;
coro::pull_type points(
[&](coro::push_type& yield){
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
yield(std::make_tuple(x, y, z));
}
}
}
});
for(auto p : points)
function_doing_stuff(p);
Since you care about performance, you should forget about combining iterators for the foreseeable future. The central problem is that compilers cannot yet untangle the mess and figure out that there are 3 independent variables in it, much less perform any loop interchange or unrolling or fusion.
If you must use ranges, use simple ones that the compiler can see through:
for (int const x : boost::irange<int>(xstart,xend))
for (int const y : boost::irange<int>(ystart,yend))
for (int const z : boost::irange<int>(zstart,zend))
function_doing_stuff(x, y, z);
Alternatively, you can actually pass your functor and the boost ranges to a template:
template <typename Func, typename Range0, typename Range1, typename Range2>
void apply_ranges (Func func, Range0 r0, Range1 r1, Range2 r2)
{
for (auto const i0 : r0)
for (auto const i1 : r1)
for (auto const i2 : r2)
func (i0, i1, i2);
}
If you truly care about performance, then you should not contort your code with complicated ranges, because they make it harder to untangle later when you want to rewrite them in AVX intrinsics.
Here's a bare-bones implementation that does not use any advanced language features or other libraries. The performance should be pretty close to the for loop version.
#include <tuple>
class MyRange {
public:
typedef std::tuple<int, int, int> valtype;
MyRange(int xstart, int xend, int ystart, int yend, int zstart, int zend): xstart(xstart), xend(xend), ystart(ystart), yend(yend), zstart(zstart), zend(zend) {
}
class iterator {
public:
iterator(MyRange &c): me(c) {
curvalue = std::make_tuple(me.xstart, me.ystart, me.zstart);
}
iterator(MyRange &c, bool end): me(c) {
curvalue = std::make_tuple(end ? me.xend : me.xstart, me.ystart, me.zstart);
}
valtype operator*() {
return curvalue;
}
iterator &operator++() {
if (++std::get<2>(curvalue) == me.zend) {
std::get<2>(curvalue) = me.zstart;
if (++std::get<1>(curvalue) == me.yend) {
std::get<1>(curvalue) = me.ystart;
++std::get<0>(curvalue);
}
}
return *this;
}
bool operator==(const iterator &other) const {
return curvalue == other.curvalue;
}
bool operator!=(const iterator &other) const {
return curvalue != other.curvalue;
}
private:
MyRange &me;
valtype curvalue;
};
iterator begin() {
return iterator(*this);
}
iterator end() {
return iterator(*this, true);
}
private:
int xstart, xend;
int ystart, yend;
int zstart, zend;
};
And an example of usage:
#include <iostream>
void display(std::tuple<int, int, int> v) {
std::cout << "(" << std::get<0>(v) << ", " << std::get<1>(v) << ", " << std::get<2>(v) << ")\n";
}
int main() {
MyRange c(1, 4, 2, 5, 7, 9);
for (auto v: c) {
display(v);
}
}
I've left off things like const iterators, possible operator+=, decrementing, post increment, etc. They've been left as an exercise for the reader.
It stores the initial values, then increments each value in turn, rolling it back and incrementing the next when it gets to the end value. It's a bit like incrementing a multi-digit number.
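The multi-digit-number idea also generalises to any number of dimensions. The following is only a sketch, assuming the bounds are stored in std::array and that every dimension is non-empty; next_point and count_points are hypothetical helper names, not part of the class above:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Advance `cur` to the next point of the box [start, end), last index fastest.
// Returns false once every point has been visited (cur has wrapped back to start).
// Assumes start[i] < end[i] for every dimension.
template <std::size_t N>
bool next_point(std::array<int, N>& cur,
                const std::array<int, N>& start,
                const std::array<int, N>& end) {
    for (std::size_t i = N; i-- > 0;) {
        assert(start[i] < end[i]);
        if (++cur[i] != end[i]) return true;  // no carry needed
        cur[i] = start[i];                    // roll over and carry into the next digit
    }
    return false;  // carried out of the most significant digit
}

// Count the visited points, as a simple check of the traversal.
template <std::size_t N>
std::size_t count_points(const std::array<int, N>& start,
                         const std::array<int, N>& end) {
    std::array<int, N> cur = start;
    std::size_t n = 0;
    do { ++n; } while (next_point(cur, start, end));
    return n;
}
```

The same function serves 1D, 2D and 3D boxes, because the number of dimensions is just the template parameter N.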
Using boost::iterator_facade for simplicity, you can spell out all the required members.
First we have a class that iterates N-dimensional indexes as std::array<std::size_t, N>
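template<std::size_t N>
class indexes_iterator : public boost::iterator_facade<indexes_iterator<N>, const std::array<std::size_t, N>, boost::random_access_traversal_tag>
{
public:
template<typename... Dims>
indexes_iterator(Dims... dims) : ends{ static_cast<std::size_t>(dims)... }, values{} {}
private:
friend class boost::iterator_core_access;
void increment() { advance(1); }
void decrement() { advance(-1); }
void advance(std::ptrdiff_t n)
{
// Treat values as the digits of a mixed-radix number and propagate the carry.
for (std::size_t i = 0; i < N; ++i)
{
const std::ptrdiff_t dim = static_cast<std::ptrdiff_t>(ends[i]);
const std::ptrdiff_t sum = static_cast<std::ptrdiff_t>(values[i]) + n;
const std::ptrdiff_t next = ((sum % dim) + dim) % dim; // remainder in [0, dim)
n = (sum - next) / dim; // carry (or borrow) into the next dimension
values[i] = static_cast<std::size_t>(next);
}
}
std::ptrdiff_t distance_to(indexes_iterator const & other) const
{
std::ptrdiff_t result = 0, mul = 1;
for (std::size_t i = 0; i < N; ++i)
{
result += mul * (static_cast<std::ptrdiff_t>(other.values[i]) - static_cast<std::ptrdiff_t>(values[i]));
mul *= static_cast<std::ptrdiff_t>(ends[i]);
}
return result;
}
bool equal(indexes_iterator const& other) const
{
return values == other.values;
}
std::array<std::size_t, N> const & dereference() const { return values; }
std::array<std::size_t, N> ends;
std::array<std::size_t, N> values;
};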
Then we use that to make something similar to a boost::zip_iterator, but instead of advancing all together we add our indexes.
template <typename... Iterators>
class product_iterator : public boost::iterator_facade<product_iterator<Iterators...>, const std::tuple<decltype(*std::declval<Iterators>())...>, boost::random_access_traversal_tag>
{
using ref = std::tuple<decltype(*std::declval<Iterators>())...>;
public:
product_iterator(Iterators ... ends) : indexes() , iterators(std::make_tuple(ends...)) {}
template <typename ... Sizes>
product_iterator(Iterators ... begins, Sizes ... sizes)
: indexes(sizes...),
iterators(begins...)
{}
private:
friend class boost::iterator_core_access;
template<std::size_t... Is>
ref dereference_impl(std::index_sequence<Is...> idxs) const
{
auto offs = offset(idxs);
return { *std::get<Is>(offs)... };
}
ref dereference() const
{
return dereference_impl(std::index_sequence_for<Iterators...>{});
}
void increment() { ++indexes; }
void decrement() { --indexes; }
void advance(int n) { indexes += n; }
template<std::size_t... Is>
std::tuple<Iterators...> offset(std::index_sequence<Is...>) const
{
auto idxs = *indexes;
return { (std::get<Is>(iterators) + std::get<Is>(idxs))... };
}
bool equal(product_iterator const & other) const
{
return offset(std::index_sequence_for<Iterators...>{})
== other.offset(std::index_sequence_for<Iterators...>{});
}
indexes_iterator<sizeof...(Iterators)> indexes;
std::tuple<Iterators...> iterators;
};
Then we wrap it up in a boost::iterator_range
template <typename... Ranges>
auto make_product_range(Ranges&&... rngs)
{
product_iterator<decltype(begin(rngs))...> b(begin(rngs)..., std::distance(std::begin(rngs), std::end(rngs))...);
product_iterator<decltype(begin(rngs))...> e(end(rngs)...);
return boost::iterator_range<product_iterator<decltype(begin(rngs))...>>(b, e);
}
int main()
{
using ranges::view::iota;
for (auto p : make_product_range(iota(xstart, xend), iota(ystart, yend), iota(zstart, zend)))
// ...
return 0;
}
See it on godbolt
Just a very simplified version that will be as efficient as a for loop:
#include <tuple>
struct iterator{
int x;
int x_start;
int x_end;
int y;
int y_start;
int y_end;
int z;
constexpr auto
operator*() const{
return std::tuple{x,y,z};
}
constexpr iterator&
operator++ [[gnu::always_inline]](){
++x;
if (x==x_end){
x=x_start;
++y;
if (y==y_end) {
++z;
y=y_start;
}
}
return *this;
}
constexpr iterator
operator++(int){
auto old=*this;
operator++();
return old;
}
};
struct sentinel{
int z_end;
friend constexpr bool
operator == (const iterator& x,const sentinel& s){
return x.z==s.z_end;
}
friend constexpr bool
operator == (const sentinel& a,const iterator& x){
return x==a;
}
friend constexpr bool
operator != (const iterator& x,const sentinel& a){
return !(x==a);
}
friend constexpr bool
operator != (const sentinel& a,const iterator& x){
return !(x==a);
}
};
struct range{
iterator start;
sentinel finish;
constexpr auto
begin() const{
return start;
}
constexpr auto
end()const{
return finish;
}
};
void func(int,int,int);
void test(const range& r){
for(auto [x,y,z]: r)
func(x,y,z);
}
void test(int x_start,int x_end,int y_start,int y_end,int z_start,int z_end){
for(int z=z_start;z<z_end;++z)
for(int y=y_start;y<y_end;++y)
for(int x=x_start;x<x_end;++x)
func(x,y,z);
}
The advantage over 1201ProgramAlarm's answer is the faster test performed at each iteration, thanks to the use of a sentinel.
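One caveat, given that the question mentions C++14: a range-based for loop whose begin() and end() return different types is only accepted from C++17 on. Under C++14 the same idea can be driven by an explicit loop. A compact sketch (box_iterator, box_sentinel and for_each_point are names chosen here for illustration, and it assumes non-empty x and y ranges):

```cpp
#include <cassert>

// Compact restatement of the iterator/sentinel pair; x varies fastest.
struct box_iterator {
    int x, x_start, x_end;
    int y, y_start, y_end;
    int z;
    box_iterator& operator++() {
        if (++x == x_end) {
            x = x_start;
            if (++y == y_end) { y = y_start; ++z; }
        }
        return *this;
    }
};

struct box_sentinel { int z_end; };

// Only z has to be compared to detect the end of the traversal.
inline bool operator!=(const box_iterator& it, const box_sentinel& s) {
    return it.z != s.z_end;
}

// Explicit loop: the begin and end objects may have different types even in C++14.
template <class F>
void for_each_point(box_iterator it, box_sentinel last, F f) {
    assert(it.x_start < it.x_end && it.y_start < it.y_end);
    for (; it != last; ++it) f(it.x, it.y, it.z);
}
```

The end test still costs a single integer comparison per iteration; only the loop syntax changes.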
I wrote an expression template to sum up to three vectors together. However, as you can see in my code, this doesn't scale very well because for every additional sum operand I have to add another nested template expression. Is there a way to refactor this code to handle a (theoretically) unbounded number of additions?
template<class A>
struct Expr {
operator const A&() const {
return *static_cast<const A*>(this);
}
};
template<class A, class B>
class Add : public Expr<Add<A,B>> {
private:
const A &a_;
const B &b_;
public:
Add(const A &a, const B &b) : a_(a), b_(b) { }
double operator[] (int i) const {
return a_[i] + b_[i];
}
};
class Vector : public Expr<Vector> {
private:
double *data_;
int n_;
public:
Vector(int n, double w = 0.0) : n_(n) {
data_ = new double[n];
for(int i = 0; i < n; ++i) {
data_[i] = w;
}
}
double operator[] (int i) const {
return data_[i];
}
friend Expr<Add<Vector, Vector>> operator+(Vector &a, Vector &b) {
return Add<Vector, Vector>(a, b);
}
friend Expr<Add<Add<Vector, Vector>, Vector>> operator+(const Add<Vector, Vector> &add, const Vector &b) {
return Add<Add<Vector, Vector>, Vector>(add, b);
}
template<class A>
void operator= (const Expr<A> &a) {
const A &a_(a);
for(int i = 0; i < n_; ++i) {
data_[i] = a_[i];
}
}
};
int main() {
constexpr int size = 5;
Vector a(size, 1.0), b(size, 2.0), c(size);
c = a + b + a;
return 0;
}
This was working for me:
class Vector : public Expr<Vector> {
private:
double *data_;
int n_;
public:
Vector(int n, double w = 0.0) : n_(n) {
data_ = new double[n];
for(int i = 0; i < n; ++i) {
data_[i] = w;
}
}
double operator[] (int i) const {
return data_[i];
}
template<class A, class B>
friend Add<A, B> operator+(const Expr<A> &a, const Expr<B> &b) {
return Add<A, B>(a, b);
}
template<class A>
void operator= (const Expr<A> &a) {
const A &a_(a);
for(int i = 0; i < n_; ++i) {
data_[i] = a_[i];
}
}
};
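For reference, a self-contained sketch of the same idea, with operator+ written once as a free function template over any two expressions. It swaps the raw new[] buffer for std::vector and adds a size() member for sanity checks; both are assumptions of this sketch rather than part of the code above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: lets any expression be recovered as its concrete type.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing lhs + rhs; evaluated lazily, element by element.
template <class A, class B>
class Add : public Expr<Add<A, B>> {
    const A& a_;
    const B& b_;
public:
    Add(const A& a, const B& b) : a_(a), b_(b) { assert(a.size() == b.size()); }
    std::size_t size() const { return a_.size(); }
    double operator[](std::size_t i) const { return a_[i] + b_[i]; }
};

// One operator covers Vector+Vector, Add+Vector, Add+Add, and so on.
template <class A, class B>
Add<A, B> operator+(const Expr<A>& a, const Expr<B>& b) {
    return Add<A, B>(a.self(), b.self());
}

class Vector : public Expr<Vector> {
    std::vector<double> data_;  // std::vector instead of new[] (assumption of this sketch)
public:
    explicit Vector(std::size_t n, double w = 0.0) : data_(n, w) {}
    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }

    // Evaluate the whole expression tree in a single loop.
    template <class E>
    Vector& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == data_.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = expr[i];
        return *this;
    }
};
```

Because the Add nodes hold references to temporaries, the expression should be assigned within the same statement (c = a + b + a;), not stored in an auto variable for later.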
I'm no template wizard (and I'm not up to date with the latest possibilities), but you can at least write a function that adds a variadic number of vectors, along the lines of the code below.
You could then build up your expression tree as before and call this function in your evaluation (operator=) function.
edit: updated the code, based on this solution (credits there)
#include <vector>
#include <algorithm>
#include <iterator>
#include <tuple>
template<typename T>
using Vec = std::vector<T>;
template<typename T, typename...Args>
auto AddVector_impl(Vec<Args> const & ... vecs){
auto its = std::tuple(cbegin(vecs)...);
auto add_inc = [](auto&... iters){
return ((*iters++) + ... );
};
auto end_check = [&](auto&...iters){
return ((iters != cend(vecs)) && ...);
};
Vec<T> res;
for(auto it = back_inserter(res); apply(end_check,its);){
*it++ = apply(add_inc,its);
}
return res;
}
template<typename T, typename... Args>
Vec<T> AddVector(Vec<T> const& vt, Vec<Args> const&... vargs){
return AddVector_impl<T>(vt,vargs...);
}
#include <iostream>
int main() {
constexpr auto size = 5;
Vec<double> a(size, 1.0), b(size, 2.0);
auto c = AddVector(a, b, a);
for(auto const& el : c){
std::cout << el << " ";
}
}
outputs:
4 4 4 4 4
I have to filter a container, and copying each item is expensive. So I came up with this C++ code, but maybe there is a better concept that I am missing. Please comment. It is also important that operations like counting and checking for emptiness are fast.
The sample below creates a vector with some items, returns a filtered copy, and performs some boolean operations.
#include <vector>
#include <functional>
template <class X> struct filterChainT {
/// the list here
const X & list;
/// take store type from access operator
typedef decltype(list[0]) storeType;
/// the test functions
std::function<bool(storeType &) > tests[10];
/// counting the test (don't use array here)
size_t countTest;
/// ctor with given list
filterChainT(const X & list):list(list),countTest(0) { }
/// add a rule here
filterChainT<X> & apply(const std::function<bool(storeType &) > & fnc)
{
tests[countTest++] = fnc;
return *this;
}
/// downcast to container (will return the filter copy)
operator X() const
{
X ret;
eval([&](const storeType & hit) { ret.push_back(hit); return false; });
return ret;
}
/// just count item after filter
int count() const
{
int count = 0;
eval([&](const storeType hit) { count++; return false; });
return count;
}
bool operator==(int number) const { return count() == number; }
bool operator>(int number) const
{
int count = 0;
return eval([&](const storeType & hit) { return ++count>number; });
}
bool operator<(int number) const
{
int count = 0;
return !eval([&](const storeType & hit) { return ++count>=number; });
}
//// the magic eval functions. Return true if the fnc object abort the loop
template <class FNC> bool eval(const FNC fnc) const
{
for (auto i : list)
{
for (size_t t = 0; t < countTest; t++)
{
if (!tests[t](i))
goto next;
}
if (fnc(i))
return true;
next:;
}
return false;
}
};
struct myFilter : public filterChainT < std::vector<int>> {
myFilter(const std::vector<int> & in) :filterChainT<std::vector<int>>(in){ };
// filter items i%2==0
myFilter & mod()
{
apply([](const int & i) { return i % 2; });
return *this;
}
// filter items smaller than x
myFilter & smaller(int x)
{
apply([=](const int & i) { return i<x; });
return *this;
}
};
int main()
{
std::vector<int> vec;
vec.push_back(1);
vec.push_back(3);
vec.push_back(10);
// apply two filters, "mod" and "smaller", and return the result
std::vector<int> ret = myFilter(vec).mod().smaller(5);
// with the same filters as above, just check whether fewer than 2 items remain
bool res1 = myFilter(vec).mod().smaller(5) < 2;
// custom filter without the helper: keep all items larger than 3 and put them in a list
std::vector<int> result=filterChainT < std::vector<int>>(vec).apply([](const int & p) { return p > 3; });
}
regards
Markus
The hint about "expression templates" works very well: it is two times faster than the code in my posting. Now the code looks like this. Thank you very much for pointing me to the right place.
I use the same concept as the wiki entry: a plus operation creates the filter chain, and a final function reduces the given list with this filter.
template <class T> struct filterNodeBaseT {
template <class X> bool apply(const X& x) const { return static_cast<T const & >(*this).apply(x); }
template <class LST> LST filter(const LST & input) const
{ LST ret;
for (auto i : input)
{ if (this->apply(i))
ret.push_back(i);
}
return ret;
}
template <class LST> int count(const LST & input) const
{ int ret = 0;
for (auto i : input)
{ if (this->apply(i))
ret++;
}
return ret;
}
};
template <class T1, class T2> struct filterCombine : public filterNodeBaseT<filterCombine<T1, T2> > {
const T1 & t1;
const T2 & t2;
filterCombine(const T1 & t1, const T2 & t2) :t1(t1), t2(t2) { }
template <class X> bool apply(const X & x) const { return t1.apply(x) && t2.apply(x); }
};
template <class T1,class T2> filterCombine<T1,T2> operator + (const T1 & t1,const T2 & t2)
{ return filterCombine<T1,T2>(t1,t2); }
struct filterNodeSmaller : public filterNodeBaseT<filterNodeSmaller> {
int limit;
filterNodeSmaller(int limit) :limit(limit) {};
bool apply(const int & x) const { return x < limit; }
};
struct filterNodeLarger: public filterNodeBaseT<filterNodeLarger> {
int limit;
filterNodeLarger(int limit) :limit(limit) {};
bool apply(const int & x) const { return x > limit; }
};
struct filterNodeMod : public filterNodeBaseT<filterNodeMod> {
bool apply(const int & x) const { return x % 2; }
};
struct filterStrlenLarger : public filterNodeBaseT<filterStrlenLarger> {
int limit;
filterStrlenLarger(int limit) :limit(limit) { };
bool apply(const std::string & s) const { return s.length() > limit; }
};
struct filterStrGreater : public filterNodeBaseT<filterStrGreater> {
std::string cmp;
filterStrGreater(const std::string & cmp) :cmp(cmp) { };
bool apply(const std::string & s) const { return s>cmp; }
};
_declspec(noinline) void nodeTest()
{
std::vector<int> intList;
intList.push_back(1); intList.push_back(3); intList.push_back(4);
int count= (filterNodeMod() + filterNodeSmaller(5)+ filterNodeLarger(1)).count(intList);
std::vector<int> resList1= (filterNodeMod() + filterNodeSmaller(5)+filterNodeLarger(1)).filter(intList);
printf("%d\n", count);
std::vector<std::string> strList;
strList.push_back("Hello");
strList.push_back("World");
strList.push_back("!");
count = (filterStrlenLarger(3)+filterStrGreater("Hello")).count(strList);
std::vector<std::string> resList2= (filterStrlenLarger(3) + filterStrGreater("Hello")).filter(strList);
printf("%d\n", count);
}
I recently came across an interesting article about what is called the Kronecker product. At the same time I'm working on my neural network library.
For my algorithm to work, I need a tensor class where I can get the product of two tensors with an overloaded * operator.
Consider the following example/questions:
How to efficiently construct/store the nested matrices?
How to perform the product of two tensors?
How to visualize tensor c as simply as possible?
My tensor class, which currently supports only 3 dimensions:
#pragma once
#include <iostream>
#include <sstream>
#include <random>
#include <cmath>
#include <cstring>
#include <iomanip>
#include <vector>
template<typename T>
class tensor {
public:
const unsigned int x, y, z, s;
tensor(unsigned int x, unsigned int y, unsigned int z, T val) : x(x), y(y), z(z), s(x * y * z) {
p_data = new T[s];
for (unsigned int i = 0; i < s; i++) p_data[i] = val;
}
tensor(const tensor<T> & other) : x(other.x), y(other.y), z(other.z), s(other.s) {
p_data = new T[s];
memcpy(p_data, other.p_data, s * sizeof(T)); // get_data() is non-const, so read the member directly
}
~tensor() {
delete[] p_data;
p_data = nullptr;
}
T * get_data() {
return p_data;
}
static tensor<T> * random(unsigned int x, unsigned int y, unsigned int z, T val, T min, T max) {
tensor<T> * p_tensor = new tensor<T>(x, y, z, val);
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<T> dist(min, max);
for (unsigned int i = 0; i < p_tensor->s; i++) {
T rnd = dist(mt);
while (abs(rnd) < 0.001) rnd = dist(mt);
p_tensor->get_data()[i] = rnd;
}
return p_tensor;
}
static tensor<T> * from(std::vector<T> * p_data, T val) {
tensor<T> * p_tensor = new tensor<T>(p_data->size(), 1, 1, val);
for (unsigned int i = 0; i < p_tensor->x; i++) p_tensor->get_data()[i] = p_data->at(i);
return p_tensor;
}
friend std::ostream & operator <<(std::ostream & stream, tensor<T> & tensor) {
stream << "(" << tensor.x << "," << tensor.y << "," << tensor.z << ") Tensor\n";
for (unsigned int i = 0; i < tensor.x; i++) {
for (unsigned int k = 0; k < tensor.z; k++) {
stream << "[";
for (unsigned int j = 0; j < tensor.y; j++) {
stream << std::setw(5) << roundf(tensor(i, j, k) * 1000) / 1000;
if (j + 1 < tensor.y) stream << ",";
}
stream << "]";
}
stream << std::endl;
}
return stream;
}
// Return by value: a reference to the local result would dangle.
tensor<T> operator +(tensor<T> & other) {
tensor<T> result(*this);
return result;
}
tensor<T> operator -(tensor<T> & other) {
tensor<T> result(*this);
return result;
}
tensor<T> operator *(tensor<T> & other) {
tensor<T> result(*this);
return result;
}
T & operator ()(unsigned int i, unsigned int j, unsigned int k) {
return p_data[i + (j * x) + (k * x * y)];
}
T & operator ()(unsigned int i) {
return p_data[i];
}
private:
T * p_data = nullptr;
};
int main() {
tensor<double> * p_tensor_input = tensor<double>::random(6, 2, 3, 0.0, 0.0, 1.0);
tensor<double> * p_tensor_weight = tensor<double>::random(2, 6, 3, 0.0, 0.0, 1.0);
std::cout << *p_tensor_input << std::endl;
std::cout << *p_tensor_weight << std::endl;
tensor<double> p_tensor_output = *p_tensor_input + *p_tensor_weight;
return 0;
}
Your first step is #2 -- and get it correct.
After that, optimize.
Start with a container C<T>.
Define some operations on it. wrap(T) returns a C<T> containing that T. map takes a C<T> and a function on T U f(T) and returns C<U>. flatten takes a C<C<U>> and returns a C<U>.
Define scale( T, C<T> ) which takes a T and a C<T> and returns a C<T> with the elements scaled. Aka, scalar multiplication.
template<class T>
C<T> scale( T scalar, C<T> container ) {
return map( container, [&](T t){ return t*scalar; } );
}
Then we have:
template<class T>
C<T> tensor( C<T> lhs, C<T> rhs ) {
return flatten( map( lhs, [&](T t) { return scale( t, rhs ); } ) );
}
is your tensor product. And yes, that can be your actual code. I would tweak it a bit for efficiency.
(Note I used different terms, but I'm basically describing monadic operations using different words.)
After you have this, test, optimize, and iterate.
As for 3, the result of a tensor product gets large and complex; there is no simple visualization for a large tensor.
Oh, and keep things simple and store data in a std::vector to start.
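To make this concrete, here is one possible reading of the operations above with C<T> taken to be std::vector<T>. It is a sketch for flat containers only, it ignores shape bookkeeping, and it copies more than a tuned version would:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <class T>
using C = std::vector<T>;  // the container from the text, here simply std::vector

// map: apply f to every element and collect the results.
template <class T, class F>
auto map(const C<T>& in, F f) -> C<decltype(f(in.front()))> {
    C<decltype(f(in.front()))> out;
    out.reserve(in.size());
    for (const T& t : in) out.push_back(f(t));
    return out;
}

// flatten: concatenate a container of containers.
template <class T>
C<T> flatten(const C<C<T>>& nested) {
    C<T> out;
    for (const C<T>& inner : nested) out.insert(out.end(), inner.begin(), inner.end());
    return out;
}

// scale: scalar multiplication.
template <class T>
C<T> scale(T scalar, const C<T>& container) {
    return map(container, [&](T t) { return t * scalar; });
}

// tensor: every element of lhs scales a full copy of rhs.
template <class T>
C<T> tensor(const C<T>& lhs, const C<T>& rhs) {
    C<T> out = flatten(map(lhs, [&](T t) { return scale(t, rhs); }));
    assert(out.size() == lhs.size() * rhs.size());  // sanity check on the result size
    return out;
}
```

With lhs = {1, 2} and rhs = {3, 4, 5}, tensor returns the six products in lhs-major order: 3, 4, 5, 6, 8, 10.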
Here are some tricks for efficient vectors I learned in class; they should work equally well for a tensor.
Define a constructor that allocates without filling, plus assignment operators. For example
tensor(unsigned int x, unsigned int y, unsigned int z) : x(x), y(y), z(z), s(x * y * z) {
p_data = new T[s];
}
tensor& operator=( tensor const& that ) {
for (unsigned int i=0; i<s; ++i) {
p_data[i] = that.p_data[i] ;
}
return *this ;
}
// assignment from any expression type that provides operator()(i)
template <typename Expr>
tensor& operator=( Expr const& that ) {
for (unsigned int i=0; i<s; ++i) {
p_data[i] = that(i) ;
}
return *this ;
}
Now we can implement things like addition and scaling with deferred evaluation. For example:
template<typename T1, typename T2>
class tensor_sum {
public:
//add value_type to base tensor class for this to work
typedef decltype( typename T1::value_type() + typename T2::value_type() ) value_type ;
tensor_sum( T1 const& t1, T2 const& t2 ) : t1_(t1), t2_(t2) {}
//also add function to get size of tensor
value_type operator()( int i, int j, int k ) const {
return t1_(i,j,k) + t2_(i,j,k) ;
}
value_type operator()( int i ) const {
return t1_(i) + t2_(i) ;
}
private:
T1 const& t1_;
T2 const& t2_;
};
template <typename T1, typename T2>
tensor_sum<T1,T2> operator+(T1 const& t1, T2 const& t2 ) {
return tensor_sum<T1,T2>(t1,t2) ;
}
This tensor_sum behaves exactly like any normal tensor, except that we don't have to allocate memory to store the result. So we can do something like this:
tensor<double> t0(...);
tensor<double> t1(...);
tensor<double> t2(...);
tensor<double> result(...); //define result to be empty, we will fill it later
result = t0 + t1 + 5.0*t2;
The compiler should optimize this into a single loop, without storing intermediate results or modifying the original tensors. You can do the same thing for scaling and the Kronecker product. Depending on what you want to do with the tensors, this can be a big advantage. But be careful, this isn't always the best option.
When implementing the Kronecker product, be careful with the ordering of your loops: try to go through the tensors in the order they are stored, for cache efficiency.
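As an illustration of the loop-ordering point, here is a sketch for the two-dimensional case with row-major storage. Matrix is a hypothetical minimal class, not the tensor class from the question; the loops are arranged so that the result is written front to back:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major matrix; a hypothetical stand-in for the tensor class.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Kronecker product C = A (x) B, with C(i*p + k, j*q + l) = A(i,j) * B(k,l),
// where B has p rows and q columns.
// Loop order i, k, j, l visits the output in exactly the order it is stored.
Matrix kronecker(const Matrix& a, const Matrix& b) {
    Matrix c(a.rows * b.rows, a.cols * b.cols);
    std::size_t out = 0;  // next position in c.data
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < b.rows; ++k)
            for (std::size_t j = 0; j < a.cols; ++j) {
                const double aij = a(i, j);
                for (std::size_t l = 0; l < b.cols; ++l)
                    c.data[out++] = aij * b(k, l);
            }
    assert(out == c.data.size());  // every output element written exactly once
    return c;
}
```

Both inputs are also read row by row in the innermost loops, so no access strides across a whole matrix.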
Is there a nicer way to generate a list of points than this? Library-wise, I'm open to any Eigen-based method.
auto it = voxels.begin();
for(auto i = -180; i < 90; i++) {
for(auto j = -80; j < 70; j++) {
for(auto k = 20; k < 460; k++) {
*it = (Point3(i,j,k));
it++;
}
}
}
There's an immediate way to improve performance, by reserving enough space in the vector before you fill it with values.
There are many 'nicer' ways of doing it depending on what you think is nice.
Here's one way:
std::vector<Point3> populate()
{
// (arguable) maintainability benefit
constexpr auto I = axis_limits(-180, 90);
constexpr auto J = axis_limits(-80, 70);
constexpr auto K = axis_limits(20, 460);
// pre-reserve the space
std::vector<Point3> voxels;
voxels.reserve(volume(I, J, K));
// although it looks like it might be more work for the compiler, it gets optimised
// there is no loss of performance
for(auto i : I)
for(auto j : J)
for(auto k : K)
voxels.emplace_back(i, j, k);
return voxels;
}
Which will rely on the following infrastructure code:
struct Point3 {
Point3(int, int, int) {}
};
struct int_generator {
int_generator(int v)
: _v(v)
{}
int operator*() const {
return _v;
}
int_generator& operator++() {
++_v;
return *this;
}
bool operator!=(const int_generator& rhs) const {
return _v != rhs._v;
}
private:
int _v;
};
struct axis_limits : std::tuple<int, int>
{
using std::tuple<int, int>::tuple;
int_generator begin() const {
return std::get<0>(*this);
}
int_generator end() const {
return std::get<1>(*this);
}
};
constexpr int lower(const axis_limits& t)
{
return std::get<0>(t);
}
constexpr int upper(const axis_limits& t)
{
return std::get<1>(t);
}
int_generator begin(const axis_limits& t)
{
return std::get<0>(t);
}
int_generator end(const axis_limits& t)
{
return std::get<1>(t);
}
constexpr int volume(const axis_limits& x, const axis_limits& y, const axis_limits& z)
{
return (upper(x) - lower(x))
* (upper(y) - lower(y))
* (upper(z) - lower(z));
}
I would like to know if it is possible to create an actual functor object from a lambda expression. I don't think so, but if not, why?
To illustrate, given the code below, which sorts points using various policies for x and y coordinates:
#include <vector>
#include <functional>
#include <algorithm>
#include <iostream>
struct Point
{
Point(int x, int y) : x(x), y(y) {}
int x, y;
};
template <class XOrder, class YOrder>
struct SortXY :
std::binary_function<const Point&, const Point&, bool>
{
bool operator()(const Point& lhs, const Point& rhs) const
{
if (XOrder()(lhs.x, rhs.x))
return true;
else if (XOrder()(rhs.x, lhs.x))
return false;
else
return YOrder()(lhs.y, rhs.y);
}
};
struct Ascending { bool operator()(int l, int r) const { return l<r; } };
struct Descending { bool operator()(int l, int r) const { return l>r; } };
int main()
{
// fill vector with data
std::vector<Point> pts;
pts.push_back(Point(10, 20));
pts.push_back(Point(20, 5));
pts.push_back(Point( 5, 0));
pts.push_back(Point(10, 30));
// sort array
std::sort(pts.begin(), pts.end(), SortXY<Descending, Ascending>());
// dump content
std::for_each(pts.begin(), pts.end(),
[](const Point& p)
{
std::cout << p.x << "," << p.y << "\n";
});
}
The expression std::sort(pts.begin(), pts.end(), SortXY<Descending, Ascending>()); sorts according to descending x values, and then to ascending y values. It's easily understandable, and I'm not sure I really want to make use of lambda expressions here.
But if I wanted to replace Ascending / Descending by lambda expressions, how would you do it? The following isn't valid:
std::sort(pts.begin(), pts.end(), SortXY<
[](int l, int r) { return l>r; },
[](int l, int r) { return l<r; }
>());
This problem arises because SortXY only takes types, whereas lambdas are objects. You need to rewrite it so that it takes objects, not just types. This is the basic use of function objects: note that std::for_each doesn't take a type, it takes an object.
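One way this could look, as a sketch: SortXY stores the two comparators as members, and a small helper (make_sort_xy, a name made up here) deduces the lambda types, since they cannot be spelled out:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Point {
    Point(int x, int y) : x(x), y(y) {}
    int x, y;
};

// Stores the two orderings as members instead of default-constructing them.
template <class XOrder, class YOrder>
struct SortXY {
    XOrder xorder;
    YOrder yorder;
    bool operator()(const Point& lhs, const Point& rhs) const {
        if (xorder(lhs.x, rhs.x)) return true;
        if (xorder(rhs.x, lhs.x)) return false;
        return yorder(lhs.y, rhs.y);
    }
};

// Helper (hypothetical name) that deduces the closure types of the lambdas.
template <class XOrder, class YOrder>
SortXY<XOrder, YOrder> make_sort_xy(XOrder xo, YOrder yo) {
    return SortXY<XOrder, YOrder>{std::move(xo), std::move(yo)};
}

// Example: descending x, then ascending y, as in the question.
void sort_points(std::vector<Point>& pts) {
    std::sort(pts.begin(), pts.end(),
              make_sort_xy([](int l, int r) { return l > r; },
                           [](int l, int r) { return l < r; }));
}
```

The Ascending and Descending structs from the question still work with this version, e.g. make_sort_xy(Descending(), Ascending()).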
I have posted a similar question w.r.t. lambda functors within classes.
Check this out, perhaps it helps:
Lambda expression as member functors in a class
I had a similar problem: It was required to provide in some cases a "raw"-function pointer and in other a functor. So I came up with a "workaround" like this:
template<class T>
class Selector
{
public:
Selector(int (*theSelector)(T& l, T& r))
: selector(theSelector) {}
virtual int operator()(T& l, T& r) {
return selector(l, r);
}
int (*getRawSelector() const)(T&, T&) {
return this->selector;
}
private:
int(*selector)(T& l, T& r);
};
Assuming you have two very simple functions taking --- as described --- either a functor or a raw function pointer like this:
int
findMinWithFunctor(int* array, int size, Selector<int> selector)
{
if (array && size > 0) {
int min = array[0];
for (int i = 0; i < size; i++) {
if (selector(array[i], min) < 0) {
min = array[i];
}
}
return min;
}
return -1;
}
int
findMinWithFunctionPointer(int* array, int size, int(*selector)(int&, int&))
{
if (array && size > 0) {
int min = array[0];
for (int i = 0; i < size; i++) {
if (selector(array[i], min) < 0) {
min = array[i];
}
}
return min;
}
return -1;
}
Then you would call these functions like this:
int numbers[3] = { 4, 2, 99 };
cout << "The min with functor is:" << findMinWithFunctor(numbers, 3, Selector<int>([](int& l, int& r) -> int {return (l > r ? 1 : (r > l ? -1 : 0)); })) << endl;
// or with the plain version
cout << "The min with raw fn-pointer is:" << findMinWithFunctionPointer(numbers, 3, Selector<int>([](int& l, int& r) -> int {return (l > r ? 1 : (r > l ? -1 : 0)); }).getRawSelector()) << endl;
Of course in this example there is no real benefit passing the int's as reference...it's just an example :-)
Improvements:
You can also modify the Selector class to be more concise like this:
template<class T>
class Selector
{
public:
typedef int(*selector_fn)(T& l, T& r);
Selector(selector_fn theSelector)
: selector(theSelector) {}
virtual int operator()(T& l, T& r) {
return selector(l, r);
}
selector_fn getRawSelector() {
return this->selector;
}
private:
selector_fn selector;
};
Here we take advantage of a simple typedef to define the function pointer type once and use only its name, rather than writing the declaration over and over.