Creating functor from lambda expression - c++

I would like to know if it is possible to create an actual functor object from a lambda expression. I don't think so, but if not, why?
To illustrate, given the code below, which sorts points using various policies for x and y coordinates:
#include <vector>
#include <functional>
#include <algorithm>
#include <iostream>
struct Point
{
Point(int x, int y) : x(x), y(y) {}
int x, y;
};
template <class XOrder, class YOrder>
struct SortXY :
std::binary_function<const Point&, const Point&, bool>
{
bool operator()(const Point& lhs, const Point& rhs) const
{
if (XOrder()(lhs.x, rhs.x))
return true;
else if (XOrder()(rhs.x, lhs.x))
return false;
else
return YOrder()(lhs.y, rhs.y);
}
};
struct Ascending { bool operator()(int l, int r) const { return l<r; } };
struct Descending { bool operator()(int l, int r) const { return l>r; } };
int main()
{
// fill vector with data
std::vector<Point> pts;
pts.push_back(Point(10, 20));
pts.push_back(Point(20, 5));
pts.push_back(Point( 5, 0));
pts.push_back(Point(10, 30));
// sort array
std::sort(pts.begin(), pts.end(), SortXY<Descending, Ascending>());
// dump content
std::for_each(pts.begin(), pts.end(),
[](const Point& p)
{
std::cout << p.x << "," << p.y << "\n";
});
}
The expression std::sort(pts.begin(), pts.end(), SortXY<Descending, Ascending>()); sorts according to descending x values, and then to ascending y values. It's easily understandable, and I'm not sure I really want to make use of lambda expressions here.
But if I wanted to replace Ascending / Descending by lambda expressions, how would you do it? The following isn't valid:
std::sort(pts.begin(), pts.end(), SortXY<
[](int l, int r) { return l>r; },
[](int l, int r) { return l<r; }
>());

This problem arises because SortXY only takes types, whereas lambdas are objects. You need to re-write it so that it takes objects, not just types. This is basic use of functional objects- see how std::for_each doesn't take a type, it takes an object.

I have posted a similar question w.r.t. lambda functors within classes.
Check this out, perhaps it helps:
Lambda expression as member functors in a class

I had a similar problem: It was required to provide in some cases a "raw"-function pointer and in other a functor. So I came up with a "workaround" like this:
template<class T>
class Selector
{
public:
Selector(int (*theSelector)(T& l, T& r))
: selector(theSelector) {}
virtual int operator()(T& l, T& r) {
return selector(l, r);
}
int (*getRawSelector() const)(T&, T&) {
return this->selector;
}
private:
int(*selector)(T& l, T& r);
};
Assuming you have two very simple functions taking --- as described --- either a functor or a raw function pointer like this:
int
findMinWithFunctor(int* array, int size, Selector<int> selector)
{
if (array && size > 0) {
int min = array[0];
for (int i = 0; i < size; i++) {
if (selector(array[i], min) < 0) {
min = array[i];
}
}
return min;
}
return -1;
}
int
findMinWithFunctionPointer(int* array, int size, int(*selector)(int&, int&))
{
if (array && size > 0) {
int min = array[0];
for (int i = 0; i < size; i++) {
if (selector(array[i], min) < 0) {
min = array[i];
}
}
return min;
}
return -1;
}
Then you would call this functions like this:
int numbers[3] = { 4, 2, 99 };
cout << "The min with functor is:" << findMinWithFunctor(numbers, 3, Selector<int>([](int& l, int& r) -> int {return (l > r ? 1 : (r > l ? -1 : 0)); })) << endl;
// or with the plain version
cout << "The min with raw fn-pointer is:" << findMinWithFunctionPointer(numbers, 3, Selector<int>([](int& l, int& r) -> int {return (l > r ? 1 : (r > l ? -1 : 0)); }).getRawSelector()) << endl;
Of course in this example there is no real benefit passing the int's as reference...it's just an example :-)
Improvements:
You can also modify the Selector class to be more concise like this:
template<class T>
class Selector
{
public:
typedef int(*selector_fn)(T& l, T& r);
Selector(selector_fn theSelector)
: selector(theSelector) {}
virtual int operator()(T& l, T& r) {
return selector(l, r);
}
selector_fn getRawSelector() {
return this->selector;
}
private:
selector_fn selector;
};
Here we are taking advantage of a simple typedef in order to define the function pointer once and use only it's name rather then writing the declaration over and over.

Related

C++: How to implement polymorphic object creator to populate a table

I have a table widget object that can be resized with set_num_col(int) and set_num_row(int).
A call to each of these functions will call a resize_table() function to populate the widget with table_cells objects.
However, I have two polymorphic types of cells: table_cell_default and table_cell_custom, derived from the same base class.
Upon creation of the table, how can I populate it with mixed types of cells, considering that the client knows which cells will be custom and which will be of default type?
I thought about adding a map in the table class, and populate this map with for example set_custom_cells( vector<index>() ), with the ij indices of the cells as keys and the corresponding lamda creator returning the correct type as value, but this map will only be used once to populate the table and never again. Is there a more dynamic way, using a lambda as a table_cell creator to fill that widget in a better way?
Thanks
Here's an example of using a factory lambda to produce the initial cells in the Table's constructor. Refer to main function where the lambda is located, and the Table constructor for how it is used.
I do not know what your code looks like, so I just wrap each cell into an object_t and put that into the Table.
#include <cstdint>
#include <functional>
#include <iostream>
#include <memory>
#include <sstream>
#include <string>
#include <vector>
// Not idempotent. Should be last include.
#include <cassert>
using std::cout;
using std::function;
using std::make_shared;
using std::move;
using std::ostream;
using std::shared_ptr;
using std::size_t;
using std::string;
using std::stringstream;
using std::vector;
namespace {
template <typename T>
void draw_right_justified(T const& x, ostream& out, size_t width) {
stringstream ss;
ss << x;
string s = ss.str();
size_t pad_width = s.length() < width ? width - s.length() : 1;
out << string(pad_width, ' ') << s;
}
class object_t {
public:
template <typename T>
object_t(T x) : self_{make_shared<model<T>>(move(x))}
{ }
friend void draw_right_justified(object_t const& x, ostream& out, size_t width) {
x.self_->draw_right_justified_thunk(out, width);
}
private:
struct concept_t {
virtual ~concept_t() = default;
virtual void draw_right_justified_thunk(ostream&, size_t) const = 0;
};
template <typename T>
struct model : concept_t {
model(T x) : data_{move(x)} { }
void draw_right_justified_thunk(ostream& out, size_t width) const {
draw_right_justified(data_, out, width);
}
T data_;
};
shared_ptr<const concept_t> self_;
};
class Table {
size_t col;
size_t row;
// data will be constructed with col_ * row_ entries.
vector<object_t> data;
public:
using object_factory = function<object_t(size_t, size_t)>;
Table(size_t col_, size_t row_, object_factory& fn);
auto operator()(size_t x, size_t y) const -> object_t;
void display(ostream& out) const;
};
Table::Table(size_t col_, size_t row_, Table::object_factory& fn)
: col{col_}, row{row_}
{
data.reserve(col * row);
for (size_t y = 0; y < row; ++y) {
for (size_t x = 0; x < row; ++x) {
data.emplace_back(fn(x, y));
}
}
}
object_t Table::operator()(size_t x, size_t y) const {
assert(x < col);
assert(y < row);
return data[y * row + x];
}
void Table::display(ostream& out) const {
auto const& self = *this;
for (size_t y = 0; y < row; ++y) {
for (size_t x = 0; x < col; ++x) {
draw_right_justified(self(x, y), out, 8);
}
out << "\n";
}
}
struct empty_t {};
void draw_right_justified(empty_t, ostream& out, size_t width) {
string s = "(empty)";
size_t pad_width = s.length() < width ? width - s.length() : 1;
out << string(pad_width, ' ') << s;
}
struct bunny { string name; };
void draw_right_justified(bunny const& bunny, ostream& out, size_t width) {
auto const& s = bunny.name;
size_t pad_width = s.length() < width ? width - s.length() : 1;
out << string(pad_width, ' ') << s;
}
} // anon
int main() {
Table::object_factory maker = [](size_t x, size_t y) {
if (x == 0 && y == 1) return object_t{bunny{"Bugs"}};
if (x == 0 && y == 0) return object_t{empty_t{}};
if (x == y) return object_t{string("EQUAL")};
return object_t{x * y};
};
auto table = Table{3, 5, maker};
table.display(cout);
}
Output...
(empty) 0 0
Bugs EQUAL 2
0 2 EQUAL
0 3 6
0 4 8

Combining multiple for loops into single iterator

Say I have a nest for loop like
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
function_doing_stuff(std::make_tuple(x, y, z));
}
}
}
and would like to transform it into
MyRange range(xstart,xend,ystart,yend, zstart,zend);
for (auto point : range){
function_doing_stuff(point);
}
How would I write the MyRange class to be as efficient as the nested for loops?
The motivation for this is to be able to use std algorithms (such as transform, accumulate, etc), and to create code that is largely dimension agnostic.
By having an iterator, it would be easy to create templated functions that operate over a range of 1d, 2d or 3d points.
Code base is currently C++14.
EDIT:
Writing clear questions is hard. I'll try to clarify.
My problem is not writing an iterator, that I can do. Instead, the problem is one of performance: Is it possible to make an iterator that is as fast as the nested for loops?
With range/v3, you may do
auto xs = ranges::view::iota(xstart, xend);
auto ys = ranges::view::iota(ystart, yend);
auto zs = ranges::view::iota(zstart, zend);
for (const auto& point : ranges::view::cartesian_product(xs, ys, zs)){
function_doing_stuff(point);
}
You can introduce your own class as
class myClass {
public:
myClass (int x, int y, int z):m_x(x) , m_y(y), m_z(z){};
private:
int m_x, m_y, m_z;
}
and then initialize a std::vector<myClass> with your triple loop
std::vector<myClass> myVec;
myVec.reserve((xend-xstart)*(yend-ystart)*(zend-zstart)); // alloc memory only once;
for (int x = ystart; x < xend; x++){
for (int y = xstart; y < yend; y++){ // I assume you have a copy paste error here
for (int z = zstart; z < zend; z++){
myVec.push_back({x,y,z})
}
}
}
Finally, you can use all the nice std algorithms with the std::vector<myClass> myVec. With the syntactic sugar
using MyRange = std::vector<MyClass>;
and
MyRange makeMyRange(int xstart, int xend, int ystart, int yend, int zstart,int zend) {
MyRange myVec;
// loop from above
return MyRange;
}
you can write
const MyRange range = makeMyRange(xstart, xend, ystart, yend, zstart, zend);
for (auto point : range){
function_doing_stuff(point);
}
With the new move semantics this wont create unneeded copies. Please note, that the interface to this function is rather bad. Perhaps rather use 3 pairs of int, denoting the x,y,z interval.
Perhaps you change the names to something meaningful (e.g.myClass could be Point).
Another option, which directly transplants whatever looping code, is to use a Coroutine. This emulates yield from Python or C#.
using point = std::tuple<int, int, int>;
using coro = boost::coroutines::asymmetric_coroutine<point>;
coro::pull_type points(
[&](coro::push_type& yield){
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
yield(std::make_tuple(x, y, z));
}
}
}
});
for(auto p : points)
function_doing_stuff(p);
Since you care about performance, you should forget about combining iterators for the foreseeable future. The central problem is that compilers cannot yet untangle the mess and figure out that there are 3 independent variables in it, much less perform any loop interchange or unrolling or fusion.
If you must use ranges, use simple ones that the compiler can see through:
for (int const x : boost::irange<int>(xstart,xend))
for (int const y : boost::irange<int>(ystart,yend))
for (int const z : boost::irange<int>(zstart,zend))
function_doing_stuff(x, y, z);
Alternatively, you can actually pass your functor and the boost ranges to a template:
template <typename Func, typename Range0, typename Range1, typename Range2>
void apply_ranges (Func func, Range0 r0, Range1 r1, Range2 r2)
{
for (auto const i0 : r0)
for (auto const i1 : r1)
for (auto const i2 : r2)
func (i0, i1, i2);
}
If you truly care about performance, then you should not contort your code with complicated ranges, because they make it harder to untangle later when you want to rewrite them in AVX intrinsics.
Here's a bare-bones implementation that does not use any advanced language features or other libraries. The performance should be pretty close to the for loop version.
#include <tuple>
class MyRange {
public:
typedef std::tuple<int, int, int> valtype;
MyRange(int xstart, int xend, int ystart, int yend, int zstart, int zend): xstart(xstart), xend(xend), ystart(ystart), yend(yend), zstart(zstart), zend(zend) {
}
class iterator {
public:
iterator(MyRange &c): me(c) {
curvalue = std::make_tuple(me.xstart, me.ystart, me.zstart);
}
iterator(MyRange &c, bool end): me(c) {
curvalue = std::make_tuple(end ? me.xend : me.xstart, me.ystart, me.zstart);
}
valtype operator*() {
return curvalue;
}
iterator &operator++() {
if (++std::get<2>(curvalue) == me.zend) {
std::get<2>(curvalue) = me.zstart;
if (++std::get<1>(curvalue) == me.yend) {
std::get<1>(curvalue) = me.ystart;
++std::get<0>(curvalue);
}
}
return *this;
}
bool operator==(const iterator &other) const {
return curvalue == other.curvalue;
}
bool operator!=(const iterator &other) const {
return curvalue != other.curvalue;
}
private:
MyRange &me;
valtype curvalue;
};
iterator begin() {
return iterator(*this);
}
iterator end() {
return iterator(*this, true);
}
private:
int xstart, xend;
int ystart, yend;
int zstart, zend;
};
And an example of usage:
#include <iostream>
void display(std::tuple<int, int, int> v) {
std::cout << "(" << std::get<0>(v) << ", " << std::get<1>(v) << ", " << std::get<2>(v) << ")\n";
}
int main() {
MyRange c(1, 4, 2, 5, 7, 9);
for (auto v: c) {
display(v);
}
}
I've left off things like const iterators, possible operator+=, decrementing, post increment, etc. They've been left as an exercise for the reader.
It stores the initial values, then increments each value in turn, rolling it back and incrementing the next when it get to the end value. It's a bit like incrementing a multi-digit number.
Using boost::iterator_facade for simplicity, you can spell out all the required members.
First we have a class that iterates N-dimensional indexes as std::array<std::size_t, N>
template<std::size_t N>
class indexes_iterator : public boost::iterator_facade<indexes_iterator, std::array<std::size_t, N>>
{
public:
template<typename... Dims>
indexes_iterator(Dims... dims) : dims{ dims... }, values{} {}
private:
friend class boost::iterator_core_access;
void increment() { advance(1); }
void decrement() { advance(-1); }
void advance(int n)
{
for (std::size_t i = 0; i < N; ++i)
{
int next = ((values[i] + n) % dims[i]);
n = (n \ dims[i]) + (next < value);
values[i] = next;
}
}
std::size_t distance(indexes_iterator const & other) const
{
std::size_t result = 0, mul = 1;
for (std::size_t i = 0; i < dims; ++i)
{
result += mul * other[i] - values[i];
mul *= ends[i];
}
}
bool equal(indexes_iterator const& other) const
{
return values == other.values;
}
std::array<std::size_t, N> & dereference() const { return values; }
std::array<std::size_t, N> ends;
std::array<std::size_t, N> values;
}
Then we use that to make something similar to a boost::zip_iterator, but instead of advancing all together we add our indexes.
template <typename... Iterators>
class product_iterator : public boost::iterator_facade<product_iterator<Iterators...>, const std::tuple<decltype(*std::declval<Iterators>())...>, boost::random_access_traversal_tag>
{
using ref = std::tuple<decltype(*std::declval<Iterators>())...>;
public:
product_iterator(Iterators ... ends) : indexes() , iterators(std::make_tuple(ends...)) {}
template <typename ... Sizes>
product_iterator(Iterators ... begins, Sizes ... sizes)
: indexes(sizes...),
iterators(begins...)
{}
private:
friend class boost::iterator_core_access;
template<std::size_t... Is>
ref dereference_impl(std::index_sequence<Is...> idxs) const
{
auto offs = offset(idxs);
return { *std::get<Is>(offs)... };
}
ref dereference() const
{
return dereference_impl(std::index_sequence_for<Iterators...>{});
}
void increment() { ++indexes; }
void decrement() { --indexes; }
void advance(int n) { indexes += n; }
template<std::size_t... Is>
std::tuple<Iterators...> offset(std::index_sequence<Is...>) const
{
auto idxs = *indexes;
return { (std::get<Is>(iterators) + std::get<Is>(idxs))... };
}
bool equal(product_iterator const & other) const
{
return offset(std::index_sequence_for<Iterators...>{})
== other.offset(std::index_sequence_for<Iterators...>{});
}
indexes_iterator<sizeof...(Iterators)> indexes;
std::tuple<Iterators...> iterators;
};
Then we wrap it up in a boost::iterator_range
template <typename... Ranges>
auto make_product_range(Ranges&&... rngs)
{
product_iterator<decltype(begin(rngs))...> b(begin(rngs)..., std::distance(std::begin(rngs), std::end(rngs))...);
product_iterator<decltype(begin(rngs))...> e(end(rngs)...);
return boost::iterator_range<product_iterator<decltype(begin(rngs))...>>(b, e);
}
int main()
{
using ranges::view::iota;
for (auto p : make_product_range(iota(xstart, xend), iota(ystart, yend), iota(zstart, zend)))
// ...
return 0;
}
See it on godbolt
Just a very simplified version that will be as efficient as a for loop:
#include <tuple>
struct iterator{
int x;
int x_start;
int x_end;
int y;
int y_start;
int y_end;
int z;
constexpr auto
operator*() const{
return std::tuple{x,y,z};
}
constexpr iterator&
operator++ [[gnu::always_inline]](){
++x;
if (x==x_end){
x=x_start;
++y;
if (y==y_end) {
++z;
y=y_start;
}
}
return *this;
}
constexpr iterator
operator++(int){
auto old=*this;
operator++();
return old;
}
};
struct sentinel{
int z_end;
friend constexpr bool
operator == (const iterator& x,const sentinel& s){
return x.z==s.z_end;
}
friend constexpr bool
operator == (const sentinel& a,const iterator& x){
return x==a;
}
friend constexpr bool
operator != (const iterator& x,const sentinel& a){
return !(x==a);
}
friend constexpr bool
operator != (const sentinel& a,const iterator& x){
return !(x==a);
}
};
struct range{
iterator start;
sentinel finish;
constexpr auto
begin() const{
return start;
}
constexpr auto
end()const{
return finish;
}
};
void func(int,int,int);
void test(const range& r){
for(auto [x,y,z]: r)
func(x,y,z);
}
void test(int x_start,int x_end,int y_start,int y_end,int z_start,int z_end){
for(int z=z_start;z<z_end;++z)
for(int y=y_start;y<y_end;++y)
for(int x=x_start;x<x_end;++x)
func(x,y,z);
}
The advantage over 1201ProgramAlarm answer is the faster test performed at each iteration thanks to the use of a sentinel.

How to implement a tensor class for Kronecker-Produkt [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
Currently I came across an interesting article what's called the Kronecker-Produkt. At the same time I'm working on my neural network library.
So that my algorithm works, I need a tensor class, where I can get the product of two tensor's with an overloaded * operator.
Consider the following example/questions:
How to efficiently construct/store the nested matrices?
How to perform the product of two tensor's?
How to visualize tensor c as simply as possible?
My class 3 tensor which currently only supports 3 dimensions:
#pragma once
#include <iostream>
#include <sstream>
#include <random>
#include <cmath>
#include <iomanip>
template<typename T>
class tensor {
public:
const unsigned int x, y, z, s;
tensor(unsigned int x, unsigned int y, unsigned int z, T val) : x(x), y(y), z(z), s(x * y * z) {
p_data = new T[s];
for (unsigned int i = 0; i < s; i++) p_data[i] = val;
}
tensor(const tensor<T> & other) : x(other.x), y(other.y), z(other.z), s(other.s) {
p_data = new T[s];
memcpy(p_data, other.get_data(), s * sizeof(T));
}
~tensor() {
delete[] p_data;
p_data = nullptr;
}
T * get_data() {
return p_data;
}
static tensor<T> * random(unsigned int x, unsigned int y, unsigned int z, T val, T min, T max) {
tensor<T> * p_tensor = new tensor<T>(x, y, z, val);
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<T> dist(min, max);
for (unsigned int i = 0; i < p_tensor->s; i++) {
T rnd = dist(mt);
while (abs(rnd) < 0.001) rnd = dist(mt);
p_tensor->get_data()[i] = rnd;
}
return p_tensor;
}
static tensor<T> * from(std::vector<T> * p_data, T val) {
tensor<T> * p_tensor = new tensor<T>(p_data->size(), 1, 1, val);
for (unsigned int i = 0; i < p_tensor->get_x(); i++) p_tensor->set_data(i + 0 * p_tensor->get_x() * + 0 * p_tensor->get_x() * p_tensor->get_y(), p_data->at(i));
return p_tensor;
}
friend std::ostream & operator <<(std::ostream & stream, tensor<T> & tensor) {
stream << "(" << tensor.x << "," << tensor.y << "," << tensor.z << ") Tensor\n";
for (unsigned int i = 0; i < tensor.x; i++) {
for (unsigned int k = 0; k < tensor.z; k++) {
stream << "[";
for (unsigned int j = 0; j < tensor.y; j++) {
stream << std::setw(5) << roundf(tensor(i, j, k) * 1000) / 1000;
if (j + 1 < tensor.y) stream << ",";
}
stream << "]";
}
stream << std::endl;
}
return stream;
}
tensor<T> & operator +(tensor<T> & other) {
tensor<T> result(*this);
return result;
}
tensor<T> & operator -(tensor<T> & other) {
tensor<T> result(*this);
return result;
}
tensor<T> & operator *(tensor<T> & other) {
tensor<T> result(*this);
return result;
}
T & operator ()(unsigned int i, unsigned int j, unsigned int k) {
return p_data[i + (j * x) + (k * x * y)];
}
T & operator ()(unsigned int i) {
return p_data[i];
}
private:
T * p_data = nullptr;
};
int main() {
tensor<double> * p_tensor_input = tensor<double>::random(6, 2, 3, 0.0, 0.0, 1.0);
tensor<double> * p_tensor_weight = tensor<double>::random(2, 6, 3, 0.0, 0.0, 1.0);
std::cout << *p_tensor_input << std::endl;
std::cout << *p_tensor_weight << std::endl;
tensor<double> p_tensor_output = *p_tensor_input + *p_tensor_weight;
return 0;
}
Your first step is #2 -- and get it correct.
After that, optimize.
Start with a container C<T>.
Define some operations on it. wrap(T) returns a C<T> containing that T. map takes a C<T> and a function on T U f(T) and returns C<U>. flatten takes a C<C<U>> and returns a C<U>.
Define scale( T, C<T> ) which takes a T and a C<T> and returns a C<T> with the elements scaled. Aka, scalar multiplication.
template<class T>
C<T> scale( T scalar, C<T> container ) {
return map( container, [&](T t){ return t*scalar; } );
}
Then we have:
template<class T>
C<T> tensor( C<T> lhs, C<T> rhs ) {
return flatten( map( lhs, [&](T t) { return scale( t, rhs ); } ) );
}
is your tensor product. And yes, that can be your actual code. I would tweak it a bit for efficiency.
(Note I used different terms, but I'm basically describing monadic operations using different words.)
After you have this, test, optimize, and iterate.
As for 3, the result of tensor products get large and complex, there is no simple visualization for a large tensor.
Oh, and keep things simple and store data in a std::vector to start.
Here are some tricks for efficient vectors i learned in class, but they should be equally good for a tensor.
Define an empty constructor and assignment operator. For example
tensor(unsigned int x, unsigned int y, unsigned int z) : x(x), y(y), z(z), s(x * y * z) {
p_data = new T[s];
}
tensor& operator=( tensor const& that ) {
for (int i=0; i<size(); ++i) {
p_data[i] = that(i) ;
}
return *this ;
}
template <typename T>
tensor& operator=( T const& that ) {
for (int i=0; i<size(); ++i) {
p_data[i] = that(i) ;
}
return *this ;
}
Now we can implement things like addition and scaling with deferred evaluation. For example:
template<typename T1, typename T2>
class tensor_sum {
//add value_type to base tensor class for this to work
typedef decltype( typename T1::value_type() + typename T2::value_type() ) value_type ;
//also add function to get size of tensor
value_type operator()( int i, int j, int k ) const {
return t1_(i,j,k) + v2_(i,j,k) ;
}
value_type operator()( int i ) const {
return t1_(i) + v2_(i) ;
}
private:
T1 const& t1_;
T2 const& t2_;
}
template <typename T1, typename T2>
tensor_sum<T1,T2> operator+(T1 const& t1, T2 const& t2 ) {
return vector_sum<T1,T2>(t1,t2) ;
}
This tensor_sum behaves exactly like any normal tensor, except that we don't have to allocate memory to store the result. So we can do something like this:
tensor<double> t0(...);
tensor<double> t1(...);
tensor<double> t2(...);
tensor<double> result(...); //define result to be empty, we will fill it later
result = t0 + t1 + 5.0*t2;
The compiler should optimize this to be just one loop, without storing intermediate results or modifying the original tensors. You can do the same thing for scaling and the kronecker product. Depending on what you want to do with the tensors, this can be a big advantage. But be careful, this isn't always the best option.
When implementing the kronecker product you should be careful of the of the ordering of your loop, try to go through the tensors in the order they are stored for cache efficiency.

C++: Performance of Loop in Expression Template

I'm using Expression Templates in a vector-like class for transformations such as moving averages. Here, different to standard arithmetic operations, the operator[](size_t i) does not make a single single acces to element i, but rather there is a whole loop which needs to be evaluated, e.g. for the moving average of a vector v
double operator[](size_t i) const
{
double ad=0.0;
for(int j=i-period+1; j<=i; ++j)
ad+=v[j];
return ad/period;
}
(thats not the real function because one must care for non-negative indices, but that doesn't matter now).
In using such a moving average construct, I have the fear that the code becomes rather in-performant, especially if one takes a double- or triple-moving average. Then one obtains nested loops and therefore quadratic or cubic scaling with the period-size.
My question is, whether compilers are so smart to optimize such redundant loops somehow away? Or is that not the case and one must manually take care for intermediate storage (--which is what I guess)? How could one do this reasonably in the example code below?
Example code, adapted from Wikipedia, compiles with Visual Studio 2013:
CRTP-base class and actual vector:
#include <vector>
template <typename E>
struct VecExpression
{
double operator[](int i) const { return static_cast<E const&>(*this)[i]; }
};
struct Vec : public VecExpression<Vec>
{
Vec(size_t N) : data(N) {}
double operator[](int i) const { return data[i]; }
double& operator[](int i) { return data[i]; }
std::vector<double> data;
};
Moving Average class:
template <typename VectorType>
struct VecMovingAverage : public VecExpression<VecMovingAverage<VectorType> >
{
VecMovingAverage(VectorType const& _vector, int _period) : vector(_vector), period(_period) {}
double operator[](int i) const
{
int s = std::max(i - period + 1, 0);
double ad = 0.0;
for (int j = s; j <= i; ++j)
ad += vector[j];
return ad / (i - s + 1);
}
VectorType const& vector;
int period;
};
template<typename VectorType>
auto MovingAverage(VectorType const& vector, int period = 10) -> VecMovingAverage<VectorType>
{
return VecMovingAverage<VectorType>(vector, period);
}
Now my above mentioned fears arise with expressions like this,
Vec vec(100);
auto tripleMA= MovingAverage(MovingAverage(MovingAverage(vec,20),20),20);
std::cout << tripleMA[40] << std::endl;
which I suppose require 20^3 evaluations for the single operator[] ... ?
EDIT: One obvious solution is to store the result. Move std::vector<double> data into the base class, then change the Moving Average class to something like (untested)
template <typename VectorType, bool Store>
struct VecMovingAverage : public VecExpression<VecMovingAverage<VectorType, Store> >
{
VecMovingAverage(VectorType const& _vector, int _period) : vector(_vector), period(_period) {}
double operator[](int i) const
{
if(Store && i<data.size())
{
return data[i];
}
else
{
int s = std::max(i - period + 1, 0);
double ad = 0.0;
for (int j = s; j <= i; ++j)
ad += vector[j];
ad /= (i - s + 1)
if(Store)
{
data.resize(i+1);
data[i]=ad;
}
return ad;
}
}
VectorType const& vector;
int period;
};
In the function one can then choose to store the result:
template<typename VectorType>
auto MovingAverage(VectorType const& vector, int period = 10) -> VecMovingAverage<VectorType>
{
static const bool Store=true;
return VecMovingAverage<VectorType, Store>(vector, period);
}
This could be extended such that storage is applied only for multiple applications, etc.

Complex class [PrintMax]

So I am working on "TEMPLATES" and I'm required to make a 3 attempt of a function called PrintMax -it's obvious what it does-, to print the maximum element in an array of 3 elements, each attempt is for a different data type in this array -double/int/complex-. So I'm required to first, create the class Complex, and its required operator overloads, after that I use the PrintMax function as template function to work on the 3 types of arrays.
The problem here lies within the 3rd array of course, I can't write the elements of Complex into the array in this for ( a + bi ), because this is my class Complex :
class Complex
{
private :
int imaginary;
int real;
public:
Complex (int = 0, int = 0);
~Complex ();
int getImaginary();
int getReal();
void setImagniary(int i);
void setReal (int r);
bool operator > (Complex&);
};
You can notice, I overloaded operator > to check, but I also have a little problem besides not being able to write the elements in that way, the second problem is I can't -or sleepy and my brain is dying- calculate which is maximum in this array of Complex numbers :
// Input: Complex Array
// 1+3i, 2+4i, 3+3i
// Expected Output: 2+4i
So I want to assign them in the array with this form : Arr[3] = {1+3i, 2+4i, 3+3i};
Why is that the expected output, why not 3+3i ?
Thanks for reading ~
It seems to me that you are looking for something like:
template <typename T> void PrintMax(T array[])
{
// It is assumed that array has 3 elements in it.
std::cout <<
array[0] > array[1] ?
(array[0] > array[2] ? array[0] : array[2]) :
(array[1] > array[2] ? array[1] : array[2])
std::endl;
}
You could use something like the following. Note that there are no range checks in the code, it is just to demonstrate a way how you could solve your problem.
Plus i would suggest you to use a container (eg. std::vector) instead of plain arrays.
#include <algorithm>
#include <cmath>
#include <iostream>
class Complex {
private:
int imaginary;
int real;
public:
Complex(int r, int i) :
imaginary(i), real(r) {
}
~Complex() {
}
int getImaginary() const {
return imaginary;
}
int getReal() const {
return real;
}
void setImaginary(int i) {
imaginary = i;
}
void setReal(int r) {
real = r;
}
double getAbsolut() const {
return std::abs(std::sqrt(std::pow(imaginary, 2) + std::pow(real, 2)));
}
friend bool operator<(const Complex& lhs, const Complex& rhs);
friend std::ostream& operator<<(std::ostream& stream,
const Complex& complex);
};
bool operator<(const Complex& lhs, const Complex& rhs) {
return lhs.getAbsolut() < rhs.getAbsolut();
}
std::ostream& operator<<(std::ostream& stream, const Complex& complex) {
stream << "Complex(" << complex.real << "+" << complex.imaginary
<< "i)";
return stream;
}
template<int size, class T>
void getMax(const T arr[]) {
T max_value = arr[0];
for (size_t i = 1; i < size; ++i) {
max_value = std::max(max_value, arr[i]);
}
std::cout << "max: " << max_value << std::endl;
}
int main(int argc, char **argv) {
Complex arr_complex[] = { Complex(3, 3), Complex(2, 4), Complex(1, 3) };
int arr_int[] = { 3, 5, 1 };
double arr_double[] = { 2.3, 5.6, 9.1 };
getMax<3>(arr_complex);
getMax<3>(arr_int);
getMax<3>(arr_double);
return 0;
}