C++: Performance of Loop in Expression Template

I'm using expression templates in a vector-like class for transformations such as moving averages. Here, unlike standard arithmetic operations, operator[](size_t i) does not make a single access to element i; instead a whole loop has to be evaluated, e.g. for the moving average of a vector v:
double operator[](size_t i) const
{
double ad=0.0;
for(int j=i-period+1; j<=i; ++j)
ad+=v[j];
return ad/period;
}
(That's not the real function, because one must take care to keep the indices non-negative, but that doesn't matter for now.)
When using such a moving-average construct, I fear that the code becomes rather inefficient, especially if one takes a double or triple moving average. One then obtains nested loops and therefore quadratic or cubic scaling with the period size.
My question is whether compilers are smart enough to optimize such redundant loops away. Or is that not the case, so that one must take care of intermediate storage manually (which is what I suspect)? How could one do this reasonably in the example code below?
Example code, adapted from Wikipedia, compiles with Visual Studio 2013:
CRTP-base class and actual vector:
#include <algorithm> // std::max
#include <cstddef>   // size_t
#include <iostream>  // std::cout
#include <vector>
template <typename E>
struct VecExpression
{
double operator[](int i) const { return static_cast<E const&>(*this)[i]; }
};
struct Vec : public VecExpression<Vec>
{
Vec(size_t N) : data(N) {}
double operator[](int i) const { return data[i]; }
double& operator[](int i) { return data[i]; }
std::vector<double> data;
};
Moving Average class:
template <typename VectorType>
struct VecMovingAverage : public VecExpression<VecMovingAverage<VectorType> >
{
VecMovingAverage(VectorType const& _vector, int _period) : vector(_vector), period(_period) {}
double operator[](int i) const
{
int s = std::max(i - period + 1, 0);
double ad = 0.0;
for (int j = s; j <= i; ++j)
ad += vector[j];
return ad / (i - s + 1);
}
VectorType const& vector;
int period;
};
template<typename VectorType>
auto MovingAverage(VectorType const& vector, int period = 10) -> VecMovingAverage<VectorType>
{
return VecMovingAverage<VectorType>(vector, period);
}
Now my above mentioned fears arise with expressions like this,
Vec vec(100);
auto tripleMA= MovingAverage(MovingAverage(MovingAverage(vec,20),20),20);
std::cout << tripleMA[40] << std::endl;
which I suppose requires 20^3 evaluations for the single operator[] call?
EDIT: One obvious solution is to store the result. Move std::vector<double> data into the base class (declared mutable, so that the const operator[] can fill it), then change the moving-average class to something like this (untested):
template <typename VectorType, bool Store>
struct VecMovingAverage : public VecExpression<VecMovingAverage<VectorType, Store> >
{
VecMovingAverage(VectorType const& _vector, int _period) : vector(_vector), period(_period) {}
double operator[](int i) const
{
if(Store && i<data.size())
{
return data[i];
}
else
{
int s = std::max(i - period + 1, 0);
double ad = 0.0;
for (int j = s; j <= i; ++j)
ad += vector[j];
ad /= (i - s + 1);
if(Store)
{
data.resize(i+1);
data[i]=ad;
}
return ad;
}
}
VectorType const& vector;
int period;
};
In the function one can then choose to store the result:
template<typename VectorType>
auto MovingAverage(VectorType const& vector, int period = 10) -> VecMovingAverage<VectorType, true>
{
static const bool Store=true;
return VecMovingAverage<VectorType, Store>(vector, period);
}
This could be extended such that storage is applied only for multiple applications, etc.
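As a rough illustration of the intermediate-storage idea, each moving-average stage can be evaluated eagerly with a sliding-window sum, so that a chain of k stages costs O(k * n) instead of O(period^k) per element. This is only a sketch: moving_average is a hypothetical free function, not part of the expression-template classes above, and it assumes period >= 1. It uses the same truncated window at the start of the vector as VecMovingAverage.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): evaluates one
// moving-average stage for all elements in a single O(n) pass.
// Assumes period >= 1.
std::vector<double> moving_average(const std::vector<double>& in, std::size_t period)
{
    std::vector<double> out(in.size());
    double window_sum = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        window_sum += in[i];               // element entering the window
        if (i >= period)
            window_sum -= in[i - period];  // element leaving the window
        // The window is shorter than period for the first period-1 elements.
        const std::size_t count = std::min(i + 1, period);
        out[i] = window_sum / static_cast<double>(count);
    }
    return out;
}
```

A triple moving average then becomes moving_average(moving_average(moving_average(v, 20), 20), 20), with each stage stored once. Note that a running sum accumulates floating-point rounding differently from summing every window from scratch, so results may differ in the last digits.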

Related

Combining multiple for loops into single iterator

Say I have a nest for loop like
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
function_doing_stuff(std::make_tuple(x, y, z));
}
}
}
and would like to transform it into
MyRange range(xstart,xend,ystart,yend, zstart,zend);
for (auto point : range){
function_doing_stuff(point);
}
How would I write the MyRange class to be as efficient as the nested for loops?
The motivation for this is to be able to use std algorithms (such as transform, accumulate, etc), and to create code that is largely dimension agnostic.
By having an iterator, it would be easy to create templated functions that operate over a range of 1d, 2d or 3d points.
Code base is currently C++14.
EDIT:
Writing clear questions is hard. I'll try to clarify.
My problem is not writing an iterator, that I can do. Instead, the problem is one of performance: Is it possible to make an iterator that is as fast as the nested for loops?

With range/v3, you may do
auto xs = ranges::view::iota(xstart, xend);
auto ys = ranges::view::iota(ystart, yend);
auto zs = ranges::view::iota(zstart, zend);
for (const auto& point : ranges::view::cartesian_product(xs, ys, zs)){
function_doing_stuff(point);
}

You can introduce your own class as
class myClass {
public:
myClass(int x, int y, int z) : m_x(x), m_y(y), m_z(z) {}
private:
int m_x, m_y, m_z;
};
and then initialize a std::vector<myClass> with your triple loop
std::vector<myClass> myVec;
myVec.reserve((xend-xstart)*(yend-ystart)*(zend-zstart)); // alloc memory only once;
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
myVec.push_back({x,y,z});
}
}
}
Finally, you can use all the nice std algorithms with the std::vector<myClass> myVec. With the syntactic sugar
using MyRange = std::vector<myClass>;
and
MyRange makeMyRange(int xstart, int xend, int ystart, int yend, int zstart,int zend) {
MyRange myVec;
// loop from above
return myVec;
}
you can write
const MyRange range = makeMyRange(xstart, xend, ystart, yend, zstart, zend);
for (auto point : range){
function_doing_stuff(point);
}
With move semantics this won't create unneeded copies. Note that the interface of this function is rather poor; it might be better to take three pairs of int denoting the x, y and z intervals.
You might also change the names to something meaningful (e.g. myClass could be Point).

Another option, which directly transplants whatever looping code, is to use a Coroutine. This emulates yield from Python or C#.
using point = std::tuple<int, int, int>;
using coro = boost::coroutines::asymmetric_coroutine<point>;
coro::pull_type points(
[&](coro::push_type& yield){
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
yield(std::make_tuple(x, y, z));
}
}
}
});
for(auto p : points)
function_doing_stuff(p);

Since you care about performance, you should forget about combining iterators for the foreseeable future. The central problem is that compilers cannot yet untangle the mess and figure out that there are 3 independent variables in it, much less perform any loop interchange or unrolling or fusion.
If you must use ranges, use simple ones that the compiler can see through:
for (int const x : boost::irange<int>(xstart,xend))
for (int const y : boost::irange<int>(ystart,yend))
for (int const z : boost::irange<int>(zstart,zend))
function_doing_stuff(x, y, z);
Alternatively, you can actually pass your functor and the boost ranges to a template:
template <typename Func, typename Range0, typename Range1, typename Range2>
void apply_ranges (Func func, Range0 r0, Range1 r1, Range2 r2)
{
for (auto const i0 : r0)
for (auto const i1 : r1)
for (auto const i2 : r2)
func (i0, i1, i2);
}
If you truly care about performance, then you should not contort your code with complicated ranges, because they make it harder to untangle later when you want to rewrite them in AVX intrinsics.

Here's a bare-bones implementation that does not use any advanced language features or other libraries. The performance should be pretty close to the for loop version.
#include <tuple>
class MyRange {
public:
typedef std::tuple<int, int, int> valtype;
MyRange(int xstart, int xend, int ystart, int yend, int zstart, int zend): xstart(xstart), xend(xend), ystart(ystart), yend(yend), zstart(zstart), zend(zend) {
}
class iterator {
public:
iterator(MyRange &c): me(c) {
curvalue = std::make_tuple(me.xstart, me.ystart, me.zstart);
}
iterator(MyRange &c, bool end): me(c) {
curvalue = std::make_tuple(end ? me.xend : me.xstart, me.ystart, me.zstart);
}
valtype operator*() {
return curvalue;
}
iterator &operator++() {
if (++std::get<2>(curvalue) == me.zend) {
std::get<2>(curvalue) = me.zstart;
if (++std::get<1>(curvalue) == me.yend) {
std::get<1>(curvalue) = me.ystart;
++std::get<0>(curvalue);
}
}
return *this;
}
bool operator==(const iterator &other) const {
return curvalue == other.curvalue;
}
bool operator!=(const iterator &other) const {
return curvalue != other.curvalue;
}
private:
MyRange &me;
valtype curvalue;
};
iterator begin() {
return iterator(*this);
}
iterator end() {
return iterator(*this, true);
}
private:
int xstart, xend;
int ystart, yend;
int zstart, zend;
};
And an example of usage:
#include <iostream>
void display(std::tuple<int, int, int> v) {
std::cout << "(" << std::get<0>(v) << ", " << std::get<1>(v) << ", " << std::get<2>(v) << ")\n";
}
int main() {
MyRange c(1, 4, 2, 5, 7, 9);
for (auto v: c) {
display(v);
}
}
I've left off things like const iterators, possible operator+=, decrementing, post increment, etc. They've been left as an exercise for the reader.
It stores the initial values, then increments each value in turn, rolling it back and incrementing the next when it get to the end value. It's a bit like incrementing a multi-digit number.
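The multi-digit-number analogy generalizes to any number of dimensions. The following is a minimal sketch, not part of the answer's MyRange class: next_index is a hypothetical helper that advances an index array like an odometer, with the last dimension varying fastest. It assumes every range is non-empty (lo[d] < hi[d]).

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: advances idx to the next point of the box [lo, hi).
// Returns false once all points have been visited; idx is then back at lo.
template <std::size_t N>
bool next_index(std::array<int, N>& idx,
                const std::array<int, N>& lo,
                const std::array<int, N>& hi)
{
    for (std::size_t d = N; d-- > 0; ) {
        if (++idx[d] < hi[d])
            return true;   // this "digit" did not overflow, no carry needed
        idx[d] = lo[d];    // roll this digit over and carry into the next one
    }
    return false;          // the most significant digit overflowed
}
```

Starting from idx = lo, a do { ... } while (next_index(idx, lo, hi)); loop visits the same points, in the same order, as the nested for loops.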

Using boost::iterator_facade for simplicity, you can spell out all the required members.
First we have a class that iterates N-dimensional indexes as std::array<std::size_t, N>
template<std::size_t N>
class indexes_iterator : public boost::iterator_facade<indexes_iterator<N>, std::array<std::size_t, N>>
{
public:
template<typename... Dims>
indexes_iterator(Dims... dims) : ends{ static_cast<std::size_t>(dims)... }, values{} {}
private:
friend class boost::iterator_core_access;
void increment() { advance(1); }
void decrement() { advance(-1); }
void advance(int n)
{
for (std::size_t i = 0; i < N; ++i)
{
std::size_t next = (values[i] + n) % ends[i];
n = (n / ends[i]) + (next < values[i]);
values[i] = next;
}
}
std::size_t distance(indexes_iterator const & other) const
{
std::size_t result = 0, mul = 1;
for (std::size_t i = 0; i < N; ++i)
{
result += mul * (other.values[i] - values[i]);
mul *= ends[i];
}
return result;
}
bool equal(indexes_iterator const& other) const
{
return values == other.values;
}
std::array<std::size_t, N> & dereference() const { return values; }
std::array<std::size_t, N> ends;
std::array<std::size_t, N> values;
};
Then we use that to make something similar to a boost::zip_iterator, but instead of advancing all together we add our indexes.
template <typename... Iterators>
class product_iterator : public boost::iterator_facade<product_iterator<Iterators...>, const std::tuple<decltype(*std::declval<Iterators>())...>, boost::random_access_traversal_tag>
{
using ref = std::tuple<decltype(*std::declval<Iterators>())...>;
public:
product_iterator(Iterators ... ends) : indexes() , iterators(std::make_tuple(ends...)) {}
template <typename ... Sizes>
product_iterator(Iterators ... begins, Sizes ... sizes)
: indexes(sizes...),
iterators(begins...)
{}
private:
friend class boost::iterator_core_access;
template<std::size_t... Is>
ref dereference_impl(std::index_sequence<Is...> idxs) const
{
auto offs = offset(idxs);
return { *std::get<Is>(offs)... };
}
ref dereference() const
{
return dereference_impl(std::index_sequence_for<Iterators...>{});
}
void increment() { ++indexes; }
void decrement() { --indexes; }
void advance(int n) { indexes += n; }
template<std::size_t... Is>
std::tuple<Iterators...> offset(std::index_sequence<Is...>) const
{
auto idxs = *indexes;
return { (std::get<Is>(iterators) + std::get<Is>(idxs))... };
}
bool equal(product_iterator const & other) const
{
return offset(std::index_sequence_for<Iterators...>{})
== other.offset(std::index_sequence_for<Iterators...>{});
}
indexes_iterator<sizeof...(Iterators)> indexes;
std::tuple<Iterators...> iterators;
};
Then we wrap it up in a boost::iterator_range
template <typename... Ranges>
auto make_product_range(Ranges&&... rngs)
{
product_iterator<decltype(begin(rngs))...> b(begin(rngs)..., std::distance(std::begin(rngs), std::end(rngs))...);
product_iterator<decltype(begin(rngs))...> e(end(rngs)...);
return boost::iterator_range<product_iterator<decltype(begin(rngs))...>>(b, e);
}
int main()
{
using ranges::view::iota;
for (auto p : make_product_range(iota(xstart, xend), iota(ystart, yend), iota(zstart, zend)))
// ...
return 0;
}
See it on godbolt

Just a very simplified version that will be as efficient as a for loop:
#include <tuple>
struct iterator{
int x;
int x_start;
int x_end;
int y;
int y_start;
int y_end;
int z;
constexpr auto
operator*() const{
return std::tuple{x,y,z};
}
constexpr iterator&
operator++ [[gnu::always_inline]](){
++x;
if (x==x_end){
x=x_start;
++y;
if (y==y_end) {
++z;
y=y_start;
}
}
return *this;
}
constexpr iterator
operator++(int){
auto old=*this;
operator++();
return old;
}
};
struct sentinel{
int z_end;
friend constexpr bool
operator == (const iterator& x,const sentinel& s){
return x.z==s.z_end;
}
friend constexpr bool
operator == (const sentinel& a,const iterator& x){
return x==a;
}
friend constexpr bool
operator != (const iterator& x,const sentinel& a){
return !(x==a);
}
friend constexpr bool
operator != (const sentinel& a,const iterator& x){
return !(x==a);
}
};
struct range{
iterator start;
sentinel finish;
constexpr auto
begin() const{
return start;
}
constexpr auto
end()const{
return finish;
}
};
void func(int,int,int);
void test(const range& r){
for(auto [x,y,z]: r)
func(x,y,z);
}
void test(int x_start,int x_end,int y_start,int y_end,int z_start,int z_end){
for(int z=z_start;z<z_end;++z)
for(int y=y_start;y<y_end;++y)
for(int x=x_start;x<x_end;++x)
func(x,y,z);
}
The advantage over 1201ProgramAlarm's answer is the cheaper end-of-range test performed at each iteration, thanks to the use of a sentinel.

How to implement a tensor class for the Kronecker product [closed]

I recently came across an interesting article about what is called the Kronecker product. At the same time, I'm working on my neural network library.
For my algorithm to work, I need a tensor class where I can get the product of two tensors with an overloaded * operator.
Consider the following questions:
1. How to efficiently construct/store the nested matrices?
2. How to perform the product of two tensors?
3. How to visualize tensor c as simply as possible?
My tensor class, which currently only supports 3 dimensions:
#pragma once
#include <iostream>
#include <sstream>
#include <random>
#include <cmath>
#include <iomanip>
#include <cstring> // std::memcpy
#include <vector>
template<typename T>
class tensor {
public:
const unsigned int x, y, z, s;
tensor(unsigned int x, unsigned int y, unsigned int z, T val) : x(x), y(y), z(z), s(x * y * z) {
p_data = new T[s];
for (unsigned int i = 0; i < s; i++) p_data[i] = val;
}
tensor(const tensor<T> & other) : x(other.x), y(other.y), z(other.z), s(other.s) {
p_data = new T[s];
std::memcpy(p_data, other.p_data, s * sizeof(T)); // get_data() is non-const, so read the member directly
}
~tensor() {
delete[] p_data;
p_data = nullptr;
}
T * get_data() {
return p_data;
}
static tensor<T> * random(unsigned int x, unsigned int y, unsigned int z, T val, T min, T max) {
tensor<T> * p_tensor = new tensor<T>(x, y, z, val);
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<T> dist(min, max);
for (unsigned int i = 0; i < p_tensor->s; i++) {
T rnd = dist(mt);
while (std::abs(rnd) < 0.001) rnd = dist(mt);
p_tensor->get_data()[i] = rnd;
}
return p_tensor;
}
static tensor<T> * from(std::vector<T> * p_data, T val) {
tensor<T> * p_tensor = new tensor<T>(p_data->size(), 1, 1, val);
for (unsigned int i = 0; i < p_tensor->x; i++) (*p_tensor)(i) = p_data->at(i);
return p_tensor;
}
friend std::ostream & operator <<(std::ostream & stream, tensor<T> & tensor) {
stream << "(" << tensor.x << "," << tensor.y << "," << tensor.z << ") Tensor\n";
for (unsigned int i = 0; i < tensor.x; i++) {
for (unsigned int k = 0; k < tensor.z; k++) {
stream << "[";
for (unsigned int j = 0; j < tensor.y; j++) {
stream << std::setw(5) << roundf(tensor(i, j, k) * 1000) / 1000;
if (j + 1 < tensor.y) stream << ",";
}
stream << "]";
}
stream << std::endl;
}
return stream;
}
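tensor<T> operator +(tensor<T> & other) {
tensor<T> result(*this); // placeholder: the element-wise sum is not implemented yet
return result; // returned by value; a reference to a local would dangle
}
tensor<T> operator -(tensor<T> & other) {
tensor<T> result(*this); // placeholder: the element-wise difference is not implemented yet
return result;
}
tensor<T> operator *(tensor<T> & other) {
tensor<T> result(*this); // placeholder: the product is not implemented yet
return result;
}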
T & operator ()(unsigned int i, unsigned int j, unsigned int k) {
return p_data[i + (j * x) + (k * x * y)];
}
T & operator ()(unsigned int i) {
return p_data[i];
}
private:
T * p_data = nullptr;
};
int main() {
tensor<double> * p_tensor_input = tensor<double>::random(6, 2, 3, 0.0, 0.0, 1.0);
tensor<double> * p_tensor_weight = tensor<double>::random(2, 6, 3, 0.0, 0.0, 1.0);
std::cout << *p_tensor_input << std::endl;
std::cout << *p_tensor_weight << std::endl;
tensor<double> p_tensor_output = *p_tensor_input + *p_tensor_weight;
return 0;
}

Your first step is #2 -- and get it correct.
After that, optimize.
Start with a container C<T>.
Define some operations on it. wrap(T) returns a C<T> containing that T. map takes a C<T> and a function on T U f(T) and returns C<U>. flatten takes a C<C<U>> and returns a C<U>.
Define scale( T, C<T> ) which takes a T and a C<T> and returns a C<T> with the elements scaled. Aka, scalar multiplication.
template<class T>
C<T> scale( T scalar, C<T> container ) {
return map( container, [&](T t){ return t*scalar; } );
}
Then we have:
template<class T>
C<T> tensor( C<T> lhs, C<T> rhs ) {
return flatten( map( lhs, [&](T t) { return scale( t, rhs ); } ) );
}
is your tensor product. And yes, that can be your actual code. I would tweak it a bit for efficiency.
(Note I used different terms, but I'm basically describing monadic operations using different words.)
After you have this, test, optimize, and iterate.
As for #3: the result of a tensor product gets large and complex, and there is no simple visualization for a large tensor.
Oh, and keep things simple and store data in a std::vector to start.
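The operations described above can be sketched as follows, taking C<T> to be std::vector<T>. This is only one possible reading of the answer, not its author's code: fmap is a stand-in name for the answer's map (renamed here to avoid confusion with std::map), and all containers are passed by const reference.

```cpp
#include <vector>

// fmap: applies f to every element of c and collects the results.
template <class T, class F>
auto fmap(const std::vector<T>& c, F f) -> std::vector<decltype(f(c.front()))>
{
    std::vector<decltype(f(c.front()))> out;
    out.reserve(c.size());
    for (const T& t : c) out.push_back(f(t));
    return out;
}

// flatten: concatenates a container of containers into one container.
template <class T>
std::vector<T> flatten(const std::vector<std::vector<T>>& cc)
{
    std::vector<T> out;
    for (const auto& c : cc) out.insert(out.end(), c.begin(), c.end());
    return out;
}

// scale: scalar multiplication.
template <class T>
std::vector<T> scale(T scalar, const std::vector<T>& c)
{
    return fmap(c, [&](T t) { return t * scalar; });
}

// tensor: every element of lhs scales a full copy of rhs.
template <class T>
std::vector<T> tensor(const std::vector<T>& lhs, const std::vector<T>& rhs)
{
    return flatten(fmap(lhs, [&](T t) { return scale(t, rhs); }));
}
```

For example, tensor applied to {1, 2} and {3, 4, 5} yields {3, 4, 5, 6, 8, 10}. The intermediate vector of vectors is the obvious thing to optimize away once the result is known to be correct.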

Here are some tricks for efficient vectors I learned in class; they should work equally well for a tensor.
Define a constructor that leaves the data uninitialized, plus assignment operators. For example:
tensor(unsigned int x, unsigned int y, unsigned int z) : x(x), y(y), z(z), s(x * y * z) {
p_data = new T[s];
}
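tensor& operator=( tensor const& that ) {
for (unsigned int i = 0; i < s; ++i) {
p_data[i] = that.p_data[i];
}
return *this;
}
// Assignment from any expression type that provides operator()(i), e.g. tensor_sum below.
// The parameter is named Expr so that it does not shadow the class template parameter T.
template <typename Expr>
tensor& operator=( Expr const& that ) {
for (unsigned int i = 0; i < s; ++i) {
p_data[i] = that(i);
}
return *this;
}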
Now we can implement things like addition and scaling with deferred evaluation. For example:
template<typename T1, typename T2>
class tensor_sum {
public:
// add value_type to the base tensor class for this to work
typedef decltype( typename T1::value_type() + typename T2::value_type() ) value_type;
tensor_sum( T1 const& t1, T2 const& t2 ) : t1_(t1), t2_(t2) {}
// also add a function to get the size of the tensor
value_type operator()( int i, int j, int k ) const {
return t1_(i,j,k) + t2_(i,j,k);
}
value_type operator()( int i ) const {
return t1_(i) + t2_(i);
}
private:
T1 const& t1_;
T2 const& t2_;
};
template <typename T1, typename T2>
tensor_sum<T1,T2> operator+( T1 const& t1, T2 const& t2 ) {
return tensor_sum<T1,T2>(t1,t2);
}
This tensor_sum behaves exactly like any normal tensor, except that we don't have to allocate memory to store the result. So we can do something like this:
tensor<double> t0(...);
tensor<double> t1(...);
tensor<double> t2(...);
tensor<double> result(...); //define result to be empty, we will fill it later
result = t0 + t1 + 5.0*t2;
The compiler should optimize this into a single loop, without storing intermediate results or modifying the original tensors. You can do the same thing for scaling and the Kronecker product. Depending on what you want to do with the tensors, this can be a big advantage. But be careful: this isn't always the best option.
When implementing the Kronecker product, be careful about the ordering of your loops: try to go through the tensors in the order in which they are stored, for cache efficiency.
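To make the loop-ordering remark concrete, here is a minimal sketch of an eager Kronecker product for two-dimensional, row-major matrices. Matrix is a hypothetical helper type used only for this illustration; it is not the question's three-dimensional tensor class. The loops are nested so that the result is written in exactly the order in which it is stored.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix, for illustration only.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Kronecker product: block (i, j) of the result equals a(i, j) * b.
Matrix kronecker(const Matrix& a, const Matrix& b)
{
    Matrix c(a.rows * b.rows, a.cols * b.cols);
    // Loop order i, k, j, l walks the rows of c top to bottom and each
    // row left to right, i.e. c.data is written sequentially.
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < b.rows; ++k)
            for (std::size_t j = 0; j < a.cols; ++j)
                for (std::size_t l = 0; l < b.cols; ++l)
                    c(i * b.rows + k, j * b.cols + l) = a(i, j) * b(k, l);
    return c;
}
```

With a different nesting (for example i, j, k, l) the same values are produced, but the writes jump between rows of the result, which tends to be less cache-friendly for large matrices.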

Generic C++ multidimensional iterators

In my current project I am dealing with a multidimensional datastructure.
The underlying file is stored sequentially (i.e. one huge array, no vector of vectors).
The algorithms that use these datastructures need to know the size of the individual dimensions.
I am wondering whether a multidimensional iterator class has been defined somewhere in a generic way, and whether there are any standards or preferred ways to tackle this.
At the moment I am just using a linear iterator with some additional methods that return the size of each dimension and the number of dimensions. The reason I don't like it is that I can't use std::distance in a reasonable way, for example (it only returns the distance over the whole structure, not for each dimension separately).
For the most part I will access the datastructure in a linear fashion (first dimension start to finish -> next dimension+...and so on), but it would be good to know when one dimension "ends". I don't know how to do this with just operator*(), operator+() and operator==() in such an approach.
A vector of vectors approach is disfavored, because I don't want to split up the file. Also the algorithms must operate on structure with different dimensionality and are therefore hard to generalize (or maybe there is a way?).
Boost multi_array has the same problems (multiple "levels" of iterators).
I hope this is not too vague or abstract. Any hint in the right direction would be appreciated.
I looked for a solution myself again and revisited boost::multi_array. As it turns out, it is possible to generate sub-views on the data with it, but also to take a direct iterator at the top level and implicitly "flatten" the data structure. The existing implementations of multi_array do not suit my needs, however, so I will probably implement one myself (one that handles the caching of the files in the background) that is compatible with the other multi_arrays.
I will update it again once the implementation is done.

I have just decided to open a public repository on GitHub, MultiDim Grid, which might help with your needs. This is an ongoing project, so I would be glad if you could try it and tell me what you miss or need.
I started working on this with this topic on codereview.
Put simply:
MultiDim Grid provides a flat one-dimensional array which offers generic, fast conversion between multi-dimensional coordinates and the flattened index.
You get container behaviour, so you have access to iterators.
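The conversion between coordinates and a flat index can be sketched as below. This is not the MultiDim Grid API, only an illustration of the general row-major technique under assumed names: flatten_index and unflatten_index are hypothetical, and all dimension sizes are assumed to be non-zero.

```cpp
#include <array>
#include <cstddef>

// Row-major flattening: the last dimension varies fastest.
template <std::size_t N>
std::size_t flatten_index(const std::array<std::size_t, N>& pos,
                          const std::array<std::size_t, N>& sizes)
{
    std::size_t index = 0;
    for (std::size_t d = 0; d < N; ++d)
        index = index * sizes[d] + pos[d];
    return index;
}

// Inverse of flatten_index: recovers the coordinates from a flat index.
template <std::size_t N>
std::array<std::size_t, N> unflatten_index(std::size_t index,
                                           const std::array<std::size_t, N>& sizes)
{
    std::array<std::size_t, N> pos{};
    for (std::size_t d = N; d-- > 0; ) {
        pos[d] = index % sizes[d];
        index /= sizes[d];
    }
    return pos;
}
```

With these two functions, a linear iterator over the flat array can report when a dimension "ends": the coordinate of the fastest dimension returned by unflatten_index drops back to zero.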

That's not that difficult to implement. Just state precisely what functionality your project requires. Here's a dumb sample.
#include <iostream>
#include <array>
#include <vector>
#include <cassert>
template<typename T, int dim>
class DimVector : public std::vector<T> {
public:
DimVector() {
clear();
}
void clear() {
for (auto& i : _sizes)
i = 0;
std::vector<T>::clear();
}
template<class ... Types>
void resize(Types ... args) {
std::array<int, dim> new_sizes = { args ... };
resize(new_sizes);
}
void resize(std::array<int, dim> new_sizes) {
clear();
for (int i = 0; i < dim; ++i)
if (new_sizes[i] == 0)
return;
_sizes = new_sizes;
int realsize = _sizes[0];
for (int i = 1; i < dim; ++i)
realsize *= _sizes[i];
std::vector<T>::resize(static_cast<size_t>(realsize));
}
decltype(auto) operator()(std::array<int, dim> pos) {
// check indexes and compute original index
size_t index;
for (int i = 0; i < dim; ++i) {
assert(0 <= pos[i] && pos[i] < _sizes[i]);
index = (i == 0) ? pos[i] : (index * _sizes[i] + pos[i]);
}
return std::vector<T>::at(index);
}
template<class ... Types>
decltype(auto) at(Types ... args) {
std::array<int, dim> pos = { args ... };
return (*this)(pos);
}
int size(int d) const {
return _sizes[d];
}
class Iterator {
public:
T& operator*() const;
T* operator->() const;
bool operator!=(const Iterator& other) const {
if (&_vec != &other._vec)
return true;
for (int i = 0; i < dim; ++i)
if (_pos[i] != other._pos[i])
return true;
return false;
}
int get_dim(int d) const {
assert(0 <= d && d < dim);
return _pos[d];
}
void add_dim(int d, int value = 1) {
assert(0 <= d && d < dim);
_pos[d] += value;
assert(0 <= _pos[d] && _pos[d] < _vec._sizes[d]);
}
private:
friend DimVector; // lets getIterator() call the private constructor
DimVector &_vec;
std::array<int, dim> _pos;
Iterator(DimVector& vec, std::array<int, dim> pos) : _vec(vec), _pos(pos) { }
};
Iterator getIterator(std::array<int, dim> pos) {
return Iterator(*this, pos);
}
private:
std::array<int, dim> _sizes;
};
template<typename T, int dim>
inline T& DimVector<T, dim>::Iterator::operator*() const {
return _vec(_pos);
}
template<typename T, int dim>
inline T* DimVector<T, dim>::Iterator::operator->() const {
return &_vec(_pos);
}
using namespace std;
int main() {
DimVector<int, 4> v;
v.resize(1, 2, 3, 4);
v.at(0, 0, 0, 1) = 1;
v.at(0, 1, 0, 0) = 1;
for (int w = 0; w < v.size(0); ++w) {
for (int z = 0; z < v.size(1); ++z) {
for (int y = 0; y < v.size(2); ++y) {
for (int x = 0; x < v.size(3); ++x) {
cout << v.at(w, z, y, x) << ' ';
}
cout << endl;
}
cout << "----------------------------------" << endl;
}
cout << "==================================" << endl;
}
return 0;
}
TODO list:
optimize: use T const& when possible
optimize the iterator: precompute realindex and then just change that realindex
implement const accessors
implement ConstIterator
implement operator>> and operator<< to serialize DimVector to/from file

Understanding on User Defined function

Create a UserArray of bit fields, which can be declared as follows. The size occupied by our array will be less than that of a normal array. Suppose we want an array of 20 flags (TRUE/FALSE). A bool FLAG[20] will take 20 bytes of memory, while UserArray<bool,bool,0,20> will take 4 bytes of memory.
Use class Template to create user array.
Use Bit wise operators to pack the array.
Equality operation should also be implemented.
template<class T,int W,int L,int H>//i have used template<class T>
//but never used such way
class UserArray{
//....
};
typedef UserArray<bool,4,0,20> MyType;
where:
T = type of an array element
W = width of an array element, 0 < W < 8
L = low bound of array index (preferably zero)
H = high bound of array index
A main program:
int main() {
MyType Display; //typedef UserArray<T,W,L,H> MyType; defined above
Display[0] = FALSE; //need to understand that how can we write this?
Display[1] = TRUE; //need to understand that how can we write this?
//assert(Display[0]);//commented once, need to understand above code first
//assert(Display[1]);//commented once..
//cout << "Size of the Display" << sizeof(Display);//commented once..
}
My question is how the parameters T, W, L and H are used in class UserArray, and how we can write to an instance of UserArray as Display[0] and Display[1]. What do these expressions represent?
A short and simple example of a similar type would be easy for me to understand.

W, L and H are non-type template parameters. You can instantiate a template (at compile-time) with constant values, e.g.:
template <int N>
class MyArray
{
public:
float data[N];
void print() { std::cout << "MyArray of size " << N << std::endl; }
};
MyArray<7> foo;
MyArray<8> bar;
foo.print(); // "MyArray of size 7"
bar.print(); // "MyArray of size 8"
In the example above, everywhere that N appears in the template definition, it will be replaced at compile-time by the supplied constant.
Note that MyArray<7> and MyArray<8> are completely different types as far as the compiler is concerned.
I have no idea what the solution to your specific problem is. But your code won't compile, currently, as you have not provided values for the template parameters.

This is not simple, particularly as you can have variable bit widths.
<limits.h> has a constant CHAR_BIT, which is the number of bits in a byte. Usually this is 8, but it could be greater than 8 (not less though).
I suggest the number of elements per byte be CHAR_BIT / W. This might waste a few bits for example, if width is 3 and CHAR_BIT is 8, but this is complicated enough as is.
You'll then need to define operator[] to access the elements, and likely need to do some bit fiddling to do this. For the non-const version of operator[], you'll probably have to return some sort of proxy object when there are more than one elements in a byte, and have its operator= overridden so it writes back to the appropriate spot in the array.
It's a good exercise to figure this one out, though.

Here's some code that implements what you ask for, except that the lower bound is fixed at 0. It also shows a rare use case for overloading the address-of operator (operator&). You could take this further and make this container compatible with STL algorithms if you liked.
#include <iostream>
#include <limits.h>
#include <stddef.h>
template<class T, size_t WIDTH, size_t SIZE>
class UserArray;
template<class T, size_t WIDTH, size_t SIZE>
class UserArrayProxy;
template<class T, size_t WIDTH, size_t SIZE>
class UserArrayAddressProxy
{
public:
typedef UserArray<T, WIDTH, SIZE> array_type;
typedef UserArrayProxy<T, WIDTH, SIZE> proxy_type;
typedef UserArrayAddressProxy<T, WIDTH, SIZE> this_type;
UserArrayAddressProxy(array_type& a_, size_t i_) : a(a_), i(i_) {}
UserArrayAddressProxy(const this_type& x) : a(x.a), i(x.i) {}
proxy_type operator*() { return proxy_type(a, i); }
this_type& operator+=(size_t n) { i += n; return *this; }
this_type& operator-=(size_t n) { i -= n; return *this; }
this_type& operator++() { ++i; return *this; }
this_type& operator--() { --i; return *this; }
this_type operator++(int) { this_type x = *this; ++i; return x; }
this_type operator--(int) { this_type x = *this; --i; return x; }
this_type operator+(size_t n) const { this_type x = *this; x += n; return x; }
this_type operator-(size_t n) const { this_type x = *this; x -= n; return x; }
bool operator==(const this_type& x) { return (&a == &x.a) && (i == x.i); }
bool operator!=(const this_type& x) { return !(*this == x); }
private:
array_type& a;
size_t i;
};
template<class T, size_t WIDTH, size_t SIZE>
class UserArrayProxy
{
public:
static const size_t BITS_IN_T = sizeof(T) * CHAR_BIT;
static const size_t ELEMENTS_PER_T = BITS_IN_T / WIDTH;
static const size_t NUMBER_OF_TS = (SIZE - 1) / ELEMENTS_PER_T + 1;
static const T MASK = (1 << WIDTH) - 1;
typedef UserArray<T, WIDTH, SIZE> array_type;
typedef UserArrayProxy<T, WIDTH, SIZE> this_type;
typedef UserArrayAddressProxy<T, WIDTH, SIZE> address_proxy_type;
UserArrayProxy(array_type& a_, int i_) : a(a_), i(i_) {}
this_type& operator=(T x)
{
a.write(i, x);
return *this;
}
address_proxy_type operator&() { return address_proxy_type(a, i); }
operator T()
{
return a.get(i);
}
private:
array_type& a;
size_t i;
};
template<class T, size_t WIDTH, size_t SIZE>
class UserArray
{
public:
typedef UserArrayAddressProxy<T, WIDTH, SIZE> ptr_t;
static const size_t BITS_IN_T = sizeof(T) * CHAR_BIT;
static const size_t ELEMENTS_PER_T = BITS_IN_T / WIDTH;
static const size_t NUMBER_OF_TS = (SIZE - 1) / ELEMENTS_PER_T + 1;
static const T MASK = (1 << WIDTH) - 1;
T operator[](size_t i) const
{
return get(i);
}
UserArrayProxy<T, WIDTH, SIZE> operator[](size_t i)
{
return UserArrayProxy<T, WIDTH, SIZE>(*this, i);
}
friend class UserArrayProxy<T, WIDTH, SIZE>;
private:
void write(size_t i, T x)
{
T& element = data[i / ELEMENTS_PER_T];
int offset = (i % ELEMENTS_PER_T) * WIDTH;
x &= MASK;
element &= ~(MASK << offset);
element |= x << offset;
}
T get(size_t i) const // const so that the const operator[] above can call it
{
return (data[i / ELEMENTS_PER_T] >> ((i % ELEMENTS_PER_T) * WIDTH)) & MASK;
}
T data[NUMBER_OF_TS];
};
int main()
{
typedef UserArray<int, 6, 20> myarray_t;
myarray_t a;
std::cout << "Sizeof a in bytes: " << sizeof(a) << std::endl;
for (size_t i = 0; i != 20; ++i) { a[i] = i; }
for (size_t i = 0; i != 20; ++i) { std::cout << a[i] << std::endl; }
std::cout << "We can even use address_of operator: " << std::endl;
for (myarray_t::ptr_t e = &a[0]; e != &a[20]; ++e) { std::cout << *e << std::endl; }
}

Creating functor from lambda expression

I would like to know if it is possible to create an actual functor object from a lambda expression. I don't think so, but if not, why?
To illustrate, given the code below, which sorts points using various policies for x and y coordinates:
#include <vector>
#include <functional>
#include <algorithm>
#include <iostream>
struct Point
{
Point(int x, int y) : x(x), y(y) {}
int x, y;
};
template <class XOrder, class YOrder>
struct SortXY // no std::binary_function base: it was deprecated in C++11 and removed in C++17
{
bool operator()(const Point& lhs, const Point& rhs) const
{
if (XOrder()(lhs.x, rhs.x))
return true;
else if (XOrder()(rhs.x, lhs.x))
return false;
else
return YOrder()(lhs.y, rhs.y);
}
};
struct Ascending { bool operator()(int l, int r) const { return l<r; } };
struct Descending { bool operator()(int l, int r) const { return l>r; } };
int main()
{
// fill vector with data
std::vector<Point> pts;
pts.push_back(Point(10, 20));
pts.push_back(Point(20, 5));
pts.push_back(Point( 5, 0));
pts.push_back(Point(10, 30));
// sort array
std::sort(pts.begin(), pts.end(), SortXY<Descending, Ascending>());
// dump content
std::for_each(pts.begin(), pts.end(),
[](const Point& p)
{
std::cout << p.x << "," << p.y << "\n";
});
}
The expression std::sort(pts.begin(), pts.end(), SortXY<Descending, Ascending>()); sorts according to descending x values, and then to ascending y values. It's easily understandable, and I'm not sure I really want to make use of lambda expressions here.
But if I wanted to replace Ascending / Descending by lambda expressions, how would you do it? The following isn't valid:
std::sort(pts.begin(), pts.end(), SortXY<
[](int l, int r) { return l>r; },
[](int l, int r) { return l<r; }
>());

The problem is that SortXY takes only types as template arguments, whereas lambda expressions are objects (each with its own unnamed closure type). You need to rewrite SortXY so that it takes objects, not just types. This is the basic use of function objects: note that std::for_each doesn't take a type, it takes an object.
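As a minimal sketch of such a rewrite (not the original poster's code), SortXY could store the two comparators as members, with a helper function deducing the closure types, since a lambda's type cannot be spelled out by hand. The names make_sort_xy and sort_desc_x_asc_y are hypothetical and only serve the illustration.

```cpp
#include <algorithm>
#include <vector>

struct Point
{
    Point(int x, int y) : x(x), y(y) {}
    int x, y;
};

// Holds the two comparators as objects instead of default-constructing
// them from template type arguments.
template <class XOrder, class YOrder>
struct SortXY
{
    SortXY(XOrder xo, YOrder yo) : xorder(xo), yorder(yo) {}

    bool operator()(const Point& lhs, const Point& rhs) const
    {
        if (xorder(lhs.x, rhs.x))
            return true;
        else if (xorder(rhs.x, lhs.x))
            return false;
        else
            return yorder(lhs.y, rhs.y);
    }

    XOrder xorder;
    YOrder yorder;
};

// Hypothetical helper: template argument deduction picks up the closure types.
template <class XOrder, class YOrder>
SortXY<XOrder, YOrder> make_sort_xy(XOrder xo, YOrder yo)
{
    return SortXY<XOrder, YOrder>(xo, yo);
}

// Hypothetical wrapper: descending x, then ascending y, using two lambdas.
inline void sort_desc_x_asc_y(std::vector<Point>& pts)
{
    std::sort(pts.begin(), pts.end(),
              make_sort_xy([](int l, int r) { return l > r; },
                           [](int l, int r) { return l < r; }));
}
```

Applied to the four points from the question, sort_desc_x_asc_y should leave them in the order 20,5 then 10,20 then 10,30 then 5,0, the same result as SortXY<Descending, Ascending>().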

I have posted a similar question w.r.t. lambda functors within classes.
Check this out, perhaps it helps:
Lambda expression as member functors in a class

I had a similar problem: in some cases I had to provide a "raw" function pointer, and in others a functor. So I came up with a workaround like this:
template<class T>
class Selector
{
public:
Selector(int (*theSelector)(T& l, T& r))
: selector(theSelector) {}
virtual int operator()(T& l, T& r) {
return selector(l, r);
}
int (*getRawSelector() const)(T&, T&) {
return this->selector;
}
private:
int(*selector)(T& l, T& r);
};
Assume you have two very simple functions that take, as described, either a functor or a raw function pointer:
int
findMinWithFunctor(int* array, int size, Selector<int> selector)
{
if (array && size > 0) {
int min = array[0];
for (int i = 0; i < size; i++) {
if (selector(array[i], min) < 0) {
min = array[i];
}
}
return min;
}
return -1;
}
int
findMinWithFunctionPointer(int* array, int size, int(*selector)(int&, int&))
{
if (array && size > 0) {
int min = array[0];
for (int i = 0; i < size; i++) {
if (selector(array[i], min) < 0) {
min = array[i];
}
}
return min;
}
return -1;
}
Then you would call these functions like this:
int numbers[3] = { 4, 2, 99 };
cout << "The min with functor is:" << findMinWithFunctor(numbers, 3, Selector<int>([](int& l, int& r) -> int {return (l > r ? 1 : (r > l ? -1 : 0)); })) << endl;
// or with the plain version
cout << "The min with raw fn-pointer is:" << findMinWithFunctionPointer(numbers, 3, Selector<int>([](int& l, int& r) -> int {return (l > r ? 1 : (r > l ? -1 : 0)); }).getRawSelector()) << endl;
Of course, in this example there is no real benefit in passing the ints by reference; it's just an example :-)
Improvements:
You can also modify the Selector class to be more concise like this:
template<class T>
class Selector
{
public:
typedef int(*selector_fn)(T& l, T& r);
Selector(selector_fn theSelector)
: selector(theSelector) {}
virtual int operator()(T& l, T& r) {
return selector(l, r);
}
selector_fn getRawSelector() const {
return this->selector;
}
private:
selector_fn selector;
};
Here we are taking advantage of a simple typedef to define the function pointer type once and then use only its name, rather than writing out the declaration over and over.
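One caveat worth sketching: Selector stores a plain function pointer, so it only accepts lambdas without captures, because only those convert implicitly to a function pointer. If capturing lambdas are needed, std::function is one possible alternative. The names three_way_compare and find_min_with_callable below are hypothetical, and this is an illustration under those assumptions rather than part of the answer's code.

```cpp
#include <functional>

typedef int (*selector_fn)(int&, int&);

// A lambda with an empty capture list converts implicitly to a matching
// function pointer, which is what makes the Selector constructor call work.
inline selector_fn three_way_compare()
{
    return [](int& l, int& r) -> int { return l > r ? 1 : (r > l ? -1 : 0); };
}

// Hypothetical variant of findMinWithFunctor: std::function accepts function
// pointers, functors and lambdas alike, including lambdas with captures.
inline int find_min_with_callable(int* array, int size,
                                  const std::function<int(int&, int&)>& selector)
{
    if (!array || size <= 0)
        return -1;
    int min = array[0];
    for (int i = 1; i < size; ++i) {
        if (selector(array[i], min) < 0)
            min = array[i];
    }
    return min;
}
```

The trade-off is that std::function adds type erasure and possibly an indirect call per comparison, whereas the raw function pointer version stays compatible with C-style APIs.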
