Get linear index for multidimensional access - c++

I'm trying to implement a multidimensional std::array, which hold a contigous array of memory of size Dim-n-1 * Dim-n-2 * ... * Dim-1. For that, i use private inheritance from std::array :
constexpr std::size_t factorise(std::size_t value)
{
return value;
}
template<typename... Ts>
constexpr std::size_t factorise(std::size_t value, Ts... values)
{
return value * factorise(values...);
}
template<typename T, std::size_t... Dims>
class multi_array : std::array<T, factorise(Dims...)>
{
// using directive and some stuff here ...
template<typename... Indexes>
reference operator() (Indexes... indexes)
{
return base_type::operator[] (linearise(std::make_integer_sequence<Dims...>(), indexes...)); // Not legal, just to explain the need.
}
}
For instance, multi_array<5, 2, 8, 12> arr; arr(2, 1, 4, 3) = 12; will access to the linear index idx = 2*(5*2*8) + 1*(2*8) + 4*(8) + 3.
I suppose that i've to use std::integer_sequence, passing an integer sequence to the linearise function and the list of the indexes, but i don't know how to do it. What i want is something like :
template<template... Dims, std::size_t... Indexes>
auto linearise(std::integer_sequence<int, Dims...> dims, Indexes... indexes)
{
return (index * multiply_but_last(dims)) + ...;
}
With multiply_but_last multiplying all remaining dimension but the last (i see how to implement with a constexpr variadic template function such as for factorise, but i don't understand if it is possible with std::integer_sequence).
I'm a novice in variadic template manipulation and std::integer_sequence and I think that I'm missing something. Is it possible to get the linear index computation without overhead (i.e. like if the operation has been hand-writtent) ?
Thanks you very much for your help.

Following might help:
#include <array>
#include <cassert>
#include <iostream>
template <std::size_t, typename T> using alwaysT_t = T;
template<typename T, std::size_t ... Dims>
class MultiArray
{
public:
const T& operator() (alwaysT_t<Dims, std::size_t>... indexes) const
{
return values[computeIndex(indexes...)];
}
T& operator() (alwaysT_t<Dims, std::size_t>... indexes)
{
return values[computeIndex(indexes...)];
}
private:
size_t computeIndex(alwaysT_t<Dims, std::size_t>... indexes_args) const
{
constexpr std::size_t dimensions[] = {Dims...};
std::size_t indexes[] = {indexes_args...};
size_t index = 0;
size_t mul = 1;
for (size_t i = 0; i != sizeof...(Dims); ++i) {
assert(indexes[i] < dimensions[i]);
index += indexes[i] * mul;
mul *= dimensions[i];
}
assert(index < (Dims * ...));
return index;
}
private:
std::array<T, (Dims * ...)> values;
};
Demo
I replaced your factorize by fold expression (C++17).

I have a very simple function that converts multi-dimensional index to 1D index.
#include <initializer_list>
template<typename ...Args>
inline constexpr size_t IDX(const Args... params) {
constexpr size_t NDIMS = sizeof...(params) / 2 + 1;
std::initializer_list<int> args{params...};
auto ibegin = args.begin();
auto sbegin = ibegin + NDIMS;
size_t res = 0;
for (int dim = 0; dim < NDIMS; ++dim) {
size_t factor = dim > 0 ? sbegin[dim - 1] : 0;
res = res * factor + ibegin[dim];
}
return res;
}
You may need to add "-Wno-c++11-narrowing" flag to your compiler if you see a warning like non-constant-expression cannot be narrowed from type 'int'.
Example usage:
2D array
int array2D[rows*cols];
// Usually, you need to access the element at (i,j) like this:
int elem = array2D[i * cols + j]; // = array2D[i,j]
// Now, you can do it like this:
int elem = array2D[IDX(i,j,cols)]; // = array2D[i,j]
3D array
int array3D[rows*cols*depth];
// Usually, you need to access the element at (i,j,k) like this:
int elem = array3D[(i * cols + j) * depth + k]; // = array3D[i,j,k]
// Now, you can do it like this:
int elem = array3D[IDX(i,j,k,cols,depth)]; // = array3D[i,j,k]
ND array
// shapes = {s1,s2,...,sn}
T arrayND[s1*s2*...*sn]
// indices = {e1,e2,...,en}
T elem = arrayND[IDX(e1,e2,...,en,s2,...,sn)] // = arrayND[e1,e2,...,en]
Note that the shape parameters passed to IDX(...) begins at the second shape, which is s2 in this case.
BTW: This implementation requires C++ 14.

Related

C++ Multidimensional array in existing memory

(This is not a duplicate of this or this that refer to fixed sizes, the issue is not to understand how pointers are stored, but if the compiler can automate the manual function).
Based on this SO question multidimensional arrays are stored sequentially.
// These arrays are the same
int array1[3][2] = {{0, 1}, {2, 3}, {4, 5}};
int array2[6] = { 0, 1, 2, 3, 4, 5 };
However I'm trying to create a 2 dimension array of floats in pre-allocated memory:
float a[5][10]
float b[50]; // should be same memory
Then I'm trying:
vector<char> x(1000);
float** a = (float**)x.data();
a[0][1] = 5;
The above code crashes, obviously because the compiler does not know the size of the array to allocate it in memory like in the compiler-level known array in the first example.
Is there a way to tell the compiler to allocate a multi dimensional array in sequential memory without manually calculating the pointers (say, by manually shifting the index and calling placement new for example)?
Currently, I'm doing it manually, for example:
template <typename T> size_t CreateBuffersInMemory(char* p,int n,int BufferSize)
{
// ib = T** to store the data
int ty = sizeof(T);
int ReqArraysBytes = n * sizeof(void*);
int ReqT = ReqArraysBytes * (ty*BufferSize);
if (!p)
return ReqT;
memset(p, 0, ReqT);
ib = (T**)p;
p += n * sizeof(void*);
for (int i = 0; i < n; i++)
{
ib[i] = (T*)p;
p += ty*BufferSize;
}
return ReqT;
}
Thanks a lot.
To allocate T[rows][cols] array as a one-dimensional array allocate T[rows * cols].
To access element [i][j] of that one-dimensional array you can do p[i * cols + j].
Example:
template<class T>
struct Array2d {
T* elements_;
unsigned columns_;
Array2d(unsigned rows, unsigned columns)
: elements_(new T[rows * columns]{}) // Allocate and value-initialize.
, columns_(columns)
{}
T* operator[](unsigned row) {
return elements_ + row * columns_;
}
// TODO: Implement the special member functions.
};
int main() {
Array2d<int> a(5, 10);
a[3][1] = 0;
}
Your code invokes undefined behavior because x.data() does not point to an array of pointers but to an array of 1000 objects of type char. You should be thankful that it crashes… ;-)
One way to access a contiguous buffer of some type as if it was a multidimensional array is to have another object that represents a multidimensional view into this buffer. This view object can then, e.g., provide member functions to access the data using a multidimensional index. To enable the a[i][j][k] kind of syntax (which you seem to be aiming for), provide an overloaded [] operator which returns a proxy object that itself offers an operator [] and so on until you get down to a single dimension.
For example, for the case that dimensions are fixed at compile time, we can define
template <int Extent, int... Extents>
struct row_major_layout;
template <int Extent>
struct row_major_layout<Extent>
{
template <typename T>
static auto view(T* data) { return data; }
};
template <int Extent, int... Extents>
struct row_major_layout
{
static constexpr int stride = (Extents * ... * 1);
template <typename T>
class span
{
T* data;
public:
span(T* data) : data(data) {}
auto operator[](std::size_t i) const
{
return row_major_layout<Extents...>::view(data + i * stride);
}
};
template <typename T>
static auto view(T* data) { return span<T>(data); }
};
and then simply create and access such a row_major_layout view
void test()
{
constexpr int M = 7, N = 2, K = 5;
std::vector<int> bla(row_major_layout<M, N, K>::size);
auto a3d = row_major_layout<M, N, K>::view(data(bla));
a3d[2][1][3] = 42;
}
live example here
Or in case the array bounds are dynamic:
template <int D>
class row_major_layout;
template <>
class row_major_layout<1>
{
public:
row_major_layout(std::size_t extent) {}
static constexpr std::size_t size(std::size_t extent)
{
return extent;
}
template <typename T>
friend auto view(T* data, const row_major_layout&)
{
return data;
}
};
template <int D>
class row_major_layout : row_major_layout<D - 1>
{
std::size_t stride;
public:
template <typename... Dim>
row_major_layout(std::size_t extent, Dim&&... extents)
: row_major_layout<D - 1>(std::forward<Dim>(extents)...), stride((extents * ... * 1))
{
}
template <typename... Dim>
static constexpr std::size_t size(std::size_t extent, Dim&&... extents)
{
return extent * row_major_layout<D - 1>::size(std::forward<Dim>(extents)...);
}
template <typename T>
class span
{
T* data;
std::size_t stride;
const row_major_layout<D - 1>& layout;
public:
span(T* data, std::size_t stride, const row_major_layout<D - 1>& layout)
: data(data), stride(stride), layout(layout)
{
}
auto operator[](std::size_t i) const
{
return view(data + i * stride, layout);
}
};
template <typename T>
friend auto view(T* data, const row_major_layout& layout)
{
return span<T>(data, layout.stride, layout);
}
};
and
void test(int M, int N, int K)
{
std::vector<int> bla(row_major_layout<3>::size(M, N, K));
auto a3d = view(data(bla), row_major_layout<3>(M, N, K));
a3d[2][1][3] = 42;
}
live example here
Based on this answer assuming you want an array of char you can do something like
std::vector<char> x(1000);
char (&ar)[200][5] = *reinterpret_cast<char (*)[200][5]>(x.data());
Then you can use ar as a normal two-dimensional array, like
char c = ar[2][3];
For anyone trying to achieve the same, I 've created a variadit template function that would create a n-dimension array in existing memory:
template <typename T = char> size_t CreateArrayAtMemory(void*, size_t bs)
{
return bs*sizeof(T);
}
template <typename T = char,typename ... Args>
size_t CreateArrayAtMemory(void* p, size_t bs, Args ... args)
{
size_t R = 0;
size_t PS = sizeof(void*);
char* P = (char*)p;
char* P0 = (char*)p;
size_t BytesForAllPointers = bs*PS;
R = BytesForAllPointers;
char* pos = P0 + BytesForAllPointers;
for (size_t i = 0; i < bs; i++)
{
char** pp = (char**)P;
if (p)
*pp = pos;
size_t RLD = CreateArrayAtMemory<T>(p ? pos : nullptr, args ...);
P += PS;
R += RLD;
pos += RLD;
}
return R;
}
Usage:
Create a 2x3x4 char array:
int j = 0;
size_t n3 = CreateArrayAtMemory<char>(nullptr,2,3,4);
std::vector<char> a3(n3);
char*** f3 = (char***)a3.data();
CreateArrayAtMemory<char>(f3,2,3,4);
for (int i1 = 0; i1 < 2; i1++)
{
for (int i2 = 0; i2 < 3; i2++)
{
for (int i3 = 0; i3 < 4; i3++)
{
f3[i1][i2][i3] = j++;
}
}
}

Create a constexpr C string from concatenation of a some string literal and an int template parameter

I have a class with an int template parameter. Under some circumstances I want it to output an error message. This message should be a concatenated string from some fixed text and the template parameters. For performance reasons I'd like to avoid building up this string at runtime each time the error occurs and theoretically both, the string literal and the template parameter are known at compiletime. So I'm looking for a possibility to declare it as a constexpr.
Code example:
template<int size>
class MyClass
{
void onError()
{
// obviously won't work but expressing the concatenation like
// it would be done with a std::string for clarification
constexpr char errMsg[] = "Error in MyClass of size " + std::to_string (size) + ": Detailed error description\n";
outputErrorMessage (errMsg);
}
}
Using static const would allow to compute it only once (but at runtime):
template<int size>
class MyClass
{
void onError()
{
static const std::string = "Error in MyClass of size "
+ std::to_string(size)
+ ": Detailed error description\n";
outputErrorMessage(errMsg);
}
};
If you really want to have that string at compile time, you might use std::array, something like:
template <std::size_t N>
constexpr std::size_t count_digit() {
if (N == 0) {
return 1;
}
std::size_t res = 0;
for (int i = N; i; i /= 10) {
++res;
}
return res;
}
template <std::size_t N>
constexpr auto to_char_array()
{
constexpr auto digit_count = count_digit<N>();
std::array<char, digit_count> res{};
auto n = N;
for (std::size_t i = 0; i != digit_count; ++i) {
res[digit_count - 1 - i] = static_cast<char>('0' + n % 10);
n /= 10;
}
return res;
}
template <std::size_t N>
constexpr std::array<char, N - 1> to_array(const char (&a)[N])
{
std::array<char, N - 1> res{};
for (std::size_t i = 0; i != N - 1; ++i) {
res[i] = a[i];
}
return res;
}
template <std::size_t ...Ns>
constexpr std::array<char, (Ns + ...)> concat(const std::array<char, Ns>&... as)
{
std::array<char, (Ns + ...)> res{};
std::size_t i = 0;
auto l = [&](const auto& a) { for (auto c : a) {res[i++] = c;} };
(l(as), ...);
return res;
}
And finally:
template<int size>
class MyClass
{
public:
void onError()
{
constexpr auto errMsg = concat(to_array("Error in MyClass of size "),
to_char_array<size>(),
to_array(": Detailed error description\n"),
std::array<char, 1>{{0}});
std::cout << errMsg.data();
}
};
Demo
Here's my solution. Tested on godbolt:
#include <string_view>
#include <array>
#include <algorithm>
void outputErrorMessage(std::string_view s);
template<int N> struct cint
{
constexpr int value() const { return N; }
};
struct concat_op {};
template<std::size_t N>
struct fixed_string
{
constexpr static std::size_t length() { return N; }
constexpr static std::size_t capacity() { return N + 1; }
template<std::size_t L, std::size_t R>
constexpr fixed_string(concat_op, fixed_string<L> l, fixed_string<R> r)
: fixed_string()
{
static_assert(L + R == N);
overwrite(0, l.data(), L);
overwrite(L, r.data(), R);
}
constexpr fixed_string()
: buffer_ { 0 }
{
}
constexpr fixed_string(const char (&source)[N + 1])
: fixed_string()
{
do_copy(source, buffer_.data());
}
static constexpr void do_copy(const char (&source)[N + 1], char* dest)
{
for(std::size_t i = 0 ; i < capacity() ; ++i)
dest[i] = source[i];
}
constexpr const char* data() const
{
return buffer_.data();
}
constexpr const char* data()
{
return buffer_.data();
}
constexpr void overwrite(std::size_t where, const char* source, std::size_t len)
{
auto dest = buffer_.data() + where;
while(len--)
*dest++ = *source++;
}
operator std::string_view() const
{
return { buffer_.data(), N };
}
std::array<char, capacity()> buffer_;
};
template<std::size_t N> fixed_string(const char (&)[N]) -> fixed_string<N - 1>;
template<std::size_t L, std::size_t R>
constexpr auto operator+(fixed_string<L> l, fixed_string<R> r) -> fixed_string<L + R>
{
auto result = fixed_string<L + R>(concat_op(), l , r);
return result;
};
template<int N>
constexpr auto to_string()
{
auto log10 = []
{
if constexpr (N < 10)
return 1;
else if constexpr(N < 100)
return 2;
else if constexpr(N < 1000)
return 3;
else
return 4;
// etc
};
constexpr auto len = log10();
auto result = fixed_string<len>();
auto pow10 = [](int n, int x)
{
if (x == 0)
return 1;
else while(x--)
n *= 10;
return n;
};
auto to_char = [](int n)
{
return '0' + char(n);
};
int n = N;
for (int i = 0 ; i < len ; ++i)
{
auto pow = pow10(10, i);
auto digit = to_char(n % 10);
if (n == 0 && i != 0) digit = ' ';
result.buffer_[len - i - 1] = digit;
n /= 10;
}
return result;
}
template<int size>
struct MyClass
{
void onError()
{
// obviously won't work but expressing the concatenation like
// it would be done with a std::string for clarification
static const auto errMsg = fixed_string("Error in MyClass of size ") + to_string<size>() + fixed_string(": Detailed error description\n");
outputErrorMessage (errMsg);
}
};
int main()
{
auto x = MyClass<10>();
x.onError();
}
Results in the following code:
main:
sub rsp, 8
mov edi, 56
mov esi, OFFSET FLAT:MyClass<10>::onError()::errMsg
call outputErrorMessage(std::basic_string_view<char, std::char_traits<char> >)
xor eax, eax
add rsp, 8
ret
https://godbolt.org/z/LTgn4F
Update:
The call to pow10 is not necessary. It's dead code which can be removed.
Unfortunately your options are limited. C++ doesn't permit string literals to be used for template arguments and, even if it did, literal concatenation happens in the preprocessor, before templates come into it. You would need some ghastly character-by-character array definition and some manual int-to-char conversion. Horrible enough that I can't bring myself to make an attempt, and horrible enough that to be honest I'd recommend not bothering. I'd generate it at runtime, albeit only once (you can make errMsg a function-static std::string at least).

How to make a "variadic" vector like class

I am trying to make class that acts as multidimensional vector. It doesn't have to do anything fancy. I basically want to have a "container" class foo where I can access elements by foo[x][y][z]. Now I would also need similar classes for foo[x][y] and foo[x]. Which lead me to ponder about the following (more general) question, is there a way to make something like this where you can just initialize as foo A(a,b,c,...) for any n number of arguments and get a n-dimensional vector with elements accessible by [][][]...? Below the class I have for (in example) the four-dimensional case.
First the header
#ifndef FCONTAINER_H
#define FCONTAINER_H
#include <iostream>
using namespace std;
class Fcontainer
{
private:
unsigned dim1, dim2, dim3, dim4 ;
double* data;
public:
Fcontainer(unsigned const dims1, unsigned const dims2, unsigned const dims3, unsigned const dims4);
~Fcontainer();
Fcontainer(const Fcontainer& m);
Fcontainer& operator= (const Fcontainer& m);
double& operator() (unsigned const dim1, unsigned const dim2, unsigned const dim3, unsigned const dim4);
double const& operator() (unsigned const dim1, unsigned const dim2, unsigned const dim3, unsigned const dim4) const;
};
#endif // FCONTAINER_H
Now the cpp:
#include "fcontainer.hpp"
Fcontainer::Fcontainer(unsigned const dims1, unsigned const dims2, unsigned const dims3, unsigned const dims4)
{
dim1 = dims1; dim2 = dims2; dim3 = dims3; dim4 = dims4;
if (dims1 == 0 || dims2 == 0 || dims3 == 0 || dims4 == 0)
throw std::invalid_argument("Container constructor has 0 size");
data = new double[dims1 * dims2 * dims3 * dims4];
}
Fcontainer::~Fcontainer()
{
delete[] data;
}
double& Fcontainer::operator() (unsigned const dims1, unsigned const dims2, unsigned const dims3, unsigned const dims4)
{
if (dims1 >= dim1 || dims2 >= dim2 || dims3 >= dim3 || dims4 >= dim4)
throw std::invalid_argument("Container subscript out of bounds");
return data[dims1*dim2*dims3*dim4 + dims2*dim3*dim4 + dim3*dim4 + dims4];
}
double const& Fcontainer::operator() (unsigned const dims1, unsigned const dims2, unsigned const dims3, unsigned const dims4) const
{
if(dims1 >= dim1 || dims2 >= dim2 || dims3 >= dim3 || dims4 >= dim4)
throw std::invalid_argument("Container subscript out of bounds");
return data[dims1*dim2*dims3*dim4 + dims2*dim3*dim4 + dim3*dim4 + dims4];
}
So I want to expand this to an arbitrary amount of dimensions. I suppose it will take something along the lines of a variadic template or an std::initializer_list but I am not clear on how to approach this( for this problem).
Messing around in Visual Studio for a little while, I came up with this nonsense:
template<typename T>
class Matrix {
std::vector<size_t> dimensions;
std::unique_ptr<T[]> _data;
template<typename ... Dimensions>
size_t apply_dimensions(size_t dim, Dimensions&& ... dims) {
dimensions.emplace_back(dim);
return dim * apply_dimensions(std::forward<Dimensions>(dims)...);
}
size_t apply_dimensions(size_t dim) {
dimensions.emplace_back(dim);
return dim;
}
public:
Matrix(std::vector<size_t> dims) : dimensions(std::move(dims)) {
size_t size = flat_size();
_data = std::make_unique<T[]>(size);
}
template<typename ... Dimensions>
Matrix(size_t dim, Dimensions&&... dims) {
size_t size = apply_dimensions(dim, std::forward<Dimensions>(dims)...);
_data = std::make_unique<T[]>(size);
}
T & operator()(std::vector<size_t> const& indexes) {
if(indexes.size() != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
return _data[get_flat_index(indexes)];
}
T const& operator()(std::vector<size_t> const& indexes) const {
if (indexes.size() != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
return _data[get_flat_index(indexes)];
}
template<typename ... Indexes>
T & operator()(size_t idx, Indexes&& ... indexes) {
if (sizeof...(indexes)+1 != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
size_t flat_index = get_flat_index(0, idx, std::forward<Indexes>(indexes)...);
return at(flat_index);
}
template<typename ... Indexes>
T const& operator()(size_t idx, Indexes&& ... indexes) const {
if (sizeof...(indexes)+1 != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
size_t flat_index = get_flat_index(0, idx, std::forward<Indexes>(indexes)...);
return at(flat_index);
}
T & at(size_t flat_index) {
return _data[flat_index];
}
T const& at(size_t flat_index) const {
return _data[flat_index];
}
size_t dimension_size(size_t dim) const {
return dimensions[dim];
}
size_t num_of_dimensions() const {
return dimensions.size();
}
size_t flat_size() const {
size_t size = 1;
for (size_t dim : dimensions)
size *= dim;
return size;
}
private:
size_t get_flat_index(std::vector<size_t> const& indexes) const {
size_t dim = 0;
size_t flat_index = 0;
for (size_t index : indexes) {
flat_index += get_offset(index, dim++);
}
return flat_index;
}
template<typename ... Indexes>
size_t get_flat_index(size_t dim, size_t index, Indexes&& ... indexes) const {
return get_offset(index, dim) + get_flat_index(dim + 1, std::forward<Indexes>(indexes)...);
}
size_t get_flat_index(size_t dim, size_t index) const {
return get_offset(index, dim);
}
size_t get_offset(size_t index, size_t dim) const {
if (index >= dimensions[dim])
throw std::runtime_error("Index out of Bounds");
for (size_t i = dim + 1; i < dimensions.size(); i++) {
index *= dimensions[i];
}
return index;
}
};
Let's talk about what this code accomplishes.
//private:
template<typename ... Dimensions>
size_t apply_dimensions(size_t dim, Dimensions&& ... dims) {
dimensions.emplace_back(dim);
return dim * apply_dimensions(std::forward<Dimensions>(dims)...);
}
size_t apply_dimensions(size_t dim) {
dimensions.emplace_back(dim);
return dim;
}
public:
Matrix(std::vector<size_t> dims) : dimensions(std::move(dims)) {
size_t size = flat_size();
_data = std::make_unique<T[]>(size);
}
template<typename ... Dimensions>
Matrix(size_t dim, Dimensions&&... dims) {
size_t size = apply_dimensions(dim, std::forward<Dimensions>(dims)...);
_data = std::make_unique<T[]>(size);
}
What this code enables us to do is write an initializer for this matrix that takes an arbitrary number of dimensions.
int main() {
Matrix<int> mat{2, 2}; //Yields a 2x2 2D Rectangular Matrix
mat = Matrix<int>{4, 6, 5};//mat is now a 4x6x5 3D Rectangular Matrix
mat = Matrix<int>{9};//mat is now a 9-length 1D array.
mat = Matrix<int>{2, 3, 4, 5, 6, 7, 8, 9};//Why would you do this? (yet it compiles...)
}
And if the number and sizes of the dimensions is only known at runtime, this code will work around that:
int main() {
std::cout << "Input the sizes of each of the dimensions.\n";
std::string line;
std::getline(std::cin, line);
std::stringstream ss(line);
size_t dim;
std::vector<size_t> dimensions;
while(ss >> dim)
dimensions.emplace_back(dim);
Matrix<int> mat{dimensions};//Voila.
}
Then, we want to be able to access arbitrary indexes of this matrix. This code offers two ways to do so: either statically using templates, or variably at runtime.
//public:
T & operator()(std::vector<size_t> const& indexes) {
if(indexes.size() != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
return _data[get_flat_index(indexes)];
}
T const& operator()(std::vector<size_t> const& indexes) const {
if (indexes.size() != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
return _data[get_flat_index(indexes)];
}
template<typename ... Indexes>
T & operator()(size_t idx, Indexes&& ... indexes) {
if (sizeof...(indexes)+1 != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
size_t flat_index = get_flat_index(0, idx, std::forward<Indexes>(indexes)...);
return at(flat_index);
}
template<typename ... Indexes>
T const& operator()(size_t idx, Indexes&& ... indexes) const {
if (sizeof...(indexes)+1 != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
size_t flat_index = get_flat_index(0, idx, std::forward<Indexes>(indexes)...);
return at(flat_index);
}
And then, in practice:
Matrix<int> mat{6, 5};
mat(5, 2) = 17;
//mat(5, 1, 7) = 24; //throws exception at runtime because of wrong number of dimensions.
mat = Matrix<int>{9, 2, 8};
mat(5, 1, 7) = 24;
//mat(5, 2) = 17; //throws exception at runtime because of wrong number of dimensions.
And this works fine with runtime-dynamic indexing:
std::vector<size_t> indexes;
/*...*/
mat(indexes) = 54; //Will throw if index count is wrong, will succeed otherwise
There are a number of other functions that this kind of object might want, like a resize method, but choosing how to implement that is a high-level design decision. I've also left out tons of other potentially valuable implementation details (like an optimizing move-constructor, a comparison operator, a copy constructor) but this should give you a pretty good idea of how to start.
EDIT:
If you want to avoid use of templates entirely, you can cut like half of the code provided here, and just use the methods/constructor that uses std::vector<size_t> to provide dimensions/index data. If you don't need the ability to dynamically adapt at runtime to the number of dimensions, you can remove the std::vector<size_t> overloads, and possibly even make the number of dimensions a template argument for the class itself (which would enable you to use size_t[] or std::array[size_t, N] to store dimensional data).
Well, assuming you care about efficiency at all, you probably want to store all of the elements in a contiguous manner regardless. So you probably want to do something like:
template <std::size_t N, class T>
class MultiArray {
MultiArray(const std::array<std::size_t, N> sizes)
: m_sizes(sizes)
, m_data.resize(product(m_sizes)) {}
std::array<std::size_t, N> m_sizes;
std::vector<T> m_data;
};
The indexing part is where it gets kind of fun. Basically, if you want a[1][2][3] etc to work, you have to have a return some kind of proxy object, that has its own operator[]. Each one would have to be aware of its own rank. Each time you do [] it returns a proxy letting you specify the next index.
template <std::size_t N, class T>
class MultiArray {
// as before
template <std::size_t rank>
class Indexor {
Indexor(MultiArray& parent, const std::array<std::size_t, N>& indices = {})
: m_parent(parent), m_indices(indices) {}
auto operator[](std::size_t index) {
m_indices[rank] = index;
return Indexor<rank+1>(m_indices, m_parent);
}
std::array<std::size_t, N> m_indices;
MultiArray& m_parent;
};
auto operator[](std::size_t index) {
return Indexor<0>(*this)[index];
}
}
Finally, you have a specialization for when you're done with the last index:
template <>
class Indexor<N-1> { // with obvious constructor
auto operator[](std::size_t index) {
m_indices[N-1] = index;
return m_parent.m_data[indexed_product(m_indices, m_parent.m_sizes)];
}
std::array<std::size_t, N> m_indices;
MultiArray& m_parent;
};
Obviously this is a sketch but at this point its just filling out details and getting it to compile. There are other approaches like instead having the indexor object have two iterators and narrowing but that seemed a bit more complex. You also don't need to template the Indexor class and could use a runtime integer instead but that would make it very easy to misuse, having one too many or too few [] would be a runtime error, not compile time.
Edit: you would also be able to initialize this in the way you describe in 17, but not in 14. But in 14 you can just use a function:
template <class ... Ts>
auto make_double_array(Ts ts) {
return MultiArray<sizeof ... Ts, double>(ts...);
}
Edit2: I use product and indexed_product in the implementation. The first is obvious, the second is less so, but hopefully they should be clear. The latter is a function that given an array of dimensions, and an array of indices, would return the position of that element in the array.

CUDA Thrust - Counting matching subarrays

I'm trying to figure out if it's possible to efficiently calculate the conditional entropy of a set of numbers using CUDA. You can calculate the conditional entropy by dividing an array into windows, then counting the number of matching subarrays/substrings for different lengths. For each subarray length, you calculate the entropy by adding together the matching subarray counts times the log of those counts. Then, whatever you get as the minimum entropy is the conditional entropy.
To give a more clear example of what I mean, here is full calculation:
The initial array is [1,2,3,5,1,2,5]. Assuming the window size is 3, this must be divided into five windows: [1,2,3], [2,3,5], [3,5,1], [5,1,2], and [1,2,5].
Next, looking at each window, we want to find the matching subarrays for each length.
The subarrays of length 1 are [1],[2],[3],[5],[1]. There are two 1s, and one of each other number. So the entropy is log(2)2 + 4(log(1)*1) = 0.6.
The subarrays of length 2 are [1,2], [2,3], [3,5], [5,1], and [1,2]. There are two [1,2]s, and four unique subarrays. The entropy is the same as length 1, log(2)2 + 4(log(1)*1) = 0.6.
The subarrays of length 3 are the full windows: [1,2,3], [2,3,5], [3,5,1], [5,1,2], and [1,2,5]. All five windows are unique, so the entropy is 5*(log(1)*1) = 0.
The minimum entropy is 0, meaning it is the conditional entropy for this array.
This can also be presented as a tree, where the counts at each node represent how many matches exist. The entropy for each subarray length is equivalent to the entropy for each level of the tree.
If possible, I'd like to perform this calculation on many arrays at once, and also perform the calculation itself in parallel. Does anyone have suggestions on how to accomplish this? Could thrust be useful? Please let me know if there is any additional information I should provide.
I tried solving your problem using thrust. It works, but it results in a lot of thrust calls.
Since your input size is rather small, you should process multiple arrays in parallel.
However, doing this results in a lot of book-keeping effort, you will see this in the following code.
Your input range is limited to [1,5], which is equivalent to [0,4]. The general idea is that (theoretically) any tuple out of this range (e.g. {1,2,3} can be represented as a number in base 4 (e.g. 1+2*4+3*16 = 57).
In practice we are limited by the size of the integer type. For a 32bit unsigned integer this will lead to a maximum tuple size of 16. This is also the maximum window size the following code can handle (changing to a 64bit unsigned integer will lead to a maximum tuple size of 32).
Let's assume the input data is structured like this:
We have 2 arrays we want to process in parallel, each array is of size 5 and window size is 3.
{{0,0,3,4,4},{0,2,1,1,3}}
We can now generate all windows:
{{0,0,3},{0,3,4},{3,4,4}},{{0,2,1},{2,1,1},{1,1,3}}
Using a per tuple prefix sum and applying the aforementioned representation of each tuple as a single base-4 number, we get:
{{0,0,48},{0,12,76},{3,19,83}},{{0,8,24},{2,6,22},{1,5,53}}
Now we reorder the values so we have the numbers which represent a subarray of a specific length next to each other:
{{0,0,3},{0,12,19},{48,76,83}},{0,2,1},{8,6,5},{24,22,53}}
We then sort within each group:
{{0,0,3},{0,12,19},{48,76,83}},{0,1,2},{5,6,8},{22,24,53}}
Now we can count how often a number occurs in each group:
2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
Applying the log-formula results in
0.60206,0,0,0,0,0
Now we fetch the minimum value per array:
0,0
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <iostream>
#include <thrust/tuple.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/gather.h>
#include <thrust/sort.h>
#include <math.h>
#include <chrono>
#ifdef PRINT_ENABLED
#define PRINTER(name) print(#name, (name))
#else
#define PRINTER(name)
#endif
template <template <typename...> class V, typename T, typename ...Args>
void print(const char* name, const V<T,Args...> & v)
{
std::cout << name << ":\t";
thrust::copy(v.begin(), v.end(), std::ostream_iterator<T>(std::cout, "\t"));
std::cout << std::endl;
}
template <typename Integer, Integer Min, Integer Max>
struct random_filler
{
__device__
Integer operator()(std::size_t index) const
{
thrust::default_random_engine rng;
thrust::uniform_int_distribution<Integer> dist(Min, Max);
rng.discard(index);
return dist(rng);
}
};
template <std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
typename T,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
__device__ __inline__
thrust::tuple<T,T,T,T> calc_indices(const T& i0)
{
const T i1 = i0 / PerArrayCount;
const T i2 = i0 % PerArrayCount;
const T i3 = i2 / WindowSize;
const T i4 = i2 % WindowSize;
return thrust::make_tuple(i1,i2,i3,i4);
}
template <typename Iterator,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount,
std::size_t TotalCount = PerArrayCount * ArrayCount
>
class sliding_window
{
public:
typedef typename thrust::iterator_difference<Iterator>::type difference_type;
struct window_functor : public thrust::unary_function<difference_type,difference_type>
{
__host__ __device__
difference_type operator()(const difference_type& i0) const
{
auto t = calc_indices<ArraySize, ArrayCount,WindowSize>(i0);
return thrust::get<0>(t) * ArraySize + thrust::get<2>(t) + thrust::get<3>(t);
}
};
typedef typename thrust::counting_iterator<difference_type> CountingIterator;
typedef typename thrust::transform_iterator<window_functor, CountingIterator> TransformIterator;
typedef typename thrust::permutation_iterator<Iterator,TransformIterator> PermutationIterator;
typedef PermutationIterator iterator;
sliding_window(Iterator first) : first(first){}
iterator begin(void) const
{
return PermutationIterator(first, TransformIterator(CountingIterator(0), window_functor()));
}
iterator end(void) const
{
return begin() + TotalCount;
}
protected:
Iterator first;
};
template <std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
typename Iterator>
sliding_window<Iterator, ArraySize, ArrayCount, WindowSize>
make_sliding_window(Iterator first)
{
return sliding_window<Iterator, ArraySize, ArrayCount, WindowSize>(first);
}
template <typename KeyType,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize>
struct key_generator : thrust::unary_function<KeyType, thrust::tuple<KeyType,KeyType> >
{
__device__
thrust::tuple<KeyType,KeyType> operator()(std::size_t i0) const
{
auto t = calc_indices<ArraySize, ArrayCount,WindowSize>(i0);
return thrust::make_tuple(thrust::get<0>(t),thrust::get<2>(t));
}
};
template <typename Integer,
std::size_t Base,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize>
struct base_n : thrust::unary_function<thrust::tuple<Integer, Integer>, Integer>
{
__host__ __device__
Integer operator()(const thrust::tuple<Integer, Integer> t) const
{
const auto i = calc_indices<ArraySize, ArrayCount, WindowSize>(thrust::get<0>(t));
// ipow could be optimized by precomputing a lookup table at compile time
const auto result = thrust::get<1>(t)*ipow(Base, thrust::get<3>(i));
return result;
}
// taken from http://stackoverflow.com/a/101613/678093
__host__ __device__ __inline__
Integer ipow(Integer base, Integer exp) const
{
Integer result = 1;
while (exp)
{
if (exp & 1)
result *= base;
exp >>= 1;
base *= base;
}
return result;
}
};
template <std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
typename T,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
__device__ __inline__
thrust::tuple<T,T,T,T> calc_sort_indices(const T& i0)
{
const T i1 = i0 % PerArrayCount;
const T i2 = i0 / PerArrayCount;
const T i3 = i1 % WindowCount;
const T i4 = i1 / WindowCount;
return thrust::make_tuple(i1,i2,i3,i4);
}
template <typename Integer,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
struct pre_sort : thrust::unary_function<Integer, Integer>
{
__device__
Integer operator()(Integer i0) const
{
auto t = calc_sort_indices<ArraySize, ArrayCount,WindowSize>(i0);
const Integer i_result = ( thrust::get<2>(t) * WindowSize + thrust::get<3>(t) ) + thrust::get<1>(t) * PerArrayCount;
return i_result;
}
};
template <typename Integer,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
struct generate_sort_keys : thrust::unary_function<Integer, Integer>
{
__device__
thrust::tuple<Integer,Integer> operator()(Integer i0) const
{
auto t = calc_sort_indices<ArraySize, ArrayCount,WindowSize>(i0);
return thrust::make_tuple( thrust::get<1>(t), thrust::get<3>(t));
}
};
template<typename... Iterators>
__host__ __device__
thrust::zip_iterator<thrust::tuple<Iterators...>> zip(Iterators... its)
{
return thrust::make_zip_iterator(thrust::make_tuple(its...));
}
struct calculate_log : thrust::unary_function<std::size_t, float>
{
__host__ __device__
float operator()(std::size_t i) const
{
return i*log10f(i);
}
};
int main()
{
typedef int Integer;
typedef float Real;
const std::size_t array_count = ARRAY_COUNT;
const std::size_t array_size = ARRAY_SIZE;
const std::size_t window_size = WINDOW_SIZE;
const std::size_t window_count = array_size - (window_size-1);
const std::size_t input_size = array_count * array_size;
const std::size_t base = 4;
thrust::device_vector<Integer> input_arrays(input_size);
thrust::counting_iterator<Integer> counting_it(0);
thrust::transform(counting_it,
counting_it + input_size,
input_arrays.begin(),
random_filler<Integer,0,base>());
PRINTER(input_arrays);
const int runs = 100;
auto start = std::chrono::high_resolution_clock::now();
for (int k = 0 ; k < runs; ++k)
{
auto sw = make_sliding_window<array_size, array_count, window_size>(input_arrays.begin());
const std::size_t total_count = window_size * window_count * array_count;
thrust::device_vector<Integer> result(total_count);
thrust::copy(sw.begin(), sw.end(), result.begin());
PRINTER(result);
auto ti_begin = thrust::make_transform_iterator(counting_it, key_generator<Integer, array_size, array_count, window_size>());
auto base_4_ti = thrust::make_transform_iterator(zip(counting_it, sw.begin()), base_n<Integer, base, array_size, array_count, window_size>());
thrust::inclusive_scan_by_key(ti_begin, ti_begin+total_count, base_4_ti, result.begin());
PRINTER(result);
thrust::device_vector<Integer> result_2(total_count);
auto ti_pre_sort = thrust::make_transform_iterator(counting_it, pre_sort<Integer, array_size, array_count, window_size>());
thrust::gather(ti_pre_sort,
ti_pre_sort+total_count,
result.begin(),
result_2.begin());
PRINTER(result_2);
thrust::device_vector<Integer> sort_keys_1(total_count);
thrust::device_vector<Integer> sort_keys_2(total_count);
auto zip_begin = zip(sort_keys_1.begin(),sort_keys_2.begin());
thrust::transform(counting_it,
counting_it+total_count,
zip_begin,
generate_sort_keys<Integer, array_size, array_count, window_size>());
thrust::stable_sort_by_key(result_2.begin(), result_2.end(), zip_begin);
thrust::stable_sort_by_key(zip_begin, zip_begin+total_count, result_2.begin());
PRINTER(result_2);
thrust::device_vector<Integer> key_counts(total_count);
thrust::device_vector<Integer> sort_keys_1_reduced(total_count);
thrust::device_vector<Integer> sort_keys_2_reduced(total_count);
// count how often each sub array occurs
auto zip_count_begin = zip(sort_keys_1.begin(), sort_keys_2.begin(), result_2.begin());
auto new_end = thrust::reduce_by_key(zip_count_begin,
zip_count_begin + total_count,
thrust::constant_iterator<Integer>(1),
zip(sort_keys_1_reduced.begin(), sort_keys_2_reduced.begin(), thrust::make_discard_iterator()),
key_counts.begin()
);
std::size_t new_size = new_end.second - key_counts.begin();
key_counts.resize(new_size);
sort_keys_1_reduced.resize(new_size);
sort_keys_2_reduced.resize(new_size);
PRINTER(key_counts);
PRINTER(sort_keys_1_reduced);
PRINTER(sort_keys_2_reduced);
auto log_ti = thrust::make_transform_iterator (key_counts.begin(), calculate_log());
thrust::device_vector<Real> log_result(new_size);
auto zip_keys_reduced_begin = zip(sort_keys_1_reduced.begin(), sort_keys_2_reduced.begin());
auto log_end = thrust::reduce_by_key(zip_keys_reduced_begin,
zip_keys_reduced_begin + new_size,
log_ti,
zip(sort_keys_1.begin(),thrust::make_discard_iterator()),
log_result.begin()
);
std::size_t final_size = log_end.second - log_result.begin();
log_result.resize(final_size);
sort_keys_1.resize(final_size);
PRINTER(log_result);
thrust::device_vector<Real> final_result(final_size);
auto final_end = thrust::reduce_by_key(sort_keys_1.begin(),
sort_keys_1.begin() + final_size,
log_result.begin(),
thrust::make_discard_iterator(),
final_result.begin(),
thrust::equal_to<Integer>(),
thrust::minimum<Real>()
);
final_result.resize(final_end.second-final_result.begin());
PRINTER(final_result);
}
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start);
std::cout << "took " << duration.count()/runs << "milliseconds" << std::endl;
return 0;
}
compile using
nvcc -std=c++11 conditional_entropy.cu -o benchmark -DARRAY_SIZE=1000 -DARRAY_COUNT=1000 -DWINDOW_SIZE=10 && ./benchmark
This configuration takes 133 milliseconds on my GPU (GTX 680), so around 0.1 milliseconds per array.
The implementation can definitely be optimized, e.g. using a precomputed lookup table for the base-4 conversion and maybe some of the thrust calls can be avoided.

Hash function for sequences of tuples - Duplicate elimination

I want to eliminate duplicates of sequences of tuples. These sequences look like this:
1. (1,1)(2,5,9)(2,3,10)(2,1)
2. (1,2)(3,2,1)(2,5,9)(2,1)
3. (1,1)(2,5,9)(2,3,10)(2,1)
4. (2,1)(2,3,10)(2,5,9)(1,1)
5. (2,1)(2,3,10)(1,1)
6. (1,1)(2,5,9)(2,3,10)(2,2)
The number of entries per tuple varies as does the number of tuple per sequence. Since I have lots of sequences which I ultimately want to deal with in parallel using CUDA, I thought that calculating a hash per sequence would be an efficient way to identify duplicate sequences.
How would such a hash function be implemented?
And: How big is the collision probability of two different sequences producing the same final hash value?
I have two requirements which I am not sure if they can be fulfilled:
a) Can such a hash be calculated on the fly?
I want to avoid the storage of the full sequences, therefore I'd like to do something like this:
h = 0; // init hash
...
h = h + hash(1,1);
...
h = h + hash(2,5,9);
...
h = h + hash(2,3,10)
...
h = h + hash(2,1)
where + is any operator which combines hashes of tuples.
b) Can such a hash be independent of the "direction" of the sequence?
In the above example sequences 1. and 4. consist of the same tuples but the order is reversed, but I like to identify them as duplicates.
For hashing you can use std::hash<std::size_t> or whatever (unsigned) integer type you use. The collision probability is somewhere around 1.0/std::numeric_limits<std::size_t>::max(), which is very small. To make the usability a bit better you can write your own tuple hasher:
namespace hash_tuple
{
std::size_t hash_combine(std::size_t l, std::size_t r) noexcept
{
constexpr static const double phi = 1.6180339887498949025257388711906969547271728515625;
static const double val = std::pow(2ULL, 4ULL * sizeof(std::size_t));
static const std::size_t magic_number = val / phi;
l ^= r + magic_number + (l << 6) + (l >> 2);
return l;
}
template <typename TT>
struct hash
{
std::size_t operator()(TT const& tt) const noexcept
{
return std::hash<TT>()(tt);
}
};
namespace
{
template <class TupleT, std::size_t Index = std::tuple_size<TupleT>::value - 1ULL>
struct HashValueImpl
{
static std::size_t apply(std::size_t seed, TupleT const& tuple) noexcept
{
seed = HashValueImpl<TupleT, Index - 1ULL>::apply(seed, tuple);
seed = hash_combine(seed, std::get<Index>(tuple));
return seed;
}
};
template <class TupleT>
struct HashValueImpl<TupleT, 0ULL>
{
static std::size_t apply(size_t seed, TupleT const& tuple) noexcept
{
seed = hash_combine(seed, std::get<0>(tuple));
return seed;
}
};
}
template <typename ... TT>
struct hash<std::tuple<TT...>>
{
std::size_t operator()(std::tuple<TT...> const& tt) const noexcept
{
std::size_t seed = 0;
seed = HashValueImpl<std::tuple<TT...> >::apply(seed, tt);
return seed;
}
};
}
Thus you can write code like
using hash_tuple::hash;
auto mytuple = std::make_tuple(3, 2, 1, 0);
auto hasher = hash<decltype(mytuple)>();
std::size_t mytuple_hash = hasher(mytuple);
To fulfill your constraint b we need for each tuple 2 hashes, the normal hash and the hash of the reversed tuple.
So at first we need to deal with how to reverse one:
template<typename T, typename TT = typename std::remove_reference<T>::type, size_t... I>
auto reverse_impl(T&& t, std::index_sequence<I...>)
-> std::tuple<typename std::tuple_element<sizeof...(I) - 1 - I, TT>::type...>
{
return std::make_tuple(std::get<sizeof...(I) - 1 - I>(std::forward<T>(t))...);
}
template<typename T, typename TT = typename std::remove_reference<T>::type>
auto reverse(T&& t)
-> decltype(reverse_impl(std::forward<T>(t),
std::make_index_sequence<std::tuple_size<TT>::value>()))
{
return reverse_impl(std::forward<T>(t),
std::make_index_sequence<std::tuple_size<TT>::value>());
}
Then we can calculate our hashes
auto t0 = std::make_tuple(1, 2, 3, 4, 5, 6);
auto t1 = std::make_tuple(6, 5, 4, 3, 2, 1);
using hash_tuple::hash;
auto hasher = hash<decltype(t0)>();
std::size_t t0hash = hasher(t0);
std::size_t t1hash = hasher(t1);
std::size_t t0hsah = hasher(reverse(t0));
std::size_t t1hsah = hasher(reverse(t1));
And if hash_combine(t0hash, t1hash) == hash_combine(t1hsah, t0hsah) you found what you want. You can apply this "inner-tuple-hashing-mechanic" to the hashes of many tuples pretty easily. Play around with this online!