CUDA Thrust - Counting matching subarrays - c++

I'm trying to figure out if it's possible to efficiently calculate the conditional entropy of a set of numbers using CUDA. You can calculate the conditional entropy by dividing an array into windows, then counting the number of matching subarrays/substrings for different lengths. For each subarray length, you calculate the entropy by adding together the matching subarray counts times the log of those counts. The minimum entropy over all subarray lengths is then the conditional entropy.
To give a clearer example of what I mean, here is the full calculation:
The initial array is [1,2,3,5,1,2,5]. Assuming the window size is 3, this must be divided into five windows: [1,2,3], [2,3,5], [3,5,1], [5,1,2], and [1,2,5].
Next, looking at each window, we want to find the matching subarrays for each length.
The subarrays of length 1 are [1],[2],[3],[5],[1]. There are two 1s, and one of each other number, so the entropy is 2*log(2) + 4*(1*log(1)) ≈ 0.6.
The subarrays of length 2 are [1,2], [2,3], [3,5], [5,1], and [1,2]. There are two [1,2]s and four unique subarrays, so the entropy is the same as for length 1, 2*log(2) + 4*(1*log(1)) ≈ 0.6.
The subarrays of length 3 are the full windows: [1,2,3], [2,3,5], [3,5,1], [5,1,2], and [1,2,5]. All five windows are unique, so the entropy is 5*(log(1)*1) = 0.
The minimum entropy is 0, meaning it is the conditional entropy for this array.
This can also be presented as a tree, where the counts at each node represent how many matches exist. The entropy for each subarray length is equivalent to the entropy for each level of the tree.
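For reference, here is a minimal serial version of the calculation described above (a plain CPU sketch of my own, using log base 10 to match the numbers in the example):
#include <algorithm>
#include <cmath>
#include <iostream>
#include <limits>
#include <map>
#include <vector>

// Serial reference: conditional entropy = minimum over subarray lengths of
// sum(count * log10(count)) over all distinct subarrays of that length.
float conditional_entropy(const std::vector<int>& a, std::size_t window_size)
{
    const std::size_t window_count = a.size() - (window_size - 1);
    float min_entropy = std::numeric_limits<float>::max();
    for (std::size_t len = 1; len <= window_size; ++len)
    {
        // count identical length-`len` prefixes across all windows
        std::map<std::vector<int>, int> counts;
        for (std::size_t w = 0; w < window_count; ++w)
            ++counts[std::vector<int>(a.begin() + w, a.begin() + w + len)];
        float entropy = 0.0f;
        for (const auto& kv : counts)
            entropy += kv.second * std::log10(static_cast<float>(kv.second));
        min_entropy = std::min(min_entropy, entropy);
    }
    return min_entropy;
}

int main()
{
    std::cout << conditional_entropy({1, 2, 3, 5, 1, 2, 5}, 3) << std::endl; // prints 0
}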
If possible, I'd like to perform this calculation on many arrays at once, and also perform the calculation itself in parallel. Does anyone have suggestions on how to accomplish this? Could thrust be useful? Please let me know if there is any additional information I should provide.

I tried solving your problem using thrust. It works, but it results in a lot of thrust calls.
Since your input size is rather small, you should process multiple arrays in parallel.
However, doing this results in a lot of book-keeping effort, as you will see in the following code.
Your input range is limited to [1,5], which is equivalent to [0,4]. The general idea is that (theoretically) any tuple from this range (e.g. {1,2,3}) can be represented as a number in base 4 (e.g. 1 + 2*4 + 3*16 = 57).
In practice we are limited by the size of the integer type. For a 32-bit unsigned integer this leads to a maximum tuple size of 16, which is also the maximum window size the following code can handle (changing to a 64-bit unsigned integer would allow a maximum tuple size of 32).
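To make the encoding concrete, here is a small stand-alone sketch of it (plain C++, illustrative only; the actual thrust pipeline below builds these values with a prefix sum instead):
// Encode a tuple of digits (each in [0, Base-1]) as a single integer,
// least-significant digit first, e.g. {1,2,3} -> 1 + 2*4 + 3*16 = 57.
unsigned encode_base4(const unsigned* tuple, unsigned length)
{
    unsigned value = 0;
    unsigned weight = 1;
    for (unsigned i = 0; i < length; ++i, weight *= 4)
        value += tuple[i] * weight;
    return value;
}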
Let's assume the input data is structured like this:
We have 2 arrays we want to process in parallel; each array is of size 5, and the window size is 3.
{{0,0,3,4,4},{0,2,1,1,3}}
We can now generate all windows:
{{0,0,3},{0,3,4},{3,4,4}},{{0,2,1},{2,1,1},{1,1,3}}
Using a per-tuple prefix sum and applying the aforementioned representation of each tuple as a single base-4 number, we get:
{{0,0,48},{0,12,76},{3,19,83}},{{0,8,24},{2,6,22},{1,5,53}}
Now we reorder the values so we have the numbers which represent a subarray of a specific length next to each other:
{{0,0,3},{0,12,19},{48,76,83}},{{0,2,1},{8,6,5},{24,22,53}}
We then sort within each group:
{{0,0,3},{0,12,19},{48,76,83}},{{0,1,2},{5,6,8},{22,24,53}}
Now we can count how often a number occurs in each group:
2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
Applying the log-formula results in
0.60206,0,0,0,0,0
Now we fetch the minimum value per array:
0,0
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <iostream>
#include <iterator>
#include <thrust/tuple.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/gather.h>
#include <thrust/sort.h>
#include <math.h>
#include <chrono>
#ifdef PRINT_ENABLED
#define PRINTER(name) print(#name, (name))
#else
#define PRINTER(name)
#endif
template <template <typename...> class V, typename T, typename ...Args>
void print(const char* name, const V<T,Args...> & v)
{
std::cout << name << ":\t";
thrust::copy(v.begin(), v.end(), std::ostream_iterator<T>(std::cout, "\t"));
std::cout << std::endl;
}
template <typename Integer, Integer Min, Integer Max>
struct random_filler
{
__device__
Integer operator()(std::size_t index) const
{
thrust::default_random_engine rng;
thrust::uniform_int_distribution<Integer> dist(Min, Max);
rng.discard(index);
return dist(rng);
}
};
template <std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
typename T,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
__device__ __inline__
thrust::tuple<T,T,T,T> calc_indices(const T& i0)
{
const T i1 = i0 / PerArrayCount;
const T i2 = i0 % PerArrayCount;
const T i3 = i2 / WindowSize;
const T i4 = i2 % WindowSize;
return thrust::make_tuple(i1,i2,i3,i4);
}
template <typename Iterator,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount,
std::size_t TotalCount = PerArrayCount * ArrayCount
>
class sliding_window
{
public:
typedef typename thrust::iterator_difference<Iterator>::type difference_type;
struct window_functor : public thrust::unary_function<difference_type,difference_type>
{
__host__ __device__
difference_type operator()(const difference_type& i0) const
{
auto t = calc_indices<ArraySize, ArrayCount,WindowSize>(i0);
return thrust::get<0>(t) * ArraySize + thrust::get<2>(t) + thrust::get<3>(t);
}
};
typedef typename thrust::counting_iterator<difference_type> CountingIterator;
typedef typename thrust::transform_iterator<window_functor, CountingIterator> TransformIterator;
typedef typename thrust::permutation_iterator<Iterator,TransformIterator> PermutationIterator;
typedef PermutationIterator iterator;
sliding_window(Iterator first) : first(first){}
iterator begin(void) const
{
return PermutationIterator(first, TransformIterator(CountingIterator(0), window_functor()));
}
iterator end(void) const
{
return begin() + TotalCount;
}
protected:
Iterator first;
};
template <std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
typename Iterator>
sliding_window<Iterator, ArraySize, ArrayCount, WindowSize>
make_sliding_window(Iterator first)
{
return sliding_window<Iterator, ArraySize, ArrayCount, WindowSize>(first);
}
template <typename KeyType,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize>
struct key_generator : thrust::unary_function<KeyType, thrust::tuple<KeyType,KeyType> >
{
__device__
thrust::tuple<KeyType,KeyType> operator()(std::size_t i0) const
{
auto t = calc_indices<ArraySize, ArrayCount,WindowSize>(i0);
return thrust::make_tuple(thrust::get<0>(t),thrust::get<2>(t));
}
};
template <typename Integer,
std::size_t Base,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize>
struct base_n : thrust::unary_function<thrust::tuple<Integer, Integer>, Integer>
{
__host__ __device__
Integer operator()(const thrust::tuple<Integer, Integer> t) const
{
const auto i = calc_indices<ArraySize, ArrayCount, WindowSize>(thrust::get<0>(t));
// ipow could be optimized by precomputing a lookup table at compile time
const auto result = thrust::get<1>(t)*ipow(Base, thrust::get<3>(i));
return result;
}
// taken from http://stackoverflow.com/a/101613/678093
__host__ __device__ __inline__
Integer ipow(Integer base, Integer exp) const
{
Integer result = 1;
while (exp)
{
if (exp & 1)
result *= base;
exp >>= 1;
base *= base;
}
return result;
}
};
template <std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
typename T,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
__device__ __inline__
thrust::tuple<T,T,T,T> calc_sort_indices(const T& i0)
{
const T i1 = i0 % PerArrayCount;
const T i2 = i0 / PerArrayCount;
const T i3 = i1 % WindowCount;
const T i4 = i1 / WindowCount;
return thrust::make_tuple(i1,i2,i3,i4);
}
template <typename Integer,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
struct pre_sort : thrust::unary_function<Integer, Integer>
{
__device__
Integer operator()(Integer i0) const
{
auto t = calc_sort_indices<ArraySize, ArrayCount,WindowSize>(i0);
const Integer i_result = ( thrust::get<2>(t) * WindowSize + thrust::get<3>(t) ) + thrust::get<1>(t) * PerArrayCount;
return i_result;
}
};
template <typename Integer,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
struct generate_sort_keys : thrust::unary_function<Integer, thrust::tuple<Integer,Integer> >
{
__device__
thrust::tuple<Integer,Integer> operator()(Integer i0) const
{
auto t = calc_sort_indices<ArraySize, ArrayCount,WindowSize>(i0);
return thrust::make_tuple( thrust::get<1>(t), thrust::get<3>(t));
}
};
template<typename... Iterators>
__host__ __device__
thrust::zip_iterator<thrust::tuple<Iterators...>> zip(Iterators... its)
{
return thrust::make_zip_iterator(thrust::make_tuple(its...));
}
struct calculate_log : thrust::unary_function<std::size_t, float>
{
__host__ __device__
float operator()(std::size_t i) const
{
return i*log10f(i);
}
};
int main()
{
typedef int Integer;
typedef float Real;
const std::size_t array_count = ARRAY_COUNT;
const std::size_t array_size = ARRAY_SIZE;
const std::size_t window_size = WINDOW_SIZE;
const std::size_t window_count = array_size - (window_size-1);
const std::size_t input_size = array_count * array_size;
const std::size_t base = 4;
thrust::device_vector<Integer> input_arrays(input_size);
thrust::counting_iterator<Integer> counting_it(0);
thrust::transform(counting_it,
counting_it + input_size,
input_arrays.begin(),
random_filler<Integer,0,base>());
PRINTER(input_arrays);
const int runs = 100;
auto start = std::chrono::high_resolution_clock::now();
for (int k = 0 ; k < runs; ++k)
{
auto sw = make_sliding_window<array_size, array_count, window_size>(input_arrays.begin());
const std::size_t total_count = window_size * window_count * array_count;
thrust::device_vector<Integer> result(total_count);
thrust::copy(sw.begin(), sw.end(), result.begin());
PRINTER(result);
auto ti_begin = thrust::make_transform_iterator(counting_it, key_generator<Integer, array_size, array_count, window_size>());
auto base_4_ti = thrust::make_transform_iterator(zip(counting_it, sw.begin()), base_n<Integer, base, array_size, array_count, window_size>());
thrust::inclusive_scan_by_key(ti_begin, ti_begin+total_count, base_4_ti, result.begin());
PRINTER(result);
thrust::device_vector<Integer> result_2(total_count);
auto ti_pre_sort = thrust::make_transform_iterator(counting_it, pre_sort<Integer, array_size, array_count, window_size>());
thrust::gather(ti_pre_sort,
ti_pre_sort+total_count,
result.begin(),
result_2.begin());
PRINTER(result_2);
thrust::device_vector<Integer> sort_keys_1(total_count);
thrust::device_vector<Integer> sort_keys_2(total_count);
auto zip_begin = zip(sort_keys_1.begin(),sort_keys_2.begin());
thrust::transform(counting_it,
counting_it+total_count,
zip_begin,
generate_sort_keys<Integer, array_size, array_count, window_size>());
thrust::stable_sort_by_key(result_2.begin(), result_2.end(), zip_begin);
thrust::stable_sort_by_key(zip_begin, zip_begin+total_count, result_2.begin());
PRINTER(result_2);
thrust::device_vector<Integer> key_counts(total_count);
thrust::device_vector<Integer> sort_keys_1_reduced(total_count);
thrust::device_vector<Integer> sort_keys_2_reduced(total_count);
// count how often each sub array occurs
auto zip_count_begin = zip(sort_keys_1.begin(), sort_keys_2.begin(), result_2.begin());
auto new_end = thrust::reduce_by_key(zip_count_begin,
zip_count_begin + total_count,
thrust::constant_iterator<Integer>(1),
zip(sort_keys_1_reduced.begin(), sort_keys_2_reduced.begin(), thrust::make_discard_iterator()),
key_counts.begin()
);
std::size_t new_size = new_end.second - key_counts.begin();
key_counts.resize(new_size);
sort_keys_1_reduced.resize(new_size);
sort_keys_2_reduced.resize(new_size);
PRINTER(key_counts);
PRINTER(sort_keys_1_reduced);
PRINTER(sort_keys_2_reduced);
auto log_ti = thrust::make_transform_iterator (key_counts.begin(), calculate_log());
thrust::device_vector<Real> log_result(new_size);
auto zip_keys_reduced_begin = zip(sort_keys_1_reduced.begin(), sort_keys_2_reduced.begin());
auto log_end = thrust::reduce_by_key(zip_keys_reduced_begin,
zip_keys_reduced_begin + new_size,
log_ti,
zip(sort_keys_1.begin(),thrust::make_discard_iterator()),
log_result.begin()
);
std::size_t final_size = log_end.second - log_result.begin();
log_result.resize(final_size);
sort_keys_1.resize(final_size);
PRINTER(log_result);
thrust::device_vector<Real> final_result(final_size);
auto final_end = thrust::reduce_by_key(sort_keys_1.begin(),
sort_keys_1.begin() + final_size,
log_result.begin(),
thrust::make_discard_iterator(),
final_result.begin(),
thrust::equal_to<Integer>(),
thrust::minimum<Real>()
);
final_result.resize(final_end.second-final_result.begin());
PRINTER(final_result);
}
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start);
std::cout << "took " << duration.count()/runs << "milliseconds" << std::endl;
return 0;
}
Compile using:
nvcc -std=c++11 conditional_entropy.cu -o benchmark -DARRAY_SIZE=1000 -DARRAY_COUNT=1000 -DWINDOW_SIZE=10 && ./benchmark
This configuration takes 133 milliseconds on my GPU (GTX 680), so around 0.1 milliseconds per array.
The implementation can definitely be optimized, e.g. by using a precomputed lookup table for the base-4 conversion, and maybe some of the thrust calls can be avoided.
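As an illustration of the lookup-table idea (my own sketch, not part of the code above): with Base = 4 and a 32-bit integer there are at most 16 digits, so the weights 4^0 .. 4^15 fit in a small constant-memory table and base_n::operator() could index it instead of calling ipow:
// Precomputed weights 4^0 .. 4^15 (illustrative; would replace ipow(Base, digit))
__constant__ unsigned base4_pows[16] = {
    1u, 4u, 16u, 64u, 256u, 1024u, 4096u, 16384u,
    65536u, 262144u, 1048576u, 4194304u, 16777216u,
    67108864u, 268435456u, 1073741824u
};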


Get linear index for multidimensional access

I'm trying to implement a multidimensional std::array, which holds a contiguous array of memory of size Dim-n-1 * Dim-n-2 * ... * Dim-1. For that, I use private inheritance from std::array:
constexpr std::size_t factorise(std::size_t value)
{
return value;
}
template<typename... Ts>
constexpr std::size_t factorise(std::size_t value, Ts... values)
{
return value * factorise(values...);
}
template<typename T, std::size_t... Dims>
class multi_array : std::array<T, factorise(Dims...)>
{
// using directive and some stuff here ...
template<typename... Indexes>
reference operator() (Indexes... indexes)
{
return base_type::operator[] (linearise(std::make_integer_sequence<Dims...>(), indexes...)); // Not legal, just to explain the need.
}
};
For instance, multi_array<int, 5, 2, 8, 12> arr; arr(2, 1, 4, 3) = 12; will access the linear index idx = 2*(5*2*8) + 1*(2*8) + 4*(8) + 3.
I suppose that I have to use std::integer_sequence, passing an integer sequence to the linearise function along with the list of indexes, but I don't know how to do it. What I want is something like:
template<int... Dims, typename... Indexes>
auto linearise(std::integer_sequence<int, Dims...> dims, Indexes... indexes)
{
return (index * multiply_but_last(dims)) + ...;
}
With multiply_but_last multiplying all remaining dimensions but the last (I see how to implement it with a constexpr variadic template function such as factorise, but I don't understand whether it is possible with std::integer_sequence).
I'm a novice in variadic template manipulation and std::integer_sequence, and I think I'm missing something. Is it possible to get the linear index computation without overhead (i.e. as if the operation had been hand-written)?
Thank you very much for your help.
The following might help:
#include <array>
#include <cassert>
#include <iostream>
template <std::size_t, typename T> using alwaysT_t = T;
template<typename T, std::size_t ... Dims>
class MultiArray
{
public:
const T& operator() (alwaysT_t<Dims, std::size_t>... indexes) const
{
return values[computeIndex(indexes...)];
}
T& operator() (alwaysT_t<Dims, std::size_t>... indexes)
{
return values[computeIndex(indexes...)];
}
private:
size_t computeIndex(alwaysT_t<Dims, std::size_t>... indexes_args) const
{
constexpr std::size_t dimensions[] = {Dims...};
std::size_t indexes[] = {indexes_args...};
size_t index = 0;
size_t mul = 1;
for (size_t i = 0; i != sizeof...(Dims); ++i) {
assert(indexes[i] < dimensions[i]);
index += indexes[i] * mul;
mul *= dimensions[i];
}
assert(index < (Dims * ...));
return index;
}
private:
std::array<T, (Dims * ...)> values;
};
I replaced your factorise with a fold expression (C++17).
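For completeness, a hypothetical usage sketch of the MultiArray above, using the headers already included (note that computeIndex treats the first index as the fastest-varying dimension, so the linear index differs from the row-major ordering in the question):
int main()
{
    MultiArray<int, 5, 2, 8, 12> arr;
    arr(2, 1, 4, 3) = 12;
    // computeIndex: 2*1 + 1*5 + 4*(5*2) + 3*(5*2*8) = 287
    std::cout << arr(2, 1, 4, 3) << '\n'; // prints 12
}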
I have a very simple function that converts multi-dimensional index to 1D index.
#include <initializer_list>
template<typename ...Args>
inline constexpr size_t IDX(const Args... params) {
constexpr size_t NDIMS = sizeof...(params) / 2 + 1;
std::initializer_list<int> args{params...};
auto ibegin = args.begin();
auto sbegin = ibegin + NDIMS;
size_t res = 0;
for (int dim = 0; dim < NDIMS; ++dim) {
size_t factor = dim > 0 ? sbegin[dim - 1] : 0;
res = res * factor + ibegin[dim];
}
return res;
}
You may need to add the "-Wno-c++11-narrowing" flag to your compiler if you see a warning like "non-constant-expression cannot be narrowed from type 'int'".
Example usage:
2D array
int array2D[rows*cols];
// Usually, you need to access the element at (i,j) like this:
int elem = array2D[i * cols + j]; // = array2D[i,j]
// Now, you can do it like this:
int elem = array2D[IDX(i,j,cols)]; // = array2D[i,j]
3D array
int array3D[rows*cols*depth];
// Usually, you need to access the element at (i,j,k) like this:
int elem = array3D[(i * cols + j) * depth + k]; // = array3D[i,j,k]
// Now, you can do it like this:
int elem = array3D[IDX(i,j,k,cols,depth)]; // = array3D[i,j,k]
ND array
// shapes = {s1,s2,...,sn}
T arrayND[s1*s2*...*sn]
// indices = {e1,e2,...,en}
T elem = arrayND[IDX(e1,e2,...,en,s2,...,sn)] // = arrayND[e1,e2,...,en]
Note that the shape parameters passed to IDX(...) begins at the second shape, which is s2 in this case.
BTW: This implementation requires C++14.

Obfuscate std::array using constexpr

I'm looking for a small function that is able to transform a std::array by adding increasing values. The function must be a compile-time function.
I was able to write a small constexpr function which does so for an array of length 3, but I was unable to generalize it to std::arrays of arbitrary length. I also failed to generalize it to contain something other than chars.
Does anyone know how to do it?
#include <array>
#include <iostream>
#include <valarray>
constexpr std::array<char,3> obfuscate(const std::array<char,3>& x) {
return std::array<char, 3>{x.at(0)+1, x.at(1) + 2, x.at(2) + 3 };
}
/* Won't compile
template<typename T,typename S, template<typename, typename> L=std::array<T, U>>
constexpr L<T,U> obfuscate(const L<T, U>& x) {
return {x.at(0) + 1, x.at(0) + 2, x.at(0) + 3 };
}
*/
std::ostream& operator<<(std::ostream& str, const std::array<char, 3>& x) {
for (auto i = 0; i < 3; i++) {
str << x.at(i);
}
return str;
}
int main(int argc, char** args) {
std::array<char, 3> x{ 'a','b','c' };
std::cout << x << std::endl;
std::cout << obfuscate(x) << std::endl;
// std::cout << obfuscate<3>(x) << std::endl;
}
You can use std::index_sequence:
template<class T, std::size_t N, std::size_t... Is>
constexpr std::array<T, N> helper (const std::array<T, N> &x, std::index_sequence<Is...>) {
return std::array<T, N>{static_cast<T>(x.at(Is)+Is+1)...};
}
template<class T, std::size_t N>
constexpr std::array<T, N> obfuscate(const std::array<T, N> &x) {
return helper(x, std::make_index_sequence<N>{});
}
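A quick usage sketch (my addition; it assumes C++14, where std::array's at() and operator[] are constexpr, and adds the headers the snippet above relies on):
#include <array>
#include <utility>

int main()
{
    constexpr std::array<char, 3> x{'a', 'b', 'c'};
    constexpr auto y = obfuscate(x); // adds 1, 2, 3 -> {'b', 'd', 'f'}
    static_assert(y[0] == 'b' && y[1] == 'd' && y[2] == 'f', "computed at compile time");
}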
There are a few methods that use tuple packs; these are great, except that MSVC has a performance problem compiling large strings.
I've found this compromise works well in MSVC.
template<typename I>
struct encrypted_string;
template<size_t... I>
struct encrypted_string<std::index_sequence<I...>>
{
std::array<char, sizeof...(I)+1> buf;
constexpr static char encrypt(char c) { return c ^ 0x41; }
constexpr static char decrypt(char c) { return encrypt(c); }
constexpr explicit __forceinline encrypted_string(const char* str)
: buf{ encrypt(str[I])... } { }
inline const char* decrypt()
{
for (size_t i = 0; i < sizeof...(I); ++i)
{
buf[i] = decrypt(buf[i]);
}
buf[sizeof...(I)] = 0;
return buf.data();
}
};
#define enc(str) encrypted_string<std::make_index_sequence<sizeof(str)>>(str)
And somewhere later
auto stringo = enc(R"(
kernel void prg_PassThru_src(const global unsigned short * restrict A, int srcstepA, int srcoffsetA,
global float * restrict Beta, int srcstepBeta, int srcoffsetBeta,
int rows, int cols) {
int x = get_global_id(0);
int y0 = get_global_id(1);
if (x < cols) {
int srcA_index = mad24(y0, srcstepA / 2, x + srcoffsetA / 2);
int srcBeta_index = mad24(y0, srcstepBeta / 4, x + srcoffsetBeta / 4);
Beta[srcBeta_index] = A[srcA_index];
}
}
//somewhere later
cv::ocl::ProgramSource programSource(stringo.decrypt());
You can see this guy's talk for more sophisticated methods:
https://www.blackhat.com/docs/eu-14/materials/eu-14-Andrivet-C-plus-plus11-Metaprogramming-Applied-To-software-Obfuscation.pdf

How to make a "variadic" vector like class

I am trying to make a class that acts as a multidimensional vector. It doesn't have to do anything fancy. I basically want to have a "container" class foo where I can access elements by foo[x][y][z]. Now I would also need similar classes for foo[x][y] and foo[x]. This led me to ponder the following (more general) question: is there a way to make something like this where you can just initialize it as foo A(a,b,c,...) for any number n of arguments and get an n-dimensional vector with elements accessible by [][][]...? Below is the class I have for (as an example) the four-dimensional case.
First the header
#ifndef FCONTAINER_H
#define FCONTAINER_H
#include <iostream>
using namespace std;
class Fcontainer
{
private:
unsigned dim1, dim2, dim3, dim4 ;
double* data;
public:
Fcontainer(unsigned const dims1, unsigned const dims2, unsigned const dims3, unsigned const dims4);
~Fcontainer();
Fcontainer(const Fcontainer& m);
Fcontainer& operator= (const Fcontainer& m);
double& operator() (unsigned const dim1, unsigned const dim2, unsigned const dim3, unsigned const dim4);
double const& operator() (unsigned const dim1, unsigned const dim2, unsigned const dim3, unsigned const dim4) const;
};
#endif // FCONTAINER_H
Now the cpp:
#include "fcontainer.hpp"
Fcontainer::Fcontainer(unsigned const dims1, unsigned const dims2, unsigned const dims3, unsigned const dims4)
{
dim1 = dims1; dim2 = dims2; dim3 = dims3; dim4 = dims4;
if (dims1 == 0 || dims2 == 0 || dims3 == 0 || dims4 == 0)
throw std::invalid_argument("Container constructor has 0 size");
data = new double[dims1 * dims2 * dims3 * dims4];
}
Fcontainer::~Fcontainer()
{
delete[] data;
}
double& Fcontainer::operator() (unsigned const dims1, unsigned const dims2, unsigned const dims3, unsigned const dims4)
{
if (dims1 >= dim1 || dims2 >= dim2 || dims3 >= dim3 || dims4 >= dim4)
throw std::invalid_argument("Container subscript out of bounds");
return data[dims1*dim2*dim3*dim4 + dims2*dim3*dim4 + dims3*dim4 + dims4];
}
double const& Fcontainer::operator() (unsigned const dims1, unsigned const dims2, unsigned const dims3, unsigned const dims4) const
{
if(dims1 >= dim1 || dims2 >= dim2 || dims3 >= dim3 || dims4 >= dim4)
throw std::invalid_argument("Container subscript out of bounds");
return data[dims1*dim2*dim3*dim4 + dims2*dim3*dim4 + dims3*dim4 + dims4];
}
So I want to expand this to an arbitrary number of dimensions. I suppose it will take something along the lines of a variadic template or an std::initializer_list, but I am not clear on how to approach this (for this problem).
Messing around in Visual Studio for a little while, I came up with this nonsense:
#include <memory>
#include <stdexcept>
#include <vector>
template<typename T>
class Matrix {
std::vector<size_t> dimensions;
std::unique_ptr<T[]> _data;
template<typename ... Dimensions>
size_t apply_dimensions(size_t dim, Dimensions&& ... dims) {
dimensions.emplace_back(dim);
return dim * apply_dimensions(std::forward<Dimensions>(dims)...);
}
size_t apply_dimensions(size_t dim) {
dimensions.emplace_back(dim);
return dim;
}
public:
Matrix(std::vector<size_t> dims) : dimensions(std::move(dims)) {
size_t size = flat_size();
_data = std::make_unique<T[]>(size);
}
template<typename ... Dimensions>
Matrix(size_t dim, Dimensions&&... dims) {
size_t size = apply_dimensions(dim, std::forward<Dimensions>(dims)...);
_data = std::make_unique<T[]>(size);
}
T & operator()(std::vector<size_t> const& indexes) {
if(indexes.size() != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
return _data[get_flat_index(indexes)];
}
T const& operator()(std::vector<size_t> const& indexes) const {
if (indexes.size() != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
return _data[get_flat_index(indexes)];
}
template<typename ... Indexes>
T & operator()(size_t idx, Indexes&& ... indexes) {
if (sizeof...(indexes)+1 != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
size_t flat_index = get_flat_index(0, idx, std::forward<Indexes>(indexes)...);
return at(flat_index);
}
template<typename ... Indexes>
T const& operator()(size_t idx, Indexes&& ... indexes) const {
if (sizeof...(indexes)+1 != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
size_t flat_index = get_flat_index(0, idx, std::forward<Indexes>(indexes)...);
return at(flat_index);
}
T & at(size_t flat_index) {
return _data[flat_index];
}
T const& at(size_t flat_index) const {
return _data[flat_index];
}
size_t dimension_size(size_t dim) const {
return dimensions[dim];
}
size_t num_of_dimensions() const {
return dimensions.size();
}
size_t flat_size() const {
size_t size = 1;
for (size_t dim : dimensions)
size *= dim;
return size;
}
private:
size_t get_flat_index(std::vector<size_t> const& indexes) const {
size_t dim = 0;
size_t flat_index = 0;
for (size_t index : indexes) {
flat_index += get_offset(index, dim++);
}
return flat_index;
}
template<typename ... Indexes>
size_t get_flat_index(size_t dim, size_t index, Indexes&& ... indexes) const {
return get_offset(index, dim) + get_flat_index(dim + 1, std::forward<Indexes>(indexes)...);
}
size_t get_flat_index(size_t dim, size_t index) const {
return get_offset(index, dim);
}
size_t get_offset(size_t index, size_t dim) const {
if (index >= dimensions[dim])
throw std::runtime_error("Index out of Bounds");
for (size_t i = dim + 1; i < dimensions.size(); i++) {
index *= dimensions[i];
}
return index;
}
};
Let's talk about what this code accomplishes.
//private:
template<typename ... Dimensions>
size_t apply_dimensions(size_t dim, Dimensions&& ... dims) {
dimensions.emplace_back(dim);
return dim * apply_dimensions(std::forward<Dimensions>(dims)...);
}
size_t apply_dimensions(size_t dim) {
dimensions.emplace_back(dim);
return dim;
}
public:
Matrix(std::vector<size_t> dims) : dimensions(std::move(dims)) {
size_t size = flat_size();
_data = std::make_unique<T[]>(size);
}
template<typename ... Dimensions>
Matrix(size_t dim, Dimensions&&... dims) {
size_t size = apply_dimensions(dim, std::forward<Dimensions>(dims)...);
_data = std::make_unique<T[]>(size);
}
What this code enables us to do is write an initializer for this matrix that takes an arbitrary number of dimensions.
int main() {
Matrix<int> mat{2, 2}; //Yields a 2x2 2D Rectangular Matrix
mat = Matrix<int>{4, 6, 5};//mat is now a 4x6x5 3D Rectangular Matrix
mat = Matrix<int>{9};//mat is now a 9-length 1D array.
mat = Matrix<int>{2, 3, 4, 5, 6, 7, 8, 9};//Why would you do this? (yet it compiles...)
}
And if the number and sizes of the dimensions is only known at runtime, this code will work around that:
int main() {
std::cout << "Input the sizes of each of the dimensions.\n";
std::string line;
std::getline(std::cin, line);
std::stringstream ss(line);
size_t dim;
std::vector<size_t> dimensions;
while(ss >> dim)
dimensions.emplace_back(dim);
Matrix<int> mat{dimensions};//Voila.
}
Then, we want to be able to access arbitrary indexes of this matrix. This code offers two ways to do so: either statically using templates, or variably at runtime.
//public:
T & operator()(std::vector<size_t> const& indexes) {
if(indexes.size() != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
return _data[get_flat_index(indexes)];
}
T const& operator()(std::vector<size_t> const& indexes) const {
if (indexes.size() != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
return _data[get_flat_index(indexes)];
}
template<typename ... Indexes>
T & operator()(size_t idx, Indexes&& ... indexes) {
if (sizeof...(indexes)+1 != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
size_t flat_index = get_flat_index(0, idx, std::forward<Indexes>(indexes)...);
return at(flat_index);
}
template<typename ... Indexes>
T const& operator()(size_t idx, Indexes&& ... indexes) const {
if (sizeof...(indexes)+1 != dimensions.size())
throw std::runtime_error("Incorrect number of parameters used to retrieve Matrix Data!");
size_t flat_index = get_flat_index(0, idx, std::forward<Indexes>(indexes)...);
return at(flat_index);
}
And then, in practice:
Matrix<int> mat{6, 5};
mat(5, 2) = 17;
//mat(5, 1, 7) = 24; //throws exception at runtime because of wrong number of dimensions.
mat = Matrix<int>{9, 2, 8};
mat(5, 1, 7) = 24;
//mat(5, 2) = 17; //throws exception at runtime because of wrong number of dimensions.
And this works fine with runtime-dynamic indexing:
std::vector<size_t> indexes;
/*...*/
mat(indexes) = 54; //Will throw if index count is wrong, will succeed otherwise
There are a number of other functions that this kind of object might want, like a resize method, but choosing how to implement that is a high-level design decision. I've also left out tons of other potentially valuable implementation details (like an optimizing move-constructor, a comparison operator, a copy constructor) but this should give you a pretty good idea of how to start.
EDIT:
If you want to avoid use of templates entirely, you can cut like half of the code provided here, and just use the methods/constructor that uses std::vector<size_t> to provide dimensions/index data. If you don't need the ability to dynamically adapt at runtime to the number of dimensions, you can remove the std::vector<size_t> overloads, and possibly even make the number of dimensions a template argument for the class itself (which would enable you to use size_t[] or std::array[size_t, N] to store dimensional data).
Well, assuming you care about efficiency at all, you probably want to store all of the elements in a contiguous manner regardless. So you probably want to do something like:
template <std::size_t N, class T>
class MultiArray {
MultiArray(const std::array<std::size_t, N> sizes)
: m_sizes(sizes)
, m_data(product(sizes)) {}
std::array<std::size_t, N> m_sizes;
std::vector<T> m_data;
};
The indexing part is where it gets kind of fun. Basically, if you want a[1][2][3] etc to work, you have to have a return some kind of proxy object, that has its own operator[]. Each one would have to be aware of its own rank. Each time you do [] it returns a proxy letting you specify the next index.
template <std::size_t N, class T>
class MultiArray {
// as before
template <std::size_t rank>
class Indexor {
Indexor(MultiArray& parent, const std::array<std::size_t, N>& indices = {})
: m_parent(parent), m_indices(indices) {}
auto operator[](std::size_t index) {
m_indices[rank] = index;
return Indexor<rank+1>(m_parent, m_indices);
}
std::array<std::size_t, N> m_indices;
MultiArray& m_parent;
};
auto operator[](std::size_t index) {
return Indexor<0>(*this)[index];
}
}
Finally, you have a specialization for when you're done with the last index:
template <>
class Indexor<N-1> { // with obvious constructor
auto operator[](std::size_t index) {
m_indices[N-1] = index;
return m_parent.m_data[indexed_product(m_indices, m_parent.m_sizes)];
}
std::array<std::size_t, N> m_indices;
MultiArray& m_parent;
};
Obviously this is a sketch but at this point its just filling out details and getting it to compile. There are other approaches like instead having the indexor object have two iterators and narrowing but that seemed a bit more complex. You also don't need to template the Indexor class and could use a runtime integer instead but that would make it very easy to misuse, having one too many or too few [] would be a runtime error, not compile time.
Edit: you would also be able to initialize this in the way you describe in C++17, but not in C++14. But in C++14 you can just use a function:
template <class... Ts>
auto make_double_array(Ts... ts) {
return MultiArray<sizeof...(Ts), double>(ts...);
}
Edit2: I use product and indexed_product in the implementation. The first is obvious; the second is less so, but hopefully both are clear. The latter is a function that, given an array of dimensions and an array of indices, returns the position of that element in the array.
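Neither helper is defined in the sketch above; here is one plausible (row-major) implementation, purely as an assumption about what they could look like:
#include <array>
#include <cstddef>

// Total number of elements for the given extents.
template <std::size_t N>
std::size_t product(const std::array<std::size_t, N>& sizes)
{
    std::size_t p = 1;
    for (std::size_t s : sizes) p *= s;
    return p;
}

// Row-major position of the element at `indices` in an array with extents `sizes`.
template <std::size_t N>
std::size_t indexed_product(const std::array<std::size_t, N>& indices,
                            const std::array<std::size_t, N>& sizes)
{
    std::size_t pos = 0;
    for (std::size_t i = 0; i < N; ++i)
        pos = pos * sizes[i] + indices[i];
    return pos;
}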

Hash function for sequences of tuples - Duplicate elimination

I want to eliminate duplicates of sequences of tuples. These sequences look like this:
1. (1,1)(2,5,9)(2,3,10)(2,1)
2. (1,2)(3,2,1)(2,5,9)(2,1)
3. (1,1)(2,5,9)(2,3,10)(2,1)
4. (2,1)(2,3,10)(2,5,9)(1,1)
5. (2,1)(2,3,10)(1,1)
6. (1,1)(2,5,9)(2,3,10)(2,2)
The number of entries per tuple varies, as does the number of tuples per sequence. Since I have lots of sequences which I ultimately want to process in parallel using CUDA, I thought that calculating a hash per sequence would be an efficient way to identify duplicate sequences.
How would such a hash function be implemented?
And: How big is the collision probability of two different sequences producing the same final hash value?
I have two requirements which I am not sure if they can be fulfilled:
a) Can such a hash be calculated on the fly?
I want to avoid storing the full sequences, so I'd like to do something like this:
h = 0; // init hash
...
h = h + hash(1,1);
...
h = h + hash(2,5,9);
...
h = h + hash(2,3,10)
...
h = h + hash(2,1)
where + is any operator which combines hashes of tuples.
b) Can such a hash be independent of the "direction" of the sequence?
In the above example, sequences 1. and 4. consist of the same tuples, but in reversed order; I'd like to identify them as duplicates.
For hashing you can use std::hash<std::size_t> or whatever (unsigned) integer type you use. The collision probability is somewhere around 1.0/std::numeric_limits<std::size_t>::max(), which is very small. To make the usability a bit better you can write your own tuple hasher:
namespace hash_tuple
{
std::size_t hash_combine(std::size_t l, std::size_t r) noexcept
{
constexpr static const double phi = 1.6180339887498949025257388711906969547271728515625;
static const double val = std::pow(2ULL, 4ULL * sizeof(std::size_t));
static const std::size_t magic_number = val / phi;
l ^= r + magic_number + (l << 6) + (l >> 2);
return l;
}
template <typename TT>
struct hash
{
std::size_t operator()(TT const& tt) const noexcept
{
return std::hash<TT>()(tt);
}
};
namespace
{
template <class TupleT, std::size_t Index = std::tuple_size<TupleT>::value - 1ULL>
struct HashValueImpl
{
static std::size_t apply(std::size_t seed, TupleT const& tuple) noexcept
{
seed = HashValueImpl<TupleT, Index - 1ULL>::apply(seed, tuple);
seed = hash_combine(seed, std::get<Index>(tuple));
return seed;
}
};
template <class TupleT>
struct HashValueImpl<TupleT, 0ULL>
{
static std::size_t apply(size_t seed, TupleT const& tuple) noexcept
{
seed = hash_combine(seed, std::get<0>(tuple));
return seed;
}
};
}
template <typename ... TT>
struct hash<std::tuple<TT...>>
{
std::size_t operator()(std::tuple<TT...> const& tt) const noexcept
{
std::size_t seed = 0;
seed = HashValueImpl<std::tuple<TT...> >::apply(seed, tt);
return seed;
}
};
}
Thus you can write code like
using hash_tuple::hash;
auto mytuple = std::make_tuple(3, 2, 1, 0);
auto hasher = hash<decltype(mytuple)>();
std::size_t mytuple_hash = hasher(mytuple);
To fulfill your constraint b) we need two hashes for each tuple: the normal hash and the hash of the reversed tuple.
So at first we need to deal with how to reverse one:
template<typename T, typename TT = typename std::remove_reference<T>::type, size_t... I>
auto reverse_impl(T&& t, std::index_sequence<I...>)
-> std::tuple<typename std::tuple_element<sizeof...(I) - 1 - I, TT>::type...>
{
return std::make_tuple(std::get<sizeof...(I) - 1 - I>(std::forward<T>(t))...);
}
template<typename T, typename TT = typename std::remove_reference<T>::type>
auto reverse(T&& t)
-> decltype(reverse_impl(std::forward<T>(t),
std::make_index_sequence<std::tuple_size<TT>::value>()))
{
return reverse_impl(std::forward<T>(t),
std::make_index_sequence<std::tuple_size<TT>::value>());
}
Then we can calculate our hashes
auto t0 = std::make_tuple(1, 2, 3, 4, 5, 6);
auto t1 = std::make_tuple(6, 5, 4, 3, 2, 1);
using hash_tuple::hash;
auto hasher = hash<decltype(t0)>();
std::size_t t0hash = hasher(t0);
std::size_t t1hash = hasher(t1);
std::size_t t0rhash = hasher(reverse(t0));
std::size_t t1rhash = hasher(reverse(t1));
And if hash_combine(t0hash, t1hash) == hash_combine(t1rhash, t0rhash), you found what you want. You can apply this "inner-tuple-hashing mechanic" to the hashes of many tuples pretty easily.

More efficient way with multi-dimensional arrays (matrices) using C++

I coded a small, simple program that compares the performance of several ways to fill a simple 8x8 matrix with different kinds of containers. Here's the code:
#define MATRIX_DIM 8
#define OCCUR_MAX 100000
static void genHeapAllocatedMatrix(void)
{
int **pPixels = new int *[MATRIX_DIM];
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
pPixels[idy] = new int[MATRIX_DIM];
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++)
pPixels[idy][idx] = 42;
}
}
static void genStackAllocatedMatrix(void)
{
std::array<std::array<int, 8>, 8> matrix;
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++) {
matrix[idy][idx] = 42;
}
}
}
static void genStackAllocatedMatrixBasic(void)
{
int matrix[MATRIX_DIM][MATRIX_DIM];
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++) {
matrix[idy][idx] = 42;
}
}
}
int main(void)
{
clock_t begin, end;
double time_spent;
begin = clock();
for (type::uint32 idx = 0; idx < OCCUR_MAX; idx++)
{
//genHeapAllocatedMatrix();
genStackAllocatedMatrix();
//genStackAllocatedMatrixBasic();
}
end = clock();
time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
std::cout << "Elapsed time = " << time_spent << std::endl;
return (0);
}
As you can guess, the most efficient way is the last one, with a simple two-dimensional C array (hard-coded). Of course, the worst choice is the first one, using heap allocations.
My problem is that I want to store this two-dimensional array as an attribute in a class. Here's a definition of a custom class that handles a matrix:
template <typename T>
class Matrix
{
public:
Matrix(void);
Matrix(type::uint32 column, type::uint32 row);
Matrix(Matrix const &other);
virtual ~Matrix(void);
public:
Matrix &operator=(Matrix const &other);
bool operator!=(Matrix const &other);
bool operator==(Matrix const &other);
type::uint32 rowCount(void) const;
type::uint32 columnCount(void) const;
void printData(void) const;
T **getData(void) const;
void setData(T **matrix);
private:
type::uint32 m_ColumnCount;
type::uint32 m_RowCount;
T **m_pMatrix;
};
To get the job done, I tried the following using a cast:
Matrix<int> matrix;
int tab[MATRIX_DIM][MATRIX_DIM];
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++) {
tab[idy][idx] = 42;
}
}
matrix.setData((int**)&tab[0][0]);
This code compiles correctly, but when I try to print the matrix there's a segmentation fault.
int tab[MATRIX_DIM][MATRIX_DIM];
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++) {
tab[idy][idx] = 42;
}
}
int **matrix = (int**)&tab[0][0];
std::cout << matrix[0][0] << std::endl; //Segmentation fault
Is there a possible way to store this kind of two-dimensional array as an attribute without heap allocation?
That's because a two-dimensional array is not an array of pointers.
So, you should use int * for your matrix type, but then of course you will not be able to index it by two dimensions.
Another option is to store a pointer to the array:
int (*matrix)[MATRIX_DIM][MATRIX_DIM];
matrix = &tab;
std::cout << (*matrix)[0][0] << std::endl;
But that doesn't fit well with the idea of encapsulating the matrix in a class. A better idea would be for the class to allocate the storage itself (possibly in a single heap allocation) and to provide access to the matrix through methods only (e.g. GetCell(row, col) etc.), without exposing raw pointers.
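As a rough sketch of that suggestion (the name GetCell follows the text above; everything else is illustrative):
#include <vector>

// Single contiguous allocation, no raw pointers exposed.
class SimpleMatrix
{
public:
    SimpleMatrix(unsigned rows, unsigned cols)
        : m_rows(rows), m_cols(cols), m_data(rows * cols) {}
    int& GetCell(unsigned row, unsigned col) { return m_data[row * m_cols + col]; }
    const int& GetCell(unsigned row, unsigned col) const { return m_data[row * m_cols + col]; }
private:
    unsigned m_rows, m_cols;
    std::vector<int> m_data; // one heap allocation for the whole matrix
};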
Measuring the speed of operations on an 8 x 8 array is largely pointless. For a data set as small as that, the cost of the operation will be close to zero and you are mostly measuring setup time, etc.
Timings become important for larger data sets, but you cannot sensibly extrapolate the small set results to the larger set. With larger data sets you will often find that the data exists on multiple memory pages. There is a danger that paging costs will dominate other costs. Very large improvements in efficiency are possible by ensuring that your algorithm processes all (or most) of the data on one page before moving to the next page, rather than constantly swapping pages.
In general, you are best off using the simplest data structures with the least likelihood of programming error and optimising the processing algorithms. I say "in general" because extreme cases do exist where small differences in access time matter, but they are rare.
Use a single array to represent the matrix instead of allocating for each index.
I've written a class for this already. Feel free to use it:
#include <vector>
template<typename T, typename Allocator = std::allocator<T>>
class DimArray
{
private:
int Width, Height;
std::vector<T, Allocator> Data;
public:
DimArray(int Width, int Height);
DimArray(T* Data, int Width, int Height);
DimArray(T** Data, int Width, int Height);
virtual ~DimArray() {}
DimArray(const DimArray &da);
DimArray(DimArray &&da);
inline std::size_t size() {return Data.size();}
inline std::size_t size() const {return Data.size();}
inline int width() {return Width;}
inline int width() const {return Width;}
inline int height() {return Height;}
inline int height() const {return Height;}
inline T* operator [](const int Index) {return Data.data() + Height * Index;}
inline const T* operator [](const int Index) const {return Data.data() + Height * Index;}
inline DimArray& operator = (DimArray da);
};
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(int Width, int Height) : Width(Width), Height(Height), Data(Width * Height, 0) {}
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(T* Data, int Width, int Height) : Width(Width), Height(Height), Data(Width * Height, 0) {std::copy(&Data[0], &Data[0] + Width * Height, const_cast<T*>(this->Data.data()));}
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(T** Data, int Width, int Height) : Width(Width), Height(Height), Data(Width * Height, 0) {std::copy(Data[0], Data[0] + Width * Height, const_cast<T*>(this->Data.data()));}
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(const DimArray &da) : Width(da.Width), Height(da.Height), Data(da.Data) {}
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(DimArray &&da) : Width(std::move(da.Width)), Height(std::move(da.Height)), Data(std::move(da.Data)) {}
template<typename T, typename Allocator>
DimArray<T, Allocator>& DimArray<T, Allocator>::operator = (DimArray<T, Allocator> da)
{
this->Width = da.Width;
this->Height = da.Height;
this->Data.swap(da.Data);
return *this;
}
Usage:
int main()
{
DimArray<int> Matrix(1000, 1000); //creates a 1000 * 1000 matrix.
Matrix[0][0] = 100; //ability to index it like a multi-dimensional array.
}
More usage:
template<typename T, std::size_t size>
class uninitialised_stack_allocator : public std::allocator<T>
{
private:
alignas(16) T data[size];
public:
typedef typename std::allocator<T>::pointer pointer;
typedef typename std::allocator<T>::size_type size_type;
typedef typename std::allocator<T>::value_type value_type;
template<typename U>
struct rebind {typedef uninitialised_stack_allocator<U, size> other;};
pointer allocate(size_type n, const void* hint = 0) {return static_cast<pointer>(&data[0]);}
void deallocate(void* ptr, size_type n) {}
size_type max_size() const {return size;}
};
int main()
{
DimArray<int, uninitialised_stack_allocator<int, 1000 * 1000>> Matrix(1000, 1000);
}