How to rewrite this code without using boost?

My task is to modify Sergiu Dotenco's Well Equidistributed Long-period Linear (WELL) algorithm code so that it no longer uses Boost (not that Boost is bad, but company policy requires me to remove it).
Sergiu's WELL implementation uses Boost's MPL library, and there is quite a lot of logic behind it. One way forward is to read up on all of it, after which I should be able to finish the task. The other is to replace it bit by bit with best guesses.
I'm taking the second route, hoping that trial and error will be faster. So far I've successfully replaced boost::mpl::if_ and if_c with std::conditional, but I hit errors when trying to update IsPowerOfTwo, Power2Modulo, etc., which is why I'm asking for help here.
Below is the code. How can I rewrite it without Boost, using only C++17?
/**
* Conditional expression of type (r & (r - 1)) == 0 which allows to check
* whether a number #f$r#f$ is of type #f$2^n#f$.
*/
typedef boost::mpl::equal_to<
boost::mpl::bitand_<
boost::mpl::_,
boost::mpl::minus<boost::mpl::_, boost::mpl::int_<1>
>
>,
boost::mpl::int_<0>
> IsPowerOfTwo;
template<class UIntType, UIntType r>
struct Power2Modulo
{
typedef typename boost::mpl::apply<
IsPowerOfTwo,
boost::mpl::integral_c<UIntType, r>
>::type type;
BOOST_STATIC_ASSERT(type::value);
template<class T>
static T calc(T value)
{
return value & (r - 1);
}
};
If possible, please give a short example of how to call it. I tried to instantiate IsPowerOfTwo or Power2Modulo in main with
Detail::IsPowerOfTwo p0;
or
Detail::Power2Modulo<int, 3> p1;
but got compilation errors.
I asked a related question before and got some suggestions. However, not being familiar with metaprogramming or Boost, I don't quite understand them.

So, I looked at that library, and created a no-boost fork adapting the WELL pseudo-random-number-generator to pure c++11.
See here on my github: https://github.com/sehe/well-random (the default branch is no-boost).
What is well-random?
well-random is a c++11 fork from
random, a collection of various
pseudo-random number generators and distributions that were intended
to accompany the Boost Random Number Library.
This fork currently only adopted the WELL generator and its tests.
Getting started
The no-boost branch no longer requires any boost library. Instead it
requires c++11. To compile the tests, first make sure CMake 2.8 is
installed, then enter:
$ cmake . -DCMAKE_BUILD_TYPE=Release
in your terminal or command prompt on Windows inside project's
directory to generate the appropriate configuration that can be used
to compile the tests using make/nmake or inside an IDE.
What Was Refactored
BOOST_STATIC_ASSERT to STATIC_ASSERT (this becomes obsolete with c++17: http://en.cppreference.com/w/cpp/language/static_assert)
BOOST_STATIC_CONSTANT to static constexpr
BOOST_PREVENT_MACRO_SUBSTITUTION -> PREVENT_MACRO_SUBSTITUTION (trivial macro)
BOOST_THROW_EXCEPTION dropped. NOTE This implies the code cannot be compiled with exception support disabled.
All things related to Boost Test
BOOST_CHECK -> CHECK
#define MESSAGE_PREAMBLE() (std::cerr << __FILE__ << ":" << __LINE__ << " ")
#define CHECK(test) do { if (!(test)) MESSAGE_PREAMBLE() << #test << "\n"; } while (0)
BOOST_CHECK_EQUAL -> CHECK_EQUAL
#define CHECK_EQUAL(expected,actual) do { \
auto&& _e = expected; \
auto&& _a = actual; \
if (_e != _a) \
MESSAGE_PREAMBLE() << "expected:" << #expected << " = " << _e << "\n" \
<< "\tactual:" << #actual << " = " << _a << "\n"; \
} while (0)
BOOST_AUTO_TEST_CASE - dropped. The test driver is main now:
int main() {
//CHECK_EQUAL(16, Detail::shift<2>(64));
//CHECK_EQUAL(64, Detail::shift<-2>(16));
//CHECK_EQUAL(32, Detail::shift<0>(32));
//CHECK(Detail::is_powerof2(512u));
//CHECK(not Detail::is_powerof2(0u));
WellTestCase<Well512a, 0x2b3fe99e>::run();
WellTestCase<Well521a, 0xc9878363>::run();
WellTestCase<Well521b, 0xb75867f6>::run();
WellTestCase<Well607a, 0x7b5043ea>::run();
WellTestCase<Well607b, 0xaedee7da>::run();
WellTestCase<Well800a, 0x2bfe686f>::run();
WellTestCase<Well800b, 0xf009e1bd>::run();
WellTestCase<Well1024a, 0xd07f528c>::run();
WellTestCase<Well1024b, 0x867f7993>::run();
WellTestCase<Well19937a, 0xb33a2cd5>::run();
WellTestCase<Well19937b, 0x191de86a>::run();
WellTestCase<Well19937c, 0x243eaed5>::run();
WellTestCase<Well21701a, 0x7365a269>::run();
WellTestCase<Well23209a, 0x807dacb >::run();
WellTestCase<Well23209b, 0xf1a77751>::run();
WellTestCase<Well44497a, 0xfdd7c07b>::run();
WellTestCase<Well44497b, 0x9406547b>::run();
}
boost::ref -> std::ref (from <functional>)
Boost Range helpers replaced by standard c++ (boost::size, boost::end for arrays)
using ulong_long_type = unsigned long long;
Conditional operators shift and mod have been re-implemented with straight-up SFINAE based on std::enable_if instead of using MPL meta-programming:
template<class UIntType, unsigned N>
struct Left
{
static UIntType shift(UIntType a)
{
return a << N;
}
};
template<class UIntType, unsigned N>
struct Right
{
static UIntType shift(UIntType a)
{
return a >> N;
}
};
template<int N, class UIntType>
inline UIntType shift(UIntType a)
{
return boost::mpl::if_c<(N < 0),
Left<UIntType, -N>,
Right<UIntType, N>
>::type::shift(a);
}
became:
template <typename UIntType, signed N, typename Enable = void> struct Shift;
template <typename UIntType, signed N>
struct Shift<UIntType, N, typename std::enable_if<(N>=0)>::type> {
static UIntType apply(UIntType a) { return a >> N; }
};
template <typename UIntType, signed N>
struct Shift<UIntType, N, typename std::enable_if<(N<0)>::type> {
static UIntType apply(UIntType a) { return a << -N; }
};
template<int N, class UIntType>
inline UIntType shift(UIntType a) { return Shift<UIntType, N>::apply(a); }
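Since the question asks for C++17, here is a minimal sketch of how the same dispatch could be written with if constexpr instead of the enable_if specializations. The namespace sketch and the unsigned-only static_assert are my own assumptions for illustration and are not part of the fork:

```cpp
#include <type_traits>

namespace sketch {

// Shifts right for N >= 0 and left for N < 0; the branch is chosen at
// compile time, so the discarded branch is never instantiated.
template <int N, class UIntType>
constexpr UIntType shift(UIntType a)
{
    // Assumption added for this sketch: only unsigned types are accepted.
    static_assert(std::is_unsigned_v<UIntType>, "shift expects an unsigned type");
    if constexpr (N >= 0)
        return a >> N;
    else
        return a << -N;
}

// Compile-time checks mirroring the shift tests in the test driver.
static_assert(shift<2>(64u) == 16u, "right shift by 2");
static_assert(shift<-2>(16u) == 64u, "left shift by 2");
static_assert(shift<0>(32u) == 32u, "shift by 0 is the identity");

} // namespace sketch
```

The call syntax stays the same as before (shift<N>(x)), so the M2, M3 and M5 transforms would not need to change under this assumption.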
Likewise, the Modulo switch (Power2Modulo and GenericModulo) that looked like this:
/**
* Conditional expression of type (r & (r - 1)) == 0 which allows to check
* whether a number #f$r#f$ is of type #f$2^n#f$.
*/
typedef boost::mpl::equal_to<
boost::mpl::bitand_<
boost::mpl::_,
boost::mpl::minus<boost::mpl::_, boost::mpl::int_<1>
>
>,
boost::mpl::int_<0>
> IsPowerOfTwo;
template<class UIntType, UIntType r>
struct Power2Modulo
{
typedef typename boost::mpl::apply<
IsPowerOfTwo,
boost::mpl::integral_c<UIntType, r>
>::type type;
BOOST_STATIC_ASSERT(type::value);
template<class T>
static T calc(T value)
{
return value & (r - 1);
}
};
template<class UIntType, UIntType r>
struct GenericModulo
{
/**
* #brief Determines #a value modulo #a r.
*
* #pre value >= 0 and value < 2 * r
* #post value >= 0 and value < r
*/
template<class T>
static T calc(T value)
{
BOOST_STATIC_ASSERT(!std::numeric_limits<UIntType>::is_signed);
assert(value < 2 * r);
if (value >= r)
value -= r;
return value;
}
};
template<class UIntType, UIntType r>
struct Modulo
{
typedef typename boost::mpl::apply<
IsPowerOfTwo,
boost::mpl::integral_c<UIntType, r>
>::type rIsPowerOfTwo;
static UIntType calc(UIntType value)
{
// Use the bitwise AND for power 2 modulo arithmetic, or subtraction
// otherwise. Subtraction is about two times faster than direct modulo
// calculation.
return boost::mpl::if_<
rIsPowerOfTwo,
Power2Modulo<UIntType, r>,
GenericModulo<UIntType, r>
>::type::calc(value);
}
};
became much simpler with a little bit of c++11 (constexpr!) goodness:
template <typename T, typename = typename std::enable_if<!std::is_signed<T>()>::type>
constexpr static bool is_powerof2(T v) { return v && ((v & (v - 1)) == 0); }
template<class UIntType, UIntType r>
struct Modulo {
template<class T> static T calc(T value) { return calc(value, std::integral_constant<bool, is_powerof2(r)>{}); }
/**
* #brief Determines #a value modulo #a r.
*
* #pre value >= 0 and value < 2 * r
* #post value >= 0 and value < r
*/
template<class T> static T calc(T value, std::true_type) { return value & (r - 1); }
template<class T> static T calc(T value, std::false_type) {
STATIC_ASSERT(!std::numeric_limits<UIntType>::is_signed);
assert(value < 2 * r);
if (value >= r)
value -= r;
return value;
}
};
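For comparison, this is roughly how the same Modulo switch might look with C++17's if constexpr, which removes the need for tag dispatch on std::true_type and std::false_type. It is only a sketch: the namespace sketch is hypothetical, and I made calc constexpr so it can be checked at compile time, which the fork does not do:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {

// True when v is a non-zero power of two.
template <class T>
constexpr bool is_powerof2(T v)
{
    static_assert(std::is_unsigned_v<T>, "is_powerof2 expects an unsigned type");
    return v != 0 && (v & (v - 1)) == 0;
}

template <class UIntType, UIntType r>
struct Modulo
{
    static_assert(std::is_unsigned_v<UIntType>, "Modulo expects an unsigned type");

    // Generic branch: precondition value < 2 * r, postcondition result < r.
    template <class T>
    static constexpr T calc(T value)
    {
        if constexpr (is_powerof2(r)) {
            return value & (r - 1);   // bitwise AND replaces the modulo
        } else {
            assert(value < 2 * r);    // one subtraction is only enough here
            return value >= r ? value - r : value;
        }
    }
};

static_assert(Modulo<std::size_t, 16>::calc(std::size_t{17}) == 1, "power-of-two branch");
static_assert(Modulo<std::size_t, 17>::calc(std::size_t{20}) == 3, "generic branch");

} // namespace sketch
```

Only the selected branch is instantiated for a given r, which is the same effect the two tagged overloads achieve in the C++11 version.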
<boost/cstdint.hpp> -> <cstdint> (replacing ::boost by ::std for uint_least32_t and uint32_t)
Well_quoted type function replaced by an alias template (template<...> using T = ... see http://en.cppreference.com/w/cpp/language/type_alias ad 2)
typedefs rewritten as type aliases.
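To illustrate the alias-template point, here is a small sketch contrasting a nested ::type "type function" with an alias template. All names here (Generator, Generator32_quoted, Generator32) are hypothetical stand-ins, and I am assuming a plain nested ::type shape rather than reproducing the original Well_quoted definition:

```cpp
#include <type_traits>

namespace sketch {

struct NoTempering {};

// Stand-in for a class template with a pluggable policy parameter.
template <class UIntType, class Tempering>
struct Generator
{
    using tempering_type = Tempering;
};

// Pre-C++11 style: the result has to be fetched through a nested ::type.
template <class Tempering>
struct Generator32_quoted
{
    typedef Generator<unsigned, Tempering> type;
};

// C++11 alias template: names the same type directly, no ::type needed.
template <class Tempering>
using Generator32 = Generator<unsigned, Tempering>;

static_assert(std::is_same<Generator32_quoted<NoTempering>::type,
                           Generator32<NoTempering>>::value,
              "both spellings name the same type");

} // namespace sketch
```

The alias form is what the Well512a_base style definitions in the listing rely on: each one fixes every parameter except the tempering policy.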
Full Listing
// Copyright (c) Sergiu Dotenco 2010, 2011, 2012
// Copyright (c) Seth Heeren - made independent of BOOST using C++11 - 2017
//
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
/**
* #brief Implementation of the Well Equidistributed Long-period Linear (WELL)
* pseudo-random number generator.
* #file well.hpp
*/
#ifndef WELL_HPP
#define WELL_HPP
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iomanip>
#include <istream>
#include <limits>
#include <ostream>
#include <stdexcept>
#include <type_traits>
#define STATIC_ASSERT(x) static_assert(x, #x)
#define PREVENT_MACRO_SUBSTITUTION
//! #cond hide_private
namespace Detail {
using ulong_long_type = unsigned long long;
template <typename UIntType, signed N, typename Enable = void> struct Shift;
template <typename UIntType, signed N>
struct Shift<UIntType, N, typename std::enable_if<(N>=0)>::type> {
static UIntType apply(UIntType a) { return a >> N; }
};
template <typename UIntType, signed N>
struct Shift<UIntType, N, typename std::enable_if<(N<0)>::type> {
static UIntType apply(UIntType a) { return a << -N; }
};
template<int N, class UIntType>
inline UIntType shift(UIntType a) {
return Shift<UIntType, N>::apply(a);
}
/**
* #name Transformation matrices #f$M0,\dotsc,M6#f$ from Table I
* #{
*/
struct M0
{
template<class T>
static T transform(T)
{
return T(0);
}
};
struct M1
{
template<class T>
static T transform(T x)
{
return x;
}
};
template<int N>
struct M2
{
template<class T>
static T transform(T x)
{
return shift<N>(x);
}
};
template<int N>
struct M3
{
template<class T>
static T transform(T x)
{
return x ^ shift<N>(x);
}
};
template<std::uint_least32_t a>
struct M4
{
template<class T>
static T transform(T x)
{
T result = x >> 1;
if ((x & 1) == 1)
result ^= a;
return result;
}
};
template<int N, std::uint_least32_t b>
struct M5
{
template<class T>
static T transform(T x)
{
return x ^ (shift<N>(x) & b);
}
};
template
<
std::size_t w,
std::uint_least32_t q,
std::uint_least32_t a,
std::uint_least32_t ds,
std::uint_least32_t dt
>
struct M6
{
template<class T>
static T transform(T x)
{
T result = ((x << q) ^ (x >> (w - q))) & ds;
if ((x & dt) != 0)
result ^= a;
return result;
}
};
//! #}
template <typename T, typename = typename std::enable_if<!std::is_signed<T>()>::type>
constexpr static bool is_powerof2(T v) { return v && ((v & (v - 1)) == 0); }
template<class UIntType, UIntType r>
struct Modulo {
template<class T> static T calc(T value) { return calc(value, std::integral_constant<bool, is_powerof2(r)>{}); }
/**
* #brief Determines #a value modulo #a r.
*
* #pre value >= 0 and value < 2 * r
* #post value >= 0 and value < r
*/
template<class T> static T calc(T value, std::true_type) { return value & (r - 1); }
template<class T> static T calc(T value, std::false_type) {
STATIC_ASSERT(!std::numeric_limits<UIntType>::is_signed);
assert(value < 2 * r);
if (value >= r)
value -= r;
return value;
}
};
template<std::uint_least32_t b, std::uint_least32_t c>
struct MatsumotoKuritaTempering
{
template<std::size_t r, class UIntType, std::size_t N>
static UIntType apply(UIntType x, UIntType (&)[N], std::size_t)
{
x ^= (x << 7) & b;
x ^= (x << 15) & c;
return x;
}
};
template<std::uint_least32_t mask>
struct HaraseTempering
{
template<std::size_t r, class UIntType, std::size_t N>
static UIntType apply(UIntType x, UIntType (&s)[N], std::size_t m2)
{
return x ^ (s[Modulo<UIntType, r>::calc(m2 + 1)] & mask);
}
};
struct NoTempering
{
template<std::size_t r, class UIntType, std::size_t N>
static UIntType apply(UIntType x, UIntType (&)[N], std::size_t)
{
return x;
}
};
} // namespace Detail
//! #endcond
/**
* #brief Well Equidistributed Long-period Linear (WELL) pseudo-random number
* generator.
*
* The implementation is based on the "Improved Long-Period Generators Based on
* Linear Recurrences Modulo 2" paper by Francois Panneton, Pierre L'Ecuyer and
* Makoto Matsumoto from ACM Transactions on Mathematical Software, 32 (1,
* March) 2006, pp. 1-16.
*
* #tparam UIntType The unsigned integer type.
* #tparam w Word size.
* #tparam r State size.
*/
template
<
class UIntType,
std::size_t w,
std::size_t r,
std::size_t p,
std::size_t m1,
std::size_t m2,
std::size_t m3,
class T0,
class T1,
class T2,
class T3,
class T4,
class T5,
class T6,
class T7,
class Tempering // mpl pluggable
>
class Well
{
STATIC_ASSERT(!std::numeric_limits<UIntType>::is_signed);
STATIC_ASSERT(w <= static_cast<std::size_t>(std::numeric_limits<UIntType>::digits));
STATIC_ASSERT(r > 0 && p < w);
STATIC_ASSERT(m1 > 0 && m1 < r);
STATIC_ASSERT(m2 > 0 && m2 < r);
STATIC_ASSERT(m3 > 0 && m3 < r);
public:
//! The unsigned integer type.
typedef UIntType result_type;
//! Word size.
static constexpr std::size_t word_size = w;
//! State size.
static constexpr std::size_t state_size = r;
//! Number of mask bits.
static constexpr std::size_t mask_bits = p;
//! Default seed value.
static constexpr UIntType default_seed = 5489U;
/**
* #brief Initializes the class using the specified seed #a value.
*
* #param value The seed value to be used for state initialization.
*/
explicit Well(result_type value = default_seed)
{
seed(value);
}
template<class InputIterator>
Well(InputIterator& first, InputIterator last)
{
seed(first, last);
}
template<class Generator>
explicit Well(Generator& g)
{
seed(g);
}
template<class Generator>
void seed(Generator& g)
{
// Ensure std::generate_n doesn't copy the generator g by using
// std::reference_wrapper
std::generate_n(state_, state_size, std::ref(g));
}
void seed(result_type value = default_seed)
{
if (value == 0U)
value = default_seed;
state_[0] = value;
std::size_t i = 1;
UIntType *const s = state_;
// Same generator used to seed Mersenne twister
for ( ; i != state_size; ++i)
s[i] = (1812433253U * (s[i - 1] ^ (s[i - 1] >> (w - 2))) + i);
index_ = i;
}
template<class InputIterator>
void seed(InputIterator& first, InputIterator last)
{
index_ = 0;
std::size_t i = 0;
for ( ; i != state_size && first != last; ++i, ++first)
state_[i] = *first;
if (first == last && i != state_size)
throw std::invalid_argument("Seed sequence too short");
}
/**
* #brief Generates a random number.
*/
result_type operator()()
{
const UIntType upper_mask = ~0U << p;
const UIntType lower_mask = ~upper_mask;
// v[i,j] = state[(r-i+j) mod r]
std::size_t i = index_;
// Equivalent to r-i but allows to avoid negative values in the
// following two expressions
std::size_t j = i + r;
std::size_t k = mod(j - 1); // [i,r-1]
std::size_t l = mod(j - 2); // [i,r-2]
std::size_t im1 = i + m1;
std::size_t im2 = i + m2;
std::size_t im3 = i + m3;
UIntType z0, z1, z2, z3, z4;
z0 = (state_[k] & upper_mask) | (state_[l] & lower_mask);
z1 = T0::transform(state_[i]) ^
T1::transform(state(im1));
z2 = T2::transform(state(im2)) ^
T3::transform(state(im3));
z3 = z1 ^ z2;
z4 = T4::transform(z0) ^ T5::transform(z1) ^
T6::transform(z2) ^ T7::transform(z3);
state_[i] = z3; // v[i+1,1]
state_[k] = z4; // v[i+1,0]
index_ = k;
return Tempering::template apply<r>(z4, state_, im2);
}
result_type min PREVENT_MACRO_SUBSTITUTION () const
{
return 0U;
}
result_type max PREVENT_MACRO_SUBSTITUTION () const
{
return ~0U >> (std::numeric_limits<UIntType>::digits - w);
}
void discard(Detail::ulong_long_type z)
{
while (z-- > 0) {
operator()();
}
}
/**
* #brief Compares the state of two generators for equality.
*/
friend bool operator==(const Well& lhs, const Well& rhs)
{
for (std::size_t i = 0; i != state_size; ++i)
if (lhs.compute(i) != rhs.compute(i))
return false;
return true;
}
/**
* #brief Compares the state of two generators for inequality.
*/
friend bool operator!=(const Well& lhs, const Well& rhs)
{
return !(lhs == rhs);
}
/**
* #brief Writes the state to the specified stream.
*/
template<class E, class T>
friend std::basic_ostream<E, T>&
operator<<(std::basic_ostream<E, T>& out, const Well& well)
{
E space = out.widen(' ');
for (std::size_t i = 0; i != state_size; ++i)
out << well.compute(i) << space;
return out;
}
/**
* #brief Reads the generator state from the specified input stream.
*/
template<class E, class T>
friend std::basic_istream<E, T>&
operator>>(std::basic_istream<E, T>& in, Well& well)
{
for (std::size_t i = 0; i != state_size; ++i)
in >> well.state_[i] >> std::ws;
well.index_ = state_size;
return in;
}
private:
template<class T>
static T mod(T value)
{
return Detail::Modulo<T, r>::calc(value);
}
UIntType state(std::size_t index) const
{
return state_[mod(index)];
}
UIntType compute(std::size_t index) const
{
return state_[(index_ + index + r) % r];
}
UIntType state_[r];
std::size_t index_;
};
namespace Detail {
/**
* #name Base definitions with pluggable tempering method
* #{
*/
template <typename Tempering>
using Well512a_base = Well<
std::uint32_t, 32, 16, 0, 13, 9, 5, M3<-16>, M3<-15>, M3<11>, M0, M3<-2>, M3<-18>, M2<-28>,
M5<-5, 0xda442d24>, Tempering>;
template <typename Tempering>
using Well521a_base = Well<
std::uint32_t, 32, 17, 23, 13, 11, 10, M3<-13>, M3<-15>, M1, M2<-21>,
M3<-13>, M2<1>, M0, M3<11>, Tempering>;
template <typename Tempering>
using Well521b_base = Well<
std::uint32_t, 32, 17, 23, 11, 10, 7, M3<-21>, M3<6>, M0, M3<-13>, M3<13>,
M2<-10>, M2<-5>, M3<13>, Tempering>;
template <typename Tempering>
using Well607a_base = Well<
std::uint32_t, 32, 19, 1, 16, 15, 14, M3<19>, M3<11>, M3<-14>, M1, M3<18>,
M1, M0, M3<-5>, Tempering>;
template <typename Tempering>
using Well607b_base = Well<
std::uint32_t, 32, 19, 1, 16, 18, 13, M3<-18>, M3<-14>, M0, M3<18>,
M3<-24>, M3<5>, M3<-1>, M0, Tempering>;
template <typename Tempering>
using Well800a_base = Well<
std::uint32_t, 32, 25, 0, 14, 18, 17, M1, M3<-15>, M3<10>, M3<-11>, M3<16>,
M2<20>, M1, M3<-28>, Tempering>;
template <typename Tempering>
using Well800b_base = Well<
std::uint32_t, 32, 25, 0, 9, 4, 22, M3<-29>, M2<-14>, M1, M2<19>, M1,
M3<10>, M4<0xd3e43ffd>, M3<-25>, Tempering>;
template <typename Tempering>
using Well1024a_base = Well<
std::uint32_t, 32, 32, 0, 3, 24, 10, M1, M3<8>, M3<-19>, M3<-14>, M3<-11>,
M3<-7>, M3<-13>, M0, Tempering>;
template <typename Tempering>
using Well1024b_base = Well<
std::uint32_t, 32, 32, 0, 22, 25, 26, M3<-21>, M3<17>, M4<0x8bdcb91e>,
M3<15>, M3<-14>, M3<-21>, M1, M0, Tempering>;
template <typename Tempering>
using Well19937a_base = Well<
std::uint32_t, 32, 624, 31, 70, 179, 449, M3<-25>, M3<27>, M2<9>, M3<1>,
M1, M3<-9>, M3<-21>, M3<21>, Tempering>;
template <typename Tempering>
using Well19937b_base = Well<
std::uint32_t, 32, 624, 31, 203, 613, 123, M3<7>, M1, M3<12>, M3<-10>,
M3<-19>, M2<-11>, M3<4>, M3<-10>, Tempering>;
template <typename Tempering>
using Well21701a_base = Well<
std::uint32_t, 32, 679, 27, 151, 327, 84, M1, M3<-26>, M3<19>, M0, M3<27>,
M3<-11>, M6<32, 15, 0x86a9d87e, 0xffffffef, 0x00200000>, M3<-16>,
Tempering>;
template <typename Tempering>
using Well23209a_base = Well<
std::uint32_t, 32, 726, 23, 667, 43, 462, M3<28>, M1, M3<18>, M3<3>,
M3<21>, M3<-17>, M3<-28>, M3<-1>, Tempering>;
template <typename Tempering>
using Well23209b_base = Well<
std::uint32_t, 32, 726, 23, 610, 175, 662, M4<0xa8c296d1>, M1, M6<32, 15,
0x5d6b45cc, 0xfffeffff, 0x00000002>, M3<-24>, M3<-26>, M1, M0, M3<16>,
Tempering>;
template <typename Tempering>
using Well44497a_base = Well<
std::uint32_t, 32, 1391, 15, 23, 481, 229, M3<-24>, M3<30>, M3<-10>,
M2<-26>, M1, M3<20>, M6<32, 9, 0xb729fcec, 0xfbffffff, 0x00020000>, M1, Tempering>;
//! #}
} // namespace Detail
using Well512a = Detail::Well512a_base<Detail::NoTempering>;
using Well521a = Detail::Well521a_base<Detail::NoTempering>;
using Well521b = Detail::Well521b_base<Detail::NoTempering>;
using Well607a = Detail::Well607a_base<Detail::NoTempering>;
using Well607b = Detail::Well607b_base<Detail::NoTempering>;
using Well800a = Detail::Well800a_base<Detail::NoTempering>;
using Well800b = Detail::Well800b_base<Detail::NoTempering>;
using Well1024a = Detail::Well1024a_base<Detail::NoTempering>;
using Well1024b = Detail::Well1024b_base<Detail::NoTempering>;
using Well19937a = Detail::Well19937a_base<Detail::NoTempering>;
using Well19937b = Detail::Well19937b_base<Detail::NoTempering>;
using Well19937c = Detail::Well19937a_base<Detail::MatsumotoKuritaTempering<0xe46e1700, 0x9b868000>>;
using Well21701a = Detail::Well21701a_base<Detail::NoTempering>;
using Well23209a = Detail::Well23209a_base<Detail::NoTempering>;
using Well23209b = Detail::Well23209b_base<Detail::NoTempering>;
using Well44497a = Detail::Well44497a_base<Detail::NoTempering>;
using Well44497b = Detail::Well44497a_base<Detail::MatsumotoKuritaTempering<0x93dd1400, 0xfa118000>>;
/**
* #name Maximally equidistributed versions using Harase's tempering method
* #{
*/
using Well800a_ME = Detail::Well800a_base<Detail::HaraseTempering<0x4880>>;
using Well800b_ME = Detail::Well800b_base<Detail::HaraseTempering<0x17030806>>;
using Well19937a_ME = Detail::Well19937a_base<Detail::HaraseTempering<0x4118000>>;
using Well19937b_ME = Detail::Well19937b_base<Detail::HaraseTempering<0x30200010>>;
using Well21701a_ME = Detail::Well21701a_base<Detail::HaraseTempering<0x1002>>;
using Well23209a_ME = Detail::Well23209a_base<Detail::HaraseTempering<0x5100000>>;
using Well23209b_ME = Detail::Well23209b_base<Detail::HaraseTempering<0x34000300>>;
using Well44497a_ME = Detail::Well44497a_base<Detail::HaraseTempering<0x48000000>>;
//! #}
#endif // WELL_HPP
// Copyright (c) Sergiu Dotenco 2010
// Copyright (c) Seth Heeren - made independent of BOOST using C++11 - 2017
//
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
/**
* #brief WELL PRNG implementation unit test.
* #file welltest.cpp
*/
#include <algorithm>
#include <memory>
#include <iostream>
// #include "well.hpp"
#define MESSAGE_PREAMBLE() (std::cerr << __FILE__ << ":" << __LINE__ << " ")
#define CHECK_EQUAL(expected,actual) do { \
auto&& _e = expected; \
auto&& _a = actual; \
if (_e != _a) \
MESSAGE_PREAMBLE() << "expected:" << #expected << " = " << _e << "\n" \
<< "\tactual:" << #actual << " = " << _a << "\n"; \
} while (0)
#define CHECK(test) do { if (!(test)) MESSAGE_PREAMBLE() << #test << "\n"; } while (0)
/**
* #brief Generic WELL test case.
*
* The test case performs the following checks:
* -# The last generated value is equal to the value generated by the reference
* implementation after #f$10^9#f$ iterations. The generator is seeded using
* an array filled with 1s.
* -# The #c min and #c max methods of the #ref Well generator return 0 and
* #f$2^{32}-1#f$ respectively.
*
* #tparam RandomNumberGenerator WELL PRNG implementation type.
* #tparam Expected The expected result after #f$10^9#f$ iterations.
*/
template
<
class RandomNumberGenerator,
typename RandomNumberGenerator::result_type Expected
>
class WellTestCase
{
RandomNumberGenerator rng;
typedef typename RandomNumberGenerator::result_type result_type;
result_type generate()
{
unsigned state[RandomNumberGenerator::state_size];
std::uninitialized_fill_n(state, RandomNumberGenerator::state_size, 1);
unsigned* p = state;
rng.seed(p, p + RandomNumberGenerator::state_size);
result_type x = 0;
int iterations = 1000000000;
while (iterations-- > 0)
x = rng();
return x;
}
public:
static void run()
{
WellTestCase c;
CHECK_EQUAL(Expected, c.generate());
CHECK_EQUAL(0U, c.rng.min());
CHECK_EQUAL(~0U, c.rng.max());
CHECK_EQUAL(c.rng, c.rng);
CHECK(c.rng == c.rng);
}
};
/**
* #brief Defines the actual test case.
*
* #param name The name of the test case.
* #param type WELL pseudo-random generator type.
* #param expected The expected result after #f$10^9#f$ iterations.
*
* #hideinitializer
*/
int main() {
CHECK_EQUAL(16, Detail::shift<2>(64));
CHECK_EQUAL(64, Detail::shift<-2>(16));
CHECK_EQUAL(32, Detail::shift<0>(32));
CHECK(Detail::is_powerof2(512u));
CHECK(not Detail::is_powerof2(0u));
WellTestCase<Well512a, 0x2b3fe99e>::run();
#ifndef COLIRU // stay in execution time limits
WellTestCase<Well521a, 0xc9878363>::run();
WellTestCase<Well521b, 0xb75867f6>::run();
WellTestCase<Well607a, 0x7b5043ea>::run();
WellTestCase<Well607b, 0xaedee7da>::run();
WellTestCase<Well800a, 0x2bfe686f>::run();
WellTestCase<Well800b, 0xf009e1bd>::run();
WellTestCase<Well1024a, 0xd07f528c>::run();
WellTestCase<Well1024b, 0x867f7993>::run();
WellTestCase<Well19937a, 0xb33a2cd5>::run();
WellTestCase<Well19937b, 0x191de86a>::run();
WellTestCase<Well19937c, 0x243eaed5>::run();
WellTestCase<Well21701a, 0x7365a269>::run();
WellTestCase<Well23209a, 0x807dacb >::run();
WellTestCase<Well23209b, 0xf1a77751>::run();
WellTestCase<Well44497a, 0xfdd7c07b>::run();
WellTestCase<Well44497b, 0x9406547b>::run();
#endif
}

Using C++17, this code becomes way simpler and error messages are friendlier on the eye.
This is a sample implementation of Power2Modulo:
#include <type_traits>
template<class UIntType, UIntType r>
struct Power2Modulo
{
static_assert(std::is_unsigned_v<UIntType>);
static_assert((r & (r - 1)) == 0,
"The second parameter of this struct is required to be a power of 2");
template<class T>
[[nodiscard]] static constexpr T calc(T value)
{
return value & (r - 1);
}
};
You can use it like this:
int main()
{
/* This code fails to compile with friendly error message
Power2Modulo<unsigned, 12> x;
*/
// Using the static function
using Mod16 = Power2Modulo<unsigned, 16>;
static_assert(Mod16::calc(15) == 15);
static_assert(Mod16::calc(16) == 0);
static_assert(Mod16::calc(17) == 1);
// Using it like a member function
Power2Modulo<unsigned, 4> mod4;
static_assert(mod4.calc(15) == 3);
static_assert(mod4.calc(16) == 0);
static_assert(mod4.calc(17) == 1);
}
Tested with clang-6 and gcc-8 and VisualC++ (via http://webcompiler.cloudapp.net/).
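One detail worth noting: the expression (r & (r - 1)) == 0 is also true for r == 0, so the static_assert above accepts Power2Modulo<unsigned, 0>. If that matters, the check can be pulled out into a C++17 variable template that plays the role of the MPL IsPowerOfTwo expression from the question. This is a sketch under my own naming (sketch, is_power_of_two_v), not something from the original library:

```cpp
#include <type_traits>

namespace sketch {

// Stand-in for the MPL IsPowerOfTwo expression; also rejects 0.
template <auto r>
inline constexpr bool is_power_of_two_v = r != 0 && (r & (r - 1)) == 0;

template <class UIntType, UIntType r>
struct Power2Modulo
{
    static_assert(std::is_unsigned_v<UIntType>, "UIntType must be unsigned");
    static_assert(is_power_of_two_v<r>, "r must be a non-zero power of 2");

    template <class T>
    [[nodiscard]] static constexpr T calc(T value)
    {
        return value & (r - 1);
    }
};

static_assert(is_power_of_two_v<16u>, "16 is a power of two");
static_assert(!is_power_of_two_v<12u>, "12 is not a power of two");
static_assert(!is_power_of_two_v<0u>, "0 is rejected");
static_assert(Power2Modulo<unsigned, 8>::calc(13u) == 5u, "13 mod 8 is 5");

} // namespace sketch
```

Neither the variable template nor the struct needs to be instantiated as an object; both are used purely through their compile-time values and static members.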

Related

merging multiple arrays using boost::join

Is it a good idea to use boost::join to access and change the values of different arrays?
I have defined a member array inside the class element.
class element
{
public:
element();
int* get_arr();
private:
int m_arr[4];
};
Elsewhere, I access these arrays, join them together using boost::join, and change the array values.
//std::vector<element> elem;
auto temp1 = boost::join(elem[0].get_arr(),elem[1].get_arr());
auto joined_arr = boost::join(temp1,elem[2].get_arr());
//now going to change the values of the sub array
for(auto& it:joined_arr)
{
it+= sample[i];
i++;
}
Is it a good idea to modify the values of the arrays in the class as above?

In your code you probably want to join the 4-element arrays. To do that, change the signature of get_arr to:
typedef int array[4];
array& get_arr() { return m_arr; }
That way the array size does not get lost.
Performance-wise, there is a non-zero cost for accessing elements through the joined view. A double for loop is going to be the most efficient, and easily readable too, e.g.:
for(auto& e : elem)
for(auto& a : e.get_arr())
a += sample[i++];
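Put together, the two suggestions above could look like the following self-contained sketch. The function name add_samples, the namespace sketch, and the assumption that the caller passes at least four samples per element are mine:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

class element
{
public:
    using array = int[4];
    array& get_arr() { return m_arr; }   // returning a reference keeps the size

private:
    int m_arr[4] = {};
};

// Adds consecutive samples to every slot of every element, in order.
// Assumes samples holds at least 4 * elems.size() values.
inline std::size_t add_samples(std::vector<element>& elems,
                               std::vector<int> const& samples)
{
    std::size_t i = 0;
    for (auto& e : elems)
        for (auto& a : e.get_arr()) {
            assert(i < samples.size());   // caller must supply enough samples
            a += samples[i++];
        }
    return i;   // number of samples consumed
}

} // namespace sketch
```

Because get_arr returns int(&)[4] rather than int*, the inner range-based for loop knows the array bounds without any extra bookkeeping.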

boost::join returns a more complicated type with every composition step. At some point you might exceed the compiler's limits on inlining, so that you're going to have a runtime cost.
Thinking outside the box, it really looks like you are creating a buffer abstraction that allows you to do scatter/gather-like IO with few allocations.
As it happens, Boost Asio has nice abstractions for this, and you could use that: http://www.boost.org/doc/libs/1_66_0/doc/html/boost_asio/reference/MutableBufferSequence.html
As I found out in an earlier iteration of this answer, that abstraction sadly only works for buffers accessed through native char-type elements. That was no good.
So, in this rewrite I present a similar abstraction, which consists of nothing more than a "hierarchical iterator" that knows how to iterate a sequence of "buffers" (in this implementation, any range will do).
You can choose to operate on a sequence of ranges directly, e.g.:
std::vector<element> seq(3); // tie 3 elements together as buffer sequence
element& b = seq[1];
Or, without any further change, by reference:
element a, b, c;
std::vector<std::reference_wrapper<element> > seq {a,b,c}; // tie 3 elements together as buffer sequence
The C++ version presented at the bottom demonstrates this approach.
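As a small illustration of the by-reference variant, the sketch below shows that writing through a std::vector of std::reference_wrapper modifies the original objects rather than copies. The simplified element struct and the function name write_through_references are assumptions made for this example only:

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace sketch {

// Simplified stand-in for the element class: a public 4-int array.
struct element
{
    int values[4] = {};
};

// Ties three local elements together by reference, adds 3 to every slot
// through the sequence, and returns b's first slot afterwards.
inline int write_through_references()
{
    element a, b, c;
    std::vector<std::reference_wrapper<element>> seq{a, b, c};

    for (element& e : seq)        // reference_wrapper converts to element&
        for (int& v : e.values)
            v += 3;

    assert(&seq[1].get() == &b);  // the sequence refers to b itself, not a copy
    return b.values[0];
}

} // namespace sketch
```

With a plain std::vector<element> the sequence would own its elements instead, which is the first of the two options shown above.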
The Iterator Implementation
I've used Boost Range and Boost Iterator:
template <typename Seq,
typename WR = typename Seq::value_type,
typename R = typename detail::unwrap<WR>::type,
typename V = typename boost::range_value<R>::type
>
struct sequence_iterator : boost::iterator_facade<sequence_iterator<Seq,WR,R,V>, V, boost::forward_traversal_tag> {
using OuterIt = typename boost::range_iterator<Seq>::type;
using InnerIt = typename boost::range_iterator<R>::type;
// state
Seq& _seq;
OuterIt _ocur, _oend;
InnerIt _icur, _iend;
static sequence_iterator begin(Seq& seq) { return {seq, boost::begin(seq), boost::end(seq)}; }
static sequence_iterator end(Seq& seq) { return {seq, boost::end(seq), boost::end(seq)}; }
// the 3 facade operations
bool equal(sequence_iterator const& rhs) const {
return ((_ocur==_oend) && (rhs._ocur==rhs._oend))
|| (std::addressof(_seq) == std::addressof(rhs._seq) &&
_ocur == rhs._ocur && _oend == rhs._oend &&
_icur == rhs._icur && _iend == rhs._iend);
}
void increment() {
if (++_icur == _iend) {
++_ocur;
setup();
}
}
V& dereference() const {
assert(_ocur != _oend);
assert(_icur != _iend);
return *_icur;
}
private:
void setup() { // to be called after entering a new sub-range in the sequence
while (_ocur != _oend) {
_icur = boost::begin(detail::get(*_ocur));
_iend = boost::end(detail::get(*_ocur));
if (_icur != _iend)
break;
++_ocur; // skip over empty sub-ranges, this enables simple increment() logic
}
}
sequence_iterator(Seq& seq, OuterIt cur, OuterIt end)
: _seq(seq), _ocur(cur), _oend(end) { setup(); }
};
That's basically the same kind of iterator as boost::asio::buffers_iterator but it doesn't assume an element type. Now, creating sequence_iterators for any sequence of ranges is as simple as:
template <typename Seq> auto buffers_begin(Seq& seq) { return sequence_iterator<Seq>::begin(seq); }
template <typename Seq> auto buffers_end(Seq& seq) { return sequence_iterator<Seq>::end(seq); }
Implementing Your Test Program
// DEMO
struct element {
int peek_first() const { return m_arr[0]; }
auto begin() const { return std::begin(m_arr); }
auto end() const { return std::end(m_arr); }
auto begin() { return std::begin(m_arr); }
auto end() { return std::end(m_arr); }
private:
int m_arr[4] { };
};
namespace boost { // range adapt
template <> struct range_iterator<element> { using type = int*; };
// not used, but for completeness:
template <> struct range_iterator<element const> { using type = int const*; };
template <> struct range_const_iterator<element> : range_iterator<element const> {};
}
#include <algorithm>
#include <iostream>
#include <vector>
template <typename Output, typename Input, typename Operation>
size_t process(Output& output, Input const& input, Operation op) {
auto ib = boost::begin(input), ie = boost::end(input);
auto ob = boost::begin(output), oe = boost::end(output);
size_t n = 0;
for (;ib!=ie && ob!=oe; ++n) {
op(*ob++, *ib++);
}
return n;
}
int main() {
element a, b, c;
std::vector<std::reference_wrapper<element> > seq {a,b,c}; // tie 3 elements together as buffer sequence
//// Also supported, container of range objects directly:
// std::vector<element> seq(3); // tie 3 elements together as buffer sequence
// element& b = seq[1];
std::vector<int> const samples {
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32
};
using boost::make_iterator_range;
auto input = make_iterator_range(samples);
auto output = make_iterator_range(buffers_begin(seq), buffers_end(seq));
while (auto n = process(output, input, [](int& el, int sample) { el += sample; })) {
std::cout << "Copied " << n << " samples, b starts with " << b.peek_first() << "\n";
input.advance_begin(n);
}
}
Prints
Copied 12 samples, b starts with 5
Copied 12 samples, b starts with 22
Copied 8 samples, b starts with 51
Full Listing, C++11 Compatible
Live On Coliru
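#include <boost/iterator/iterator_facade.hpp>
#include <boost/range/iterator_range.hpp>
#include <cassert>    // assert
#include <functional> // std::reference_wrapper
#include <iterator>   // std::begin, std::end
#include <memory>     // std::addressof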
namespace detail {
template<typename T> constexpr T& get(T &t) { return t; }
template<typename T> constexpr T const& get(T const &t) { return t; }
template<typename T> constexpr T& get(std::reference_wrapper<T> rt) { return rt; }
template <typename T> struct unwrap { using type = T; };
template <typename T> struct unwrap<std::reference_wrapper<T> > { using type = T; };
}
template <typename Seq,
typename WR = typename Seq::value_type,
typename R = typename detail::unwrap<WR>::type,
typename V = typename boost::range_value<R>::type
>
struct sequence_iterator : boost::iterator_facade<sequence_iterator<Seq,WR,R,V>, V, boost::forward_traversal_tag> {
using OuterIt = typename boost::range_iterator<Seq>::type;
using InnerIt = typename boost::range_iterator<R>::type;
// state
Seq& _seq;
OuterIt _ocur, _oend;
InnerIt _icur, _iend;
static sequence_iterator begin(Seq& seq) { return {seq, boost::begin(seq), boost::end(seq)}; }
static sequence_iterator end(Seq& seq) { return {seq, boost::end(seq), boost::end(seq)}; }
// the 3 facade operations
bool equal(sequence_iterator const& rhs) const {
return ((_ocur==_oend) && (rhs._ocur==rhs._oend))
|| (std::addressof(_seq) == std::addressof(rhs._seq) &&
_ocur == rhs._ocur && _oend == rhs._oend &&
_icur == rhs._icur && _iend == rhs._iend);
}
void increment() {
if (++_icur == _iend) {
++_ocur;
setup();
}
}
V& dereference() const {
assert(_ocur != _oend);
assert(_icur != _iend);
return *_icur;
}
private:
void setup() { // to be called after entering a new sub-range in the sequence
while (_ocur != _oend) {
_icur = boost::begin(detail::get(*_ocur));
_iend = boost::end(detail::get(*_ocur));
if (_icur != _iend)
break;
++_ocur; // skid over, this enables simple increment() logic
}
}
sequence_iterator(Seq& seq, OuterIt cur, OuterIt end)
: _seq(seq), _ocur(cur), _oend(end) { setup(); }
};
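// explicit return types: deduced `auto` return types would require C++14
template <typename Seq> sequence_iterator<Seq> buffers_begin(Seq& seq) { return sequence_iterator<Seq>::begin(seq); }
template <typename Seq> sequence_iterator<Seq> buffers_end(Seq& seq) { return sequence_iterator<Seq>::end(seq); }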
// DEMO
struct element {
int peek_first() const { return m_arr[0]; }
int const* begin() const { return std::begin(m_arr); }
int const* end() const { return std::end(m_arr); }
int* begin() { return std::begin(m_arr); }
int* end() { return std::end(m_arr); }
private:
int m_arr[4] { };
};
namespace boost { // range adapt
template <> struct range_iterator<element> { using type = int*; };
// not used, but for completeness:
template <> struct range_iterator<element const> { using type = int const*; };
template <> struct range_const_iterator<element> : range_iterator<element const> {};
}
#include <algorithm>
#include <iostream>
#include <vector>
template <typename Output, typename Input, typename Operation>
size_t process(Output& output, Input const& input, Operation op) {
auto ib = boost::begin(input), ie = boost::end(input);
auto ob = boost::begin(output), oe = boost::end(output);
size_t n = 0;
for (;ib!=ie && ob!=oe; ++n) {
op(*ob++, *ib++);
}
return n;
}
int main() {
element a, b, c;
std::vector<std::reference_wrapper<element> > seq {a,b,c}; // tie 3 elements together as buffer sequence
//// Also supported, container of range objects directly:
// std::list<element> seq(3); // tie 3 elements together as buffer sequence
// element& b = seq[1];
std::vector<int> const samples {
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32
};
using boost::make_iterator_range;
auto input = make_iterator_range(samples);
auto output = make_iterator_range(buffers_begin(seq), buffers_end(seq));
while (auto n = process(output, input, [](int& el, int sample) { el += sample; })) {
std::cout << "Copied " << n << " samples, b starts with " << b.peek_first() << "\n";
input.advance_begin(n);
}
}
¹ I'm ignoring the compile-time cost and the lurking dangers with the pattern of auto x = complicated_range_composition when that complicated range composition contains references to temporaries: this is a frequent source of UB bugs.
² which have been adopted by various other libraries, such as Boost Beast and Boost Process, and appear in the Networking TS (which was proposed for C++20 but not merged into it): Header <experimental/buffer> synopsis (PDF)

Eigen - Concatenated matrix as a reference

The following code concatenates a vector of ones to a matrix:
using Eigen::VectorXd;
using Eigen::MatrixXd;
MatrixXd cbind1(const Eigen::Ref<const MatrixXd> X) {
const unsigned int n = X.rows();
MatrixXd Y(n, 1 + X.cols());
Y << VectorXd::Ones(n), X;
return Y;
}
The function copies the contents of X to Y. How do I define the function so that it avoids doing the copy and returns a reference to a matrix containing VectorXd::Ones(n), X?
Thanks.

If you had followed and read ggael's answer to your previous question (and it appears you did, as you accepted his answer), you would have read this page of the docs. By modifying the example slightly, you could have written as part of an MCVE:
#include <iostream>
#include <Eigen/Core>
using namespace Eigen;
template<class ArgType>
struct ones_col_helper {
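typedef Matrix<typename ArgType::Scalar,
ArgType::RowsAtCompileTime,
// one extra column for the ones; stays Dynamic when the argument is dynamic
ArgType::ColsAtCompileTime == Dynamic ? Dynamic : ArgType::ColsAtCompileTime + 1,
ColMajor,
ArgType::MaxRowsAtCompileTime,
ArgType::MaxColsAtCompileTime == Dynamic ? Dynamic : ArgType::MaxColsAtCompileTime + 1> MatrixType;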
};
template<class ArgType>
class ones_col_functor
{
const typename ArgType::Nested m_mat;
public:
ones_col_functor(const ArgType& arg) : m_mat(arg) {};
const typename ArgType::Scalar operator() (Index row, Index col) const {
if (col == 0) return typename ArgType::Scalar(1);
return m_mat(row, col - 1);
}
};
template <class ArgType>
CwiseNullaryOp<ones_col_functor<ArgType>, typename ones_col_helper<ArgType>::MatrixType>
cbind1(const Eigen::MatrixBase<ArgType>& arg)
{
typedef typename ones_col_helper<ArgType>::MatrixType MatrixType;
return MatrixType::NullaryExpr(arg.rows(), arg.cols()+1, ones_col_functor<ArgType>(arg.derived()));
}
int main()
{
MatrixXd mat(4, 4);
mat << 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16;
auto example = cbind1(mat);
std::cout << example << std::endl;
return 0;
}

function as template parameter : if(T receive 2 param)T(a,b); else T(a);

How can I make the template class Collection<K,T> accept a function T, with signature either T(K) or T(K,int), as a template argument, and then compile conditionally based on the signature of that function?
Here is the existing code, which accepts only one signature: Collection<K,HashFunction(K)>.
template<typename AA> using HashFunction= HashStruct& (*)(AA );
/** This class is currently used in so many places in codebase. */
template<class K,HashFunction<K> T> class Collection{
void testCase(){
K k=K();
HashStruct& hh= T(k); /*Collection1*/
//.... something complex ...
}
};
I want it to also support Collection<K,HashFunction(K,int)>.
template<class K,HashFunction<K> T /* ??? */> class Collection{
int indexHash=1245323;
void testCase(){
K k=K();
if(T receive 2 parameter){ // ???
HashStruct& hh=T(k,this->indexHash); /*Collection2*/ // ???
//^ This is the heart of what I really want to achieve.
//.... something complex (same) ...
}else{
HashStruct& hh=T(k); /*Collection1*/
//.... something complex (same) ...
}
}
};
Do I have no choice but to create two different classes, Collection1 and Collection2?
Answers that need more than C++11 are OK but less preferable.
I feel that it might be solvable with a "default parameter" trick.

Variadic templates, partial specialization and SFINAE can help you.
If you are willing to duplicate the test() method, you can do something like this:
#include <cstddef>
#include <iostream>
#include <type_traits>
using HashStruct = std::size_t;
template<typename ... AA>
using HashFunction = HashStruct & (*)(AA ... );
HashStruct & hf1 (std::size_t s)
{ static HashStruct val {0U}; return val = s; }
HashStruct & hf2 (std::size_t s, int i)
{ static HashStruct val {0U}; return val = s + std::size_t(i); }
template <typename Tf, Tf F>
class Collection;
template <typename K, typename ... I, HashFunction<K, I...> F>
class Collection<HashFunction<K, I...>, F>
{
public:
template <std::size_t N = sizeof...(I)>
typename std::enable_if<N == 0U, void>::type test ()
{
K k=K();
HashStruct & hh = F(k);
std::cout << "case 0 (" << hh << ")" << std::endl;
}
template <std::size_t N = sizeof...(I)>
typename std::enable_if<N == 1U, void>::type test ()
{
K k=K();
HashStruct & hh = F(k, 100);
std::cout << "case 1 (" << hh << ")" << std::endl;
}
};
int main ()
{
Collection<HashFunction<std::size_t>, hf1> c1;
Collection<HashFunction<std::size_t, int>, hf2> c2;
c1.test(); // print "case 0 (0)"
c2.test(); // print "case 1 (100)"
}
But if you can pass the extra argument to test(), you don't need SFINAE: a single test() method is enough and everything becomes simpler.
#include <cstddef>
#include <iostream>
using HashStruct = std::size_t;
template<typename ... AA>
using HashFunction = HashStruct & (*)(AA ... );
HashStruct & hf1 (std::size_t s)
{ static HashStruct val {0U}; return val = s; }
HashStruct & hf2 (std::size_t s, int i)
{ static HashStruct val {0U}; return val = s + std::size_t(i); }
template <typename Tf, Tf F>
class Collection;
template <typename K, typename ... I, HashFunction<K, I...> F>
class Collection<HashFunction<K, I...>, F>
{
public:
void test (I ... i)
{
K k=K();
HashStruct & hh = F(k, i...);
std::cout << hh << std::endl;
}
};
int main ()
{
Collection<HashFunction<std::size_t>, hf1> c1;
Collection<HashFunction<std::size_t, int>, hf2> c2;
c1.test(); // print "0"
c2.test(100); // print "100"
}
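If C++17 is acceptable, the "if T receives 2 parameters" branch from the question can be written almost literally. The following is only a sketch under that assumption: it uses an `auto` non-type template parameter and `std::is_invocable_v`, and the names `hash_one` and `hash_two` are hypothetical stand-ins for your real hash functions.

```cpp
#include <cstddef>
#include <type_traits>

using HashStruct = std::size_t;

// Hypothetical hash functions with the two signatures from the question.
HashStruct& hash_one(std::size_t key) {
    static HashStruct value{0};
    return value = key;
}
HashStruct& hash_two(std::size_t key, int salt) {
    static HashStruct value{0};
    return value = key + static_cast<std::size_t>(salt);
}

// F is any function pointer; its signature is inspected at compile time.
template <typename K, auto F>
class Collection {
    int indexHash = 1245323;

public:
    HashStruct& testCase() {
        K k = K();
        if constexpr (std::is_invocable_v<decltype(F), K, int>) {
            return F(k, indexHash);  // two-parameter form
        } else {
            return F(k);             // one-parameter form
        }
    }
};
```

Usage would look like `Collection<std::size_t, hash_two> c; c.testCase();`. Only the branch that matches the signature is instantiated, so the shared "something complex" code can follow the `if constexpr` once instead of being duplicated.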

CUDA Thrust - Counting matching subarrays

I'm trying to figure out if it's possible to efficiently calculate the conditional entropy of a set of numbers using CUDA. You can calculate the conditional entropy by dividing an array into windows, then counting the number of matching subarrays/substrings for different lengths. For each subarray length, you calculate the entropy by adding together the matching subarray counts times the log of those counts. Then, whatever you get as the minimum entropy is the conditional entropy.
To give a clearer example of what I mean, here is a full calculation:
The initial array is [1,2,3,5,1,2,5]. Assuming the window size is 3, this must be divided into five windows: [1,2,3], [2,3,5], [3,5,1], [5,1,2], and [1,2,5].
Next, looking at each window, we want to find the matching subarrays for each length.
The subarrays of length 1 are [1],[2],[3],[5],[1]. There are two 1s, and one of each other number. So the entropy is log(2)*2 + 3*(log(1)*1) ≈ 0.6.
The subarrays of length 2 are [1,2], [2,3], [3,5], [5,1], and [1,2]. There are two [1,2]s and three other subarrays that occur once each. The entropy is the same as for length 1: log(2)*2 + 3*(log(1)*1) ≈ 0.6.
The subarrays of length 3 are the full windows: [1,2,3], [2,3,5], [3,5,1], [5,1,2], and [1,2,5]. All five windows are unique, so the entropy is 5*(log(1)*1) = 0.
The minimum entropy is 0, meaning it is the conditional entropy for this array.
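For reference, the calculation above can be sketched on the CPU as follows. This is a minimal, unoptimized sketch, assuming base-10 logarithms (which is what gives 0.6 for log(2)*2) and assuming that "subarrays of length n" means the first n elements of each window; `prefix_score` and `conditional_entropy` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Score for one prefix length: sum over distinct prefixes of count * log10(count).
double prefix_score(const std::vector<int>& data, std::size_t window, std::size_t len) {
    assert(len >= 1 && len <= window);
    std::map<std::vector<int>, std::size_t> counts;
    // Slide a window over the data and count the first `len` elements of each window.
    for (std::size_t start = 0; start + window <= data.size(); ++start) {
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(start);
        ++counts[std::vector<int>(first, first + static_cast<std::ptrdiff_t>(len))];
    }
    double score = 0.0;
    for (const auto& entry : counts) {
        const double c = static_cast<double>(entry.second);
        score += c * std::log10(c);
    }
    return score;
}

// Minimum score over all prefix lengths 1..window.
double conditional_entropy(const std::vector<int>& data, std::size_t window) {
    assert(window >= 1);
    double best = prefix_score(data, window, 1);
    for (std::size_t len = 2; len <= window; ++len) {
        best = std::min(best, prefix_score(data, window, len));
    }
    return best;
}
```

Calling `conditional_entropy({1,2,3,5,1,2,5}, 3)` walks through the same three levels as the example. A version like this can serve as a reference to check a GPU implementation against.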
This can also be presented as a tree, where the counts at each node represent how many matches exist. The entropy for each subarray length is equivalent to the entropy for each level of the tree.
If possible, I'd like to perform this calculation on many arrays at once, and also perform the calculation itself in parallel. Does anyone have suggestions on how to accomplish this? Could thrust be useful? Please let me know if there is any additional information I should provide.

I tried solving your problem using thrust. It works, but it results in a lot of thrust calls.
Since your input size is rather small, you should process multiple arrays in parallel.
However, doing this results in a lot of book-keeping effort, you will see this in the following code.
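Your input range is limited to [1,5], which is equivalent to [0,4]. The general idea is that (theoretically) any tuple out of this range (e.g. {1,2,3}) can be represented as a single number by treating its elements as base-4 digits (e.g. 1+2*4+3*16 = 57). Note that this encoding is only collision-free while every element is smaller than the base, so a range with five distinct values would strictly need base 5; the example and code below use base 4.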
In practice we are limited by the size of the integer type. For a 32bit unsigned integer this will lead to a maximum tuple size of 16. This is also the maximum window size the following code can handle (changing to a 64bit unsigned integer will lead to a maximum tuple size of 32).
Let's assume the input data is structured like this:
We have 2 arrays we want to process in parallel, each array is of size 5 and window size is 3.
{{0,0,3,4,4},{0,2,1,1,3}}
We can now generate all windows:
{{0,0,3},{0,3,4},{3,4,4}},{{0,2,1},{2,1,1},{1,1,3}}
Using a per tuple prefix sum and applying the aforementioned representation of each tuple as a single base-4 number, we get:
{{0,0,48},{0,12,76},{3,19,83}},{{0,8,24},{2,6,22},{1,5,53}}
Now we reorder the values so we have the numbers which represent a subarray of a specific length next to each other:
{{0,0,3},{0,12,19},{48,76,83}},{{0,2,1},{8,6,5},{24,22,53}}
We then sort within each group:
{{0,0,3},{0,12,19},{48,76,83}},{{0,1,2},{5,6,8},{22,24,53}}
Now we can count how often a number occurs in each group:
2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
Applying the log-formula results in
0.60206,0,0,0,0,0
Now we fetch the minimum value per array:
0,0
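The "per tuple prefix sum" step is the least obvious one, so here it is in isolation as a plain host-side sketch (not the Thrust version). The helper name `prefix_keys` is hypothetical; it returns, for one window, the running base-n number after each element.

```cpp
#include <cstdint>
#include <vector>

// For a window {d0, d1, d2, ...} returns {d0, d0 + d1*base, d0 + d1*base + d2*base^2, ...}.
// Entry j identifies the sub-array made of the first j+1 elements of the window.
std::vector<std::uint32_t> prefix_keys(const std::vector<std::uint32_t>& window,
                                       std::uint32_t base) {
    std::vector<std::uint32_t> keys;
    keys.reserve(window.size());
    std::uint32_t weight = 1;   // base^position
    std::uint32_t running = 0;  // inclusive prefix sum of digit * weight
    for (std::uint32_t digit : window) {
        running += digit * weight;
        keys.push_back(running);
        weight *= base;
    }
    return keys;
}
```

With base 4, the window {0,3,4} gives {0,12,76} and {1,1,3} gives {1,5,53}, matching the values listed above. In the Thrust code the same result comes from an inclusive scan by key over the per-element products.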
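#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <iostream>
#include <iterator>
#include <thrust/tuple.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/gather.h>
#include <thrust/sort.h>
#include <math.h>
#include <chrono>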
#ifdef PRINT_ENABLED
#define PRINTER(name) print(#name, (name))
#else
#define PRINTER(name)
#endif
template <template <typename...> class V, typename T, typename ...Args>
void print(const char* name, const V<T,Args...> & v)
{
std::cout << name << ":\t";
thrust::copy(v.begin(), v.end(), std::ostream_iterator<T>(std::cout, "\t"));
std::cout << std::endl;
}
template <typename Integer, Integer Min, Integer Max>
struct random_filler
{
__device__
Integer operator()(std::size_t index) const
{
thrust::default_random_engine rng;
thrust::uniform_int_distribution<Integer> dist(Min, Max);
rng.discard(index);
return dist(rng);
}
};
template <std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
typename T,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
__host__ __device__ __inline__ // also called from __host__ __device__ functors below
thrust::tuple<T,T,T,T> calc_indices(const T& i0)
{
const T i1 = i0 / PerArrayCount;
const T i2 = i0 % PerArrayCount;
const T i3 = i2 / WindowSize;
const T i4 = i2 % WindowSize;
return thrust::make_tuple(i1,i2,i3,i4);
}
template <typename Iterator,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount,
std::size_t TotalCount = PerArrayCount * ArrayCount
>
class sliding_window
{
public:
typedef typename thrust::iterator_difference<Iterator>::type difference_type;
struct window_functor : public thrust::unary_function<difference_type,difference_type>
{
__host__ __device__
difference_type operator()(const difference_type& i0) const
{
auto t = calc_indices<ArraySize, ArrayCount,WindowSize>(i0);
return thrust::get<0>(t) * ArraySize + thrust::get<2>(t) + thrust::get<3>(t);
}
};
typedef typename thrust::counting_iterator<difference_type> CountingIterator;
typedef typename thrust::transform_iterator<window_functor, CountingIterator> TransformIterator;
typedef typename thrust::permutation_iterator<Iterator,TransformIterator> PermutationIterator;
typedef PermutationIterator iterator;
sliding_window(Iterator first) : first(first){}
iterator begin(void) const
{
return PermutationIterator(first, TransformIterator(CountingIterator(0), window_functor()));
}
iterator end(void) const
{
return begin() + TotalCount;
}
protected:
Iterator first;
};
template <std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
typename Iterator>
sliding_window<Iterator, ArraySize, ArrayCount, WindowSize>
make_sliding_window(Iterator first)
{
return sliding_window<Iterator, ArraySize, ArrayCount, WindowSize>(first);
}
template <typename KeyType,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize>
struct key_generator : thrust::unary_function<KeyType, thrust::tuple<KeyType,KeyType> >
{
__device__
thrust::tuple<KeyType,KeyType> operator()(std::size_t i0) const
{
auto t = calc_indices<ArraySize, ArrayCount,WindowSize>(i0);
return thrust::make_tuple(thrust::get<0>(t),thrust::get<2>(t));
}
};
template <typename Integer,
std::size_t Base,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize>
struct base_n : thrust::unary_function<thrust::tuple<Integer, Integer>, Integer>
{
__host__ __device__
Integer operator()(const thrust::tuple<Integer, Integer> t) const
{
const auto i = calc_indices<ArraySize, ArrayCount, WindowSize>(thrust::get<0>(t));
// ipow could be optimized by precomputing a lookup table at compile time
const auto result = thrust::get<1>(t)*ipow(Base, thrust::get<3>(i));
return result;
}
// taken from http://stackoverflow.com/a/101613/678093
__host__ __device__ __inline__
Integer ipow(Integer base, Integer exp) const
{
Integer result = 1;
while (exp)
{
if (exp & 1)
result *= base;
exp >>= 1;
base *= base;
}
return result;
}
};
template <std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
typename T,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
__device__ __inline__
thrust::tuple<T,T,T,T> calc_sort_indices(const T& i0)
{
const T i1 = i0 % PerArrayCount;
const T i2 = i0 / PerArrayCount;
const T i3 = i1 % WindowCount;
const T i4 = i1 / WindowCount;
return thrust::make_tuple(i1,i2,i3,i4);
}
template <typename Integer,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
struct pre_sort : thrust::unary_function<Integer, Integer>
{
__device__
Integer operator()(Integer i0) const
{
auto t = calc_sort_indices<ArraySize, ArrayCount,WindowSize>(i0);
const Integer i_result = ( thrust::get<2>(t) * WindowSize + thrust::get<3>(t) ) + thrust::get<1>(t) * PerArrayCount;
return i_result;
}
};
template <typename Integer,
std::size_t ArraySize,
std::size_t ArrayCount,
std::size_t WindowSize,
std::size_t WindowCount = ArraySize - (WindowSize-1),
std::size_t PerArrayCount = WindowSize * WindowCount>
struct generate_sort_keys : thrust::unary_function<Integer, Integer>
{
__device__
thrust::tuple<Integer,Integer> operator()(Integer i0) const
{
auto t = calc_sort_indices<ArraySize, ArrayCount,WindowSize>(i0);
return thrust::make_tuple( thrust::get<1>(t), thrust::get<3>(t));
}
};
template<typename... Iterators>
__host__ __device__
thrust::zip_iterator<thrust::tuple<Iterators...>> zip(Iterators... its)
{
return thrust::make_zip_iterator(thrust::make_tuple(its...));
}
struct calculate_log : thrust::unary_function<std::size_t, float>
{
__host__ __device__
float operator()(std::size_t i) const
{
return i*log10f(i);
}
};
int main()
{
typedef int Integer;
typedef float Real;
const std::size_t array_count = ARRAY_COUNT;
const std::size_t array_size = ARRAY_SIZE;
const std::size_t window_size = WINDOW_SIZE;
const std::size_t window_count = array_size - (window_size-1);
const std::size_t input_size = array_count * array_size;
const std::size_t base = 4;
thrust::device_vector<Integer> input_arrays(input_size);
thrust::counting_iterator<Integer> counting_it(0);
thrust::transform(counting_it,
counting_it + input_size,
input_arrays.begin(),
random_filler<Integer,0,base>());
PRINTER(input_arrays);
const int runs = 100;
auto start = std::chrono::high_resolution_clock::now();
for (int k = 0 ; k < runs; ++k)
{
auto sw = make_sliding_window<array_size, array_count, window_size>(input_arrays.begin());
const std::size_t total_count = window_size * window_count * array_count;
thrust::device_vector<Integer> result(total_count);
thrust::copy(sw.begin(), sw.end(), result.begin());
PRINTER(result);
auto ti_begin = thrust::make_transform_iterator(counting_it, key_generator<Integer, array_size, array_count, window_size>());
auto base_4_ti = thrust::make_transform_iterator(zip(counting_it, sw.begin()), base_n<Integer, base, array_size, array_count, window_size>());
thrust::inclusive_scan_by_key(ti_begin, ti_begin+total_count, base_4_ti, result.begin());
PRINTER(result);
thrust::device_vector<Integer> result_2(total_count);
auto ti_pre_sort = thrust::make_transform_iterator(counting_it, pre_sort<Integer, array_size, array_count, window_size>());
thrust::gather(ti_pre_sort,
ti_pre_sort+total_count,
result.begin(),
result_2.begin());
PRINTER(result_2);
thrust::device_vector<Integer> sort_keys_1(total_count);
thrust::device_vector<Integer> sort_keys_2(total_count);
auto zip_begin = zip(sort_keys_1.begin(),sort_keys_2.begin());
thrust::transform(counting_it,
counting_it+total_count,
zip_begin,
generate_sort_keys<Integer, array_size, array_count, window_size>());
thrust::stable_sort_by_key(result_2.begin(), result_2.end(), zip_begin);
thrust::stable_sort_by_key(zip_begin, zip_begin+total_count, result_2.begin());
PRINTER(result_2);
thrust::device_vector<Integer> key_counts(total_count);
thrust::device_vector<Integer> sort_keys_1_reduced(total_count);
thrust::device_vector<Integer> sort_keys_2_reduced(total_count);
// count how often each sub array occurs
auto zip_count_begin = zip(sort_keys_1.begin(), sort_keys_2.begin(), result_2.begin());
auto new_end = thrust::reduce_by_key(zip_count_begin,
zip_count_begin + total_count,
thrust::constant_iterator<Integer>(1),
zip(sort_keys_1_reduced.begin(), sort_keys_2_reduced.begin(), thrust::make_discard_iterator()),
key_counts.begin()
);
std::size_t new_size = new_end.second - key_counts.begin();
key_counts.resize(new_size);
sort_keys_1_reduced.resize(new_size);
sort_keys_2_reduced.resize(new_size);
PRINTER(key_counts);
PRINTER(sort_keys_1_reduced);
PRINTER(sort_keys_2_reduced);
auto log_ti = thrust::make_transform_iterator (key_counts.begin(), calculate_log());
thrust::device_vector<Real> log_result(new_size);
auto zip_keys_reduced_begin = zip(sort_keys_1_reduced.begin(), sort_keys_2_reduced.begin());
auto log_end = thrust::reduce_by_key(zip_keys_reduced_begin,
zip_keys_reduced_begin + new_size,
log_ti,
zip(sort_keys_1.begin(),thrust::make_discard_iterator()),
log_result.begin()
);
std::size_t final_size = log_end.second - log_result.begin();
log_result.resize(final_size);
sort_keys_1.resize(final_size);
PRINTER(log_result);
thrust::device_vector<Real> final_result(final_size);
auto final_end = thrust::reduce_by_key(sort_keys_1.begin(),
sort_keys_1.begin() + final_size,
log_result.begin(),
thrust::make_discard_iterator(),
final_result.begin(),
thrust::equal_to<Integer>(),
thrust::minimum<Real>()
);
final_result.resize(final_end.second-final_result.begin());
PRINTER(final_result);
}
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start);
std::cout << "took " << duration.count()/runs << " milliseconds" << std::endl;
return 0;
}
compile using
nvcc -std=c++11 conditional_entropy.cu -o benchmark -DARRAY_SIZE=1000 -DARRAY_COUNT=1000 -DWINDOW_SIZE=10 && ./benchmark
This configuration takes 133 milliseconds on my GPU (GTX 680), so around 0.1 milliseconds per array.
The implementation can definitely be optimized, e.g. using a precomputed lookup table for the base-4 conversion and maybe some of the thrust calls can be avoided.

Slicing std::array

Is there an easy way to get a slice of an array in C++?
I.e., I've got
array<double, 10> arr10;
and want to get array consisting of five first elements of arr10:
array<double, 5> arr5 = arr10.???
(other than populating it by iterating through first array)

The constructors for std::array are implicitly defined, so you can't initialize it from another container or from an iterator range. The closest you can get is a helper function that takes care of the copying during construction. This allows single-phase initialization, which is what I believe you're trying to achieve.
template<class X, class Y>
X CopyArray(const Y& src, const size_t size)
{
X dst;
std::copy(src.begin(), src.begin() + size, dst.begin());
return dst;
}
std::array<int, 5> arr5 = CopyArray<decltype(arr5)>(arr10, 5);
You can also use something like std::copy or iterate through the copy yourself.
std::copy(arr10.begin(), arr10.begin() + 5, arr5.begin());
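If the slice length is known at compile time, the helper can also check the bounds for you. Here is a small sketch of that variant; `first_n` is a hypothetical name, and it assumes the element type is default-constructible and copy-assignable.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Returns a copy of the first N elements of src; N is checked at compile time.
template <std::size_t N, typename T, std::size_t M>
std::array<T, N> first_n(const std::array<T, M>& src) {
    static_assert(N <= M, "slice is longer than the source array");
    std::array<T, N> dst{};
    std::copy_n(src.begin(), N, dst.begin());
    return dst;
}
```

With this, `auto arr5 = first_n<5>(arr10);` yields a `std::array<double, 5>` for the `arr10` from the question, and asking for more than ten elements fails to compile instead of reading out of bounds.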

Sure. Wrote this:
template<int...> struct seq {};
template<typename seq> struct seq_len;
template<int s0,int...s>
struct seq_len<seq<s0,s...>>:
std::integral_constant<std::size_t,seq_len<seq<s...>>::value + 1> {};
template<>
struct seq_len<seq<>>:std::integral_constant<std::size_t,0> {};
template<int Min, int Max, int... s>
struct make_seq: make_seq<Min, Max-1, Max-1, s...> {};
template<int Min, int... s>
struct make_seq<Min, Min, s...> {
typedef seq<s...> type;
};
template<int Max, int Min=0>
using MakeSeq = typename make_seq<Min,Max>::type;
template<std::size_t src, typename T, int... indexes>
std::array<T, sizeof...(indexes)> get_elements( seq<indexes...>, std::array<T, src > const& inp ) {
return { inp[indexes]... };
}
template<int len, typename T, std::size_t src>
auto first_elements( std::array<T, src > const& inp )
-> decltype( get_elements( MakeSeq<len>{}, inp ) )
{
return get_elements( MakeSeq<len>{}, inp );
}
Where the compile time indexes... does the remapping, and MakeSeq makes a seq from 0 to n-1.
Live example.
This supports both an arbitrary set of indexes (via get_elements) and the first n (via first_elements).
Use:
std::array< int, 10 > arr = {0,1,2,3,4,5,6,7,8,9};
std::array< int, 6 > slice = get_elements( seq<2,0,7,3,1,0>(), arr );
std::array< int, 5 > start = first_elements<5>(arr);
which avoids all loops, either explicit or implicit.

2018 update, if all you need is first_elements:
Less boilerplaty solution using C++14 (building up on Yakk's pre-14 answer and stealing from "unpacking" a tuple to call a matching function pointer)
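// std::index_sequence holds std::size_t values, so the pack must be std::size_t
// for template argument deduction to succeed on conforming compilers
template < std::size_t src, typename T, std::size_t... I >
std::array< T, sizeof...(I) > get_elements(std::index_sequence< I... >, std::array< T, src > const& inp)
{
return { inp[I]... };
}
template < std::size_t N, typename T, std::size_t src >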
auto first_elements(std::array<T, src > const& inp)
-> decltype(get_elements(std::make_index_sequence<N>{}, inp))
{
return get_elements(std::make_index_sequence<N>{}, inp);
}
Still cannot explain why this works, but it does (for me on Visual Studio 2017).

This answer might be late... but I was just toying around with slices - so here is my little home brew of std::array slices.
Of course, this comes with a few restrictions and is not ultimately general:
The source array from which a slice is taken must not go out of scope. We store a reference to the source.
I was looking for constant array slices first and did not try to expand this code to both const and non const slices.
But one nice feature of the code below is, that you can take slices of slices...
// Requires C++17: main() relies on class template argument deduction.
#include <cstdint>
#include <iostream>
#include <array>
#include <stdexcept>
#include <sstream>
#include <functional>
template <class A>
class ArraySliceC
{
public:
using Array_t = A;
using value_type = typename A::value_type;
using const_iterator = typename A::const_iterator;
ArraySliceC(const Array_t & source, size_t ifirst, size_t length)
: m_ifirst{ ifirst }
, m_length{ length }
, m_source{ source }
{
if (source.size() < (ifirst + length))
{
std::ostringstream os;
os << "ArraySliceC::ArraySliceC(<source>,"
<< ifirst << "," << length
<< "): out of bounds. (ifirst + length > <source>.size())";
throw std::invalid_argument( os.str() );
}
}
size_t size() const
{
return m_length;
}
const value_type& at( size_t index ) const
{
return m_source.at( m_ifirst + index );
}
const value_type& operator[]( size_t index ) const
{
return m_source[m_ifirst + index];
}
const_iterator cbegin() const
{
return m_source.cbegin() + m_ifirst;
}
const_iterator cend() const
{
return m_source.cbegin() + m_ifirst + m_length;
}
private:
size_t m_ifirst;
size_t m_length;
const Array_t& m_source;
};
template <class T, size_t SZ>
std::ostream& operator<<( std::ostream& os, const std::array<T,SZ>& arr )
{
if (arr.size() == 0)
{
os << "[||]";
}
else
{
os << "[| " << arr.at( 0 );
for (auto it = arr.cbegin() + 1; it != arr.cend(); it++)
{
os << "," << (*it);
}
os << " |]";
}
return os;
}
template<class A>
std::ostream& operator<<( std::ostream& os, const ArraySliceC<A> & slice )
{
if (slice.size() == 0)
{
os << "^[||]";
}
else
{
os << "^[| " << slice.at( 0 );
for (auto it = slice.cbegin() + 1; it != slice.cend(); it++)
{
os << "," << (*it);
}
os << " |]";
}
return os;
}
template<class A>
A unfoldArray( std::function< typename A::value_type( size_t )> producer )
{
A result;
for (size_t i = 0; i < result.size(); i++)
{
result[i] = producer( i );
}
return result;
}
int main()
{
using A = std::array<float, 10>;
auto idf = []( size_t i ) -> float { return static_cast<float>(i); };
const auto values = unfoldArray<A>(idf);
std::cout << "values = " << values << std::endl;
// zero copy slice of values array.
auto sl0 = ArraySliceC( values, 2, 4 );
std::cout << "sl0 = " << sl0 << std::endl;
// zero copy slice of the sl0 (the slice of values array)
auto sl01 = ArraySliceC( sl0, 1, 2 );
std::cout << "sl01 = " << sl01 << std::endl;
return 0;
}
