Understanding a user-defined class template - C++

Create a UserArray of bit fields, which can be declared as follows. The size occupied by our array will be less than that of a normal array. Suppose we want an array of 20 flags (TRUE/FALSE). A bool FLAG[20] will take 20 bytes of memory, while UserArray<bool,1,0,20> will take 4 bytes of memory.
Use class Template to create user array.
Use Bit wise operators to pack the array.
Equality operation should also be implemented.
template<class T, int W, int L, int H> // I have used template<class T> before,
// but never with extra parameters like this
class UserArray{
//....
};
typedef UserArray<bool,4,0,20> MyType;
where:
T = type of an array element
W = width of an array element, 0 < W < 8
L = low bound of array index (preferably zero)
H = high bound of array index
A main program:
int main() {
MyType Display; //typedef UserArray<T,W,L,H> MyType; defined above
Display[0] = FALSE; // need to understand how we can write this
Display[1] = TRUE;  // need to understand how we can write this
//assert(Display[0]); // commented out for now, need to understand the code above first
//assert(Display[1]); // commented out for now
//cout << "Size of the Display" << sizeof(Display); // commented out for now
}
My doubt is how those parameters, i.e. T, W, L and H, are used in class UserArray, and how we can index an instance of UserArray as Display[0] and Display[1]; what does that represent?
A short and simple example of a similar kind would be easiest for me to understand.

W, L and H are non-type template parameters. You can instantiate a template (at compile-time) with constant values, e.g.:
template <int N>
class MyArray
{
public:
float data[N];
void print() { std::cout << "MyArray of size " << N << std::endl; }
};
MyArray<7> foo;
MyArray<8> bar;
foo.print(); // "MyArray of size 7"
bar.print(); // "MyArray of size 8"
In the example above, everywhere that N appears in the template definition, it will be replaced at compile-time by the supplied constant.
Note that MyArray<7> and MyArray<8> are completely different types as far as the compiler is concerned.
I have no idea what the solution to your specific problem is. But your code won't compile, currently, as you have not provided values for the template parameters.

This is not simple, particularly as you can have variable bit widths.
<limits.h> has a constant CHAR_BIT, which is the number of bits in a byte. Usually this is 8, but it could be greater than 8 (not less though).
I suggest the number of elements per byte be CHAR_BIT / W. This might waste a few bits (for example, if the width is 3 and CHAR_BIT is 8), but this is complicated enough as it is.
You'll then need to define operator[] to access the elements, and you'll likely need to do some bit fiddling to do this. For the non-const version of operator[], you'll probably have to return some sort of proxy object when there is more than one element per byte, and have its operator= overridden so that it writes back to the appropriate spot in the array.
It's a good exercise to figure this one out, though.
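For a fixed width of one bit and a lower bound of zero, a minimal sketch of that proxy idea might look like this (PackedBoolArray and Proxy are illustrative names, not the UserArray asked about):
#include <iostream>
#include <climits>
#include <cstddef>

template<std::size_t N>
class PackedBoolArray {
public:
    // Proxy returned by the non-const operator[]; writes back on assignment.
    class Proxy {
    public:
        Proxy(unsigned char& byte, std::size_t bit) : byte_(byte), bit_(bit) {}
        Proxy& operator=(bool v) {
            if (v) byte_ |= (1u << bit_);
            else   byte_ &= ~(1u << bit_);
            return *this;
        }
        operator bool() const { return (byte_ >> bit_) & 1u; }
    private:
        unsigned char& byte_;
        std::size_t bit_;
    };

    Proxy operator[](std::size_t i) {
        return Proxy(data_[i / CHAR_BIT], i % CHAR_BIT);
    }
    bool operator[](std::size_t i) const {
        return (data_[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
    }
private:
    unsigned char data_[(N + CHAR_BIT - 1) / CHAR_BIT] = {};
};

int main() {
    PackedBoolArray<20> flags;
    flags[0] = false;
    flags[1] = true;
    std::cout << "sizeof(flags) = " << sizeof(flags) << '\n'; // 3 bytes rather than 20
}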

Here's some code that implements what you ask for, except that the lower bound is fixed at 0. It also shows a rare use case for the address-of operator. You could take this further and make this container compatible with STL algorithms if you liked.
#include <iostream>
#include <limits.h>
#include <stddef.h>
template<class T, size_t WIDTH, size_t SIZE>
class UserArray;
template<class T, size_t WIDTH, size_t SIZE>
class UserArrayProxy;
template<class T, size_t WIDTH, size_t SIZE>
class UserArrayAddressProxy
{
public:
typedef UserArray<T, WIDTH, SIZE> array_type;
typedef UserArrayProxy<T, WIDTH, SIZE> proxy_type;
typedef UserArrayAddressProxy<T, WIDTH, SIZE> this_type;
UserArrayAddressProxy(array_type& a_, size_t i_) : a(a_), i(i_) {}
UserArrayAddressProxy(const this_type& x) : a(x.a), i(x.i) {}
proxy_type operator*() { return proxy_type(a, i); }
this_type& operator+=(size_t n) { i += n; return *this; }
this_type& operator-=(size_t n) { i -= n; return *this; }
this_type& operator++() { ++i; return *this; }
this_type& operator--() { --i; return *this; }
this_type operator++(int) { this_type x = *this; ++i; return x; }
this_type operator--(int) { this_type x = *this; --i; return x; }
this_type operator+(size_t n) const { this_type x = *this; x += n; return x; }
this_type operator-(size_t n) const { this_type x = *this; x -= n; return x; }
bool operator==(const this_type& x) { return (&a == &x.a) && (i == x.i); }
bool operator!=(const this_type& x) { return !(*this == x); }
private:
array_type& a;
size_t i;
};
template<class T, size_t WIDTH, size_t SIZE>
class UserArrayProxy
{
public:
static const size_t BITS_IN_T = sizeof(T) * CHAR_BIT;
static const size_t ELEMENTS_PER_T = BITS_IN_T / WIDTH;
static const size_t NUMBER_OF_TS = (SIZE - 1) / ELEMENTS_PER_T + 1;
static const T MASK = (1 << WIDTH) - 1;
typedef UserArray<T, WIDTH, SIZE> array_type;
typedef UserArrayProxy<T, WIDTH, SIZE> this_type;
typedef UserArrayAddressProxy<T, WIDTH, SIZE> address_proxy_type;
UserArrayProxy(array_type& a_, int i_) : a(a_), i(i_) {}
this_type& operator=(T x)
{
a.write(i, x);
return *this;
}
address_proxy_type operator&() { return address_proxy_type(a, i); }
operator T()
{
return a.get(i);
}
private:
array_type& a;
size_t i;
};
template<class T, size_t WIDTH, size_t SIZE>
class UserArray
{
public:
typedef UserArrayAddressProxy<T, WIDTH, SIZE> ptr_t;
static const size_t BITS_IN_T = sizeof(T) * CHAR_BIT;
static const size_t ELEMENTS_PER_T = BITS_IN_T / WIDTH;
static const size_t NUMBER_OF_TS = (SIZE - 1) / ELEMENTS_PER_T + 1;
static const T MASK = (1 << WIDTH) - 1;
T operator[](size_t i) const
{
return get(i);
}
UserArrayProxy<T, WIDTH, SIZE> operator[](size_t i)
{
return UserArrayProxy<T, WIDTH, SIZE>(*this, i);
}
friend class UserArrayProxy<T, WIDTH, SIZE>;
private:
void write(size_t i, T x)
{
T& element = data[i / ELEMENTS_PER_T];
int offset = (i % ELEMENTS_PER_T) * WIDTH;
x &= MASK;
element &= ~(MASK << offset);
element |= x << offset;
}
T get(size_t i) const
{
return (data[i / ELEMENTS_PER_T] >> ((i % ELEMENTS_PER_T) * WIDTH)) & MASK;
}
T data[NUMBER_OF_TS];
};
int main()
{
typedef UserArray<int, 6, 20> myarray_t;
myarray_t a;
std::cout << "Sizeof a in bytes: " << sizeof(a) << std::endl;
for (size_t i = 0; i != 20; ++i) { a[i] = i; }
for (size_t i = 0; i != 20; ++i) { std::cout << a[i] << std::endl; }
std::cout << "We can even use address_of operator: " << std::endl;
for (myarray_t::ptr_t e = &a[0]; e != &a[20]; ++e) { std::cout << *e << std::endl; }
}

Related

Combining multiple for loops into single iterator

Say I have a nested for loop like
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
function_doing_stuff(std::make_tuple(x, y, z));
}
}
}
and would like to transform it into
MyRange range(xstart,xend,ystart,yend, zstart,zend);
for (auto point : range){
function_doing_stuff(point);
}
How would I write the MyRange class to be as efficient as the nested for loops?
The motivation for this is to be able to use std algorithms (such as transform, accumulate, etc), and to create code that is largely dimension agnostic.
By having an iterator, it would be easy to create templated functions that operate over a range of 1d, 2d or 3d points.
Code base is currently C++14.
EDIT:
Writing clear questions is hard. I'll try to clarify.
My problem is not writing an iterator, that I can do. Instead, the problem is one of performance: Is it possible to make an iterator that is as fast as the nested for loops?
With range/v3, you may do
auto xs = ranges::view::iota(xstart, xend);
auto ys = ranges::view::iota(ystart, yend);
auto zs = ranges::view::iota(zstart, zend);
for (const auto& point : ranges::view::cartesian_product(xs, ys, zs)){
function_doing_stuff(point);
}
You can introduce your own class as
class myClass {
public:
myClass (int x, int y, int z):m_x(x) , m_y(y), m_z(z){};
private:
int m_x, m_y, m_z;
};
and then initialize a std::vector<myClass> with your triple loop
std::vector<myClass> myVec;
myVec.reserve((xend-xstart)*(yend-ystart)*(zend-zstart)); // alloc memory only once;
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
myVec.push_back({x,y,z});
}
}
}
Finally, you can use all the nice std algorithms with the std::vector<myClass> myVec. With the syntactic sugar
using MyRange = std::vector<myClass>;
and
MyRange makeMyRange(int xstart, int xend, int ystart, int yend, int zstart,int zend) {
MyRange myVec;
// loop from above
return myVec;
}
you can write
const MyRange range = makeMyRange(xstart, xend, ystart, yend, zstart, zend);
for (auto point : range){
function_doing_stuff(point);
}
With move semantics this won't create unneeded copies. Please note that the interface to this function is rather bad. Perhaps rather use three pairs of int, denoting the x, y and z intervals, as sketched below.
Perhaps also change the names to something meaningful (e.g. myClass could be Point).
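A sketch of that pair-based interface (reusing the myClass and MyRange definitions above; Interval is just an illustrative alias):
#include <utility> // std::pair

using Interval = std::pair<int, int>; // [first, second)

MyRange makeMyRange(Interval xr, Interval yr, Interval zr) {
    MyRange myVec;
    myVec.reserve(static_cast<std::size_t>(xr.second - xr.first) *
                  (yr.second - yr.first) * (zr.second - zr.first));
    for (int x = xr.first; x < xr.second; x++)
        for (int y = yr.first; y < yr.second; y++)
            for (int z = zr.first; z < zr.second; z++)
                myVec.push_back({x, y, z});
    return myVec;
}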
Another option, which directly transplants whatever looping code, is to use a Coroutine. This emulates yield from Python or C#.
#include <boost/coroutine/asymmetric_coroutine.hpp>
#include <tuple>
using point = std::tuple<int, int, int>;
using coro = boost::coroutines::asymmetric_coroutine<point>;
coro::pull_type points(
[&](coro::push_type& yield){
for (int x = xstart; x < xend; x++){
for (int y = ystart; y < yend; y++){
for (int z = zstart; z < zend; z++){
yield(std::make_tuple(x, y, z));
}
}
}
});
for(auto p : points)
function_doing_stuff(p);
Since you care about performance, you should forget about combining iterators for the foreseeable future. The central problem is that compilers cannot yet untangle the mess and figure out that there are 3 independent variables in it, much less perform any loop interchange or unrolling or fusion.
If you must use ranges, use simple ones that the compiler can see through:
for (int const x : boost::irange<int>(xstart,xend))
for (int const y : boost::irange<int>(ystart,yend))
for (int const z : boost::irange<int>(zstart,zend))
function_doing_stuff(x, y, z);
Alternatively, you can actually pass your functor and the boost ranges to a template:
template <typename Func, typename Range0, typename Range1, typename Range2>
void apply_ranges (Func func, Range0 r0, Range1 r1, Range2 r2)
{
for (auto const i0 : r0)
for (auto const i1 : r1)
for (auto const i2 : r2)
func (i0, i1, i2);
}
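For example, a possible call might be (assuming, as in the snippet above, that function_doing_stuff takes three ints):
apply_ranges(function_doing_stuff,
             boost::irange<int>(xstart, xend),
             boost::irange<int>(ystart, yend),
             boost::irange<int>(zstart, zend));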
If you truly care about performance, then you should not contort your code with complicated ranges, because they make it harder to untangle later when you want to rewrite them in AVX intrinsics.
Here's a bare-bones implementation that does not use any advanced language features or other libraries. The performance should be pretty close to the for loop version.
#include <tuple>
class MyRange {
public:
typedef std::tuple<int, int, int> valtype;
MyRange(int xstart, int xend, int ystart, int yend, int zstart, int zend): xstart(xstart), xend(xend), ystart(ystart), yend(yend), zstart(zstart), zend(zend) {
}
class iterator {
public:
iterator(MyRange &c): me(c) {
curvalue = std::make_tuple(me.xstart, me.ystart, me.zstart);
}
iterator(MyRange &c, bool end): me(c) {
curvalue = std::make_tuple(end ? me.xend : me.xstart, me.ystart, me.zstart);
}
valtype operator*() {
return curvalue;
}
iterator &operator++() {
if (++std::get<2>(curvalue) == me.zend) {
std::get<2>(curvalue) = me.zstart;
if (++std::get<1>(curvalue) == me.yend) {
std::get<1>(curvalue) = me.ystart;
++std::get<0>(curvalue);
}
}
return *this;
}
bool operator==(const iterator &other) const {
return curvalue == other.curvalue;
}
bool operator!=(const iterator &other) const {
return curvalue != other.curvalue;
}
private:
MyRange &me;
valtype curvalue;
};
iterator begin() {
return iterator(*this);
}
iterator end() {
return iterator(*this, true);
}
private:
int xstart, xend;
int ystart, yend;
int zstart, zend;
};
And an example of usage:
#include <iostream>
void display(std::tuple<int, int, int> v) {
std::cout << "(" << std::get<0>(v) << ", " << std::get<1>(v) << ", " << std::get<2>(v) << ")\n";
}
int main() {
MyRange c(1, 4, 2, 5, 7, 9);
for (auto v: c) {
display(v);
}
}
I've left off things like const iterators, possible operator+=, decrementing, post increment, etc. They've been left as an exercise for the reader.
It stores the initial values, then increments each value in turn, rolling it back and incrementing the next when it gets to the end value. It's a bit like incrementing a multi-digit number.
Using boost::iterator_facade for simplicity, so that you don't have to spell out all the required members by hand.
First we have a class that iterates N-dimensional indexes as std::array<std::size_t, N>
template<std::size_t N>
class indexes_iterator : public boost::iterator_facade<indexes_iterator<N>, std::array<std::size_t, N> const, boost::random_access_traversal_tag>
{
public:
template<typename... Dims>
indexes_iterator(Dims... dims) : dims{ static_cast<std::size_t>(dims)... }, values{} {}
private:
friend class boost::iterator_core_access;
void increment() { advance(1); }
void decrement() { advance(-1); }
void advance(int n)
{
for (std::size_t i = 0; i < N; ++i)
{
int next = (values[i] + n) % dims[i];
n = (n / dims[i]) + (next < static_cast<int>(values[i]));
values[i] = next;
}
}
std::ptrdiff_t distance_to(indexes_iterator const & other) const
{
std::ptrdiff_t result = 0, mul = 1;
for (std::size_t i = 0; i < N; ++i)
{
result += mul * (static_cast<std::ptrdiff_t>(other.values[i]) - static_cast<std::ptrdiff_t>(values[i]));
mul *= dims[i];
}
return result;
}
bool equal(indexes_iterator const& other) const
{
return values == other.values;
}
std::array<std::size_t, N> const & dereference() const { return values; }
std::array<std::size_t, N> dims;
std::array<std::size_t, N> values;
};
Then we use that to make something similar to a boost::zip_iterator, but instead of advancing all the iterators together we offset them by our indexes.
template <typename... Iterators>
class product_iterator : public boost::iterator_facade<product_iterator<Iterators...>, std::tuple<decltype(*std::declval<Iterators>())...>, boost::random_access_traversal_tag, std::tuple<decltype(*std::declval<Iterators>())...>>
{
using ref = std::tuple<decltype(*std::declval<Iterators>())...>;
public:
product_iterator(Iterators ... ends) : indexes() , iterators(std::make_tuple(ends...)) {}
template <typename ... Sizes>
product_iterator(Iterators ... begins, Sizes ... sizes)
: indexes(sizes...),
iterators(begins...)
{}
private:
friend class boost::iterator_core_access;
template<std::size_t... Is>
ref dereference_impl(std::index_sequence<Is...> idxs) const
{
auto offs = offset(idxs);
return { *std::get<Is>(offs)... };
}
ref dereference() const
{
return dereference_impl(std::index_sequence_for<Iterators...>{});
}
void increment() { ++indexes; }
void decrement() { --indexes; }
void advance(int n) { indexes += n; }
template<std::size_t... Is>
std::tuple<Iterators...> offset(std::index_sequence<Is...>) const
{
auto idxs = *indexes;
return { (std::get<Is>(iterators) + std::get<Is>(idxs))... };
}
bool equal(product_iterator const & other) const
{
return offset(std::index_sequence_for<Iterators...>{})
== other.offset(std::index_sequence_for<Iterators...>{});
}
indexes_iterator<sizeof...(Iterators)> indexes;
std::tuple<Iterators...> iterators;
};
Then we wrap it up in a boost::iterator_range
template <typename... Ranges>
auto make_product_range(Ranges&&... rngs)
{
product_iterator<decltype(begin(rngs))...> b(begin(rngs)..., std::distance(std::begin(rngs), std::end(rngs))...);
product_iterator<decltype(begin(rngs))...> e(end(rngs)...);
return boost::iterator_range<product_iterator<decltype(begin(rngs))...>>(b, e);
}
int main()
{
using ranges::view::iota;
for (auto p : make_product_range(iota(xstart, xend), iota(ystart, yend), iota(zstart, zend)))
// ...
return 0;
}
See it on godbolt
Just a very simplified version that will be as efficient as a for loop:
#include <tuple>
struct iterator{
int x;
int x_start;
int x_end;
int y;
int y_start;
int y_end;
int z;
constexpr auto
operator*() const{
return std::tuple{x,y,z};
}
constexpr iterator&
operator++ [[gnu::always_inline]](){
++x;
if (x==x_end){
x=x_start;
++y;
if (y==y_end) {
++z;
y=y_start;
}
}
return *this;
}
constexpr iterator
operator++(int){
auto old=*this;
operator++();
return old;
}
};
struct sentinel{
int z_end;
friend constexpr bool
operator == (const iterator& x,const sentinel& s){
return x.z==s.z_end;
}
friend constexpr bool
operator == (const sentinel& a,const iterator& x){
return x==a;
}
friend constexpr bool
operator != (const iterator& x,const sentinel& a){
return !(x==a);
}
friend constexpr bool
operator != (const sentinel& a,const iterator& x){
return !(x==a);
}
};
struct range{
iterator start;
sentinel finish;
constexpr auto
begin() const{
return start;
}
constexpr auto
end()const{
return finish;
}
};
void func(int,int,int);
void test(const range& r){
for(auto [x,y,z]: r)
func(x,y,z);
}
void test(int x_start,int x_end,int y_start,int y_end,int z_start,int z_end){
for(int z=z_start;z<z_end;++z)
for(int y=y_start;y<y_end;++y)
for(int x=x_start;x<x_end;++x)
func(x,y,z);
}
The advantage over 1201ProgramAlarm's answer is the faster comparison performed at each iteration, thanks to the use of a sentinel.
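A small helper for building such a range from the six bounds might look like this (a sketch; make_range is not part of the code above, and it relies on aggregate initialization of the structs):
constexpr range make_range(int x_start, int x_end,
                           int y_start, int y_end,
                           int z_start, int z_end) {
    // iterator members, in order: x, x_start, x_end, y, y_start, y_end, z
    return range{ iterator{x_start, x_start, x_end, y_start, y_start, y_end, z_start},
                  sentinel{z_end} };
}
Then test(make_range(0, 4, 0, 4, 0, 4)); visits the same points as the triple loop, provided each interval is non-empty.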

How to implement a tensor class for the Kronecker product [closed]

I recently came across an interesting article about what is called the Kronecker product. At the same time I'm working on my neural network library.
For my algorithm to work, I need a tensor class where I can compute the product of two tensors with an overloaded * operator.
Consider the following example/questions:
How to efficiently construct/store the nested matrices?
How to perform the product of two tensors?
How to visualize tensor c as simply as possible?
My tensor class, which currently only supports 3 dimensions:
#pragma once
#include <iostream>
#include <sstream>
#include <random>
#include <cmath>
#include <iomanip>
#include <vector>  // std::vector used in from()
#include <cstring> // memcpy used in the copy constructor
template<typename T>
class tensor {
public:
const unsigned int x, y, z, s;
tensor(unsigned int x, unsigned int y, unsigned int z, T val) : x(x), y(y), z(z), s(x * y * z) {
p_data = new T[s];
for (unsigned int i = 0; i < s; i++) p_data[i] = val;
}
tensor(const tensor<T> & other) : x(other.x), y(other.y), z(other.z), s(other.s) {
p_data = new T[s];
memcpy(p_data, other.get_data(), s * sizeof(T));
}
~tensor() {
delete[] p_data;
p_data = nullptr;
}
T * get_data() {
return p_data;
}
static tensor<T> * random(unsigned int x, unsigned int y, unsigned int z, T val, T min, T max) {
tensor<T> * p_tensor = new tensor<T>(x, y, z, val);
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<T> dist(min, max);
for (unsigned int i = 0; i < p_tensor->s; i++) {
T rnd = dist(mt);
while (std::abs(rnd) < 0.001) rnd = dist(mt);
p_tensor->get_data()[i] = rnd;
}
return p_tensor;
}
static tensor<T> * from(std::vector<T> * p_data, T val) {
tensor<T> * p_tensor = new tensor<T>(p_data->size(), 1, 1, val);
for (unsigned int i = 0; i < p_tensor->x; i++) (*p_tensor)(i) = p_data->at(i);
return p_tensor;
}
friend std::ostream & operator <<(std::ostream & stream, tensor<T> & tensor) {
stream << "(" << tensor.x << "," << tensor.y << "," << tensor.z << ") Tensor\n";
for (unsigned int i = 0; i < tensor.x; i++) {
for (unsigned int k = 0; k < tensor.z; k++) {
stream << "[";
for (unsigned int j = 0; j < tensor.y; j++) {
stream << std::setw(5) << roundf(tensor(i, j, k) * 1000) / 1000;
if (j + 1 < tensor.y) stream << ",";
}
stream << "]";
}
stream << std::endl;
}
return stream;
}
tensor<T> operator +(tensor<T> & other) {
tensor<T> result(*this);
return result;
}
tensor<T> operator -(tensor<T> & other) {
tensor<T> result(*this);
return result;
}
tensor<T> operator *(tensor<T> & other) {
tensor<T> result(*this);
return result;
}
T & operator ()(unsigned int i, unsigned int j, unsigned int k) {
return p_data[i + (j * x) + (k * x * y)];
}
T & operator ()(unsigned int i) {
return p_data[i];
}
private:
T * p_data = nullptr;
};
int main() {
tensor<double> * p_tensor_input = tensor<double>::random(6, 2, 3, 0.0, 0.0, 1.0);
tensor<double> * p_tensor_weight = tensor<double>::random(2, 6, 3, 0.0, 0.0, 1.0);
std::cout << *p_tensor_input << std::endl;
std::cout << *p_tensor_weight << std::endl;
tensor<double> p_tensor_output = *p_tensor_input + *p_tensor_weight;
return 0;
}
Your first step is #2 -- and get it correct.
After that, optimize.
Start with a container C<T>.
Define some operations on it. wrap(T) returns a C<T> containing that T. map takes a C<T> and a function U f(T), and returns a C<U>. flatten takes a C<C<U>> and returns a C<U>.
Define scale( T, C<T> ) which takes a T and a C<T> and returns a C<T> with the elements scaled. Aka, scalar multiplication.
template<class T>
C<T> scale( T scalar, C<T> container ) {
return map( container, [&](T t){ return t*scalar; } );
}
Then we have:
template<class T>
C<T> tensor( C<T> lhs, C<T> rhs ) {
return flatten( map( lhs, [&](T t) { return scale( t, rhs ); } ) );
}
is your tensor product. And yes, that can be your actual code. I would tweak it a bit for efficiency.
(Note that I'm basically describing monadic operations, just using different words.)
After you have this, test, optimize, and iterate.
As for #3, the results of tensor products get large and complex; there is no simple visualization for a large tensor.
Oh, and keep things simple and store data in a std::vector to start.
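A minimal sketch of those operations with C<T> backed by std::vector (illustrative only; wrap, map, flatten, scale and tensor follow the description above):
#include <utility>
#include <vector>

template<class T> using C = std::vector<T>;

template<class T>
C<T> wrap(T t) { return C<T>{ t }; }

// map: apply f to every element, producing a container of the results
template<class T, class F>
auto map(C<T> const& c, F f) -> C<decltype(f(std::declval<T>()))> {
    C<decltype(f(std::declval<T>()))> out;
    out.reserve(c.size());
    for (auto const& t : c) out.push_back(f(t));
    return out;
}

// flatten: concatenate the inner containers
template<class U>
C<U> flatten(C<C<U>> const& cc) {
    C<U> out;
    for (auto const& inner : cc) out.insert(out.end(), inner.begin(), inner.end());
    return out;
}

template<class T>
C<T> scale(T scalar, C<T> const& container) {
    return map(container, [&](T t) { return t * scalar; });
}

template<class T>
C<T> tensor(C<T> const& lhs, C<T> const& rhs) {
    return flatten(map(lhs, [&](T t) { return scale(t, rhs); }));
}
// e.g. tensor(C<double>{1, 2}, C<double>{3, 4}) yields {3, 4, 6, 8}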
Here are some tricks for efficient vectors that I learned in class, but they should be equally good for a tensor.
Define an empty constructor and assignment operator. For example
tensor(unsigned int x, unsigned int y, unsigned int z) : x(x), y(y), z(z), s(x * y * z) {
p_data = new T[s];
}
tensor& operator=( tensor const& that ) {
for (int i=0; i<size(); ++i) {
p_data[i] = that(i) ;
}
return *this ;
}
template <typename E>
tensor& operator=( E const& that ) {
for (int i=0; i<size(); ++i) {
p_data[i] = that(i) ;
}
return *this ;
}
Now we can implement things like addition and scaling with deferred evaluation. For example:
template<typename T1, typename T2>
class tensor_sum {
public:
//add value_type, a size() function and const operator() overloads to the base tensor class for this to work
typedef decltype( typename T1::value_type() + typename T2::value_type() ) value_type ;
tensor_sum( T1 const& t1, T2 const& t2 ) : t1_(t1), t2_(t2) {}
value_type operator()( int i, int j, int k ) const {
return t1_(i,j,k) + t2_(i,j,k) ;
}
value_type operator()( int i ) const {
return t1_(i) + t2_(i) ;
}
private:
T1 const& t1_;
T2 const& t2_;
};
template <typename T1, typename T2>
tensor_sum<T1,T2> operator+(T1 const& t1, T2 const& t2 ) {
return tensor_sum<T1,T2>(t1,t2) ;
}
This tensor_sum behaves exactly like any normal tensor, except that we don't have to allocate memory to store the result. So we can do something like this:
tensor<double> t0(...);
tensor<double> t1(...);
tensor<double> t2(...);
tensor<double> result(...); //define result to be empty, we will fill it later
result = t0 + t1 + 5.0*t2;
The compiler should optimize this to be just one loop, without storing intermediate results or modifying the original tensors. You can do the same thing for scaling and the Kronecker product. Depending on what you want to do with the tensors, this can be a big advantage. But be careful, this isn't always the best option.
When implementing the Kronecker product you should be careful about the ordering of your loops; try to go through the tensors in the order they are stored, for cache efficiency.
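As an illustration of that loop-ordering point, here is a sketch of the Kronecker product for two row-major matrices stored in flat arrays (kron and the parameter names are illustrative, and it is not tied to the tensor class above):
// C = kron(A, B): A is (ra x ca), B is (rb x cb), all row-major flat arrays.
// C must have ra*rb*ca*cb elements. The innermost loop runs over jb, the
// index that is contiguous in both B and C.
void kron(const double* A, int ra, int ca,
          const double* B, int rb, int cb,
          double* C) {
    const int cc = ca * cb; // columns of C
    for (int ia = 0; ia < ra; ++ia)
        for (int ib = 0; ib < rb; ++ib)
            for (int ja = 0; ja < ca; ++ja) {
                const double a = A[ia * ca + ja];
                double* crow = C + (ia * rb + ib) * cc + ja * cb;
                for (int jb = 0; jb < cb; ++jb)
                    crow[jb] = a * B[ib * cb + jb];
            }
}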

character string to bitset in C++

I am still kind of new to C++ and am trying to figure out why I am not able to pass a value correctly to a bitset; at least, I suspect that is what the problem is.
I wrote a small function to assist in flipping the bits of a hex value to reverse the bit order. So, for example, an input of 0x01 would return 0x80.
This is the code I wrote.
int flipBits(char msd, char lsd) {
char ch[5];
sprintf_s(ch, "0x%d%d", msd, lsd);
char buffer[5];
strncpy_s(buffer, ch, 4);
cout << ch << endl;
cout << buffer << endl;
bitset<8> x(buffer);
bitset<8> y;
for (int i = 0; i < 8; i++) {
y[i] = x[7 - i];
}
cout << y << endl; // print the reversed bit order
int b = y.to_ulong(); // convert the binary to int
cout << b << endl; // print the int
cout << hex << b << endl; // print the hex
return b;
}
I tried adding the strncpy because I thought maybe the null terminator from sprintf was not working properly with the bitset. If in the line
bitset<8> x(buffer);
I replace buffer with a hex value, say for example 0x01, then it works and prints out 0x80 as I would expect, but if I try to pass in the value with the buffer it doesn't work.
We can write an STL-like container wrapper such that we can write:
int main() {
std::bitset<8> x(0x01);
auto container = make_bit_range(x);
std::reverse(container.begin(), container.end());
std::cout << x << std::endl;
}
and expect the output:
10000000
full code:
#include <iostream>
#include <bitset>
#include <algorithm>
template<std::size_t N>
struct bit_reference {
bit_reference(std::bitset<N>& data, int i) : data_(data), i_(i) {}
operator bool() const { return data_[i_]; }
bit_reference& operator=(bool x) {
data_[i_] = x;
return *this;
}
std::bitset<N>& data_;
int i_;
};
template<std::size_t N>
void swap(bit_reference<N> l, bit_reference<N> r) {
auto lv = bool(l);
auto rv = bool(r);
std::swap(lv, rv);
l = lv;
r = rv;
}
template<std::size_t N>
struct bit_range {
using bitset_type = std::bitset<N>;
bit_range(bitset_type &data) : data_(data) {}
struct iterator {
using iterator_category = std::bidirectional_iterator_tag;
using value_type = bit_reference<N>;
using difference_type = int;
using pointer = value_type *;
using reference = value_type &;
iterator(bitset_type &data, int i) : data_(data), i_(i) {}
bool operator==(iterator const &r) const { return i_ == r.i_; }
bool operator!=(iterator const &r) const { return i_ != r.i_; }
iterator &operator--() {
return update(i_ - 1);
}
iterator &operator++() {
return update(i_ + 1);
}
value_type operator*() const {
return bit_reference<N>(data_, i_);
}
private:
auto update(int pos) -> iterator & {
i_ = pos;
return *this;
}
private:
bitset_type &data_;
int i_;
};
auto begin() const { return iterator(data_, 0); }
auto end() const { return iterator(data_, int(data_.size())); }
private:
bitset_type &data_;
};
template<std::size_t N>
auto make_bit_range(std::bitset<N> &data) {
return bit_range<N>(data);
}
int main() {
std::bitset<8> x(0x01);
auto container = make_bit_range(x);
std::reverse(container.begin(), container.end());
std::cout << x << std::endl;
}
also plenty of fun algorithms here: Best Algorithm for Bit Reversal ( from MSB->LSB to LSB->MSB) in C

Different return and coordinate types in nanoflann radius search

I'm trying to use nanoflann in a project and am looking at the vector-of-vector and radius search examples.
I can't find a way to perform a radius search with a different data type than the coordinate type. For example, my coordinates are vectors of uint8_t; I am trying to input a radius of type uint32_t with little success.
I see in the source that the metric_L2 struct (which I am using for distance) uses the L2_Adaptor with two template parameters. L2_Adaptor itself takes three parameters, with the third defaulted to the first, which seems to be the problem if I am understanding the code correctly. However, trying to force use of the third always results in 0 matches in the radius search.
Is there a way to do this?
Edit: In the code below, everything works. However, if I change the search_radius (and ret_matches) to uint32_t, the radiusSearch method doesn't work.
#include <iostream>
#include <Eigen/Dense>
#include <nanoflann.hpp>
typedef Eigen::Matrix<uint8_t, Eigen::Dynamic, 1> coord_t;
using namespace nanoflann;
struct Point
{
coord_t address;
Point() {}
Point(uint8_t coordinates) : address(coord_t::Random(coordinates)) {}
};
struct Container
{
std::vector<Point> points;
Container(uint8_t coordinates, uint32_t l)
: points(l)
{
for(auto& each_location: points)
{
each_location = Point(coordinates);
}
}
};
struct ContainerAdaptor
{
typedef ContainerAdaptor self_t;
typedef nanoflann::metric_L2::traits<uint8_t, self_t>::distance_t metric_t;
typedef KDTreeSingleIndexAdaptor<metric_t, self_t, -1, size_t> index_t;
index_t *index;
const Container &container;
ContainerAdaptor(const int dimensions, const Container &container, const int leaf_max_size = 10)
: container(container)
{
assert(container.points.size() != 0 && container.points[0].address.rows() != 0);
const size_t dims = container.points[0].address.rows();
index = new index_t(dims, *this, nanoflann::KDTreeSingleIndexAdaptorParams(leaf_max_size));
index->buildIndex();
}
~ContainerAdaptor()
{
delete index;
}
inline void query(const uint8_t *query_point, const size_t num_closest, size_t *out_indices, uint32_t *out_distances_sq, const int ignoreThis = 10) const
{
nanoflann::KNNResultSet<uint32_t, size_t, size_t> resultSet(num_closest);
resultSet.init(out_indices, out_distances_sq);
index->findNeighbors(resultSet, query_point, nanoflann::SearchParams());
}
const self_t& derived() const
{
return *this;
}
self_t& derived()
{
return *this;
}
inline size_t kdtree_get_point_count() const
{
return container.points.size();
}
inline size_t kdtree_distance(const uint8_t *p1, const size_t idx_p2, size_t size) const
{
size_t s = 0;
for (size_t i = 0; i < size; i++)
{
const uint8_t d = p1[i] - container.points[idx_p2].address[i];
s += d * d;
}
return s;
}
inline coord_t::Scalar kdtree_get_pt(const size_t idx, int dim) const
{
return container.points[idx].address[dim];
}
template <class BBOX>
bool kdtree_get_bbox(BBOX & bb) const
{
for(size_t i = 0; i < bb.size(); i++)
{
bb[i].low = 0;
bb[i].high = UINT8_MAX;
}
return true;
}
};
void container_demo(const size_t points, const size_t coordinates)
{
Container s(coordinates, points);
coord_t query_pt(coord_t::Random(coordinates));
typedef ContainerAdaptor my_kd_tree_t;
my_kd_tree_t mat_index(coordinates, s, 25);
mat_index.index->buildIndex();
const uint8_t search_radius = static_cast<uint8_t>(100);
std::vector<std::pair<size_t, uint8_t>> ret_matches;
nanoflann::SearchParams params;
const size_t nMatches = mat_index.index->radiusSearch(query_pt.data(), search_radius, ret_matches, params);
for (size_t i = 0; i < nMatches; i++)
{
std::cout << "idx[" << i << "]=" << +ret_matches[i].first << " dist[" << i << "]=" << +ret_matches[i].second << std::endl;
}
std::cout << std::endl;
std::cout << "radiusSearch(): radius=" << +search_radius << " -> " << +nMatches << " matches" << std::endl;
}
int main()
{
container_demo(1e6, 32);
return 0;
}
More info: so it seems that the distance type, which is the third parameter of the L2_Adaptor, must be a signed type. Changing the metric_t typedef to the following solves the problem, if search_radius and ret_matches are also changed to int64_t.
typedef L2_Adaptor<uint8_t, self_t, int64_t> metric_t;
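With that typedef, the radius search declarations in container_demo change accordingly (a sketch based on the note above):
const int64_t search_radius = 100;
std::vector<std::pair<size_t, int64_t>> ret_matches;
// the radiusSearch call itself stays as before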

More efficient way with multi-dimensional arrays (matrices) using C++

I coded a small, simple program that compares the performance of several ways to fill a simple 8x8 matrix with different kinds of containers. Here's the code:
#define MATRIX_DIM 8
#define OCCUR_MAX 100000
static void genHeapAllocatedMatrix(void)
{
int **pPixels = new Pixel *[MATRIX_DIM];
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
pPixels[idy] = new Pixel[MATRIX_DIM];
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++)
pPixels[idy][idx] = 42;
}
}
static void genStackAllocatedMatrix(void)
{
std::array<std::array<int, 8>, 8> matrix;
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++) {
matrix[idy][idx] = 42;
}
}
}
static void genStackAllocatedMatrixBasic(void)
{
int matrix[MATRIX_DIM][MATRIX_DIM];
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++) {
matrix[idy][idx] = 42;
}
}
}
int main(void)
{
clock_t begin, end;
double time_spent;
begin = clock();
for (type::uint32 idx = 0; idx < OCCUR_MAX; idx++)
{
//genHeapAllocatedMatrix();
genStackAllocatedMatrix();
//genStackAllocatedMatrixBasic();
}
end = clock();
time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
std::cout << "Elapsed time = " << time_spent << std::endl;
return (0);
}
As you can guess, the most efficient way is the last one, with a simple two-dimensional C array (hard-coded). Of course the worst choice is the first one, using heap allocations.
My problem is that I want to store this 2-dimensional array as an attribute in a class. Here's the definition of a custom class that handles a matrix:
template <typename T>
class Matrix
{
public:
Matrix(void);
Matrix(type::uint32 column, type::uint32 row);
Matrix(Matrix const &other);
virtual ~Matrix(void);
public:
Matrix &operator=(Matrix const &other);
bool operator!=(Matrix const &other);
bool operator==(Matrix const &other);
type::uint32 rowCount(void) const;
type::uint32 columnCount(void) const;
void printData(void) const;
T **getData(void) const;
void setData(T **matrix);
private:
type::uint32 m_ColumnCount;
type::uint32 m_RowCount;
T **m_pMatrix;
};
To get the job done I tried the following, using a cast:
Matrix<int> matrix;
int tab[MATRIX_DIM][MATRIX_DIM];
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++) {
tab[idy][idx] = 42;
}
}
matrix.setData((int**)&tab[0][0]);
This code compiles correctly but if I want to print it there's a segmentation fault.
int tab[MATRIX_DIM][MATRIX_DIM];
for (type::uint32 idy = 0; idy < MATRIX_DIM; idy++) {
for (type::uint32 idx = 0; idx < MATRIX_DIM; idx++) {
tab[idy][idx] = 42;
}
}
int **matrix = (int**)&tab[0][0];
std::cout << matrix[0][0] << std::endl; //Segmentation fault
Is there a way to store this kind of two-dimensional array as an attribute without heap allocation?
That's because a two-dimensional array is not an array of pointers.
So, you should use int * for your matrix type, but then of course you will not be able to index it by two dimensions.
Another option is to store a pointer to the array:
int (*matrix)[MATRIX_DIM][MATRIX_DIM];
matrix = &tab;
std::cout << (*matrix)[0][0] << std::endl;
But that doesn't suit the idea of encapsulating the matrix in a class very well. A better idea would be for the class to allocate the storage itself (possibly in a single heap allocation) and to provide access to the matrix through methods only (e.g. GetCell(row, col) etc.), without exposing raw pointers.
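A sketch of that GetCell idea over a single contiguous block (illustrative names; MATRIX_DIM as defined in the question, and no heap allocation involved):
class Matrix8x8 {
public:
    int& GetCell(unsigned row, unsigned col) { return m_data[row * MATRIX_DIM + col]; }
    int  GetCell(unsigned row, unsigned col) const { return m_data[row * MATRIX_DIM + col]; }
private:
    int m_data[MATRIX_DIM * MATRIX_DIM] = {}; // one flat block stored inside the object
};
Usage: Matrix8x8 m; m.GetCell(0, 0) = 42;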
Measuring the speed of operations on an 8 x 8 array is largely pointless. For a data set as small as that, the cost of the operation will be close to zero and you are mostly measuring setup time, etc.
Timings become important for larger data sets, but you cannot sensibly extrapolate the small set results to the larger set. With larger data sets you will often find that the data exists on multiple memory pages. There is a danger that paging costs will dominate other costs. Very large improvements in efficiency are possible by ensuring that your algorithm processes all (or most) of the data on one page before moving to the next page, rather than constantly swapping pages.
In general, you are best to use the simplest data structures, with the least likelihood of programming error, and to optimise the processing algorithms. I say "in general" as extreme cases do exist where small differences in access time matter, but they are rare.
Use a single array to represent the matrix instead of allocating for each index.
I've written a class for this already. Feel free to use it:
#include <vector>
template<typename T, typename Allocator = std::allocator<T>>
class DimArray
{
private:
int Width, Height;
std::vector<T, Allocator> Data;
public:
DimArray(int Width, int Height);
DimArray(T* Data, int Width, int Height);
DimArray(T** Data, int Width, int Height);
virtual ~DimArray() {}
DimArray(const DimArray &da);
DimArray(DimArray &&da);
inline std::size_t size() {return Data.size();}
inline std::size_t size() const {return Data.size();}
inline int width() {return Width;}
inline int width() const {return Width;}
inline int height() {return Height;}
inline int height() const {return Height;}
inline T* operator [](const int Index) {return Data.data() + Height * Index;}
inline const T* operator [](const int Index) const {return Data.data() + Height * Index;}
inline DimArray& operator = (DimArray da);
};
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(int Width, int Height) : Width(Width), Height(Height), Data(Width * Height, 0) {}
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(T* Data, int Width, int Height) : Width(Width), Height(Height), Data(Width * Height, 0) {std::copy(&Data[0], &Data[0] + Width * Height, const_cast<T*>(this->Data.data()));}
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(T** Data, int Width, int Height) : Width(Width), Height(Height), Data(Width * Height, 0) {std::copy(Data[0], Data[0] + Width * Height, const_cast<T*>(this->Data.data()));}
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(const DimArray &da) : Width(da.Width), Height(da.Height), Data(da.Data) {}
template<typename T, typename Allocator>
DimArray<T, Allocator>::DimArray(DimArray &&da) : Width(std::move(da.Width)), Height(std::move(da.Height)), Data(std::move(da.Data)) {}
template<typename T, typename Allocator>
DimArray<T, Allocator>& DimArray<T, Allocator>::operator = (DimArray<T, Allocator> da)
{
this->Width = da.Width;
this->Height = da.Height;
this->Data.swap(da.Data);
return *this;
}
Usage:
int main()
{
DimArray<int> Matrix(1000, 1000); //creates a 1000 * 1000 matrix.
Matrix[0][0] = 100; //ability to index it like a multi-dimensional array.
}
More usage:
template<typename T, std::size_t size>
class uninitialised_stack_allocator : public std::allocator<T>
{
private:
alignas(16) T data[size];
public:
typedef typename std::allocator<T>::pointer pointer;
typedef typename std::allocator<T>::size_type size_type;
typedef typename std::allocator<T>::value_type value_type;
template<typename U>
struct rebind {typedef uninitialised_stack_allocator<U, size> other;};
pointer allocate(size_type n, const void* hint = 0) {return static_cast<pointer>(&data[0]);}
void deallocate(void* ptr, size_type n) {}
size_type max_size() const {return size;}
};
int main()
{
DimArray<int, uninitialised_stack_allocator<int, 1000 * 1000>> Matrix(1000, 1000);
}