Implementing an iterator over combinations of many vectors - c++

I am working on a problem that requires iterating over all combinations of elements of K vectors taken one at a time. So for example for K=2 vectors v1 = [0 1] and v2 = [3 4], I would iterate over (0,3), (0,4), (1,3), (1,4).
Since K is determined at run-time, I cannot use explicit for loops. My current approach is based on this solution that implements an "odometer" incrementing an index for each vector.
#include <vector>
#include <iostream>
int main(int argc, char * argv[])
{
std::vector<int> v1( {1, 2, 3} );
std::vector<int> v2( {-2, 5} );
std::vector<int> v3( {0, 1, 2} );
std::vector<std::vector<int> > vv( {v1, v2 ,v3} );
// Iterate combinations of elems in v1, v2, v3, one at a time
std::vector<std::vector<int>::iterator> vit;
for (auto& v : vv)
vit.push_back(v.begin());
int K = vv.size();
while (vit[0] != vv[0].end())
{
std::cout << "Processing combination: [";
for (auto& i : vit)
std::cout << *i << " ";
std::cout << "]\n";
// increment "odometer" by 1
++vit[K-1];
for (int i = K-1; (i > 0) && (vit[i] == vv[i].end()); --i)
{
vit[i] = vv[i].begin();
++vit[i-1];
}
}
return 0;
}
Output:
Processing combination: [1 -2 0 ]
Processing combination: [1 -2 1 ]
Processing combination: [1 -2 2 ]
Processing combination: [1 5 0 ]
Processing combination: [1 5 1 ]
Processing combination: [1 5 2 ]
Processing combination: [2 -2 0 ]
Processing combination: [2 -2 1 ]
Processing combination: [2 -2 2 ]
Processing combination: [2 5 0 ]
Processing combination: [2 5 1 ]
Processing combination: [2 5 2 ]
Processing combination: [3 -2 0 ]
Processing combination: [3 -2 1 ]
Processing combination: [3 -2 2 ]
Processing combination: [3 5 0 ]
Processing combination: [3 5 1 ]
Processing combination: [3 5 2 ]
However, this is somewhat messy and requires a lot of boilerplate code that I'd rather move elsewhere for clarity. Ideally I would like to have a custom iterator class, say my_combination_iterator, that would let me write this much more cleanly, e.g.:
for (my_combination_iterator it = vv.begin(); it != vv.end(); ++it)
// process combination
So far, I have looked at Boost iterator_facade. But my case seems more complicated than the one in the tutorial since I would need an iterator over a vector of Values as opposed to a single value type to define the required operators for the custom iterator.
How could such an iterator be implemented?

Why would you like to use custom iterators?
One could instead implement a very simple class that will iterate through all combinations:
class Combinator
{
public:
Combinator(std::vector<std::vector<int> >& vectors)
: m_vectors(vectors)
{
m_combination.reserve(m_vectors.size());
for(auto& v : m_vectors)
m_combination.push_back(v.begin());
}
bool next()
{
// iterate through vectors in reverse order
for(long long i = m_vectors.size() - 1; i >= 0; --i)
{
std::vector<int>& v = m_vectors[i];
std::vector<int>::iterator& it = m_combination[i];
if(++it != v.end())
return true;
it = v.begin();
}
return false;
}
std::vector<std::vector<int>::iterator> combination() const
{
return m_combination;
}
private:
std::vector<std::vector<int> >& m_vectors; // reference to data
std::vector<std::vector<int>::iterator> m_combination;
};
UPDATE:
If you would still like to use iterators, I suggest iterating over combinations. One can put all the combinations from Combinator into a container and then work with the container's own iterators. In my opinion it's a cleaner solution. The only drawback is the extra memory needed to store all combinations explicitly.
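As a rough sketch of that idea, the function below collects every combination into a std::vector so that ordinary container iterators (or a range-based for) can be used afterwards. The name all_combinations is made up for this illustration, and it uses plain indices instead of the Combinator class so that it stands on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the answer above): materialises every
// combination that takes exactly one element from each input vector.
std::vector<std::vector<int>>
all_combinations(const std::vector<std::vector<int>>& vectors)
{
    std::vector<std::vector<int>> result;
    if (vectors.empty())
        return result;
    for (const auto& v : vectors)
        if (v.empty())
            return result;              // an empty input admits no combination

    std::vector<std::size_t> idx(vectors.size(), 0);   // the "odometer"
    while (true)
    {
        // Read off the combination the odometer currently points at.
        std::vector<int> combo;
        combo.reserve(vectors.size());
        for (std::size_t k = 0; k < vectors.size(); ++k)
            combo.push_back(vectors[k][idx[k]]);
        assert(combo.size() == vectors.size());
        result.push_back(combo);

        // Advance the odometer, rightmost digit first.
        std::size_t k = vectors.size();
        while (k > 0)
        {
            --k;
            if (++idx[k] < vectors[k].size())
                break;                  // no carry needed
            idx[k] = 0;                 // wrap around and carry to the left
            if (k == 0)
                return result;          // leftmost digit wrapped: all done
        }
    }
}
```

With this, the processing loop becomes a plain for (const auto& combo : all_combinations(vv)). The memory cost is the product of all vector sizes times K, so it only suits small inputs.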

Related

Pybind11 memory appears to be changed between calls

I am using pybind11 to convert scipy's csr_matrix into a C++ object. To this end, I've defined a Vector class, which is constructed from the numpy data, indices, and indptr fields.
template<typename T>
class Vector {
public:
T *data;
const std::array<ssize_t, 1> shape;
Vector() = delete;
Vector(T *data, const std::array<ssize_t, 1> shape) : data(data), shape(shape) {};
};
I later register several instantiations of the class using a template function which does this.
py::class_<Vector<T>>(m, "...Vector", py::buffer_protocol())
.def("__init__", [](Vector<T> &v, py::array_t<T, py::array::c_style | py::array::forcecast> data) {
py::buffer_info info = data.request();
if (info.ndim != 1) throw std::invalid_argument("must be a 1d array!");
std::array<ssize_t, 1> shape_ = {info.shape[0]};
new(&v) Vector<T>(static_cast<T *>(info.ptr), shape_);
})
.def_buffer([](Vector<T> &v) -> py::buffer_info {
return py::buffer_info(
v.data, sizeof(T), py::format_descriptor<T>::format(), 1, v.shape, {sizeof(T)}
);
});
So the Vector class works fine. However, I then define a csr_matrix class like so
template<typename T>
class csr_matrix {
public:
Vector<T> data;
Vector<ssize_t> indices;
Vector<ssize_t> indptr;
const std::array<ssize_t, 2> shape;
csr_matrix() = delete;
csr_matrix(Vector<T>& data, Vector<ssize_t>& indices, Vector<ssize_t>& indptr, const std::array<ssize_t, 2>& shape)
: data(data), indices(indices), indptr(indptr), shape(shape) {}
};
which is then registered in the same way as Vector using templates, so I can register csr_matrices for floats, doubles, and ints.
py::class_<csr_matrix<T>>(m, "...csr_matrix")
.def("__init__", [](
csr_matrix<T> &matrix,
py::array_t<T, py::array::c_style> data,
py::array_t<ssize_t, py::array::c_style> indices,
py::array_t<ssize_t, py::array::c_style> indptr,
py::array_t<ssize_t, py::array::c_style | py::array::forcecast> shape
) {
py::buffer_info data_info = data.request();
py::buffer_info indices_info = indices.request();
py::buffer_info indptr_info = indptr.request();
// ... some validity checks
auto vec_data = new Vector<T>(static_cast<T *>(data_info.ptr), {data_info.shape[0]});
auto vec_indices = new Vector<ssize_t>(static_cast<ssize_t *>(indices_info.ptr), {indices_info.shape[0]});
auto vec_indptr = new Vector<ssize_t>(static_cast<ssize_t *>(indptr_info.ptr), {indptr_info.shape[0]});
std::array<ssize_t, 2> shape_ = {*shape.data(0), *shape.data(1)};
new(&matrix) csr_matrix<T>(*vec_data, *vec_indices, *vec_indptr, shape_);
})
.def_readonly("data", &csr_matrix<T>::data)
.def_readonly("indices", &csr_matrix<T>::indices)
.def_readonly("indptr", &csr_matrix<T>::indptr);
Now, I write a simple unit test in Python to make sure everything is working properly and I get the most baffling error
x = sp.csr_matrix([
[0, 0, 1, 2],
[0, 0, 0, 0],
[3, 0, 4, 0],
], dtype=np.float32)
cx = matrix.Float32CSRMatrix(x.data, x.indices, x.indptr, x.shape)
np.testing.assert_equal(x.data, np.asarray(cx.data, dtype=np.float32))
np.testing.assert_equal(x.indices, np.asarray(cx.indices, dtype=np.uint64))
np.testing.assert_equal(x.indptr, np.asarray(cx.indptr, dtype=np.uint64)) # fails here!
I throw a couple of prints after each line and this is the output
print(1, np.asarray(cx.data, dtype=np.float32), np.asarray(cx.indices, dtype=np.uint64), np.asarray(cx.indptr, dtype=np.uint64))
# data indices indptr
- [1. 2. 3. 4.] [2 3 0 2] [0 2 2 4] # original, python object
0 [1. 2. 3. 4.] [2 3 0 2] [0 2 2 4] # after matrix.Float32CSRMatrix(...)
1 [1. 2. 3. 4.] [2 3 0 2] [0 2 2 4] # after assert on data
2 [1. 2. 3. 4.] [2 3 0 2] [2 3 0 2] # after assert on indices
# Fails at assert on indptr!
So somewhere, something changed the values of indptr into indices, and I have no clue what and where. What's even more baffling is that if I change the order of the asserts so that indptr is checked before indices, like this
np.testing.assert_equal(x.data, np.asarray(cx.data, dtype=np.float32))
np.testing.assert_equal(x.indptr, np.asarray(cx.indptr, dtype=np.uint64))
np.testing.assert_equal(x.indices, np.asarray(cx.indices, dtype=np.uint64)) # fails here!
then this is the output
- [1. 2. 3. 4.] [2 3 0 2] [0 2 2 4]
0 [1. 2. 3. 4.] [2 3 0 2] [0 2 2 4]
1 [1. 2. 3. 4.] [2 3 0 2] [0 2 2 4]
2 [1. 2. 3. 4.] [0 2 2 4] [0 2 2 4]
# Now fails at assert indices; we do the assert on indptr before, and it passes
So now it's indices that is being overwritten with indptr, and not the other way around. I've been banging my head against this wall for well over a day now and I have no clue what's going on. Object lifetime is not an issue: the vectors are constructed at the start and destructed when the csr_matrix goes away. I have pasted all the relevant code here, and I am not doing anything that might cause this.
Any and all help would be greatly appreciated.

Extend/Pad matrix in Eigen

Say I have
A = [1 2 3]
[4 5 6]
[7 8 9]
I want to pad it with the first row and first column or last row and last column as many times as needed to create A nxn. For example, A 4x4 would be
A = [1 1 2 3]
[1 1 2 3]
[4 4 5 6]
[7 7 8 9]
and A 5x5 would be
A = [1 1 2 3 3]
[1 1 2 3 3]
[4 4 5 6 6]
[7 7 8 9 9]
[7 7 8 9 9]
I'm aware that I could do A.conservativeResize(4,4) which gets me
A = [1 2 3 0]
[4 5 6 0]
[7 8 9 0]
[0 0 0 0]
then I could copy things around one by one, but is there a more efficient way to do this using Eigen?
You can work around this using a nullary expression:
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;
using namespace std;
int main()
{
Matrix3i A;
A.reshaped() = VectorXi::LinSpaced(9,1,9);
cout << A << "\n\n";
int N = 5;
MatrixXi B(N,N);
B = MatrixXi::NullaryExpr(N, N, [&A,N] (Index i,Index j) {
return A( std::max<Index>(0,i-(N-A.rows())),
std::max<Index>(0,j-(N-A.cols())) ); } );
cout << B << "\n\n";
}
Another approach would be to create a clamped sequence of indices like [0 0 0 1 2]:
struct pad {
Index size() const { return m_out_size; }
Index operator[] (Index i) const { return std::max<Index>(0,i-(m_out_size-m_in_size)); }
Index m_in_size, m_out_size;
};
B = A(pad{3,N}, pad{3,N});
This version requires the head of Eigen.
You can easily build on those examples to make them even more general and/or wrap them within functions.
Just as a note, it's not true that A.conservativeResize(4,4) will get you a matrix with the added rows filled with zeros. The Eigen documentation says,
In case values need to be appended to the matrix they will be uninitialized.
The new rows and columns will be filled with garbage, and seeing zeros is only a coincidence (unless you are compiling with a special preprocessor directive to Eigen). But this means that no unnecessary time is wasted writing zeros that you will overwrite anyway.
Note: this code demonstrates how to get a matrix with your original matrix in the top left corner:
The best way to fill multiple values at once is to use Eigen's block operations and setConstant. For example, if A is a matrix of size old_size x old_size:
A.conservativeResize(n, n);
for (int i = 0; i < n; ++i) {
// Fill the end of each row and column
A.row(i).tail(n - old_size).setConstant(A(i, old_size - 1));
A.col(i).tail(n - old_size).setConstant(A(old_size - 1, i));
}
// Fill the bottom right block
A.bottomRightCorner(n - old_size, n - old_size).setConstant(A(old_size - 1, old_size - 1));
More important than being "efficient", these functions express your intent as a programmer.
Edit: To get a padded matrix with your original matrix in the middle:
I just noticed your example pads around the original matrix in the middle, not in the top left. In this case, there is little point to using conservativeResize(), because the original values will only be copied to the top left corner. An outline of the solution is:
1. Construct a new nxn matrix B of the desired size.
2. Copy your original matrix to the middle using
int start = (n - old_size + 1)/2;
B.block(start, start, old_size, old_size) = A;
3. Fill in the outside values using block operations similar to my example above.
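To make the index arithmetic of this outline concrete, here is a small Eigen-free sketch. It uses a std::vector of rows as a stand-in for the matrix, and the names source_index and pad_centered are invented for the illustration. Instead of filling the border block by block, it clamps each padded index back into the original matrix; the same mapping could presumably be plugged into a nullary expression like the one in the other answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;   // plain stand-in for an Eigen matrix

// Maps an index of the padded matrix back to an index of the original one.
// `start` is the offset of the original block, as in the outline above.
std::size_t source_index(std::size_t i, std::size_t old_size, std::size_t n)
{
    const std::size_t start = (n - old_size + 1) / 2;
    if (i < start)
        return 0;                               // padding before the block
    return std::min(i - start, old_size - 1);   // inside or after the block
}

// Hypothetical helper: builds the n x n edge-padded copy of a square matrix.
// Assumes `a` is square and non-empty and that n is at least a.size().
Matrix pad_centered(const Matrix& a, std::size_t n)
{
    assert(!a.empty() && n >= a.size());
    Matrix b(n, std::vector<int>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            b[i][j] = a[source_index(i, a.size(), n)][source_index(j, a.size(), n)];
    return b;
}
```

For the 3x3 matrix from the question, n = 4 maps rows and columns to 0 0 1 2 and n = 5 maps them to 0 0 1 2 2, which reproduces both padded examples.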

Sorting vector based on another sorted vector

I have got a vector1 of pairs, sorted by an int key:
std::vector <pair <int,string> > vector1;
//6 5 1 8 4 2
Then I have another vector2 consisting of numbers contained in vector1:
std::vector <string> vector2;
//1 5 6 8
How do I sort vector2 with the same keys as in vector1? I want to get:
unsorted: 1 5 6 8
sorted: 6 5 1 8
unsorted: 6 5 1 2 4
sorted: 6 5 1 2 4
If vector2 only contains numbers that appear in vector1, you can map the numbers to their positions in vector1. For example, say vector1 is [3, 2, 4] and vector2 is [4, 3]:
Map all elements of vector1 to their indexes, for example 3->0, 2->1, 4->2 (keys are numbers and values are indexes). Use a map or hash map for this.
Now loop through vector2 and replace each element with its value from the map: 4 becomes 2 and 3 becomes 0, so vector2 becomes [2, 0].
Now use sort(vector2.begin(), vector2.end()); vector2 becomes [0, 2].
Now loop through vector2 and replace each element i with vector1[i]: 0->3 (because the number at index 0 in vector1 is 3) and 2->4 (because the number at index 2 in vector1 is 4), so vector2 ends up as [3, 4].
Hope this helps.
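The steps above could be sketched roughly as follows. The function name sort_by_reference is invented here, and the sketch assumes that vector2 holds ints, that every one of them occurs as a key in vector1, and that the keys in vector1 are unique.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper following the four steps above.
void sort_by_reference(std::vector<int>& values,
                       const std::vector<std::pair<int, std::string>>& ref)
{
    // Step 1: map each key to its position in the reference vector.
    std::unordered_map<int, std::size_t> position;
    for (std::size_t i = 0; i < ref.size(); ++i)
        position[ref[i].first] = i;

    // Step 2: replace each value by its position
    // (at() throws std::out_of_range for a value missing from ref).
    std::vector<std::size_t> positions;
    positions.reserve(values.size());
    for (int v : values)
        positions.push_back(position.at(v));
    assert(positions.size() == values.size());

    // Step 3: sort the positions.
    std::sort(positions.begin(), positions.end());

    // Step 4: translate the sorted positions back into keys.
    for (std::size_t i = 0; i < values.size(); ++i)
        values[i] = ref[positions[i]].first;
}
```

With the keys 6 5 1 8 4 2 from the question, the input 1 5 6 8 comes out as 6 5 1 8.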
You might create a map of values, something like
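#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>
// Sorts vec so that its elements follow the order of the keys in ref.
void SortAccording(std::vector<int>& vec, const std::vector<std::pair<int, std::string>>& ref)
{
// Map each key to its position in ref.
std::map<int, int> m;
int counter = 0;
for (auto& p : ref) {
m[p.first] = counter++;
}
// Compare by position; m.at() throws std::out_of_range for values missing from ref.
std::sort(vec.begin(),
vec.end(),
[&](int lhs, int rhs) { return m.at(lhs) < m.at(rhs); });
}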

How to detect all possible subsets from a given vector?

I'm writing a function which should detect all possible subsets from a main vector and push them to another vector. The elements in the subsets are also added to each other before being pushed into the new vector(s1).
At the moment what my code does is the following..
For example, let's say myvec = {1,2,3}, then v1 = {1,3,6,2,5,3}. It only sums consecutive numbers. However, I also want it to sum up combinations like 1 & 3, which would add a 4 to the vector v1. So far I have not been able to modify my algorithm to achieve that. Any help will be appreciated!
for (k=0; k<myvec.size(); k++) {
total = 0;
for (m=k; m<myvec.size(); m++) {
total += myvec[m];
v1.push_back(total);
}
}
One way to think about the power set of a given (ordered) set is to think of its elements (the subsets) as bit vectors where the n-th bit is set to 1 if and only if the n-th element from the set was chosen for this subset.
So in your example, you'd have a 3 bit vector that could be represented as an unsigned integer. You'd “count the bit vector” up from 0 (the empty set) to 7 (the entire set). Then, in each iteration, you pick those elements for which the respective bit is set.
As can be readily seen, the power set explodes rapidly which will make it impractical to compute explicitly for any set with more than a dozen or so elements.
Casting these thoughts into C++, we get the following.
#include <climits> // CHAR_BIT
#include <iostream> // std::cout, std::endl
#include <stdexcept> // std::invalid_argument
#include <type_traits> // std::is_arithmetic
#include <vector> // std::vector
template<typename T>
std::vector<T>
get_subset_sums(const std::vector<T>& elements)
{
static_assert(std::is_arithmetic<T>::value, "T must be arithmetic");
if (elements.size() >= CHAR_BIT * sizeof(unsigned long)) // shifting 1UL by the full width would be undefined
throw std::invalid_argument {"too many elements"};
const std::size_t power_size {1UL << elements.size()};
std::vector<T> subset_sums {};
subset_sums.reserve(power_size);
for (unsigned long mask = 0UL; mask < power_size; ++mask)
{
T sum {};
for (std::size_t i = 0; i < elements.size(); ++i)
{
if (mask & (1UL << i))
sum += elements.at(i);
}
subset_sums.push_back(sum);
}
return subset_sums;
}
int
main()
{
std::vector<int> elements {1, 2, 3};
for (const int sum : get_subset_sums(elements))
std::cout << sum << std::endl;
return 0;
}
You might want to use a std::unordered_set for the subset-sums instead of a std::vector to save the space (and redundant further processing) for duplicates.
The program outputs the numbers 0 (the empty sum), 1 (= 1), 2 (= 2), 3 (= 1 + 2), 3 (= 3), 4 (= 1 + 3), 5 (= 2 + 3) and 6 (= 1 + 2 + 3). We can make this more visual.
mask       mask
(decimal)  (binary)  subset      sum
––––––––––––––––––––––––––––––––––––
0          000       {}          0
1          001       {1}         1
2          010       {2}         2
3          011       {1, 2}      3
4          100       {3}         3
5          101       {1, 3}      4
6          110       {2, 3}      5
7          111       {1, 2, 3}   6
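The std::unordered_set suggestion mentioned earlier might look something like the sketch below. It is a simplified, int-only variant written for illustration; the name get_unique_subset_sums is not from the code above.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <unordered_set>
#include <vector>

// Hypothetical variant of get_subset_sums: stores each distinct sum once.
std::unordered_set<int> get_unique_subset_sums(const std::vector<int>& elements)
{
    // The mask must have at least one spare bit so that 1UL << size is defined.
    if (elements.size() >= CHAR_BIT * sizeof(unsigned long))
        throw std::invalid_argument{"too many elements"};
    const unsigned long power_size = 1UL << elements.size();

    std::unordered_set<int> sums;
    for (unsigned long mask = 0; mask < power_size; ++mask)
    {
        int sum = 0;
        for (std::size_t i = 0; i < elements.size(); ++i)
            if (mask & (1UL << i))      // element i belongs to this subset
                sum += elements[i];
        sums.insert(sum);               // inserting a duplicate is a no-op
    }
    assert(sums.size() <= power_size);
    return sums;
}
```

For {1, 2, 3} this yields the seven distinct sums 0 through 6; the duplicate 3 from the table is stored only once. Note that an unordered set does not preserve the order in which the subsets were visited.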

How can I sort a map according to a property of its values?

I've created a map having a vector as below:
map<int,vector<int>> mymap;
How can I sort this map according to the nth value of the vector contained by map?
You can't. You can provide a custom comparator to make the underlying data get sorted another way than the default, but this only relates to keys, not values. If you have a requirement for your container's elements to exist in some specific, value-defined order, then you're using the wrong container.
You can switch to a set, and take advantage of the fact that there is no distinction there between "key" and "value", and hack the underlying sorting yourself:
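#include <cstddef>
#include <iostream>
#include <set>
#include <utility>
#include <vector>
template <std::size_t N>
struct MyComparator
{
typedef std::pair<int, std::vector<int>> value_type;
// const, so the comparator can be invoked on a const object, as the container requires
bool operator()(const value_type& lhs, const value_type& rhs) const
{
// Orders pairs by the N-th element of their vector; at() throws if it is missing.
return lhs.second.at(N) < rhs.second.at(N);
}
};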
/**
* A set of (int, int{2,}) pairs, sorted by the 2nd element in
* the 2nd item of each pair.
*/
std::set<std::pair<int, std::vector<int>>, MyComparator<1>> my_data;
int main()
{
my_data.insert(std::make_pair(1, std::vector<int>{0,5,0,0}));
my_data.insert(std::make_pair(2, std::vector<int>{0,2,0,0}));
my_data.insert(std::make_pair(3, std::vector<int>{0,1,0,0}));
my_data.insert(std::make_pair(4, std::vector<int>{0,9,0,0}));
for (const auto& el : my_data)
std::cout << el.first << ' ';
}
// Output: 3 2 1 4
However, if you still need to perform lookup on key as well, then you're really in trouble and need to rethink some things. You may need to duplicate your data or provide an indexing vector.
map<int,vector<int>> mymap;
How can I sort this map according to the nth value of the vector contained by map?
That's only possible if you're prepared to use that nth value as the integer key too, as in consistently assigning:
mymap[v[n - 1]] = v;
If you're doing that, you might consider a set<vector<int>>, which removes the redundant storage of that "key" element - you would then need to provide a custom comparison though....
If you envisage taking an existing populated map that doesn't have that ordering, then sorting its elements - that's totally impossible. You'll have to copy the elements out to another container, such as a set that's ordered on the nth element, or a vector that you std::sort after populating.
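As one possible shape for the "copy out and std::sort" route, the sketch below leaves the map untouched and builds a vector of pointers to its entries, sorted by the element at index n. The names Map and sorted_by_nth are invented for this example, and it assumes every vector has more than n elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Map = std::map<int, std::vector<int>>;

// Hypothetical helper: returns pointers to the map's entries, ordered by the
// element at index n of each vector. The pointers stay valid as long as the
// entries they point to are not erased from the map.
std::vector<const Map::value_type*> sorted_by_nth(const Map& m, std::size_t n)
{
    std::vector<const Map::value_type*> view;
    view.reserve(m.size());
    for (const auto& entry : m)
    {
        assert(entry.second.size() > n);
        view.push_back(&entry);
    }
    // stable_sort keeps key order among entries whose n-th values tie.
    std::stable_sort(view.begin(), view.end(),
                     [n](const Map::value_type* lhs, const Map::value_type* rhs) {
                         return lhs->second[n] < rhs->second[n];
                     });
    return view;
}
```

Lookup by key still goes through the map, while iteration in value order goes through the returned vector. The vector has to be rebuilt whenever the map changes.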
If I have understood correctly you can (build) add elements to the map the following way
std::vector<int> v = { 1, 2, 3 };
std::vector<int>::size_type n = 2;
mymap[v[n]] = v;
Here is an example
#include <iostream>
#include <vector>
#include <map>
#include <algorithm>
#include <cstdlib>
#include <ctime>
int main()
{
std::srand( ( unsigned )time( 0 ) );
const size_t N = 10;
std::map<int, std::vector<int>> m;
for ( size_t i = 0; i < N; i++ )
{
std::vector<int> v( N );
std::generate( v.begin(), v.end(), []{ return std::rand() % N; } );
m[v[0]] = v;
}
for ( auto &p : m )
{
for ( int x : p.second ) std::cout << x << ' ';
std::cout << std::endl;
}
return 0;
}
The output is
0 1 7 8 1 2 9 0 0 9
1 6 3 1 3 5 0 3 1 5
3 8 0 0 0 7 1 2 9 7
5 9 5 0 7 1 2 0 6 3
6 4 7 5 4 0 0 4 2 0
7 9 8 6 5 5 9 9 4 5
8 3 8 0 5 9 6 6 8 3
9 5 4 7 4 0 3 5 1 9
Take into account that several vectors can have the same value of the n-th element (in my example n is equal to 0). Because m[v[0]] = v overwrites any entry already stored under that key, only one vector per key remains, which is why the output above shows 8 vectors rather than 10. If you want to keep the duplicates, use for example std::multimap.
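A minimal sketch of the std::multimap variant, assuming each vector has more than n elements (index_by_nth is a made-up name for this illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: indexes vectors by their element at position n.
// A std::multimap keeps every vector, even when several share the same key.
std::multimap<int, std::vector<int>>
index_by_nth(const std::vector<std::vector<int>>& vectors, std::size_t n)
{
    std::multimap<int, std::vector<int>> result;
    for (const auto& v : vectors)
    {
        assert(v.size() > n);
        result.emplace(v[n], v);    // never overwrites an existing entry
    }
    return result;
}
```

Iterating over the result visits the vectors in ascending order of their n-th element, and vectors with equal keys appear in the order they were inserted.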
You can also build a new map, ordered by your criterion, from an existing map.
You can exploit the fact that a C++ map uses a tree sorted by its keys. This means that you can create a new map whose keys are the values you want to sort on. Alternatively, you can create a vector with references to the items in your map and sort that vector (or the other way around: keep a sorted vector and use a map as an index into it). Be sure to use a multimap if keys can be duplicated.