How can I sort a Boost matrix by column? - c++

I have a 2d boost matrix (boost::numeric::ublas::matrix) of shape (n,m), with the first column being the timestamp. However, the data I'm getting is out of order. How can I sort it with respect to the first column, and what would be the most efficient way to do so? Speed is critical in this particular application.

As I commented, ublas::matrix might not be the most natural choice for a task like this. Here is the naive approach, using matrix_row and some range magic:
#define _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS
#include <boost/numeric/ublas/io.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/matrix_proxy.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/irange.hpp>
#include <boost/range/algorithm.hpp>
#include <iomanip>
#include <iostream>
using namespace boost::adaptors;
using Matrix = boost::numeric::ublas::matrix<float>;
using Row = boost::numeric::ublas::matrix_row<Matrix>;
static auto by_col0 = [](Row const& a, Row const& b) { return a(0) < b(0); };
int main()
{
constexpr int nrows = 3, ncols = 4;
Matrix m(nrows, ncols);
for (unsigned i = 0; i < m.size1(); ++i)
for (unsigned j = 0; j < m.size2(); ++j)
m(i, j) = (10 - 3.f * i) + j;
std::cout << "before: " << m << "\n";
auto getrow = [&](int i) { return Row(m, i); };
sort(boost::irange(nrows) | transformed(getrow), by_col0);
std::cout << "after: " << m << "\n";
}
Sadly, this confirms that the proxy abstraction doesn't hold:
before: [3,4]((10,11,12,13),(7,8,9,10),(4,5,6,7))
after: [3,4]((10,11,12,13),(10,11,12,13),(10,11,12,13))
Oops.
Analysis?
I can't say I know what's wrong. std::sort is defined in terms of ValueSwappable which at first glance seems to work fine for matrix_row:
auto r0 = Row(m, 0);
auto r1 = Row(m, 1);
using std::swap;
swap(r0, r1);
Prints Live On Coliru
Maybe this starting point gives you something helpful. Since it's this tricky, I'd strongly consider using another data structure that is more conducive to your task (boost::multi_array[_ref] comes to mind).
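One way to sidestep the row proxies entirely is to sort a vector of row indices by the timestamp column and then gather the rows in that order. The following is only a minimal sketch under assumptions: it uses a plain row-major std::vector<float> as a stand-in for the matrix storage, and sort_rows_by_col0 is a hypothetical helper name, not part of uBLAS. The same index order could then drive a row-by-row copy into a second matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: sorts the rows of a row-major nrows x ncols buffer
// by the value in column 0. Returns a new buffer; the input is not modified.
std::vector<float> sort_rows_by_col0(std::vector<float> const& data,
                                     std::size_t nrows, std::size_t ncols)
{
    assert(ncols > 0 && data.size() == nrows * ncols);

    // Sort row indices instead of rows, so no row proxy is ever swapped.
    std::vector<std::size_t> order(nrows);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
        [&](std::size_t a, std::size_t b) {
            return data[a * ncols] < data[b * ncols];
        });

    // Gather the rows in sorted order into the output buffer.
    std::vector<float> out;
    out.reserve(data.size());
    for (std::size_t r : order)
        out.insert(out.end(), data.data() + r * ncols,
                   data.data() + (r + 1) * ncols);
    return out;
}
```

The sort only moves indices around, and each row is copied exactly once, at the cost of a second buffer of the same size.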

Related

c++ - Fill a symmetric matrix using an array stored on the heap

I am trying to write code in which I have to declare a large array on the heap.
At the same time I will use the Boost library to perform some matrix calculations (as can be seen in "Fill a symmetric matrix using an array").
I have two constraints: I will deal with large arrays and matrices, so I have to allocate everything on the heap, and I have to work with arrays rather than vectors.
However, I am facing a problem that many people will probably find trivial: when filling the matrix, the last element doesn't get filled in correctly. So although I expect to get
[3,3]((0,1,3),(1,2,4),(3,4,5))
the output of the code is
[3,3]((0,1,3),(1,2,4),(3,4,2.6681e-315))
I am compiling this code in ROOT6. I don't think it's related to that; I am just mentioning it for completeness.
A small sample of the code follows
#include <iterator>
#include <iostream>
#include <fstream>
#include </usr/include/boost/numeric/ublas/matrix.hpp>
#include </usr/include/boost/numeric/ublas/matrix_sparse.hpp>
#include </usr/include/boost/numeric/ublas/symmetric.hpp>
#include </usr/include/boost/numeric/ublas/io.hpp>
using namespace std;
int test_boost () {
using namespace boost::numeric::ublas;
symmetric_matrix<double, upper> m_sym1 (3, 3);
float* filler = new float[6];
for (int i = 0; i<6; ++i) filler[i] = i;
float const* in1 = filler;
for (size_t i = 0; i < m_sym1.size1(); ++ i)
for (size_t j = 0; j <= i && in1 != &filler[5]; ++ j)
m_sym1 (i, j) = *in1++;
delete[] filler;
std::cout << m_sym1 << std::endl;
return 0;
}
Any idea on how to solve that?
Arrays and pointers are not objects of class type, so they don't have members. You already have a float *: it is filler.
float const* in1 = filler; // adding const is always allowed
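I've finally managed to solve it by changing &filler[5] to &filler[6]. With &filler[5] the loop stops before the sixth value is copied, so the last matrix element is never written.
A version that works is shown below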
#include <iterator>
#include <iostream>
#include <fstream>
#include </usr/include/boost/numeric/ublas/matrix.hpp>
#include </usr/include/boost/numeric/ublas/matrix_sparse.hpp>
#include </usr/include/boost/numeric/ublas/symmetric.hpp>
#include </usr/include/boost/numeric/ublas/io.hpp>
using namespace std;
int test_boost () {
using namespace boost::numeric::ublas;
symmetric_matrix<double, upper> m_sym1 (3, 3);
float* filler = new float[6];
for (int i = 0; i<6; ++i) filler[i] = i;
float const* in1 = filler;
for (size_t i = 0; i < m_sym1.size1(); ++ i)
for (size_t j = 0; j <= i && in1 != &filler[6]; ++ j)
m_sym1 (i, j) = *in1++;
delete[] filler;
std::cout << m_sym1 << std::endl;
return 0;
}
Running this code yields the following output
[3,3]((0,1,3),(1,2,4),(3,4,5))
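The fix above amounts to comparing against a one-past-the-end pointer rather than a pointer to the last element. As a rough illustration only, here is the same fill loop written against a plain dense row-major buffer instead of symmetric_matrix; fill_symmetric is a hypothetical helper and the std::vector output is an assumption made to keep the sketch free of Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a dense n x n symmetric matrix (row-major)
// from n*(n+1)/2 packed values, filling row by row up to the diagonal.
// `last` must point one past the final input value, not at it.
std::vector<double> fill_symmetric(float const* first, float const* last,
                                   std::size_t n)
{
    assert(static_cast<std::size_t>(last - first) == n * (n + 1) / 2);
    std::vector<double> m(n * n, 0.0);
    float const* in = first;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j <= i && in != last; ++j) {
            m[i * n + j] = *in;  // element (i, j)
            m[j * n + i] = *in;  // mirrored element (j, i)
            ++in;
        }
    return m;
}
```

Called as fill_symmetric(filler, filler + 6, 3) with filler holding 0..5, the loop consumes all six values because the sentinel lies beyond the last one.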

merging a collection of `Eigen::VectorXd`s into one large `Eigen::VectorXd`

If you go to this Eigen page, you'll see you can initialize VectorXd objects with the << operator. You can also dump a few vector objects into one big VectorXd object (e.g. look at the third example in the section called "The comma initializer").
I want to dump a few vectors into a big vector, but I'm having a hard time writing code that will work for an arbitrarily sized collection of vectors. The following doesn't work, and I'm having a hard time writing it in a way that does (that isn't a double for loop). Any suggestions?
#include <iostream>
#include <Eigen/Dense>
#include <vector>
int main(int argc, char **argv)
{
// make some random VectorXds
std::vector<Eigen::VectorXd> vOfV;
Eigen::VectorXd first(3);
Eigen::VectorXd second(4);
first << 1,2,3;
second << 4,5,6,7;
vOfV.push_back(first);
vOfV.push_back(second);
// here is the problem
Eigen::VectorXd flattened(7);
for(int i = 0; i < vOfV.size(); ++i)
flattened << vOfV[i];
//shows that this doesn't work
for(int i = 0; i < 7; ++i)
std::cout << flattened(i) << "\n";
return 0;
}
The comma initializer does not work like that: a single comma-initializer expression has to fill the whole vector. Instead, allocate a large enough vector, then iterate and assign each block.
#include <iostream>
#include <vector>
#include <Eigen/Dense>
// http://eigen.tuxfamily.org/dox/group__TopicStlContainers.html
#include <Eigen/StdVector>
EIGEN_DEFINE_STL_VECTOR_SPECIALIZATION(Eigen::VectorXd)
int main()
{
// make some random VectorXds
std::vector<Eigen::VectorXd> vOfV;
Eigen::VectorXd first(3);
Eigen::VectorXd second(4);
first << 1,2,3;
second << 4,5,6,7;
vOfV.push_back(first);
vOfV.push_back(second);
int len = 0;
for (auto const &v : vOfV)
len += v.size();
Eigen::VectorXd flattened(len);
int offset = 0;
for (auto const &v : vOfV)
{
flattened.middleRows(offset,v.size()) = v;
offset += v.size();
}
std::cout << flattened << "\n";
}

Any efficient built-in function in C++/C to quickly and uniformly sample b entries without replacement from n entries?

It seems that calling shuffle(Index, Index+n, g) and then taking the first b entries is still inefficient when n is very big but b is very small, where Index is a vector/array storing 0 ... (n-1).
You can take the standard shuffle algorithm and modify it to stop after shuffling just the first b entries.
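A minimal sketch of that idea, assuming a Fisher-Yates style shuffle that is simply cut off after b swaps. The function name sample_without_replacement and its signature are illustrative choices, not a standard library facility.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: returns b distinct indices drawn uniformly from
// 0 .. n-1 by running only the first b steps of a Fisher-Yates shuffle.
template <typename Engine>
std::vector<std::size_t> sample_without_replacement(std::size_t n,
                                                    std::size_t b,
                                                    Engine& engine)
{
    assert(b <= n);
    std::vector<std::size_t> index(n);
    std::iota(index.begin(), index.end(), std::size_t{0});
    for (std::size_t i = 0; i < b; ++i) {
        // Pick a uniformly random position in the unshuffled tail [i, n-1].
        std::uniform_int_distribution<std::size_t> pick(i, n - 1);
        std::swap(index[i], index[pick(engine)]);
    }
    index.resize(b);  // the first b slots hold the sample
    return index;
}
```

Only b swaps are performed, but building the index array is still O(n) in time and memory, so this helps most when that array already exists.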
Here is a different idea than what sh1 suggests.
Unlike a previous answer it is mathematically sound. No % BS.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <ostream>
#include <random>
#include <set>
#include <vector>
int main()
{
// Configuration.
auto constexpr n = 100;
auto constexpr b = 10;
// Building blocks for randomness.
auto const seed = std::mt19937::result_type{ std::random_device{}() };
std::cout << "seed = " << seed << std::endl;
auto engine = std::mt19937{ seed };
auto distribution = std::uniform_int_distribution<>{};
// Creating the data source. Not part of the solution.
auto reservoir = std::vector<int>{};
std::generate_n(std::back_inserter(reservoir), n,
[i = 0]() mutable { return i++; });
// Creating the sample.
// Idea attributed to Bob Floyd by Jon Bentley in Programming Pearls 2nd Edition.
auto sample = std::set<int>{};
for (auto i = n - b; i != n; ++i)
{
auto const param = std::uniform_int_distribution<>::param_type(0, i);
auto const j = distribution(engine, param);
(sample.find(j) == sample.cend())
? sample.insert(j)
: sample.insert(i);
}
// Converting the sample to an output vector and shuffle it, if necessary.
auto output = std::vector<int>(std::cbegin(sample), std::cend(sample));
std::shuffle(std::begin(output), std::end(output), engine);
// Print out the result.
for (auto const x : output) { std::cout << x << " "; }
std::cout << std::endl;
return 0;
}

How do I sum all elements in a ublas matrix?

According to this page there should be a sum function provided in ublas, but I can't get the following to compile:
boost::numeric::ublas::matrix<double> mymatrix;
std::cout << boost::numeric::ublas::sum(mymatrix);
error is:
testcpp:146:144: error: no matching function for call to
‘sum(boost::numeric::ublas::matrix&)’
I'm #includeing:
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/matrix_proxy.hpp>
Am I missing an include, or did I misunderstand the docs? How would I achieve this (I'm trying to sum up all elements of a matrix and produce a single double)?
As pointed out in the comments, sum only applies to vectors (see documentation).
You could certainly get at m.data() and sum the values that way, but you are using a linear algebra library! Multiply a row vector of 1's by your matrix, and sum the result:
#include <iostream>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>
namespace bls = boost::numeric::ublas;
int main()
{
bls::matrix<double> m(3, 3);
for (unsigned i = 0; i < m.size1(); ++i)
for (unsigned j = 0; j < m.size2(); ++j)
m(i, j) = 3 * i + j;
std::cout << "Sum of all elements of " << m << " is "
<< sum(prod(bls::scalar_vector<double>(m.size1()), m)) << '\n';
}
A more reusable approach would be to define a sum that takes a matrix_expression, as the Shark library did.

boost zip_iterator and std::sort

I have two arrays values and keys both of the same length.
I want to sort-by-key the values array using the keys array as keys.
I have been told the boost's zip iterator is just the right tool for locking two arrays together and doing stuff to them at the same time.
Here is my attempt at using boost::zip_iterator to solve this sorting problem, which fails to compile with gcc. Can someone help me fix this code?
The problem lies in the line
std::sort ( boost::make_zip_iterator( keys, values ), boost::make_zip_iterator( keys+N , values+N ));
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
#include <vector>
#include <algorithm>
#include <boost/iterator/zip_iterator.hpp>
#include <boost/tuple/tuple.hpp>
#include <boost/tuple/tuple_comparison.hpp>
int main(int argc, char *argv[])
{
int N=10;
int keys[N];
double values[N];
int M=100;
//Create the vectors.
for (int i = 0; i < N; ++i)
{
keys[i] = rand()%M;
values[i] = 1.0*rand()/RAND_MAX;
}
//Now we use the boost zip iterator to zip the two vectors and sort them "simultaneously"
//I want to sort-by-key the keys and values arrays
std::sort ( boost::make_zip_iterator( keys, values ),
boost::make_zip_iterator( keys+N , values+N )
);
//The values array and the corresponding keys in ascending order.
for (int i = 0; i < N; ++i)
{
std::cout << keys[i] << "\t" << values[i] << std::endl;
}
return 0;
}
NOTE: Error message on compilation:
g++ -g -Wall boost_test.cpp
boost_test.cpp: In function ‘int main(int, char**)’:
boost_test.cpp:37:56: error: no matching function for call to ‘make_zip_iterator(int [(((unsigned int)(((int)N) + -0x00000000000000001)) + 1)], double [(((unsigned int)(((int)N) + -0x00000000000000001)) + 1)])’
boost_test.cpp:38:64: error: no matching function for call to ‘make_zip_iterator(int*, double*)’
You can't sort a pair of zip_iterators.
Firstly, make_zip_iterator takes a tuple of iterators as input, so you could call:
boost::make_zip_iterator(boost::make_tuple( ... ))
but that won't compile either, because keys and keys+N don't have the same type (keys is an array, keys+N is a pointer). We need to force keys to decay to a pointer:
std::sort(boost::make_zip_iterator(boost::make_tuple(+keys, +values)),
boost::make_zip_iterator(boost::make_tuple(keys+N, values+N)));
This compiles, but the sorted result is still wrong: a zip_iterator only models a Readable iterator, whereas std::sort also needs its input to be Writable (as described here), so you can't sort using zip_iterator.
A very good discussion of this problem can be found here: https://web.archive.org/web/20120422174751/http://www.stanford.edu/~dgleich/notebook/2006/03/sorting_two_arrays_simultaneou.html
Here's a possible duplicate of this question: Sorting zipped (locked) containers in C++ using boost or the STL
The approach in the link above uses std::sort, and no extra space. It doesn't employ boost::zip_iterator, just boost tuples and the boost iterator facade. Std::tuples should also work if you have an up to date compiler.
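If O(n) extra memory is acceptable, a simpler workaround that avoids zip_iterator altogether is to copy both arrays into a temporary vector of pairs, sort that, and copy the results back. This is only a sketch of that alternative, not the approach from the link; sort_by_key is a hypothetical name and the int/double element types are taken from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sorts keys ascending and applies the same
// reordering to values, using a temporary vector of (key, value) pairs.
void sort_by_key(std::vector<int>& keys, std::vector<double>& values)
{
    assert(keys.size() == values.size());
    std::vector<std::pair<int, double>> zipped;
    zipped.reserve(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i)
        zipped.emplace_back(keys[i], values[i]);

    // Compare on the key only; stable_sort keeps equal keys in input order.
    std::stable_sort(zipped.begin(), zipped.end(),
        [](std::pair<int, double> const& a, std::pair<int, double> const& b) {
            return a.first < b.first;
        });

    // Unzip the sorted pairs back into the two original containers.
    for (std::size_t i = 0; i < zipped.size(); ++i) {
        keys[i] = zipped[i].first;
        values[i] = zipped[i].second;
    }
}
```

Because real pairs are sorted rather than proxy references, std::stable_sort sees ordinary copyable, assignable values and the Writable requirement is not an issue.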
If you are happy to have one extra vector (of size_t elements), then the following approach will work. The sort itself is O(n log n); note that the in-place reordering loop uses a linear find, so that step can be O(n²) in the worst case. It's fairly simple, but there will be better approaches out there if you search for them.
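#include <vector>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <stdexcept>
using namespace std;
template <typename T1, typename T2>
void sortByPerm(vector<T1>& list1, vector<T2>& list2) {
const auto len = list1.size();
// a bare `throw;` with no active exception would call std::terminate
if (len != list2.size()) throw invalid_argument("sortByPerm: size mismatch");
if (len == 0) return; // nothing to sort; also keeps len - 1 below from wrapping around
// create permutation vector
vector<size_t> perms;
for (size_t i = 0; i < len; i++) perms.push_back(i);
// the comparator receives indices (size_t), not T1 values
sort(perms.begin(), perms.end(), [&](size_t a, size_t b){ return list1[a] < list1[b]; });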
// order input vectors by permutation
for (size_t i = 0; i < len - 1; i++) {
swap(list1[i], list1[perms[i]]);
swap(list2[i], list2[perms[i]]);
// adjust permutation vector if required
if (i < perms[i]) {
auto d = distance(perms.begin(), find(perms.begin() + i, perms.end(), i));
swap(perms[i], perms[d]);
}
}
}
int main() {
vector<int> ints = {32, 12, 40, 8, 9, 15};
vector<double> doubles = {55.1, 33.3, 66.1, 11.1, 22.1, 44.1};
sortByPerm(ints, doubles);
copy(ints.begin(), ints.end(), ostream_iterator<int>(cout, " ")); cout << endl;
copy(doubles.begin(), doubles.end(), ostream_iterator<double>(cout, " ")); cout << endl;
}
After seeing another of your comments on another answer, I thought I would point you to std::map. This is a key-value container that preserves key order (it is basically a binary tree, usually a red-black tree, but that isn't important). Note that std::map keeps one value per key, so duplicate keys overwrite each other; std::multimap avoids that.
size_t elements=10;
std::map<int, double> map_;
for (size_t i = 0; i < elements; ++i)
{
map_[rand()%M]=1.0*rand()/RAND_MAX;
}
//for every element in map, if you have C++11 this can be much cleaner
for (std::map<int,double>::const_iterator it=map_.begin();
it!=map_.end(); ++it)
{
std::cout << it->first << "\t" << it->second << std::endl;
}
Untested, but any errors should be simple syntax errors.
boost::make_zip_iterator takes a boost::tuple.
#include <boost/iterator/zip_iterator.hpp>
#include <boost/tuple/tuple.hpp>
#include <boost/tuple/tuple_comparison.hpp>
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
#include <vector>
#include <algorithm>
int main(int argc, char *argv[])
{
std::vector<int> keys(10); //lets not waste time with arrays
std::vector<double> values(10);
const int M=100;
//Create the vectors.
for (size_t i = 0; i < values.size(); ++i)
{
keys[i] = rand()%M;
values[i] = 1.0*rand()/RAND_MAX;
}
//Now we use the boost zip iterator to zip the two vectors and sort them "simultaneously"
//I want to sort-by-key the keys and values arrays
std::sort ( boost::make_zip_iterator(
boost::make_tuple(keys.begin(), values.begin())),
boost::make_zip_iterator(
boost::make_tuple(keys.end(), values.end()))
);
//The values array and the corresponding keys in ascending order.
for (size_t i = 0; i < values.size(); ++i)
{
std::cout << keys[i] << "\t" << values[i] << std::endl;
}
return 0;
}