I have a 2d boost matrix (boost::numeric::ublas::matrix) of shape (n,m), with the first column being the timestamp. However, the data I'm getting is out of order. How can I sort it with respect to the first column, and what would be the most efficient way to do so? Speed is critical in this particular application.
As I commented, ublas::matrix might not be the most natural choice for a task like this. Here is the naive approach, using matrix_row and some range magic:
#define _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS
#include <boost/numeric/ublas/io.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/matrix_proxy.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/irange.hpp>
#include <boost/range/algorithm.hpp>
#include <iomanip>
#include <iostream>
using namespace boost::adaptors;
using Matrix = boost::numeric::ublas::matrix<float>;
using Row = boost::numeric::ublas::matrix_row<Matrix>;
static auto by_col0 = [](Row const& a, Row const& b) { return a(0) < b(0); };
int main()
{
    constexpr int nrows = 3, ncols = 4;
    Matrix m(nrows, ncols);
    for (unsigned i = 0; i < m.size1(); ++i)
        for (unsigned j = 0; j < m.size2(); ++j)
            m(i, j) = (10 - 3.f * i) + j;
    std::cout << "before: " << m << "\n";
    auto getrow = [&](int i) { return Row(m, i); };
    sort(boost::irange(nrows) | transformed(getrow), by_col0);
    std::cout << "after: " << m << "\n";
}
Sadly, this confirms that the proxy abstraction doesn't hold:
before: [3,4]((10,11,12,13),(7,8,9,10),(4,5,6,7))
after: [3,4]((10,11,12,13),(10,11,12,13),(10,11,12,13))
Oops.
Analysis?
I can't say I know what's wrong. std::sort is defined in terms of ValueSwappable which at first glance seems to work fine for matrix_row:
auto r0 = Row(m, 0);
auto r1 = Row(m, 1);
using std::swap;
swap(r0, r1);
Prints Live On Coliru
Maybe this starting point gives you something helpful. Since it's this tricky, I'd strongly consider using another data structure that is more conducive to your task (boost::multi_array[_ref] comes to mind).
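To make the "another data structure" suggestion concrete, here is a minimal sketch that sidesteps the row proxies entirely. It assumes the data lives in (or can be copied to) a flat row-major std::vector<float>; the helper name sort_rows_by_col0 is made up for illustration and is not part of uBLAS. It sorts a permutation of row indices by the first column and then gathers the rows in that order:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of uBLAS): returns a copy of a row-major
// nrows x ncols buffer with its rows ordered by the value in column 0.
std::vector<float> sort_rows_by_col0(std::vector<float> const& data,
                                     std::size_t nrows, std::size_t ncols)
{
    assert(ncols > 0 && data.size() == nrows * ncols);

    // Sort row indices instead of the rows themselves: each swap moves a
    // single std::size_t, regardless of how wide the rows are.
    std::vector<std::size_t> order(nrows);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
        [&](std::size_t a, std::size_t b) {
            return data[a * ncols] < data[b * ncols];
        });

    // Gather the rows into a new buffer in sorted order.
    std::vector<float> sorted;
    sorted.reserve(data.size());
    for (std::size_t r : order)
        sorted.insert(sorted.end(),
                      data.begin() + r * ncols,
                      data.begin() + (r + 1) * ncols);
    return sorted;
}
```

Whether this beats sorting in place depends on the row width and on how often you sort, so measure it on your data. I'd expect the elements of a ublas::matrix with the default row-major layout to be copyable to and from such a buffer, but that part is an assumption I haven't benchmarked.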
I have 5 vectors. I want to check how many times these vectors exist. I used the following code to compare if 2 vectors are equal, but now I have more than 2 vectors. I want to compare all these 5 vectors together and count how many times each vector exists.
How can I do it?
The output should be:
(0,0,1,2,3,0,0,0) = 2 time(s)
(0,0,1,2,3,4,0,0) = 1 time(s)
(0,0,2,4,3,0,0,0) = 1 time(s)
(0,0,6,2,3,5,6,0) = 1 time(s)
Here is my code:
#include <stdio.h>
#include <iostream>
#include <vector>
using namespace std;
void checkVec(vector<int> v){
    vector<int> v0;
    if(v0 == v){
        cout << "Equal\n";
    }
    else{
        cout << "Not Equal\n";
    }
}
int main(){
    vector<int> v1={0,0,1,2,3,0,0,0};
    vector<int> v2={0,0,1,2,3,4,0,0};
    vector<int> v3={0,0,2,4,3,0,0,0};
    vector<int> v4={0,0,1,2,3,0,0,0};
    vector<int> v5={0,0,6,2,3,5,6,0};
    checkVec(v1);
    return 0;
}
You can use a std::map to count the number of occurrences of each vector:
#include <map>
#include <vector>
#include <iostream>
using vec = std::vector<int>;
int main(){
    vec v1={0,0,1,2,3,0,0,0};
    vec v2={0,0,1,2,3,4,0,0};
    vec v3={0,0,2,4,3,0,0,0};
    vec v4={0,0,1,2,3,0,0,0};
    vec v5={0,0,6,2,3,5,6,0};
    std::map<vec,std::size_t> counter;
    // Initializer list creates copies by default
    // But you should not create vX variables anyway.
    for(const auto& v: {v1,v2,v3,v4,v5}){
        ++counter[v];
    }
    std::cout<<"V1 is present " <<counter[v1]<<" times.\n";
    return 0;
}
V1 is present 2 times.
This builds on Quimby's answer: if you know at compile time how many vectors you will get, use std::array to hold them. If the number is only known at runtime, use std::vector, as shown below:
#include <map>
#include <vector>
#include <iostream>
#include <cstddef>
int main(){
    std::vector<std::vector<int>> allVector
    {
        std::vector<int>{0,0,1,2,3,0,0,0},
        std::vector<int>{0,0,1,2,3,4,0,0},
        std::vector<int>{0,0,2,4,3,0,0,0},
        std::vector<int>{0,0,1,2,3,0,0,0},
        std::vector<int>{0,0,6,2,3,5,6,0},
    };
    std::map<std::vector<int>, std::size_t> counter;
    for(const auto& v : allVector)
    {
        ++counter[v];
    }
    // Print each distinct vector and its frequency, one per line,
    // in the format asked for in the question
    for(const auto& pr : counter)
    {
        std::cout << '(';
        for(std::size_t i {0}; i < pr.first.size(); ++i)
        {
            std::cout << pr.first[i];
            if(i != pr.first.size() - 1)
                std::cout << ',';
        }
        std::cout << ") = " << pr.second << " time(s)\n";
    }
    return 0;
}
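For the compile-time case mentioned above, a sketch with std::array could look like the following. The helper name count_occurrences and the alias Vec are my own choices for illustration; the counting itself is the same std::map approach as before:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Vec = std::vector<int>;

// Hypothetical helper: counts how often each vector occurs in a
// collection whose length N is known at compile time.
template <std::size_t N>
std::map<Vec, std::size_t> count_occurrences(std::array<Vec, N> const& all)
{
    std::map<Vec, std::size_t> counter;
    for (Vec const& v : all)
        ++counter[v]; // operator[] value-initializes the count to 0 on first use

    // Sanity check: every input vector was counted exactly once.
    std::size_t total = 0;
    for (auto const& entry : counter)
        total += entry.second;
    assert(total == N);

    return counter;
}
```

You would call it with something like std::array<Vec, 5> holding the five vectors from the question, then iterate over the returned map to print each vector with its count.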
Consider the following code snippet:
#include <iostream>
#include <ctime>
#include <vector>
#include <list>
using namespace std;
#define NUM_ITER 100000
int main() {
    clock_t t = clock();
    std::list< int > my_list;
    std::vector< std::list< int >::iterator > list_ptr;
    list_ptr.reserve(NUM_ITER);
    for(int i = 0; i < NUM_ITER; ++i) {
        my_list.push_back(0);
        list_ptr.push_back(--(my_list.end()));
    }
    while(my_list.size() > 0) {
        my_list.erase(list_ptr[list_ptr.size()-1]);
        list_ptr.pop_back();
    }
    cout << "Done in: " << 1000*(clock()-t)/CLOCKS_PER_SEC << " msec!" << endl;
}
When I compile and run it with Visual Studio, with all optimizations enabled, I get the output:
Done in: 8 msec!
When I compile and run it with g++, using the flags
g++ main.cpp -pedantic -O2
I get the output
Done in: 7349 msec!
Which is roughly 1000 times slower. Why is that? According to cppreference, calling erase on a list is supposed to take only constant time.
The code was compiled and executed on the same machine.
It might be that the list implementation shipped with GCC doesn't store the size, while the one MSVC ships does. In that case every size() call walks the whole list, so the while loop is O(n^2) with GCC and O(n) with MSVC.
Anyway, C++11 mandates that list::size is constant time, so you may want to report this as a bug.
UPDATE Workaround:
You can avoid calling size() so many times:
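size_t my_list_size = my_list.size(); // query the size once, outside the loop
while(my_list_size > 0) {
    accum += *list_ptr[list_ptr.size()-1]; // accum: the volatile accumulator declared in the full listing below
    my_list.erase(list_ptr[list_ptr.size()-1]);
    --my_list_size;
    list_ptr.pop_back();
}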
Now it reports 10 msec.
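Another way to keep size() out of the loop condition is to test empty() instead, which is constant time for every standard container. Here is a minimal sketch of the same fill-then-erase pattern; the function name fill_and_drain and its return value are made up for illustration, and I haven't timed this variant:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper mirroring the benchmark: fills a list with n elements,
// remembers an iterator to each, then erases them back to front.
// Returns the number of erase calls performed.
std::size_t fill_and_drain(std::size_t n)
{
    std::list<int> my_list;
    std::vector<std::list<int>::iterator> list_ptr;
    list_ptr.reserve(n);

    for (std::size_t i = 0; i < n; ++i) {
        my_list.push_back(0);
        list_ptr.push_back(std::prev(my_list.end()));
    }

    std::size_t erased = 0;
    // empty() does not need to count nodes, so the loop condition no longer
    // depends on how size() is implemented.
    while (!my_list.empty()) {
        my_list.erase(list_ptr.back());
        list_ptr.pop_back();
        ++erased;
    }
    assert(list_ptr.empty());
    return erased;
}
```

This keeps the loop body identical to the original and only changes the condition, so it is the smallest edit that avoids the repeated size() calls.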
EDIT
Their list implementation isn't as efficient. I tried replacing the standard containers with their Boost.Container counterparts:
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <boost/container/vector.hpp>
#include <boost/container/list.hpp>
using namespace std;
#define NUM_ITER 100000
int main() {
    clock_t t = clock();
    boost::container::list< int > my_list;
    boost::container::vector< boost::container::list< int >::iterator > list_ptr;
    list_ptr.reserve(NUM_ITER);
    for(int i = 0; i < NUM_ITER; ++i) {
        my_list.push_back(rand());
        list_ptr.push_back(--(my_list.end()));
    }
    unsigned long long volatile accum = 0;
    while(my_list.size() > 0) {
        accum += *list_ptr[list_ptr.size()-1];
        my_list.erase(list_ptr[list_ptr.size()-1]);
        list_ptr.pop_back();
    }
    cout << "Done in: " << 1000*(clock()-t)/CLOCKS_PER_SEC << " msec!" << endl;
    cout << "Accumulated: " << accum << "\n";
}
This now runs in ~0ms on my machine, vs. ~7s using std::list on the same machine.
sehe@desktop:/tmp$ ./test
Done in: 0 msec!
Accumulated: 107345864261546