C++ iterate over subvectors of size N

I have an input vector which can be of any size. What I want is to divide this vector into vectors of size 64 each and do something with each of them. The input vector's size is not necessarily a multiple of 64.
So let's say I have a vector of size 200; then I should divide it into 3 vectors of size 64 and 1 vector of size 8.
What I thought of so far is the following:
std::vector<double> inputVector;
std::vector<std::vector<double>> resultVector;
std::size_t length = inputVector.size();
std::size_t div = (length % 64) == 0 ? length / 64 : (length / 64) + 1;
for (std::size_t i = 0, j = 0; i < div; i++) {
    std::vector<double> current;
    // stop at 64 elements or at the end of the input, whichever comes first
    for (std::size_t k = 0; k < 64 && j < length; k++, j++) {
        current.push_back(inputVector[j]);
    }
    resultVector.push_back(current);
}
I am sure there is a better way of doing this, but I couldn't find any example.

You can use iterators to create a subvector:
vector<double> inputVector;
vector<vector<double>> resultVector;
for (auto it = inputVector.cbegin(), e = inputVector.cend(); it != inputVector.cend(); it = e) {
e = it + std::min<std::size_t>(inputVector.cend() - it, 64);
resultVector.emplace_back(it, e);
}

The simplest approach is to push_back each element into a temporary vector, keep count, and "flush" the temporary into the output vector whenever the chunk size is reached:
template<typename T>
std::vector<std::vector<T>> devide(const std::vector<T>& v, size_t chunk) {
// iterative algorithm
std::vector<T> tmp;
std::vector<std::vector<T>> ret;
size_t cnt = 0;
for (auto&& i : v) {
tmp.push_back(i);
++cnt;
if (cnt == chunk) {
cnt = 0;
ret.push_back(tmp);
tmp.clear();
}
}
if (cnt != 0) {
ret.push_back(tmp);
}
return ret;
}
But that iterative approach is not optimal: we could copy whole chunks of memory at once. So iterate over the vector and copy up to chunk elements in each iteration, copying fewer on the last one:
template<typename T>
std::vector<std::vector<T>> devide2(const std::vector<T>& v, size_t chunk) {
// chunk algorithm
std::vector<std::vector<T>> ret;
const auto max = v.size();
for (size_t i = 0; i < max; ) {
const size_t chunkend = std::min(i + chunk, max);
ret.emplace_back(v.begin() + i, v.begin() + chunkend);
i = chunkend;
}
return ret;
}
Tested on godbolt.

A more STL-style version:
template<typename In, typename F>
void even_slice(In b, In e, size_t n, F f)
{
while(static_cast<size_t>(std::distance(b, e)) >= n) {
f(b, b + n);
b = b + n;
}
if (b != e) {
f(b, e);
}
}
template<typename In, typename Out>
Out even_slice_to_vetors(In b, In e, size_t n, Out out)
{
using ValueType = typename std::iterator_traits<In>::value_type;
using ItemResult = std::vector<ValueType>;
even_slice(b, e, n, [&out](auto x, auto y) { *out++ = ItemResult{x, y}; });
return out;
}
https://godbolt.org/z/zn9Ex1
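Because even_slice only passes iterator pairs to its callback, it can also process chunks without building any sub-vectors. The usage sketch below repeats even_slice so it compiles on its own; `chunk_sums` is a made-up example function, and the chunk size must be greater than zero (with n == 0 the loop would never advance).

```cpp
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Same shape as even_slice above: f(first, last) for each run of n elements,
// then once more for a shorter tail if one remains. Requires n > 0.
template <typename In, typename F>
void even_slice(In b, In e, std::size_t n, F f)
{
    while (static_cast<std::size_t>(std::distance(b, e)) >= n) {
        f(b, b + n);
        b = b + n;
    }
    if (b != e) {
        f(b, e);
    }
}

// Hypothetical usage: sum each chunk in place, with no per-chunk allocation.
std::vector<double> chunk_sums(const std::vector<double>& v, std::size_t n)
{
    std::vector<double> sums;
    even_slice(v.begin(), v.end(), n, [&sums](auto first, auto last) {
        sums.push_back(std::accumulate(first, last, 0.0));
    });
    return sums;
}
```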

Note that you know in advance exactly how many subvectors will have the full chunk size:
template<typename It>
auto subdivide_in_chunks(It first, It last, size_t chunk_size) {
using value_type = typename std::iterator_traits<It>::value_type;
size_t size{ std::distance(first, last) / chunk_size };
std::vector<std::vector<value_type>> ret;
ret.reserve(size);
auto last_chunk = std::next(first, size * chunk_size);
while ( first != last_chunk ) {
auto next = std::next(first, chunk_size);
ret.emplace_back(first, next);
first = next;
}
if (first != last) ret.emplace_back(first, last); // The last, shorter one (skipped if the size divides evenly).
return ret;
}

With range-v3, you could simply write:
namespace rs = ranges;
namespace rv = ranges::views;
auto resultVector = inputVector
| rv::chunk(64)
| rs::to<std::vector<std::vector<double>>>();

Related

Recursive Merge Sort Algorithm Implementation

I am new to algorithms. I am trying to implement a recursive merge sort using std::vector, but I am stuck: the code does not work.
I have been following the algorithm from Introduction to Algorithms (Cormen/Leiserson/Rivest/Stein, 3rd edition), and its pseudocode is what I am trying to implement.
Here is my merge function:
void merge(std::vector<int>& vec, size_t vec_init, size_t vec_mid, size_t vec_size) {
int leftLoop = 0;
int rightLoop = 0;
int vecLoop = 0;
size_t mid = vec_mid - vec_init + 1;
std::vector<int> Left_Vec(std::begin(vec), std::begin(vec) + mid);
std::vector<int> Right_Vec(std::begin(vec) + mid, std::end(vec));
for (size_t vecLoop = vec_init; vecLoop<vec_size; ++vecLoop) {
vec[vecLoop] = (Left_Vec[leftLoop] <= Right_Vec[rightLoop]) ? Left_Vec[leftLoop++] : Right_Vec[rightLoop++];
}
}
And here is my merge-sort function:
void merge_sort(std::vector<int>& vec, size_t vec_init, size_t vec_size) {
if (vec_init < vec_size) {
size_t vec_mid = (vec_init + vec_size) / 2;
merge_sort(vec, vec_init, vec_mid);
merge_sort(vec, vec_mid + 1, vec_size);
merge(vec, vec_init, vec_mid, vec_size);
}
}
When the input vec = {30,40,20,10}, the output vec = {10, 10, 0, 20}:
int main() {
auto data = std::vector{ 30, 40, 20, 10 };
merge_sort(data, 0, data.size());
for (auto e : data) std::cout << e << ", ";
std::cout << '\n';
// outputs 10, 10, 0, 20,
}
Where is my mistake, in the algorithm or in the code?
There are a couple of problems. These changes will fix the code:
void merge(std::vector<int>& vec, size_t vec_start, size_t vec_mid, size_t vec_end) {
size_t leftLoop = 0;
size_t rightLoop = 0;
size_t vecLoop = 0;
// Not needed, much simpler if mid is relative to vec.begin()
//size_t mid = vec_mid - vec_init + 1;
// You didn't take vec_init and vec_size into account when calculating the ranges.
std::vector<int> Left_Vec(std::begin(vec) + vec_start, std::begin(vec) + vec_mid);
std::vector<int> Right_Vec(std::begin(vec) + vec_mid, std::begin(vec) + vec_end);
// Values are not uniformly distributed in the left and right vec. You have to check for
// running out of elements in any of them.
for (/*size_t*/ vecLoop = vec_start; leftLoop < Left_Vec.size() && rightLoop < Right_Vec.size(); ++vecLoop) {
// ^~~~~ shadowed outer vecLoop ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vec[vecLoop] = Left_Vec[leftLoop] <= Right_Vec[rightLoop] ? Left_Vec[leftLoop++] : Right_Vec[rightLoop++];
}
// Copy the rest of the values into vec.
if (leftLoop == Left_Vec.size())
std::copy(Right_Vec.begin() + rightLoop, Right_Vec.end(), vec.begin() + vecLoop);
else
std::copy(Left_Vec.begin() + leftLoop, Left_Vec.end(), vec.begin() + vecLoop);
}
void merge_sort(std::vector<int>& vec, size_t vec_start, size_t vec_end) {
// Should only run the function if there are at least 2 elements. With a single
// element, vec_mid == vec_start, so merge_sort(vec, vec_mid, vec_end) would
// recurse on the same range and the recursion would never stop.
if (vec_end - vec_start >= 2) {
size_t vec_mid = (vec_start + vec_end) / 2;
merge_sort(vec, vec_start, vec_mid);
merge_sort(vec, vec_mid /* + 1 */, vec_end);
// ^~~ + 1 here would skip an element
merge(vec, vec_start, vec_mid, vec_end);
}
}
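To check the fix against the question's input, here is a self-contained restatement of the corrected functions. It is a condensed version of the code above (same half-open ranges [vec_start, vec_mid) and [vec_mid, vec_end)), not a different algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Merge the sorted halves [vec_start, vec_mid) and [vec_mid, vec_end).
void merge(std::vector<int>& vec, std::size_t vec_start, std::size_t vec_mid, std::size_t vec_end) {
    std::vector<int> left(vec.begin() + vec_start, vec.begin() + vec_mid);
    std::vector<int> right(vec.begin() + vec_mid, vec.begin() + vec_end);
    std::size_t l = 0, r = 0, out = vec_start;
    while (l < left.size() && r < right.size())
        vec[out++] = (left[l] <= right[r]) ? left[l++] : right[r++];
    // One side is exhausted; copy whatever remains of the other one.
    if (l == left.size())
        std::copy(right.begin() + r, right.end(), vec.begin() + out);
    else
        std::copy(left.begin() + l, left.end(), vec.begin() + out);
}

void merge_sort(std::vector<int>& vec, std::size_t vec_start, std::size_t vec_end) {
    if (vec_end - vec_start >= 2) {  // 0 or 1 element: already sorted
        const std::size_t vec_mid = (vec_start + vec_end) / 2;
        merge_sort(vec, vec_start, vec_mid);
        merge_sort(vec, vec_mid, vec_end);
        merge(vec, vec_start, vec_mid, vec_end);
    }
}
```

With data = {30, 40, 20, 10}, calling merge_sort(data, 0, data.size()) leaves data as {10, 20, 30, 40}.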

Find max position in a vector of vector of vector

I have a vector of vectors of vectors:
std::vector<std::vector<std::vector<double>>> mountain_table
and I would like to find the coordinates i, j, k of its highest value. I know that I should use max_element, but I don't know how to use it on a 3D vector.
How should I get those coordinates?
I'd suggest linearizing your data so that you can use standard algorithms. The idea is to provide a pair of functions that compute a flat index from 3D coordinates and vice versa:
template<class T>
class Matrix3D // minimal
{
public:
using value_type = T;
    using iterator = typename std::vector<value_type>::iterator;
private:
    std::vector<value_type> _data;
    size_t _sizex, _sizey, _sizez;
    size_t index_from_coords(size_t x, size_t y, size_t z) const
    {
        return x*_sizey*_sizez + y*_sizez + z;
    }
    std::tuple<size_t, size_t, size_t> coords_from_index(size_t index) const
    {
        const size_t x = index / (_sizey * _sizez);
        index = index % (_sizey * _sizez);
        const size_t y = index / _sizez;
        const size_t z = index % _sizez;
        return std::make_tuple(x, y, z);
    }
public:
    Matrix3D(size_t sizex, size_t sizey, size_t sizez) : _sizex(sizex), ... {}
    T& operator()(size_t x, size_t y, size_t z) // add const version
    {
        return _data[index_from_coords(x, y, z)];
    }
    std::tuple<size_t, size_t, size_t> coords(iterator it)
    {
        size_t index = std::distance(_data.begin(), it);
        return coords_from_index(index);
    }
    iterator begin() { return _data.begin(); }
    iterator end() { return _data.end(); }
};
Usage:
Matrix3D<double> m(3, 3, 3);
auto it = std::max_element(m.begin(), m.end()); // or min, or whatever from http://en.cppreference.com/w/cpp/header/algorithm
auto coords = m.coords(it);
std::cout << "x=" << std::get<0>(coords) << ... << "\n";
This is untested and incomplete code, meant to give you a kickstart towards a better data design. I'd be happy to answer further questions about this idea in the comments below ;)
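For comparison, here is a small, complete sketch of the same linearized idea. The class name Grid3D and its argmax() member are made up for this example; it uses an x-major layout, index = (x * sizey + y) * sizez + z, and assumes the grid is not empty.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <iterator>
#include <vector>

// Illustrative 3D grid stored in one contiguous vector (x-major layout).
class Grid3D {
public:
    Grid3D(std::size_t sx, std::size_t sy, std::size_t sz)
        : sx_(sx), sy_(sy), sz_(sz), data_(sx * sy * sz) {}

    double& operator()(std::size_t x, std::size_t y, std::size_t z) {
        return data_[(x * sy_ + y) * sz_ + z];
    }

    // Coordinates of the largest element (the first one on ties).
    // Precondition: the grid is not empty.
    std::array<std::size_t, 3> argmax() const {
        auto it = std::max_element(data_.begin(), data_.end());
        std::size_t index = static_cast<std::size_t>(std::distance(data_.begin(), it));
        const std::size_t x = index / (sy_ * sz_);
        index %= (sy_ * sz_);
        return {x, index / sz_, index % sz_};
    }

private:
    std::size_t sx_, sy_, sz_;
    std::vector<double> data_;
};
```

The whole search is a single std::max_element call over contiguous memory; only the final index is converted back to coordinates.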
Here is how I would do it: loop over the matrix, check for the highest value, and record its indexes.
size_t highestI = 0;
size_t highestJ = 0;
size_t highestK = 0;
double highestValue = -std::numeric_limits<double>::infinity(); // Default value (Include <limits>)
for (size_t i = 0; i < mountain_table.size(); ++i)
{
for (size_t j = 0; j < mountain_table[i].size(); ++j)
{
for (size_t k = 0; k < mountain_table[i][j].size(); ++k)
{
if (mountain_table[i][j][k] > highestValue)
{
highestValue = mountain_table[i][j][k]; // Highest
// value needed to figure out highest indexes
// Stores the current highest indexes
highestI = i;
highestJ = j;
highestK = k;
}
}
}
}
This may not be the most efficient algorithm, but it gets the job done in an understandable way.
Since the max_element function is pretty short and easy to implement, I would suggest writing something similar yourself to fit your exact scenario.
// For types like this I would suggest using a type alias
using Vector3d = std::vector<std::vector<std::vector<double>>>;
// Assumes vector[0][0] is not empty.
std::array<size_t, 3> max_element(const Vector3d& vector) {
    std::array<size_t, 3> indexes = {0, 0, 0};
    double biggest = vector[0][0][0];
    for (size_t i = 0; i < vector.size(); ++i)
        for (size_t j = 0; j < vector[i].size(); ++j)
            for (size_t k = 0; k < vector[i][j].size(); ++k) {
                const double value = vector[i][j][k];
                if (value > biggest) {
                    biggest = value;
                    indexes = { i, j, k };
                }
            }
    return indexes;
}
One other suggestion I could give you is to write your own Vector3d class, with convenient functions like operator()(int x, int y, int z) etc., and store the data internally in a simple vector<double> of size width * height * depth.
std::size_t rv[3] = {0};
std::size_t i = 0;
double max_value = mountain_table[0][0][0];
for (const auto& x : mountain_table) {
std::size_t j = 0;
for (const auto& y : x) {
auto it = std::max_element(y.begin(), y.end());
if (it != y.end() && *it > max_value) { // skip empty rows: max_element returns end()
rv[0] = i; rv[1] = j; rv[2] = it - y.begin();
max_value = *it;
}
++j;
}
++i;
}
I do not think you can use std::max_element for such data. You can use std::accumulate():
using dvect = std::vector<double>;
using ddvect = std::vector<dvect>;
using dddvect = std::vector<ddvect>;
dddvect mx = { { { 1, 2, 3 }, { -1, 3 }, { 8,-2, 3 } },
{ {}, { -1, 25, 3 }, { 7, 3, 3 } },
{ { -1, -2, -3 }, {}, { 33 } } };
struct max_value {
size_t i = 0;
size_t j = 0;
size_t k = 0;
double value = -std::numeric_limits<double>::infinity();
max_value() = default;
max_value( size_t i, size_t j, size_t k, double v ) : i( i ), j( j ), k( k ), value( v ) {}
max_value operator<<( const max_value &v ) const
{
return value > v.value ? *this : v;
}
};
auto max = std::accumulate( mx.begin(), mx.end(), max_value{}, [&mx]( const max_value &val, const ddvect &ddv ) {
auto i = std::distance( &*mx.cbegin(), &ddv );
return std::accumulate( ddv.begin(), ddv.end(), val, [i,&ddv]( const max_value &val, const dvect &dv ) {
auto j = std::distance( &*ddv.cbegin(), &dv );
return std::accumulate( dv.begin(), dv.end(), val, [i,j,&dv]( const max_value &val, const double &d ) {
auto k = std::distance( &*dv.cbegin(), &d );
return val << max_value( i, j, k, d );
} );
} );
} );
Code could be simplified if C++14 or later is allowed, but I am not sure that it would be worth the effort; reorganizing the data would most probably work better (you would be able to use std::max_element() on a single vector, for example). On the other hand, this layout supports jagged matrices, as shown in the example (subarrays of different sizes).
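You should use a "for" loop, because you don't have a true 3D array (each inner vector can have its own size).
double maxVal = -std::numeric_limits<double>::infinity();
size_t maxI = 0, maxJ = 0, maxK = 0;
for (size_t i = 0; i < mountain_table.size(); ++i)
{
    for (size_t j = 0; j < mountain_table[i].size(); ++j)
    {
        // find max element index k here and check if it is the maximum
        const auto& row = mountain_table[i][j];
        auto it = std::max_element(row.begin(), row.end());
        // If yes, save i, j, k and update max val
        if (it != row.end() && *it > maxVal) {
            maxVal = *it;
            maxI = i; maxJ = j; maxK = static_cast<size_t>(it - row.begin());
        }
    }
}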

Find nearest three values of a number from an array of numbers

I have 20 coordinates x[20], y[20], and I'm trying to get the 3 coordinates nearest to the user's coordinate. This function is supposed to return the indexes of the 3 nearest values, but so far it only finds the nearest one:
#include <array>
#include <cmath>
#include <limits>
double distanceFormula(double x1, double x2, double y1, double y2){
    return std::sqrt(std::pow(x1 - x2, 2) + std::pow(y1 - y2, 2));
}
std::array<int, 3> FindNearestThree(double keyX, double keyY, const double x[], const double y[]){
    std::array<int, 3> wanted{-1, -1, -1};
    double distance = std::numeric_limits<double>::infinity();
    for (int i = 0; i < 20; i++)
    {
        double distTemp = distanceFormula(keyX, x[i], keyY, y[i]);
        if (distTemp < distance){
            distance = distTemp;
            wanted[0] = i;
        }
        //this will get only the nearest value
    }
    return wanted;
}
using Point = std::pair<int, int>;
std::array<Point, 20> points;
populate(points);
std::sort(
points.begin()
, points.end()
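, [up = get_user_coords()](const Point& p1, const Point& p2) {
    // compare squared distances: no sqrt (or floating-point std::pow) needed to order them
    int d1 = (up.first - p1.first) * (up.first - p1.first) + (up.second - p1.second) * (up.second - p1.second);
    int d2 = (up.first - p2.first) * (up.first - p2.first) + (up.second - p2.second) * (up.second - p2.second);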
return d1 < d2;
});
// The nearest 3 points are now at indices 0, 1, 2.
If you need to work with many, many more points, I suggest doing some research on nearest neighbor search algorithms, because this approach can get slow fast.
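Since the question asks for indexes rather than a reordered point array, one option is to sort an array of indexes instead, and only partially: std::partial_sort orders just the first k positions. This is a sketch, not the answer's code; `nearest_k` is a hypothetical name, and it takes std::vector inputs of equal size instead of the question's raw arrays.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Indexes of the k points closest to (keyX, keyY), nearest first.
// Squared distances are compared, since sqrt does not change the order.
// Assumes x.size() == y.size().
std::vector<std::size_t> nearest_k(double keyX, double keyY,
                                   const std::vector<double>& x,
                                   const std::vector<double>& y,
                                   std::size_t k)
{
    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});   // 0, 1, 2, ...
    auto dist2 = [&](std::size_t i) {
        const double dx = x[i] - keyX, dy = y[i] - keyY;
        return dx * dx + dy * dy;
    };
    k = std::min(k, idx.size());
    const auto mid = idx.begin() + static_cast<std::ptrdiff_t>(k);
    std::partial_sort(idx.begin(), mid, idx.end(),
                      [&](std::size_t a, std::size_t b) { return dist2(a) < dist2(b); });
    idx.resize(k);   // keep only the k nearest
    return idx;
}
```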
The following may help:
template <std::size_t N, typename It, typename Queue>
std::array<It, N> asArray(Queue& queue, It emptyValue)
{
std::array<It, N> res;
for (auto& e : res) {
if (queue.empty()) {
e = emptyValue;
} else {
e = queue.top();
queue.pop();
}
}
return res;
}
template <std::size_t N, typename It, typename ValueGetter>
std::array<It, N>
MinNElementsBy(It begin, It end, ValueGetter valueGetter)
{
auto myComp = [&](const It& lhs, const It& rhs)
{
return valueGetter(*lhs) < valueGetter(*rhs);
};
std::priority_queue<It, std::vector<It>, decltype(myComp)> queue(myComp);
for (auto it = begin; it != end; ++it) {
queue.push(it);
if (N < queue.size()) {
queue.pop();
}
}
return asArray<N>(queue, end);
}
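I guess this might be the simplest, and really ugly, solution: repeat the nearest-value search three times, skipping indexes that are already in the result set:
int wanted[3] = {-1, -1, -1};
for (int j = 0; j < 3; j++) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < 20; i++)
    {   // skip indexes already in the result set
        if (i == wanted[0] || i == wanted[1]) continue;
        double d = distanceFormula(keyX, x[i], keyY, y[i]);
        if (d < best) { best = d; wanted[j] = i; }
    }
}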

Comparing two vector<bool> with SSE

I have two vector<bool> A and B.
I want to compare them and count the number of elements that are equal:
For example:
A = {0,1,0,1}
B = {0,0,1,1}
Result will be equal to 2.
I can use _mm_cmpeq_epi8, but it only compares 16 elements at a time (and I would first have to convert the 0s and 1s to char before doing the comparison).
Is it possible to compare 128 elements at a time with SSE (or other SIMD instructions)?
If you can either assume that vector<bool> is using contiguous byte-sized elements for storage, or if you can consider using something like vector<uint8_t> instead, then this example should give you a good starting point:
static size_t count_equal(const vector<uint8_t> &vec1, const vector<uint8_t> &vec2)
{
assert(vec1.size() == vec2.size()); // vectors must be same size
const size_t n = vec1.size();
const size_t max_block_size = 255 * 16; // max block size before possible overflow
__m128i vcount = _mm_setzero_si128();
size_t i, count = 0;
for (i = 0; i + 16 <= n; ) // for each block
{
size_t m = std::min(n, i + max_block_size);
for ( ; i + 16 <= m; i += 16) // for each vector in block
{
__m128i v1 = _mm_loadu_si128((__m128i *)&vec1[i]);
__m128i v2 = _mm_loadu_si128((__m128i *)&vec2[i]);
__m128i vcmp = _mm_cmpeq_epi8(v1, v2);
vcount = _mm_sub_epi8(vcount, vcmp);
}
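vcount = _mm_sad_epu8(vcount, _mm_setzero_si128()); // horizontal sum: two partial sums in 16-bit lanes 0 and 4
count += _mm_extract_epi16(vcount, 0) + _mm_extract_epi16(vcount, 4); // update count from current block
vcount = _mm_setzero_si128(); // reset the byte counters for the next block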
}
vcount = _mm_sad_epu8(vcount, _mm_setzero_si128());
count += _mm_extract_epi16(vcount, 0) + _mm_extract_epi16(vcount, 4);
for ( ; i < n; ++i) // deal with any remaining partial vector
{
count += (vec1[i] == vec2[i]);
}
return count;
}
Note that this is using vector<uint8_t>. If you really have to use vector<bool> and can guarantee that the elements will always be contiguous and byte-sized then you'll just need to coerce the vector<bool> into a const uint8_t * or similar somehow.
Test harness:
#include <algorithm> // std::min
#include <cassert>
#include <cstdint> // uint8_t
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
#include <emmintrin.h> // SSE2
using std::vector;
static size_t count_equal_ref(const vector<uint8_t> &vec1, const vector<uint8_t> &vec2)
{
assert(vec1.size() == vec2.size());
const size_t n = vec1.size();
size_t i, count = 0;
for (i = 0 ; i < n; ++i)
{
count += (vec1[i] == vec2[i]);
}
return count;
}
static size_t count_equal(const vector<uint8_t> &vec1, const vector<uint8_t> &vec2)
{
assert(vec1.size() == vec2.size()); // vectors must be same size
const size_t n = vec1.size();
const size_t max_block_size = 255 * 16; // max block size before possible overflow
__m128i vcount = _mm_setzero_si128();
size_t i, count = 0;
for (i = 0; i + 16 <= n; ) // for each block
{
size_t m = std::min(n, i + max_block_size);
for ( ; i + 16 <= m; i += 16) // for each vector in block
{
__m128i v1 = _mm_loadu_si128((__m128i *)&vec1[i]);
__m128i v2 = _mm_loadu_si128((__m128i *)&vec2[i]);
__m128i vcmp = _mm_cmpeq_epi8(v1, v2);
vcount = _mm_sub_epi8(vcount, vcmp);
}
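vcount = _mm_sad_epu8(vcount, _mm_setzero_si128()); // horizontal sum: two partial sums in 16-bit lanes 0 and 4
count += _mm_extract_epi16(vcount, 0) + _mm_extract_epi16(vcount, 4); // update count from current block
vcount = _mm_setzero_si128(); // reset the byte counters for the next block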
}
vcount = _mm_sad_epu8(vcount, _mm_setzero_si128());
count += _mm_extract_epi16(vcount, 0) + _mm_extract_epi16(vcount, 4);
for ( ; i < n; ++i) // deal with any remaining partial vector
{
count += (vec1[i] == vec2[i]);
}
return count;
}
int main(int argc, char * argv[])
{
size_t n = 100;
if (argc > 1)
{
n = atoi(argv[1]);
}
vector<uint8_t> vec1(n);
vector<uint8_t> vec2(n);
srand((unsigned int)time(NULL));
for (size_t i = 0; i < n; ++i)
{
vec1[i] = rand() & 1;
vec2[i] = rand() & 1;
}
size_t n_ref = count_equal_ref(vec1, vec2);
size_t n_test = count_equal(vec1, vec2);
if (n_ref == n_test)
{
std::cout << "PASS" << std::endl;
}
else
{
std::cout << "FAIL: n_ref = " << n_ref << ", n_test = " << n_test << std::endl;
}
return 0;
}
Compile and run:
$ g++ -Wall -msse3 -O3 test.cpp && ./a.out
PASS
std::vector<bool> is a specialization of std::vector for the type bool. Although not required by the C++ standard, in most implementations std::vector<bool> is made space-efficient such that each of its elements occupies a single bit instead of a whole bool.
The behaviour of std::vector<bool> is similar to its primary template counterpart, except that:
std::vector<bool> does not necessarily store its elements as a contiguous array of bool.
In order to expose its elements (i.e., the individual bits) std::vector<bool> uses a proxy class (i.e., std::vector<bool>::reference). Objects of class std::vector<bool>::reference are returned by std::vector<bool> subscript operator (i.e., operator[]) by value.
Accordingly, I don't think it's portable to use functions like _mm_cmpeq_epi8, since the storage of a std::vector<bool> is implementation-defined (i.e., not guaranteed to be contiguous bytes).
An alternative, portable way is to use regular STL facilities, as in the example below:
std::vector<bool> A = {0,1,0,1};
std::vector<bool> B = {0,0,1,1};
std::vector<bool> C(A.size());
std::transform(A.begin(), A.end(), B.begin(), C.begin(), [](bool const &a, bool const &b) { return a == b;});
std::cout << std::count(C.begin(), C.end(), true) << std::endl;
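If you control the storage, the bit-packing that std::vector<bool> usually does can be made explicit, and then one XOR compares 64 flags at once: the equal flags are the zero bits of a ^ b. The sketch below assumes element i lives in bit (i % 64) of word i / 64 of a std::vector<std::uint64_t>; `count_equal_bits` is a hypothetical name. It counts the set bits with std::bitset<64>::count (in C++20, std::popcount from <bit> is an alternative).

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts positions i < nbits where the flags in a and b are equal.
// Assumes both vectors hold at least ceil(nbits / 64) words.
std::size_t count_equal_bits(const std::vector<std::uint64_t>& a,
                             const std::vector<std::uint64_t>& b,
                             std::size_t nbits)
{
    std::size_t differing = 0;
    const std::size_t full_words = nbits / 64;
    for (std::size_t w = 0; w < full_words; ++w)
        differing += std::bitset<64>(a[w] ^ b[w]).count();  // 64 comparisons per XOR
    const std::size_t tail = nbits % 64;
    if (tail != 0) {
        // Ignore the unused high bits of the last, partially filled word.
        const std::uint64_t mask = (std::uint64_t{1} << tail) - 1;
        differing += std::bitset<64>((a[full_words] ^ b[full_words]) & mask).count();
    }
    return nbits - differing;
}
```

With the question's A = {0,1,0,1} and B = {0,0,1,1} packed as 0b1010 and 0b1100, count_equal_bits returns 2.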

Optimization of determinant calculation function

While searching for the best algorithm, I found there is a trade-off between implementation complexity and large constant factors on the one hand, and asymptotic runtime complexity on the other. I chose an LU-decomposition-based algorithm because it is quite simple to implement and has good enough performance.
#include <valarray>
#include <vector>
#include <utility>
#include <cmath>
#include <cstddef>
#include <cassert>
template< typename value_type >
struct math
{
using size_type = std::size_t;
size_type const dimension_;
value_type const & eps;
value_type const zero = value_type(0);
value_type const one = value_type(1);
private :
using vector = std::valarray< value_type >;
using matrix = std::vector< vector >;
matrix matrix_;
matrix minor_;
public :
math(size_type const _dimension,
value_type const & _eps)
: dimension_(_dimension)
, eps(_eps)
, matrix_(dimension_)
, minor_(dimension_ - 1)
{
assert(1 < dimension_);
assert(!(eps < zero));
for (size_type r = 0; r < dimension_; ++r) {
matrix_[r].resize(dimension_);
}
size_type const minor_size = dimension_ - 1;
for (size_type r = 0; r < minor_size; ++r) {
minor_[r].resize(minor_size);
}
}
template< typename rhs = matrix >
void
operator = (rhs const & _matrix)
{
auto irow = std::begin(matrix_);
for (auto const & row_ : _matrix) {
auto icol = std::begin(*irow);
for (auto const & v : row_) {
*icol = v;
++icol;
}
++irow;
}
}
value_type
det(matrix & _matrix,
size_type const _dimension)
{ // calculates lower unit triangular matrix and upper triangular
assert(0 < _dimension);
value_type det_ = one;
for (size_type i = 0; i < _dimension; ++i) {
vector & ri_ = _matrix[i];
using std::abs;
value_type max_ = abs(ri_[i]);
size_type pivot = i;
{
size_type p = i;
while (++p < _dimension) {
value_type y_ = abs(_matrix[p][i]);
if (max_ < y_) {
max_ = std::move(y_);
pivot = p;
}
}
}
if (!(eps < max_)) { // regular?
return zero; // singular
}
if (pivot != i) {
det_ = -det_; // each permutation flips sign of det
ri_.swap(_matrix[pivot]);
}
value_type & dia_ = ri_[i];
det_ *= dia_; // det is multiple of diagonal elements
for (size_type j = 1 + i; j < _dimension; ++j) {
_matrix[j][i] /= dia_;
}
for (size_type a = 1 + i; a < _dimension; ++a) {
vector & a_ = minor_[a - 1];
value_type const & ai_ = _matrix[a][i];
for (size_type b = 1 + i; b < _dimension; ++b) {
a_[b - 1] = ai_ * ri_[b];
}
}
for (size_type a = 1 + i; a < _dimension; ++a) {
vector const & a_ = minor_[a - 1];
vector & ra_ = _matrix[a];
for (size_type b = 1 + i; b < _dimension; ++b) {
ra_[b] -= a_[b - 1];
}
}
}
return det_;
}
value_type
det(size_type const _dimension)
{
return det(matrix_, _dimension);
}
value_type
det()
{
return det(dimension_);
}
};
// main.cpp
#include <iostream>
#include <cstdlib>
#include <limits> // std::numeric_limits
int
main()
{
using value_type = double;
value_type const eps = std::numeric_limits< value_type >::epsilon();
std::size_t const dimension_ = 3;
math< value_type > m(dimension_, eps);
m = { // example from https://en.wikipedia.org/wiki/Determinant#Laplace.27s_formula_and_the_adjugate_matrix
{-2.0, 2.0, -3.0},
{-1.0, 1.0, 3.0},
{ 2.0, 0.0, -1.0}
};
std::cout << m.det() << std::endl; // 18
return EXIT_SUCCESS;
}
det() is the hottest function in the larger algorithm that uses it. I am sure det() is not as fast as it could be, because runtime performance comparisons (using google-pprof) against a reference implementation of the whole algorithm show a disproportionate share of time spent in det().
How can I improve the performance of det()? What are the obvious optimizations to apply immediately? Should I change the indexing and memory access order, or something else? Container types? Prefetching?
Typical values of dimension_ are in the range 3 to 10 (but it can be 100 if value_type is mpfr or something similar).
Isn't your snippet (from det())
for (size_type a = 1 + i; a < _dimension; ++a) {
vector & a_ = minor_[a - 1];
value_type const & ai_ = _matrix[a][i];
for (size_type b = 1 + i; b < _dimension; ++b) {
a_[b - 1] = ai_ * ri_[b];
}
}
for (size_type a = 1 + i; a < _dimension; ++a) {
vector const & a_ = minor_[a - 1];
vector & ra_ = _matrix[a];
for (size_type b = 1 + i; b < _dimension; ++b) {
ra_[b] -= a_[b - 1];
}
}
doing the same as
for (size_type a = 1 + i; a < _dimension; ++a) {
vector & ra_ = _matrix[a];
value_type ai_ = ra_[i];
for (size_type b = 1 + i; b < _dimension; ++b) {
ra_[b] -= ai_ * ri_[b];
}
}
without any need for minor_? Moreover, now the inner loop can easily be vectorised.
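The fused update also fits a different storage layout: one contiguous std::vector<double> in row-major order instead of a std::vector of std::valarray rows, so each row update is a simple strided-free loop over adjacent memory. The sketch below is a plain-double illustration under that assumption (the name det_lu is made up); it is not the asker's math class, drops the value_type template parameter, and has not been benchmarked against it.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Determinant by Gaussian elimination with partial pivoting on a row-major
// n x n buffer a[r * n + c]. Takes a by value because elimination overwrites it.
double det_lu(std::vector<double> a, std::size_t n, double eps)
{
    double det = 1.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Partial pivoting: pick the row with the largest |a[p][i]|.
        std::size_t pivot = i;
        for (std::size_t p = i + 1; p < n; ++p)
            if (std::abs(a[p * n + i]) > std::abs(a[pivot * n + i]))
                pivot = p;
        if (!(eps < std::abs(a[pivot * n + i])))
            return 0.0;                      // treat as singular
        if (pivot != i) {
            for (std::size_t c = 0; c < n; ++c)
                std::swap(a[i * n + c], a[pivot * n + c]);
            det = -det;                      // each row swap flips the sign
        }
        const double dia = a[i * n + i];
        det *= dia;                          // det is the product of the pivots
        const double* ri = &a[i * n];
        for (std::size_t r = i + 1; r < n; ++r) {
            double* ra = &a[r * n];
            const double factor = ra[i] / dia;
            // Fused update, no minor_ scratch matrix; contiguous inner loop.
            for (std::size_t c = i + 1; c < n; ++c)
                ra[c] -= factor * ri[c];
        }
    }
    return det;
}
```

On the Wikipedia example from the question ({-2, 2, -3}, {-1, 1, 3}, {2, 0, -1}, flattened row by row), det_lu returns 18.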