MKL sparse matrix addition fails on last entry - c++

I'm trying to use the MKL routine mkl_dcsradd to add an upper-triangular matrix to its transpose. In this case, the upper triangular matrix stores part of the adjacency matrix of a graph, and I need the full version for implementing another algorithm.
In this simplified example, I start with a list of (11) edges, and build an upper-triangular CSR matrix from it. I have checked that this much works. However, when I try to add it to its transpose, dcsradd stops on the final row, saying it's run out of space. That shouldn't be the case: a strictly upper-triangular matrix (no non-zeros along the diagonal) with n non-zero entries, when added to its transpose, should result in a matrix with 2n (22) non-zeros.
When I supply dcsradd with a maximum non-zeros of 22, it fails, but when I supply it with 23 (an excessive value), it works correctly. Why is this?
I've simplified my code down to a minimal example demonstrating the error:
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <mkl.h>
int main()
{
int nnz = 11;
int numVertices = 10;
int32_t u[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1 };
int32_t v[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 8 };
double w[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
int fullNnz = nnz * 2;
int dim = numVertices;
double triData[nnz];
int triCols[nnz];
int triRows[dim];
// COO to upper-triangular CSR
int info = -1;
int job [] = { 2, 1, 0, 0, nnz, 0 };
mkl_dcsrcoo(job, &dim,
triData, triCols, triRows,
&nnz, w, u, v,
&info);
printf("info = %d\n", info);
// Allocate final memory
double data[fullNnz];
int cols[fullNnz];
int rows[dim];
// B = A + A^T (to make a full adjacency matrix)
int request = 0, sort = 0;
double beta = 1.0;
int WRONG_NNZ = fullNnz + 1; // What is happening here?
mkl_dcsradd("t", &request, &sort, &dim, &dim,
triData, triCols, triRows,
&beta, triData, triCols, triRows,
data, cols, rows,
&WRONG_NNZ, &info);
printf("info = %d\n", info);
// Convert back to 0-based indexing (via Cilk)
cols[:]--;
rows[:]--;
printf("data:");
for (double d : data) printf("%.0f ", d);
printf("\ncols:");
for (int c : cols) printf("%d ", c);
printf("\nrows:");
for (int r : rows) printf("%d ", r);
printf("\n");
return 0;
}
I compile with:
icc -O3 -std=c++11 -xHost main.cpp -o main -openmp -L/opt/intel/composerxe/mkl/lib -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lm
When I give 22, the output is:
info = 0
info = 10
data:1 10 1 2 11 2 3 3 4 4 5 10 5 6 6 7 7 8 11 8 9 0
cols:1 5 0 2 8 1 3 2 4 3 5 0 4 6 5 7 6 8 1 7 9 -1
rows:0 2 5 7 9 11 14 16 18 21
But, when I give 23, the output is:
info = 0
info = 0
data:1 10 1 2 11 2 3 3 4 4 5 10 5 6 6 7 7 8 11 8 9 9
cols:1 5 0 2 8 1 3 2 4 3 5 0 4 6 5 7 6 8 1 7 9 8
rows:0 2 5 7 9 11 14 16 18 21
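One thing worth checking, offered as an assumption rather than a confirmed diagnosis: in the three-array CSR layout the row-pointer array has one entry per row plus a final end marker, i.e. dim + 1 entries, whereas triRows and rows above are declared with dim entries (the printed rows line also shows only 10 values, with no closing entry for the last row). The sketch below is plain C++ with 0-based indexing and does not call MKL; csr_row_pointer is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-pointer array for an n-row CSR matrix, built from COO row indices.
// Hypothetical helper (not an MKL routine); 0-based indexing throughout.
std::vector<int> csr_row_pointer(const std::vector<int>& coo_rows, int n)
{
    std::vector<int> ptr(static_cast<std::size_t>(n) + 1, 0);  // n + 1 entries, not n
    for (int r : coo_rows) {
        assert(r >= 0 && r < n);
        ++ptr[r + 1];                 // count the entries in each row
    }
    for (int i = 0; i < n; ++i)
        ptr[i + 1] += ptr[i];         // prefix sum turns counts into row starts
    return ptr;                       // ptr[n] equals the number of non-zeros
}
```

For the edge list in the question (u as the row indices, 10 vertices) this gives 11 row pointers, the last of which equals nnz = 11. If the same convention applies to the arrays passed to mkl_dcsrcoo and mkl_dcsradd, declaring triRows and rows with dim + 1 entries would be the first thing to try.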

Related

Pointer to portions of array

I have an object of std::vector<std::array<double, 16>>
vector entry Data
[0] - 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[1] - 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[2] - 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[...] - 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
This is intended to represent a 4x4 matrix in ravel format.
To not duplicate information I would like to create a pointer to extract a 3x3 from the above structure:
I have mathematical operations for the 3x3 structure (std::array<double, 9>)
someStructure: pointing to data elements [0, 1, 2, 4, 5, 6, 8, 9, 10]
The end goal is do: std::array<double, 9> tst = someStructure[0] + someStructure[1];
Is this doable?
Best Regards
The 3x3 part is not contiguous, hence a pointer alone won't help here.
You can write a view_as_3x3 that allows you to access elements of the submatrix of the 4x4 as if it was contiguous:
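struct view_as_3x3 {
// const-qualified so the view can be used through a const reference;
// the view itself is never modified, only the referenced parent is
double& operator[](size_t index) const {
static const size_t mapping[] = {0, 1, 2, 4, 5, 6, 8, 9, 10};
return parent[mapping[index]];
}
std::array<double, 16>& parent;
};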
Such that for example
for (size_t i = 0; i < 9; ++i) std::cout << " " << view_as_3x3{original_matrix}[i];
is printing the 9 elements of the 3x3 sub-matrix of the original 4x4 original_matrix.
Then you could more easily apply your 3x3 algorithms to the 3x3 submatrix of a 4x4 matrix. You just need to replace the std::array<double, 9> with some generic T. For example change
double sum_of_elements(const std::array<double, 9>& arr) {
double res = 0;
for (int i=0;i <9; ++i) res += arr[i];
return res;
}
To:
template <typename T>
double sum_of_elements(const T& arr) {
double res = 0;
for (int i=0;i <9; ++i) res += arr[i];
return res;
}
The calls are then
std::array<double, 16> matrix4x4;
sum_of_elements(view_as_3x3{matrix4x4});
// or
std::array<double, 9> matrix3x3;
sum_of_elements(matrix3x3);
It would be nicer to use iterators instead of indices; however, writing the view with custom iterators requires a considerable amount of boilerplate. On the other hand, I would not suggest using naked std::arrays in the first place, but rather some my_4x4matrix that holds the array as a member and provides iterators and more convenience methods.
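To connect this back to the stated end goal (adding two 3x3 sub-matrices into a std::array<double, 9>), here is a minimal sketch along the same lines. The names const_view_3x3 and the free operator+ are hypothetical, introduced only for this example; it assumes the 4x4 data is row-major and that a read-only view is enough.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Read-only view of the top-left 3x3 block of a row-major 4x4 matrix.
struct const_view_3x3 {
    const std::array<double, 16>& parent;
    double operator[](std::size_t index) const {
        static const std::size_t mapping[] = {0, 1, 2, 4, 5, 6, 8, 9, 10};
        assert(index < 9);
        return parent[mapping[index]];
    }
};

// Element-wise sum of two views, materialised as a contiguous 3x3 array.
std::array<double, 9> operator+(const const_view_3x3& a, const const_view_3x3& b) {
    std::array<double, 9> out{};
    for (std::size_t i = 0; i < 9; ++i) out[i] = a[i] + b[i];
    return out;
}
```

With a std::vector<std::array<double, 16>> named data, the call would then read std::array<double, 9> tst = const_view_3x3{data[0]} + const_view_3x3{data[1]};. Only the result is copied; the 4x4 inputs are not duplicated.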

getting every n'th row in eigen MatrixXf

I would like to do simple stuff - extract every second row in a matrix of shape [4,5] to get two output matrices of shapes [2,5]. My matrix is:
Eigen::MatrixXf m(4,5);
m << 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20;
when I print I get the expected:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Now, how do I obtain the expected outputs?
// expected mOut1:
1 2 3 4 5
11 12 13 14 15
// and expected mOut2:
6 7 8 9 10
16 17 18 19 20
So far I have tried Eigen::Map with strides, but I have no idea how those strides work.
I made a function:
Eigen::Map<Eigen::MatrixXf, 0, Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic>> myMatrixSlice(const float* data, unsigned dim1, unsigned dim2, int stride1, int stride2) {
using Stride = Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic>;
return Eigen::Map<Eigen::MatrixXf, 0, Stride>{
const_cast<float*>(data), dim1, dim2, Stride{stride1, stride2}};
}
and I want to call it like this:
auto mOut1 = myMatrixSlice(m.data(), 2, 5, ?, ?);
auto mOut2 = myMatrixSlice(m.data(), 2, 5, ?, ?); // remember to start with row=1
but I guess I need some help on this.
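A rough way to reason about the two missing stride values, sketched without Eigen so the arithmetic is visible: Eigen::MatrixXf is column-major by default, so m.data() holds the columns one after another (1, 6, 11, 16, 2, 7, ...). Stepping to the next column skips 4 floats (the outer stride) and stepping to every second row skips 2 floats (the inner stride). The helpers strided_slice and column_major_4x5 below are hypothetical and only mimic that addressing rule; they have not been checked against Eigen itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a strided rows x cols slice out of a flat buffer.
// Element (i, j) is read from data[offset + i * inner + j * outer].
std::vector<float> strided_slice(const std::vector<float>& data, std::size_t offset,
                                 std::size_t rows, std::size_t cols,
                                 std::size_t outer, std::size_t inner)
{
    std::vector<float> out;                  // result is returned row by row
    out.reserve(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j) {
            const std::size_t k = offset + i * inner + j * outer;
            assert(k < data.size());
            out.push_back(data[k]);
        }
    return out;
}

// Column-major storage of the 4x5 matrix from the question.
std::vector<float> column_major_4x5()
{
    std::vector<float> data(20);
    for (std::size_t j = 0; j < 5; ++j)
        for (std::size_t i = 0; i < 4; ++i)
            data[j * 4 + i] = static_cast<float>(i * 5 + j + 1);
    return data;
}
```

Here strided_slice(d, 0, 2, 5, 4, 2) yields rows 0 and 2, and the same call with offset 1 yields rows 1 and 3. Since Eigen::Stride takes the outer stride first and the inner stride second, the corresponding calls would presumably be myMatrixSlice(m.data(), 2, 5, 4, 2) and myMatrixSlice(m.data() + 1, 2, 5, 4, 2).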

Find the row sum of a matrix using CSR or COO storing format

I have, for example, the following matrix B, which is stored in COO and CSR format (retrieved from the non-symmetric example here). Could you please suggest an efficient C++ way to apply the MATLAB sum(B,2) function using the COO or CSR (or both) storage format? Since the arrays may well be large, can this be done in parallel (OpenMP or CUDA, e.g. Thrust)?
Any algorithmic or library based suggestions are highly appreciated.
Thank you!
PS: Code to construct a sparse matrix and get the CSR coordinates can be found for example in the answer of this post.
COO format:                      CSR format:
row_index  col_index  value      columns  row_index  value
1          1           1         0        0           1
1          2          -1         1        3          -1
1          3          -3         3        5          -3
2          1          -2         0        8          -2
2          2           5         1        11          5
3          3           4         2        13          4
3          4           6         3                    6
3          5           4         4                    4
4          1          -4         0                   -4
4          3           2         2                    2
4          4           7         3                    7
5          2           8         1                    8
5          5          -5         4                   -5
For COO it's pretty simple:
struct MatrixEntry {
size_t row;
size_t col;
int value;
};
std::vector<MatrixEntry> matrix = {
{ 1, 1, 1 },
{ 1, 2, -1 },
{ 1, 3, -3 },
{ 2, 1, -2 },
{ 2, 2, 5 },
{ 3, 3, 4 },
{ 3, 4, 6 },
{ 3, 5, 4 },
{ 4, 1, -4 },
{ 4, 3, 2 },
{ 4, 4, 7 },
{ 5, 2, 8 },
{ 5, 5, -5 },
};
std::vector<int> sum(5);
for (const auto& e : matrix) {
sum[e.row-1] += e.value;
}
For large matrices you can split the for loop into several smaller ranges, accumulate a partial sum vector per range, and add the partial results at the end.
If you only need the sum of each row (and not column-wise), CSR is also straightforward (and even more efficient):
std::vector<int> row_idx = { 0, 3, 5, 8, 11, 13 };
std::vector<int> value = { 1, -1, -3, -2, 5, 4, 6, 4, -4, 2, 7, 8, -5 };
std::vector<int> sum(5);
for(size_t i = 0; i < row_idx.size()-1; ++i) {
sum[i] = std::accumulate(value.begin() + row_idx[i], value.begin() + row_idx[i + 1], 0);
}
Again, for parallelism you can simply split up the loop.
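As a sketch of what splitting the CSR loop could look like with OpenMP (one of the options named in the question): each iteration writes only its own sum[i], so the rows can be handed to different threads without any reduction step. The function name csr_row_sums is hypothetical, and the pragma only takes effect when the compiler is invoked with OpenMP enabled; otherwise the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Row sums of a CSR matrix; row_idx has one entry per row plus an end marker.
std::vector<int> csr_row_sums(const std::vector<int>& row_idx,
                              const std::vector<int>& value)
{
    assert(!row_idx.empty());
    assert(static_cast<std::size_t>(row_idx.back()) == value.size());
    const long rows = static_cast<long>(row_idx.size()) - 1;
    std::vector<int> sum(static_cast<std::size_t>(rows));
    // Iterations are independent: iteration i reads its own slice of value
    // and writes only sum[i].
    #pragma omp parallel for
    for (long i = 0; i < rows; ++i) {
        sum[i] = std::accumulate(value.begin() + row_idx[i],
                                 value.begin() + row_idx[i + 1], 0);
    }
    return sum;
}
```

The COO version is less convenient to parallelise this way, because several entries may update the same sum element; per-thread partial vectors that are added together afterwards are one way around that.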

Striding windows

Assume that I have a vector:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
What I need to do is split this vector into block sizes of blocksize with an overlap
blocksize = 4
overlap = 2
The result, would be a 2D vector with size 4 containing 6 values.
x[0] = [1, 3, 5, 7, 9, 11]
x[1] = [ 2 4 6 8 10 12]
....
I have tried to implement this with the following functions:
std::vector<std::vector<double> > stride_windows(std::vector<double> &data, std::size_t
NFFT, std::size_t overlap)
{
std::vector<std::vector<double> > blocks(NFFT);
for(unsigned i=0; (i < data.size()); i++)
{
blocks[i].resize(NFFT+overlap);
for(unsigned j=0; (j < blocks[i].size()); j++)
{
std::cout << data[i*overlap+j] << std::endl;
}
}
}
This is wrong, and it segfaults.
std::vector<std::vector<double> > frame(std::vector<double> &signal, int N, int M)
{
unsigned int n = signal.size();
unsigned int num_blocks = n / N;
unsigned int maxblockstart = n - N;
unsigned int lastblockstart = maxblockstart - (maxblockstart % M);
unsigned int numbblocks = (lastblockstart)/M + 1;
std::vector<std::vector<double> > blocked(numbblocks);
for(unsigned i=0; (i < numbblocks); i++)
{
blocked[i].resize(N);
for(int j=0; (j < N); j++)
{
blocked[i][j] = signal[i*M+j];
}
}
return blocked;
}
I wrote this function, thinking that it did the above, however, it will just store:
X[0] = 1, 2, 3, 4
x[1] = 3, 4, 5, 6
.....
Could anyone please explain how I would go about modifying the above function to allow for skips by overlap to take place?
This function is similar to this: Rolling window
EDIT:
I have the following vector:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
I want to split this vector, into sub-blocks (thus creating a 2D vector), with an overlap of the parameter overlap so in this case, the parameters would be: size=4 overlap=2, this would then create the following 2D vector:
`block0 = [ 1 3 5 7 9 11]
block1 = [ 2 4 6 8 10 12]
block2 = [ 3 5 7 9 11 13]
block3 = [ 4 6 8 10 12 14]`
So essentially, 4 blocks have been created, each block contains a value where the element is skipped by the overlap
EDIT 2:
This is where I need to get to:
The value of overlap will overlap the results of x in terms of placements inside the vector:
block1 = [1, 3, 5, 7, 9, 11]
Notice from the actual vector block:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
Value 1 -> This is pushed into block "1"
Value 2 -> This is not pushed into block "1" (overlap is skip 2 places in the vector)
Value 3 -> This is pushed into block "1"
Value 4 -> This is not pushed into block "1" (overlap is skip 2 places in the vector)
Value 5 -> This is pushed into block "1"
Value 6 -> This is not pushed into block "1" (overlap is skip 2 places in the vector)
Value 7 -> This is pushed into block "1"
Value 8 -> This is not pushed into block "1" (overlap is skip 2 places in the vector)
Value 9 -> This is pushed into block "1"
Value 10 -> This is not pushed into block "1" (overlap is skip 2 places in the vector)
Value 11 -> This is pushed into block "1"
BLOCK 2
Overlap = 2;
value 2 - > Pushed back into block "2"
value 4 -> Pushed back into block "2"
value 6, 8, 10 etc..
So each time, the place in the vector is skipped by the "overlap" in this case, it is the value of 2..
This is what the expected output would be:
[[ 1 3 5 7 9 11]
[ 2 4 6 8 10 12]
[ 3 5 7 9 11 13]
[ 4 6 8 10 12 14]]
If I understand you correctly, you're pretty close. You need something like the following. I used int because frankly it's easier to type than double =P
#include <iostream>
#include <algorithm>
#include <vector>
#include <limits>
#include <iterator>
std::vector<std::vector<int>>
split(const std::vector<int>& data, size_t blocksize, size_t overlap)
{
std::vector<std::vector<int>> res;
// number of elements every block can hold without running off the end
size_t minlen = (data.size() - blocksize)/overlap + 1;
for (size_t i=0; i<blocksize; ++i)
{
res.emplace_back(std::vector<int>());
std::vector<int>& block = res.back();
// block i starts at element i and steps by overlap; plain indices
// avoid advancing an iterator past end() after the last element
size_t pos = i;
for (size_t j=0; j<minlen; ++j)
{
block.push_back(data[pos]);
pos += overlap;
}
}
return res;
}
int main()
{
std::vector<int> data { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 };
for (size_t i=2; i<6; ++i)
{
for (size_t j=2; j<6; ++j)
{
std::vector<std::vector<int>> blocks = split(data, i, j);
std::cout << "Blocksize = " << i << ", Overlap = " << j << std::endl;
for (auto const& obj : blocks)
{
std::copy(obj.begin(), obj.end(), std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
}
std::cout << std::endl;
}
}
return 0;
}
Output
Blocksize = 2, Overlap = 2
1 3 5 7 9 11 13
2 4 6 8 10 12 14
Blocksize = 2, Overlap = 3
1 4 7 10 13
2 5 8 11 14
Blocksize = 2, Overlap = 4
1 5 9 13
2 6 10 14
Blocksize = 2, Overlap = 5
1 6 11
2 7 12
Blocksize = 3, Overlap = 2
1 3 5 7 9 11
2 4 6 8 10 12
3 5 7 9 11 13
Blocksize = 3, Overlap = 3
1 4 7 10
2 5 8 11
3 6 9 12
Blocksize = 3, Overlap = 4
1 5 9
2 6 10
3 7 11
Blocksize = 3, Overlap = 5
1 6 11
2 7 12
3 8 13
Blocksize = 4, Overlap = 2
1 3 5 7 9 11
2 4 6 8 10 12
3 5 7 9 11 13
4 6 8 10 12 14
Blocksize = 4, Overlap = 3
1 4 7 10
2 5 8 11
3 6 9 12
4 7 10 13
Blocksize = 4, Overlap = 4
1 5 9
2 6 10
3 7 11
4 8 12
Blocksize = 4, Overlap = 5
1 6 11
2 7 12
3 8 13
4 9 14
Blocksize = 5, Overlap = 2
1 3 5 7 9
2 4 6 8 10
3 5 7 9 11
4 6 8 10 12
5 7 9 11 13
Blocksize = 5, Overlap = 3
1 4 7 10
2 5 8 11
3 6 9 12
4 7 10 13
5 8 11 14
Blocksize = 5, Overlap = 4
1 5 9
2 6 10
3 7 11
4 8 12
5 9 13
Blocksize = 5, Overlap = 5
1 6
2 7
3 8
4 9
5 10

Regarding vector values

Here is a code snippet below.
Input to program is
dimension d[] = {{4, 6, 7}, {1, 2, 3}, {4, 5, 6}, {10, 12, 32}};
PVecDim vecdim(new VecDim());
for (int i=0;i<sizeof(d)/sizeof(d[0]); ++i) {
vecdim->push_back(&d[i]);
}
getModList(vecdim);
Program:
class dimension;
typedef shared_ptr<vector<dimension*> > PVecDim;
typedef vector<dimension*> VecDim;
typedef vector<dimension*>::iterator VecDimIter;
struct dimension {
int height, width, length;
dimension(int h, int w, int l) : height(h), width(w), length(l) {
}
};
PVecDim getModList(PVecDim inList) {
PVecDim modList(new VecDim());
VecDimIter it;
for(it = inList->begin(); it!=inList->end(); ++it) {
dimension rot1((*it)->length, (*it)->width, (*it)->height);
dimension rot2((*it)->width, (*it)->height, (*it)->length);
cout<<"rot1 "<<rot1.height<<" "<<rot1.length<<" "<<rot1.width<<endl;
cout<<"rot2 "<<rot2.height<<" "<<rot2.length<<" "<<rot2.width<<endl;
modList->push_back(*it);
modList->push_back(&rot1);
modList->push_back(&rot2);
for(int i=0;i < 3;++i) {
cout<<(*modList)[i]->height<<" "<<(*modList)[i]->length<<" "<<(*modList)[i]->width<<" "<<endl;
}
}
return modList;
}
What I see is that the values rot1 and rot2 actually overwrite previous values.
For example that cout statement prints as below for input values defined at top. Can someone tell me why are these values being overwritten?
rot1 7 4 6
rot2 6 7 4
4 7 6
7 4 6
6 7 4
rot1 3 1 2
rot2 2 3 1
4 7 6
3 1 2
2 3 1
You are storing pointers to local variables when you do this kind of thing:
modList->push_back(&rot1);
These get invalidated every loop cycle. You could save yourself a lot of trouble by not storing pointers in the first place.
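A minimal sketch of the by-value alternative, assuming the caller does not actually need pointers: the vector owns copies of the rotated boxes, so nothing refers to rot1 or rot2 once the loop body ends. The name getModListByValue is hypothetical, and the shared_ptr typedefs from the question are dropped for brevity.

```cpp
#include <cassert>
#include <vector>

struct dimension {
    int height, width, length;
    dimension(int h, int w, int l) : height(h), width(w), length(l) {}
};

// Returns each input box followed by its two rotations, stored by value.
std::vector<dimension> getModListByValue(const std::vector<dimension>& inList)
{
    std::vector<dimension> modList;
    modList.reserve(inList.size() * 3);
    for (const dimension& d : inList) {
        modList.push_back(d);                                        // original
        modList.push_back(dimension(d.length, d.width, d.height));   // rot1
        modList.push_back(dimension(d.width, d.height, d.length));   // rot2
    }
    assert(modList.size() == inList.size() * 3);
    return modList;
}
```

Each push_back copies the temporary into the vector, so the entries written for the first box stay intact when the second box is processed.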