Sorting large std::vector of custom objects - c++

I am creating a sparse matrix in CSR format, for which I start with a vector of matrix element structures. It needs to be std::vector at the beginning because I don't know ahead of time how many non-zeros my matrix is going to have. Then, to fill up the appropriate arrays for the CSR matrix, I need to first sort this array of non-zeros, in the order they appear in the matrix if one goes through it line-by-line. But above a certain matrix size (roughly 1 500 000 non-zeros), the sorted vector does not start from the beginning of the matrix. It is still sorted, but starts around row 44000.
// Matrix element struct:
struct mel
{
int Ncols;
int row,col;
MKL_Complex16 val;
void print();
};
// Custom function for sorting:
struct less_than_MElem
{
inline bool operator() (const mel& ME1, const mel& ME2)
{
return ( ( ME1.row*ME1.Ncols+ME1.col ) < ( ME2.row*ME2.Ncols+ME2.col ) );
}
};
int main()
{
std::vector<mel> mevec;
/* long piece of code that fills up mevec */
std::sort( mevec.begin(), mevec.end(), less_than_MElem() );
return 0;
}
I thought maybe as the vector was grown dynamically it wound up in separate blocks in the memory and the iterator wasn't pointing at the genuine beginning/end anymore. So I have tried creating a new vector and started with resizing it to the size that is known by that time. Then copied the elements one-by-one into this new vector and sorted it, but the result was the same.
Nelements = mevec.size();
std::vector<mel> nzeros;
nzeros.resize(Nelements);
for( int i = 0; i < Nelements; i++ )
{
nzeros[i].Ncols = mevec[i].Ncols;
nzeros[i].row = mevec[i].row;
nzeros[i].col = mevec[i].col;
nzeros[i].val = mevec[i].val;
}
std::sort( nzeros.begin(), nzeros.end(), less_than_MElem() );
Can anyone think of a solution?
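One possible cause, offered as a hypothesis rather than a diagnosis: the sort key row*Ncols+col is computed in int, so it overflows once it exceeds INT_MAX (2 147 483 647 for a 32-bit int). That would match the reported row 44000 if Ncols is close to 49 000. Signed overflow is undefined behaviour, and in practice the key often wraps to a negative value, so the affected rows sort to the front. The sketch below avoids the product by comparing (row, col) lexicographically. mel_key is a stripped-down stand-in for mel (val omitted), and less_than_rowcol, linear_index and sort_row_major are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Stand-in for the question's mel struct (val omitted for brevity).
struct mel_key
{
    int Ncols;
    int row, col;
};

// Hypothetical replacement comparator: orders by row, then by column.
// No multiplication is involved, so nothing can overflow.
struct less_than_rowcol
{
    bool operator()(const mel_key& a, const mel_key& b) const
    {
        return std::tie(a.row, a.col) < std::tie(b.row, b.col);
    }
};

// Alternative that keeps the linear index but computes it in 64 bits.
inline long long linear_index(const mel_key& m)
{
    assert(m.col >= 0 && m.col < m.Ncols);
    return static_cast<long long>(m.row) * m.Ncols + m.col;
}

// Sorts the non-zeros in row-major order.
inline void sort_row_major(std::vector<mel_key>& v)
{
    std::sort(v.begin(), v.end(), less_than_rowcol());
}
```

If the linear index is needed later (for example to fill the CSR arrays), linear_index() shows how to compute it without overflow.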

Related

Vector dot product in Microsoft SEAL with CKKS

I am currently trying to implement matrix multiplication methods using the Microsoft SEAL library. I have created a vector<vector<double>> as input matrix and encoded it with CKKSEncoder. However the encoder packs an entire vector into a single Plaintext so I just have a vector<Plaintext> which makes me lose the 2D structure (and then of course I'll have a vector<Ciphertext> after encryption). Having a 1D vector allows me to access only the rows entirely but not the columns.
I managed to transpose the matrices before encoding. This allowed me to multiply component-wise the rows of the first matrix and columns (rows in transposed form) of the second matrix but I am unable to sum the elements of the resulting vector together since it's packed into a single Ciphertext. I just need to figure out how to make the vector dot product work in SEAL to perform matrix multiplication. Am I missing something or is my method wrong?
It has been suggested by KyoohyungHan in the issue: https://github.com/microsoft/SEAL/issues/138 that it is possible to solve the problem with rotations by rotating the output vector and summing it up repeatedly.
For example:
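// my_output_vector is the Ciphertext output.
// Ciphertext::size() is the number of polynomials in the ciphertext, not the
// number of packed values, so the loop bound has to be the slot count
// (assumed here to come from the CKKSEncoder instance, named encoder).
const int slot_count = static_cast<int>(encoder.slot_count());
vector<Ciphertext> rotations_output(slot_count);
for (int steps = 0; steps < slot_count; steps++)
{
    evaluator.rotate_vector(my_output_vector, steps, galois_keys, rotations_output[steps]);
}
Ciphertext sum_output;
evaluator.add_many(rotations_output, sum_output);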
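The rotate-and-sum idea can be tried out on plain data before touching ciphertexts. The sketch below is an ordinary C++ simulation, not SEAL code: rotate_left is a hypothetical stand-in for a cyclic slot rotation, and the slot count is assumed to be a power of two. It needs only log2(n) rotations instead of n, because the step doubles each time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a cyclic slot rotation: element i of the result is
// element (i + steps) mod n of the input.
std::vector<double> rotate_left(const std::vector<double>& v, std::size_t steps)
{
    std::vector<double> out(v);
    std::rotate(out.begin(),
                out.begin() + static_cast<std::ptrdiff_t>(steps % v.size()),
                out.end());
    return out;
}

// Rotate-and-sum: after the loop every slot holds the sum of all slots.
// Assumes v.size() is a non-zero power of two.
std::vector<double> sum_all_slots(std::vector<double> v)
{
    const std::size_t n = v.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    for (std::size_t step = 1; step < n; step *= 2)
    {
        const std::vector<double> rotated = rotate_left(v, step);
        for (std::size_t i = 0; i < n; ++i)
            v[i] += rotated[i];
    }
    return v;
}
```

In SEAL, each iteration would presumably become one evaluator.rotate_vector call followed by evaluator.add_inplace on the ciphertext, with Galois keys available for the power-of-two steps.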
A vector of vectors is not the same as an array of arrays (a 2D matrix).
While a one-dimensional vector<double>.data() points to contiguous memory (e.g., you can memcpy it), each of the "subvectors" allocates its own separate memory buffer. Therefore vector<vector<double>>.data() does not point to one contiguous block of doubles and cannot be used as a matrix.
In C++, two-dimensional array array2D[W][H] is stored in memory identically to array[W*H]. Therefore both can be processed by the same routines (when it makes sense). Consider the following example:
#include <cstddef>
#include <iostream>
using std::cout;
using std::size_t;
// Sets the first size elements of array to value.
void fill_array(double *array, size_t size, double value) {
    for (size_t i = 0; i < size; ++i) {
        array[i] = value;
    }
}
int main(int argc, char *argv[])
{
    constexpr size_t W = 10;
    constexpr size_t H = 5;
    double matrix[W][H];
    // using 2D array as 1D to fill all elements with 5.
    fill_array(&matrix[0][0], W * H, 5);
    for (const auto &row: matrix) {
        for (const auto v : row) {
            cout << v << '\t';
        }
        cout << '\n';
    }
    return 0;
}
In the above example, you can substitute double matrix[W][H]; with vector<double> matrix(W * H); and feed matrix.data() into fill_array(). However, you cannot do the same with a vector<vector<double>> of W rows of H elements, because its rows are separate allocations.
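A minimal sketch of that substitution, repeating fill_array() so the snippet stands alone. The elem() and make_filled() helpers are hypothetical names added here to show the row-major index arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sets the first size elements of array to value.
void fill_array(double *array, std::size_t size, double value) {
    for (std::size_t i = 0; i < size; ++i) {
        array[i] = value;
    }
}

// Hypothetical helper: element (w, h) of a W x H matrix stored row by row.
double &elem(std::vector<double> &m, std::size_t H, std::size_t w, std::size_t h) {
    assert(h < H && w * H + h < m.size());
    return m[w * H + h];
}

// Builds a W x H matrix in one contiguous buffer and fills it with value.
std::vector<double> make_filled(std::size_t W, std::size_t H, double value) {
    std::vector<double> matrix(W * H);
    fill_array(matrix.data(), matrix.size(), value);
    return matrix;
}
```

Because the buffer is contiguous, matrix.data() can be handed to any routine that expects a plain double*.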
P.S. There are plenty of C++ implementations of math vector and matrix. You can use one of those if you don't want to deal with C-style arrays.

Is it possible to pass a variable length array as a parameter in C++?

I do not know the value of V beforehand. It is determined from a file I open in the program, so it cannot be defined with #define V __, and it does not work as a global variable either. The input file changes V based on its contents. I expected to be able to pass the parameters and use the Dijkstra's algorithm implementation found on GeeksforGeeks.
I have tried declaring V globally, but I am given an error saying "variable must have constant value."
void dijkstra(int graph[V][V], int src, int V)
//array function being pasted, error is the V in graph[V]
//V is defined at beginning of main as
int V;
//where V is changed
while(std::getline(file2,newstr))
{
if(newstr.find(check) != std::string::npos)
{
V++;
}
}
//where it is passed in main
for(int i = 0; i < V; i++)
{
size = V;
dijkstra(array[size][size], i, V);
}
Don't use C-style arrays. Use std::vector and friends from the Standard Library where you can ask for the size if you want to know.
Converted:
void dijkstra(const std::vector<std::vector<int>>& graph, int src) {
auto v = graph.size();
// ... Other code.
}
For inserting you can use push_back:
std::vector<std::vector<int>> graph;
while(std::getline(file2,newstr)) {
if(newstr.find(check) != std::string::npos) {
std::vector<int> row;
row.push_back(...);
graph.push_back(row);
}
}
Then pass it in like a regular variable:
dijkstra(graph, src);
If all that vector stuff looks really ugly, typedef it to something more friendly looking.
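For illustration, here is a minimal sketch of how the converted function might be filled in. It assumes an adjacency matrix in which graph[u][v] == 0 means there is no edge and all weights are non-negative, and it returns the distances instead of printing them. Both choices are assumptions, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// O(V^2) Dijkstra on an adjacency matrix; V is taken from graph.size().
std::vector<int> dijkstra(const std::vector<std::vector<int>>& graph, int src)
{
    const std::size_t V = graph.size();
    assert(src >= 0 && static_cast<std::size_t>(src) < V);
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> dist(V, INF);
    std::vector<bool> done(V, false);
    dist[static_cast<std::size_t>(src)] = 0;

    for (std::size_t iter = 0; iter < V; ++iter)
    {
        // Pick the closest reachable vertex that has not been finalized yet.
        std::size_t u = V;
        for (std::size_t i = 0; i < V; ++i)
            if (!done[i] && dist[i] != INF && (u == V || dist[i] < dist[u]))
                u = i;
        if (u == V)
            break; // the remaining vertices are unreachable
        done[u] = true;

        // Relax every edge leaving u.
        for (std::size_t v = 0; v < V; ++v)
            if (graph[u][v] != 0 && !done[v] && dist[u] + graph[u][v] < dist[v])
                dist[v] = dist[u] + graph[u][v];
    }
    return dist; // unreachable vertices keep the value INF
}
```

The size never has to be passed separately: the function asks the vector for it.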
For C-style arrays, you need to know the size at compile time. A variable like int N; is a runtime value. A variable like constexpr int N = 9; is usable at compile time and cannot be mutated.
If you need an array sizeable at runtime, you need some sort of dynamic array. The most common one is std::vector.
// The graph is a flat V*V adjacency matrix; element (r, c) is graph[r * V + c].
void dijkstra(const std::vector<int>& graph, int src, int V);

std::vector<int> graph;
graph.resize(V * V); // vectors are resizable at run time
for (int i = 0; i < V; i++)
{
    dijkstra(graph, i, V);
}
Is it possible to pass a variable length array as a parameter in C++?
No.
Variable length arrays are not supported in standard C++. But read on: there is an alternative that is surprisingly better.
I do not know the value of V before it is found within a file I open in the program.
A 1d vector is trivial to create, after your code has found V, no compile time constant required.
Early in the startup in one of my programs, the gBoard vector is built using argv[3] and argv[4]. Here is a snippet:
aTermPFN += argv[1]; // output tty, /dev/pts/<argv[1]>
fillPatternChoiceLetter = argv[2][0];
aMaxRow = stoi(argv[3]);
aMaxCol = stoi(argv[4]);
userDim = true;
Clearly, the program has already started ... and V size is easily computed from (aMaxRow * aMaxCol).
I find it easy to access a 1d vector (or 1d array), in row major order, as if it is a 2d matrix, with the following function:
// game-board-index: computes index into the single dimension vector
// from 2d (row, col) matrix coordinates
size_t gbIndx(int r, int c) { return static_cast<size_t>((r * maxCol) + c); }
// a 2d game board of cells
// 2d access (row major order) implemented using 1d access
Cell_t* getCell( int r, int c ) { return (gBoard [gbIndx(r,c)]); }
// 1d access is surprisingly convenient for many functions
Cell_t* getCell( uint gbIndex ) { return (gBoard [gbIndex]); }
Sample initialization usage:
// vvvvvvvvvvvvvvvvvvv_-- 2d matrix access
gBoard [ gbIndx((midRow+1), midCol) ] -> setOptionX();
// ^^^^^^--1d row-major order index
A randomized gBoard is trivial in 1d:
void GOLUtil_t::setRandom()
{
CellVec_t myVec(gBoard); // copy cell vector
random_device rd;
mt19937_64 gen(rd());
shuffle (myVec.begin(), myVec.end(), gen); // shuffle order
int count = 1;
for ( auto it : myVec ) // randomly mark half the cells
{
if(count++ & 1)
it->setAlive(); // every odd cell
}
}
Note from https://en.cppreference.com/w/cpp/container/vector:
"The elements are stored contiguously, which means that elements can be accessed not only through iterators, but also using offsets to regular pointers to elements. This means that a pointer to an element of a vector may be passed to any function that expects a pointer to an element of an array."
I was surprised how often the 1d access enabled simpler code.
for (auto it : gBoard)
it->init(); // command each cell to init
Summary:
Although variable-length arrays (VLAs) are not supported in standard C++, I believe you will find std::vector a better alternative, and you will find that passing the vector around within your code works.

C++ Avoiding Triple Pointers

I am trying to create an array of X pointers referencing matrices of dimensions Y by 16. Is there any way to accomplish this in C++ without the use of triple pointers?
Edit: Adding some context for the problem.
There are a number of geometries on the screen, each with a transform that has been flattened to a 1x16 array. Each snapshot represents the transforms for each of a number of components. So the matrix dimensions are 16 by num_components by num_snapshots, where the latter two dimensions are known at run-time. In the end, we have many geometries with motion applied.
I'm creating a function that takes a triple pointer argument, though I cannot use triple pointers in my situation. What other ways can I pass this data (possibly via multiple arguments)? Worst case, I thought about flattening this entire 3D matrix to an array, though it seems like a sloppy thing to do. Any better suggestions?
What I have now:
function(..., double ***snapshot_transforms, ...)
What I want to accomplish:
function (..., <1+ non-triple pointer parameters>, ...)
Below isn't the function I'm creating that takes the triple pointer, but shows what the data is all about.
static double ***snapshot_transforms_function (int num_snapshots, int num_geometries)
{
double component_transform[16];
double ***snapshot_transforms = new double**[num_snapshots];
for (int i = 0; i < num_snapshots; i++)
{
snapshot_transforms[i] = new double*[num_geometries];
for (int j = 0; j < num_geometries; j++)
{
snapshot_transforms[i][j] = new double[16];
// 4x4 transform put into a 1x16 array with dummy values for each component for each snapshot
for (int k = 0; k < 16; k++)
snapshot_transforms[i][j][k] = k;
}
}
return snapshot_transforms;
}
Edit2: I cannot create new classes, nor use C++ features like std, as the exposed function prototype in the header file is getting put into a wrapper (that doesn't know how to interpret triple pointers) for translation to other languages.
Edit3: After everyone's input in the comments, I think going with a flattened array is probably the best solution. I was hoping there would be some way to split this triple pointer and organize this complex data across multiple data pieces neatly using simple data types including single pointers. Though I don't think there is a pretty way of doing this given my caveats here. I appreciate everyone's help =)
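A flattened layout does not have to be sloppy if the index arithmetic is kept in one place. The following is a minimal sketch, assuming the wrapper can handle a plain double* plus int dimensions. The function names are hypothetical, and std::size_t is used only inside the implementation, not in the exposed signature.

```cpp
#include <cassert>
#include <cstddef>

// Offset of value k of the 16-element transform for (snapshot, geometry).
inline std::size_t transform_index(int snapshot, int geometry, int k, int num_geometries)
{
    assert(snapshot >= 0 && geometry >= 0 && geometry < num_geometries);
    assert(k >= 0 && k < 16);
    return (static_cast<std::size_t>(snapshot) * num_geometries + geometry) * 16 + k;
}

// Allocates num_snapshots * num_geometries * 16 doubles in one block and
// fills them with the same dummy values as the triple-pointer version.
// The caller releases the block with delete[].
double *make_snapshot_transforms(int num_snapshots, int num_geometries)
{
    const std::size_t total =
        static_cast<std::size_t>(num_snapshots) * num_geometries * 16;
    double *data = new double[total];
    for (int i = 0; i < num_snapshots; i++)
        for (int j = 0; j < num_geometries; j++)
            for (int k = 0; k < 16; k++)
                data[transform_index(i, j, k, num_geometries)] = k;
    return data;
}

// Example of a wrapper-friendly signature: one pointer and two sizes.
double read_transform_value(const double *snapshot_transforms, int num_snapshots,
                            int num_geometries, int snapshot, int geometry, int k)
{
    assert(snapshot < num_snapshots);
    return snapshot_transforms[transform_index(snapshot, geometry, k, num_geometries)];
}
```

A single allocation also means a single delete[], instead of the nested clean-up loops the triple-pointer version needs.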
It is easier, better, and less error prone to use an std::vector. You are using C++ and not C after all. I replaced all of the C-style array pointers with vectors. The typedef doublecube makes it so that you don't have to type vector<vector<vector<double>>> over and over again. Other than that the code basically stays the same as what you had.
If you don't actually need dummy values, I would remove that innermost k loop completely. The constructor call below already creates every vector at its final size and zero-initializes the values. Note that reserve alone would not be enough: it sets aside capacity without creating elements, so at() would throw std::out_of_range.
#include <vector>
using std::vector; // so we can just call it "vector"
typedef vector<vector<vector<double>>> doublecube;
static doublecube snapshot_transforms_function (int num_snapshots, int num_geometries)
{
    // I deleted component_transform. It was never used
    // num_snapshots x num_geometries x 16 elements, all created up front
    doublecube snapshot_transforms(num_snapshots,
        vector<vector<double>>(num_geometries, vector<double>(16)));
    for (int i = 0; i < num_snapshots; i++)
    {
        for (int j = 0; j < num_geometries; j++)
        {
            // 4x4 transform put into a 1x16 array with dummy values for each component for each snapshot
            for (int k = 0; k < 16; k++)
                snapshot_transforms.at(i).at(j).at(k) = k;
        }
    }
    return snapshot_transforms;
}
Adding a little bit of object-orientation usually makes the code easier to manage -- for example, here's some code that creates an array of 100 Matrix objects with varying numbers of rows per Matrix. (You could vary the number of columns in each Matrix too if you wanted to, but I left them at 16):
#include <vector>
#include <memory> // for shared_ptr (not strictly necessary, but used in main() to avoid unnecessarily copying of Matrix objects)
/** Represents a (numRows x numCols) 2D matrix of doubles */
class Matrix
{
public:
// constructor: allocates numRows*numCols values, all initialized to 0.0
Matrix(int numRows = 0, int numCols = 0)
: _numRows(numRows)
, _numCols(numCols)
, _values(numRows*numCols, 0.0)
{
}
// copy constructor: copies the dimensions and the values
Matrix(const Matrix & rhs)
: _numRows(rhs._numRows)
, _numCols(rhs._numCols)
, _values(rhs._values)
{
}
/** Returns the value at (row/col) */
double get(int row, int col) const {return _values[(row*_numCols)+col];}
/** Sets the value at (row/col) to the specified value */
double set(int row, int col, double val) {return _values[(row*_numCols)+col] = val;}
/** Assignment operator */
Matrix & operator = (const Matrix & rhs)
{
_numRows = rhs._numRows;
_numCols = rhs._numCols;
_values = rhs._values;
return *this;
}
private:
int _numRows;
int _numCols;
std::vector<double> _values;
};
int main(int, char **)
{
const int numCols = 16;
std::vector< std::shared_ptr<Matrix> > matrixList;
for (int i=0; i<100; i++) matrixList.push_back(std::make_shared<Matrix>(i, numCols));
return 0;
}

Sorting an array of structs in C++

I'm using a particle physics library written in c++ for a game.
In order to draw the particles I must get an array of all their positions like so..
b2Vec2* particlePositionBuffer = world->GetParticlePositionBuffer();
This returns an array of b2Vec2 objects (which represent 2 dimensional vectors in the physics engine).
Also I can get and set their colour using
b2ParticleColor* particleColourBuffer = world->GetParticleColorBuffer();
I would like to get the 10% of the particles with the highest Y values (and then change their colour)
My idea is..
1. Make an array of structs the same size as the particlePositionBuffer array; the struct just contains an int (the particle's index in the particlePositionBuffer array) and a float (the particle's y position).
2. Then I sort the array by the y position.
3. Then I use the int in the struct from the top 10% of structs in my struct array to do stuff to their colour in the particleColourBuffer array.
Could someone show me how to sort an array of structs like that in C++?
Also, do you think this is a decent way of going about this? I only need to do it once (not every frame).
Following may help:
#include <algorithm>
#include <cstddef>
#include <vector>
// Functor to compare indices according to Y value.
struct comp
{
    explicit comp(b2Vec2* particlePositionBuffer) :
        particlePositionBuffer(particlePositionBuffer)
    {}
    bool operator()(std::size_t lhs, std::size_t rhs) const
    {
        // b2Vec2 exposes its coordinates as the public members x and y.
        // note that I do rhs < lhs to have higher value first.
        return particlePositionBuffer[rhs].y < particlePositionBuffer[lhs].y;
    }
    b2Vec2* particlePositionBuffer;
};
void foo()
{
    const std::size_t size = world->GetParticleCount(); // How do you get Count ?
    const std::size_t subsize = size / 10; // check for not zero ?
    std::vector<std::size_t> indices(size);
    for (std::size_t i = 0; i != size; ++i) {
        indices[i] = i;
    }
    // Afterwards the first subsize entries of indices refer to the particles
    // with the highest Y values (in no particular order among themselves).
    std::nth_element(indices.begin(), indices.begin() + subsize, indices.end(),
                     comp(world->GetParticlePositionBuffer()));
    b2ParticleColor* particleColourBuffer = world->GetParticleColorBuffer();
    for (std::size_t i = 0; i != subsize; ++i) {
        // indices[i] is the particle's position in both buffers.
        changeColor(particleColourBuffer[indices[i]]);
    }
}
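The same idea in a self-contained form that compiles without the physics library. This is a sketch under assumptions: the Y values are given as a plain std::vector<float>, and top_fraction_indices is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the ys.size() / divisor largest values in ys.
// The returned indices are not sorted among themselves.
std::vector<std::size_t> top_fraction_indices(const std::vector<float>& ys, std::size_t divisor)
{
    assert(divisor != 0);
    const std::size_t count = ys.size() / divisor;
    std::vector<std::size_t> indices(ys.size());
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    // Partition so that the first count indices refer to the largest Y values.
    std::nth_element(indices.begin(),
                     indices.begin() + static_cast<std::ptrdiff_t>(count),
                     indices.end(),
                     [&ys](std::size_t lhs, std::size_t rhs) { return ys[lhs] > ys[rhs]; });

    indices.resize(count);
    return indices;
}
```

With divisor set to 10 this yields the top 10%, and each returned index can be used directly on the colour buffer.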
If your particle count is low, it won't matter much either way, and sorting them all first with a simple stl sort routine would be fine.
If the number were large though, I'd create a binary search tree whose maximum size was 10% of the number of your particles. Then I'd maintain the minY actually stored in the tree for quick rejection purposes. Then this algorithm should do it:
1. Walk through your original array and add items to the tree until it is full (10%).
2. Update your minY.
3. For each remaining item in the original array:
   - If item.y is less than minY, go to the next item (quick rejection).
   - Otherwise, remove the currently smallest Y value from the tree, add the larger Y item to the tree, and update minY.
A binary search tree has a nice advantage of quick insert, quick search, and maintained ordering. If you want to be FAST, this is better than a complete sort on the entire array.
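The steps above can be sketched with std::set, which is typically implemented as a balanced binary search tree. The function name and the use of (y, index) pairs are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Keeps at most capacity (y, index) pairs. *tree.begin() is always the
// smallest y currently stored, which plays the role of minY.
std::vector<std::size_t> top_k_by_tree(const std::vector<float>& ys, std::size_t capacity)
{
    std::set<std::pair<float, std::size_t>> tree;
    for (std::size_t i = 0; i < ys.size(); ++i)
    {
        if (tree.size() < capacity)
        {
            tree.insert({ys[i], i});      // still filling the tree
        }
        else if (capacity != 0 && ys[i] > tree.begin()->first)
        {
            tree.erase(tree.begin());     // drop the current smallest Y
            tree.insert({ys[i], i});      // keep the larger one
        }
        // otherwise: quick rejection, ys[i] cannot be in the top set
    }
    assert(tree.size() <= capacity);

    std::vector<std::size_t> result;
    for (auto it = tree.rbegin(); it != tree.rend(); ++it)
        result.push_back(it->second);     // highest Y first
    return result;
}
```

Unlike the nth_element approach, this never needs a full-size index array, only storage for the 10% that is kept.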

How to use a 2d vector of pointers

What is the correct way to implement an efficient 2d vector? I need to store a set of Item objects in a 2d collection, that is fast to iterate (most important) and also fast to find elements.
I have a 2d vector of pointers declared as follows:
std::vector<std::vector<Item*>> * items;
In the constructor, I instantiate it as follows:
items = new std::vector<std::vector<Item*>>();
items->resize(10, std::vector<Item*>(10, new Item()));
How do I (correctly) implement methods for accessing items? E.g.:
items[3][4] = new Item();
AddItem(Item *& item, int x, int y)
{
items[x][y] = item;
}
My reasoning for using pointers is for better performance, so that I can pass things around by reference.
If there is a better way to go about this, please explain, however I would still be interested in how to correctly use the vector.
Edit: For clarification, this is part of a class that is for inventory management in a simple game. The set 10x10 vector represents the inventory grid which is a set size. The Item class contains the item type, a pointer to an image in the resource manager, stack size etc.
My pointer usage was in an attempt to improve performance, since this class is iterated and used to render the whole inventory every frame, using the image pointer.
It seems that you know the size of the matrix beforehand, and that this matrix is square. Though vector<> is fine, you can also use native arrays in that case.
Item **m = new Item*[ n * n ];
If you want to access position r,c, then you only have to multiply r by n, and then add c:
pos = ( r * n ) + c;
So, if you want to access position 1, 2, and n = 5, then:
pos = ( 1 * 5 ) + 2;
Item * it = m[ pos ];
Also, instead of using plain pointers, you can use smart pointers such as unique_ptr (or the older auto_ptr, which was deprecated in C++11 and removed in C++17). They are similar in one respect: once they are destroyed, they destroy the object they are pointing to.
std::unique_ptr<Item> * m = new std::unique_ptr<Item>[ n * n ];
The only drawback is that now you need to call get() in order to obtain the pointer.
pos = ( 1 * 5 ) + 2;
Item * it = m[ pos ].get();
Here you have a class that summarizes all of this:
#include <memory>
class ItemsSquaredMatrix {
public:
    ItemsSquaredMatrix(unsigned int i): size( i )
        { m = new std::unique_ptr<Item>[ size * size ]; }
    ~ItemsSquaredMatrix()
        { delete[] m; }
    // The class owns a raw array, so copying is disabled to avoid a double delete[].
    ItemsSquaredMatrix(const ItemsSquaredMatrix &) = delete;
    ItemsSquaredMatrix & operator=(const ItemsSquaredMatrix &) = delete;
    Item * get(unsigned int row, unsigned int col)
        { return m[ translate( row, col ) ].get(); }
    const Item * get(unsigned int row, unsigned int col) const
        { return m[ translate( row, col ) ].get(); }
    // Takes ownership of it; the previous occupant of the cell is destroyed.
    void set(unsigned int row, unsigned int col, Item * it)
        { m[ translate( row, col ) ].reset( it ); }
    unsigned int translate(unsigned int row, unsigned int col) const
        { return ( ( row * size ) + col ); }
private:
    unsigned int size;
    std::unique_ptr<Item> * m;
};
Now you only have to create the class Item. But if you created a specific class, then you'd have to duplicate ItemsSquaredMatrix for each new kind of data. In C++ there is a specific solution for this: turning the class above into a template (hint: vector<> is a template). Since you are a beginner, it will be simpler to have Item as an abstract class:
class Item {
public:
// more things...
virtual std::string toString() const = 0;
};
And derive all the data classes you will create from it. Remember to do a cast, though...
As you can see, there are a lot of open questions, and more questions will arise as you keep unveiling things. Enjoy!
Hope this helps.
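The template version hinted at above could look roughly like this. It is a sketch, not the answer's original code: SquareMatrix is a hypothetical name, and std::vector<std::unique_ptr<T>> is used as the storage so that no manual delete[] is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// size x size grid of owned T objects; empty cells hold a null pointer.
template <typename T>
class SquareMatrix {
public:
    explicit SquareMatrix(std::size_t size) : size_(size), cells_(size * size) {}

    // Returns the object at (row, col), or nullptr if the cell is empty.
    T *get(std::size_t row, std::size_t col)
        { return cells_[translate(row, col)].get(); }
    const T *get(std::size_t row, std::size_t col) const
        { return cells_[translate(row, col)].get(); }

    // Takes ownership of item; any previous occupant is destroyed.
    void set(std::size_t row, std::size_t col, std::unique_ptr<T> item)
        { cells_[translate(row, col)] = std::move(item); }

    std::size_t size() const { return size_; }

private:
    // Row-major position of (row, col) in the flat vector.
    std::size_t translate(std::size_t row, std::size_t col) const
    {
        assert(row < size_ && col < size_);
        return row * size_ + col;
    }

    std::size_t size_;
    std::vector<std::unique_ptr<T>> cells_;
};
```

SquareMatrix<Item> then gives the same interface as ItemsSquaredMatrix, and the class can be reused for any other element type.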
For numerical work, you want to store your data as locally as possible in memory. For example, if you were making an n by m matrix, you might be tempted to define it as
vector<vector<double>> mat(n, vector<double>(m));
There are severe disadvantages to this approach. Firstly, it will not work with any proper matrix libraries, such as BLAS and LAPACK, which expect the data to be contiguous in memory. Secondly, even if it did, it would lead to lots of random access and pointer indirection in memory, which would kill the performance of your matrix operations. Instead, you need a contiguous block of memory n*m items in size.
vector<double> mat(n*m);
But you wouldn't really want to use a vector for this, as you would then need to translate from 1d to 2d indices manually. There are some libraries that do this for you in C++. One of them is Blitz++, but that seems to not be much developed now. Other alternatives are Armadillo and Eigen. See this previous answer for more details.
Using Eigen, for example, the matrix declaration would look like this:
MatrixXd mat(n,m);
and you would be able to access elements as mat(i, j), and multiply matrices as mat1*mat2, and so on.
The first question is why the pointers. There's almost never any reason
to have a pointer to an std::vector, and it's not that often that
you'd have a vector of pointers. Your definition should probably be:
std::vector<std::vector<Item> > items;
, or at the very least (supposing that e.g. Item is the base of a
polymorphic hierarchy):
std::vector<std::vector<Item*> > items;
As for your problem, the best solution is to wrap your data in some sort
of a Vector2D class, which contains an std::vector<Item> as member,
and does the index calculations to access the desired element:
#include <cassert>
#include <vector>
class Vector2D
{
    int my_rows;
    int my_columns;
    std::vector<Item> my_data;
public:
    Vector2D( int rows, int columns )
        : my_rows( rows )
        , my_columns( columns )
        , my_data( rows * columns ) // one contiguous block of default-constructed Items
    {
    }
    // Bounds-checked access to the element at (row, column).
    Item& get( int row, int column )
    {
        assert( row >= 0 && row < my_rows
            && column >= 0 && column < my_columns );
        return my_data[row * my_columns + column];
    }
    // Returned by operator[] so that v[row][column] works.
    class RowProxy
    {
        Vector2D* my_owner;
        int my_row;
    public:
        RowProxy( Vector2D& owner, int row )
            : my_owner( &owner )
            , my_row( row )
        {
        }
        Item& operator[]( int column ) const
        {
            return my_owner->get( my_row, column );
        }
    };
    RowProxy operator[]( int row )
    {
        return RowProxy( *this, row );
    }
    // OR...
    Item& operator()( int row, int column )
    {
        return get( row, column );
    }
};
If you forgo bounds checking (but I wouldn't recommend it), the
RowProxy can be a simple Item*.
And of course, you should duplicate the above for const.