Eigen MatrixXd push_back in C++

Eigen is a well-known matrix library for C++. I am having trouble finding a built-in function to simply push an item onto the end of a matrix. Currently I know that it can be done like this:
Eigen::MatrixXd matrix(10, 3);
long int count = 0;
long int topCount = 10;
for (int i = 0; i < listLength; ++i) {
    matrix(count, 0) = list[i].x;
    matrix(count, 1) = list[i].y;
    matrix(count, 2) = list[i].z;
    count++;
    if (count == topCount) {
        topCount *= 2;
        matrix.conservativeResize(topCount, 3);
    }
}
matrix.conservativeResize(count, 3);
And this will work (some of the syntax may be off), but it's pretty convoluted for such a simple thing to do. Is there already a built-in function?

There is no such function for Eigen matrices. The reason is that such a function would either be very slow or use excessive memory.
For a push_back function not to be prohibitively expensive, it must grow the matrix's capacity by some factor whenever it runs out of space, as you have done. However, when dealing with matrices, memory usage is often a concern, so having a matrix's capacity larger than necessary can be problematic.
If it instead grew the capacity by a single row (or column) each time, every such growth would cost O(n*m) for the copy, and filling an entire n-by-m matrix this way would cost O(n^2*m) overall, which would be quite slow for even moderately sized matrices.
Additionally, in linear algebra, matrix and vector sizes are nearly always constant and known beforehand. Often when resizing a matrix you don't care about its previous values, which is why Eigen's resize function does not retain old values, unlike std::vector's resize.
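To illustrate that distinction, a minimal sketch (my addition) comparing the two resizing calls:
#include <Eigen/Dense>
#include <iostream>

int main()
{
    Eigen::MatrixXd m(2, 2);
    m << 1, 2,
         3, 4;

    Eigen::MatrixXd a = m;
    a.resize(3, 3);             // contents are NOT preserved
    Eigen::MatrixXd b = m;
    b.conservativeResize(3, 3); // old 2x2 block kept; new cells uninitialized

    std::cout << b.topLeftCorner(2, 2) << "\n"; // prints the original values
    return 0;
}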
The only case I can think of where you wouldn't know the matrix's size beforehand is when reading data from a file. In that case I would either load the data first into a standard container such as std::vector using push_back and then copy it into an appropriately sized matrix, or, if memory is tight, run through the file once to get the size and then a second time to copy the values.
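A minimal sketch of the first approach (my addition; the file format, a plain list of x y z triples, is assumed for illustration):
#include <Eigen/Dense>
#include <fstream>
#include <vector>

int main()
{
    // Assumed input: whitespace-separated x y z triples in "points.txt".
    std::ifstream in("points.txt");
    std::vector<double> flat; // x0 y0 z0 x1 y1 z1 ...
    double v;
    while (in >> v)
        flat.push_back(v);

    // View the contiguous buffer as an N x 3 row-major matrix and copy it
    // into a MatrixXd that is sized exactly once.
    const Eigen::Index n = static_cast<Eigen::Index>(flat.size() / 3);
    Eigen::MatrixXd matrix =
        Eigen::Map<const Eigen::Matrix<double, Eigen::Dynamic, 3, Eigen::RowMajor>>(
            flat.data(), n, 3);
    return 0;
}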

There is no such function; however, you can build something like it yourself:
#include <Eigen/Dense>
#include <iostream>

using Eigen::MatrixXd;
using Eigen::Vector3d;

template <typename DynamicEigenMatrix>
void push_back(DynamicEigenMatrix& m, Vector3d&& values, Eigen::Index row)
{
    if (row >= m.rows()) {
        m.conservativeResize(row + 1, Eigen::NoChange);
    }
    m.row(row) = values.transpose();
}

int main()
{
    MatrixXd matrix(10, 3);
    for (Eigen::Index i = 0; i < 10; ++i) {
        push_back(matrix, Vector3d(1, 2, 3), i);
    }
    std::cout << matrix << "\n";
    return 0;
}
If this needs to perform too many resizes though, it's going to be horrendously slow.
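To mitigate that, a sketch (my addition) of an amortized variant that doubles the capacity and tracks the logical row count separately:
#include <Eigen/Dense>
#include <algorithm>

// Appends a row, doubling the matrix's row capacity when full. `size` holds
// the logical row count; call m.conservativeResize(size, Eigen::NoChange)
// once at the end to trim the unused rows.
void push_back_amortized(Eigen::MatrixXd& m, const Eigen::Vector3d& values,
                         Eigen::Index& size)
{
    if (size == m.rows()) {
        m.conservativeResize(std::max<Eigen::Index>(1, 2 * m.rows()),
                             Eigen::NoChange);
    }
    m.row(size++) = values.transpose();
}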

Related

Vector dot product in Microsoft SEAL with CKKS

I am currently trying to implement matrix multiplication methods using the Microsoft SEAL library. I have created a vector<vector<double>> as the input matrix and encoded it with CKKSEncoder. However, the encoder packs an entire vector into a single Plaintext, so I just have a vector<Plaintext>, which makes me lose the 2D structure (and then of course I'll have a vector<Ciphertext> after encryption). Having a 1D vector allows me to access only the rows entirely, but not the columns.
I managed to transpose the matrices before encoding. This allowed me to multiply component-wise the rows of the first matrix and columns (rows in transposed form) of the second matrix but I am unable to sum the elements of the resulting vector together since it's packed into a single Ciphertext. I just need to figure out how to make the vector dot product work in SEAL to perform matrix multiplication. Am I missing something or is my method wrong?
KyoohyungHan suggested in issue https://github.com/microsoft/SEAL/issues/138 that the problem can be solved with rotations, by repeatedly rotating the output vector and summing it up.
For example:
// my_output_vector is the output Ciphertext; vector_size is the number of
// meaningful slots it encrypts (note: Ciphertext::size() would give the
// number of polynomials, not the element count)
vector<Ciphertext> rotations_output(vector_size);
rotations_output[0] = my_output_vector; // rotating by 0 steps is the identity
for (int steps = 1; steps < vector_size; steps++)
{
    evaluator.rotate_vector(my_output_vector, steps, galois_keys, rotations_output[steps]);
}
Ciphertext sum_output;
evaluator.add_many(rotations_output, sum_output);
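As an aside (my addition, not part of the suggestion above): when the encrypted vector fills all slots and the slot count is a power of two, the same total can be obtained with only log2(slot_count) rotations instead of one per element:
// slot_count is poly_modulus_degree / 2 in CKKS, i.e. encoder.slot_count();
// galois_keys must have been generated for these power-of-two steps.
size_t slot_count = encoder.slot_count();
Ciphertext sum = my_output_vector;
Ciphertext rotated;
for (size_t steps = slot_count / 2; steps >= 1; steps /= 2)
{
    evaluator.rotate_vector(sum, static_cast<int>(steps), galois_keys, rotated);
    evaluator.add_inplace(sum, rotated);
}
// every slot of sum now holds the total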
A vector of vectors is not the same as an array of arrays (a 2D array, i.e. a matrix).
While a one-dimensional vector<double>'s data() points to a contiguous memory block (e.g., you can memcpy it), each of the "subvectors" allocates its own, separate buffer. Therefore vector<vector<double>>.data() makes no sense and cannot be used as a matrix.
In C++, a two-dimensional array array2D[W][H] is stored in memory identically to array[W*H]. Therefore both can be processed by the same routines (when it makes sense). Consider the following example:
#include <cstddef>
#include <iostream>

void fill_array(double *array, size_t size, double value) {
    for (size_t i = 0; i < size; ++i) {
        array[i] = value;
    }
}

int main(int argc, char *argv[])
{
    constexpr size_t W = 10;
    constexpr size_t H = 5;
    double matrix[W][H];
    // Using the 2D array as a 1D array to fill all elements with 5.
    fill_array(&matrix[0][0], W * H, 5);
    for (const auto &row : matrix) {
        for (const auto v : row) {
            std::cout << v << '\t';
        }
        std::cout << '\n';
    }
    return 0;
}
In the above example, you can substitute double matrix[W][H]; with vector<double> matrix(W * H); and feed matrix.data() into fill_array(), as sketched below. However, you cannot substitute it with a vector of W vectors of H elements each.
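A minimal sketch of that substitution (my addition; it reuses fill_array from the snippet above):
#include <cstddef>
#include <iostream>
#include <vector>

void fill_array(double *array, size_t size, double value); // defined above

int main()
{
    constexpr size_t W = 10;
    constexpr size_t H = 5;
    std::vector<double> matrix(W * H);
    // One contiguous buffer; the same fill routine works unchanged.
    fill_array(matrix.data(), W * H, 5);
    // Element (i, j) lives at index i * H + j.
    std::cout << matrix[2 * H + 3] << '\n';
    return 0;
}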
P.S. There are plenty of C++ implementations of math vector and matrix. You can use one of those if you don't want to deal with C-style arrays.

How to efficiently initialize a SparseVector in Eigen

In the Eigen docs for filling a sparse matrix it is recommended to use the triplet filling method as it can be much more efficient than making calls to coeffRef, which involves a binary search.
For filling SparseVectors however, there is no clear recommendation on how to do it efficiently.
The suggested method in this SO answer uses coeffRef which means that a binary search is performed for every insertion.
Is there a recommended, efficient way to build sparse vectors? Should I try to create a single row SparseMatrix and then store that as a SparseVector?
My use case is reading in LibSVM files, in which there can be millions of very sparse features and billions of data points. I'm currently representing these as an std::vector<Eigen::SparseVector>. Perhaps I should just use SparseMatrix instead?
Edit: One thing I've tried is this:
// For every data point in a batch, do the following:
Eigen::SparseMatrix<float> features(1, num_features);
typedef Eigen::Triplet<float> T;
std::vector<T> tripletList;
for (int j = 0; j < num_batch_instances; ++j) {
    tripletList.clear(); // one triplet list per data point
    for (size_t i = batch.offset[j]; i < batch.offset[j + 1]; ++i) {
        uint32_t index = batch.index[i];
        float fvalue = batch.value[i];
        if (index < num_features) {
            tripletList.emplace_back(0, index, fvalue);
        }
    }
    // Copy the data over.
    features.setFromTriplets(tripletList.begin(), tripletList.end());
    samples->emplace_back(Eigen::SparseVector<float>(features));
}
This creates a SparseMatrix using the triplet list approach, then creates a SparseVector from that object. In my experiments with ~1.4M features and very high sparsity this is 2 orders of magnitude slower than using SparseVector and coeffRef, which I definitely did not expect.
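One alternative worth trying (my sketch, not from the thread): since LibSVM files list feature indices in increasing order, each SparseVector can be built directly by reserving the exact number of nonzeros and inserting in index order, so every insert appends at the end of the internal storage instead of shifting elements:
#include <Eigen/SparseCore>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-instance helper; `indices` and `values` hold one parsed
// LibSVM line, with indices already in increasing order.
Eigen::SparseVector<float> makeSample(const std::vector<uint32_t>& indices,
                                      const std::vector<float>& values,
                                      uint32_t num_features)
{
    Eigen::SparseVector<float> v(num_features);
    v.reserve(static_cast<Eigen::Index>(indices.size())); // exact nnz, one allocation
    for (std::size_t k = 0; k < indices.size(); ++k) {
        if (indices[k] < num_features)
            v.insert(indices[k]) = values[k]; // appends when indices increase
    }
    return v;
}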

Why is bool [][] more efficient than vector<vector<bool>>

I am trying to solve a DP problem where I create a 2D array and fill it all the way. My function is called multiple times with different test cases. When I use a vector<vector<bool>>, I get a time-limit-exceeded error (it takes more than 2 seconds for all the test cases). However, when I use bool [][], it takes much less time (about 0.33 seconds), and I get a pass.
Can someone please help me understand why vector<vector<bool>> would be any less efficient than bool [][]?
#include <cstdint>
#include <vector>
using std::vector;

bool findSubsetSum(const vector<uint32_t> &input)
{
    uint32_t sum = 0;
    for (uint32_t i : input)
        sum += i;
    if ((sum % 2) == 1)
        return false;
    sum /= 2;
#if 1
    // Note: a variable-length array is a compiler extension, not standard C++.
    bool subsum[input.size()+1][sum+1];
    uint32_t m = input.size()+1;
    uint32_t n = sum+1;
    for (uint32_t i = 0; i < m; ++i)
        subsum[i][0] = true;
    for (uint32_t j = 1; j < n; ++j)
        subsum[0][j] = false;
    for (uint32_t i = 1; i < m; ++i) {
        for (uint32_t j = 1; j < n; ++j) {
            if (j < input[i-1])
                subsum[i][j] = subsum[i-1][j];
            else
                subsum[i][j] = subsum[i-1][j] || subsum[i-1][j - input[i-1]];
        }
    }
    return subsum[m-1][n-1];
#else
    vector<vector<bool>> subsum(input.size()+1, vector<bool>(sum+1));
    for (uint32_t i = 0; i < subsum.size(); ++i)
        subsum[i][0] = true;
    for (uint32_t j = 1; j < subsum[0].size(); ++j)
        subsum[0][j] = false;
    for (uint32_t i = 1; i < subsum.size(); ++i) {
        for (uint32_t j = 1; j < subsum[0].size(); ++j) {
            if (j < input[i-1])
                subsum[i][j] = subsum[i-1][j];
            else
                subsum[i][j] = subsum[i-1][j] || subsum[i-1][j - input[i-1]];
        }
    }
    return subsum.back().back();
#endif
}
Thank you,
Ahmed.
If you need a matrix and you need to do high-performance work, nested std::vectors are not the best solution because they are not contiguous in memory (and even nested std::arrays are not strictly guaranteed to be — see the links below). Non-contiguous memory access results in more cache misses.
See more :
std::vector and contiguous memory of multidimensional arrays
Is the data in nested std::arrays guaranteed to be contiguous?
On the other hand, bool twoDAr[M][N] is guaranteed to be contiguous, which means fewer cache misses.
See more :
C / C++ MultiDimensional Array Internals
And to know about cache friendly codes:
What is “cache-friendly” code?
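A quick way to see the contiguity guarantee (my illustration, not from the linked posts): arrays have no padding between elements, so the compiler can verify that a 2D bool array occupies exactly the same storage as the equivalent 1D array.
int main()
{
    bool twoDAr[10][5];
    // bool[10][5] and bool[10 * 5] occupy exactly the same amount of storage.
    static_assert(sizeof(twoDAr) == sizeof(bool[10 * 5]),
                  "2D bool array is one contiguous block");
    (void)twoDAr; // silence unused-variable warnings
    return 0;
}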
Can someone please help me understand why vector<vector<bool>> would be any less efficient than bool [][]?
A two-dimensional bool array is really just one big one-dimensional bool array of size M * N, with no gaps between the items.
A "two-dimensional" std::vector doesn't really exist: it is not one big one-dimensional std::vector but a std::vector of std::vectors. The outer vector itself has no memory gaps, but there may well be gaps between the content areas of the individual inner vectors. It depends on how your compiler implements the very special std::vector<bool> class, but if your element count is sufficiently big, dynamic allocation is unavoidable to prevent a stack overflow, and that alone implies pointers to separate memory areas.
And once you need to access data from separate memory areas, things become slower.
Here is a possible solution:
Try to use a single flat std::vector<bool> of size (input.size() + 1) * (sum + 1).
If that fails to make things faster, avoid the std::vector<bool> template specialisation by using a std::vector<char> of the same size instead, and cast the elements to and from bool as required. A sketch of that layout follows.
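A minimal sketch of the flat vector<char> version (my adaptation of the function above; element (i, j) lives at index i * n + j):
#include <cstdint>
#include <vector>
using std::vector;

bool findSubsetSumFlat(const vector<uint32_t> &input)
{
    uint32_t sum = 0;
    for (uint32_t i : input)
        sum += i;
    if ((sum % 2) == 1)
        return false;
    sum /= 2;

    const size_t m = input.size() + 1;
    const size_t n = sum + 1;
    // One contiguous allocation; zero-initialisation already covers the
    // subsum[0][j] = false row.
    vector<char> subsum(m * n, 0);
    for (size_t i = 0; i < m; ++i)
        subsum[i * n] = 1; // subsum[i][0] = true
    for (size_t i = 1; i < m; ++i) {
        for (size_t j = 1; j < n; ++j) {
            if (j < input[i - 1])
                subsum[i * n + j] = subsum[(i - 1) * n + j];
            else
                subsum[i * n + j] = subsum[(i - 1) * n + j] ||
                                    subsum[(i - 1) * n + j - input[i - 1]];
        }
    }
    return subsum[m * n - 1] != 0;
}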
In cases where you know the number of elements from the beginning, an array will always be at least as fast as a vector, because a vector is a higher-level wrapper that manages a dynamically allocated array. Its benefit is allocating extra space for you when needed, which you don't need when the size is fixed.
If you had a problem that required a 1D array, the difference might not have bothered you (a single vector versus a single array). But when you create a 2D structure, you also create many instances of the vector class, so the time difference between array and vector is multiplied by the number of inner vectors in your container, making your code slow.
This time difference has many causes, but the most obvious one is the vector constructor: it is called once per row, i.e. subsum.size() times. The memory layout issue mentioned in the other answers is another cause.
For performance, it is advisable to use arrays whenever you can. Even when you must use a vector, try to minimise the number of resizes it performs (by reserving or pre-allocating), bringing its behaviour closer to that of an array.

How can I optimize this function which handles large c++ vectors?

According to Visual Studio's performance analyzer, the following function is consuming what seems to me to be an abnormally large amount of processor power, seeing as all it does is add between 1 and 3 numbers from several vectors and store the result in one of those vectors.
// Relevant class members:
//   vector<double> cache; (~80,000 elements)
//   int inputSize;
// Notes:
//   RealFFT::real is a typedef for POD double.
//   RealFFT::RealSet is a wrapper class for a C-style array of RealFFT::real.
//   This is because of the FFT library I'm using (FFTW).
//   Its bracket operator is overloaded to return a const reference to the
//   appropriate array element.
vector<RealFFT::real> Convolver::store(vector<RealFFT::RealSet>& data)
{
    int cr = inputSize; // 'cache' read position
    int cw = 0;         // 'cache' write position
    int di = 0;         // index within 'data' vector (e.g. data[di])
    int bi = 0;         // index within 'data' element (e.g. data[di][bi])
    int blockSize = irBlockSize();
    int dataSize = data.size();
    int cacheSize = cache.size();
    // Basically, this takes the existing values in 'cache', sums them with the
    // values in 'data' at the appropriate positions, and stores them back in
    // the cache at a new position.
    while (cw < cacheSize)
    {
        RealFFT::real n = 0; // accumulate in double, not int, to avoid truncation
        if (di < dataSize)
            n = data[di][bi];
        if (di > 0 && bi < inputSize)
            n += data[di - 1][blockSize + bi];
        if (++bi == blockSize)
        {
            di++;
            bi = 0;
        }
        if (cr < cacheSize)
            n += cache[cr++];
        cache[cw++] = n;
    }
    // Take the first 'inputSize' values and return them in a new vector.
    return Common::vecTake<RealFFT::real>(inputSize, cache, 0);
}
Granted, the vectors in question hold around 80,000 items each, but by comparison, a function which multiplies similar vectors of complex numbers (each complex multiplication requires 4 real multiplications and 2 additions) consumes about a third as much processor power.
Perhaps it has something to do with the fact that it has to jump around within the vectors rather than just accessing them linearly? I really have no idea, though. Any thoughts on how this could be optimized?
Edit: I should mention that I also tried writing the function to access each vector linearly, but this requires more total iterations, and performance was actually worse that way.
Turn on compiler optimization as appropriate. A guide for MSVC is here:
http://msdn.microsoft.com/en-us/library/k1ack8f1.aspx
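For instance (my example, not part of the original answer; the file name is hypothetical), optimization can be enabled from the command line with MSVC's /O2 flag, or the rough equivalent with GCC/Clang:
cl /O2 /EHsc convolver.cpp
g++ -O2 convolver.cpp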

Total Energy of a signal

I have been given this equation to calculate the total energy of a signal:
E_x = ∑_n |x[n]|^2
This suggests to me that you square each of the samples and then sum them all. I am wondering whether the code/algorithm I have written is accurate for this equation and whether I have done it the most efficient way.
#include <iostream>
#include <numeric>
#include <vector>
using namespace std;
double totalEnergy(vector<double> data, const int rows, const int cols)
{
    vector<double> temp;
    double energy = 0;
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            temp.push_back(data[i * cols + j] * data[i * cols + j]);
        }
    }
    // The initial value must be 0.0, not 0, or accumulate() sums in int.
    energy = accumulate(temp.begin(), temp.begin() + (rows * cols), 0.0);
    return energy;
}
int main(int argc, char *argv[]) {
    vector<double> data;
    data.push_back(4);
    data.push_back(4);
    data.push_back(4);
    data.push_back(4);
    cout << totalEnergy(data, 2, 2) << '\n';
}
int main(int argc, char *argv[]) {
vector<double> data;
data.push_back(4);
data.push_back(4);
data.push_back(4);
data.push_back(4);
totalEnergy(data, 2, 2);
}
Result: 64
Any help / advice would be greatly appreciated :)!
This is certainly not the most efficient way to do this computation, although I think the implementation is nearly correct: if the n in the formula multiplies each term, that factor got lost, but since I can't see what the sum's index and bounds are, I won't try to fix that here; I'll just reproduce the results of the implementation, only "better". There are two obvious points which can be improved:
The code doesn't make any use of being in a particular column or row. That is, it could just as well treat the input as a flat array whose size is known from the size of the input vector.
The function uses two temporary vectors (the by-value copy passed to the function and one created inside it). Creating a std::vector<T> needs to allocate memory, which isn't a cheap operation.
As a first approximation, I would transform the input vector in-place and then accumulate the result:
#include <algorithm>
#include <numeric>
#include <vector>

double square(double value) {
    return value * value;
}

double totalEnergy(std::vector<double> data) {
    std::transform(data.begin(), data.end(), data.begin(), &square);
    // 0.0 (not 0) keeps the accumulation in double rather than int.
    return std::accumulate(data.begin(), data.end(), 0.0);
}
The function still makes a copy of the data and modifies it. I don't like this. Oddly enough, the operation you implemented is basically an inner product of a vector with itself, i.e., the following yields the same result without creating an extra vector either:
double totalEnergy(std::vector<double> const& data) {
    // Again, the 0.0 initial value keeps the computation in double.
    return std::inner_product(data.begin(), data.end(), data.begin(), 0.0);
}
Assuming this implements the correct formula (although I'm still suspicious of the n in the original formula), it is probably considerably faster. It seems to be more concise, too...