I want to compute K*es where K is an Eigen matrix (dimension pxp) and es is a px1 random binary vector with exactly t 1s.
For example if p=5 and t=2 a possible es is [1,0,1,0,0]' or [0,0,1,1,0]' and so on...
How do I easily generate es with Eigen?
I came up with an even better solution, which is a combination of std::vector, Eigen::Map and std::shuffle.
std::vector<int> esv(p,0);
std::fill_n(esv.begin(),t,1);
Eigen::Map<Eigen::VectorXi> es (esv.data(), esv.size());
std::random_device rd;
std::mt19937 g(rd());
std::shuffle(std::begin(esv), std::end(esv), g);
This solution is memory efficient (since Eigen::Map doesn't copy esv) and has the big advantage that if we want to permute es several times (like in this case), then we just need to repeat std::shuffle(std::begin(esv), std::end(esv), g);
Maybe I'm wrong, but this solution seems more elegant and efficient than the previous ones.
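The claim that only the shuffle has to be repeated can be checked without Eigen. The following is a minimal sketch using only the standard library; makeIndicator, reshuffle and countOnes are hypothetical helper names, not part of the original answer. An Eigen::Map over the vector's data would observe the same values, since it does not copy them.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: a 0/1 vector of length p whose first t entries are 1.
std::vector<int> makeIndicator(int p, int t) {
    assert(0 <= t && t <= p);
    std::vector<int> v(p, 0);
    std::fill_n(v.begin(), t, 1);
    return v;
}

// Re-permute the same storage in place; a shuffle never changes the number of 1s.
template <class Rng>
void reshuffle(std::vector<int>& v, Rng& rng) {
    std::shuffle(v.begin(), v.end(), rng);
}

// Hypothetical helper used to check the invariant.
int countOnes(const std::vector<int>& v) {
    return static_cast<int>(std::count(v.begin(), v.end(), 1));
}
```

Each call to reshuffle yields a new arrangement of the same t ones, so no reallocation or refilling is needed between draws.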
So you're using Eigen. I'm not sure what matrix type you're using, but I'll go off the class Eigen::MatrixXd.
What you need to do is:
Create a 1xp matrix that's all 0
Choose random spots to flip from 0 to 1 that are between 0-p, and make sure that spot is unique.
The following code should do the trick, although you could implement it other ways.
//Your p and t
int p = 5;
int t = 2;
//1xp matrix
MatrixXd es(1, p);
//Initialize the whole 1xp matrix (its only row has index 0)
for (int i = 0; i < p; ++i)
    es(0, i) = 0;
//Get a random position in the 1xp matrix from 0 to p-1
for (int i = 0; i < t; ++i)
{
    int randPos = rand() % p;
    //If the position was already a 1 and not a 0, get a different random position
    while (es(0, randPos) == 1)
        randPos = rand() % p;
    //Change the random position from a 0 to a 1
    es(0, randPos) = 1;
}
When t is close to p, Ryan's method needs to generate many more than t random numbers. To avoid this performance degradation, you could solve your original problem
find t different numbers from [0, p) that are uniformly distributed
by the following steps
generate t uniformly distributed random numbers idx[t] from [0, p-t+1)
sort these numbers idx[t]
idx[i]+i, i=0,...,t-1 are the result
The code:
VectorXi idx(t);
VectorXd es(p);
es.setConstant(0);
for(int i = 0; i < t; ++i) {
    // Divide by RAND_MAX + 1.0 so the result stays strictly below p-t+1
    idx(i) = int(double(rand()) / (RAND_MAX + 1.0) * (p-t+1));
}
std::sort(idx.data(), idx.data() + idx.size());
for(int i = 0; i < t; ++i) {
es(idx(i)+i) = 1.0;
}
I'm trying to compute DFT and its inversion, for now by simplest method possible, but it keeps not working. And what's worse, I'm not sure of it. Here is my code:
(realnum is double, freq_func and time_func are vectors of complex)
freq_func toFreq(const time_func & waveform)
{
freq_func res;
res.resize(waveform.size());
const realnum N = waveform.size();
for (size_t k = 0; k < waveform.size(); k++)
for (size_t n = 0; n < waveform.size(); n++)
res[k] += waveform[n] * exp(complex(0, -2*PI*n*k/N));
return res;
}
time_func toTime(const freq_func & spectrum)
{
time_func res;
res.resize(spectrum.size());
const realnum N = spectrum.size();
for (size_t n = 0; n < spectrum.size(); n++)
{
for (size_t k = 0; k < spectrum.size(); k++)
res[n] += spectrum[k] * exp(complex(0, 2*PI*n*k/N));
res[n] /= N;
}
return res;
}
Why does it never hold that a = toTime(toFreq(a)) or a = toFreq(toTime(a))? Why does toTime return results with considerable imaginary parts? Or should it? Some online calculators do. And why does Wikipedia claim that dividing by N can be moved to toFreq, or even replaced by dividing both by sqrt(N)? Shouldn't there be only one possible definition?
The expression complex(0, 2*PI*n*k/N) creates and initializes a complex number with real part set to 0 and imaginary part set to 2*PI*n*k/N. To implement the DFT, you really want a complex number whose magnitude is 1 and whose phase is 2*PI*n*k/N. Passing the purely imaginary number through exp, as the code in the question does, yields exactly that value; you can also construct it directly with:
polar(1.0, -2*PI*n*k/N)
for the forward transform, and
polar(1.0, 2*PI*n*k/N)
for the inverse transform (matching the signs used in the question).
As far as the Wikipedia claim is concerned, it is simply a question of how the DFT is defined. Different implementations can choose different definitions and hence scale by different factors. Normalized DFTs choose the forward and inverse transforms such that a round trip produces the original sequence (e.g. x == toTime(toFreq(x))). Non-normalized DFTs may choose a different scaling (e.g. to save a few scaling operations when the scale is not important to the application at hand).
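A minimal round-trip sketch may make the scaling point concrete. It assumes the convention used in the question (negative exponent and no scaling in the forward transform, positive exponent and division by N in the inverse); cvec and dftCore are illustrative names that do not appear in the original code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using cvec = std::vector<cplx>;  // illustrative alias for both time and frequency data

// Naive O(N^2) transform; sign = -1 is the forward DFT, sign = +1 the unscaled inverse.
cvec dftCore(const cvec& in, double sign) {
    assert(sign == 1.0 || sign == -1.0);
    const std::size_t N = in.size();
    const double twoPi = 2.0 * std::acos(-1.0);
    cvec out(N, cplx(0.0, 0.0));
    for (std::size_t k = 0; k < N; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            const double phase = sign * twoPi * double(n) * double(k) / double(N);
            out[k] += in[n] * std::polar(1.0, phase);  // unit magnitude, given phase
        }
    }
    return out;
}

cvec toFreq(const cvec& waveform) { return dftCore(waveform, -1.0); }

cvec toTime(const cvec& spectrum) {
    cvec res = dftCore(spectrum, +1.0);
    for (cplx& v : res) v /= double(res.size());  // the 1/N factor lives here
    return res;
}
```

With this pairing the two functions are inverses of each other up to rounding error, in either order. Moving the 1/N factor into toFreq, or splitting it as 1/sqrt(N) in both, would preserve that property while changing the scale of the spectrum.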
I am writing a simulation with Eigen and now I need to set a list of rows of my ColumnMajor SparseMatrix like this:
In row n:
for column elements m:
if m == n set value to one
else set value to zero
There is always an element with column index = row index inside the sparse matrix. I tried to use the InnerIterator, but it did not work well since I have a ColumnMajor matrix. The prune method suggested in https://stackoverflow.com/a/21006998/3787689 worked, but I just need to set the non-diagonal elements to zero temporarily, and prune seems to actually delete them, which slows down a different part of the program.
How should I proceed in this case?
Thanks in advance!
EDIT: I forgot to make clear: the sparse matrix is already filled with values.
Use triplets for efficient insertion:
const int N = 5;
const int M = 10;
Eigen::SparseMatrix<double> myMatrix(N,M); // N by M matrix with no coefficient, hence this is the null matrix
std::vector<Eigen::Triplet<double>> triplets;
for (int i=0; i<N; ++i) {
triplets.push_back({i,i,1.});
}
myMatrix.setFromTriplets(triplets.begin(), triplets.end());
I solved it like this: Since I want to stick to a ColumnMajor matrix I do a local RowMajor version and use the InnerIterator to assign the values to the specific rows. After that I overwrite my matrix with the result.
Eigen::SparseMatrix<float, Eigen::RowMajor> rowMatrix;
rowMatrix = colMatrix;
for (uint i = 0; i < rowTable.size(); i++) {
int rowIndex = rowTable(i);
for (Eigen::SparseMatrix<float, Eigen::RowMajor>::InnerIterator
it(rowMatrix, rowIndex); it; ++it) {
if (it.row() == it.col())
it.valueRef() = 1.0f;
else
it.valueRef() = 0.0f;
}
}
colMatrix = rowMatrix;
For beginners, the simplest way to set a row/column/block to zero is just to multiply it by 0.0.
So to patch an entire row in the way you desire it is enough to do:
A.row(n) *= 0; //Set entire row to 0
A.coeffRef(n,n) = 1; //Set diagonal to 1
This way you don't need to change your code depending on RowMajor/ColMajor order. Eigen will do all the work quickly.
Also, if you are really interested in freeing memory after setting the row to 0, just add an A.prune(0,0) after you have finished editing all the rows in your matrix.
I'm working on an OpenCL project to generate very large hermitian (symmetric) matrices, and I am trying to determine the best way to generate the work IDs.
A hermitian matrix is conjugate-symmetric about the diagonal, so that M(i,j) = M*(j,i).
In the brute force way, the for loop looks like:
for(int i = 0; i < N; i++)
{
for(int j = 0; j < N; j++)
{
complex<float> result = doSomeCalculation();
M(i,j) = result;
}
}
However, taking advantage of the hermitian property, the loop can be made to be twice as efficient by only calculating the upper triangular part of the matrix and duplicating the result in the lower triangular part:
for(int i = 0; i < N; i++)
{
for(int j = i; j < N; j++)
{
complex<float> result = doSomeCalculation();
M(i,j) = result;
M(j,i) = conj(result);
}
}
In both loops, doSomeCalculation() is an expensive operation, and each entry in the matrix is completely uncorrelated from every other entry (i.e. the problem is stupidly parallel).
My question is this:
How can I implement the second loop with doSomeCalculation as an OpenCL kernel so that the thread IDs are most efficiently used (i.e. so that the thread calculates both M(i,j) and M(j,i) without having to call doSomeCalculation() twice)?
You need to use a linear index, for example you can index every element of your matrix in this way:
0  1  2     ...  N-1
*  N  N+1   ...  2N-2
*  *  2N-1  ...  3N-4
....
*  *  *     ...  N(N+1)/2-1
That is, the index k is given by:
k = i*N - i*(i+1)/2 + j
Where N is the size of the matrix and (i,j) are respectively the 0-based indices of the row and the column.
This relationship can be inverted; see the answer of this question, which I report here for completeness:
i = floor( ( 2*N+1 - sqrt( (2*N+1)*(2*N+1) - 8*k ) ) / 2 ) ;
j = k - N*i + i*(i+1)/2 ;
So you need to enqueue a 1D kernel with N(N+1)/2 work items, and you can decide by yourself the size of the workgroup (usually 64 items per work group is a good choice).
Then in the OpenCL code you can retrieve the index k by using:
int k = get_group_id(0)*64 + get_local_id(0);
Then use the two relationships above to recover the indices (i, j) of the matrix element you need to compute.
Moreover, notice that you can also save space by representing your hermitian matrix as a linear vector with N(N+1)/2 elements.
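The two index relationships can be checked on the host before they go into a kernel. Below is a sketch in C++ rather than OpenCL C; toLinear and fromLinear are hypothetical names, and the small correction loops after the square root are an added safeguard against floating-point rounding, not part of the formulas quoted above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Linear index of element (i, j), 0 <= i <= j < N, in the row-major upper triangle.
std::int64_t toLinear(std::int64_t i, std::int64_t j, std::int64_t N) {
    assert(0 <= i && i <= j && j < N);
    return i * N - i * (i + 1) / 2 + j;
}

// Inverse mapping: recover (i, j) from k, 0 <= k < N(N+1)/2.
std::pair<std::int64_t, std::int64_t> fromLinear(std::int64_t k, std::int64_t N) {
    assert(0 <= k && k < N * (N + 1) / 2);
    const double b = 2.0 * double(N) + 1.0;
    std::int64_t i = static_cast<std::int64_t>(
        std::floor((b - std::sqrt(b * b - 8.0 * double(k))) / 2.0));
    // Index of the diagonal element (r, r), i.e. the first index stored for row r.
    auto rowStart = [N](std::int64_t r) { return r * (2 * N + 1 - r) / 2; };
    // Nudge i if rounding in sqrt pushed it across a row boundary.
    while (i > 0 && rowStart(i) > k) --i;
    while (i + 1 < N && rowStart(i + 1) <= k) ++i;
    const std::int64_t j = k - N * i + i * (i + 1) / 2;
    return {i, j};
}
```

In a kernel, k would come from the work-item ID and the pair (i, j) would select both M(i,j) and its conjugate partner M(j,i).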
If your matrices are really big, then you can dice up your NxN matrix into (N/k)x(N/k) tiles, each of size kxk. Since you need only half of the data, you create a 1D NDRange of size roughly local_group_size * (N/k)x(N/k)/2.
Every tile of the matrix is processed by one LocalGroup (the size of the LocalGroup is your choice). The idea is that you create an array on the host side which contains the position of every WorkGroup in the matrix. The kernel stub should look as follows:
void __kernel myKernel(
__global int* coords,
....)
{
int2 WorkGroupPositionInMatrix = vload2(get_group_id(0), coords);
...
DoCalculation();
...
WriteResultTwice();
...
return;
}
What you need to handle by hand are those WorkGroups that lie on the matrix diagonal. If the matrix is big, then the overhead for the LocalGroups on the diagonal is negligible.
A right triangle can be cut in half vertically and the smaller portion rotated to fit with the larger portion to form a rectangle of equal area. Therefore it is easy to make your triangular global work area into one that is rectangular, which fits OpenCL.
See my answer here: OpenCL efficient way to group a lower triangular matrix
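One possible way to realize this fold is sketched below as host-side C++; rectToTriangle is a hypothetical name, and the particular pairing (row r with row N-1-r) is an assumption about how the cut is made, not taken from the linked answer. The rectangle has ceil(N/2) rows and N+1 columns; for odd N the middle row pairs with itself, so part of that row is padding.

```cpp
#include <cassert>

// Map a cell (r, c) of a ceil(N/2) x (N+1) rectangle to an element (i, j), i <= j,
// of the upper triangle of an N x N matrix. Returns false for padding cells,
// which only occur in the middle row when N is odd.
bool rectToTriangle(int r, int c, int N, int& i, int& j) {
    assert(N > 0);
    assert(0 <= r && r < (N + 1) / 2);
    assert(0 <= c && c < N + 1);
    if (c < N - r) {
        // Left part of the rectangle row: triangle row r, which has N - r elements.
        i = r;
        j = r + c;
        return true;
    }
    // Right part: the mirrored triangle row N-1-r, which has r + 1 elements.
    i = N - 1 - r;
    j = c - 1;
    return i != r;  // the middle row of an odd N was already covered by the left part
}
```

With a 2D global range of that shape, each work item would call the mapping on its two global IDs and return early when it reports padding.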
In C++ I need to calculate the determinant of a 6x6 matrix really fast.
This is how I would do this for a 2x2 matrix:
double det2(double A[2][2]) {
return A[0][0]*A[1][1] - A[0][1]*A[1][0];
}
I want a similar function for the determinant of a 6x6 matrix but I do not want to write it by hand since it contains 6! = 720 terms where each term is the product of 6 elements in the matrix.
Therefore I want to use Leibniz formula:
static int perms6[720][6];
static int signs6[720];
double det6(double A[6][6]) {
double sum = 0.0;
for(int i = 0; i < 720; i++) {
int j0 = perms6[i][0];
int j1 = perms6[i][1];
int j2 = perms6[i][2];
int j3 = perms6[i][3];
int j4 = perms6[i][4];
int j5 = perms6[i][5];
sum += signs6[i]*A[0][j0]*A[1][j1]*A[2][j2]*A[3][j3]*A[4][j4]*A[5][j5];
}
return sum;
}
How do I find the permutations and the signs?
Is there some way I could get the compiler to do more of the work (e.g. C macros or template metaprogramming) so that the function would be even faster?
EDIT:
I just timed the following code (Eigen):
Matrix<double,6,6> A;
// ... fill A
for(long i = 0; i < 1e6; i++) {
PartialPivLU< Matrix<double,6,6> > LU(A);
double d = LU.determinant();
}
to 1.25 s. So using LU or Gauss decomposition is definitely fast enough for my use!
Use Gaussian elimination to make the matrix upper-triangular. For every row operation you know how the determinant changes (it is either unchanged or multiplied by a constant d), and the whole procedure runs in O(n^3). After that, just multiply the numbers on the main diagonal and divide by the product of all the d's.
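As a rough sketch of that idea, the function below uses only row swaps (each multiplies the determinant by -1) and row subtractions (which leave it unchanged), so the only factor to track is the sign. The name determinant and the nested std::vector representation are choices made for this illustration; a fixed-size 6x6 array would work the same way.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Determinant by Gaussian elimination with partial pivoting, O(n^3).
// The matrix is taken by value so the caller's copy is left untouched.
double determinant(std::vector<std::vector<double>> a) {
    const std::size_t n = a.size();
    for (const auto& row : a) assert(row.size() == n);  // must be square
    double det = 1.0;
    for (std::size_t col = 0; col < n; ++col) {
        // Pick the row with the largest absolute entry in this column as pivot.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        if (a[pivot][col] == 0.0) return 0.0;  // whole column is zero: singular
        if (pivot != col) {
            std::swap(a[pivot], a[col]);  // a row swap flips the sign
            det = -det;
        }
        det *= a[col][col];
        // Subtract multiples of the pivot row; this does not change the determinant.
        for (std::size_t r = col + 1; r < n; ++r) {
            const double f = a[r][col] / a[col][col];
            for (std::size_t c = col; c < n; ++c) a[r][c] -= f * a[col][c];
        }
    }
    return det;
}
```

For a 6x6 matrix this costs on the order of a hundred multiply-adds, compared with the 720 six-factor products of the Leibniz formula.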
Use Eigen. An example can be found here.
I am doing this assignment for fun.
http://groups.csail.mit.edu/graphics/classes/6.837/F04/assignments/assignment0/
There are sample outputs at the site if you want to see how it is supposed to look. It involves iterated function systems, whose algorithm according to the assignment is:
for "lots" of random points (x0, y0)
for k=0 to num_iters
pick a random transform fi
(xk+1, yk+1) = fi(xk, yk)
display a dot at (xk, yk)
I am running into trouble with my implementation, which is:
void IFS::render(Image& img, int numPoints, int numIterations){
Vec3f color(0,1,0);
float x,y;
float u,v;
Vec2f myVector;
for(int i = 0; i < numPoints; i++){
x = (float)(rand()%img.Width())/img.Width();
y = (float)(rand()%img.Height())/img.Height();
myVector.Set(x,y);
for(int j = 0; j < numIterations;j++){
float randomPercent = (float)(rand()%100)/100;
for(int k = 0; k < num_transforms; k++){
if(randomPercent < range[k]){
matrices[k].Transform(myVector);
}
}
}
u = myVector.x()*img.Width();
v = myVector.y()*img.Height();
img.SetPixel(u,v,color);
}
}
This is how I pick a random transform from the input matrices:
fscanf(input,"%d",&num_transforms);
matrices = new Matrix[num_transforms];
probablility = new float[num_transforms];
range = new float[num_transforms+1];
for (int i = 0; i < num_transforms; i++) {
fscanf (input,"%f",&probablility[i]);
matrices[i].Read3x3(input);
if(i == 0) range[i] = probablility[i];
else range[i] = probablility[i] + range[i-1];
}
My output shows only the beginnings of a Sierpinski triangle (1000 points, 1000 iterations):
My dragon is better, but still needs some work (1000 points, 1000 iterations):
If you have RAND_MAX=4 and picture width 3, an evenly distributed sequence like [0,1,2,3,4] from rand() will be mapped to [0,1,2,0,1] by your modulo code, i.e. some numbers will occur more often. You need to cut off those numbers that are at or above the highest multiple of the target range that does not exceed RAND_MAX, i.e. at or above ((RAND_MAX / 3) * 3). Just check for this limit and call rand() again.
Since you have to fix that error in several places, consider writing a utility function. Then, reduce the scope of your variables. The u,v declaration makes it hard to see that these two are just used in three lines of code. Declare them as "unsigned const u = ..." to make this clear and additionally get the compiler to check that you don't accidentally modify them afterwards.
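Such a utility function might look like the sketch below. uniformBelow and randBelow are hypothetical names; the random source is passed in as a parameter only so that the rejection logic can be exercised with a small fake generator like the RAND_MAX=4 example above.

```cpp
#include <cassert>
#include <cstdlib>

// Draw from next(), which must return values in [0, maxValue], and reject every
// value at or above the largest multiple of n that fits, so that each residue
// 0..n-1 is equally likely. Requires 0 < n <= maxValue.
template <class Source>
int uniformBelow(int n, int maxValue, Source next) {
    assert(n > 0 && n <= maxValue);
    const int limit = (maxValue / n) * n;
    int r = next();
    while (r >= limit) r = next();  // retry until the draw falls below the limit
    return r % n;
}

// Convenience wrapper around rand(), usable in place of rand() % n.
int randBelow(int n) {
    return uniformBelow(n, RAND_MAX, [] { return std::rand(); });
}
```

In the render loop, rand()%img.Width() would then become randBelow(img.Width()), and likewise for the other modulo expressions.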