I am trying to compute the cumulative distribution function for a set of values.
I computed the histogram using GSL and tried to compute the CDF from it, but it seems like the values are shifted by one position.
This is the code I am using:
gHist = gsl_histogram_alloc((maxRange - minRange) / 5);
gsl_histogram_set_ranges_uniform(gHist, minRange, maxRange);
for (int j = 0; j < ValidDataCount; j++)
    gsl_histogram_increment(gHist, ValAdd[j]);
gsl_histogram_pdf *p = gsl_histogram_pdf_alloc(gsl_histogram_bins(gHist));
gsl_histogram_pdf_init(p, gHist);
for (int j = 0; j < gsl_histogram_bins(gHist) + 1; j++)
    printf("%f ", p->sum[j]);
The histogram is like this:
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 ... and it goes on like this; there are 20 values in total.
And the cdf is:
0.00 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.1 0.1 ...
Why is there a 0 on the first position? Shouldn't it start with 0.05?
Thank you.
GSL allocates sum as an array of size n + 1, where n is the number of bins, even though only n entries are needed to describe the PDF. The extra element exists because GSL defines sum[0] = 0.
In the GSL source file pdf.c you can see that:
gsl_histogram_pdf *gsl_histogram_pdf_alloc (const size_t n)
{
  (...)
  p->sum = (double *) malloc ((n + 1) * sizeof (double));
}

int gsl_histogram_pdf_init (gsl_histogram_pdf * p, const gsl_histogram * h)
{
  (...)
  p->sum[0] = 0;
  for (i = 0; i < n; i++)
    {
      sum += (h->bin[i] / mean) / n;
      p->sum[i + 1] = sum;
    }
}
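So the CDF value for bin j is stored in sum[j + 1], and sum[0] is the leading zero you observed. A minimal sketch of a print loop that skips it, reusing your own variable names:

/* The CDF of bin j is p->sum[j + 1]; p->sum[0] is 0 by construction. */
for (size_t j = 0; j < gsl_histogram_bins(gHist); j++)
    printf("%f ", p->sum[j + 1]);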
I would like to find a mapping f: X --> N over multiple discrete natural variables X of varying dimension, where f produces a unique number from 0 up to (but not including) the product of all dimensions. For example, assume X = {a,b,c} with dimensions |a| = 2, |b| = 3, |c| = 2. Then f should produce values 0 to 11 (2*3*2 = 12 values).
a b c | f(X)
0 0 0 | 0
0 0 1 | 1
0 1 0 | 2
0 1 1 | 3
0 2 0 | 4
0 2 1 | 5
1 0 0 | 6
1 0 1 | 7
1 1 0 | 8
1 1 1 | 9
1 2 0 | 10
1 2 1 | 11
This is easy when all dimensions are equal. Assume binary for example:
f(a=1,b=0,c=1) = 1*2^2 + 0*2^1 + 1*2^0 = 5
Using this naively with varying dimensions we would get overlapping values:
f(a=0,b=1,c=1) = 0*2^2 + 1*3^1 + 1*2^0 = 4
f(a=1,b=0,c=0) = 1*2^2 + 0*3^1 + 0*2^0 = 4
A computationally fast function is preferred as I intend to use/implement it in C++. Any help is appreciated!
OK, the most important part here is the math and algorithmics. You have variable dimensions of sizes (from least significant to most significant) d0, d1, ..., dn. A tuple (x0, x1, ..., xn) with xi < di represents the following number: x0 + d0 * x1 + d0 * d1 * x2 + ... + d0 * d1 * ... * d(n-1) * xn. This is a mixed-radix number system.
In pseudo-code, I would write:
result = 0
loop for i = n to 0 step -1
    result = result * d[i] + x[i]
To implement it in C++, my advice would be to create a class whose constructor takes the dimensions (simply a vector<int> containing them), plus a method that accepts a vector of the same size containing the values. Optionally, you can check that no input value is greater than or equal to its dimension.
A possible C++ implementation could be:
#include <stdexcept>
#include <vector>

class F {
    std::vector<int> dims;  // dimension sizes, least significant first
public:
    explicit F(std::vector<int> d) : dims(std::move(d)) {}

    int to_int(const std::vector<int>& x) const {
        if (x.size() != dims.size()) {
            throw std::invalid_argument("Wrong size");
        }
        int result = 0;
        // Horner-style accumulation, most significant digit first.
        for (int i = static_cast<int>(dims.size()) - 1; i >= 0; i--) {
            if (x[i] < 0 || x[i] >= dims[i]) {
                throw std::invalid_argument("Value >= dimension");
            }
            result = result * dims[i] + x[i];
        }
        return result;
    }
};
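For example, with the dimensions from the question ordered least significant first, i.e. {|c|, |b|, |a|} = {2, 3, 2}, this reproduces the table's f(a=1, b=0, c=1) = 7:

F f({2, 3, 2});
int v = f.to_int({1, 0, 1});  // x = {c=1, b=0, a=1} -> ((1 * 3 + 0) * 2 + 1) = 7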
int f(const std::vector<int>& v) {
    int result = 0;
    for (int i = 0; i < v.size(); ++i) {          // O(N)
        for (int j = v.size(); j >= 0; j -= 2) {  // O(N/2)
            result += v.at(i) * j;
        }
    }
    return result;
}
The inner for loop is supposedly O(N/2), but I am wondering why. For example, if v.size() is 10, then:
10 >= 0 ✓
8 >= 0 ✓
6 >= 0 ✓
4 >= 0 ✓
2 >= 0 ✓
0 >= 0 ✓
-2 Fails
So the inner for loop is executed 6 times with an input size of 10.
What am I missing?
EDIT: I understand that only the highest-order term is taken into consideration. This question is more about how the original O(N/2 + 1) is arrived at.
Complexity gives you a way to assess the order of magnitude of the time an input of a certain size takes to complete, not the exact running time.
Therefore, when dealing with complexity, you only keep the highest-order term and drop constant factors:
O(N/2 + 1) = O(N/2) = O(N)
In a comment, you said:
I understand this, but I am just curious as to how O(N/2) is obtained
Take a look at the following table:
Size of vector | Times the inner loop executes
0              | 1
1              | 1
2              | 2
3              | 2
...            | ...
100            | 51
101            | 51
...            | ...
2x             | x + 1
2x + 1         | x + 1
If you take the constant 1 out of that equation, the inner loop is O(N/2).
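To see where the x + 1 comes from, here is a small standalone sketch (my own, not from the question) that counts the inner-loop iterations directly and reproduces the table:

#include <cstdio>

// Mirrors the inner loop `for (int j = size; j >= 0; j -= 2)`
// and counts how many times its body runs.
int innerIterations(int size) {
    int count = 0;
    for (int j = size; j >= 0; j -= 2)
        ++count;
    return count;  // equals size / 2 + 1
}

int main() {
    for (int n : {0, 1, 2, 3, 10, 100, 101})
        std::printf("size %3d -> %d iterations\n", n, innerIterations(n));
    return 0;
}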
I would like to translate [vec,val] = eig(A) from MATLAB to C++ using the Eigen library, but I can't get the same result.
I tried EigenSolver, ComplexEigenSolver and SelfAdjointEigenSolver; none of them gives me results like eig(A) in MATLAB.
Sample matrices:
Tv(:,:,223) =
0.8648 -1.9658 -0.2785
-1.9658 4.9142 0.8646
-0.2785 0.8646 0.3447
Tv(:,:,224) =
1.9735 -0.4218 1.0790
-0.4218 3.3012 0.1855
1.0790 0.1855 3.7751
Tv(:,:,225) =
2.4948 1.0185 1.1633
1.0185 1.1732 -0.4479
1.1633 -0.4479 4.3289
Tv(:,:,226) =
0.3321 0.0317 0.1617
0.0317 0.0020 -0.0139
0.1617 -0.0139 0.5834
Eigen:
MatrixXcd vec(3 * n, 3);
VectorXcd val(3);
for (int k = 0; k < n; k++) {
    EigenSolver<Matrix3d> eig(Tv.block<3, 3>(3 * k, 0));
    vec.block<3, 3>(3 * k, 0) = eig.eigenvectors();
    cout << endl << vec.block<3, 3>(3 * k, 0) << endl;
    val = eig.eigenvalues();
    cout << "val= " << endl << val << endl;
}
//results
(0.369152,0) (-0.830627,0) (-0.416876,0)
(-0.915125,0) (-0.403106,0) (-0.00717218,0)
(-0.162088,0) (0.384142,0) (-0.908935,0)
val=
(5.86031,0)
(0.0396418,0)
(0.223765,0)
(0.881678,0) (0.204005,0) (0.425472,0)
(0.23084,0) (-0.97292,0) (-0.011858,0)
(-0.411531,0) (-0.108671,0) (0.904894,0)
val=
(1.35945,0)
(3.41031,0)
(4.27996,0)
(0.526896,0) (-0.726801,0) (0.440613,0)
(-0.813164,0) (-0.581899,0) (0.0125466,0)
(-0.247274,0) (0.364902,0) (0.897609,0)
val=
(0.377083,0)
(2.72623,0)
(4.89367,0)
(0.88992,0) (-0.43968,0) (0.121341,0)
(0.13406,0) (-0.00214387,0) (-0.990971,0)
(-0.43597,0) (-0.898152,0) (-0.0570358,0)
val=
(0.257629,0)
(0.662467,0)
(-0.00267575,0)
MATLAB:
for k = 1:n
    [u,d] = eig(Tv(:,:,k))
end
%results
u =
0.8306 -0.4169 -0.3692
0.4031 -0.0072 0.9151
-0.3841 -0.9089 0.1621
d =
0.0396 0 0
0 0.2238 0
0 0 5.8603
u =
0.8817 0.2040 0.4255
0.2308 -0.9729 -0.0119
-0.4115 -0.1087 0.9049
d =
1.3594 0 0
0 3.4103 0
0 0 4.2800
u =
-0.5269 0.7268 0.4406
0.8132 0.5819 0.0125
0.2473 -0.3649 0.8976
d =
0.3771 0 0
0 2.7262 0
0 0 4.8937
u =
-0.1213 -0.8899 0.4397
0.9910 -0.1341 0.0021
0.0570 0.4360 0.8982
d =
-0.0027 0 0
0 0.2576 0
0 0 0.6625
What's your suggestion?
I don't get your question: looking at your results, they all return the same thing. Recall that the eigen-decomposition of a matrix is not completely unique:
eigenvalues/vectors can be arbitrarily reordered
if v is an eigenvector, then -v is also a valid eigenvector
Since your matrices are symmetric, you should use SelfAdjointEigenSolver to get the eigenvalues automatically sorted in increasing order, as in MATLAB. The eigenvectors will then differ only by their sign, but you will have to live with that.
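A minimal sketch with your first matrix Tv(:,:,223); SelfAdjointEigenSolver returns the eigenvalues sorted in increasing order, which matches MATLAB's output for symmetric matrices (up to eigenvector signs):

#include <iostream>
#include <Eigen/Dense>

int main() {
    Eigen::Matrix3d A;
    A <<  0.8648, -1.9658, -0.2785,
         -1.9658,  4.9142,  0.8646,
         -0.2785,  0.8646,  0.3447;

    // Eigenvalues come back sorted in increasing order,
    // with eigenvectors in the matching column order.
    Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> eig(A);
    std::cout << "val =\n" << eig.eigenvalues()  << "\n";
    std::cout << "vec =\n" << eig.eigenvectors() << "\n";
    return 0;
}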
Well.... the results are the same....
Result eigen:
(0.369152,0) (-0.830627,0) (-0.416876,0)
(-0.915125,0) (-0.403106,0) (-0.00717218,0)
(-0.162088,0) (0.384142,0) (-0.908935,0)
val=
(5.86031,0)
(0.0396418,0)
(0.223765,0)
result matlab:
u =
0.8306 -0.4169 -0.3692
0.4031 -0.0072 0.9151
-0.3841 -0.9089 0.1621
d =
0.0396 0 0
0 0.2238 0
0 0 5.8603
I have good news....
The vectors are THE SAME, just in a different order:
eigV1 from Eigen is -eigV3 from MATLAB,
eigV2 from Eigen is -eigV1 from MATLAB,
eigV3 from Eigen is eigV2 from MATLAB,
and the eigenvalues are reordered the same way.
I have a big matrix as input, and I have the size of a smaller matrix. I have to compute the sum of all possible smaller matrices (contiguous submatrices of that size) that can be taken from the bigger matrix.
Example:
Input matrix size: 4 × 4
Matrix:
1 2 3 4
5 6 7 8
9 9 0 0
0 0 9 9
Input smaller matrix size: 3 × 3 (not necessarily a square)
Smaller matrices possible:
1 2 3
5 6 7
9 9 0

5 6 7
9 9 0
0 0 9

2 3 4
6 7 8
9 0 0

6 7 8
9 0 0
0 9 9
Their sum, final output
14 18 22
29 22 15
18 18 18
I did this:
#include <cstring>  // memset

int** matrix_sum(int **M, int n, int r, int c)
{
    int **res = new int*[r];
    for (int i = 0; i < r; i++) {
        res[i] = new int[c];
        memset(res[i], 0, sizeof(int) * c);
    }
    for (int i = 0; i <= n - r; i++)
        for (int j = 0; j <= n - c; j++)
            for (int k = i; k < i + r; k++)
                for (int l = j; l < j + c; l++)
                    res[k - i][l - j] += M[k][l];
    return res;
}
I guess this is too slow; can anyone please suggest a faster way?
Your current algorithm is O((m - p) * (n - q) * p * q). The worst case is when p = m / 2 and q = n / 2.
The algorithm I'm going to describe will be O(m * n + p * q), which will be O(m * n) regardless of p and q.
The algorithm consists of 2 steps.
Let the input matrix A's size be m x n and the window matrix's size be p x q.
First, you create a precomputed matrix B of the same size as the input matrix. Each element B[i, j] contains the sum of all elements of the sub-matrix of A whose top-left element is at coordinate (1, 1) and whose bottom-right element is at (i, j).
B[i, j] = Sum[k = 1..i, l = 1..j]( A[k, l] ) for all 1 <= i <= m, 1 <= j <= n
This can be done in O(m * n), by using this relation to compute each element in O(1):
B[i, j] = B[i - 1, j] + Sum[k = 1..j]( A[i, k] ) for all 2 <= i <= m, 1 <= j <= n
B[i - 1, j], which covers everything of the sub-matrix except the current row, has been computed previously. You keep a running prefix sum of the current row, so that Sum[k = 1..j]( A[i, k] ) is available in O(1).
This is another way to compute B[i, j] in O(1), using the property of the 2D prefix sum:
B[i, j] = B[i - 1, j] + B[i, j - 1] - B[i - 1, j - 1] + A[i, j] for all 1 <= i <= m, 1 <= j <= n, where any out-of-range entry counts as 0
Then, the second step is to compute the result matrix S, whose size is p x q. If you make some observations, S[i, j] is the sum of all elements of the sub-matrix of size (m - p + 1) x (n - q + 1) whose top-left coordinate is (i, j) and whose bottom-right coordinate is (i + m - p, j + n - q).
Using the precomputed matrix B, you can compute the sum of any sub-matrix in O(1). Apply this to compute the result matrix S:
SubMatrixSum(top-left = (x1, y1), bottom-right = (x2, y2))
= B[x2, y2] - B[x1 - 1, y2] - B[x2, y1 - 1] + B[x1 - 1, y1 - 1]
Therefore, the complexity of the second step will be O(p * q).
The final complexity is as mentioned above, O(m * n), since p <= m and q <= n.
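Here is a minimal sketch of both steps (my own 0-indexed translation of the above, using long long as a guard against overflow):

#include <vector>

// A is m x n, the window is p x q; returns the p x q result matrix S.
std::vector<std::vector<long long>>
windowSums(const std::vector<std::vector<long long>>& A, int p, int q)
{
    int m = static_cast<int>(A.size());
    int n = static_cast<int>(A[0].size());

    // Step 1: B[i + 1][j + 1] = sum of A[0..i][0..j] (2D prefix sums),
    // built with the B[i-1,j] + B[i,j-1] - B[i-1,j-1] + A[i,j] relation.
    std::vector<std::vector<long long>> B(m + 1, std::vector<long long>(n + 1, 0));
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            B[i + 1][j + 1] = B[i][j + 1] + B[i + 1][j] - B[i][j] + A[i][j];

    // Step 2: S[i][j] = sum of A over rows i..i+m-p and columns j..j+n-q,
    // each computed in O(1) with the SubMatrixSum formula.
    std::vector<std::vector<long long>> S(p, std::vector<long long>(q));
    for (int i = 0; i < p; i++)
        for (int j = 0; j < q; j++) {
            int x2 = i + m - p + 1, y2 = j + n - q + 1;  // exclusive ends in B
            S[i][j] = B[x2][y2] - B[i][y2] - B[x2][j] + B[i][j];
        }
    return S;
}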
So, I am writing a little OpenGL program that picks the color of one square and adds 0.01 to its value, so the color becomes brighter. I have the color values for each square in an array, and one variable that holds the maximum value a color component can reach; in this case that value is one.
This is part of the function
for (GLint i = 0; i < 3; i++) {
    if (colors[selectedSquare][i] > 0) {
        colors[selectedSquare][i] += 0.01;
        if (colors[selectedSquare][i] == maxColor) {
            flag = false;
        }
    }
}
I call this function in glutTimerFunc and increase the color value by 0.01 each time. When the color value reaches 1 (maxColor), I start reducing the color in another part of the function.
The problem here is that the comparison
(colors[selectedSquare][i] == maxColor)
never becomes true. I added some output to check, and this is what I got:
colors[selectedSquare][i] value = 0.99 size = 4
maxColor value = 1 size = 4
(colors[selectedSquare][i] == maxColor) is 0
colors[selectedSquare][i] value = 1 size = 4
maxColor value = 1 size = 4
(colors[selectedSquare][i] == maxColor) is 0
colors[selectedSquare][i] value = 1.01 size = 4
maxColor value = 1 size = 4
(colors[selectedSquare][i] == maxColor) is 0
colors[selectedSquare][i] value = 1.02 size = 4
maxColor value = 1 size = 4
(colors[selectedSquare][i] == maxColor) is 0
But the interesting thing starts here, when I change the comparison to
((int)colors[selectedSquare][i] == maxColor)
I get this output
colors[selectedSquare][i] value = 0.99 size = 4
maxColor value = 1 size = 4
(colors[selectedSquare][i] == maxColor) is 0
colors[selectedSquare][i] value = 1 size = 4
maxColor value = 1 size = 4
(colors[selectedSquare][i] == maxColor) is 0
colors[selectedSquare][i] value = 1.01 size = 4
maxColor value = 1 size = 4
(colors[selectedSquare][i] == maxColor) is 1
colors[selectedSquare][i] value = 1.02 size = 4
maxColor value = 1 size = 4
(colors[selectedSquare][i] == maxColor) is 1
I measured the sizes using sizeof(), and the declarations of colors and maxColor look like this:
GLfloat (Memoria::colors)[9][3] = {
{ 0.80, 0.80, 0.00 },
{ 0.00, 0.80, 0.80 },
{ 0.80, 0.00, 0.00 },
{ 0.00, 0.80, 0.00 },
{ 0.00, 1.00, 1.00 },
{ 1.00, 0.00, 0.00 },
{ 1.00, 0.00, 1.00 },
{ 1.00, 1.00, 0.00 },
{ 1.00, 1.00, 1.00 },
};
const GLfloat maxColor;
Both belong to the same class, but colors is static.
Hope someone knows the problem.
Directly comparing floating-point numbers for equality is a bad idea. You could use >= instead of ==, or do something like
if (fabs(colors[selectedSquare][i] - maxColor) < delta)
where delta is the precision (tolerance) you want to use.
Your problem is that floating-point values are almost never stored exactly as you expect them to be. There are always small rounding errors in the digits far beyond the decimal point.
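As an illustration (a standalone sketch of my own, not taken from your code): repeatedly adding 0.01f accumulates rounding error, so the exact comparison fails while the tolerance-based one succeeds:

#include <cmath>
#include <cstdio>

int main() {
    float x = 0.0f;
    for (int i = 0; i < 100; i++)
        x += 0.01f;               // accumulates rounding error; x ends up close to, but not exactly, 1

    const float maxColor = 1.0f;
    const float delta = 1e-4f;    // tolerance suited to the 0.01 step size

    std::printf("x == maxColor      -> %d\n", x == maxColor);                     // typically 0
    std::printf("|x - maxColor| < d -> %d\n", std::fabs(x - maxColor) < delta);   // 1
    return 0;
}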