I have a problem with the NIST/Diehard Binary Matrix Rank test. The test divides a binary sequence into 32x32 matrices and calculates the rank of each one. After calculating the ranks I need to compute a chi^2 value (xi^2 in my code) and then a p-value, which must be between 0 and 1. I'm getting an extremely small p-value even for a random sequence.
I've hardcoded some small examples and get the right p-value for them, so I think my problem is in reading the binary sequence file and extracting the bits from it.
This is how I read the file and convert it to a bit sequence:
ifstream fin("seq1.bin", ios::binary);
fin.seekg(0, ios::end);
int n = fin.tellg();
unsigned int start, end;
char *buf = new char[n];
fin.seekg(0, ios::beg);
fin.read(buf, n);
n *= 8;
bool *s = new bool[n];
for (int i = 0; i < n / 8; i++) {
for (int j = 7; j >= 0; j--) {
s[(i) * 8 + 7 - j] = (bool)((buf[i] >> j) & 1);
}
}
Then I form each matrix and calculate its rank:
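int *ranks = new int[N];
for (int i = 0; i < N; i++) {
bool *arr = new bool[m*q];
// copy the i-th block of m*q bits into its own buffer
copy(s + i * m*q, s +(i * m*q) + (m * q), arr);
ranks[i] = binary_rank(arr, m, q);
delete[] arr; // avoid leaking one buffer per matrix (assumes binary_rank does not keep the pointer)
}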
Checking occurrences in ranks:
int count_occurrences(int arr[], int n, int x){
int result = 0;
for (int i = 0; i < n; i++)
if (x == arr[i])
result++;
return result;
}
Calculating chi^2 and the p-value:
double calculate_xi(int fm, int fm_1, int remaining, int N) {
double N1 = 0.2888*N;
double N2 = 0.5776*N;
double N3 = 0.1336*N;
double x1 = (fm - N1)*(fm - N1) / N1;
double x2 = (fm_1 - N2)*(fm_1 - N2) / N2;
double x3 = (remaining - N3)*(remaining - N3) / N3;
return x1 + x2 + x3;
}
double calculate_pvalue(double xi2) {
return exp(-(xi2 / 2));
}
I expect a p-value between 0 and 1 but I get 0 every time. That is because of the extremely large chi^2 value, and I can't find what I've done wrong. Could you please help me get this right?
For this part:
for (int i = 0; i < n / 8; i++) {
for (int j = 7; j >= 0; j--) {
s[(i) * 8 + 7 - j] = (bool)((buf[i] >> j) & 1);
}
}
When you add elements to the s array, it looks like you reverse the bit positions within each byte: the shift starts at 7, so bit 7 of buf[i] is taken first, but it is stored at offset 0 of that byte's group in s[]. This is easy to verify with a debugger, as it is not obvious from the code alone.
This code should output 0 0.25 0.5 0.75 1, instead it outputs zeros. Why is that?
Define a function u(x)=x;
void pde_advect_IC(double* x, double* u)
{
int N = sizeof(x) / sizeof(x[0]); //size of vector u
for (int i = 0; i <= N; i++)
u[i] = x[i];
}
Here is the implementation:
int main()
{
double a = 0.0;
double b = 1.0;
int nx = 4;
double dx = (b - a) / double(nx);
double xx[nx + 1]; //array xx with intervals
// allocate memory for vectors of solutions u0
double* u0 = new double [nx + 1];
//fill in array x
for (int i = 0; i <= nx; i++)
xx[i] = a + double(i) * dx;
pde_advect_IC(xx, u0); // u0 = x (initial conditions)
for (int i = 0; i <= nx; i++)
cout<<u0[i]<<endl;
// de-allocate memory of u0
delete [] u0;
delete [] u1;
return 0;
}
You can't use sizeof(x) here because it returns the size of the pointer, not of the array you passed in. You have to pass the size as a third parameter, or use something more convenient like a std::vector and call size().
This works.
#include <iostream>
#include <cstdlib>
using namespace std;
void pde_advect_IC(double* x, double* u, const int& N)
{
for (int i = 0; i < N; i++)
u[i] = x[i];
}
int main()
{
double a = 0.0;
double b = 1.0;
int nx = 4;
double dx = (b - a) / double(nx);
double xx[nx + 1]; //array xx with intervals
// allocate memory for vectors of solutions u0
double* u0 = new double [nx + 1];
//fill in array x
for (int i = 0; i <= nx; i++)
xx[i] = a + double(i) * dx;
pde_advect_IC(xx, u0, nx + 1); // u0 = x (initial conditions)
for (int i = 0; i <= nx; i++)
cout << u0[i] << endl;
// de-allocate memory of u0
delete [] u0;
return 0;
}
Note that I added const int& N to pde_advect_IC() in order to pass it the size of the array, by const reference, to be sure it does not get modified by mistake.
Note that your trick with sizeof() does not work with pointers.
I stumbled upon this problem on Codility Lessons, here is the description:
A non-empty zero-indexed array A consisting of N integers is given.
A triplet (X, Y, Z), such that 0 ≤ X < Y < Z < N, is called a double slice.
The sum of double slice (X, Y, Z) is the total of A[X + 1] + A[X + 2] + ... + A[Y − 1] + A[Y + 1] + A[Y + 2] + ... + A[Z − 1].
For example, array A such that:
A[0] = 3
A[1] = 2
A[2] = 6
A[3] = -1
A[4] = 4
A[5] = 5
A[6] = -1
A[7] = 2
contains the following example double slices:
double slice (0, 3, 6), sum is 2 + 6 + 4 + 5 = 17,
double slice (0, 3, 7), sum is 2 + 6 + 4 + 5 − 1 = 16,
double slice (3, 4, 5), sum is 0.
The goal is to find the maximal sum of any double slice.
Write a function:
int solution(vector<int> &A);
that, given a non-empty zero-indexed array A consisting of N integers, returns the maximal sum of any double slice.
For example, given:
A[0] = 3
A[1] = 2
A[2] = 6
A[3] = -1
A[4] = 4
A[5] = 5
A[6] = -1
A[7] = 2
the function should return 17, because no double slice of array A has a sum of greater than 17.
Assume that:
N is an integer within the range [3..100,000];
each element of array A is an integer within the range [−10,000..10,000].
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
I have already read about the algorithm that computes the MaxSum starting at index i and the MaxSum ending at index i, but I don't understand why my own approach sometimes gives wrong results. The idea is to compute the MaxSum ending at index i while omitting the minimum value in the range 0..i. Here is my code:
int solution(vector<int> &A) {
int n = A.size();
int end = 2;
int ret = 0;
int sum = 0;
int min = A[1];
while (end < n-1)
{
if (A[end] < min)
{
sum = max(0, sum + min);
ret = max(ret, sum);
min = A[end];
++end;
continue;
}
sum = max(0, sum + A[end]);
ret = max(ret, sum);
++end;
}
return ret;
}
I would be glad if you could help me point out the loophole!
My solution is based on a bidirectional version of Kadane's algorithm. More details on my blog here. Scores 100/100.
public int solution(int[] A) {
int N = A.length;
int[] K1 = new int[N];
int[] K2 = new int[N];
for(int i = 1; i < N-1; i++){
K1[i] = Math.max(K1[i-1] + A[i], 0);
}
for(int i = N-2; i > 0; i--){
K2[i] = Math.max(K2[i+1]+A[i], 0);
}
int max = 0;
for(int i = 1; i < N-1; i++){
max = Math.max(max, K1[i-1]+K2[i+1]);
}
return max;
}
Here is my code:
int get_max_sum(const vector<int>& a) {
int n = a.size();
vector<int> best_pref(n);
vector<int> best_suf(n);
//Compute the best sum among all x values assuming that y = i.
int min_pref = 0;
int cur_pref = 0;
for (int i = 1; i < n - 1; i++) {
best_pref[i] = max(0, cur_pref - min_pref);
cur_pref += a[i];
min_pref = min(min_pref, cur_pref);
}
//Compute the best sum among all z values assuming that y = i.
int min_suf = 0;
int cur_suf = 0;
for (int i = n - 2; i > 0; i--) {
best_suf[i] = max(0, cur_suf - min_suf);
cur_suf += a[i];
min_suf = min(min_suf, cur_suf);
}
//Check all y values(y = i) and return the answer.
int res = 0;
for (int i = 1; i < n - 1; i++)
res = max(res, best_pref[i] + best_suf[i]);
return res;
}
int get_max_sum_dummy(const vector<int>& a) {
//Try all possible values of x, y and z.
int res = 0;
int n = a.size();
for (int x = 0; x < n; x++)
for (int y = x + 1; y < n; y++)
for (int z = y + 1; z < n; z++) {
int cur = 0;
for (int i = x + 1; i < z; i++)
if (i != y)
cur += a[i];
res = max(res, cur);
}
return res;
}
bool test() {
//Generate a lot of small test cases and compare the output of
//a brute force and the actual solution.
bool ok = true;
for (int test = 0; test < 10000; test++) {
int size = rand() % 20 + 3;
vector<int> a(size);
for (int i = 0; i < size; i++)
a[i] = rand() % 20 - 10;
if (get_max_sum(a) != get_max_sum_dummy(a))
ok = false;
}
for (int test = 0; test < 10000; test++) {
int size = rand() % 20 + 3;
vector<int> a(size);
for (int i = 0; i < size; i++)
a[i] = rand() % 20;
if (get_max_sum(a) != get_max_sum_dummy(a))
ok = false;
}
return ok;
}
The actual solution is the get_max_sum function. The other two are a brute-force solution and a tester function that generates random arrays and compares the output of the brute force with the actual solution; I used them for testing purposes only.
The idea behind my solution is to compute the maximum sum of a subarray that starts somewhere before i and ends at i - 1, and then to do the same for suffixes (best_pref[i] and best_suf[i], respectively). After that I iterate over all i and return the best value of best_pref[i] + best_suf[i]. It works correctly because best_pref[y] finds the best x for a fixed y, best_suf[y] finds the best z for a fixed y, and all possible values of y are checked.
def solution(A):
    n = len(A)
    K1 = [0] * n
    K2 = [0] * n
    for i in range(1, n-1, 1):
        K1[i] = max(K1[i-1] + A[i], 0)
    for i in range(n-2, 0, -1):
        K2[i] = max(K2[i+1] + A[i], 0)
    maximum = 0
    for i in range(1, n-1, 1):
        maximum = max(maximum, K1[i-1] + K2[i+1])
    return maximum

def main():
    A = [3,2,6,-1,4,5,-1,2]
    print(solution(A))

if __name__ == '__main__': main()
Ruby 100%
def solution(a)
max_starting =(a.length - 2).downto(0).each.inject([[],0]) do |(acc,max), i|
[acc, acc[i]= [0, a[i] + max].max ]
end.first
max_ending =1.upto(a.length - 3).each.inject([[],0]) do |(acc,max), i|
[acc, acc[i]= [0, a[i] + max].max ]
end.first
max_ending.each_with_index.inject(0) do |acc, (el,i)|
[acc, el.to_i + max_starting[i+2].to_i].max
end
end
I have a 3007 x 1644 dimensional matrix of terms and documents. I am trying to assign weights to frequency of terms in each document so I'm using this log entropy formula http://en.wikipedia.org/wiki/Latent_semantic_indexing#Term_Document_Matrix (See entropy formula in the last row).
I'm successfully doing this but my code is running for >7 minutes.
Here's the code:
int N = mat.cols();
for(int i=1;i<=mat.rows();i++){
double gfi = sum(mat(i,colon()))(1,1); //sum of occurrence of terms
double g =0;
if(gfi != 0){// to avoid divide by zero error
for(int j = 1;j<=N;j++){
double tfij = mat(i,j);
double pij = gfi==0?0.0:tfij/gfi;
pij = pij + 1; //avoid log0
double G = (pij * log(pij))/log(N);
g = g + G;
}
}
double gi = 1 - g;
for(int j=1;j<=N;j++){
double tfij = mat(i,j) + 1;//avoid log0
double aij = gi * log(tfij);
mat(i,j) = aij;
}
}
Anyone have ideas how I can optimize this to make it faster? Oh and mat is a RealSparseMatrix from amlpp matrix library.
UPDATE
Code runs on Linux mint with 4gb RAM and AMD Athlon II dual core
Running time before change: > 7mins
After #Kereks answer: 4.1sec
Here's a very naive rewrite that removes some redundancies:
int const N = mat.cols();
double const logN = log(N);
for (int i = 1; i <= mat.rows(); ++i)
{
double const gfi = sum(mat(i, colon()))(1, 1); // sum of occurrence of terms
double g = 0;
if (gfi != 0)
{
for (int j = 1; j <= N; ++j)
{
double const pij = mat(i, j) / gfi + 1;
g += pij * log(pij);
}
g /= logN;
}
for (int j = 1; j <= N; ++j)
{
mat(i,j) = (1 - g) * log(mat(i, j) + 1);
}
}
Also make sure that the matrix data structure is sane (e.g. a flat array accessed in strides; not a bunch of dynamically allocated rows).
Also, I think the first + 1 is a bit silly. You know that x -> x * log(x) is continuous at zero with limit zero, so you should write:
double const pij = mat(i, j) / gfi;
if (pij != 0) { g += pij * log(pij); }
In fact, you might even write the first inner for loop like this, avoiding a division when it isn't needed:
for (int j = 1; j <= N; ++j)
{
if (double pij = mat(i, j))
{
pij /= gfi;
g += pij * log(pij);
}
}
Hi everyone, I'm having trouble with the return value of a function. I need to return a pointer to pointer (double**), but what I have is a double[][] matrix.
Here is the code:
double** createPalette(int r, int g, int b) {
double incR = 1 / r, incG = 1 / g, incB = 1 / b;
double Cp[r * g * b][3];
for (int i = 0; i < r; i++) {
for (int j = 0; j < g; j++) {
for (int k = 0; k < b; k++) {
Cp[i * r + j * g + k][0] = incR * i;
Cp[i * r + j * g + k][1] = incG * j;
Cp[i * r + j * g + k][2] = incB * k;
}
}
}
return Cp; //return &cp... (?)
}
I searched the internet, but I only found material about simple pointers, nothing about pointers to pointers. What should I do?
Thanks for all.
I think you know the values of r, g and b, so you know the size of the matrix. You can do it like this:
void createPalette(int r, int g, int b, double matrix[][3])