Most efficient way to calculate a lower triangular matrix row index? - c++

I am working with a lower triangular matrix, the function below calculates a row index of such matrix. How can I optimize it in terms of execution time?
The triangular matrix can hold at most N (N + 1) / 2 nonzero elements (where N is the matrix dimension - N x N).
I have a set of numbers 0, 1, 2, ..., N (N + 1) / 2 - 1 and from those, I have to calculate the matrix row index.
My current solution:
inline
unsigned int calc_row(unsigned int i)
{
return static_cast<unsigned int>(floor(sqrt(static_cast<double>(0.25 + 2 * i)) - 0.5));
}
// example:
calc_row(0) == 0;
calc_row(1) == 1;
calc_row(2) == 1;
calc_row(3) == 2;
calc_row(4) == 2;
calc_row(5) == 2;
Question:
1) Do you think my current solution is performance friendly?
2) If not how can I optimize it (in terms of the function execution time)?
If You believe an alternate method to calculate the row index would perform better, I am fine with it. Unfortunately the lookup table is not an option in my case.
EDIT #1:
I just had an idea: Is there a way to make a template metaprogramming version of the lookup table? A way to generate the row number at a compile time could prove to be a significant optimization. The biggest unsigned int i would be around 10 million in my case.
EDIT #2:
I edited the entire question because it caused a major confusion. I am sorry about that.
EDIT #3:
calc_row() calculates the formula: (sqrt(1 + 8 * i) - 1) / 2 which is the solution for the quadratic equation x(x + 1) / 2 = i. Where i is row index.
The main idea for this solution lies in the fact that the linear index for a triangular matrix with diagonal can be calculated as: i (i + 1) / 2 + j. Where i is row index and j is column index.

Related

Find euler function of binomial coefficient

I've been trying to solve this problem:
Find Euler's totient function of binomial coefficient C(n, m) = n! / (m! (n - m)!) modulo 10^9 + 7, m <= n < 2 * 10^5.
One of my ideas was that first, we can precalculate the values of phi(i) for all i from 1 to n in linear time, also we can calculate all inverses to numbers from 1 to n modulo 10^9 + 7 using, for example, Fermat's little theorem. After that, we know, that, in general, phi(m * n) = phi(m) * phi(n) * (d / fi(d)), d = gcd(m, n). Because we know that gcd((x - 1)!, x) = 1, if x is prime, 2 if x = 4, and x in all other cases, we can calculate phi(x!) modulo 10^9 + 7 in linear time. However, in the last step, we need to calculate phi(n! / ((m! (n - m)!), (if we already know the function for factorials), so, if we are using this method, we have to know gcd(C(n, m), m! (n - m)!), and I don't know how to find it.
I've also been thinking about factorizing the binomial coefficient, but there seems no efficient way to do this.
Any help would be appreciated.
First, factorize all numbers 1..(2*10^5) as products of prime powers.
Now, factorize n!/k! = n(n-1)(n-2)...(n-k+1) as a product of prime powers by multiplying together the factors of the individual parts. Factorize (n-k)! as a product of prime powers. Subtract the latter powers from the former (to account for the divide).
Now you've got C(n, k) as a product of prime powers. Use the formula phi(N) = N * prod(1 - 1/p for p|N) to calculate phi(C(n, k)), which is straightforward given that you've computed the a list of all the prime powers that divide C(n, k) in the second step.
For example:
phi(C(9, 4)) = 9*8*7*6*5 / 5*4*3*2*1
9*8*7*6*5 = 3*3 * 2*2*2 * 7 * 3*2 * 5 = 7*5*3^3*2^4
5*4*3*2*1 = 5 * 2*2 * 3 * 2 * 1 = 5*3*2^3
9*8*7*6*5/(5*4*3*2*1) = 7*3^2*2
phi(C(9, 4)) = 7*3^2*2 * (1 - 1/7) * (1 - 1/3) * (1 - 1/2) = 36
I've done it in integers rather than integers mod M, but it seems like you already know how division works in the modulo ring.

How to count how many valid colourings in a graph?

I attempted this SPOJ problem.
Problem:
AMR10J - Mixing Chemicals
There are N bottles each having a different chemical. For each chemical i, you have determined C[i] which means that mixing chemicals i and C[i] causes an explosion. You have K distinct boxes. In how many ways can you divide the N chemicals into those boxes such that no two chemicals in the same box can cause an explosion together?
INPUT
The first line of input is the number of test cases T. T test cases follow each containing 2 lines.
The first line of each test case contains 2 integers N and K.
The second line of each test case contains N integers, the ith integer denoting the value C[i]. The chemicals are numbered from 0 to N-1.
OUTPUT
For each testcase, output the number of ways modulo 1,000,000,007.
CONSTRAINTS
T <= 50
2 <= N <= 100
2 <= K <= 1000
0 <= C[i] < N
For all i, i != C[i]
SAMPLE INPUT
3
3 3
1 2 0
4 3
1 2 0 0
3 2
1 2 0
SAMPLE OUTPUT
6
12
0
EXPLANATION
In the first test case, we cannot mix any 2 chemicals. Hence, each of the 3 boxes must contain 1 chemical, which leads to 6 ways in total.
In the third test case, we cannot put the 3 chemicals in the 2 boxes satisfying all the 3 conditions.
The summary of the problem, given a set of chemicals and a set of boxes, count how many possible ways to place these chemicals in boxes such that no chemicals will explode.
At first I used brute force method to solve the problem, I recursively place chemicals in boxes and count valid configurations, I got TLE at my first attempt.
Later I learned that the problem can be solved with graph colouring.
I can represent chemicals as vertexes and there'a an edge between chemicals if they cannot be placed each other.
And the set of boxes can be used as vertex colours, all I need to do was to count how many different valid colourings of the graph.
I applyed this concept to solve the problem unfortunately I got TLE again. I don't know how to improve my code, I need help.
code:
#include <bits/stdc++.h>
#define MAXN 100
using namespace std;
const int mod = (int) 1e9 + 7;
int n;
int k;
int ways;
void greedy_coloring(vector<int> adj[], int color[])
{
int u = 0;
for (; u < n; ++u)
if (color[u] == -1)//found first uncolored vertex
break;
if (u == n)//no uncolored vertexex means all vertexes are colored
{
ways = (ways + 1) % mod;
return;
}
bool available[k];
memset(available, true, sizeof(available));
for (int v : adj[u])
if (color[v] != -1)//if the adjacent vertex colored, make its color unavailable
available[color[v]] = false;
for (int c = 0; c < k; ++c)
if (available[c])
{
color[u] = c;
greedy_coloring(adj, color);
color[u] = -1;//don't forgot to reset the color
}
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int T;
cin >> T;
while (T--)
{
cin >> n >> k;
vector<int> adj[n];
int c[n];
for (int i = 0; i < n; ++i)
{
cin >> c[i];
adj[i].push_back(c[i]);
adj[c[i]].push_back(i);
}
ways = 0;
int color[n];
memset(color, -1, sizeof(color));
greedy_coloring(adj, color);
cout << ways << "\n";
}
return 0;
}
Counting the number of colorings in a general graph is #P-hard, but this graph has some special structure, which I'll exploit in a minute after I enumerate some basic properties of counting colorings. The first observation is that, if the graph has a node with no neighbors, if we delete that node, the number of colorings decreases by a factor of k. The second observation is that, if a node has exactly one neighbor and we delete it, the number of colorings decreases by a factor of k-1. The third is that the number of colorings is equal to the product of the number of colorings for each connected component. The fourth is that we can delete all but one parallel edge.
Using these properties, it suffices to determine a formula for each connected component of the 2-core of this graph, which is a simple cycle of some length. Let P(n) and C(n) be the number of ways to color a path or cycle respectively with n nodes. We use the basic properties above to find
P(n) = k (k-1)^(n-1).
Finding a formula for C(n) I think requires the deletion contraction formula, which leads to a recurrence
C(3) = k (k-1) (k-2), i.e., three nodes of different colors;
C(n) = P(n) - C(n-1) = k (k-1)^(n-1) - C(n-1).
Multiply the above recurrence by (-1)^n.
(-1)^3 C(3) = -k (k-1) (k-2)
(-1)^n C(n) = (-1)^n k (k-1)^(n-1) - (-1)^n C(n-1)
= (-1)^n k (k-1)^(n-1) + (-1)^(n-1) C(n-1)
(-1)^n C(n) - (-1)^(n-1) C(n-1) = (-1)^n k (k-1)^(n-1)
Let D(n) = (-1)^n C(n).
D(3) = -k (k-1) (k-2)
D(n) - D(n-1) = (-1)^n k (k-1)^(n-1)
Now we can write D(n) as a telescoping sum:
D(n) = [sum_{i=4}^n (D(n) - D(n-1))] + D(3)
D(n) = [sum_{i=4}^n (-1)^n k (k-1)^(n-1)] - k (k-1) (k-2).
Break it down as two geometric sums which then cancel nicely.
D(n) = [sum_{i=4}^n (-1)^n ((k-1) + 1) (k-1)^(n-1)] - k (k-1) (k-2)
= sum_{i=4}^n (1-k)^n - sum_{i=4}^n (1-k)^(n-1) - k (k-1) (k-2)
= (1-k)^n - (1-k)^3 - k (k-1) (k-2)
= (1-k)^n - (1 - 3k + 3k^2 - k^3) - (2k - 3k^2 + k^3)
= (1-k)^n - (1-k)
C(n) = (-1)^n (1-k)^n - (-1)^n (1-k)
= (k-1)^n + (-1)^n (k-1).
Note that after removing all parallel edges, we can have at most n edges. This means that in any one connected component we can only see one cycle (and simple at that), which makes the combinatorics rather straightforward. (Cycles are only dependent on how many edges each node can spawn, which is capped at 1.)
Second example:
k = 3
<< 0 <-- 3
/ ^
/ ^
1 --> 2
Since cycles are self contained, any connection to one removes the possibility of another. In the example above, we cannot make a second cycle involving node 3 by adding more nodes, and the same issue would extend to any subsequent connected nodes.
It should be enough, therefore, to perform a search, separating out connected components and marking their node count and whether they contain a cycle. Given a connected component, where c of the nodes are part of a cycle and m nodes are not, we have the following formula (David Eisenstat helped me correct my combinatoric for the count of colourings of a cycle):
if the component has a cycle:
[(k - 1)^c + (-1)^c * (k - 1)] *
(k - 1)^(m)
otherwise:
k * (k - 1)^(m - 1)
As David Eisenstat noted, multiply all these results for the final tally.

Strategy to understand code I can't get

Quick method to quickly compute Fibonacci, using Matrix property
Divide_Conquer_Fib(n) {
i = h = 1;
j = k = 0;
while (n > 0) {
if (n%2 == 1) { // if n is odd
t = j*h;
j = i*h + j*k + t;
i = i*k + t;
}
t = h*h;
h = 2*k*h + t;
k = k*k + t;
n = (int) n/2;
}
return j;
}
How do i understand this code? What would your strategy be? Would you put lots of print statements to see how states of variables change?
It is important to see how various developers' minds would go about understanding this code.
I would start off by running it against a few vales of n to check that it actually appears to give the correct answers. Then I'd read up on the mathematical theory to understand how it is likely to be working, and finally use that knowledge to take it to bits…
The Wikipedia entry section on the Matrix form explains the basis for this algorithm.
Well, the proper way to look at this code is to know what it does: Fibonacci numbers are coming up as an interesting exercise frequently, plus there is quite a bit of context saying what it does: it uses a matrix property together with divide and conquer. It turns out that you can compute the vector (Fibn, Fibn-1) as a product of some matrix and (Fibn-1, Fibn-2). Let's assume two rows in the code below are just two rows of the same matrix:
(Fib[n] ) (1 1) (Fib[n-1])
( ) = ( ) * ( )
(Fib[n-1]) (1 0) (Fib[n-2])
Now, matrix multiplication of quadratic matrices is associative, i.e., if the matrix above is M you can compute Fibn as Mn times (1, 0).
The next step is to compute Mn using divide and conquer. The basic trick here is that Mn can be decomposed according to the bits of n: Instead of computing the power by n multiplication you decompose the computation into computing squares and multiplying an extra term if the value is odd.
This is the basic underlying approach. The computation of the powers is done in the other direction, however, which works - I think - because the matrix is symmetric. I don't think you can derive the algorithm from the code easily if you are unaware of the basic approach.

Sum of all Maximum and minimum in all segments

I am doing a problem in which there is a need to find sum of maximum elements in a segment - sum of minimum elements in a segment.I tried using Sparse Table ,but it is two slow for the time limit.So i did something like this:
If n=4 segments are [1,2],[1,3],[1,4],[2,3],[2,4],[3,4].
The problem is similar to an RMQ problem but i have to do it for all segments and find the
sum=max(a[1],a[2])+
max(a[1],a[2],a[3])+max(a[1],a[2],a[3],a[4])+max(a[2],a[3])+m‌​ax(a[2],a[3],a[4])+max(a[3],a[4])-min(a[1],a[2])+min(a[1],a[2],a[3])+min(a[1],a[2‌​],a[3],a[4])+min(a[2],a[3])+min(a[2],a[3],a[4])+min(a[3],a[4])
for(i=1;i<n;i++)
{
maxtilli[i-1]=INT_MIN;
mintilli[i-1]=INT_MAX;
for(k=1,j=i;j<=n;k++,j++)
{
if(a[j]>maxtilli[k-1])
{
maxtilli[k]=a[j];
}
else
{
maxtilli[k]=maxtilli[k-1];
}
if(a[j]<mintilli[k-1])
{
mintilli[k]=a[j];
}
else
{
mintilli[k]=mintilli[k-1];
}
if(i!=j)
{
ans+=(maxtilli[k]-mintilli[k]);
}
}
}
Here n is of the order of 100,000. So is there any way to optimize it.
Suppose n=4 then segments are [1,2],[1,3],[1,4],[2,3],[2,4],[3,4].
The thing required is
sum=max(a[1],a[2])+max(a[1],a[2],a[3])+max(a[1],a[2],a[3],a[4])+max(a[2],a[3])+m‌​ax(a[2],a[3],a[4])+max(a[3],a[4])-min(a[1],a[2])+min(a[1],a[2],a[3])+min(a[1],a[2‌​],a[3],a[4])+min(a[2],a[3])+min(a[2],a[3],a[4])+min(a[3],a[4])
We can try to finish the first problem, sum of the max value in all segments.
Algorithm
First, you can find the max value a[i] in the whole sequence. All segments which contain a[i] would be considered. The answer plus A[i] * (i * (n - i)). And the problem is split into two small sequences [1, i - 1] and [i + 1, n], you can do it in the same way.
Code
void cal(int L, int R){
max_index = find_max(L, R); // O(logN), using Sparse Table or Segment Tree
int all_segments = (max_index - L + 1) * (R - max_index)
ans += a[max_index] * all_segments;
cal(L, max_index - 1);
cal(max_index + 1, R);
}
// call max_index N times, so the total complexity is O(N * logN)

Histogram approximation for streaming data

This question is a slight extension of the one answered here. I am working on re-implementing a version of the histogram approximation found in Section 2.1 of this paper, and I would like to get all my ducks in a row before beginning this process again. Last time, I used boost::multi_index, but performance wasn't the greatest, and I would like to avoid the logarithmic in number of buckets insert/find complexity of a std::set. Because of the number of histograms I'm using (one per feature per class per leaf node of a random tree in a random forest), the computational complexity must be as close to constant as possible.
A standard technique used to implement a histogram involves mapping the input real value to a bin number. To accomplish this, one method is to:
initialize a standard C array of size N, where N = number of bins; and
multiply the input value (real number) by some factor and floor the result to get its index in the C array.
This works well for histograms with uniform bin size, and is quite efficient. However, Section 2.1 of the above-linked paper provides a histogram algorithm without uniform bin sizes.
Another issue is that simply multiplying the input real value by a factor and using the resulting product as an index fails with negative numbers. To resolve this, I considered identifying a '0' bin somewhere in the array. This bin would be centered at 0.0; the bins above/below it could be calculated using the same multiply-and-floor method just explained, with the slight modification that the floored product be added to two or subtracted from two as necessary.
This then raises the question of merges: the algorithm in the paper merges the two closest bins, as measured from center to center. In practice, this creates a 'jagged' histogram approximation, because some bins would have extremely large counts and others would not. Of course, this is due to non-uniform-sized bins, and doesn't result in any loss of precision. A loss of precision does, however, occur if we try to normalize the non-uniform-sized bins to make the uniform. This is because of the assumption that m/2 samples fall to the left and right of the bin center, where m = bin count. We could model each bin as a gaussian, but this will still result in a loss of precision (albeit minimal)
So that's where I'm stuck right now, leading to this major question: What's the best way to implement a histogram accepting streaming data and storing each sample in bins of uniform size?
Keep four variables.
int N; // assume for simplicity that N is even
int count[N];
double lower_bound;
double bin_size;
When a new sample x arrives, compute double i = floor(x - lower_bound) / bin_size. If i >= 0 && i < N, then increment count[i]. If i >= N, then repeatedly double bin_size until x - lower_bound < N * bin_size. On every doubling, adjust the counts (optimize this by exploiting sparsity for multiple doublings).
for (int j = 0; j < N / 2; j++) count[j] = count[2 * j] + count[2 * j + 1];
for (int j = N / 2; j < N; j++) count[j] = 0;
The case i < 0 is trickier, since we need to decrease lower_bound as well as increase bin_size (again, optimize for sparsity or adjust the counts in one step).
while (lower_bound > x) {
lower_bound -= N * bin_size;
bin_size += bin_size;
for (int j = N - 1; j > N / 2 - 1; j--) count[j] = count[2 * j - N] + count[2 * j - N + 1];
for (int j = 0; j < N / 2; j++) count[j] = 0;
}
The exceptional cases are expensive but happen only a logarithmic number of times in the range of your data over the initial bin size.
If you implement this in floating-point, be mindful that floating-point numbers are not real numbers and that statements like lower_bound -= N * bin_size may misbehave (in this case, if N * bin_size is much smaller than lower_bound). I recommend that bin_size be a power of the radix (usually two) at all times.