What is the O complexity of the number of "empty" nodes in an AVL tree? - heap

We know that an AVL tree is usually very close to being balanced. Let's say we take an AVL tree and put it into an array (very similar to a heap, where the parent is index i, left child is 2i, and right child is 2i+1), how many empty indices would you get in terms of big O complexity?
So I know that the minimum number of nodes in a tree of height h is Fibonacci(h+2) - 1. So the number of empty indices = 2^h - 1 - (Fibonacci(h+2) - 1) = 2^h - Fibonacci(h+2). But I don't know what to do next to prove its complexity. I think it's O(log(n)), but I'm not sure.

If h is 0-based (as is the usual convention), the minimum number of nodes in an AVL tree with height h is F(h+3) - 1.
Let n = F(h+3) - 1 and let's try to solve for h to find the maximum height of an AVL tree with n nodes.
The closed form for F(x) is given by Binet's formula (see here for details):
F(x) = (phi^x - psi^x)/sqrt(5), where phi = (1 + sqrt(5))/2 and psi = (1 - sqrt(5))/2
Therefore,
n = (phi^(h+3) - psi^(h+3))/sqrt(5) - 1
>= (phi^(h+3) - 1)/sqrt(5) - 1 since |psi| < 1
Solving for h yields
h <= log_phi(sqrt(5)(n + 1) + 1) - 3
<= 1.4405 log2(n) for n >= 2
A full tree with height h has 2^(h+1) - 1 nodes. Or in terms of n:
2^(h+1) - 1
<= 2^(1.4405 log2(n) + 1) - 1
= 2 * (2^log2(n))^1.4405 - 1
= 2n^1.4405 - 1
Hence the number of empty nodes is bounded by
2n^1.4405 - 1 - n = O(n^1.4405)
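To get a feel for the bound, here is a minimal C++ sketch that tabulates the minimum node count via the recurrence N(h) = N(h-1) + N(h-2) + 1 (which equals F(h+3) - 1) and the number of unused slots when the array is sized for a full tree of the same height. The function names are illustrative, and sizing the array for a full tree is an assumption made to match the bound above.

```cpp
#include <cassert>
#include <cstdint>

// Minimum number of nodes in an AVL tree of 0-based height h:
// N(0) = 1, N(1) = 2, N(h) = N(h-1) + N(h-2) + 1, i.e. F(h+3) - 1.
std::uint64_t min_avl_nodes(unsigned h) {
    assert(h < 62);  // keeps 2^(h+1) within 64 bits for empty_slots below
    std::uint64_t prev = 1, cur = 2;  // N(0), N(1)
    if (h == 0) return prev;
    for (unsigned i = 2; i <= h; ++i) {
        const std::uint64_t next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

// Unused indices when the sparsest AVL tree of height h is stored in an
// array sized for a full tree of the same height (2^(h+1) - 1 slots).
std::uint64_t empty_slots(unsigned h) {
    const std::uint64_t full = (std::uint64_t{1} << (h + 1)) - 1;
    return full - min_avl_nodes(h);
}
```

For example, at h = 3 the sparsest tree has 7 nodes in an array of 15 slots, leaving 8 empty.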


How to count how many valid colourings in a graph?

I attempted this SPOJ problem.
Problem:
AMR10J - Mixing Chemicals
There are N bottles each having a different chemical. For each chemical i, you have determined C[i] which means that mixing chemicals i and C[i] causes an explosion. You have K distinct boxes. In how many ways can you divide the N chemicals into those boxes such that no two chemicals in the same box can cause an explosion together?
INPUT
The first line of input is the number of test cases T. T test cases follow each containing 2 lines.
The first line of each test case contains 2 integers N and K.
The second line of each test case contains N integers, the ith integer denoting the value C[i]. The chemicals are numbered from 0 to N-1.
OUTPUT
For each testcase, output the number of ways modulo 1,000,000,007.
CONSTRAINTS
T <= 50
2 <= N <= 100
2 <= K <= 1000
0 <= C[i] < N
For all i, i != C[i]
SAMPLE INPUT
3
3 3
1 2 0
4 3
1 2 0 0
3 2
1 2 0
SAMPLE OUTPUT
6
12
0
EXPLANATION
In the first test case, we cannot mix any 2 chemicals. Hence, each of the 3 boxes must contain 1 chemical, which leads to 6 ways in total.
In the third test case, we cannot put the 3 chemicals in the 2 boxes satisfying all the 3 conditions.
To summarise the problem: given a set of chemicals and a set of boxes, count the number of ways to place the chemicals in the boxes such that no two chemicals in the same box explode.
At first I used brute force: I recursively placed chemicals in boxes and counted the valid configurations. I got TLE on my first attempt.
Later I learned that the problem can be solved with graph colouring.
I can represent chemicals as vertices, with an edge between two chemicals if they cannot be placed in the same box.
The set of boxes can be used as vertex colours, so all I need to do is count the valid colourings of the graph.
I applied this concept to solve the problem, but unfortunately I got TLE again. I don't know how to improve my code and need help.
code:
#include <bits/stdc++.h>
#define MAXN 100
using namespace std;
const int mod = (int) 1e9 + 7;
int n;
int k;
int ways;
void greedy_coloring(vector<int> adj[], int color[])
{
    int u = 0;
    for (; u < n; ++u)
        if (color[u] == -1) // found the first uncolored vertex
            break;
    if (u == n) // no uncolored vertex left: all vertices are colored
    {
        ways = (ways + 1) % mod;
        return;
    }
    bool available[k];
    memset(available, true, sizeof(available));
    for (int v : adj[u])
        if (color[v] != -1) // if the adjacent vertex is colored, make its color unavailable
            available[color[v]] = false;
    for (int c = 0; c < k; ++c)
        if (available[c])
        {
            color[u] = c;
            greedy_coloring(adj, color);
            color[u] = -1; // don't forget to reset the color
        }
}
int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int T;
    cin >> T;
    while (T--)
    {
        cin >> n >> k;
        vector<int> adj[n];
        int c[n];
        for (int i = 0; i < n; ++i)
        {
            cin >> c[i];
            adj[i].push_back(c[i]);
            adj[c[i]].push_back(i);
        }
        ways = 0;
        int color[n];
        memset(color, -1, sizeof(color));
        greedy_coloring(adj, color);
        cout << ways << "\n";
    }
    return 0;
}
Counting the number of colorings in a general graph is #P-hard, but this graph has some special structure, which I'll exploit in a minute after I enumerate some basic properties of counting colorings. The first observation is that if the graph has a node with no neighbors and we delete that node, the number of colorings decreases by a factor of k. The second observation is that if a node has exactly one neighbor and we delete it, the number of colorings decreases by a factor of k-1. The third is that the number of colorings is equal to the product of the number of colorings for each connected component. The fourth is that we can delete all but one of any set of parallel edges.
Using these properties, it suffices to determine a formula for each connected component of the 2-core of this graph, which is a simple cycle of some length. Let P(n) and C(n) be the number of ways to color a path or cycle respectively with n nodes. We use the basic properties above to find
P(n) = k (k-1)^(n-1).
Finding a formula for C(n) I think requires the deletion contraction formula, which leads to a recurrence
C(3) = k (k-1) (k-2), i.e., three nodes of different colors;
C(n) = P(n) - C(n-1) = k (k-1)^(n-1) - C(n-1).
Multiply the above recurrence by (-1)^n.
(-1)^3 C(3) = -k (k-1) (k-2)
(-1)^n C(n) = (-1)^n k (k-1)^(n-1) - (-1)^n C(n-1)
= (-1)^n k (k-1)^(n-1) + (-1)^(n-1) C(n-1)
(-1)^n C(n) - (-1)^(n-1) C(n-1) = (-1)^n k (k-1)^(n-1)
Let D(n) = (-1)^n C(n).
D(3) = -k (k-1) (k-2)
D(n) - D(n-1) = (-1)^n k (k-1)^(n-1)
Now we can write D(n) as a telescoping sum:
D(n) = [sum_{i=4}^n (D(i) - D(i-1))] + D(3)
D(n) = [sum_{i=4}^n (-1)^i k (k-1)^(i-1)] - k (k-1) (k-2).
Write k = (k-1) + 1 to break this into two geometric sums, which then cancel nicely.
D(n) = [sum_{i=4}^n (-1)^i ((k-1) + 1) (k-1)^(i-1)] - k (k-1) (k-2)
= sum_{i=4}^n (1-k)^i - sum_{i=4}^n (1-k)^(i-1) - k (k-1) (k-2)
= (1-k)^n - (1-k)^3 - k (k-1) (k-2)
= (1-k)^n - (1 - 3k + 3k^2 - k^3) - (2k - 3k^2 + k^3)
= (1-k)^n - (1-k)
C(n) = (-1)^n (1-k)^n - (-1)^n (1-k)
= (k-1)^n + (-1)^n (k-1).
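As a sanity check on the derivation, the recurrence and the closed form can be compared directly. The C++ sketch below uses plain 64-bit arithmetic with no modulus, so it is only meant for small n and k; the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// C(n) via the recurrence C(3) = k(k-1)(k-2), C(n) = k(k-1)^(n-1) - C(n-1).
std::int64_t cycle_colorings_recurrence(int n, std::int64_t k) {
    assert(n >= 3 && k >= 1);
    std::int64_t c = k * (k - 1) * (k - 2);     // C(3)
    std::int64_t path = k * (k - 1) * (k - 1);  // P(3) = k (k-1)^2
    for (int i = 4; i <= n; ++i) {
        path *= (k - 1);  // P(i) = k (k-1)^(i-1)
        c = path - c;     // C(i) = P(i) - C(i-1)
    }
    return c;
}

// C(n) via the closed form (k-1)^n + (-1)^n (k-1).
std::int64_t cycle_colorings_closed(int n, std::int64_t k) {
    assert(n >= 3 && k >= 1);
    std::int64_t power = 1;
    for (int i = 0; i < n; ++i) power *= (k - 1);
    return (n % 2 == 0) ? power + (k - 1) : power - (k - 1);
}
```

With n = 3 and k = 3 both give 6, matching the first sample; with n = 3 and k = 2 both give 0, matching the third.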
Note that after removing all parallel edges, we can have at most n edges. This means that in any one connected component we can only see one cycle (and simple at that), which makes the combinatorics rather straightforward. (Cycles are only dependent on how many edges each node can spawn, which is capped at 1.)
Second example:
k = 3, edges i --> C[i]: 0 --> 1, 1 --> 2, 2 --> 0, 3 --> 0
    0 <-- 3
   / ^
  v   \
 1 --> 2
Since cycles are self contained, any connection to one removes the possibility of another. In the example above, we cannot make a second cycle involving node 3 by adding more nodes, and the same issue would extend to any subsequent connected nodes.
It should be enough, therefore, to perform a search, separating out connected components and marking their node count and whether they contain a cycle. Given a connected component, where c of the nodes are part of a cycle and m nodes are not, we have the following formula (David Eisenstat helped me correct my combinatoric for the count of colourings of a cycle):
if the component has a cycle:
  [(k - 1)^c + (-1)^c * (k - 1)] * (k - 1)^m
otherwise:
  k * (k - 1)^(m - 1)
As David Eisenstat noted, multiply all these results for the final tally.
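Putting the formulas together, a solution might be sketched in C++ as follows. This is a sketch under stated assumptions, not either answerer's code: the names count_colorings and pow_mod are made up for illustration, and it relies on the observation that every chemical has exactly one C[i], so each connected component contains exactly one directed cycle. A cycle of length 2 (i and C[i] pointing at each other) is a pair of parallel edges, and the cycle formula with c = 2 gives k (k - 1), the same as the path formula, so the two cases need no separate handling.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::int64_t MOD = 1000000007;

// Modular exponentiation by repeated squaring.
std::int64_t pow_mod(std::int64_t base, std::int64_t exp) {
    std::int64_t result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// c[i] is the chemical that must not share a box with chemical i.
std::int64_t count_colorings(const std::vector<int>& c, std::int64_t k) {
    const int n = static_cast<int>(c.size());
    for (int i = 0; i < n; ++i) assert(c[i] >= 0 && c[i] < n && c[i] != i);

    std::vector<int> state(n, 0);  // 0 = unseen, 1 = on current walk, 2 = done
    std::int64_t ways = 1;
    int cycle_nodes = 0;
    for (int s = 0; s < n; ++s) {
        if (state[s] != 0) continue;
        int u = s;
        while (state[u] == 0) { state[u] = 1; u = c[u]; }
        if (state[u] == 1) {  // the walk closed on itself: a new cycle through u
            int len = 0;
            int v = u;
            do { ++len; v = c[v]; } while (v != u);
            cycle_nodes += len;
            // (k-1)^len + (-1)^len (k-1), reduced modulo MOD
            const std::int64_t corr = (k - 1) % MOD;
            std::int64_t cyc = pow_mod(k - 1, len);
            cyc = (len % 2 == 0) ? (cyc + corr) % MOD : (cyc - corr + MOD) % MOD;
            ways = ways * cyc % MOD;
        }
        for (int v = s; state[v] == 1; v = c[v]) state[v] = 2;  // finish the walk
    }
    // Every node off a cycle hangs off the rest by one edge: factor (k-1) each.
    return ways * pow_mod(k - 1, n - cycle_nodes) % MOD;
}
```

Each node is visited a constant number of times, so a test case takes O(N log N) time at most, including the exponentiations.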

Most efficient way to calculate a lower triangular matrix row index?

I am working with a lower triangular matrix; the function below calculates the row index of an element of such a matrix from its linear index. How can I optimize it in terms of execution time?
The triangular matrix can hold at most N (N + 1) / 2 nonzero elements (where N is the matrix dimension - N x N).
I have a set of numbers 0, 1, 2, ..., N (N + 1) / 2 - 1 and from those, I have to calculate the matrix row index.
My current solution:
inline
unsigned int calc_row(unsigned int i)
{
    return static_cast<unsigned int>(floor(sqrt(static_cast<double>(0.25 + 2 * i)) - 0.5));
}
// example:
calc_row(0) == 0;
calc_row(1) == 1;
calc_row(2) == 1;
calc_row(3) == 2;
calc_row(4) == 2;
calc_row(5) == 2;
Question:
1) Do you think my current solution is performance friendly?
2) If not how can I optimize it (in terms of the function execution time)?
If you believe an alternate method to calculate the row index would perform better, I am fine with it. Unfortunately a lookup table is not an option in my case.
EDIT #1:
I just had an idea: Is there a way to make a template metaprogramming version of the lookup table? A way to generate the row number at a compile time could prove to be a significant optimization. The biggest unsigned int i would be around 10 million in my case.
EDIT #2:
I edited the entire question because it caused a major confusion. I am sorry about that.
EDIT #3:
calc_row() calculates the formula (sqrt(1 + 8 * i) - 1) / 2, which is the positive solution of the quadratic equation x(x + 1) / 2 = i, where i is the linear index and x is the (real-valued) row; the row index is the floor of x.
The main idea for this solution lies in the fact that the linear index for a triangular matrix with diagonal can be calculated as r (r + 1) / 2 + c, where r is the row index and c is the column index.
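One possible variation, sketched here in C++, keeps the floating-point square root as a first estimate and then corrects it with integer arithmetic, which also yields the column for free. The struct and function names are hypothetical, and whether this is faster than the original depends on the target hardware, so it should be measured rather than assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct RowCol {
    std::uint32_t row;
    std::uint32_t col;
};

// Maps a linear index into a packed lower triangular matrix (diagonal
// included, rows stored one after another) to its row and column.
inline RowCol row_col_from_index(std::uint32_t i) {
    const std::uint64_t idx = i;
    // Floating-point estimate of floor((sqrt(8 i + 1) - 1) / 2).
    std::uint64_t row = static_cast<std::uint64_t>(
        (std::sqrt(8.0 * static_cast<double>(idx) + 1.0) - 1.0) / 2.0);
    // Integer correction in case rounding put the estimate one row off.
    while (row * (row + 1) / 2 > idx) --row;
    while ((row + 1) * (row + 2) / 2 <= idx) ++row;
    assert(row * (row + 1) / 2 <= idx);
    return {static_cast<std::uint32_t>(row),
            static_cast<std::uint32_t>(idx - row * (row + 1) / 2)};
}
```

The two correction loops normally run zero times; they are there so the result does not depend on how the square root rounds.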

Adjacency matrix from gradient image

In reference to calculating an adjacency matrix from the gradient of an image, I found something in Python:
large-adjacency-matrix-from-image-in-python
I want to calculate an adjacency matrix based on 4 or 8 neighboring pixels. I also found http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3408910/
How can I do this with 4 or 8 neighbors? I want to do this in C++. I already have the gradient image.
For the sake of simplicity, assume that the gradient image is a square pixel bitmap of size n x n. Assign an ordinal number to each pixel by row-major counting starting in the northwestern corner.
Define the (n^2 x n^2) adjacency matrix A = (a_ij)_i,j=1..n^2 as follows:
a_i(i-n) = 1; i > n // northern neighbour
a_i(i+1) = 1; (i-1) mod n < n-1 // eastern neighbour
a_i(i-1) = 1; (i-1) mod n > 0 // western neighbour
a_i(i+n) = 1; i <= n^2 - n // southern neighbour
a_ij = 0; else
For 8 neighbours per pixel add
a_i(i-n+1) = 1; i > n and (i-n-1) mod n < n-1 // northeastern neighbour
a_i(i-n-1) = 1; i > n and (i-n-1) mod n > 0 // northwestern neighbour
a_i(i+n+1) = 1; i <= n^2 - n and (i+n-1) mod n < n-1 // southeastern neighbour
a_i(i+n-1) = 1; i <= n^2 - n and (i+n-1) mod n > 0 // southwestern neighbour
Instead of 1 you may assign the weights calculated from the gradient between adjacent pixels. Note that the 0 entries would then change to M, where M is a sufficiently large number (conceptually infinite, since the respective cells are not neighbours, but representing infinity requires special provisions in the implementation).
A will be sparse and will have a regular structure, so for efficiency you should probably employ a class for sparse matrix processing. This SO question provides some suggestions.
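Since A is sparse, one simple representation is a list of neighbours per pixel instead of the full n^2 x n^2 matrix. The C++ sketch below builds such lists for an n x n image with 0-based row-major pixel numbers (the definition above is 1-based); the function name and the adjacency-list layout are assumptions made for illustration, and it stores plain adjacency without gradient weights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// adjacency[p] lists the pixels adjacent to pixel p = row * n + col.
std::vector<std::vector<std::size_t>> grid_adjacency(std::size_t n,
                                                     bool eight_neighbours) {
    assert(n > 0);
    std::vector<std::vector<std::size_t>> adjacency(n * n);
    const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(n);
    for (std::ptrdiff_t row = 0; row < size; ++row) {
        for (std::ptrdiff_t col = 0; col < size; ++col) {
            auto& out = adjacency[static_cast<std::size_t>(row * size + col)];
            for (std::ptrdiff_t dr = -1; dr <= 1; ++dr) {
                for (std::ptrdiff_t dc = -1; dc <= 1; ++dc) {
                    if (dr == 0 && dc == 0) continue;  // the pixel itself
                    // Diagonal steps only count in the 8-neighbour case.
                    if (!eight_neighbours && dr != 0 && dc != 0) continue;
                    const std::ptrdiff_t r = row + dr;
                    const std::ptrdiff_t c = col + dc;
                    if (r < 0 || r >= size || c < 0 || c >= size) continue;
                    out.push_back(static_cast<std::size_t>(r * size + c));
                }
            }
        }
    }
    return adjacency;
}
```

To carry weights, each entry could become a pair of neighbour index and gradient-derived weight; how the weight is computed from the gradient image is left open here, as it is in the answer.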

Select K random lines from a text file

This is an extension of the original question of selecting a random line from a text of X lines where the probability of the text line selected is 1/X. The trick is to select the Jth line if you query a random variable Y with a range of [0,1) and it returns a value less than 1/J.
Now in this new version of the problem we have to select K random lines where K is less than X. I believe the probability for each line should be K/X.
I'm stuck on how to extend the original solution to K lines. Is it possible? Any explanations would be great.
This can be solved using a generalization of the original algorithm. The intuition is as follows: maintain a list of k candidate lines from the file, which are initially seeded to the first k lines. Then, from that point forward, upon seeing the nth line of the file:
Choose a random value x between 1 and n, inclusive.
If x > k, ignore this element.
Otherwise, replace element x with the nth line of the file.
The proof that this correctly samples each element with probability k / n, where n is the total number of lines in the file, is as follows. Assume that n ≥ k. We prove by induction that each element has probability k / n of being picked by showing that after seeing z elements, each of those elements has probability k / z of being chosen. In particular, this means that after seeing n elements, each has probability k / n as required.
As our inductive basis, if we see exactly k elements, then each is picked. Thus the probability of being chosen is k / k, as required.
For the inductive step, assume that for some z ≥ k, each of the first z elements has been chosen with probability k / z, and consider the (z + 1)st element. We choose a random natural number in the range [1, z + 1]. With probability k / (z + 1) that number is at most k, in which case we choose this element and evict some old element. This means that the new element is chosen with probability k / (z + 1). Each of the z original elements is still chosen afterwards only if it was chosen after the first z elements were inspected (probability k / z, by our inductive hypothesis) and it is retained now; given that it was chosen, it is retained with probability z / (z + 1), since we replace it with probability 1 / (z + 1). Thus the new probability that it is chosen is (k / z) (z / (z + 1)) = k / (z + 1). Thus all of the first z + 1 elements are chosen with probability k / (z + 1), completing the induction.
Moreover, this algorithm runs in O(n) time and uses only O(k) space, meaning that the runtime is independent of the value of k. To see this, note that each iteration does O(1) work, and there are a total of O(n) iterations.
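The steps above can be sketched in C++ roughly as follows. This is an illustrative version written for this explanation, not the STL-style implementation mentioned next; the function names are made up, and the small string-based wrapper exists only to make the sketch easy to try.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Keeps k lines of the stream, each line ending up kept with probability k / n.
template <class URBG>
std::vector<std::string> sample_lines(std::istream& in, std::size_t k, URBG& rng) {
    std::vector<std::string> reservoir;
    reservoir.reserve(k);
    std::string line;
    std::size_t seen = 0;  // number of lines read so far (the "n" above)
    while (std::getline(in, line)) {
        ++seen;
        if (reservoir.size() < k) {  // the first k lines seed the reservoir
            reservoir.push_back(line);
            continue;
        }
        // Choose x uniformly in [1, seen]; replace slot x only if x <= k.
        std::uniform_int_distribution<std::size_t> pick(1, seen);
        const std::size_t x = pick(rng);
        if (x <= k) reservoir[x - 1] = line;
    }
    return reservoir;  // holds fewer than k lines if the input was shorter
}

// Convenience wrapper for trying the sketch on an in-memory string.
std::vector<std::string> sample_lines_from_text(const std::string& text,
                                                std::size_t k, unsigned seed) {
    assert(k > 0);
    std::istringstream in(text);
    std::mt19937 rng(seed);
    return sample_lines(in, k, rng);
}
```

For a real file, pass an std::ifstream and a generator seeded from std::random_device instead of a fixed seed.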
If you're curious, I have an implementation of this algorithm as a C++ STL-style algorithm available here on my personal site.
Hope this helps!
First select the first element randomly out of the X lines using the first algorithm. Then select the second out of the remaining X-1 lines. Run this process K times.
The probability of any given set of K lines should be 1 / (X choose K). I'll leave it up to you to verify that this algorithm gives the desired uniform distribution.

Finding size of perfect quad tree

I need to find the size of a perfect quad tree.
This means I have 1 root node that splits into 4 nodes that splits into 4 nodes etc.
so a quad tree of height 1 would be size 1
height 2 = size 5 (1 + 4)
height 3 = size 21 (1 + 4 + 16)
height 4 = size 85 (1 + 4 + 16 + 64)
etc..
I know that the size of a perfect binary tree can be found with: size = 2^(height+1)-1
So I believe that a similar equation exists for quad tree.
So what is it?
This is a geometric series. So the relevant formula is:
S = a * (1 - r^n) / (1 - r)
where a is the first value, r is the common ratio, n is the number of terms, and ^ denotes "to-the-power-of".
For a quad tree the formula is
((4^depth)-1)/3
For example with depth 3 you get
(64-1)/3 = 21
and if you count three layers you get
1 + 4 + 16 = 21
In my implementation I have even divided it into two arrays,
where the number of nodes that aren't leaf nodes is
((4^(depth-1))-1)/3
and the number of leaf nodes is
4^(depth-1)
I do these calculations at compile time, with metaprogramming for pow and a template argument for the depth, so I just allocate my nodes in two arrays.
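A compile-time version of these formulas might look roughly like the following C++11 sketch. It uses constexpr functions rather than template metaprogramming, and the function names are hypothetical, so treat it as one possible shape rather than the implementation described above.

```cpp
#include <cstdint>

// 4^exp, written recursively so it is a valid C++11 constant expression.
constexpr std::uint64_t pow4(unsigned exp) {
    return exp == 0 ? 1 : 4 * pow4(exp - 1);
}

// Total nodes in a perfect quad tree with `depth` levels: (4^depth - 1) / 3.
constexpr std::uint64_t quad_tree_size(unsigned depth) {
    return (pow4(depth) - 1) / 3;
}

// Nodes that are not leaves: (4^(depth-1) - 1) / 3.
constexpr std::uint64_t quad_tree_inner(unsigned depth) {
    return depth == 0 ? 0 : (pow4(depth - 1) - 1) / 3;
}

// Leaf nodes: 4^(depth-1).
constexpr std::uint64_t quad_tree_leaves(unsigned depth) {
    return depth == 0 ? 0 : pow4(depth - 1);
}

static_assert(quad_tree_size(3) == 21, "1 + 4 + 16");
static_assert(quad_tree_inner(3) + quad_tree_leaves(3) == quad_tree_size(3),
              "inner nodes and leaves add up to the whole tree");
```

Because the results are constant expressions, they can be used directly as array sizes, for example std::array<Node, quad_tree_leaves(Depth)> with a Node type and Depth template argument of your own.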
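In case anyone needs a code sample (in Swift 3):
import Foundation  // provides pow(_:_:) for Double

// Total number of tiles over the given levels; level L holds 4^L tiles.
public func tileCount(forLevelRange levelRange: Range<UInt>) -> UInt64 {
    var tileCount: UInt64 = 0
    // Range<UInt> is half-open, so upperBound itself is excluded.
    for level in levelRange.lowerBound ..< levelRange.upperBound {
        tileCount += UInt64(pow(Double(1 << level), 2))  // (2^level)^2 = 4^level
    }
    return tileCount
}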