I have been solving a problem but couldn't get to an efficient solution.
Problem Statement:
Given a tree with N vertices and N-1 edges. Each vertex v has a value C[v], where C[] is an array. On this tree we have to answer Q queries of the following form:
A query is given by two integers u, v. Define a value A as the product of the values of all nodes that lie on the simple path between u and v.
More formally, if the simple path between u and v is [u, a, b, ..., v], then
A = C[u] * C[a] * C[b] * ... * C[v].
For this query we need to output the number of divisors of A.
Constraints:
1 <= N, Q <= 100000, 1 <= C[i] <= 1000000 for all 1 <= i <= N.
My approach: Since the product can be very large, I store the prime factors and their counts instead of the product itself.
I have first precomputed the LCA for the tree using binary lifting. Then I have defined a map<int, int> for each node, which stores the prime divisors and their counts for the product of the values from the root down to that node. This can be built with a simple DFS and a separate function for merging the maps.
(Note: I am finding the prime factorization of a node's value in O(log N) using a sieve.)
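For the sieve step, this is a minimal sketch of what I mean (spf, buildSieve and factorize are illustrative names):

#include <map>
#include <vector>

const int MAXV = 1000000;
std::vector<int> spf(MAXV + 1, 0);  // spf[x] = smallest prime factor of x

// Sieve filling in the smallest prime factor of every value up to MAXV.
void buildSieve()
{
    for (int i = 2; i <= MAXV; ++i)
        if (spf[i] == 0)                        // i is prime
            for (int j = i; j <= MAXV; j += i)
                if (spf[j] == 0) spf[j] = i;    // first prime that reaches j
}

// Factor v in O(log v) divisions by peeling off smallest prime factors.
std::map<int, int> factorize(int v)
{
    std::map<int, int> f;
    while (v > 1) {
        f[spf[v]]++;
        v /= spf[v];
    }
    return f;
}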
Then for each query of the type [u,v], I find the LCA (let's say L). Now I subtract the map of L from the map of u (note: the map of L will always be a subset of the map of u), and similarly for node v.
Now I have all the prime factors and their counts for the product.
Now, simply using the result that for a number K = a^p * b^q * c^r * ..., the number of divisors is D = (p+1) * (q+1) * (r+1) * ..., we get our answer.
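In code, the query step looks roughly like this (a minimal sketch; cntX, facL and divisorsOnPath are illustrative names; note that subtracting L's prefix map from both endpoints also removes node L itself, so its own factorization has to be added back once):

#include <map>

const long long MOD = 1000000007;

// Number of divisors of the path product for a query (u, v) with LCA L.
// cntX is the prime-exponent map of C[root] * ... * C[x]; facL is the
// factorization of C[L] itself.
long long divisorsOnPath(const std::map<int, int>& cntU,
                         const std::map<int, int>& cntV,
                         const std::map<int, int>& cntL,
                         const std::map<int, int>& facL)
{
    std::map<int, int> path = cntU;               // root..u
    for (auto [p, e] : cntV) path[p] += e;        // plus root..v
    for (auto [p, e] : cntL) path[p] -= 2 * e;    // minus root..L, twice
    for (auto [p, e] : facL) path[p] += e;        // add node L back once
    long long divisors = 1;                       // D = (p+1)(q+1)(r+1)...
    for (auto [p, e] : path)
        divisors = divisors * (e + 1) % MOD;
    return divisors;
}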
Time Complexity Analysis:
Let's define M as the number of distinct primes up to 1000000.
The DFS will run in O(N*M) time:
for each node, the worst case while merging the maps is when all M primes are present.
LCA pre-computation: O(N log N) time.
Sieve pre-computation: O(N log log N) time.
Each query will run in O(M + log N) time: O(log N) for finding the LCA and O(M) for subtracting the maps to find the prime factorization of the product.
So the total time complexity is O(N log N + N*M + N log log N + Q*(M + log N)).
Now the issue is that in the worst case M is of the order of 50000, so this blows up the time complexity. Is there any other efficient method?
(The answer must be reported modulo 1e9+7.)
Related
What is the time complexity of in-order, post-order and pre-order traversal of binary trees in data structures? Is it O(n), O(log n) or O(n^2)?
In-order, pre-order, and post-order traversals are all depth-first traversals.
For a graph, the complexity of a depth-first traversal is O(n + m), where n is the number of nodes and m is the number of edges.
Since a binary tree is also a graph, the same applies here.
Since the number of edges that can originate from a node is limited to 2 in the case of a binary tree, the maximum number of edges in a binary tree is n-1, where n is the total number of nodes.
The complexity then becomes O(n + n-1), which is O(n).
O(n), because you traverse each node once. Or rather, the amount of work you do for each node is constant (it does not depend on the rest of the nodes).
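For concreteness, a minimal sketch (with a hypothetical Node type) showing why the per-node work is constant:

struct Node {
    int value;
    Node* left;
    Node* right;
};

// Each node is reached by exactly one call, and each call does O(1)
// work besides its two recursive calls, so the total is O(n).
void inorder(const Node* node)
{
    if (node == nullptr) return;
    inorder(node->left);
    // constant-time work per node goes here (e.g. printing node->value)
    inorder(node->right);
}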
Introduction
Hi
I was asked this question today in class, and it is a good question! I will explain here and hopefully get my more formal answer reviewed or corrected where it is wrong. :)
Previous Answers
The observation by @Assaf is also correct, since binary tree traversal recursively visits each node exactly once.
But, since it is a recursive algorithm, you often have to use more advanced methods to analyze its run-time performance. When dealing with a sequential algorithm or one that uses for-loops, summations will often be enough. So, what follows is a more detailed explanation of this analysis for those who are curious.
The Recurrence
As previously stated,
T(n) = 2*T(n/2) + 1
where T(n) is the number of operations executed in your traversal algorithm (in-order, pre-order, or post-order makes no difference).
Explanation of the Recurrence
There are two T(n/2) terms because in-order, pre-order, and post-order traversals all call themselves on the left and the right child node. So, think of each recursive call as a T(n/2): left T(n/2) + right T(n/2) = 2*T(n/2). The "1" comes from any other constant-time operations within the function, like printing the node value, et cetera. (It could honestly be 1 or any other constant number, and the asymptotic run-time still computes to the same value; the explanation follows.)
Analysis
This recurrence can actually be analyzed in Big-Theta terms using the Master theorem. So, I will apply it here.
T(n) = 2*T(n/2) + constant
where constant is some constant (could be 1 or any other constant).
Using the Master theorem, we match T(n) = a*T(n/b) + f(n).
So a = 2, b = 2, and f(n) = constant; writing f(n) = n^c, it follows that c = 0, since f(n) is a constant.
From here, we can see that a = 2 and b^c = 2^0 = 1. So a > b^c, or 2 > 2^0; equivalently, c < logb(a), or 0 < log2(2).
From here we have T(n) = BigTheta(n^(logb(a))) = BigTheta(n^1) = BigTheta(n).
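Putting the whole application on one line (same a, b, c, f as above), in LaTeX notation:

T(n) = 2\,T(n/2) + \Theta(1), \qquad a = 2,\ b = 2,\ f(n) = \Theta(n^0) \Rightarrow c = 0

\log_b a = \log_2 2 = 1 > c = 0 \;\Rightarrow\; T(n) = \Theta(n^{\log_b a}) = \Theta(n^1) = \Theta(n)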
If you're not familiar with BigTheta(n), it is "similar" (please bear with me :) ) to O(n), but it is a tighter bound, or a tighter approximation, of the run-time. So BigTheta(n) gives both the worst-case O(n) and the best-case BigOmega(n) run-time.
I hope this helps. Take care.
O(n), I would say.
I am doing the analysis for a balanced tree; it is applicable to all trees.
Assuming that you use recursion,
T(n) = 2*T(n/2) + 1 ----------> (1)
T(n/2) for the left sub-tree, T(n/2) for the right sub-tree, and '1' for verifying the base case.
Simplifying (1), you can prove that the traversal (in-order, pre-order, or post-order) is of order O(n).
Traversal is O(n) for any order, because you are hitting each node once. Lookup is where it can be less than O(n) IF the tree has some sort of organizing schema (i.e. a binary search tree).
T(n) = 2T(n/2) + c
T(n/2) = 2T(n/4) + c => T(n) = 4T(n/4) + 2c + c
similarly T(n) = 8T(n/8) + 4c + 2c + c
....
last step: T(n) = nT(1) + c * (sum of powers of 2 from 0 to h-1, where h is the height of the tree)
that sum is 2^h - 1
but h = log(n), so 2^h = n
so T(n) = O(n + n - 1) = O(2n - 1) = O(n)
Depth first traversal of a binary tree is of order O(n).
Algo:
PreOrderTrav(root): ---------------- T(n)
    if root is null: --------------- O(1)
        return null ---------------- O(1)
    else: -------------------------- O(1)
        print(root) ---------------- O(1)
        PreOrderTrav(root.left) ---- T(n/2)
        PreOrderTrav(root.right) --- T(n/2)
If the time complexity of the algo is T(n) then it can be written as T(n) = 2*T(n/2) + O(1).
If we apply back substitution we will get T(n) = O(n).
Consider a skewed binary tree with the 3 nodes 7, 3, 2. For any operation, like searching for 2, we have to traverse 3 nodes; for deleting 2, we also have to traverse 3 nodes; and for inserting 1, we also have to traverse 3 nodes. So a binary tree has a worst-case complexity of O(n).
I have been solving a problem but got stuck on one of its subparts, which is as follows:
Given an array of N elements whose ith element is A[i], we are given Q queries of the type [L,R].
For each query, output the number of divisors of the product of the Lth through Rth elements.
More formally, for each query let's define P as P = A[L] * A[L+1] * A[L+2] * ... * A[R].
Output the number of divisors of P modulo 998244353.
Constraints: 1 <= N, Q <= 100000, 1 <= A[i] <= 1000000.
My approach:
For each index i, I have defined a map<int, int> which stores each prime divisor and its count in the product of A[1..i].
I am extracting the prime divisors of a number in O(log N) using a sieve.
Then for each query (let's say {L, R}), I iterate through the map of the (L-1)th prefix and subtract the count of each key from the map of the Rth prefix.
And then I am answering the query using the result:
if N = a^p * b^q * c^r ... (a, b, c being primes),
the number of divisors is (p+1)(q+1)(r+1)...
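Sketched in code, a single query then looks roughly like this (a minimal sketch with illustrative names; prefix[i] is the map for A[1..i], with prefix[0] empty):

#include <map>
#include <vector>

const long long MOD = 998244353;

// Number of divisors of A[L] * ... * A[R], from the prefix exponent maps.
long long answerQuery(const std::vector<std::map<int, int>>& prefix, int L, int R)
{
    std::map<int, int> range = prefix[R];              // exponents in A[1..R]
    for (auto [p, e] : prefix[L - 1]) range[p] -= e;   // remove A[1..L-1]
    long long divisors = 1;                            // (p+1)(q+1)(r+1)...
    for (auto [p, e] : range)
        divisors = divisors * (e + 1) % MOD;
    return divisors;
}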
The time complexity of the above solution is O(N*D + Q*D), where D is the number of distinct primes up to 1000000. In the worst case D = 78498.
Is there a more efficient solution than this?
There is a more efficient solution for this, but it is slightly complicated. Here are the steps to build the necessary data structure.
1. Define a data type prime_factor: a struct that contains a prime and a count.
2. Define a data type prime_factorization: a vector of prime_factor, in ascending order of the primes. This can store the factorization of a number.
3. Write a function that takes a number and turns its prime factorization into a prime_factorization.
4. Write a function that takes 2 prime_factorization vectors and merges them into the factorization of the product of the two.
5. For each number in your array, compute its prime factorization. That gets stored in an array.
6. For each pair in your array, compute the prime factorization of the product. We will only need half of them. So elements 0, 1 go into one factorization, 2, 3 into the next, and so on.
7. Repeat step 6 O(log(N)) times. So you have a vector of the factorization of each number, pairs, fours, eights, and so on. This results in approximately 2N precomputed factorization vectors. Most vectors are small, though a few can be up to O(D) in size (where D is the number of distinct primes). Most of the merges should be very, very fast.
And now you have all of your data prepared. It can't take more than O(log(N)) times the space that storing the prime factors required by itself. (Less than that normally, though, because repeats among the small primes get gathered together in one prime_factor.)
Any range is the union of at most O(log(N)) of these computed vectors. For example, the range 10..25 can be broken up into 10..11, 12..15, 16..23, 24..25. Arrange these intervals from smallest to largest and merge them. Then compute your answer from the result.
An exact analysis is complicated, but I assure you that the query time is bounded above by O(Q * D * log(N)), and normally it is much less than that.
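A sketch of the merge in step 4 (a standard two-pointer merge over the prime-sorted vectors; the type names follow the steps above):

#include <vector>

struct prime_factor {
    long long prime;
    int count;
};
using prime_factorization = std::vector<prime_factor>;  // sorted by prime

// Factorization of the product: merge two sorted factorizations,
// adding the counts when the same prime appears in both.
prime_factorization merge(const prime_factorization& a,
                          const prime_factorization& b)
{
    prime_factorization out;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].prime < b[j].prime)      out.push_back(a[i++]);
        else if (b[j].prime < a[i].prime) out.push_back(b[j++]);
        else {                            // same prime: add exponents
            out.push_back({a[i].prime, a[i].count + b[j].count});
            ++i; ++j;
        }
    }
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}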
UPDATE:
How do you find those intervals?
The answer is that you need to identify the number divisible by the highest power of 2 in the range, and then fill out both sides from there. You figure that out by dividing both endpoints by 2 (rounding down) until the range is of length 1 (the endpoints differ by exactly 1). Then multiply the top boundary back by the power of 2 you divided out to find that mid-point.
For example, if your range was 35-53, you would start by repeatedly dividing by 2 to get 35-53, 17-26, 8-13, 4-6, 2-3. That was 2^4 we divided by, so our power-of-2 mid-point is 3*2^4 = 48. Our intervals above that midpoint are then 48-51, 52-53. Our intervals below are 40-47, 36-39, 35-35. And each of them is of length a power of 2 and starts at a number divisible by that power of 2.
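An equivalent greedy way to generate those intervals (a sketch; it yields exactly 35-35, 36-39, 40-47, 48-51, 52-53 for the example above): repeatedly peel off the largest aligned power-of-2 block that starts at lo and still fits in the range.

#include <utility>
#include <vector>

// Split [lo, hi] (lo >= 1) into intervals whose length is a power of 2
// and whose start is divisible by that power of 2.
std::vector<std::pair<long long, long long>> split(long long lo, long long hi)
{
    std::vector<std::pair<long long, long long>> out;
    while (lo <= hi) {
        long long len = lo & -lo;               // largest power of 2 dividing lo
        while (lo + len - 1 > hi) len >>= 1;    // shrink until the block fits
        out.push_back({lo, lo + len - 1});
        lo += len;
    }
    return out;
}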
I'm given 2 lists, a and b. Both of them contain only integers. min(a) > 0, max(a) can be up to 1e10, and max(abs(b)) can be up to 1e5. I need to find the number of tuples (x, y, z), where x is in a and y, z are in b, such that x = -y*z. The number of elements in a and b can be up to 1e5.
My attempt:
I was able to come up with a naive n^2 algorithm. But, since the size can be up to 1e5, I need to come up with an O(n log n) solution (at most) instead. What I did was:
1. Split b into bp and bn, where the first one contains all the positive numbers and the second one contains all the negative numbers, and create maps of their counts.
Then:
2.1 I iterate over a to get x's.
2.2 Iterate over the shorter one of bn and bp. Check if the current element y divides x. If yes, use map.find() to see whether z = -x/y is present or not.
What could be an efficient way to do this?
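A sketch of the approach described above (illustrative names; cntP holds counts of the positive values of b, cntN counts of the absolute values of the negative ones):

#include <unordered_map>
#include <vector>

// Count ordered tuples (x, y, z), x in a, y and z in b, with x = -y*z.
// Since x > 0, y and z must have opposite signs.
long long countTuples(const std::vector<long long>& a,
                      const std::vector<long long>& b)
{
    std::unordered_map<long long, long long> cntP, cntN;
    for (long long v : b) {
        if (v > 0) cntP[v]++;
        else if (v < 0) cntN[-v]++;
    }
    // Iterate over the smaller map, as in step 2.2.
    const auto& small = cntP.size() <= cntN.size() ? cntP : cntN;
    const auto& large = cntP.size() <= cntN.size() ? cntN : cntP;
    long long total = 0;
    for (long long x : a)
        for (const auto& [d, c] : small) {
            if (x % d) continue;                // d must divide x
            auto it = large.find(x / d);
            if (it == large.end()) continue;
            // Both ordered pairs (y, z) = (+d, -(x/d)) and (-(x/d), +d)
            // (in the appropriate sign order) satisfy x = -y*z.
            total += 2 * c * it->second;
        }
    return total;
}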
There's no O(n*logn) because: z = -x/y <=> log(z) = log(-x) - log(y)
As https://stackoverflow.com/users/12299000/kaya3 has mentioned, it is 3SUM (the "3 different arrays" variant). According to Wikipedia:
Kane, Lovett, and Moran showed that the 6-linear decision tree complexity of 3SUM is O(n*log^2(n)).
Step 1: Sort the elements in list b (say, into bsorted).
Step 2: For each value x in a, go through every value y in bsorted and binary search bsorted for (-x/y) to find z.
Complexity: with |a| = m and |b| = n, the complexity is O(m*n*log n).
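A sketch of those two steps (same setting as above; with duplicate values in b you would count matches via equal_range rather than binary_search):

#include <algorithm>
#include <vector>

// For every x in a and y in b, binary search b for z = -x/y.
long long countBySearch(const std::vector<long long>& a,
                        std::vector<long long> bsorted)
{
    std::sort(bsorted.begin(), bsorted.end());      // Step 1
    long long matches = 0;
    for (long long x : a)                           // Step 2
        for (long long y : bsorted) {
            if (y == 0 || x % y != 0) continue;     // z must be an integer
            matches += std::binary_search(bsorted.begin(), bsorted.end(), -x / y);
        }
    return matches;
}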
Here's an untested idea. Create a trie from the elements of b, where the "characters" are ordered prime factors. For each element in a, walk all valid paths in the trie (DFS or BFS, where the test is being able to divide further by the current node), and for each leaf reached, check if the remaining cofactor (after dividing at each node) exists in b. (We may need to handle duplicates by storing counts of each "word" and using simple combinatorics.)
So, I was asked this question in an interview. Given a group of numbers (not necessarily distinct), I have to find the product of the GCDs of all possible subsets of the given group of numbers.
My approach which I told the interviewer:
1. Recursively generate all possible subsets of the given set.
2a. For a particular subset of the given set:
2b. Find GCD of that subset using the Euclid's Algorithm.
3. Multiply it into the answer obtained so far.
Assume GCD of an empty set to be 1.
However, there will be 2^n subsets, and this won't work optimally if n is large. How can I optimize it?
Assume that each array element is an integer in the range 1..U for some U.
Let f(x) be the number of subsets with GCD exactly x. The solution to the problem is then the product of d^f(d) over all 1 <= d <= U.
Let g(x) be the number of array elements divisible by x.
We have
f(x) = (2^g(x) - 1) - SUM(f(y) for all y with x | y and y > x)
(Here 2^g(x) - 1 counts the non-empty subsets whose elements are all divisible by x; the empty subset has GCD 1 by convention and only contributes a factor of 1.)
We can compute g(x) in O(n * sqrt(U)) by enumerating all divisors of every array element. f(x) can be computed in O(U log U) from high to low values, by enumerating every multiple of x in the straightforward manner.
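A sketch of that computation (U, MOD and the helper power are assumptions for illustration; the product is taken modulo a hypothetical prime MOD, with exponents reduced mod MOD-1 via Fermat's little theorem, which is valid here since every d <= U is coprime to MOD):

#include <vector>

// Modular exponentiation: b^e mod m.
long long power(long long b, long long e, long long m)
{
    long long r = 1 % m;
    for (b %= m; e > 0; e >>= 1, b = b * b % m)
        if (e & 1) r = r * b % m;
    return r;
}

// Product of the GCDs of all non-empty subsets, modulo MOD (hypothetical).
long long solve(const std::vector<int>& arr, int U)
{
    const long long MOD = 1000000007, PHI = MOD - 1;
    std::vector<long long> g(U + 1, 0), f(U + 1, 0);
    for (int v : arr)                          // g(d): elements divisible by d,
        for (int d = 1; d * d <= v; ++d)       // found by O(sqrt(v)) divisor enumeration
            if (v % d == 0) {
                g[d]++;
                if (d != v / d) g[v / d]++;
            }
    long long ans = 1;
    for (int x = U; x >= 1; --x) {             // f(x): subsets with GCD exactly x
        f[x] = (power(2, g[x], PHI) - 1 + PHI) % PHI;   // 2^g(x) - 1, mod MOD-1
        for (int y = 2 * x; y <= U; y += x)    // subtract GCDs that are proper multiples of x
            f[x] = (f[x] - f[y] + PHI) % PHI;
        ans = ans * power(x, f[x], MOD) % MOD; // multiply in x^f(x)
    }
    return ans;
}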
Pre-requisites:
Fermat's little theorem (there is a generalized theorem too), simple math, modular exponentiation.
Explanation. Notation: A[] stands for our input array.
The constraints 1 <= N <= 10^5 tell me that you need roughly an O(N log N) solution. Don't try DP: its complexity, as far as I can tell, will be about N * max(A[i]), i.e. approximately 10^5 * 10^6. Why? Because you need the GCD of the subsets to make a transition.
Ok, moving on.
We can think of clubbing together the subsets with the same GCD so as to bring the complexity down.
So, let's decrement an iterator i from 10^6 down to 1, trying to count the subsets with GCD i!
Now, to make a subset with GCD i, I can club i with any multiple i*j, where j is a positive integer. Why?
GCD(i, i*j) = i.
Now,
we can build a frequency table of the elements, as the values are small enough (at most 10^6)!
Now, during the contest what I did was keep the number of subsets with GCD i in F2[i].
Hence, what we do is sum the frequencies of all elements i*j, where j varies from 1 to floor(10^6 / i);
the number of subsets having i as a common divisor (and not necessarily as the GCD) is then (2^sum - 1).
Now we have to subtract from this the number of subsets whose GCD is a proper multiple of i (they have i as a common divisor, but their GCD is greater than i).
This can be done within the same loop by summing F2[i*j], where j varies from 2 to floor(10^6 / i).
Now the number of subsets with GCD exactly i equals 2^sum - 1 - (summation of F2[i*j]). Just multiply i in that many times, i.e. ans *= power(i, 2^sum - 1 - summation of F2[i*j], MOD). But the exponent can overflow when calculating this; you can take it modulo MOD-1, since MOD is prime (Fermat's little theorem), and use modular exponentiation.
Here is a snippet of my code (I am unsure whether we can post the code now!):
for (i = max_ele; i >= 1; --i)
{
    to_add = F[i];        // will become the count of elements divisible by i
    to_subtract = 0;      // number of subsets with GCD a proper multiple of i
    for (j = 2; j * i <= max_ele; ++j)
    {
        to_add += F[j * i];
        to_subtract += F2[j * i];
        if (to_subtract >= MOD - 1)
            to_subtract %= MOD - 1;
    }
    // subsets with GCD exactly i; the exponent is kept modulo MOD-1 (Fermat)
    subsets = ((power(2, to_add, MOD - 1) - 1) - to_subtract) % (MOD - 1);
    if (subsets < 0)
        subsets = (subsets % (MOD - 1) + MOD - 1) % (MOD - 1);
    ans = ans * power(i, subsets, MOD);
    F2[i] = subsets;
    ans %= MOD;
}
I feel like I over-complicated things by using F2; we could probably do without F2 by not taking j = 1. But it's okay, I haven't thought it through, and this is how I managed to get AC.
I'm implementing a dynamic kd-tree in array representation (storing the nodes in a std::vector) in breadth-first fashion. Each i-th non-leaf node has a left child at (i<<1)+1 and a right child at (i<<1)+2. It should support incremental insertion of points and collection of points.
However, I'm facing a problem determining the required number of possible nodes, in order to incrementally preallocate space.
I've found a formula on the web, which seems to be wrong:
N = min(m - 1, 2n - m/2 - 1),
where m is the smallest power of 2 greater than or equal to n, the
number of points.
My implementation of the formula is the following:
size_t required(size_t n)
{
    size_t m = nextPowerOf2(n);
    return std::min(m - 1, (n << 1) - (m >> 1) - 1);
}
The function nextPowerOf2 returns the smallest power of 2 greater than or equal to n.
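For reference, one common way to implement it (a sketch, not necessarily the asker's version; it assumes a 64-bit size_t and n >= 1):

size_t nextPowerOf2(size_t n)
{
    --n;                                       // so exact powers of 2 map to themselves
    n |= n >> 1;  n |= n >> 2;  n |= n >> 4;   // smear the highest set bit
    n |= n >> 8;  n |= n >> 16; n |= n >> 32;  // down into all lower bits (64-bit size_t)
    return n + 1;
}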
Any help would be appreciated.
Each node of a kd-tree divides the space into two. Hence, the number of nodes in the kd-tree depends on how you perform this division:
1) If you divide at the midpoint of the space (that is, if the space spans x1 to x2, you divide it with the line x3 = (x1+x2)/2), then:
i) Each point will be allocated its own node, and
ii) Each intermediate node will be empty.
In this case, the number of nodes will depend on how large the coordinates of the points are. If the coordinates are bounded by |X|, then the total number of nodes in the kd-tree should be slightly less than log|X| * n (more precisely, around log|X| * n - n log n + 2n) in the worst case. To see this, consider the following way to add the points: you add multiple collections, each collection consisting of two extremely nearby points located at random. For each pair of points, the tree will need to keep dividing the space log|X| times, and if log|X| is significantly larger than log n, it creates about log|X| intermediate nodes in the process.
2) If you divide by using a point as the dividing line, then each node (including the intermediate nodes) will contain a point. Thus, the total number of nodes is simply n. However, note that using a point to divide the space may yield very bad performance if the points are not given in a random order (for example, if the points are given in ascending order of X, the depth of the tree would be O(n); for comparison, the depth of the tree in (1) is at most O(log|X|)).