Intuition behind using the Cartesian product for finding number of unique BSTs - c++

I am solving a LeetCode question. The question is:
Given n, how many structurally unique BSTs can be generated, that store the values from 1...n? For e.g., for n=3, a total of 5 unique BSTs can be generated as follows:
   1         3     3      2      1
    \       /     /      / \      \
     3     2     1      1   3      2
    /     /       \                 \
   2     1         2                 3
The maximum upvoted solution makes use of DP and the following recursive formula:
G(n) = G(0) * G(n-1) + G(1) * G(n-2) + … + G(n-1) * G(0)
where G(n) represents the number of unique BSTs that can be generated for n. The code is as follows:
class Solution {
public:
    int numTrees(int n) {
        vector<int> G(n+1);
        G[0]=G[1]=1;
        for(int i=2; i<=n; i++)
            for(int j=1; j<=i; j++)
                G[i]+=G[j-1]*G[i-j];
        return G[n];
    }
};
While I more-or-less do understand what is going on, I didn't understand why we take a Cartesian product (instead of simple addition, which is more intuitive). As per my understanding:
G[i] += G[j-1] * G[i-j];
should instead be:
G[i] += G[j-1] + G[i-j]; //replaced '*' with a '+'
My reasoning is that the number of unique BSTs possible with j as the current root should be the sum(?) of the number of BSTs for its left and right subtrees. I did try a few examples, but somehow the numbers get multiplied in the original solution (with a *) and the final answer appears in G[n].
Could someone please provide an intuitive explanation for using Cartesian product instead of sum?
Note: The original question is here and the solution is here. Also, the original code is in Java while I have posted the C++ variation that I wrote above.

You can go by mathematical induction and then apply it to the sub-problems to get the result. Or simply just check for small values and then go for higher values.
For example:-
No. of nodes    BST representation
1 -->   [1]

2 -->   [2]        [1]
        /            \
      [1]            [2]

3 -->   [1]
          \
          [2]
            \
            [3]

          [2]
         /   \
       [1]   [3]

            [3]
            /
          [2]
          /
        [1]

4 -->
          [1]
         /   \
    NUM{}     NUM{2,3,4}   (number of BSTs on 3 keys)

          [2]
         /   \
   NUM{1}     NUM{3,4}

          [3]
         /   \
 NUM{1,2}     NUM{4}

          [4]
         /   \
NUM{1,2,3}    NUM{}
From the 4th case you can see that, for each choice of root, we multiply the number of possible left subtrees by the number of possible right subtrees, and then add these products over all choices of root. That's why the Cartesian product is used.
The product gives the number of distinct arrangements the whole tree can have for that root.
For example:
In G[i] += G[j-1] * G[i-j]; there are j-1 nodes in the left subtree and i-j nodes in the right subtree. You can arrange the left subtree in G[j-1] ways and, similarly, the right subtree in G[i-j] ways. Now think: in how many ways can you arrange the whole tree that has these left and right subtrees? The counts multiply, because each combination of a left subtree and a right subtree gives rise to a unique tree.
This also explains why we define G[0]=1: the empty subtree counts as exactly one arrangement, so a root with an empty side still contributes the arrangements of its other side.
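To make the pairing concrete, here is a small sketch (not from the original solution) that actually builds every tree as a string and compares the count with the DP. The names `shapes` and `countTrees` are made up for this illustration. The two nested loops over `l` and `r` are the Cartesian product: every left subtree is combined with every right subtree.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists every BST over the keys lo..hi as a
// parenthesised string. An empty range yields exactly one "tree" (".").
std::vector<std::string> shapes(int lo, int hi) {
    if (lo > hi) return std::vector<std::string>(1, ".");
    std::vector<std::string> out;
    for (int root = lo; root <= hi; ++root) {
        const std::vector<std::string> left = shapes(lo, root - 1);
        const std::vector<std::string> right = shapes(root + 1, hi);
        // Cartesian product: each left subtree paired with each right subtree.
        for (const std::string& l : left)
            for (const std::string& r : right)
                out.push_back("(" + l + std::to_string(root) + r + ")");
    }
    return out;
}

// The same count via the recurrence G(i) = sum over j of G(j-1) * G(i-j).
long long countTrees(int n) {
    std::vector<long long> G(n + 1, 0);
    G[0] = 1;
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= i; ++j)
            G[i] += G[j - 1] * G[i - j];
    return G[n];
}
```

For n = 3 the enumeration produces 5 strings, matching `countTrees(3)`. Replacing the product with a sum would count the left and right choices separately and miss the combinations.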

Related

A problem of taking combination for set theory

Given an array A with size N. Value of a subset of Array A is defined as product of all numbers in that subset. We have to return the product of values of all possible non-empty subsets of array A %(10^9+7).
E.g. array A = {3,5}
Value{3} = 3,
Value{5} = 5,
Value{3,5} = 5*3 = 15
answer = 3*5*15 %(10^9+7).
Can someone explain the mathematics behind the problem? I am thinking of using combinations to solve it efficiently.
I have tried brute force. It gives the correct answer but it is way too slow.
The next approach uses combinations. If we take all the subsets and multiply all the numbers in those subsets, we get the correct answer. Thus I have to find out how many times each number enters the calculation of the answer. In the example, 5 and 3 both appear 2 times. If we look closely, each number in A appears the same number of times.
You're heading in the right direction.
Let x be an element of the given array A. In our final answer, x appears p number of times, where p is equivalent to the number of subsets of A possible that include x.
How to calculate p? Once we have decided that we will definitely include x in our subset, we have two choices for the rest N-1 elements: either include them in set or do not. So, we conclude p = 2^(N-1).
So, each element of A appears exactly 2^(N-1) times in the final product. All that remains is to calculate the answer: (a1 * a2 * ... * an)^p. Since the exponent is very large, you can use binary exponentiation for fast calculation.
As Matt Timmermans suggested in comments below, we can obtain our answer without actually calculating p = 2^(N-1). We first calculate the product a1 * a2 * ... * an. Then, we simply square this product n-1 times.
The corresponding code in C++:
int func(vector<int> &a) {
    int n = a.size();
    int m = 1e9+7;
    if(n==0) return 0;
    if(n==1) return (m + a[0]%m)%m;
    long long ans = 1;
    //first calculate ans = (a1*a2*...*an)%m
    for(int x:a){
        //negative sign does not matter since we're squaring
        if(x<0) x *= -1;
        x %= m;
        ans *= x;
        ans %= m;
    }
    //now calculate ans = [ ans^(2^(n-1)) ]%m
    //we do this by squaring ans n-1 times
    for(int i=1; i<n; i++){
        ans = ans*ans;
        ans %= m;
    }
    return (int)ans;
}
Let
A={a,b,c}
All possible subsets of A are {{},{a},{b},{c},{a,b},{b,c},{c,a},{a,b,c}}
Here each element occurs 4 times.
So if A={a,b,c,d}, each element occurs 2^3 times.
So if the size of A is n, each element occurs 2^(n-1) times.
So the final result will be a1^p * a2^p * a3^p * ... * an^p
where p is 2^(n-1).
We need to compute x^(2^(n-1)) % mod.
By Euler's theorem, when x is coprime to mod we can write x^(2^(n-1)) % mod as x^(2^(n-1) % phi(mod)) % mod.
As mod is prime, phi(mod)=mod-1.
So first find p = 2^(n-1) % (mod-1).
Then compute Ai^p % mod for each number and multiply it into the final result.
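The exponent-reduction steps above can be sketched as follows. This is only an illustration: `modpow` and `subsetProduct` are names chosen here, and returning 0 for an empty array simply mirrors the convention of the earlier snippet. Since mod - 1 = 2 * 500000003 has an odd factor, 2^(n-1) % (mod-1) is never 0, so elements that are multiples of mod still give 0 as they should.

```cpp
#include <cstdint>
#include <vector>

const std::int64_t MOD = 1000000007;  // prime, so phi(MOD) = MOD - 1

// Binary exponentiation: base^exp % mod, for exp >= 0.
std::int64_t modpow(std::int64_t base, std::int64_t exp, std::int64_t mod) {
    std::int64_t result = 1 % mod;
    base %= mod;
    if (base < 0) base += mod;  // normalise negative inputs
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Product of the values of all non-empty subsets of a, modulo MOD.
std::int64_t subsetProduct(const std::vector<int>& a) {
    if (a.empty()) return 0;  // assumed convention for the empty array
    // Each element appears in 2^(n-1) subsets; reduce that exponent mod MOD-1.
    const std::int64_t p =
        modpow(2, static_cast<std::int64_t>(a.size()) - 1, MOD - 1);
    std::int64_t ans = 1;
    for (int x : a) ans = ans * modpow(x, p, MOD) % MOD;
    return ans;
}
```

For A = {3,5} this gives 3^2 * 5^2 = 225, the same as 3*5*15 from the question.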
I read the previous answers while working out how the subsets are formed. Here I try to put it as simply as possible so that people can apply it to similar problems.
Let i be an element of array A. Following the approach given in the question, i appears p times in the final answer.
Now, how do we make the different subsets? We take subsets containing only one element, then subsets of two elements, then of 3 elements, and so on up to n elements.
For each subset size, say 3 elements, we want to know how many of those subsets contain i.
There are n elements, so the number of 3-element subsets that contain i is (n-1)C(3-1), because from the other n-1 elements we can choose 3-1 elements.
If we do this for every subset size x, p = sum of (n-1)C(x-1) for x going from 1 to n. Thus, p = 2^(n-1).
Similarly, p is the same for every element i. Thus we get
final answer = A[0]^p * A[1]^p * ... * A[n-1]^p

Fibonacci sequence faster, but with different starting numbers (F[n]=F[n-1]+F[n-2])

(beginner here)
I want to know how to find n-th number of the sequence F[n]=F[n-1]+F[n-2].
Input:
F[0] = a;
F[1] = b;
a,b < 101
N < 1000000001
M < 8; the modulus is 10^M
a and b are the starting numbers of the sequence.
n is the index of the number I need to find.
M sets the modulus: the numbers get very large quickly, so F[n] = F[n] % 10^M, i.e. we take the remainder because only the last M digits of the n-th number are needed.
The recursive approach is too slow:
int fib(int n)
{
    if (n <= 1)
        return n;
    return fib(n-1) + fib(n-2);
}
The dynamic programming solution which takes O(n) time is also too slow:
f[i] = f[i-1] + f[i-2];
There are solutions that find the n-th number faster, in O(log n), when the first numbers of the sequence are 0 and 1, using this formula:
If n is even then k = n/2:
F(n) = [2*F(k-1) + F(k)]*F(k)
If n is odd then k = (n + 1)/2
F(n) = F(k)*F(k) + F(k-1)*F(k-1)
(link to formula and code implementation with it: https://www.geeksforgeeks.org/program-for-nth-fibonacci-number/)
But this formula does not work if starting numbers are something like 25 and 60. And the recursive approach is too slow.
So I want to know how can I find the n-th number of a sequence faster than O(n). Partial code would be helpful.
Thank you.
This matrix:
A = / 1 1 \
    \ 1 0 /
when multiplied by the column vector (f(n+1), f(n)), where f(n) is the nth number in a Fibonacci sequence, gives you the column vector (f(n+2), f(n+1)), i.e. it advances you by one step. This works no matter what the initial elements of the sequence were.
For example:
/ 1 1 \ / 8 \ = / 13 \
\ 1 0 / \ 5 /   \ 8  /
So the nth Fibonacci number is the first element of A^(n-1) v, where v is a column vector containing f(1) and f(0), the first two numbers in your sequence.
Therefore, if you can quickly calculate A^(n-1) modulo some number, this will give you f(n). This can be done using exponentiation by squaring, which works in O(log n). Just make sure to perform the modulo after every multiplication and addition to prevent the numbers from getting too big.
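A minimal sketch of that idea, assuming the modulus is below 10^8 as in the question (so every intermediate product fits in 64 bits). The names `Mat`, `mul` and `fibLike` are chosen for this example and are not from the answer.

```cpp
#include <array>
#include <cstdint>

typedef std::array<std::array<std::uint64_t, 2>, 2> Mat;

// 2x2 matrix product with every entry reduced modulo mod.
Mat mul(const Mat& x, const Mat& y, std::uint64_t mod) {
    Mat r;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            r[i][j] = (x[i][0] * y[0][j] + x[i][1] * y[1][j]) % mod;
    return r;
}

// n-th term of F[n] = F[n-1] + F[n-2] with F[0] = a, F[1] = b, modulo mod.
std::uint64_t fibLike(std::uint64_t a, std::uint64_t b,
                      std::uint64_t n, std::uint64_t mod) {
    if (n == 0) return a % mod;
    Mat result;                       // starts as the identity matrix
    result[0][0] = 1; result[0][1] = 0;
    result[1][0] = 0; result[1][1] = 1;
    Mat base;                         // the step matrix A
    base[0][0] = 1; base[0][1] = 1;
    base[1][0] = 1; base[1][1] = 0;
    // Exponentiation by squaring: result = A^(n-1).
    for (std::uint64_t e = n - 1; e > 0; e >>= 1) {
        if (e & 1) result = mul(result, base, mod);
        base = mul(base, base, mod);
    }
    // First component of A^(n-1) * (F[1], F[0]).
    return (result[0][0] * (b % mod) + result[0][1] * (a % mod)) % mod;
}
```

With a = 0 and b = 1 this reproduces the ordinary Fibonacci numbers; with a = 25 and b = 60 the terms are 25, 60, 85, 145, and so on.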

how to find the minimum number of primatics that sum to a given number

Given a number N (<=10000), find the minimum number of primatic numbers which sum up to N.
A primatic number refers to a number which is either a prime number or can be expressed as power of prime number to itself i.e. prime^prime e.g. 4, 27, etc.
I tried to find all the primatic numbers using a sieve and stored them in a vector (code below), but now I can't see how to find the minimum number of primatics that sum to a given number.
Here's my sieve:
#include<algorithm>
#include<vector>
#define MAX 10000
typedef long long int ll;
ll modpow(ll a, ll n, ll temp) {
    ll res=1, y=a;
    while (n>0) {
        if (n&1)
            res=(res*y)%temp;
        y=(y*y)%temp;
        n/=2;
    }
    return res%temp;
}
int isprimeat[MAX+20];
std::vector<int> primeat;
//Finding all prime numbers till 10000
void seive()
{
    ll i,j;
    isprimeat[0]=1;
    isprimeat[1]=1;
    for (i=2; i<=MAX; i++) {
        if (isprimeat[i]==0) {
            for (j=i*i; j<=MAX; j+=i) {
                isprimeat[j]=1;
            }
        }
    }
    for (i=2; i<=MAX; i++) {
        if (isprimeat[i]==0) {
            primeat.push_back(i);
        }
    }
    isprimeat[4]=isprimeat[27]=isprimeat[3125]=0;
    primeat.push_back(4);
    primeat.push_back(27);
    primeat.push_back(3125);
}
int main()
{
    seive();
    std::sort(primeat.begin(), primeat.end());
    return 0;
}
One method could be to store all primatics less than or equal to N in a sorted list - call this list L - and recursively search for the shortest sequence. The easiest approach is "greedy": pick the largest spans / numbers as early as possible.
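For N = 14 you'd have L = {2,3,4,5,7,8,9,11,13} (this list counts every prime power as primatic; under the question's prime^prime definition, 8 and 9 would be dropped), so you'd want to make an algorithm / process that tries these sequences: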
13 is too small
13 + 13 -> 13 + 2 will be too large
11 is too small
11 + 11 -> 11 + 4 will be too large
11 + 3 is a match.
You can continue the process by making the search function recurse each time it needs another primatic in the sum, which you would aim to have occur a minimum number of times. To do so you can pick the largest -> smallest primatic in each position (the 1st, 2nd etc primatic in the sum), and include another number in the sum only if the primatics in the sum so far are small enough that an additional primatic won't go over N.
I'd have to make a working example to find a small enough N that doesn't result in just 2 numbers in the sum. Note that any natural number can be expressed as the sum of at most 4 squares of natural numbers, and the set L is denser than the set of squares, so I'd think it rare to have a result of 3 or more for any N you'd want to compute by hand.
Dynamic Programming approach
I have to clarify that 'greedy' is not the same as 'dynamic programming': greedy can give sub-optimal results. This does have a DP solution though. Again, I won't write the final process in code but will explain it as a point of reference to make a working DP solution from.
To do this we need to build up solutions from the bottom up. What you need is a structure that can store known solutions for all numbers up to some N, this list can be incrementally added to for larger N in an optimal way.
Consider that for any N, if it's primatic then the number of terms for N is just 1. This applies for N=2-5,7-9,11,13,16,17,19. The number of terms for all other N must be at least two, which means either it's a sum of two primatics or a sum of a primatic and some other N.
The first few examples that aren't trivial:
6 - can be either 2+4 or 3+3, all the terms here are themselves primatic so the minimum number of terms for 6 is 2.
10 - can be either 2+8, 3+7, 4+6 or 5+5. However 6 is not primatic, and taking that solution out leaves a minimum of 2 terms.
12 - can be either 2+10, 3+9, 4+8, 5+7 or 6+6. Of these 6+6 and 2+10 contain non-primatics while the others do not, so again 2 terms is the minimum.
14 - ditto, there exist two-primatic solutions: 3+11, 5+9, 7+7.
The structure for storing all of these solutions needs to be able to iterate across solutions of equal rank / number of terms. You already have a list of primatics, this is also the list of solutions that need only one term.
Sol[term_length] = list(numbers). You will also need a function / cache to look up some N's shortest term length, e.g. S(N) = term_length iff N is in Sol[term_length]
Sol[1] = {2,3,4,5 ...} and Sol[2] = {6,10,12,14 ...} and so on for Sol[3] and onwards.
Any solution can be found using one term from Sol[1] that is primatic. Any solution requiring two primatics will be found in Sol[2]. Any solution requiring 3 will be in Sol[3] etc.
What you need to recognize here is that a number S(N) = 3 can be expressed Sol[1][a] + Sol[1][b] + Sol[1][c] for some a,b,c primatics, but it can also be expressed as Sol[1][a] + Sol[2][d], since all Sol[2] must be expressible as Sol[1][x] + Sol[1][y].
This algorithm will in effect search Sol[1] for a given N, then look in Sol[1] + Sol[K] with increasing K, but to do this you will need S and Sol structures roughly in the form shown here (or able to be accessed / queried in a similar manner).
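As a point of comparison, here is a minimal bottom-up sketch of the same idea written as a coin-change style table rather than the Sol / S structures described above. It is an assumption-laden illustration, not the linked code: it uses the question's definition (primes plus prime^prime), the names `primatics` and `minPrimaticTerms` are invented here, and the hard-coded list {4, 27, 3125} is only complete for limits below 7^7 = 823543.

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// Primatics up to limit: primes plus p^p (4, 27, 3125), sorted ascending.
std::vector<int> primatics(int limit) {
    std::vector<bool> composite(limit + 1, false);
    std::vector<int> result;
    for (int i = 2; i <= limit; ++i) {
        if (composite[i]) continue;
        result.push_back(i);
        for (long long j = 1LL * i * i; j <= limit; j += i)
            composite[j] = true;
    }
    const int selfPowers[] = {4, 27, 3125};  // 2^2, 3^3, 5^5
    for (int v : selfPowers)
        if (v <= limit) result.push_back(v);
    std::sort(result.begin(), result.end());
    return result;
}

// best[n] = fewest primatics summing to n, or -1 if n cannot be formed.
std::vector<int> minPrimaticTerms(int limit) {
    const std::vector<int> terms = primatics(limit);
    std::vector<int> best(limit + 1, INT_MAX);
    best[0] = 0;  // the empty sum
    for (int n = 1; n <= limit; ++n) {
        for (int t : terms) {
            if (t > n) break;  // terms are sorted, nothing larger fits
            if (best[n - t] != INT_MAX)
                best[n] = std::min(best[n], best[n - t] + 1);
        }
    }
    for (int& b : best)
        if (b == INT_MAX) b = -1;
    return best;
}
```

Calling `minPrimaticTerms(10000)` once fills the table for every N in the question's range, so each query afterwards is a single lookup.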
Working Example
Using the above as a guideline I've put this together quickly, it even shows which multi-term sum it uses.
https://ideone.com/7mYXde
I can explain the code in-depth if you want but the real DP section is around lines 40-64. The recursion depth (also number of additional terms in the sum) is k, a simple dual-iterator while loop checks if a sum is possible using the kth known solutions and primatics, if it is then we're done and if not then check k+1 solutions, if any. Sol and S work as described.
The only confusing part might be the use of reverse iterators, it's just to make != end() checking consistent for the while condition (end is not a valid iterator position but begin is, so != begin would be written differently).
Edit - FYI, the first number that takes at least 3 terms is 959 - had to run my algorithm to 1000 numbers to find it. It's summed from 6 + 953 (primatic), no matter how you split 6 it's still 3 terms.

Can I check whether the given number can be the sum of any arithmetic progression having n terms in it?

For a given number s, is it possible to check whether there is any arithmetic progression with n terms whose sum is s?
The starting element and the common difference of the AP must not be zero.
For example:
s = 24 and n = 4
Yes, it is possible: the AP is 3 5 7 9.
Note: I just want to check whether it is possible or not. There is no need to find the actual array. 0 < n < 10^9 and 0 < s < 10^18.
My attempt:
We know that the sum of an AP is s = n(first+last)/2;
therefore first+last = 2*s/n;
so 2*s/n should be an integer.
We also know that last = first+(n-1)diff;
so the expression becomes 2*first + (n-1)diff = 2*s/n;
first = (2*s/n - (n-1)diff)/2; and it should be an integer for some value of diff.
This is my approach, but trying values of diff one by one is too slow for s up to 10^18.
Please help. :)
Case 1: a and d are real numbers
Using s for the sum, n for the number of terms, a for the first term and d for the difference between terms, you get the result
2 * s / n = 2 * a + (n - 1) * d
This gives you one degree of freedom. So you can see that it's always possible to pick an infinite set of a and d values that satisfies this result.
Case 2: a and d are integer numbers
You can see from my result that if a and d are constrained to be integers, then the decomposition is only possible if the left hand side of this equation is also an integer; that is 2 * s is a multiple of n. (In your case, 2 * s is 48 which is a multiple of 4. So yes, there exists an integral a and d in that case).
Let a be the initial term of the progression and d its common difference. You want to solve the linear diophantine equation
n * a + (n*(n-1)/2) * d = s
The solution will exist if and only if s is a multiple of gcd(n, n*(n-1)/2).
If n is odd, gcd(n, n*(n-1)/2) = n * gcd(1, (n-1)/2) = n.
If n is even, gcd(n, n*(n-1)/2) = (n/2) * gcd(2, n-1) = n/2.
In any case, the solution exists if and only if 2 * s is a multiple of n.
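The divisibility test can be sketched in a few lines. The function name `findAp` is made up here, and the sketch assumes that a may be any non-zero integer (including a negative one). Besides checking that 2 * s is a multiple of n, it also returns one witness with non-zero a and d; trying d = 1..4 is enough because the valid values of d are at most two apart and at most one of them makes a equal to zero.

```cpp
// Returns true if some n-term AP with non-zero integer first term a and
// non-zero integer difference d sums to s; on success writes one such pair.
// Assumes 0 < n < 10^9 and 0 < s < 10^18, so 2*s fits in a long long.
bool findAp(long long s, long long n, long long& a, long long& d) {
    if ((2 * s) % n != 0) return false;   // 2*s must be a multiple of n
    const long long t = 2 * s / n;        // t = 2*a + (n-1)*d
    for (long long cand = 1; cand <= 4; ++cand) {
        const long long rest = t - (n - 1) * cand;    // equals 2*a
        if (rest % 2 != 0 || rest == 0) continue;     // a must be a non-zero integer
        a = rest / 2;
        d = cand;
        return true;
    }
    return false;  // not expected to be reached for valid inputs
}
```

For s = 24 and n = 4 it finds a = 3, d = 2, i.e. the progression 3 5 7 9 from the question.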
I think this is not possible in every case, because many different APs can have the same sum. If you can provide some more data or a hint about the progression, then it can be done.

Dynamic programming algorithm N, K problem

An algorithm which will take two positive numbers N and K and calculate the biggest possible number we can get by transforming N into another number via removing K digits from N.
For ex, let say we have N=12345 and K=3 so the biggest possible number we can get by removing 3 digits from N is 45 (other transformations would be 12, 15, 35 but 45 is the biggest). Also you cannot change the order of the digits in N (so 54 is NOT a solution). Another example would be N=66621542 and K=3 so the solution will be 66654.
I know this is a dynamic programming related problem and I can't get any idea about solving it. I need to solve this for 2 days, so any help is appreciated. If you don't want to solve this for me you don't have to but please point me to the trick or at least some materials where i can read up more about some similar issues.
Thank you in advance.
This can be solved in O(L) where L = number of digits. Why use complicated DP formulas when we can use a stack to do this:
For: 66621542
Push digits onto the stack while there are fewer than L - K digits on the stack:
66621. Now, for each remaining digit, pop digits off the stack while they are less than the currently read digit, then push the current digit if the stack isn't full:
read 5: 5 > 1, pop 1 off the stack. 5 > 2, pop 2 also. put 5: 6665
read 4: stack isn't full, put 4: 66654
read 2: 2 < 4, do nothing.
You need one more condition: be sure not to pop off more items from the stack than there are digits left in your number, otherwise your solution will be incomplete!
Another example: 12345
L = 5, K = 3
put L - K = 2 digits on the stack: 12
read 3, 3 > 2, pop 2, 3 > 1, pop 1, put 3. stack: 3
read 4, 4 > 3, pop 3, put 4: 4
read 5: 5 > 4, but we can't pop 4, otherwise we won't have enough digits left. so push 5: 45.
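A compact sketch of this stack idea, with one deliberate variation: instead of checking how many digits are left, it keeps a budget of removals still allowed, which enforces the same limit. The function name `largestAfterRemoving` is invented for this example, and it assumes 0 <= k <= number of digits.

```cpp
#include <string>

// Largest number (as a digit string) obtainable by deleting exactly k digits
// while keeping the remaining digits in their original order.
std::string largestAfterRemoving(const std::string& digits, int k) {
    std::string stack;   // std::string used as a stack of characters
    int budget = k;      // removals we are still allowed to make
    for (char c : digits) {
        // Pop smaller digits while we can still afford to remove them.
        while (budget > 0 && !stack.empty() && stack.back() < c) {
            stack.pop_back();
            --budget;
        }
        stack.push_back(c);
    }
    // Any unused removals come off the tail, where the digits are smallest.
    stack.resize(digits.size() - k);
    return stack;
}
```

On the two examples above it yields "66654" for 66621542 with K = 3 and "45" for 12345 with K = 3. Each digit is pushed and popped at most once, which is where the O(L) bound comes from.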
Well, to solve any dynamic programming problem, you need to break it down into recurring subsolutions.
Say we define your problem as A(n, k), which returns the largest number possible by removing k digits from n.
We can define a simple recursive algorithm from this.
Using your example, A(12345, 3) = max { A(2345, 2), A(1345, 2), A(1245, 2), A(1235, 2), A(1234, 2) }
More generally, A(n, k) = max { A(n with 1 digit removed, k - 1) }
And your base case is A(n, 0) = n.
Using this approach, you can create a table that caches the values of n and k.
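#include <algorithm>
#include <map>
#include <string>
#include <utility>
int A(int n, int k)
{
    typedef std::pair<int, int> input;
    static std::map<input, int> cache;
    if (k == 0) return n;
    input i(n, k);
    if (cache.find(i) != cache.end())
        return cache[i];
    // Try removing each digit in turn and keep the best result.
    std::string s = std::to_string(n);
    int best = 0;
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        std::string t = s.substr(0, pos) + s.substr(pos + 1);
        best = std::max(best, A(t.empty() ? 0 : std::stoi(t), k - 1));
    }
    cache[i] = best;
    return best;
}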
Now, that's the straightforward solution, but there is a better solution that works with a very small one-dimensional cache. Consider rephrasing the question like this: "Given a string n of L digits and an integer k, find the lexicographically greatest subsequence of n of length L - k". This is essentially what your problem is, and the solution is much simpler.
We can now define a different function B(i, j), which gives the largest lexicographical sequence of length (i - j), using only the first i digits of n (in other words, having removed j digits from the first i digits of n).
Using your example again, we would have:
B(1, 0) = 1
B(2, 0) = 12
B(3, 0) = 123
B(3, 1) = 23
B(3, 2) = 3
etc.
With a little bit of thinking, we can find the recurrence relation:
B(i, j) = max( 10*B(i-1, j) + n_i , B(i-1, j-1) ), where n_i is the i-th digit of n
or, if j = i then B(i, j) = B(i-1, j-1)
and B(0, 0) = 0
And you can code that up in a very similar way to the above.
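One way the B(i, j) recurrence might be coded with a single one-dimensional row is sketched below. The name `largestByDp` is invented here, values are held in an unsigned 64-bit integer (so this assumes inputs of at most 19 digits), and it assumes 0 <= k <= number of digits. Iterating j downwards lets row[j] and row[j-1] still hold the values for i-1 when they are read.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// row[j] holds B(i, j): the largest number formed from the first i digits
// after removing exactly j of them. Assumes at most 19 digits.
unsigned long long largestByDp(const std::string& digits, int k) {
    std::vector<unsigned long long> row(k + 1, 0);   // B(0, 0) = 0
    const int len = static_cast<int>(digits.size());
    for (int i = 1; i <= len; ++i) {
        const unsigned long long d = digits[i - 1] - '0';
        for (int j = std::min(i, k); j >= 0; --j) {
            if (j == i) {
                row[j] = row[j - 1];                 // every digit removed so far
            } else {
                const unsigned long long keep = 10 * row[j] + d;  // keep digit i
                // Either keep digit i, or remove it (only possible when j > 0).
                row[j] = (j > 0) ? std::max(keep, row[j - 1]) : keep;
            }
        }
    }
    return row[k];
}
```

For 12345 with k = 3 the row after each digit is {1,0,0,0}, {12,2,0,0}, {123,23,3,0}, {1234,234,34,4}, and the final row[3] is 45, matching B(3, 1) = 23 and B(3, 2) = 3 from the table above.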
The trick to solving a dynamic programming problem is usually figuring out what the structure of a solution looks like, and more specifically whether it exhibits optimal substructure.
In this case, it seems to me that the optimal solution with N=12345 and K=3 would have an optimal solution to N=12345 and K=2 as part of the solution. If you can convince yourself that this holds, then you should be able to express a solution to the problem recursively. Then either implement this with memoisation or bottom-up.
The four most important elements of any dynamic programming solution are:
Defining the right subproblems
Defining a recurrence relation between the answer to a sub-problem and the answer to smaller sub-problems
Finding base cases, the smallest sub-problems whose answer does not depend on any other answers
Figuring out the scan order in which you must solve the sub-problems (so that you never use the recurrence relation based on uninitialized data)
You'll know that you have the right subproblems defined when
The problem you need the answer to is one of them
The base cases really are trivial
The recurrence is easy to evaluate
The scan order is straightforward
In your case, it is straightforward to specify the subproblems. Since this is probably homework, I will just give you the hint that you might wish that N had fewer digits to start off with.
Here's what I think:
Consider the first k + 1 digits from the left. Find the biggest one and remove the digits to its left. If the biggest digit occurs more than once, take the leftmost occurrence and remove the digits to the left of that. Store the number of removed digits (name it j).
Keep that digit, then do the same thing with the digits to its right as the new N and k - j as the new K. Repeat until the kept digits number (length of N) - k in total.
The digits you kept, in order, form the number you're looking for.
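Read as "pick the leftmost maximum within a window of (removals left + 1) digits, then continue to its right", the procedure might look like the sketch below. This is one interpretation of the description, with an explicit stopping rule added (stop once L - k digits have been kept); the name `largestByWindow` is invented, and it assumes 0 <= k <= number of digits.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Repeatedly keeps the leftmost largest digit among the next (budget + 1)
// digits, spending one removal for every digit skipped on the way to it.
std::string largestByWindow(const std::string& digits, std::size_t k) {
    const std::size_t keep = digits.size() - k;  // digits in the result
    std::size_t budget = k;                      // removals still available
    std::size_t start = 0;                       // first digit not yet decided
    std::string out;
    while (out.size() < keep) {
        const std::size_t end = std::min(digits.size(), start + budget + 1);
        std::size_t best = start;
        for (std::size_t p = start + 1; p < end; ++p)
            if (digits[p] > digits[best]) best = p;  // strict '>' keeps the leftmost
        budget -= best - start;                  // digits left of best are removed
        out.push_back(digits[best]);
        start = best + 1;
    }
    return out;
}
```

For 12345 with k = 3 the first window is 1234, so 4 is kept and three digits are removed; the next window is just 5, giving 45. This version can rescan digits, so it may do O(L * K) work in the worst case.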