Time complexity in recursive function in which recursion reduces size - c++

I have to estimate time complexity of Solve():
// These methods and list<Element> Elements belong to the Solver class
void Solver::Solve()
{
    while (List is not empty)
        Recursive();
}

void Solver::Recursive(some parameters)
{
    // O(n) in List size. When called directly from Solve() this always
    // returns a valid element; when called by the recursion it may return null.
    Element WhatCanISolve = WhatCanISolve(some parameters);
    if (WhatCanISolve == null)
        return;

    // We reduce the GLOBAL problem size by one.
    List.remove(Element); // List is a list and Element is held by an iterator, so O(1)

    // Some simple O(1) operations

    // Now we call the recursive function twice.
    Recursive(some other parameters 1);
    Recursive(some other parameters 2);
}

// This function performs a search with the given parameters
Element Solver::WhatCanISolve(some parameters)
{
    // Iterates through the whole List, so O(n) in List size.
    // Returns the first element matching the parameters, or null.
}
My first thought was that it should be somewhere around O(n^2).
Then I thought of
T(n) = n + 2T(n-1)
which (according to WolframAlpha) expands to:
O(2^n)
However, I think the second idea is false, since n is reduced between the recursive calls.
I also did some benchmarking with large sets. Here are the results:
N t(N) in ms
10000 480
20000 1884
30000 4500
40000 8870
50000 15000
60000 27000
70000 44000
80000 81285
90000 128000
100000 204380
150000 754390

Your algorithm is still O(2^n), even though it reduces the problem size by one item each time. Your difference equation
T(n) = n + 2T(n-1)
does not account for the removal of an item at each step. But only one item is removed, so the corrected equation is T(n) = n + 2T(n-1) - 1. Following your example and saving the algebra by using WolframAlpha to solve this gives the solution T(n) = (c_1 + 4)*2^(n-1) - n - 2, which is still O(2^n). Removing one item per step is not a considerable amount given the other factors (especially the doubling recursion).
A similar example that comes to mind is an n*n 2D matrix that you use only as a triangular matrix. Even though you drop one row to process for each column, iterating through every element still visits n(n+1)/2 entries, which has complexity O(n^2), the same as if all elements were used (i.e. a square matrix).
For further evidence, I present a plot of your own collected running time data:
[plot of the t(N) measurements above omitted]
Presumably the time is quadratic. If WhatCanISolve returns nullptr exactly when the list is empty, then every call
Recursive(some other parameters 2);
finishes in O(1), because it runs on an empty list. This means the correct recurrence is actually
T(n) = C*n + T(n-1)
and therefore T(n) = O(n^2), which corresponds well to what we see in the plot.
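To see why this is quadratic, unroll the recurrence (assuming T(0) = O(1)):
T(n) = C*n + C*(n-1) + ... + C*1 = C * n(n+1)/2 = O(n^2)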

How to erase elements more efficiently from a vector or set?

Problem statement:
Input:
First two inputs are integers n and m. n is the number of knights fighting in the tournament (2 <= n <= 100000, 1 <= m <= n-1). m is the number of battles that will take place.
The next line contains n power levels.
The next m lines contain two integers l and r, indicating the range of knight positions to compete in the ith battle.
After each battle, all knights apart from the one with the highest power level will be eliminated.
The range for each battle is given in terms of the new positions of the knights, not the original positions.
Output:
Output m lines, the ith line containing the original positions (indices) of the knights eliminated in that battle. Each line is in ascending order.
Sample Input:
8 4
1 0 5 6 2 3 7 4
1 3
2 4
1 3
0 1
Sample Output:
1 2
4 5
3 7
0
Here is a visualisation of this process.
1 2
[(1,0),(0,1),(5,2),(6,3),(2,4),(3,5),(7,6),(4,7)]
-----------------
4 5
[(1,0),(6,3),(2,4),(3,5),(7,6),(4,7)]
-----------------
3 7
[(1,0),(6,3),(7,6),(4,7)]
-----------------
0
[(1,0),(7,6)]
-----------
[(7,6)]
I have solved this problem. My program produces the correct output; however, it is O(n*m) = O(n^2). I believe that if I erase knights from the vector more efficiently, efficiency can be increased. Would it be more efficient to erase elements using a set, i.e. erase contiguous segments rather than individual knights? Is there an alternative way to do this that is more efficient?
#include <cstdio>
#include <utility>
#include <vector>
using namespace std;

#define INPUT1(x) scanf("%d", &x)
#define INPUT2(x, y) scanf("%d%d", &x, &y)
#define OUTPUT1(x) printf("%d ", x) // space-separated, as in the sample output

int main(int argc, char const *argv[]) {
    int n, m;
    INPUT2(n, m);
    vector< pair<int,int> > knights(n); // (power, original index)
    for (int i = 0; i < n; i++) {
        int power;
        INPUT1(power);
        knights[i] = make_pair(power, i);
    }
    while (m--) {
        int l, r;
        INPUT2(l, r);
        // O(r - l) scan for the strongest knight in the range
        int max_in_range = knights[l].first;
        for (int i = l+1; i <= r; i++)
            if (knights[i].first > max_in_range)
                max_in_range = knights[i].first;
        // erase every loser; each erase shifts the tail, so this is O(n) per battle
        int offset = l;
        int range = r-l+1;
        while (range--) {
            if (knights[offset].first != max_in_range) {
                OUTPUT1(knights[offset].second);
                knights.erase(knights.begin()+offset);
            }
            else offset++;
        }
        printf("\n");
    }
}
Well, removing from a vector certainly wouldn't be efficient. Removing from a set or an unordered set would be more effective (use iterators instead of indexes).
Yet the problem will still remain O(n^2), because you have two nested loops running n*m times in total.
--EDIT--
I believe I understand the question now :)
First let's calculate the complexity of your code above. Your worst case is when r - l = 1 in every battle (two knights per battle) and the battles are not ordered with respect to position, which means you have m battles (in this case m = n-1 ~= O(n)).
The outer while loop runs n times.
The for loop runs once each iteration, which makes it n*1 = n in total.
The inner while loop also runs once each iteration, which makes it n again.
Deleting from the vector means up to n-1 shifts, which makes it O(n).
Thus, with the cost of the vector deletion, the total complexity is O(n^2).
First of all, you don't really need the inner for loop. Take the first knight as the current max of the range, compare the rest of the range to it one by one, and remove the defeated ones as you go.
Now, I believe it can be done in O(n log n) using std::map. The key of the map is the position and the value is the power level of the knight.
Before proceeding: finding and removing an element in a map is logarithmic, and iterating to the next element is constant.
Finally, your code should look like:

while (m--)                                // n times
    strongest = map.find(first_position);  // find is O(log n) --> n*log(n)
    for (opponent = next of strongest;     // runs once, since every range has length 1
         opponent in range;
         opponent = next opponent)         // iterating is constant
        // removing from the map is O(log n) --> n * 1 * log(n)
        if strongest < opponent
            remove strongest; opponent is the new strongest
        else
            remove opponent (be careful to remove it only after iterating to the next)

OK, so the upper bound is O(2*n*log n) = O(n log n). If the ranges get longer, the run time of the outer loop decreases but the number of remove operations increases. I'm sure the upper bound won't change; let's make calculating it a homework exercise for you :)
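For concreteness, here is a minimal C++ sketch of that idea (the variable names are mine). One caveat the pseudocode glosses over: std::map has no indexed access, so reaching the knight at current position l uses std::next, which is linear in l; the O(n log n) bound therefore relies on this answer's assumption that the ranges are short and near the front.

#include <algorithm>
#include <cstdio>
#include <iterator>
#include <map>
#include <vector>

int main() {
    int n, m;
    std::scanf("%d %d", &n, &m);
    std::map<int, int> alive; // original index -> power level
    for (int i = 0; i < n; ++i) {
        int power;
        std::scanf("%d", &power);
        alive[i] = power;
    }
    while (m--) {
        int l, r;
        std::scanf("%d %d", &l, &r);
        // Reaching current position l costs O(l) -- the step std::map
        // cannot do in O(log n), hence the caveat above.
        auto strongest = std::next(alive.begin(), l);
        std::vector<int> eliminated;
        auto it = std::next(strongest);
        for (int pos = l + 1; pos <= r; ++pos) {
            auto next_it = std::next(it);
            if (it->second > strongest->second) {
                eliminated.push_back(strongest->first);
                alive.erase(strongest); // O(log n)
                strongest = it;
            } else {
                eliminated.push_back(it->first);
                alive.erase(it);        // O(log n)
            }
            it = next_it;
        }
        std::sort(eliminated.begin(), eliminated.end());
        for (int idx : eliminated)
            std::printf("%d ", idx);
        std::printf("\n");
    }
    return 0;
}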
A solution with a treap is pretty straightforward.
For each query, you split the treap by implicit key to obtain the subtree that corresponds to the [l, r] range (this takes O(log n) time).
After that, you iterate over the subtree to find the knight with the maximum strength. Then you merge the [0, l) and [r + 1, end) parts of the treap with the node that corresponds to this knight.
It's clear that all parts of the solution except the subtree traversal and printing take O(log n) time per query. However, each operation reinserts only one knight and erases the rest of the range, so the size of the output (and the sum of the sizes of the subtrees) is linear in n. The total time complexity is therefore O(n log n).
I don't think you can solve it with standard STL containers, because there's no standard container that supports both getting an iterator by index quickly and removing arbitrary elements.
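Here is a compact sketch of that treap solution (the node layout and helper names are my own, assuming the input format from the question):

#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Node {
    int power, idx;   // power level and original position
    int pri, size;    // heap priority and subtree size (the implicit key)
    Node *l, *r;
    Node(int p, int i) : power(p), idx(i), pri(std::rand()), size(1), l(nullptr), r(nullptr) {}
};

int sz(Node* t) { return t ? t->size : 0; }
void pull(Node* t) { if (t) t->size = 1 + sz(t->l) + sz(t->r); }

// Split t into a = first k knights, b = the rest (split by implicit key).
void split(Node* t, int k, Node*& a, Node*& b) {
    if (!t) { a = b = nullptr; return; }
    if (sz(t->l) < k) { split(t->r, k - sz(t->l) - 1, t->r, b); a = t; }
    else              { split(t->l, k, a, t->l);               b = t; }
    pull(t);
}

Node* merge(Node* a, Node* b) {
    if (!a || !b) return a ? a : b;
    if (a->pri > b->pri) { a->r = merge(a->r, b); pull(a); return a; }
    else                 { b->l = merge(a, b->l); pull(b); return b; }
}

void collect(Node* t, std::vector<Node*>& out) { // in-order traversal
    if (!t) return;
    collect(t->l, out); out.push_back(t); collect(t->r, out);
}

int main() {
    int n, m;
    std::scanf("%d %d", &n, &m);
    Node* root = nullptr;
    for (int i = 0; i < n; ++i) {
        int p; std::scanf("%d", &p);
        root = merge(root, new Node(p, i));
    }
    while (m--) {
        int l, r;
        std::scanf("%d %d", &l, &r);
        Node *left, *mid, *right;
        split(root, l, left, mid);         // O(log n)
        split(mid, r - l + 1, mid, right); // mid now holds positions [l, r]
        std::vector<Node*> range;
        collect(mid, range); // linear in the range size, but every loser is
                             // erased for good, so the total over all queries is O(n)
        Node* winner = range[0];
        for (Node* x : range) if (x->power > winner->power) winner = x;
        std::vector<int> losers;
        for (Node* x : range)
            if (x != winner) { losers.push_back(x->idx); delete x; }
        std::sort(losers.begin(), losers.end());
        for (int idx : losers) std::printf("%d ", idx);
        std::printf("\n");
        winner->l = winner->r = nullptr; winner->size = 1;
        root = merge(merge(left, winner), right); // reinsert only the winner
    }
    return 0;
}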

What is the Big-O of code that uses random number generators?

I want to fill the array 'a' with random values from 1 to N (no repeated values). Let's suppose the Big-O of randInt(i, j) is O(1) and that this function generates random values from i to j.
Examples of the output are:
{1,2,3,4,5} or {2,3,1,4,5} or {5,4,2,1,3} but not {1,2,1,3,4}
#include <set>
using std::set;

set<int> S; // space O(N)?
int a[N];   // space O(N)
int i = 0;  // space O(1)
do {
    int val = randInt(1, N);      // space O(1), time O(1) -- is val created many times?
    if (S.find(val) == S.end()) { // time O(log N)? -- true when val is not yet in S
        a[i] = val;               // time O(1)
        i++;                      // time O(1)
        S.insert(val);            // time O(log N), executed N times --> O(N log N)
    }
} while (S.size() < N);           // time O(1)
The while loop will continue until we generate all the values from 1 to N.
My understanding is that the set keeps its values in sorted order, with find and insert both taking O(log N).
Big-O = O(1) + O(X*log N) + O(N*log N) = O(X*log N)
where X is the number of loop iterations: the fuller the set gets, the lower the probability of generating a number that is not already in it.
time O(X log N)
space O(2N+1) => O(N), since we reuse the space of val
Since it is very unlikely to generate N distinct numbers in only N calls of randInt, I expect the loop to execute at least N times.
Is the variable val created many times?
What would be a good value for X?
Suppose that the RNG is ideal. That is, repeated calls to randInt(1,N) generate an i.i.d. (independent and identically distributed) sequence of values uniformly distributed on {1,...,N}.
(Of course, in reality the RNG won't be ideal. But let's go with it since it makes the math easier.)
Average case
In the first iteration, a random value val1 is chosen which of course is not in the set S yet.
In the next iteration, another random value is chosen.
With probability (N-1)/N, it will be distinct from val1 and the inner conditional will be executed. In this case, call the chosen value val2.
Otherwise (with probability 1/N), the chosen value will be equal to val1. Retry.
How many iterations does it take on average until a valid (distinct from val1) val2 is chosen? Well, we have an independent sequence of attempts, each of which succeeds with probability (N-1)/N, and we want to know how many attempts it takes on average until the first success. This is a geometric distribution, and in general a geometric distribution with success probability p has mean 1/p. Thus, it takes N/(N-1) attempts on average to choose val2.
Similarly, it takes N/(N-2) attempts on average to choose val3 distinct from val1 and val2, and so on. Finally, the N-th value takes N/1 = N attempts on average.
In total the do loop will be executed
N/N + N/(N-1) + N/(N-2) + ... + N/1 = N * (1 + 1/2 + ... + 1/N)
times on average. The sum in parentheses is the N-th harmonic number, which can be roughly approximated by ln(N). (There's a well-known better approximation which is a bit more complicated and involves the Euler-Mascheroni constant, but ln(N) is good enough for finding the asymptotic complexity.)
So to an approximation, the average number of iterations will be N ln N.
What about the rest of the algorithm? Things like inserting N things into a set also take at most O(N log N) time, so they can be disregarded. The big remaining cost is that in each iteration you have to check whether the chosen random value lies in S, which takes time logarithmic in the current size of S. So we have to compute the total cost of these membership checks, roughly the sum of (N/(N-k)) * log(k) over k = 1, ..., N-1 (the expected number of attempts while |S| = k, times the cost of each check), which, from numerical experiments, appears to be approximately equal to N/2 * (ln N)^2 for large N. (Consider asking for a proof of this on math.SE, perhaps.) EDIT: See this math.SE answer for a short informal proof, and the other answer to that question for a more formal proof.
So in conclusion, the total average complexity is Θ(N (ln N)^2).
Again, this is assuming that the RNG is ideal.
Worst case
Like xaxxon mentioned, it is in principle possible (though unlikely) that the algorithm will not terminate at all. Thus, the worst case complexity would be O(∞).
That's a very bad algorithm for achieving your goal.
Simply fill the array with the numbers 1 through N and then shuffle.
That's O(N)
https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
To shuffle, pick an index between 0 and N-1 and swap it with index 0. Then pick an index between 1 and N-1 and swap it with index 1. All the way until the end of the list.
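A minimal C++ sketch of the fill-then-shuffle approach (std::mt19937 stands in for the question's randInt; that substitution is my own):

#include <numeric>
#include <random>
#include <vector>

// Returns a uniformly random permutation of {1, ..., n} in O(n).
std::vector<int> randomPermutation(int n) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 1); // fill with 1..N
    std::mt19937 gen(std::random_device{}());
    for (int i = 0; i < n - 1; ++i) {
        // pick an index between i and n-1 (inclusive) and swap it with index i
        std::uniform_int_distribution<int> pick(i, n - 1);
        std::swap(a[i], a[pick(gen)]);
    }
    return a;
}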
In terms of your specific question, it depends on the behavior of your random number generator. If it's truly random, it may never complete. If it's pseudorandom, it depends on the period of the generator. If it has a period of 5, then you'll never have any dupes.
It's catastrophically bad code with complex behaviour. Generating the first number is O(1). The second involves a binary search, so a log N, plus a rerun of the generator should the number be found. The chance of getting a new number is p = 1 - i/N, so the average number of re-runs is the reciprocal, which gives you another factor of N. So O(N^2 log N).
The way to do it is to generate the numbers, then shuffle them. That's O(N).

Big O notation for duplicate function, C++

What is the Big O notation for the function described in the screenshot (not reproduced here)?
It would take O(n) to go through all the numbers, but once it finds the numbers and removes them, what would that be? Would the removed part be a constant a, and would the function then have to iterate through the numbers again?
This is what I am thinking for Big O:
T(n) = n + a + (n - a), or something involving iterating through (n - a) steps after the first duplicate is found. Would Big O then be O(n)?
Big O notation considers the worst case. Let's say we need to remove all duplicates from the array A = [1..n]. The algorithm starts with the first element and checks every remaining element - there are n-1 of them. Since all values happen to be different, it won't remove any from the array.
Next, the algorithm selects the second element and checks the remaining n-2 elements in the array. And so on.
When the algorithm arrives at the final element it is done. The total number of comparisons is the sum (n-1) + (n-2) + ... + 2 + 1 + 0. Through the power of maths, this sum becomes (n-1)*n/2, and the dominating term is n^2, so the algorithm is O(n^2).
This algorithm is O(n^2), because for each element in the array you iterate over the array and count the occurrences of that element.

foreach item in array
    count = 0
    foreach other in array
        if item == other
            count += 1
    if count > 1
        remove item

As you can see, there are two nested loops in this algorithm, which results in O(n*n).
Removed items don't affect the worst case. Consider an array containing only unique elements: no element is ever removed from it.
Note: a naive implementation of this algorithm could even end up with O(n^3) complexity, because removing an item from an array is itself O(n).
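As an illustrative sketch of that pseudocode in C++ (my own rendering; the screenshot's exact code isn't shown), copying the survivors to a new vector keeps it at O(n^2) and avoids the O(n^3) pitfall from the note above:

#include <vector>

// Returns the array with every value that occurs more than once removed: O(n^2).
std::vector<int> removeDuplicated(const std::vector<int>& a) {
    std::vector<int> out;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int count = 0;
        for (std::size_t j = 0; j < a.size(); ++j) // O(n) scan per element
            if (a[j] == a[i])
                ++count;
        if (count == 1) // keep only the values that occur exactly once
            out.push_back(a[i]);
    }
    return out;
}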
Starting with the first element, you go through all remaining elements in the vector, which is n-1 steps; doing that for each of the n elements gives n*(n-1)/2 comparisons in the worst case. The best case is n steps (e.g. when all elements are the same value, such as 4).

What is the time complexity of traversing a 2d array

What is the time complexity of traversing (rows ,columns) a two dimensional array?
int check(int array[9][9]) // returns a count, so int rather than bool
{
    int num = 0;
    for (int i = 0; i < 9; i++) {
        for (int j = 0; j < 9; j++) {
            if (array[i][j] == 0) {
                num++;
            }
        }
    }
    return num;
}
I think each for loop takes sqrt(n) steps, so the nested loops take O(n) in total for traversing all elements, where I am defining n as the total size of the input (in this case the 81 elements of the array). Is that correct?
As you define n to be the total size of the input, yes the running time of the algorithm you propose will be O(n): you are performing one single operation on each element of the input, for n total operations.
Where the confusion is arising from this question is that by convention, multi-dimensional arrays are not referred to by their total size but rather by each of their dimensions separately. So rather than viewing array as being of size n (81) it would be considered to be an array of size p x q (9 x 9). That would give you a running time of O(pq). Or, if we limit it to square arrays with both dimensions r, O(r^2).
All are correct, which is why it's important to give a clear definition of your variables up front when talking about time complexity. Otherwise, when you use n to mean the total size when most people would assume that n would be a single dimension, you are inviting a lot of confusion.
The time complexity will be O(n*m), where n is the number of rows (the 1st dimension) and m is the size of each row (the 2nd dimension).
For any algorithm of the form

for (1..n) {
    for (1..m) {
        doSomething();
    }
}

the average, best and worst case time complexity is O(n x m). In your case, if n = m, it becomes O(n^2).
The time complexity is O(N), which means it is linear.
Let's look at the concept of time complexity. When we give a time complexity in Big O notation, what we mean is how the graph of N versus run time must look in the worst case.
For the given nested loops the size of the data is 9*9 = 81. No matter what operation you perform inside the inner for loop, the loops will not execute more than 9*9 = 81 times. If the size of the array were [10][10], the loops would execute no more than 100 times.
If you plot the execution time of the code against the size of the input, the graph will be linear.
The time complexity is determined by how many times your code looks up an element in the data structure to produce the result. It does not matter whether it is a 1-D, 2-D or n-D array: if you access each element at most once, the complexity is linear, O(N), where N = N1 * N2 * ... * Nn.
Let's understand this with a real-world example of two hotels having N rooms each. You need to find your friend in the hotel.
In the first scenario, the hotel has 100 rooms on a single (ground) floor; you need to visit 100 rooms in the worst case to find your friend, so the complexity is linear, i.e. O(N) = O(100).
In the second scenario, the hotel has 4 floors with 25 rooms each. In the worst case you have to visit 4*25 = 100 rooms (ignoring the time to move between floors), hence the complexity is again linear.
A 2-D array arr[i][j] can also be traversed by a single loop, where the loop runs (i × j) times.
With n = (i × j), the time complexity of traversing a 2-D array is O(n).
Thanks to coder2design.com
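To make the single-loop formulation above concrete, here is a small sketch (the dimensions and names are illustrative):

#include <cstdio>

int main() {
    const int rows = 9, cols = 9;
    int arr[rows][cols] = {};
    // One loop over all rows*cols cells; the 2-D indices are recovered
    // from the flat index k, so this is the same O(rows*cols) traversal.
    for (int k = 0; k < rows * cols; ++k) {
        int i = k / cols, j = k % cols;
        if (arr[i][j] == 0)
            std::printf("zero at (%d, %d)\n", i, j);
    }
    return 0;
}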

Find dominant mode of an unsorted array

Note, this is a homework assignment.
I need to find the mode of an array (of positive values) and, secondarily, return that value if its count is greater than sizeof(array)/2 - the dominant value. Some arrays will have neither.
That is simple enough, but there is a constraint: the array must NOT be sorted prior to the determination, and the complexity must be on the order of O(nlogn).
Using this second constraint and the master theorem, we can determine that the time complexity fits T(n) = A*T(n/B) + n^D, where A = B and log_B(A) = D for O(nlogn) to hold. Thus A = B = 2 and D = 1. This is also convenient, since the dominant value must be dominant in the 1st half of the array, the 2nd half, or both.
Using T(n) = A*T(n/B) + n^D, we know that the search function will call itself twice at each level (A) and divide the problem set by 2 at each level (B). I'm stuck figuring out how to make my algorithm account for the n^D = n work at each level.
To put this into code:

int search(a, b) {
    search(a, a + (b-a)/2);
    search(a + (b-a)/2 + 1, b);
}
The "glue" I'm missing here is how to combine these divided calls; I think that is where the O(n) work per level comes in. There is some trick here where the dominant value must be dominant in the 1st half, the 2nd half, or both - I'm not quite sure how that helps me right now with the complexity constraint.
I've written down some examples of small arrays and I've drawn out the ways they would divide. I can't seem to find one single method that will always return the dominant value.
At level 0, the function needs to call itself to search the first half and the second half of the array, and that needs to recurse. Then at each level it needs to perform O(n) work. So for an array [2,0,2,0,2] it would split that into a search on [2,0] and a search on [2,0,2] AND perform on the order of 5 operations. A search on [2,0] would call a search on [2] and a search on [0] AND perform on the order of 2 operations. I'm assuming these operations would be a scan of the (sub)array itself. I was planning to use C++ and something from the STL to iterate and count the values; I could create a large array and just update counts by their index.
If some number occurs more than half the time, it can be found with O(n) time complexity and O(1) space complexity with the following scan (a variant of the Boyer-Moore majority vote):

int num = a[0], occ = 1;
for (int i = 1; i < n; i++) {
    if (a[i] == num) occ++;
    else {
        occ--;
        if (occ < 0) { // candidate outvoted: adopt a new one
            num = a[i];
            occ = 1;
        }
    }
}

Since you are not sure whether such a number occurs at all, apply the above algorithm to get a candidate first, then iterate over the whole array a second time to count the occurrences of that candidate and check whether the count is greater than half.
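Putting the two passes together as a self-contained sketch (the -1 "no dominant value" sentinel is my own choice, safe here because the question says the values are positive):

#include <vector>

// Returns the value occurring more than a.size()/2 times, or -1 if none does.
int dominantValue(const std::vector<int>& a) {
    if (a.empty()) return -1;
    int num = a[0], occ = 1;
    for (std::size_t i = 1; i < a.size(); ++i) { // pass 1: find a candidate
        if (a[i] == num) occ++;
        else if (--occ < 0) { num = a[i]; occ = 1; }
    }
    std::size_t count = 0;                       // pass 2: verify the candidate
    for (int x : a)
        if (x == num) ++count;
    return 2 * count > a.size() ? num : -1;
}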
If you want to find just the dominant mode of an array, and do it recursively, here's the pseudo-code:
def DominantMode(array):
    # if there is only one element, that's the dominant mode
    if len(array) == 1: return array[0]
    # otherwise, find the dominant mode of the left and right halves
    left = DominantMode(array[0 : len(array)//2])
    right = DominantMode(array[len(array)//2 : len(array)])
    # if both sides have the same dominant mode, the whole array has that mode
    if left == right: return left
    # otherwise, we have to scan the whole array to determine which one wins
    leftCount = sum(element == left for element in array)
    rightCount = sum(element == right for element in array)
    if leftCount > len(array) // 2: return left
    if rightCount > len(array) // 2: return right
    # if neither wins, just return None
    return None
The above algorithm is O(nlogn) time but only O(logn) space.
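If you prefer C++, here is a direct translation of that pseudocode (a sketch; it uses -1 as the None sentinel, which is safe because the question says the values are positive):

#include <algorithm>
#include <vector>

// Dominant mode of a[lo, hi); returns -1 for "None".
// Call as dominantMode(a, 0, a.size()).
int dominantMode(const std::vector<int>& a, int lo, int hi) {
    if (hi - lo == 1) return a[lo];      // one element: that's the dominant mode
    int mid = lo + (hi - lo) / 2;
    int left = dominantMode(a, lo, mid); // dominant mode of the left half
    int right = dominantMode(a, mid, hi);// dominant mode of the right half
    if (left == right) return left;      // both halves agree
    // otherwise scan this whole range to see which candidate wins
    int leftCount = (int)std::count(a.begin() + lo, a.begin() + hi, left);
    int rightCount = (int)std::count(a.begin() + lo, a.begin() + hi, right);
    if (2 * leftCount > hi - lo) return left;
    if (2 * rightCount > hi - lo) return right;
    return -1;                           // neither is dominant here
}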
If you want to find the mode of an array (not just the dominant mode), first compute the histogram. You can do this in O(n) time (visiting each element of the array exactly once) by storing the histogram in a hash table that maps each element value to its frequency.
Once the histogram has been computed, you can iterate over it (visiting each entry at most once) to find the highest frequency. Once you find a frequency larger than half the size of the array, you can return immediately and ignore the rest of the histogram. Since the size of the histogram can be no larger than the size of the original array, this step is also O(n) time (and O(n) space).
Since both steps are O(n) time, the resulting algorithmic complexity is O(n).
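A short sketch of that histogram approach, using std::unordered_map as the hash table:

#include <unordered_map>
#include <vector>

// Mode of a non-empty array in O(n) expected time and O(n) space.
int modeOf(const std::vector<int>& a) {
    std::unordered_map<int, int> freq; // value -> frequency
    for (int x : a) ++freq[x];         // O(n): build the histogram
    int best = a[0];
    for (const auto& entry : freq) {
        if (entry.second > freq[best]) best = entry.first;
        if (2 * freq[best] > (int)a.size()) break; // dominant: can stop early
    }
    return best;
}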