Wrong Complexity Calculation for Hash Tables? - c++

I was reading: https://www.geeksforgeeks.org/given-an-array-a-and-a-number-x-check-for-pair-in-a-with-sum-as-x/
I think there is a mistake in the time complexity calculation for Method 2 (Hashing), where they claim it is O(n); I insist it is O(n) amortized.
Algorithm:
1 Initialize an empty hash table s.
2 Do the following for each element A[i] in A[]:
2.1 If s[x - A[i]] is set then print the pair (A[i], x - A[i])
2.2 Insert A[i] into s.
Step 1 is done in O(1). Step 2 does O(n) iterations, where for each one we do O(1) amortized work (2.1 & 2.2), so in total we have O(n) amortized.
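For concreteness, here is a minimal C++ sketch of the method in question (the function name and the choice of std::unordered_set are mine, not from the article):

#include <iostream>
#include <unordered_set>
#include <vector>

// Print every pair (A[i], x - A[i]) with sum x, following steps 1-2.2 above.
void printPairsWithSum(const std::vector<int>& A, int x) {
    std::unordered_set<int> s;                       // step 1: empty hash table
    for (int a : A) {                                // step 2: one pass over A
        if (s.count(x - a))                          // step 2.1: complement already seen?
            std::cout << "(" << a << ", " << x - a << ")\n";
        s.insert(a);                                 // step 2.2: amortized O(1) insert
    }
}

int main() {
    printPairsWithSum({1, 4, 45, 6, 10, 8}, 16);     // prints (10, 6)
}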

When an O(1) amortized step is performed n times, the right conclusion is not merely that the total cost is "O(n) amortized". The fact that a step is O(1) amortized means its average cost over those n executions is at most some constant c, and an average of at most c implies the total cost of the n steps is at most cn. So the cost of n steps is O(n), full stop, not just "O(n) amortized".
By the definition of amortized cost with the aggregate method, saying an operation is T(n)/n amortized means there is some upper bound T(n) on the cost of performing n operations. So, if an operation is O(1) amortized, meaning there is some c such that the average cost is at most c, we have T(n)/n ≤ c, therefore T(n) ≤ cn, and therefore performing n operations costs at most cn. Hence the cost of n operations is O(n), not just O(n) amortized.
There can be some confusion in considering operations in isolation rather than as part of a sequence of n operations. If some program executes billions of unordered_set insertions and we take a random sample of n of them, it is not guaranteed that those n have an O(1) amortized time: we could have been unlucky and picked many of the insertions that happened to be rebuilding the table. In such a random selection, the statistical average time would be O(1), but each sample could fluctuate. In contrast, when we look at all the insertions made to insert n elements into the table, their times are correlated; the nature of the algorithm guarantees that table rebuilds occur only with a certain frequency, and this guarantees the total amount of work done over n insertions is O(n).
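If you want to see this behaviour rather than just argue it, a rough timing sketch like the one below (my own, and the numbers are obviously machine-dependent) shows the average cost per insertion over a whole sequence staying roughly flat as n grows, even though the individual insertions that trigger a rehash are much more expensive than the rest:

#include <chrono>
#include <cstdio>
#include <unordered_set>

int main() {
    for (std::size_t n = 1000000; n <= 8000000; n *= 2) {
        std::unordered_set<int> s;
        auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i)
            s.insert(static_cast<int>(i));           // occasionally triggers a rehash
        auto stop = std::chrono::steady_clock::now();
        double total_ns = std::chrono::duration<double, std::nano>(stop - start).count();
        std::printf("n = %zu: %.1f ns per insertion on average\n", n, total_ns / n);
    }
}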

Related

Complexity analysis of loop with limited looping time

I'm wondering whether the big-O complexity of the following code should be O(1) or O(n).
for(int i=0;i<n && n<100;i++){sum++;}
Here's what I think:
since n is limited to be lower than 100, worst case will be O(99) + O(98) + ... + O(1) = 99 * O(1) = O(1)
However, by intuition, the code is somehow O(n) due to the loop.
Would really appreciate it if someone can advise me on this.
Thank you!
Intuitively it is O(1), because as n increases the runtime does not increase after a certain point. However, this is something of an edge case: were n bounded by a much higher number, say the maximum value of an int, it would seem no different than if n were not bounded at all. When considering runtime using complexity theory, though, we usually ignore things like the maximum size of an int.
Another way to think of this is that the number of iterations grows linearly with n for n in (0, 100) and is constant otherwise. When considering that n can be any value, however, the algorithm is definitely O(1).
This is all assuming each iteration of the loop takes constant time.
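A tiny experiment (my own illustrative snippet) makes the bound visible: the body runs n times when n < 100 and zero times once n >= 100, so it never runs more than 99 times.

#include <cstdio>

// Count how many times the body of `for (int i = 0; i < n && n < 100; i++)` runs.
int iterations(int n) {
    int sum = 0;
    for (int i = 0; i < n && n < 100; i++)
        sum++;
    return sum;
}

int main() {
    int tests[] = {50, 99, 100, 1000000};
    for (int n : tests)
        std::printf("n = %7d -> %d iterations\n", n, iterations(n));   // 50, 99, 0, 0
}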
For more information, look up asymptotic analysis and Big-O notation.

Complexity in Dijkstra's algorithm

So I've been attempting to analyze a specialized variant of Dijkstra's algorithm that I've been working on. I'm after the worst-case complexity.
The algorithm uses a Fibonacci Heap which in the case of the normal Dijkstra would run in O(E + V log V).
However, this implementation needs to do a lookup in the inner loop where we update neighbours. This lookup executes for every edge and takes logarithmic time, where the lookup is in a data structure that contains all edges. Also, the graph has the restriction that no node will have more than 4 neighbours.
O(V log V) is the complexity of the outer loop, but I'm not sure what the worst case will be for the inner loop. I'm thinking that since the inner loop performs O(E) edge checks in total and each check takes logarithmic time, it should be E log E, which should exceed V log V and result in O(E log E) complexity for the algorithm.
Any insight would be awesome!
The amortized complexity of Decrease-Key on a Fibonacci heap is O(1); that is to say, if you have |E| such operations on the Fibonacci heap, the total cost will be O(E). You also have |V| Extract-Min operations, which cost O(log V) each. So the total cost is O(E + V log V).
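For illustration only, here is roughly where those costs live in code. The standard library has no Fibonacci heap, so this sketch uses std::priority_queue (a binary heap) with the usual lazy-deletion trick, which gives O(E log V) rather than O(E + V log V); the extra per-edge lookup you describe would sit in the inner loop:

#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Edge = std::pair<int, std::int64_t>;   // (neighbour, edge weight)

// Plain Dijkstra from `src` over an adjacency list (C++17).
std::vector<std::int64_t> dijkstra(const std::vector<std::vector<Edge>>& adj, int src) {
    std::vector<std::int64_t> dist(adj.size(), std::numeric_limits<std::int64_t>::max());
    using Item = std::pair<std::int64_t, int>;                 // (distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();                    // Extract-Min: at most |V| useful pops
        pq.pop();
        if (d != dist[u]) continue;                // stale entry, skip it
        for (auto [v, w] : adj[u]) {               // inner loop: runs once per edge overall
            if (d + w < dist[v]) {                 // any extra per-edge lookup would go here
                dist[v] = d + w;
                pq.push({dist[v], v});             // "Decrease-Key" simulated by re-inserting
            }
        }
    }
    return dist;
}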

Runtime of inserting into multi set

What is the overall runtime of inserting into a multiset? Let's say I am going over a billion elements and inserting each into a multiset, which maintains a sorted ordering. What is my worst-case runtime?
According to http://www.sgi.com/tech/stl/MultipleAssociativeContainer.html the complexity of insert is O(log n) for inserting a single element; for inserting a sequence of length N, it is O(N log n).
If you really want the time, and not the asymptotic complexity, you can time it for different values - 1,000 and 10,000, say - and then compute the constants of proportionality from there. The actual equation will be roughly t = A n log n + C.
But of course the next time you run on different hardware the values of A and C will change.
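A minimal timing harness along those lines might look like this (my own sketch; note std::multiset lives in <set>, and the absolute numbers will depend on your hardware):

#include <chrono>
#include <cstdio>
#include <random>
#include <set>

int main() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> value(0, 1000000000);
    for (std::size_t n = 100000; n <= 1600000; n *= 2) {
        std::multiset<int> ms;
        auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i)
            ms.insert(value(rng));                   // O(log n) per insertion
        auto stop = std::chrono::steady_clock::now();
        double total_ms = std::chrono::duration<double, std::milli>(stop - start).count();
        std::printf("n = %zu: %.1f ms total\n", n, total_ms);
    }
}

Fitting t = A n log n + C to two or three of these measurements then gives the constants of proportionality.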

The amortized complexity of std::next_permutation?

I just read this other question about the complexity of next_permutation and while I'm satisfied with the response (O(n)), it seems like the algorithm might have a nice amortized analysis that shows a lower complexity. Does anyone know of such an analysis?
So looks like I'm going to be answering my own question in the affirmative - yes, next_permutation runs in O(1) amortized time.
Before I go into a formal proof of this, here's a quick refresher on how the algorithm works. First, it scans backwards from the end of the range toward the beginning, identifying the longest contiguous decreasing subsequence in the range that ends at the last element. For example, in 0 3 4 2 1, the algorithm would identify 4 2 1 as this subsequence. Next, it looks at the element right before this subsequence (in the above example, 3), then finds the smallest element in the subsequence larger than it (in the above example, 4). Then, it exchanges the positions of those two elements and then reverses the identified sequence. So, if we started with 0 3 4 2 1, we'd swap the 3 and 4 to yield 0 4 3 2 1, and would then reverse the last three elements to yield 0 4 1 2 3.
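Here is a sketch of those three phases in C++ (my own illustrative code, not the actual libstdc++/libc++ implementation, and written for distinct ints):

#include <algorithm>
#include <utility>
#include <vector>

// Advance `a` to its next permutation; returns false if `a` was the last one.
bool nextPermutationSketch(std::vector<int>& a) {
    if (a.size() < 2) return false;
    // Phase 1: scan backwards for the longest decreasing suffix; i is the pivot index.
    int i = static_cast<int>(a.size()) - 2;
    while (i >= 0 && a[i] >= a[i + 1]) --i;
    if (i < 0) {                                     // whole range was decreasing
        std::reverse(a.begin(), a.end());
        return false;
    }
    // Phase 2: rightmost suffix element greater than the pivot; since the suffix is
    // decreasing, this is also the smallest element larger than the pivot.
    int j = static_cast<int>(a.size()) - 1;
    while (a[j] <= a[i]) --j;
    // Phase 3: swap them, then reverse the suffix
    // (e.g. 0 3 4 2 1 -> 0 4 3 2 1 -> 0 4 1 2 3).
    std::swap(a[i], a[j]);
    std::reverse(a.begin() + i + 1, a.end());
    return true;
}

Each phase touches only the decreasing suffix, which is exactly what the potential argument below charges against.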
To show that this algorithm runs in amortized O(1), we'll use the potential method. Define Φ to be three times the length of the longest contiguous decreasing subsequence at the end of the sequence. In this analysis, we will assume that all the elements are distinct.

Given this, let's think about the runtime of the algorithm. Suppose that we scan backwards from the end of the sequence and find that the last m elements are part of the decreasing sequence. This requires m + 1 comparisons. Next, we find, among the elements of that sequence, the smallest one larger than the element preceding the sequence; using a linear scan this takes, in the worst case, time proportional to the length of the decreasing sequence, for another m comparisons. Swapping the elements takes, say, 1 credit's worth of time, and reversing the sequence then requires at most m more operations. Thus the real runtime of this step is roughly 3m + 1.

However, we have to factor in the change in potential. After we reverse the sequence of length m, the longest decreasing sequence at the end of the range has length 1, because reversing the decreasing suffix leaves the last elements of the range sorted in ascending order. This means that our potential changed from Φ = 3m to Φ' = 3 · 1 = 3. Consequently, the net drop in potential is 3 - 3m, so the net amortized time is 3m + 1 + (3 - 3m) = 4 = O(1).
In the preceding analysis I made the simplifying assumption that all the values are unique. To the best of my knowledge, this assumption is necessary in order for this proof to work. I'm going to think this over and see if the proof can be modified to work in the case where the elements can contain duplicates, and I'll post an edit to this answer once I've worked through the details.
I am not really sure of the exact implementation of std::next_permutation, but if it is the same as Narayana Pandita's algorithm as described in the wiki here: http://en.wikipedia.org/wiki/Permutation#Systematic_generation_of_all_permutations,
then assuming the elements are distinct, it looks like it is O(1) amortized! (Of course, there might be errors in the below.)
Let us count the total number of swaps done.
We get the recurrence relation
T(n+1) = (n+1)T(n) + Θ(n²)
(n+1)T(n) comes from fixing the first element and doing the swaps for the remaining n.
Θ(n²) comes from changing the first element. At the point we change the first element, we do Θ(n) swaps. Do that n times, you get Θ(n²).
Now let X(n) = T(n)/n!
Then we get
X(n+1) = X(n) + Θ(n²)/(n+1)!
i.e. there is some constant C such that
X(n+1) ≤ X(n) + Cn²/(n+1)!
Writing down n such inequalities gives us
X(n+1) - X(n) ≤ Cn²/(n+1)!
X(n) - X(n-1) ≤ C(n-1)²/n!
X(n-1) - X(n-2) ≤ C(n-2)²/(n-1)!
...
X(2) - X(1) ≤ C·1²/2!
Adding these up gives us X(n+1) - X(1) ≤ C · (sum over j = 1 to n of j²/(j+1)!).
Since the infinite series sum over j = 1 to infinity of j²/(j+1)! converges to some constant C', say, we get X(n+1) - X(1) ≤ CC'.
Remember that X(n) counts the average number of swaps needed (T(n)/n!)
Thus the average number of swaps is O(1).
Since finding the elements to swap is linear with the number of swaps, it is O(1) amortized even if you take other operations into consideration.
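As a sanity check of the O(1) amortized claim, you can time std::next_permutation over a full cycle of n! calls (my own quick-and-dirty sketch; absolute numbers are machine-dependent, but the per-call average stays roughly flat as n grows):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    for (int n = 8; n <= 11; ++n) {
        std::vector<int> v(n);
        std::iota(v.begin(), v.end(), 0);            // start from the sorted permutation
        long long calls = 0;
        auto start = std::chrono::steady_clock::now();
        do { ++calls; } while (std::next_permutation(v.begin(), v.end()));
        auto stop = std::chrono::steady_clock::now();
        double ns = std::chrono::duration<double, std::nano>(stop - start).count();
        std::printf("n = %2d: %lld calls, %.2f ns per call on average\n", n, calls, ns / calls);
    }
}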
Here n stands for the count of elements in the container, not the total count of possible permutations. The algorithm must iterate over on the order of all the elements at each call; it takes a pair of bidirectional iterators, which implies that to get to one element the algorithm must first visit the one before it (unless it's the first or last element). A bidirectional iterator allows iterating backwards, so the algorithm can (must, in fact) perform at most half as many swaps as there are elements. I believe the standard could offer an overload for a forward iterator, which would support dumber iterators at the cost of n swaps rather than n/2 swaps. But alas, it didn't.
Of course, for n possible permutations the algorithm operates in O(1).

complexity about going from beginning to end and back through a vector

I am trying to become familiar with the complexity evaluation of algorithms. In general I think that is a good/elegant practice, but specifically I need it to express the time complexity of my C++ code.
I have a small doubt. Suppose I have an algorithm that just reads data from the beginning of a std::vector until the end; then it does the same from the end back to the beginning (so there are 2 loops over the indexes, "from 0 to N" followed by "from N to 0").
I said to myself that the complexity for this stuff is O(2N): is this correct?
Once I have reached the beginning again, suppose that I want to read all the data once more from beginning to end (passing over the vector 3 times in total): is the complexity O(3N)?
It is maybe a stupid doubt, but I would like someone's opinion on my thinking process anyway.
Big-O notation simply means:
f(n) = O( g(n) ) if and only if f(n) / g(n) remains bounded as n increases
What you have to do is count the number of operations you're performing, which is f(n), and then find a function g(n) that increases at least as fast as f.
In your example of going one way and then back, the number of operations is f(n) = 2n because each element is read twice, so you can choose g(n) = n. Since f(n) / g(n) = 2n / n = 2 is obviously bounded (it's a constant), you have an O(n) algorithm.
It's also an O(2n) algorithm, of course: since boundedness of f(n) / g(n) is not affected when you multiply g(n) by a constant, any O( g(n) ) algorithm is also by definition an O( C g(n) ) algorithm for any constant C.
And it's also an O(n²) algorithm, because 2n / n² = 2 / n decreases towards zero. Big-O notation only provides an upper bound on the complexity.
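To make the counting concrete, here is an illustrative version of the traversal in question (assuming each read takes constant time): every element is read exactly twice, so f(n) = 2n and g(n) = n works.

#include <cstddef>
#include <vector>

// Forward pass then backward pass over the vector: each element is read exactly
// twice, so f(N) = 2N reads in total, which is O(N).
long long sumThereAndBack(const std::vector<int>& v) {
    long long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i)       // N reads, "from 0 to N"
        sum += v[i];
    for (std::size_t i = v.size(); i-- > 0; )        // N more reads, "from N to 0"
        sum += v[i];
    return sum;
}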
O(N), O(2N) and O(3N) are equivalent. Multiplying the function inside the O( ) by a constant factor does not change the complexity: it is still "linear".
It is true, however, that each scan will perform N reads in either direction, i.e. it will perform 2N ∈ O(N) reads when scanning from start to end to start, and 3N ∈ O(N) reads when scanning from start to end to start to end.
It's important to get a working feel for Big-O notation. I'll try to convey that...
As you say, your algorithm intuitively is "O(2N)", but imagine someone else writes an algorithm that iterates only once (therefore clearly O(N)) but spends twice as long processing each node, or a hundred times as long. You can see that O(2N) is only very weakly suggestive of something slower than an O(N) algorithm: not knowing what the operations are, O(N) might only be faster say 50.1% of the time.
Big-O becomes meaningful only as N gets huge: if your operations vary in length by say 1000:1, then the difference between an O(N) and an O(N log N) algorithm only becomes dominant once log N itself outweighs that factor of 1000, i.e. at astronomically large N. So, Big-O notation is for reasoning about the cost of operations on large sets, in which linear factors like 2x or 10x just aren't considered relevant, and they're ignored.