Why is the complexity of heap Delete O(log n) while Build is O(n)? - heap

I am having trouble understanding complexity in general. Can you please explain why Delete in a heap is O(log n) and why Build is O(n)? (I assume the O(1) insertion complexity refers only to adding the element, without the place switching.)
Isn't the element that we add at the bottom supposed to move all the way up in the worst case? Therefore (even though I don't understand why it is log n), shouldn't Build be O(log n) as well?

Constructing a heap from an array is O(n) because it uses the build_heap algorithm to rearrange the array in place. That's fundamentally different from doing n insert operations that are O(log n) each in the worst case. Building a heap by repeated insertion is O(n log n).
Removing the highest priority item uses this algorithm:
move the last item in the heap to the root
decrease the count by 1
starting at the new root node, move it down the heap to its proper place:
while the node is larger than either of its children
swap the node with its smallest child
This is O(log n) because moving the node down the heap can potentially require O(log n) swaps.
See "How to build heap tree?" for an explanation of the build_heap algorithm.
See https://stackoverflow.com/a/49781979/56778 for an explanation of how build_heap is O(n).

Related

Are these time complexities correct?

I wanted to know if my time complexities regarding the following points along with the reasoning are correct or not.
Insertion at the end of a dynamic array is O(1) and anywhere else is
O(n) (since elements might need to be copied and moved) (resembling a std::vector)
Searching through a singly linked list is O(n) since it's linear.
Insertion and deletion in a singly linked list could either be O(1) or
O(n). It is O(1) if the address of the node is available; otherwise
it's O(n) since a linear search would be required.
I would appreciate feedback on this.
Insertion at the end of a dynamic array is O(1) and anywhere else is O(n) (since elements might need to be copied and moved) (resembling a std::vector)
The amortized time complexity of appending to a dynamic array is O(1).
https://stackoverflow.com/a/249695/1866301
Searching through a singly linked list is O(n) since it's linear.
Yes
Insertion and deletion in a singly linked list could either be O(1) or O(n). It is O(1) if the
address of the node is available; otherwise it's O(n) since a linear search would be required.
Yes; if you already hold the address of the node, you get O(1), otherwise you will have to search through the list, which is O(n). There are other variants, like the skip list, for which search is logarithmic, O(log n).
http://en.wikipedia.org/wiki/Skip_list
1) Insertion at the end of a dynamic array is amortized O(1). The reason is that, if the insertion forces a reallocation, the existing elements need to be moved to a new block of memory which is O(n). If you make sure the array is grown by a constant factor every time an allocation occurs, eventually the moves become infrequent enough to become insignificant. Inserting into the middle of a dynamic array will be O(n) to shift the elements after the insertion point over.
2) Correct; searching through a linked list, sorted or not, is O(n) because each element of the linked list will be visited during the search.
3) This is also correct. If you don't need a sorted list, insertion into a singly linked list is usually implemented as adding to the head of the list: you just update the list head to point to the new node and set the next pointer of the new node to the old head of the list. This means insertion into an unsorted singly linked list will often be quoted as O(1) without much discussion.
Take a look at this cheat sheet for algorithm complexity; I use it as a reference.

std::set erase complexity anomaly?

I am trying to figure out the complexity of erasing multiple elements from std::set. I am using this page as source.
It claims that the complexity for erasing a single item using an iterator is amortized O(1), but erasing multiple items using the range form is log(c.size()) + std::distance(first, last) (i.e. - log of the set's size + the number of elements deleted).
Taken at face value, if the number of elements to be erased (n) is much smaller than the number of elements in the set (m), this means that looping over the elements to be erased and erasing them one at a time is quicker (O(n)) than erasing them with one call (O(log m) assuming n<<m).
Obviously, had that really been the case, the internal implementation of the second form would just do the above loop.
Is this an error at the site? A bug in the specs? Am I just missing something?
Thanks,
Shachar
Internally, the elements of a set are stored in a balanced binary tree. A balanced tree is a tree in which the maximal height difference between the left and right subtrees of any node is 1.
Maintaining the balanced structure is important to guarantee that searching for any element in the tree (in the set) takes at worst O(log n) steps.
Removal of an element may destroy the balance. To restore balance, rotations must be performed. In some cases a single removal causes several rotations, so that the operation takes O(log n) steps, but on average a removal takes O(1) steps.
So, when several elements scattered over the set must be deleted one by one, the amortized cost with high probability will be O(1) per removal.
Removing several elements in a range (first, last, where one element follows the next) will almost certainly destroy the balance, which causes the log factor in the complexity: log(n) + std::distance(first, last)
It seems the problem is hiding behind the (somewhat weasel) word "amortized". A single-item erase has worst-case complexity O(log(c.size())), but amortized complexity O(1).
Performing multiple single erases in a loop will thus cost log(c.size()) + the number of erases, which is exactly the range form's complexity.
Shachar

Looking for clarification on Hashing and BST functions and Big O notation

So I am trying to understand the data types and Big O notation of some functions for a BST and Hashing.
So first off, how are BSTs and Hashing stored? Are BSTs usually arrays, or are they linked structures, since they have to point to their left and right children?
What about Hashing? I've had the most trouble finding clear information regarding Hashing in terms of computation-based searching. I understand that Hashing is best implemented with an array of chains. Is this for faster searching or to decrease overhead on creating the allocated data type?
The following question might just be bad interpretation on my part, but what makes a traversal function different from a search function in BSTs, Hashing, and STL containers?
Is traversal O(N) for BSTs because you're actually visiting each node/data member, whereas search() can reduce its time by eliminating half the search field at each step?
And somewhat related, why is it that in the STL, list.insert() and list.erase() have a Big O(1) whereas the vector and deque counterparts are O(N)?
Lastly, why would vector.push_back() be O(N)? I thought it could be done in O(1) with something along these lines, but I've come across text saying it is O(N):
vector<int> vic(2,3);
vector<int>::const_iterator IT = vic.end();
//wanna insert 4 to the end using push_back
IT++;
(*IT) = 4;
Hopefully this works. I'm a bit tired, but I would love any explanation of why something similar to that wouldn't be efficient or plausible. Thanks.
BSTs (ordered binary trees) are a series of nodes where a parent node points to its two children, which in turn point to at most two children of their own, etc. They're traversed in O(n) time because traversal visits every node. Lookups take O(log n) time. Inserts take O(log n) time to find the insertion point, but the insertion itself is cheap: internally there is no need to shift a bunch of existing nodes; just allocate some memory and re-aim the pointers. :)
Hashes (unordered_map) use a hashing algorithm to assign elements to buckets. Usually buckets contain a linked list, so that hash collisions just result in several elements in the same bucket. Traversal will again be O(n), as expected. Lookups and inserts will be amortized O(1). Amortized means on average: an individual insert might trigger a rehash (a redistribution of buckets to minimize collisions), but over time the average complexity is O(1). Note, however, that big-O notation doesn't really deal with the "constant" aspect, only the order of growth. The constant overhead in the hashing algorithms can be high enough that for some data sets the O(log n) binary trees outperform the hashes. Nevertheless, the hash's advantage is that its operations have amortized constant time complexity.
Search functions take advantage (in the case of binary trees) of the notion of "order"; a search through a BST has the same characteristics as a basic binary search over an ordered array. O(log n) growth. Hashes don't really "search". They compute the bucket, and then quickly run through the collisions to find the target. That's why lookups are constant time.
As for insert and erase; in array-based sequence containers, all elements that come after the target have to be bumped over to the right. Move semantics in C++11 can improve upon the performance, but the operation is still O(n). For linked sequence containers (list, forward_list, trees), insertion and erasing just means fiddling with some pointers internally. It's a constant-time process.
push_back() will be O(1) until you exceed the existing allocated capacity of the vector. Once the capacity is exceeded, a new allocation takes place to produce a container that is large enough to accept more elements. All the elements need to then be moved into the larger memory region, which is an O(n) process. I believe Move Semantics can help here as well, but it's still going to be O(n). Vectors and strings are implemented such that as they allocate space for a growing data set, they allocate more than they need, in anticipation of additional growth. This is an efficiency safeguard; it means that the typical push_back() won't trigger a new allocation and move of the entire data set into a larger container. But eventually after enough push_backs, the limit will be reached, and the vector's elements will be copied into a larger container, which again has some extra headroom left over for more efficient push_backs.
Traversal refers to visiting every node, whereas search is only to find a particular node, so your intuition is spot on there. O(N) complexity because you need to visit N nodes.
std::vector::insert is for inserting in the middle, and it involves copying all subsequent elements over by one slot in order to make room for the element being inserted, hence O(N). A linked list doesn't have this issue, hence O(1). Similar logic applies to erase. deque has properties similar to vector here.
std::vector::push_back is an O(1) operation for the most part; it only deviates when capacity is exceeded and a reallocation plus copy is needed.

Efficiency of the STL priority_queue

I have an application (C++) that I think would be well served by an STL priority_queue. The documentation says:
Priority_queue is a container adaptor, meaning that it is implemented on top of some underlying container type. By default that underlying type is vector, but a different type may be selected explicitly.
and
Priority queues are a standard concept, and can be implemented in many different ways; this implementation uses heaps.
I had previously assumed that top() is O(1), and that push() would be a O(logn) (the two reasons I chose the priority_queue in the first place) - but the documentation neither confirms nor denies this assumption.
Digging deeper, the docs for the Sequence concept say:
The complexities of single-element insert and erase are sequence dependent.
The priority_queue uses a vector (by default) as a heap, which:
... supports random access to elements, constant time insertion and removal of elements at the end, and linear time insertion and removal of elements at the beginning or in the middle.
I'm inferring that, using the default priority_queue, top() is O(1) and push() is O(n).
Question 1: Is this correct? (top() access is O(1) and push() is O(n)?)
Question 2: Would I be able to achieve O(logn) efficiency on push() if I used a set (or multiset) instead of a vector for the implementation of the priority_queue? What would the consequences be of doing this? What other operations would suffer as a consequence?
N.B.: I'm worried about time efficiency here, not space.
The priority queue adaptor uses the standard library heap algorithms to build and access the queue - it's the complexity of those algorithms you should be looking up in the documentation.
The top() operation is obviously O(1) but presumably you want to pop() the heap after calling it which (according to Josuttis) is O(2*log(N)) and push() is O(log(N)) - same source.
And from the C++ Standard, 25.6.3.1, push_heap :
Complexity: At most log(last - first) comparisons.
and pop_heap:
Complexity: At most 2 * log(last - first) comparisons.
top() - O(1), as it just returns the element at the front.
push() -
insert into vector - O(1) amortized
push into heap - at most log(n) comparisons, i.e. O(log n)
so push() complexity is O(log n)
No. This is not correct. top() is O(1) and push() is O(log n). Read note 2 in the documentation to see that this adapter does not allow iterating through the vector. Neil is correct about pop(), but still this allows working with the heap doing insertions and removals in O(log n) time.
If the underlying data structure is a heap, top() will be constant time, and push (EDIT: and pop) will be logarithmic (like you are saying). The vector is just used to store these things like this:
HEAP:
1
2 3
8 12 11 9
VECTOR (used to store)
1 2 3 8 12 11 9
You can think of it this way: the children of the element at position i (counting from 1) are at positions 2i and 2i+1.
They use the vector because it stores the data sequentially (which is much more efficient and cache-friendly than discrete node allocations).
Regardless of how it is stored, a heap should always be implemented (especially by the gods who made the standard library) so that top is constant time while push and pop are logarithmic.
The C++ STL priority_queue's underlying data structure is a heap. A heap is a non-linear ADT based on a complete binary tree, and the complete binary tree is implemented through a vector (or array) container.
Example input data: 5 9 3 10 12 4.
Heap (Considering Min heap) would be :
[3]
[9] [4]
[10] [12] [5]
Now we store this min heap into a vector:
[3][9][4][10][12][5].
Using the formulas (for 0-based index n):
Parent: floor of (n-1)/2
Left child: 2n+1
Right child: 2n+2.
Hence, the time complexities are:
top() = O(1): get the element from the root node.
pop() = O(log n): deleting the root node may violate the heap order, so restoring it takes at most O(log n) time (an element might move down the height of the tree).
push() = O(log n): insertion may also violate the heap order, so restoring it takes at most O(log n) time (an element might move up from a leaf node to the root).
Q1: I didn't look in the standard, but AFAIK, using vector (or deque btw), the complexity would be O(1) for top(), O(log n) for push() and pop(). I once compared std::priority_queue with my own heap with O(1) push() and top() and O(log n) pop() and it certainly wasn't as slow as O(n).
Q2: set is not usable as the underlying container for priority_queue (it is not a Sequence), but it would be possible to use a set to implement a priority queue with O(log n) push() and pop(). However, this probably wouldn't outperform the std::priority_queue-over-std::vector implementation.

Data structure (in STL or Boost) for retrieving kth smallest/largest item in a set?

I am looking for a data structure in C++ STL or boost with the following properties:
Retrieval of kth largest item in O(log n) time
Searching in O(log n) time
Deletion in O(log n) time
If such a data structure implementation doesn't exist, is there a way to adapt a different data structure with extra data (e.g., set) so that the above is possible?
Note: I've found is-there-any-data-structure-in-c-stl-for-performing-insertion-searching-and-r, but this is 5 years old and doesn't mention boost.
For the moment I assume that the elements are unique and that there are at least k elements. If not, you can use multiset similarly.
You can accomplish this using two sets in C++:
#include <set>
Set 1: Let's call this large. It keeps the k largest elements only.
Set 2: Let's call this rest. It keeps the rest of the elements.
Searching: Just search both sets; this takes O(log n) since both sets are red-black trees.
Deleting: If the element is in rest, just delete it. If not, delete it from large, and then remove the largest element from rest and put it into large. Deleting from red-black tree takes O(log n).
Inserting new elements (initializing): Each time a new element comes: (1) if large has fewer than k elements, add it to large; (2) otherwise, if the element is greater than the minimum element in large, remove that minimum, move it to rest, and add the new element to large; (3) if neither of the above, simply add the new element to rest. Deleting and inserting in red-black trees takes O(log n).
This way, large always has the k largest elements, and the minimum of those is the k-th largest, which is what you want.
I leave it to you to find how you can do search, insert, find min, find max, and delete in a set. It's not that hard. But all of these operations take O(log n) on a balanced binary search tree.