About the make_heap algorithm in C++

http://www.cplusplus.com/reference/algorithm/make_heap/
At this link, it says:
Internally, a heap is a tree where each node links to values not greater than its own value. In heaps generated by make_heap, the specific position of an element in the tree, rather than being determined by memory-consuming links, is determined by its absolute position in the sequence, with *first being always the highest value in the heap.
about "is determined by its absolute positon in the sequence" .
I confused here.
It also says "a heap is a tree where each node linkes to values not greater than its own value"
Do those 2 sentence contradict? SO confused here.
What exactly tree is for a heap in C++?
Wish any kind person can help me out
Thanks a lot

What this says is that a heap has a typical tree-like structure, where each 'parent' node's value is greater than or equal to those of its 'child' nodes ("...where each node links to values not greater than its own value...").
It then goes on to say that instead of using links (i.e. pointers in, say, a struct, like you would use for a linked list), it lays the elements out in contiguous memory, i.e. an array ("...is determined by its absolute position in the sequence...").
*first is the first element (or the largest/smallest, depending on the comparator function) on the heap, and is always at the [0]th index of the array. For each index i, the children are located at [2*i+1] and [2*i+2].
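For illustration, here is a minimal sketch (the values are just examples) showing that after std::make_heap the largest element sits at *first and every parent dominates its children at 2*i+1 and 2*i+2:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{3, 1, 4, 1, 5, 9, 2, 6};
    std::make_heap(v.begin(), v.end());          // max-heap by default
    std::cout << "top: " << v.front() << '\n';   // prints 9
    // Every parent at index i is >= its children at 2*i+1 and 2*i+2.
    for (std::size_t i = 0; 2 * i + 1 < v.size(); ++i) {
        std::cout << v[i] << " >= " << v[2 * i + 1];
        if (2 * i + 2 < v.size())
            std::cout << " and >= " << v[2 * i + 2];
        std::cout << '\n';
    }
}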
Hope this helps.

If you look at heap implementations, you see that the tree is implemented as an array. You can find the values below a node at index i at indexes 2*i + 1 and 2*i + 2. So it is a tree, where you can access the elements by their absolute position in the array.
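As a quick check of that layout, std::is_heap verifies the parent/child relation over the array directly (a small sketch with example values):

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v{9, 5, 4, 1, 1, 3, 2};     // a valid max-heap layout
    bool ok = std::is_heap(v.begin(), v.end());  // true: v[i] >= v[2*i+1] and v[2*i+2]
    return ok ? 0 : 1;
}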

Related

Lower bound in Set STL in c++

I understand that set is implemented as a tree structure internally in C++. So how does lower_bound get performed on it? I mean, I understand for vector that you pick the middle element using the start and end indexes and perform binary search, but how does it get implemented for tree-like structures?
Finding the lower_bound in a set is almost the same as searching for an element in the set. At the end of the search (navigating the tree) you've either found the node where the element is, or you find an element that is greater than the one you're looking for, but there aren't any nodes with a lower value in its subtree (so this is where you'd want to insert the value if you were adding it, as the left child of this node).
Either way you've found the lower bound for the element.
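As a sketch of that search over a plain BST node structure (the names here are illustrative, not the actual std::set internals):

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the leftmost node whose key is >= target, or nullptr if none exists.
Node* lowerBound(Node* root, int target) {
    Node* candidate = nullptr;
    while (root != nullptr) {
        if (root->key < target) {
            root = root->right;   // root and its left subtree are too small; go right
        } else {
            candidate = root;     // possible answer; look left for a smaller one
            root = root->left;
        }
    }
    return candidate;
}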

Binary Search Tree and Max Heap at the same time (C++)

[EDITED!!!] Having read about different data structures and created many of them (in C++), I was just wondering how we could make a data structure where each node would be a pair of keys (x,y), where x would refer to a value of a Max Heap and y to a key of a Binary Search Tree. I would like something like a BST and Max Heap at the same time (using tuples or pairs of keys as nodes each time). To make it more clear, technically I mean that in each node i of the tree a pair of keys (x,y) will be stored, where x is the priority of the key and y is the value of the key.
It should be able to support all the functionality of the above-mentioned data structures, such as insertion and deletion. For example, regarding deletion, the tuple is going to move consecutively deeper by a sequence of simple rotations, until the tuple becomes a leaf; then the deletion is easy, as you know. If the tuple we would like to delete is a leaf or an inner node, the deletion could be done in the same way as in BSTs.
Regarding insertion, a tuple will be inserted into the tree based only on the key of the binary search tree. After that, the pair is going to be moved consecutively higher in the tree until the fundamental property of max heaps is no longer violated.
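(What's described here is essentially a treap. A minimal sketch of that insertion, a BST insert on y followed by rotations that restore the heap order on x, could look like this; all names are illustrative:)

struct Node {
    int x;                  // max-heap priority
    int y;                  // BST key
    Node* left = nullptr;
    Node* right = nullptr;
};

Node* rotateRight(Node* p) {
    Node* l = p->left;
    p->left = l->right;
    l->right = p;
    return l;
}

Node* rotateLeft(Node* p) {
    Node* r = p->right;
    p->right = r->left;
    r->left = p;
    return r;
}

Node* insert(Node* root, int x, int y) {
    if (root == nullptr) return new Node{x, y};
    if (y < root->y) {
        root->left = insert(root->left, x, y);
        if (root->left->x > root->x)
            root = rotateRight(root);   // pull the higher priority up
    } else {
        root->right = insert(root->right, x, y);
        if (root->right->x > root->x)
            root = rotateLeft(root);
    }
    return root;
}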
Moreover, I have some extra functionality in mind. An additional function could be something like find_second_next(), taking as argument the x key, which is already in the tree; this function will find the second smallest among all the y keys of the tree which are greater than x.
Another function could be a print_between(k1,k2) as well. This function will print all y keys of the tree having value in the range [k1,k2]. Finally, I would like to have also a print_with_higher_priority(x) function which will print all the x keys of the tree which are greater than x.
If you have some additional functionalities write them! :D
I am looking forward to seeing your contribution to this question!

Is it possible to implement a binary heap that is both a max and a min heap?

I'm trying to implement a binary heap (priority queue) that has the capabilities of both a min heap and a max heap. It needs to have an insert(value), extractMin(), and an extractMax() method. The extract methods remove the value from the heap and return the value.
I was originally using two arrays, called minHeap and maxHeap, one to store the data in a min heap structure, and the other to store the same data in a max heap structure. So when I call extractMin(), it removes and returns the value from minHeap. Then I have to remove that value from maxHeap as well (and vice-versa if I called extractMax()) in order to keep the data set identical in both heaps. And because of the heap-order property, it's guaranteed that I'll find that value in the leaves of the other heap. Searching for that value in the other heap results in a time complexity of O(n) or, more precisely, O(n/2), since I'll only be searching the leaves. Not to mention, the percolatingDown() and percolatingUp() methods to restore the heaps after removing values are already O(log n); so in total, the extract methods would be O(n). The problem is, I need the extract methods to be O(log n).
Is there a better way to go about this?
I also thought of this idea but wanted to know what you all think first.
I just finished coding a "median heap" by placing the smaller half of the data in the max heap and the larger half in the min heap. With that structure, I'm able to easily retrieve the median of a given set of values. And I was thinking of using a similar structure of placing the smaller half of the data in the min heap and the larger half in the max heap and using the mean (rather than the median) of all the values to be the deciding factor of whether to place the value in the max or min heap when calling insert(value). I think this might work as the extract methods would stay O(log n).
The simple way is to just use a binary search tree, as M. Shaw recommends.
If you're required to build this on top of binary heaps, then in each heap, alongside each element, store the element's position in the other heap. Every time you move an element in one heap, you can go straight to its position in the other heap and update it. When you perform a delete-min or delete-max, no expensive linear scan in the other heap is required.
For example, if you store std::pairs with first as the element value and second as the position in the other heap, swapping two elements in the min-heap while updating their counterparts in the max-heap might look like this:
// Swap two elements of the min-heap, then follow each element's stored
// position into the max-heap and update the counterparts' back-references.
std::swap(minheap[i], minheap[j]);
maxheap[minheap[i].second].second = i;
maxheap[minheap[j].second].second = j;
You can create a hash table for the heap elements, which is shared by two heaps. The table is indexed by the value of the heap element. The value of the hashed bucket can be a struct consisting of the array index in minHeap and maxHeap respectively.
The benefit of this approach is that it is non-intrusive, meaning that the structure of the heap elements remains the same. And you don't have to create the heaps side by side; you can create one after the other with the usual heap creation procedure.
E.g.,
struct tIndex
{
    // Array index of the element in the two heaps respectively
    size_t minIndex;
    size_t maxIndex;
};

std::unordered_map<int, tIndex> m;
Note that any change to a heap may change the underlying array index of existing elements. So when you add or remove an element, or swap two elements, you may need to update the array indices in the hash table accordingly.
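For example, a swap inside minHeap that keeps the shared table consistent might look like this (a sketch assuming the declarations above, int-valued elements, and the usual <vector>/<unordered_map>/<utility> includes):

void swapInMinHeap(std::vector<int>& minHeap,
                   std::unordered_map<int, tIndex>& m,
                   size_t i, size_t j) {
    std::swap(minHeap[i], minHeap[j]);
    m[minHeap[i]].minIndex = i;   // re-point both entries at their new slots
    m[minHeap[j]].minIndex = j;
}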
You're close. The trick is to use another level of indirection. Keep the keys in an array K[i] and store only indices i in the heaps. Also keep two reverse maps: one for the max heap and one for the min. A reverse map is an array of integers R such that R[i] is the location in the min (or max) heap of the index i for key K[i]. In other words, if M[j] is the min (or max) heap, then R[M[j]] = j; Now whenever you do a sifting operation to move elements around in a heap, you must update the respective reverse map at the same time. In fact it works just like the relation above. At every step where you change a heap element M[j] = z, also update the reverse map R[z] = j; This increases run time by only a small constant factor. Now to delete K[i] from the heap, you can find it in constant time: It's at M[R[i]]. Sift it up to the root and remove it.
I know this works (finding a heap object to delete in constant time) because I've implemented it as part of a bigger algorithm. Check out https://github.com/gene-ressler/lulu/blob/master/ext/lulu/pq.c . The larger algorithm is for map marker merging: https://github.com/gene-ressler/lulu/wiki
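A compressed sketch of that bookkeeping (illustrative, not the code from the linked repository; assumes <vector> and <utility>):

// K: key storage; M: heap of indices into K; R: reverse map with R[M[j]] == j.
void heapSwap(std::vector<size_t>& M, std::vector<size_t>& R, size_t a, size_t b) {
    std::swap(M[a], M[b]);
    R[M[a]] = a;   // maintain the invariant R[M[j]] == j
    R[M[b]] = b;
}
// Deleting K[i]: its heap position is R[i], found in constant time; sift
// M[R[i]] to the root and remove it as usual.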
http://www.geeksforgeeks.org/a-data-structure-question/
A min-max heap, I would say, is the answer, as pointed out by "user2357112", if the most frequent operations are findMin and findMax. A BST might be overkill if we don't really need a totally ordered data structure; the above is a partially ordered data structure. See the link posted above.

Finding corruption in a linked list

I had an interview today for a developer position and was asked an interesting technical question that I did not know the answer to. I will ask it here to see if anyone can provide a solution, for my curiosity. It is a multi-part question:
1) You are given a singly linked list with 100 elements (an integer and a pointer to the next node). Find a way to detect a break or corruption halfway through the linked list. You may do anything with the linked list. Note that you must do this as the list is being iterated; this is verification before you realise that the list has any issues with it.
Assuming that the break in the linked list is at the 50th element, the integer, or even the pointer to the next node (the 51st element), may hold a garbage value which is not necessarily an invalid address.
2) Note that if there is a corruption in the linked list, how would you minimize data loss?
To test for a "corrupted" integer, you would need to know what the range of valid values is. Otherwise, there is no way to determine that the value in any given (signed) integer is invalid. So, assuming you have a validity test for the int, you would always check that value before iterating to the next element.
Testing for a corrupted pointer is trickier. For a start, what you need to do is check the value of the pointer to the next element before you attempt to de-reference it, and ensure it is a valid heap address. That will avoid a segmentation fault. The next thing is to validate that what the pointer points at is in fact a valid linked list node element; that's a bit trickier. Perhaps de-reference the pointer into a list element class/struct, and test the validity of the int and "next" pointer; if they are also good, then you can be pretty sure the previous node was good also.
On 2), having discovered a corrupted node, if the next pointer is corrupted, what you should do is set the "next" pointer of the previous node to NULL immediately, marking it as the end of the list, and log your error, etc. If the corruption was just to the integer value, but not to the "next" pointer, then you should remove that element from the list and link the previous and following nodes together instead, as there is no need to throw the rest of the list away in that case!
For the first part: overload the new operator. Whenever a new node is allocated, allocate some additional space before and after the node and put some known values there. During traversal, each node can be checked to see whether it still lies between the known values.
If you know at design time that corruption may become a critical issue, you could add a "magic value" as a field in the node data structure, which allows you to identify whether some data is likely to be a node or not, or even to run through memory searching for nodes.
Or double some link information, i.e. store the address of the node after the next node in each node, such that you can recover if one link is broken.
The only problem I see is that you have to avoid segmentation faults.
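A sketch of the magic-value and doubled-link ideas combined (the constant and field names are made up for illustration):

struct Node {
    unsigned magic;      // e.g. 0xC0FFEE42; any other value suggests corruption
    int value;
    Node* next;
    Node* afterNext;     // redundant copy of next->next, for recovery
};

bool looksValid(const Node* n) {
    return n != nullptr && n->magic == 0xC0FFEE42;
}

// During traversal: if looksValid(cur->next) fails but looksValid(cur->afterNext)
// passes, skip the corrupted node and continue, losing only one element.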
If you can do anything to the linked list, you can calculate a checksum of each element and store it on the element itself. This way you will be able to detect corruption even if it's a single-bit error in the element.
To minimize data loss, perhaps you can consider also storing the nextPtr in the previous element; that way, if your current element is corrupted, you can still find the location of the next element from the previous one.
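A minimal sketch of the checksum idea (the hash combination here is just illustrative):

#include <cstddef>
#include <functional>

struct Node {
    int value;
    Node* next;
    std::size_t checksum;   // covers value and next; refresh on every update
};

std::size_t computeChecksum(const Node& n) {
    return std::hash<int>{}(n.value) ^ std::hash<const void*>{}(n.next);
}

bool isIntact(const Node& n) {
    return computeChecksum(n) == n.checksum;
}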
This is an easy question, and there are several possible answers. Each trades off robustness against efficiency. Since increased robustness is a prerequisite of the question being asked, there are solutions available which sacrifice either time (list traversal speed, as well as speed of insertion and deletion of nodes) or space (extra info stored with each node). Now, the problem states that this is a fixed list of length 100, in which case the data structure of a linked list is most inappropriate. Why not make the puzzle a little more challenging and say that the size of the list is not known a priori?
Since the number of elements (100) is known, the 100th node must contain a null pointer. If it does, the list is valid with good probability (this cannot be guaranteed if, for example, the 99th node is corrupt and points to some memory location that is all zeros). Otherwise, there is some problem (and this can be returned as a fact).
Update: it might also be possible, at every step, to look at the structures delete would use if given the pointer; but since using delete itself is not safe in any sense, this is going to be implementation-specific.
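A sketch of the null-terminator check (with the same caveat that a corrupted pointer may still crash the walk or pass by accident):

struct Node { int value; Node* next; };

bool plausiblyValid(const Node* head) {
    int count = 0;
    while (head != nullptr && count < 100) {
        head = head->next;
        ++count;
    }
    // Exactly 100 nodes, and the 100th node's next is null.
    return count == 100 && head == nullptr;
}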

Indexing: Implementing Tree data structures with Arrays/Vectors

I have been implementing a heap in C++ using a vector. Since I have to access the children of a node (2n, 2n+1) easily, I had to start at index 1. Is this the right way? As per my implementation, there is always a dummy element at the zeroth location.
Your way works. Alternatively, you can have the root at index 0 and the children at 2n+1 and 2n+2.
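With 0-based indexing, the helpers are just (sketch):

#include <cstddef>

inline std::size_t parent(std::size_t i)     { return (i - 1) / 2; }
inline std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
inline std::size_t rightChild(std::size_t i) { return 2 * i + 2; }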
While this works well for heaps, you end up using a huge amount of redundant memory for other tree data structures that do not necessarily have a full and complete binary tree. For example, this means that if you have a binary search tree of 20 nodes with a depth of 5, you end up having to use an array of size 2^5 = 32 instead of 20. Now imagine if you need a tree of 25 nodes with a depth of 22. You end up using a huge array of size 2^22 = 4194304, whereas you could have used a linked representation to store just the 25 nodes.
You can still use an array and not incur such a memory hit. Just allocate a large block of memory as an array and use array indices as pointers to the children.
Thus, where you had
node.left = (node.index*2)
node.right = (node.index*2+1)
You simply use
node.left = <index of left child>
node.right = <index of right child>
Or you can just use pointers/references instead of integer indices to an array if your language supports it.
Edit:
It might not be obvious to everyone that an array-based layout of a binary tree of depth d takes up O(2^d) memory. There are d levels, and every level has twice as many nodes as the level above it (because every node except those at the bottom has exactly two children, never one). A binary heap is a binary tree (but not a Binary Search Tree) that is always complete by definition, so an array-based implementation as outlined by the OP does not incur any real memory overhead. For a heap, that is the best way to implement it in code. OTOH, most other binary trees (esp. Binary Search Trees) are not guaranteed to be complete, so trying to use this approach on them would need O(2^depth) memory, where depth can be as large as n, whereas we only need O(n) memory in a linked implementation.
So my answer is: yes, this is the best way for a heap. Just don't try it for other binary trees (unless you're sure they will always be complete).