Binary heap - how and when to use max-heapify - C++

I'm reading about the heap data structure, and I can't figure out when to use the max-heapify function and why.
I wrote an insert function that always keeps the heap a max-heap, and I can't see when max-heapify would ever be used.
Can you please explain?
Thank you.
This is my code:
int PARENT(int i) {
    return i / 2;
}
int LEFT(int i) {
    return 2 * i;
}
int RIGHT(int i) {
    return 2 * i + 1;
}
void max_heapify(int *v, int index, int heapsize) {
    int largest;
    int left = LEFT(index);
    int right = RIGHT(index);
    if (left < heapsize && v[left] > v[index])
        largest = left;
    else
        largest = index;
    if (right < heapsize && v[right] > v[largest])
        largest = right;
    if (largest != index) {
        v[index] = v[index] ^ v[largest];
        v[largest] = v[index] ^ v[largest];
        v[index] = v[index] ^ v[largest];
        max_heapify(v, largest, heapsize);
    }
}
void insert(int *v, int *length, int value) {
    v[++*length] = value;
    int valuePos = *length;
    int parent = PARENT(valuePos);
    if (parent != valuePos) {
        while (v[parent] < v[valuePos]) {
            v[parent] = v[parent] ^ v[valuePos];
            v[valuePos] = v[parent] ^ v[valuePos];
            v[parent] = v[parent] ^ v[valuePos];
            valuePos = parent;
            parent = PARENT(valuePos);
        }
    }
}

The heapify algorithm should be used when turning an array into a heap. You could do that by inserting each array element in turn into a new heap, but that would take O(n lg n) time, while heapify does it in O(n) time.
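As a rough sketch of that build step (assuming the heap occupies v[1..heapsize], matching the 1-based PARENT/LEFT/RIGHT helpers above; with that layout the child checks in max_heapify would compare with <= heapsize rather than < heapsize):
// Build a max-heap from an arbitrary array in O(n): fix every internal node,
// deepest first. Leaves (indices greater than heapsize / 2) need no work.
void build_max_heap(int *v, int heapsize) {
    for (int i = heapsize / 2; i >= 1; --i)
        max_heapify(v, i, heapsize);
}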

max_heapify is meant to be invoked on a regular array to turn it into a heap. insert, on the other hand, does maintenance work, and it requires the array (v in your function) to already be a heap.

The max-heapify function, as you call it, is a general heapify function (a heap can use any valid comparison function for ordering its elements). It is intended to be used as an init function for constructing a heap from an array.
The complexities of the functions for dealing with a heap (with their intended usages):
init (max-heapify): O(n), used to build a heap out of an arbitrary array (a max-heap, in your case)
insert: O(lg n), used to insert a single element into a heap (maintains the heap property)
delete: O(lg n), used to remove the "best" (max, in your case) element from a heap (maintains the heap property)
But since this question is tagged C++, you should also consider using std::set from the STL instead of implementing your own heap. The complexities of the operations above are the same as for any heap implementation, and it can easily operate with any comparison function (either pre-written or user-written). Another advantage over a heap implementation is that it is a sorted container, so you can easily iterate through all the elements in sorted order (not just the first one) without destroying the structure.
The only problem with std::set is that it is a unique container, meaning only one copy of an element with a given key can exist in it. But there is a solution for that too: std::multiset keeps sorted instances of multiple objects with the same key.
Also, depending on your required usage (if there is a lot of data associated with the search key), you might also want to try std::map or std::multimap.
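For illustration, a minimal sketch of using std::multiset in place of a hand-written max-heap (the values and names here are just examples):
#include <iostream>
#include <iterator>
#include <set>
int main() {
    std::multiset<int> heap;             // keeps duplicates, stays sorted
    heap.insert(5);                      // O(lg n) per insertion
    heap.insert(9);
    heap.insert(9);
    heap.insert(3);
    std::cout << *heap.rbegin() << '\n'; // largest element, like the root of a max-heap
    heap.erase(std::prev(heap.end()));   // remove a single instance of the largest element
    for (int x : heap)                   // iterate in sorted order without destroying anything
        std::cout << x << ' ';
}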
If you want to make your own heap implementation, I would strongly suggest putting it in a separate class (or even a namespace) if your intention is to use C++ to the fullest. If you just intend to keep the implementation in its current form, you should consider re-tagging the question as C.

You can insert the data into the heap in arbitrary order, just as into a plain array. Afterwards you can call the max-heapify function to restore the max-heap property. Here is my code:
class max_heap {
private:
    int *arr;  // backing storage for the heap
    int size;
    int ind;   // ind - 1 is the index of the last element in use
public:
    void bubbledown(int *ar, int i);
    void heapify();
};
void max_heap::bubbledown(int *ar, int i)
{
    int len = ind - 1;  // index of the last element in use
    int lt = 2 * i;     // left child
    int rt = lt + 1;    // right child
    while (lt <= len && rt <= len)
    {
        // stop once the node is at least as large as both of its children
        if (ar[i] >= ar[lt] && ar[i] >= ar[rt])
            break;
        // otherwise swap with the larger child and continue from there
        if (ar[lt] >= ar[rt]) {
            swap(ar[i], ar[lt]);
            i = lt;
        }
        else {
            swap(ar[i], ar[rt]);
            i = rt;
        }
        lt = 2 * i;
        rt = lt + 1;
    }
    // the node may still have a left child without a right child
    if (lt <= len && ar[i] < ar[lt])
        swap(ar[i], ar[lt]);
}
void max_heap::heapify()
{
    int len = ind - 1;
    // walk the nodes from the bottom up, pulling larger children above their parents
    for (int i = len; i >= 1 && (i / 2) >= 1; i--)
    {
        if (arr[i] > arr[i / 2])
        {
            swap(arr[i], arr[i / 2]);
            bubbledown(arr, i);  // restore the heap property below the node that moved down
        }
    }
}


Why does the loop start with i = n/2 when doing heap sort?

I need to change max-heap code into min-heap code. I changed some parts, but when I print the result, I only get the min-heap array in heap order, not sorted.
#include <iostream>
#include <fstream>
#include <cstdio>
#define MAX_TREE 100
using namespace std;
typedef struct {
    int key;
} element;
element a[MAX_TREE];
void SWAP(element root, element target, element temp) {
    root = target;
    target = temp;
}
void adjust(element e[], int root, int n) { // adjust the tree given as an array into a min heap
    /* adjust the binary tree to establish the heap */
    int child, rootkey;
    element temp;
    temp = a[root];          // root element
    rootkey = a[root].key;   // root element's key value
    child = 2 * root;        // left child of root (a[i])
    // leftChild: i * 2 (if i * 2 <= n), rightChild: i * 2 + 1 (if i * 2 + 1 <= n)
    while (child <= n) {     // while a child exists
        if ((child < n) &&   // compare left child with right child
            (a[child].key > a[child + 1].key))
            // if leftChild.key > rightChild.key
            child++;         // move to the smaller child
        if (rootkey < a[child].key)  // if it satisfies the min heap property
            break;           // break when the root key is smaller than the child's key
        else {               // if it doesn't satisfy the min heap property
            a[child / 2] = a[child];  // move the child up to the parent
            child *= 2;      // loop until there's no child
        }
    }
    a[child / 2] = temp;     // when there's no more child, place the root element at child/2
}
void heapSort(element a[], int n) {
    /* perform a heap sort on a[1:n] */
    int i;
    element temp;
    temp = a[1];
    for (i = n / 2; i > 0; i--) {  // <- This is the part I don't understand
        adjust(a, i, n);
    }
    for (i = n - 1; i > 0; i--) {
        SWAP(a[1], a[i + 1], temp);
        adjust(a, 1, i);
    }
}
void P1() {
    int n;
    std::fstream in("in1.txt");
    in >> n;
    printf("\n\n%d\n", n);
    for (int i = 1; i <= n; i++) {
        element temp;
        in >> temp.key;
        a[i] = temp;
    }
    heapSort(a, n);
    // 6 5 51 3 19 52 50
}
int main() {
    P1();
}
This is my professor's example code. I need to read numbers from the file in1.txt.
That file contains the values n, m1, m2, m3, ...
n is the number of key values that will follow; the following m1, m2, ... are the key values of each element.
After reading the input, I store the integers in an array that starts at index [1]: it's a binary tree represented as an array.
I need to min-heapify this binary tree and apply heapsort.
The code was originally max-heap sort code, so there may be some lines I missed changing.
I don't get why the for statement needs to start with i = n/2. What's the reason?
Why does the for statement start with i = n/2?
This is the part I don't understand.
This loop:
for (i = n / 2; i > 0; i--) {
    adjust(a, i, n);
}
... is the phase where the input array is made into a heap. The algorithm calls adjust for every internal node of the binary tree, starting with the "last" of those internal nodes, which sits at index n/2. This is Floyd's heap construction algorithm.
There would be no benefit in calling adjust on indices greater than n/2: those indices are leaves of the binary tree, and there is nothing to "adjust" there.
Each call to adjust moves the value at the root of the given subtree to a valid position within that subtree, so that the subtree becomes a heap. By moving backwards through the internal nodes, ever larger subtrees become heaps, until finally the whole tree (rooted at index 1) is a heap.
The error
The error in your code is in the SWAP function. Because you pass its arguments by value, none of the assignments in that function have any effect on a.
Correction:
void SWAP(element &root, element &target) {
    element temp = root;
    root = target;
    target = temp;
}
And on the caller side, drop temp.
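Applying that, heapSort could look like this, for example:
void heapSort(element a[], int n) {
    /* perform a heap sort on a[1:n] */
    int i;
    for (i = n / 2; i > 0; i--) {
        adjust(a, i, n);
    }
    for (i = n - 1; i > 0; i--) {
        SWAP(a[1], a[i + 1]);  // temp is no longer needed here
        adjust(a, 1, i);
    }
}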

When editing values in a binary heap, and calling heapify again, the result is not as expected

I am looking at a heap problem:
We are given an array (the given heap tree) and an index, and we need to check whether the value at that index is in the correct position or not.
If it is not, we place the element at the correct position; this process is called heapify.
This is the example heap where values are modified:
My code:
// min-heapify
#include <bits/stdc++.h>
using namespace std;
class minHeap {
    int *arr;
    int size;
    int capacity;
public:
    minHeap(int c) {
        size = 0;
        capacity = c;
        arr = new int[c];
    }
    int left(int i) { return (2 * i + 1); }
    int right(int i) { return (2 * i + 2); }
    int parent(int i) { return (i - 1) / 2; }
    // insert function
    void insert(int t) {
        if (size == capacity)
            return;
        size++;
        arr[size - 1] = t;
        for (int i = size - 1; i != 0 && arr[parent(i)] > arr[i];) {
            swap(arr[i], arr[parent(i)]);
            i = parent(i);
        }
    }
    // print array
    void print_array() {
        for (int i = 0; i < size; i++) {
            cout << arr[i] << " ";
        }
    }
    // min-heapify
    void minHeapify(int i) {
        // find the smallest among the left child, right child and current node
        int lc = left(i);   // left child
        int rc = right(i);  // right child
        int smallest = i;
        // if the left child exists and is smaller than the current node
        if (lc < size && arr[lc] < arr[i]) {
            smallest = lc;
        }
        // if the right child exists and is smaller than the current smallest
        if (rc < size && arr[rc] < arr[smallest]) {
            smallest = rc;
        }
        if (smallest != i) {
            swap(arr[smallest], arr[i]);
            minHeapify(smallest);
        }
    }
    // changes the value in the array
    void edit(int index, int value) {
        arr[index] = value;
    }
    // show the value at any index
    int show(int index) {
        return arr[index];
    }
};
int main() {
    minHeap h(15);
    h.insert(40);
    h.insert(20);
    h.insert(30);
    h.insert(35);
    h.insert(25);
    h.insert(80);
    h.insert(32);
    h.insert(100);
    h.insert(60);
    h.insert(70);
    cout << "perfect minHeap-->" << endl;
    h.print_array();
    cout << endl << endl;
    cout << "value changed (index 0 and 3)-->" << endl;
    h.edit(0, 40);
    h.edit(3, 20);
    h.print_array();
    cout << endl << endl;
    cout << "passing index of 0-->" << endl;
    h.minHeapify(0);
    cout << "function running...\n" << endl;
    cout << "Final Array after minHeapify-->" << endl;
    h.print_array();
    cout << endl;
    cout << "left of 20 is :-";
    int index = h.left(1);
    cout << h.show(index);
    return 0;
}
As you can see in my tree diagram and its array representation, the output should be the same as a perfect min-heap: we check the left child of 20 to make sure we get 25.
The heap starts at index 0.
This is expected behaviour. minHeapify assumes that the children of the node at the given index are both heaps, but that is evidently not the case after mutating the heap with those two edits. Consequently, the result of calling minHeapify(0) is not guaranteed to be a heap.
To solve this, I would suggest altering the edit function so that it restores the heap immediately, making it unnecessary to make separate calls afterwards to restore the heap.
In order to do that, I would first suggest moving the second part of the insert function into a separate siftUp function, so we can reuse that part:
void insert(int t) {
    if (size == capacity)
        return;
    size++;
    arr[size - 1] = t;
    siftUp(size - 1); // Defer this work to another function
}
void siftUp(int j) {
    for (int i = j; i != 0 && arr[parent(i)] > arr[i]; i = parent(i)) {
        swap(arr[i], arr[parent(i)]);
    }
}
Then modify edit such that it either sifts the new value up or down, depending on how it is about to modify the current node's value:
void edit(int index, int value) {
    if (value < arr[index]) {
        arr[index] = value;
        siftUp(index);
    } else {
        arr[index] = value;
        minHeapify(index);
    }
}
Now your heap will remain a valid heap even after making calls to edit. There is no more need to make an additional minHeapify call:
int main() {
    minHeap h(15);
    h.insert(40);
    h.insert(20);
    h.insert(30);
    h.insert(35);
    h.insert(25);
    h.insert(80);
    h.insert(32);
    h.insert(100);
    h.insert(60);
    h.insert(70);
    cout << "perfect minHeap-->" << endl;
    h.print_array();
    cout << endl << endl;
    cout << "value changed (index 0 and 3)-->" << endl;
    h.edit(0, 40);
    h.edit(3, 20);
    h.print_array();
    cout << endl;
    return 0;
}
This code outputs at the end:
20 25 30 35 40 80 32 100 60 70
Which is this tree:
20
/ \
25 30
/ \ / \
35 40 80 32
/ \ /
100 60 70
...which is the final tree in the image you shared (except for a tiny difference at 60-70, which probably means you had those values inserted in reversed order -- but it is irrelevant for the question).
Why a single minHeapify(0) doesn't do the trick
When you edit the values in a heap without any further action, the array no longer represents a heap. h.minHeapify(0) can only restore the heap when the children of node 0 are already heaps themselves. Evidently this is not the case after the edit at index 3: the subtree rooted at index 1 is no longer a heap, and so a call to h.minHeapify(0) will not restore the heap property. minHeapify is an algorithm that only looks at a few nodes in the array, so it could never recover from just any edit in the array.
If you want the edits to not be accompanied by heap-correcting measures (by calling minHeapify or siftUp), then the only thing you can do after those edits is to rebuild the heap from scratch.
For that you can use the O(n) algorithm for turning any array into a heap:
void buildHeap() {
    for (int i = (size - 1) / 2; i >= 0; i--)
        minHeapify(i);
}
In your driver code you should then replace the h.minHeapify(0) call with a call of h.buildHeap().
Note that this algorithm takes more time than a simple h.minHeapify(0) call.
A comment on your code:
int index = h.left(1);
This does not retrieve the left child of 20. This code looks at index 1 (not index 0), where the node's value is 25 (not 20).

Maintain an unordered_map but at the same time need the lowest of its mapped values at every step

I have an unordered_map<int, int> which is updated at every step of a for loop. But at the end of the loop, I also need the lowest of the mapped values. Traversing it to find the minimum in O(n) is too slow. I know Boost has a MultiIndex container, but I can't use Boost. What is the simplest way this can be done using only the STL?
Question:
Given an array A of positive integers, call a (contiguous, not necessarily distinct) subarray of A good if the number of different integers in that subarray is exactly K.
(For example, [1,2,3,1,2] has 3 different integers: 1, 2, and 3.)
Return the number of good subarrays of A.
My code:
class Solution {
public:
    int subarraysWithKDistinct(vector<int>& A, int K) {
        int left, right;
        unordered_map<int, int> M;
        for (left = right = 0; right < A.size() && M.size() < K; ++right)
            M[A[right]] = right;
        if (right == A.size())
            return 0;
        int smallest, count;
        smallest = numeric_limits<int>::max();
        for (auto p : M)
            smallest = min(smallest, p.second);
        count = smallest - left + 1;
        for (; right < A.size(); ++right)
        {
            M[A[right]] = right;
            while (M.size() > K)
            {
                if (M[A[left]] == left)
                    M.erase(A[left]);
                ++left;
            }
            smallest = numeric_limits<int>::max();
            for (auto p : M)
                smallest = min(smallest, p.second);
            count += smallest - left + 1;
        }
        return count;
    }
};
Link to the question: https://leetcode.com/problems/subarrays-with-k-different-integers/
O(n) is not slow; in fact, it is the theoretically fastest possible way to find the minimum, as it's obviously not possible to find the minimum of n items without considering each of them.
You could update the minimum during the loop. That is trivial if the loop only adds new items to the map, but it becomes much harder if the loop may change existing items (and may increase the value of the until-then minimum item!). Ultimately this also adds O(n) work, or more, so complexity-wise it's no different from doing an extra loop at the end (the constant can obviously differ: the extra loop may be slower than reusing the original loop, but the complexity is the same).
As you said, there are data structures that make it more efficient (O(log n) or even O(1)) to retrieve the minimum item, but at the cost of extra work to maintain this data structure during insertion. These data structures only make sense if you frequently need to check the minimum item while inserting or changing items, not if you only need to know the minimum at the end of the loop, as you described.
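As a minimal sketch of the "update the minimum during the loop" idea, under the assumption that the loop only ever adds new entries (the keys and values here are made up):
#include <algorithm>
#include <iostream>
#include <limits>
#include <unordered_map>
int main() {
    std::unordered_map<int, int> m;
    int smallest = std::numeric_limits<int>::max();
    int values[] = {7, 3, 9, 4};
    for (int key = 0; key < 4; ++key) {
        m[key] = values[key];                       // the loop only inserts new keys
        smallest = std::min(smallest, values[key]); // so the minimum is trivial to maintain
    }
    std::cout << smallest << '\n';                  // prints 3, with no extra O(n) pass at the end
}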
I made a simple class to make it work; although it's far from perfect, it's good enough for the question linked above.
class BiMap
{
public:
    void insert(int key, int value)
    {
        auto itr = M.find(key);
        if (itr == M.cend())
            M.emplace(key, S.insert(value).first);
        else
        {
            S.erase(itr->second);
            M[key] = S.insert(value).first;
        }
    }
    void erase(int key)
    {
        auto itr = M.find(key);
        S.erase(itr->second);
        M.erase(itr);
    }
    int operator[](int key)
    {
        return *M.find(key)->second;
    }
    int size()
    {
        return M.size();
    }
    int minimum()
    {
        return *S.cbegin();
    }
private:
    unordered_map<int, set<int>::const_iterator> M;
    set<int> S;
};
class Solution {
public:
    int subarraysWithKDistinct(vector<int>& A, int K) {
        int left, right;
        BiMap M;
        for (left = right = 0; right < A.size() && M.size() < K; ++right)
            M.insert(A[right], right);
        if (right == A.size())
            return 0;
        int count = M.minimum() - left + 1;
        for (; right < A.size(); ++right)
        {
            M.insert(A[right], right);
            while (M.size() > K)
            {
                if (M[A[left]] == left)
                    M.erase(A[left]);
                ++left;
            }
            count += M.minimum() - left + 1;
        }
        return count;
    }
};

How can I convert this C++ hash table to dynamically expand and shrink instead of having a hard-set max value?

I found a hash table implementation online. It works by having a fixed limit of stored values, 200. In case I need more, I don't want to just increase the hard limit. Instead, is there a way to make it dynamically expand to hold more values?
#include <iostream>
#include <cstdlib>
#include <string>
#include <cstdio>
using namespace std;
class HashTableEntry {
public:
    int k;
    int v;
    HashTableEntry(int k, int v) {
        this->k = k;
        this->v = v;
    }
};
class HashMapTable {
private:
    HashTableEntry **t;
    unsigned int t_s;
public:
    HashMapTable() {
        t_s = 200;  // set the size before it is used for the allocation below
        t = new HashTableEntry *[t_s];
        for (int i = 0; i < t_s; i++) {
            t[i] = NULL;
        }
    }
    int HashFunc(int k) {
        return k % t_s;
    }
    void Insert(int k, int v) {
        int h = HashFunc(k);
        while (t[h] != NULL && t[h]->k != k) {
            h = HashFunc(h + 1);
        }
        if (t[h] != NULL)
            delete t[h];
        t[h] = new HashTableEntry(k, v);
    }
    int SearchKey(int k) {
        int h = HashFunc(k);
        while (t[h] != NULL && t[h]->k != k) {
            h = HashFunc(h + 1);
        }
        if (t[h] == NULL)
            return -1;
        else
            return t[h]->v;
    }
    void Remove(int k) {
        int h = HashFunc(k);
        while (t[h] != NULL) {
            if (t[h]->k == k)
                break;
            h = HashFunc(h + 1);
        }
        if (t[h] == NULL) {
            cout << "No Element found at key " << k << endl;
            return;
        } else {
            delete t[h];
        }
        cout << "Element Deleted" << endl;
    }
    ~HashMapTable() {
        for (int i = 0; i < t_s; i++) {
            if (t[i] != NULL)
                delete t[i];
        }
        delete[] t;
    }
};
The issue is that if I use realloc or something similar to increase or decrease t_s, it might change the slots in which values are stored and break the hash table. Another issue is that when no items are stored, t_s would be 0, and in HashFunc the remainder by 0 is undefined. How would I handle these problems? How would I create a dynamically growing and shrinking hash table in C++?
Typically people who are running "batch computations" (not with any real-time sensitivity) will just take the hit and create a copy with a larger size, then swap it in.
There are methods to incrementally grow a hash table so that you can still have O(1) access while growing, but the constant hidden in the O(1) becomes larger for all accesses, and these methods are tricky to get right.
A different suggestion for incremental growth while retaining O(1) access is to keep a "stack" of hash tables: start with, say, a 200-entry hash table; when that reaches its fill limit (0.7 or 0.8 full, whichever you choose), push a 400-entry hash table onto the stack and put new entries in it. Each time the top of the stack gets full, push another empty double-size hash table onto it. Add items only to the top of the stack, but on lookup you must search every table on the stack before deciding an item is missing. So your O(1) access grows, but this is a simpler scheme to get right.
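As a rough sketch of the "copy into a larger table and swap it in" approach for the HashMapTable in the question (Rehash is a made-up member function name; deciding when to call it, for example by tracking a fill count, is left out):
// Hypothetical member function of HashMapTable: grow (or shrink) the table and
// re-insert every entry, since each key's slot depends on the table size t_s.
void Rehash(unsigned int new_size) {
    HashTableEntry **old = t;
    unsigned int old_size = t_s;
    t_s = new_size;
    t = new HashTableEntry *[t_s];
    for (unsigned int i = 0; i < t_s; i++)
        t[i] = NULL;
    for (unsigned int i = 0; i < old_size; i++) {
        if (old[i] != NULL) {
            Insert(old[i]->k, old[i]->v); // re-hash under the new table size
            delete old[i];
        }
    }
    delete[] old;
}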

Uninitialized Local Variable 'Quick' Used

I'm writing a function which counts the total number of swaps and comparisons a quick sort does. When I run it, however, I get this error:
error C4700: uninitialized local variable 'quick' used
This happens at the 'if' statement for the base case in the function code below. SwapandComp is the name of the struct I am using to keep track of both the swaps and comparisons for the sorting, and partition is the function where we find where to split the original array; it is also where we count the swaps and comparisons.
int partition(int numbers[], int i, int k) {
    int l = 0;
    int h = 0;
    int midpoint = 0;
    int pivot = 0;
    int temp = 0;
    bool done = false;
    // Pick middle element as pivot
    midpoint = i + (k - i) / 2;
    pivot = numbers[midpoint];
    l = i;
    h = k;
    while (!done) {
        // Increment l while numbers[l] < pivot
        while (numbers[l] < pivot) {
            ++l;
            totalComps++;
        }
        // Decrement h while pivot < numbers[h]
        while (pivot < numbers[h]) {
            --h;
            totalComps++;
        }
        // If there are zero or one elements remaining,
        // all numbers are partitioned. Return h
        if (l >= h) {
            totalComps++;
            done = true;
        }
        else {
            // Swap numbers[l] and numbers[h],
            // update l and h
            temp = numbers[l];
            numbers[l] = numbers[h];
            numbers[h] = temp;
            totalSwaps++;
            ++l;
            --h;
        }
    }
    return h;
}
And now here is the quick sort function. As mentioned before, SwapandComp is the struct I used to keep track of both swaps and comparisons.
SwapandComp quicksort(int numbers[], int i, int k) {
    SwapandComp quick;
    int j = 0;
    int z = 0;
    // Base case: If there are 1 or zero elements to sort,
    // partition is already sorted
    if (i >= k) {
        return quick;
    }
    // Partition the data within the array. Value j returned
    // from partitioning is location of last element in low partition.
    j = partition(numbers, i, k);
    // Recursively sort low partition (i to j) and
    // high partition (j + 1 to k)
    quicksort(numbers, i, j);
    quicksort(numbers, j + 1, k);
    quick.swaps = totalSwaps;
    quick.comps = totalComps;
    return quick;
}
On the second line down, I write
SwapandComp quick;
to use for the quick sort struct. The error doesn't really make sense to me, because I did declare quick as a new struct for the function to return. Any help is appreciated! Thanks!
Initialize the struct as below:
SwapandComp quick = { 0 };
SwapandComp quick;
Unless that type has a constructor, declaring a variable with it inside a function will leave it in an indeterminate state. Then returning it (without first initialising it, as per your base case) will cause exactly the issue you're seeing, a "using an uninitialised variable" warning.
You could just initialise the members when declaring it, such as with:
SwapandComp quick; quick.swaps = quick.comps = 0;
But a better way to do it is with real initialisers, something like:
struct SwapAndComp {
    unsigned swaps;
    unsigned comps;
    SwapAndComp() : swaps(0U), comps(0U) {}
};
This method (initialisation as part of the class itself) allows you to properly create the structure without any users of it needing to worry about doing it correctly. And, if you want flexibility, you can simply provide a constructor that allows it while still defaulting to the "set to zero" case:
SwapAndComp(unsigned initSwaps = 0U, unsigned initComps = 0U)
    : swaps(initSwaps), comps(initComps) {}