How to downheap? - heap

I am currently working on an assignment for a datastructures and algorithms class.
I have to delete the node out of the heap given;
        6
      /   \
    11     9
   /  \   /  \
 17   18 15  10
 /
20

After replacing the node:

        20
      /    \
    11      9
   /  \    /  \
 17   18  15  10
The question I have is would I downheap to the right, the left or does it matter?

Since you have a min-heap there, your downheap operation should swap the new parent with the smaller of its children. Otherwise your swap could result in a violation of the heap condition.

You need to swap the parent node with whichever child has the smaller value, and continue this process until the heap condition is satisfied.
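The downheap step described above can be sketched as follows. This is a minimal sketch, assuming the heap is stored level by level in a std::vector (the question only shows the tree, so the array layout and the name siftDown are assumptions made for illustration).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restore the min-heap property for the element at index i by moving it
// down: at each step, swap it with the smaller of its children.
// Assumes the heap is stored level by level (children of i at 2i+1, 2i+2).
void siftDown(std::vector<int>& heap, std::size_t i) {
    const std::size_t n = heap.size();
    while (true) {
        std::size_t left = 2 * i + 1;
        std::size_t right = 2 * i + 2;
        std::size_t smallest = i;
        if (left < n && heap[left] < heap[smallest]) smallest = left;
        if (right < n && heap[right] < heap[smallest]) smallest = right;
        if (smallest == i) break;  // neither child is smaller: done
        std::swap(heap[i], heap[smallest]);
        i = smallest;
    }
}
```

For the heap in the question, stored as {20, 11, 9, 17, 18, 15, 10}, calling siftDown(heap, 0) swaps 20 with 9 and then with 10, so 20 goes down the right side.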

Related

Can anyone help me make a pseudocode that can traverse this heap in increasing order?

This is the heap.
Min_Heap
The answer should look like this [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15].
I do know for a fact that it repeats, but still having a hard time formulating pseudocode for this problem.
The rules for removing the smallest item from a binary heap:
1. Take the root element (this is the smallest).
2. Move the lowest, right-most item from the heap to the root.
3. If the new node has no children, or it is smaller than both of its children, then you're done.
4. If the new node is larger than either of its children, swap it with the smaller child.
5. Go to step 3.
Repeat those steps until the heap is empty.
In your heap, you have:
           1
       /       \
     3           2
   /   \       /   \
  7     5     6     4
 / \   / \   / \   / \
15 11 13  9 14 10 12  8
So you remove the value 1, and replace the root with 8, giving you:
           8
       /       \
     3           2
   /   \       /   \
  7     5     6     4
 / \   / \   / \   /
15 11 13  9 14 10 12
8 is larger than both of its children, so you swap with the smallest child, 2:
           2
       /       \
     3           8
   /   \       /   \
  7     5     6     4
 / \   / \   / \   /
15 11 13  9 14 10 12
Again, 8 is larger than both of its children, so you swap with the smallest, 4:
           2
       /       \
     3           4
   /   \       /   \
  7     5     6     8
 / \   / \   / \   /
15 11 13  9 14 10 12
And you're done because 8 is smaller than its only child, 12.
To remove the next item, you remove the 2, replace it with 12, and move 12 down the heap according to the rules. The result is:
           3
       /       \
     5           4
   /   \       /   \
  7     9     6     8
 / \   / \   / \
15 11 13 12 14 10
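Repeating the removal until the heap is empty yields the values in increasing order. Here is a minimal sketch of that loop, assuming the heap is held level by level in a std::vector and using std::pop_heap with std::greater<int> to perform the "take the root, move the last item up, sift down" steps. The function name drainMinHeap is made up for illustration.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Returns the values of a min-heap in increasing order by repeatedly
// removing the root. The input must already be a valid min-heap in
// level order; it is taken by value so the caller's copy is untouched.
std::vector<int> drainMinHeap(std::vector<int> heap) {
    std::vector<int> out;
    while (!heap.empty()) {
        // Moves the smallest value to the back and re-heapifies the rest.
        std::pop_heap(heap.begin(), heap.end(), std::greater<int>());
        out.push_back(heap.back());
        heap.pop_back();
    }
    return out;
}
```

Called with the heap above in level order, {1, 3, 2, 7, 5, 6, 4, 15, 11, 13, 9, 14, 10, 12, 8}, it returns 1 through 15.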

What is the tree-structure of a heap?

I'm reading Nicolai M. Josuttis's "The C++ standard library, a tutorial and reference", ed2.
He explains the heap data structure and related STL functions in page 607:
The program has the following output:
on entry: 3 4 5 6 7 5 6 7 8 9 1 2 3 4
after make_heap(): 9 8 6 7 7 5 5 3 6 4 1 2 3 4
after pop_heap(): 8 7 6 7 4 5 5 3 6 4 1 2 3
after push_heap(): 17 7 8 7 4 5 6 3 6 4 1 2 3 5
after sort_heap(): 1 2 3 3 4 4 5 5 6 6 7 7 8 17
I'm wondering how this could be figured out. For example, why is the leaf "4" under path 9-6-5-4 the left child of node "5", not the right one? And what is the tree structure after pop_heap? In IDE debugging mode I could only see the content of the vector. Is there a way to figure out the tree structure?
why the leaf "4" under path 9-6-5-4 is the left side child of node "5", not the right side one?
Because if it was on the right side, that would mean there is a gap in the underlying vector. The tree structure is for illustrative purposes only. It is not a representation of how the heap is actually stored. The tree structure is mapped onto the underlying vector via a simple mathematical formula.
The root node of the tree is the first element of the vector (index 0). The index of the left child of a node is obtained from its parent's index by the simple formula: i * 2 + 1. And the index of the right child is obtained by i * 2 + 2.
And after pop_heap what's the tree structure then?
The root node is swapped with the last element of the heap. Then this element is pushed down the heap by swapping with the greater of its two children (see note 1). This is repeated until it is in the correct position (i.e. it is not less than either of its children).
So after pop_heap, your tree looks like this:
         ----- 8 -----
         |           |
      ---7---     ---6---
      |     |     |     |
     -7-   -4-   -5-   x5
     | |   | |   | |   x
     3 6   4 1   2 3   9
The 9 is not actually part of the heap anymore, but it is still part of the vector until you erase it, via a call pop_back or similar.
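A short sketch of that two-step removal follows. The helper name popMax is an assumption made for illustration; std::pop_heap and pop_back are the standard calls.

```cpp
#include <algorithm>
#include <vector>

// Removes and returns the largest value of a max-heap stored in v.
// Assumes v is non-empty and already a heap (e.g. via std::make_heap).
int popMax(std::vector<int>& v) {
    std::pop_heap(v.begin(), v.end());  // old root is moved to v.back()
    int top = v.back();                 // still in the vector, no longer in the heap
    v.pop_back();                       // now it is actually erased
    return top;
}
```

After std::pop_heap alone, only the range [begin, end - 1) is guaranteed to be a heap; the exact arrangement of equal elements may differ between standard library implementations.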
1. If the children are equal, as with the adjacent 7s in the tree in your example, it could go either way. I believe that std::pop_heap sends it to the right, though I'm not sure whether this is implementation-defined.
The first element in the vector is the root at index 0. Its left child is at index 1 and its right child at index 2. In general: left_child(i) = 2 * i + 1 and right_child(i) = 2 * i + 2 and parent(i) = floor((i - 1) / 2)
Another way to think about it is the heap fills each level from left to right in the tree. Following the elements in the vector the first level is 9 (1 value), second level 8 6 (2 values) and third level 7 7 5 5 (4 values), and so on. Both these ways will help you draw the heap in a tree structure when given a vector.
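Both views can be sketched in a few lines. The function names below (leftChild, rightChild, parent, heapLevels) are made up for illustration; only the index arithmetic comes from the answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

std::size_t leftChild(std::size_t i) { return 2 * i + 1; }
std::size_t rightChild(std::size_t i) { return 2 * i + 2; }
std::size_t parent(std::size_t i) { return (i - 1) / 2; }  // only valid for i > 0

// Splits the heap's vector into tree levels: 1 value, then 2, then 4, ...
// The last level may be only partly filled.
std::vector<std::vector<int>> heapLevels(const std::vector<int>& heap) {
    std::vector<std::vector<int>> levels;
    std::size_t start = 0;
    std::size_t width = 1;
    while (start < heap.size()) {
        std::size_t end = std::min(start + width, heap.size());
        std::vector<int> level;
        for (std::size_t i = start; i < end; ++i) level.push_back(heap[i]);
        levels.push_back(level);
        start = end;
        width *= 2;
    }
    return levels;
}
```

For the vector after make_heap(), the trailing 4 sits at index 13, and parent(13) is 6, which holds a 5. Since leftChild(6) is 13, that 4 is a left child.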

Implementing external mergesort. How to get started?

For a project I have to implement an external mergesort algorithm. It will be used to sort a file with mostly numbers or strings and will be some GBs in size. This is the definition of the mergesort that I've been given
void MergeSort (char *infile,
                unsigned char field,
                block_t *buffer,
                unsigned int nmem_blocks,
                char *outfile,
                unsigned int *nsorted_segs,
                unsigned int *npasses,
                unsigned int *nios);
I'm not allowed to change that. The first argument is the file that I'm going to sort. The second is the field according to which I want to sort the file (doesn't interest me right now). The third argument is the buffer, which is an array of block structs. Here is the definition of a block:
typedef struct
{
    unsigned int blockid;
    unsigned int nreserved; // how many reserved entries
    record_t entries[MAX_RECORDS_PER_BLOCK]; // array of records
    bool valid; // if set, then this block is valid
    unsigned char misc;
    unsigned int next_blockid;
    unsigned int dummy;
} block_t;
The fourth argument is the number of blocks in memory. The last three arguments can be set by me.
My questions are:
Do I take the file and cut it into two files?
Is the buffer a file stored in the harddrive or does it stay in the memory? Do I have to create a new file? I'm a little confused with this part.
These are my thoughts to start right now. First I get the file and split it in two parts. I also create a buffer, which I don't know what size it should have. Then I read the first block of records from the first file, and compare the numbers to the first block of records of the second file. Whenever the number is lesser or equal to another I will send it to the output file. Can you evaluate my stream of thoughts? Or am I thinking it wrong?
Refer to my GitHub repo: https://github.com/melvilgit/external-Merge-Sort/blob/master/README.md
Problem Statement
Ordinary sorting algorithms work within RAM. When the data to be sorted does not fit into RAM and instead resides in slower external memory (usually a hard drive), external merge sort is used. For example, if we have to sort 100 numbers of 1 KB each and our RAM size is 10 KB, external merge sort works like a charm.
How to?
Split Phase
Split the 100 KB file into 10 files of 10 KB each.
Sort each 10 KB file in memory using an efficient O(n log n) sorting algorithm.
Store each of the smaller sorted files on disk.
Merge Phase
Do a K-way merge over the smaller files. In detail:
After the Split Phase, a list of file handles for all the split files is stored in sortedTempFileHandlerList.
Now we create a list of heap nodes, heapnodes. Each heap node stores the actual entry read from a file and also the file that owns it. heapnodes is heapified into a min-heap.
Assuming there are 10 files, heapnodes takes only 10 KB (assuming 1 KB per number).
Loop while the least element (the top of the heap) is not INT_MAX:
Pick the node with the least element from heapnodes (O(1), since heapnodes is a min-heap).
Write the element to sortedLargeFile (it is the next number in sorted order).
Find the file handle of the corresponding element by looking at heapnode.filehandler.
Read the next item from that file. If it's EOF, mark the item as INT_MAX.
Put the item at the top of the heap and heapify again to restore the min-heap property.
Continue.
At the end of the Merge Phase, sortedLargeFile will have all the elements in sorted order.
Example
Say we have a file largefile with the following contents:
5 8 6 3 7 1 4 9 10 2
In the Split Phase, we split it into sorted chunks in 5 separate temp files:
temp1 - 5, 8    temp2 - 3, 6    temp3 - 1, 7    temp4 - 4, 9    temp5 - 2, 10
Next, construct a min-heap from the first element of each file:
      1
     / \
    2   5
   / \
  4   3
Now pick the least element from the min-heap and write it to sortedOutputFile - 1.
Find the next element of the file that owned the min element 1.
It is 7, from temp3. Move it to the top of the heap.
      7                          2
     / \                        / \
    2   5      Heapify -->     3   5
   / \                        / \
  4   3                      4   7
Pick the least element 2 and write it to sortedOutputFile - 1 2.
Find the next element of the file that owned the min element 2.
It is 10, from temp5. Move it to the top of the heap.
     10                          3
     / \                        / \
    3   5      Heapify -->     4   5
   / \                        / \
  4   7                     10   7
Pick the least element 3 and write it to sortedOutputFile - 1 2 3.
Find the next element of the file that owned the min element 3.
It is 6, from temp2. Move it to the top of the heap.
      6                          4
     / \                        / \
    4   5      Heapify -->     6   5
   / \                        / \
 10   7                     10   7
Pick the least element 4 and write it to sortedOutputFile - 1 2 3 4.
Find the next element of the file that owned the min element 4.
It is 9, from temp4. Move it to the top of the heap.
      9                          5
     / \                        / \
    6   5      Heapify -->     6   9
   / \                        / \
 10   7                     10   7
Pick the least element 5 and write it to sortedOutputFile - 1 2 3 4 5.
Find the next element of the file that owned the min element 5.
It is 8, from temp1. Move it to the top of the heap.
      8                          6
     / \                        / \
    6   9      Heapify -->     7   9
   / \                        / \
 10   7                     10   8
Pick the least element 6 and write it to sortedOutputFile - 1 2 3 4 5 6.
Find the next element of the file that owned the min element 6.
We have reached EOF (temp2 is exhausted), so mark the value read as INT_MAX.
   INT_MAX                       7
    /   \                       / \
   7     9     Heapify -->     8   9
  / \                         / \
10   8                      10   INT_MAX
Pick the least element 7 and write it to sortedOutputFile - 1 2 3 4 5 6 7.
If we repeat this process, we reach a point where sortedOutputFile is 1 2 3 4 5 6 7 8 9 10 and the heap looks like the one below.
We break out of the loop at this point, because the min element of the heap is INT_MAX.
          INT_MAX
          /     \
     INT_MAX   INT_MAX
      /    \
INT_MAX   INT_MAX
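The merge phase walked through above can be sketched as follows. This is a simplified sketch, not the code from the linked repository: in-memory vectors stand in for the sorted temp files, and an exhausted run is simply dropped from the heap instead of being marked with an INT_MAX sentinel. The name kWayMerge is made up for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merges already-sorted runs into one sorted sequence using a min-heap.
// Each heap entry is (value, index of the run it came from).
// The vectors stand in for the sorted temp files (an assumption).
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    using Entry = std::pair<int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::size_t> next(runs.size(), 0);  // read position per run

    // Seed the heap with the first element of every non-empty run.
    for (std::size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) heap.push({runs[r][next[r]++], r});
    }

    std::vector<int> out;
    while (!heap.empty()) {
        Entry e = heap.top();  // least element across all runs
        heap.pop();
        out.push_back(e.first);
        std::size_t r = e.second;
        // Refill from the run that owned the element, if it has more.
        if (next[r] < runs[r].size()) heap.push({runs[r][next[r]++], r});
    }
    return out;
}
```

With the five chunks from the example, {5, 8}, {3, 6}, {1, 7}, {4, 9}, {2, 10}, the result is 1 through 10. The heap never holds more than one entry per run, which is what keeps memory use bounded by the number of files.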
I think the solution depends on the size of the input file.
What I would do is check the size of the file first. If it's smaller than a certain size, for example 1 GB (provided 1 GB is a small amount of memory on your machine), then I'd read the whole file, store the content in memory, merge sort it, then write it to the new file.
Otherwise, I'd divide the original file into K temp files of less than 1 GB each, merge sort each of them, then do a K-way merge of the sorted files into the output file. Basically you need to divide and conquer on two levels: first the files on disk, then the contents in memory.
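The first level of that scheme, cutting the input into chunks that fit in memory and sorting each one, might look like the sketch below. It is an illustration under stated assumptions: a vector stands in for the input file, each returned run stands in for a temp file, and the name makeSortedRuns and the runSize parameter are invented here.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Cuts the input into chunks of at most runSize values and sorts each
// chunk in memory. In a real external sort each chunk would be read
// from the input file and written to its own temp file.
std::vector<std::vector<int>> makeSortedRuns(const std::vector<int>& input,
                                             std::size_t runSize) {
    std::vector<std::vector<int>> runs;
    if (runSize == 0) return runs;  // nothing sensible to do
    for (std::size_t start = 0; start < input.size(); start += runSize) {
        std::size_t end = std::min(start + runSize, input.size());
        std::vector<int> run;
        for (std::size_t i = start; i < end; ++i) run.push_back(input[i]);
        std::sort(run.begin(), run.end());
        runs.push_back(std::move(run));
    }
    return runs;
}
```

The sorted runs are then combined with a K-way merge to produce the final output.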
If you are using any reasonable OS (anything with an X in it or BSD) and have enough memory rely on 'sort'. If you hit the limit on file size, use 'split' and then the --merge option of sort to merge the already sorted files.
If you really need to write code for an external sort you can save yourself a lot of trouble by starting with a thorough reading of Knuth's TAOCP Vol III, Chp 5 on external sorting.

Convert to a complete BST function

I need to save the data in the BST to a sorted array. Then the ConvertToCompleteBST function clears the current tree and uses the array to re-insert the values recursively, inserting the middle value in each call.
InsertRecursively(data_array,size);
For example, if the array was
[5 7 10 12 16 18 30]
then InsertRecursively([5 7 10 12 16 18 30],7) is called first,
The middle is 12, so 12 is the first value inserted into the tree. Then, in the recursive calls, it does the same for
InsertRecursively([5 7 10],3)
InsertRecursively([16 18 30],3)
Then, InsertRecursively([5 7 10],3) will lead to inserting the node 7 and two other recursive calls
InsertRecursively([5],1) >> lead to inserting node 5
InsertRecursively([10],1) >> lead to inserting node 10
When size is 1, it will stop the recursive call.
I have no idea how to implement the insertRecursively function
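One possible shape for that function is sketched below, under several assumptions: the node type BstNode, the plain insert helper, and the extra root parameter are all invented here, since the question does not show the tree class. The recursion also stops at size 0 rather than size 1, so that sub-arrays of even length are handled.

```cpp
#include <memory>

// Hypothetical node type; the question does not show the real one.
struct BstNode {
    int value;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int v) : value(v) {}
};

// Ordinary BST insert (duplicates go to the right).
void insert(std::unique_ptr<BstNode>& node, int value) {
    if (!node) {
        node.reset(new BstNode(value));
        return;
    }
    insert(value < node->value ? node->left : node->right, value);
}

// Inserts the middle value of the sorted array, then recurses on the
// left half and the right half.
void insertRecursively(std::unique_ptr<BstNode>& root, const int* data, int size) {
    if (size <= 0) return;  // empty sub-array: nothing to insert
    int mid = size / 2;
    insert(root, data[mid]);
    insertRecursively(root, data, mid);                       // values before mid
    insertRecursively(root, data + mid + 1, size - mid - 1);  // values after mid
}
```

For [5 7 10 12 16 18 30] this inserts 12, then 7, 5, 10, then 18, 16, 30. Note that picking the middle this way gives a height-balanced tree, but it is only guaranteed to be complete for sizes of the form 2^k - 1 such as 7; for other sizes the last level may have gaps.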

C++, How to create and draw a Binary Tree then traverse it in Pre-Order

How do I create a Binary Tree and draw it using a Pre-Order Traversal strategy? The root would be the first number going in.
I have a set of numbers: 48 32 51 54 31 24 39. 48 would be the root. How are the child nodes pushed onto the Binary Tree in a Pre-Order traversal?
Imagine the following sub-problem. You have a set of numbers:
N A1...AX B1...BY
You know that N is the root of the corresponding tree. All you need to know is what numbers form the left sub-tree. Obviously the rest of the numbers form the right sub-tree.
If you remember the properties of a binary search tree, you know that elements of the left sub-tree have values smaller than the root (while the ones on the right have values bigger).
Therefore, the left sub-tree is the sequence of numbers that are smaller than (or possibly equal to) N. The rest of the numbers are in the right sub-tree.
Recursively solve for
A1...AX
and
B1...BY
For example given:
10 1 5 2 9 3 1 6 4 11 15 12 19 20
You get:
root: 10
left sub-tree: 1 5 2 9 3 1 6 4
right sub-tree: 11 15 12 19 20
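A single step of that split can be sketched like this. It assumes the sequence really is the pre-order listing of a binary search tree, so every value up to the first one larger than the root belongs to the left sub-tree. The names Split and splitPreorder are invented for illustration.

```cpp
#include <algorithm>
#include <vector>

struct Split {
    int root;
    std::vector<int> left;   // values <= root, in their original order
    std::vector<int> right;  // everything from the first value > root onward
};

// Assumes seq is non-empty and is the pre-order listing of a BST.
Split splitPreorder(const std::vector<int>& seq) {
    const int root = seq.front();
    // The right sub-tree starts at the first value larger than the root.
    auto firstRight = std::find_if(seq.begin() + 1, seq.end(),
                                   [root](int x) { return x > root; });
    Split s;
    s.root = root;
    s.left.assign(seq.begin() + 1, firstRight);
    s.right.assign(firstRight, seq.end());
    return s;
}
```

Applying the same function to the left and right parts in turn gives the recursive solution.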
Say you have the following binary tree:
        A
      /   \
     B     C
    / \   / \
   D   E F   G
      / \
     H   I
A Pre-Order Traversal goes NODE, LEFT, RIGHT.
So Pre-Order of this binary tree would be: A B D E H I C F G
For more details on how to implement this in C++: https://stackoverflow.com/a/17658699/445131
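As a small illustration of NODE, LEFT, RIGHT, here is a sketch that builds a tree from the numbers in the question and collects them in pre-order. It assumes the tree is a binary search tree filled by ordinary insertion; the names TreeNode, insert, and preOrder are invented here and are not taken from the linked answer.

```cpp
#include <memory>
#include <vector>

// Hypothetical node type for a binary search tree of ints.
struct TreeNode {
    int value;
    std::unique_ptr<TreeNode> left, right;
    explicit TreeNode(int v) : value(v) {}
};

// Ordinary BST insert: smaller values go left, others go right.
void insert(std::unique_ptr<TreeNode>& node, int value) {
    if (!node) {
        node.reset(new TreeNode(value));
        return;
    }
    insert(value < node->value ? node->left : node->right, value);
}

// Pre-order: visit the node, then its left sub-tree, then its right.
void preOrder(const TreeNode* node, std::vector<int>& out) {
    if (!node) return;
    out.push_back(node->value);
    preOrder(node->left.get(), out);
    preOrder(node->right.get(), out);
}
```

Inserting 48 32 51 54 31 24 39 in that order and then calling preOrder visits 48 32 31 24 39 51 54.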