How does the make_heap() function work? - C++

I have a basic understanding of vectors and iterators, but I am having trouble understanding the output of the code snippet below.
To be specific, I am unable to figure out what the make_heap() function does.
How is it producing the output 91 67 41 24 59 32 23 13?
To my knowledge, the heap should look like this:
          91
         /  \
       67    41
      /  \   / \
    59   24 32  23
    /
  13
So I was expecting the output to be:
91 67 41 59 24 32 23 13
I would really appreciate it if anyone could help me understand how make_heap() generated such an output.
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main()
{
    int aI[] = { 13, 67, 32, 24, 59, 41, 23, 91 };
    vector<int> v(aI, aI + 8);
    make_heap(v.begin(), v.end());
    for (vector<int>::iterator it = v.begin(); it != v.end(); ++it)
        cout << *it << " ";
    // Output: 91 67 41 24 59 32 23 13
    return 0;
}

A binary heap must satisfy two constraints (in addition to being a binary tree):
The shape property: the tree is a complete binary tree (every level, except possibly the last, is completely filled, and the last level is filled from left to right)
The heap property: each node is greater than or equal to each of its children
The ordering of siblings in a binary heap is not specified by the heap property, so a single node's two children (and their subtrees) can be freely interchanged without violating either property.
So in your example you can freely interchange nodes within a level and get multiple outputs, all of which are legal heaps.

When heapifying an unsorted array, the algorithm takes advantage of the fact that half the array will be leaf nodes (the higher indexes in the array) and the other half will be parents of those leaf nodes. The algorithm only has to iterate over the parent nodes and fix up their logical sub-trees. The leaf nodes start out as valid sub-heaps, since by definition they are greater than their non-existent child nodes.
So we only have to fix up the sub-heaps that have at least one non-leaf node. Done in the correct order (from the middle of the array down to the lowest index), when the last parent node is heapified, the whole array is a valid heap.
Each step looks as follows:
iteration 1:
13 67 32 24 59 41 23 91
         ^                 current parent under consideration
                     ^     child of this parent
13 67 32 91 59 41 23 24    after heapifying this sub-tree
         --          --
iteration 2:
13 67 32 91 59 41 23 24
      ^                    current parent under consideration
               ^  ^        children of this parent
13 67 41 91 59 32 23 24    after heapifying this sub-tree
      --       --
iteration 3:
13 67 41 91 59 32 23 24
   ^                       current parent under consideration
         ^  ^              children of this parent
13 91 41 67 59 32 23 24    after heapifying this sub-tree
   --    --
iteration 4:
13 91 41 67 59 32 23 24
^                          current parent under consideration
   ^  ^                    children of this parent
91 13 41 67 59 32 23 24    heapify swap 1
-- --
91 67 41 13 59 32 23 24    heapify swap 2
   --    --
91 67 41 24 59 32 23 13    after heapifying this sub-tree
         --          --
The naive method of heapifying an array is to walk through the array from index 0 to n-1, 'adding' the element at each index to the heap formed by the elements before it. That would produce the heap you expected, using n sift-up operations for O(n log n) total work. The bottom-up algorithm used by make_heap() performs only n/2 sift-down operations and runs in O(n) total. It produces a different, but still valid, heap.

make_heap constructs a binary heap in the vector by reordering the elements so that they satisfy the heap constraint. The heap constructed is a max-heap: it puts the largest element (according to operator< or the provided comparator) in the first position of the vector, the root of the heap.
A binary heap is a balanced binary tree satisfying the condition that the value in a parent node is always larger than the values in its child nodes (that is a max-heap, as here; min-heaps, with the smallest value on top, are more common elsewhere). That means the root always contains the largest element. Combined with efficient extraction of the root, this makes a good priority queue.
A binary heap is stored in the array in breadth-first (level) order. That is, the root is at position 0, its immediate children at positions 1 and 2, the children of 1 at positions 3 and 4, the children of 2 at positions 5 and 6, and so on. In general, the children of node n are at positions 2*n + 1 and 2*n + 2.
In C++, the make_heap function, together with push_heap and pop_heap, implements a complete priority queue over a vector. There is also the priority_queue container adapter that combines them in a class.
Priority queues are used, for example, in the famous Dijkstra's algorithm and in various scheduling algorithms. Because Dijkstra's algorithm needs to select the minimum, it is more common to define a heap with the minimum at the root. The C++ standard library chose to define it with the maximum, but note that you can trivially get a min-heap by giving it std::greater instead of std::less as the comparator.
There are two ways to build a heap: by pushing each element onto it, or by fixing the heap invariant starting from the first half of the elements (the second half are leaves). The latter is more efficient, so:
[13, 67, 32, 24, 59, 41, 23, 91]
24 < 91
[13, 67, 32, 91, 59, 41, 23, 24]
32 < 41
[13, 67, 41, 91, 59, 32, 23, 24]
67 < 91
[13, 91, 41, 67, 59, 32, 23, 24]
13 < 91
[91, 13, 41, 67, 59, 32, 23, 24]
the moved-down element might still violate the constraint, and this time it does: 13 < 67
[91, 67, 41, 13, 59, 32, 23, 24]
and it still violates the constraint: 13 < 24
[91, 67, 41, 24, 59, 32, 23, 13]
the root has been processed, so we are done

Related

Function to create a list of age range

You have a list of 100 elements ranging from 1 to 98, for example [2,4,35,47…].
Write a function that creates three lists with the age ranges 1-33, 34-67, and 68-98.

Bubble-sorting rows of Fortran 2D array

I am working on the second part of an assignment which asks me to reorder a matrix such that each row is in monotonically increasing order and so that the first element of each row is monotonically increasing. If two rows have the same initial value, the rows should be ordered by the second element in the row. If those are both the same, it should be the third element, continuing through the last element.
I have written a bubble sort that works fine for the first part (reordering each row). I have written a bubble sort for the second part (making sure that the first element of each row is monotonically increasing). However, I am running into an infinite loop and I do not understand why.
I do understand that the issue is that my "inorder" variable is not eventually getting set to true (which would end the while loop). However, I do not understand why inorder is not getting set to true. My logic is the following: once the following code has swapped rows to the point that the rows are all in order, we will pass through the while loop one more time (and inorder will get set to true), which will cause the while loop to end. I am stumped as to why this isn't happening.
inorder = .false.
loopA: do while ( .not. inorder )            ! While the rows are not ordered
   inorder = .true.
   loopB: do i = 1, rows-1                   ! Iterate through the first column of the array
      if (arr(i,1) > arr(i+1,1)) then        ! If we find a row that is out of order
         inorder = .false.
         tempArr = arr(i+1,:)                ! Swap the corresponding rows
         arr(i+1,:) = arr(i,:)
         arr(i,:) = tempArr
      end if
      if (arr(i,1) == arr(i+1,1)) then       ! The first elements of the rows are the same
         loopC: do j = 2, cols               ! Iterate through the rest of the row to find the first element that is not the same
            if (arr(i,j) > arr(i+1,j)) then  ! Found elements that are not the same and that are out of order
               inorder = .false.
               tempArr = arr(i+1,:)          ! Swap the corresponding rows
               arr(i+1,:) = arr(i,:)
               arr(i,:) = tempArr
            end if
         end do loopC
      end if
   end do loopB
end do loopA
Example input:
6 3 9 23 80
7 54 78 87 87
83 5 67 8 23
102 1 67 54 34
78 3 45 67 28
14 33 24 34 9
Example (correct) output (that my code is not generating):
1 34 54 67 102
3 6 9 23 80
3 28 45 67 78
5 8 23 67 83
7 54 78 87 87
9 14 24 33 34
It is also possible that staring at this for hours has made me miss something stupid, so I appreciate any pointers.
When you compare rows whose first elements are identical, you then go through the whole row and compare every single pair of elements.
So if you have two rows like this:
1 5 3
1 2 4
then the first elements are the same, and it enters the second branch of your code.
In the second position, 5 > 2, so it swaps the rows:
1 2 4
1 5 3
But then it doesn't stop. In the third position, 4 > 3, so it swaps them back:
1 5 3
1 2 4
And now you're back where you started.
Cheers

Is there a better option than a map?

Well, I am writing a C++ program that goes through long streams of symbols, and for further analysis I need to store where in the stream symbol sequences of a certain length appear. For instance, in the binary stream
100110010101
I have sequences of length 6, for example:
100110 starting on position 0
001100 starting on position 1
011001 starting on position 2
etc.
What I need to store are vectors of all positions where a certain sequence can be found. So the result should be something like a table, maybe resembling a hash table, that looks like this:
sequence/ positions
10010101 | 1 13 147 515
01011011 | 67 212 314 571
00101010 | 2 32 148 322 384 419 455
etc.
Now, I figured that mapping strings to integers is slow, and because I have information about the stream's symbols upfront, I can use it to map these fixed-length sequences to integers.
The next step was to create a map that maps these "representing integers" to the corresponding index in the table, where I add the next occurrence of the sequence. However, this is slow, much slower than I can afford. I tried both the ordered and unordered maps of both the std and boost libraries, and neither was efficient enough. And I tested it: the map is the real bottleneck here.
And here is the loop in pseudocode:
for (int i = seqleng-1; i < stream.size(); i++) {
    // compute the characteristic value of the sequence by adding one symbol
    charval *= symb_count;
    charval += stream[i] - '0';
    // sampspacesize is the number of all possible sequences with this symbol count and length
    charval %= sampspacesize;
    map<uint64, uint64>::iterator it = map.find(charval);
    // if an index exists, add the starting position of the sequence to the table
    if (it != map.end()) {
        table[it->second].add(i - seqleng + 1);
    }
    // if the current sequence is found for the first time, extend the table and add the index
    else {
        table.add_row();
        map[charval] = table.last_index;
        table[table.last_index].add(i - seqleng + 1);
    }
}
So the question is: can I use something better than a map to keep a record of the corresponding indices in the table, or is this the best way possible?
NOTE: I know there is a fast way here, and that's creating storage large enough for every possible symbol sequence (meaning if I have sequences of length 10 over 4 symbols, I reserve 4^10 slots and can omit the mapping), but I will need to work with lengths and symbol counts that would require reserving memory way beyond the computer's capacity. However, the actual number of used slots will not exceed 100 million (which is guaranteed by the maximal stream length), and that can be stored in a computer just fine.
Please ask if anything is unclear; this is my first large question here, so I lack the experience to express myself the way others would understand.
An unordered map with pre-allocated space is usually the fastest way to store any kind of sparse data.
Given that std::string has SSO, I can't see why something like this won't be about as fast as it gets:
(I have used an unordered_multimap, but I may have misunderstood the requirements.)
#include <unordered_map>
#include <string>
#include <iostream>

using sequence = std::string; /// #todo - perhaps replace with something faster if necessary
using sequence_position_map = std::unordered_multimap<sequence, std::size_t>;

int main()
{
    auto constexpr sequence_size = std::size_t(6);
    sequence_position_map sequences;
    std::string input = "11000111010110100011110110111000001111010101010101111010";
    if (sequence_size <= input.size()) {
        sequences.reserve(input.size() - sequence_size);
        auto first = std::size_t(0);
        auto last = input.size();
        while (first + sequence_size < last) {
            sequences.emplace(input.substr(first, sequence_size), first);
            ++first;
        }
    }
    std::cout << "results:\n";
    auto first = sequences.begin();
    auto last = sequences.end();
    while (first != last) {
        auto range = sequences.equal_range(first->first);
        std::cout << "sequence: " << first->first;
        std::cout << " at positions: ";
        const char* sep = "";
        while (first != range.second) {
            std::cout << sep << first->second;
            sep = ", ";
            ++first;
        }
        std::cout << "\n";
    }
}
output:
results:
sequence: 010101 at positions: 38, 40, 42, 44
sequence: 000011 at positions: 30
sequence: 000001 at positions: 29
sequence: 110000 at positions: 27
sequence: 011100 at positions: 25
sequence: 101110 at positions: 24
sequence: 010111 at positions: 46
sequence: 110111 at positions: 23
sequence: 011011 at positions: 22
sequence: 111011 at positions: 19
sequence: 111000 at positions: 26
sequence: 111101 at positions: 18, 34, 49
sequence: 011110 at positions: 17, 33, 48
sequence: 001111 at positions: 16, 32
sequence: 110110 at positions: 20
sequence: 101010 at positions: 37, 39, 41, 43
sequence: 010001 at positions: 13
sequence: 101000 at positions: 12
sequence: 101111 at positions: 47
sequence: 110100 at positions: 11
sequence: 011010 at positions: 10
sequence: 101101 at positions: 9, 21
sequence: 010110 at positions: 8
sequence: 101011 at positions: 7, 45
sequence: 111010 at positions: 5, 35
sequence: 011101 at positions: 4
sequence: 001110 at positions: 3
sequence: 100000 at positions: 28
sequence: 000111 at positions: 2, 15, 31
sequence: 100011 at positions: 1, 14
sequence: 110001 at positions: 0
sequence: 110101 at positions: 6, 36
After many suggestions in the comments and answers, I tested most of them and picked the fastest option, reducing the bottleneck caused by the mapping to almost the same time the program ran without the "map" at all (which produced incorrect data, but let me measure the minimum time the bottleneck could be reduced to).
This was achieved by replacing unordered_map<uint64, uint> plus vector<vector<uint>> with a single unordered_map<uint64, vector<uint>>, more precisely boost::unordered_map. I also tested unordered_map<string, vector<uint>>, and it surprised me that it was not as much slower as I expected. It was still slower, however.
Also, probably because an ordered map moves nodes to keep its internal tree balanced, map<uint64, vector<uint>> was a bit slower than map<uint64, uint> together with vector<vector<uint>>. But since an unordered map does not move its internal data during the computation, this seems to be the fastest configuration one can use.

What's the simplest way to split 10 random numbers into two lists where the difference in the sum of the lists is as small as possible

You get 10 numbers that you have to split into two lists such that the sums of the two lists differ as little as possible.
so let's say you get:
10 29 59 39 20 17 29 48 33 45
how would you sort these into two lists where the difference in the sums of the lists is as small as possible?
so in this case, the answer (I think) would be:
59 48 29 17 10 = 163
45 39 33 29 20 = 166
I'm using mIRC script as the language, but Perl or C++ is just as good for me.
edit: actually, there can be multiple answers; in this scenario it could also be:
59 48 29 20 10 = 166
45 39 33 29 17 = 163
to me it doesn't matter, as long as the end result is that the difference of the sums of the lists is as small as possible
edit 2: each list must contain 5 numbers.
What you have described is exactly the partition problem (for more details look at http://en.wikipedia.org/wiki/Partition_problem).
The point is that this is an NP-complete problem, so no polynomial-time algorithm is known that solves every instance (larger amounts of numbers quickly become impractical).
But if your problem is always with only ten numbers, divided into two lists of exactly five items each, then it becomes feasible even to naively try all possible solutions, since there are only p^N of them, where p=2 is the number of partitions and N=10 the number of integers: just 2^10 = 1024 combinations, each taking O(N) to verify (i.e., to compute the difference).
Otherwise you can implement the greedy algorithm described on the Wikipedia page; it is simple to implement, but there is no guarantee of optimality, as you can see from this Java implementation:
static void partition() {
    int[] set = {10, 29, 59, 39, 20, 17, 29, 48, 33, 45}; // array of data
    Arrays.sort(set); // sort data in ascending order (the greedy heuristic is usually run on descending order)
    ArrayList<Integer> A = new ArrayList<Integer>(5); // first list
    ArrayList<Integer> B = new ArrayList<Integer>(5); // second list
    String stringA = new String(); // only to print result
    String stringB = new String(); // only to print result
    int sumA = 0; // sum of items in A
    int sumB = 0; // sum of items in B
    for (int i : set) {
        if (sumA <= sumB) {
            A.add(i);   // add item to first list
            sumA += i;  // update sum of first list
            stringA += " " + i;
        } else {
            B.add(i);   // add item to second list
            sumB += i;  // update sum of second list
            stringB += " " + i;
        }
    }
    System.out.println("First list:" + stringA + " = " + sumA);
    System.out.println("Second list:" + stringB + " = " + sumB);
    System.out.println("Difference (first-second):" + (sumA - sumB));
}
It does not return a good result:
First list: 10 20 29 39 48 = 146
Second list: 17 29 33 45 59 = 183
Difference (first-second):-37

Direct-inclusion sorting

What is the other name for direct-inclusion sorting, and what is the algorithm for that sort?
I have been searching on the Internet, but I can't get a straight answer. I found this algorithm for straight insertion sort, and some books say it is the same as direct-inclusion sorting, but I'm doubtful because the book is in Russian, so I want to confirm (is it true, or might there be a translation error?)
Code in C++:
#include <iostream>
#include <algorithm>  // std::swap
using namespace std;

int main(int argc, char* argv[])
{
    int arr[8] = {27, 412, 71, 81, 59, 14, 273, 87}, i, j;
    for (j = 1; j < 8; j++) {
        if (arr[j] < arr[j-1]) {
            // Work with i so that we don't change the value of j
            i = j;
            // Keep swapping until we find the right place
            do {
                swap(arr[i], arr[i-1]);
                i--;
                // Guard against going out of the array's bounds
                if (i == 0)
                    break;
            } while (arr[i] < arr[i-1]);
        }
        for (i = 0; i < 8; i++)
            cout << arr[i] << ' ';
        cout << '\n';
    }
    // getch();  (non-standard, from <conio.h>; waits for a keypress)
    return 0;
}
Result
27 412 71 81 59 14 273 87
27 71 412 81 59 14 273 87
27 71 81 412 59 14 273 87
27 59 71 81 412 14 273 87
14 27 59 71 81 412 273 87
14 27 59 71 81 273 412 87
14 27 59 71 81 87 273 412
The posted code is Insertion sort.
Most implementations will copy an out-of-order element to a temporary variable and then work backwards, moving elements up until the correct open spot is found to "insert" the current element. That's what the pseudocode in the Wikipedia article shows.
Some implementations just bubble the out-of-order element backwards while it's less than the element to its left. That's what the inner do...while loop in the posted code shows.
Both methods are valid ways to implement Insertion sort.
The code you posted does not look like an algorithm for insertion sort to me, since it does a repeated swap of two neighboring elements.
Your code looks much more like some kind of bubble sort.
Here a list of common sorting algorithms:
https://en.wikipedia.org/wiki/Sorting_algorithm
"straight insertion" and "direct inclusion" sounds like pretty much the same .. so I quess they probably are different names for the same algorithm.
Edit:
Possibly the "straight" prefix should indicate that only one container is used .. however, if two neighboring elements are swaped, I would not call it insertion-sort, since no "insert" is done at all.
Given that the term "direct inclusion sort" yields no Google hits at all, and "direct insertion sorting" only 27 hits, the first three of which are this post and two identically phrased blog posts, I doubt that this term has any widely accepted meaning. So the part of your question about
"some book its saying they are the same with direct direct-inclusion sorting"
is hard to answer unless we find a clear definition of what direct-inclusion sorting actually is.