Binary merge sort & natural merge sort - C++

I know that homework questions are not the most popular on here, but I am at a total loss. I am doing an assignment which requires us to make multiple sorting algorithms. One of them, however, is driving me insane. I can find no examples of it online anywhere, and he did not go over it fully in class. We have to make a merge sort that looks like this:
void mergeSort(int * a, int s, bool n = false)
Where a is the array, s is the size of said array, and n is false for binary merge sort and true for natural merge sort. The problem is, I can't find what natural merge sort and binary merge sort are; I just find plain merge sort, and all the examples I find take far more parameters.
I am simply asking if anyone knows where I can find a good explanation of those two different types of mergesort.

I'm no expert on the topic, but the Wikipedia page seems to be a good starting point:
http://en.wikipedia.org/wiki/Merge_sort
It contains a section on natural merge sort with an example.
About binary merge sort:
A variant named binary merge sort uses a binary insertion sort to sort
groups of 32 elements, followed by a final sort using merge sort. It
combines the speed of insertion sort on small data sets with the speed
of merge sort on large data sets
You can read about insertion sort here: http://en.wikipedia.org/wiki/Insertion_sort
which contains a section on binary insertion sort.
About the variables: the Wikipedia example of 'bottom-up merge sort' (of which natural merge sort is a variant) has this signature:
void BottomUpSort(A[], B[], n)
where A is the array to be sorted and n its length. B is a work array; if I read the algorithm right, it needs to be of length n too. In any case, it can be created at the beginning of the algorithm and deleted at the end.
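To make this concrete, here is a minimal sketch of the requested signature (my own reading of the assignment, not your instructor's reference solution; the helper name mergeRuns and the run-boundary bookkeeping are my assumptions). With n == false it behaves as a plain bottom-up merge sort starting from runs of length 1; with n == true it first scans for the naturally occurring ascending runs and merges those instead.

#include <algorithm>
#include <cstddef>
#include <vector>

// Merge the sorted ranges a[lo..mid) and a[mid..hi), using buf as scratch space.
static void mergeRuns(int* a, int lo, int mid, int hi, std::vector<int>& buf)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) buf[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) buf[k++] = a[i++];
    while (j < hi)  buf[k++] = a[j++];
    std::copy(buf.begin() + lo, buf.begin() + hi, a + lo);
}

void mergeSort(int* a, int s, bool n = false)
{
    if (s < 2) return;
    std::vector<int> buf(s);
    // Collect the starting indices of the initial runs: every index for the
    // binary variant, only the starts of maximal ascending runs for the natural one.
    std::vector<int> bounds;
    bounds.push_back(0);
    for (int i = 1; i < s; ++i)
        if (!n || a[i] < a[i - 1]) bounds.push_back(i);
    bounds.push_back(s);
    // Repeatedly merge adjacent runs until a single sorted run remains.
    while (bounds.size() > 2) {
        std::vector<int> next;
        next.push_back(0);
        for (std::size_t r = 0; r + 2 < bounds.size(); r += 2) {
            mergeRuns(a, bounds[r], bounds[r + 1], bounds[r + 2], buf);
            next.push_back(bounds[r + 2]);
        }
        if (bounds.size() % 2 == 0)        // odd run count: the last run carries over
            next.push_back(bounds.back());
        bounds = next;
    }
}

On input that is already mostly sorted, the natural variant starts with far fewer runs and therefore does far fewer merge passes; on random input the two variants behave essentially the same.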

Related

Benefit of printing values from an array in ascending order by selecting?

I read the tutorial on sorting the numbers in an array in ascending order and understood the idea: https://www.includehelp.com/cpp-programs/sort-an-array-in-ascending-order.aspx . However, now I'm thinking of another way to perform the operation, and I wonder whether the idea below would work.
The method uses a while loop that runs while the count of numbers remaining in the array is not zero: find the smallest number in the array, print it, and remove it from the array. Repeat the same process until no numbers remain. That way my numbers will be printed in ascending order as well, and the number of elements in the array will decrease on each pass until it reaches zero.
I started learning programming just a few weeks ago and have trouble writing out the code. However, I'm interested to know whether this method would work. If not, please explain why.
What you've described is a variant of what's normally called a "selection sort". It's pretty well known, and it does work; but many sorting algorithms work, and while a few of them are generally even less efficient, this one is still among the least efficient around.
Selection sort is typically faster than Bubble sort and a few of its variants like Shaker sort. Depending on the precise situation, it can also be faster than insertion sort, though that's pretty unusual. Those three (bubble sort, insertion sort, and selection sort) are the best known of the simple sorting algorithms. Of the three, bubble sort is most often the slowest, and insertion sort most often the fastest. But all three take time proportional to the square of the number of items being sorted, which means they get much slower in a hurry as you try to sort more items. If you have very many items, more advanced algorithms (e.g., Shell-Metzner, Quicksort, heap sort and merge sort) will almost always be substantially faster.
Ignoring execution speed for a moment, selection sort does have one extremely good property: it's easy to understand, easy to code up correctly and easy to prove that it works. If you only need to sort a few items, and need to type in the sorting code yourself (especially if you're in a hurry) it's my experience that it's probably the easiest sorting algorithm to be certain you've implemented correctly.
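For illustration, here is a minimal sketch of the scheme from the question (the function name and sample data are my own): repeatedly find the smallest remaining value, print it, and erase it. Both the scan and the erase are O(n) and they run once per element, which is where the quadratic cost comes from.

#include <algorithm>
#include <iostream>
#include <vector>

void printAscending(std::vector<int> v)  // take a copy, since we consume it
{
    while (!v.empty()) {
        auto smallest = std::min_element(v.begin(), v.end()); // O(n) scan
        std::cout << *smallest << ' ';
        v.erase(smallest);               // erasing from a vector is also O(n)
    }
    std::cout << '\n';
}

int main()
{
    printAscending({5, 2, 8, 1, 9, 3});  // prints: 1 2 3 5 8 9
}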

Merging Two Sorted Arrays with O(log(n+m)) Worst Case

What kind of algorithm can I use to merge two sorted arrays into one sorted array with a worst-case time complexity of O(log(m+n)), where n and m are the lengths of the arrays? I have very little experience with algorithms, but I checked out merge sort, and it seems that the time complexity of the merging step is O(n). Is there a different approach that merges in O(log(n))?
Edit: I hadn't considered this initially, but maybe it's not possible to merge two sorted arrays in O(log(n)) time? The actual goal is to find the median of two sorted arrays. Is there a way to do this without merging them?
The only idea I've had: I read that merging two binomial heaps is O(log(n)), but turning an array into a binomial heap is O(n), I think, so that won't work.
Edit2: I'm going to post a new question because I've realized that merging will never work fast enough. I think instead I need to perform a binary search on each array to find the median in log(n).
I don't think there is an algorithm that would merge two arrays in O(log(n+m)) time.
And it makes sense when you think about it: if you're trying to create a new sorted array of n+m elements, you will need to do at least n+m copies; there is no way around that.
I think the best way is to iterate through both arrays simultaneously and, at each step, compare the two current elements. Copy the smaller of the two into the output array and increment the index pointer of the array it came from (assuming you want the output in ascending order). If the two elements are equal, you can add them both to the new array and increment both pointers.
Continue until one of the pointers reaches the end of its array, then copy in the rest of the other array.
That should be O(m+n).
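A minimal sketch of that two-pointer merge (the function name is my own):

#include <cstddef>
#include <vector>

std::vector<int> mergeSorted(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(a[i] <= b[j] ? a[i++] : b[j++]); // take the smaller head
    while (i < a.size()) out.push_back(a[i++]);        // copy whichever tail remains
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}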
Regarding your edit: there is a way to find the median of two separate sorted arrays in O(log(n + m)) time.
You can first find the median of each of the two sorted arrays (the middle element) and compare them. If they are equal, then that value is the overall median. If the first array's median is greater than the second's, you know the median has to be in either the first half of the first array or the second half of the second array, and vice versa if the first's median is less than the second's.
This method cuts your search space in half on each iteration and is thus O(log(n + m)).
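Here is a hedged sketch of that halving idea for the classic simplified case of two sorted arrays of equal length n (the general unequal-length case needs more careful index bookkeeping, and all names here are my own):

#include <algorithm>

// Median of a single sorted array of length n.
static double singleMedian(const int* a, int n)
{
    return n % 2 ? a[n / 2] : (a[n / 2 - 1] + a[n / 2]) / 2.0;
}

// Median of the union of two sorted arrays, each of length n >= 1.
double medianOfTwo(const int* a, const int* b, int n)
{
    if (n == 1) return (a[0] + b[0]) / 2.0;
    if (n == 2) return (std::max(a[0], b[0]) + std::min(a[1], b[1])) / 2.0;
    double m1 = singleMedian(a, n), m2 = singleMedian(b, n);
    if (m1 == m2) return m1;
    if (m1 < m2) {
        // The median lies in a's upper half and b's lower half.
        if (n % 2 == 0) return medianOfTwo(a + n / 2 - 1, b, n - n / 2 + 1);
        return medianOfTwo(a + n / 2, b, n - n / 2);
    }
    // Symmetric case: the median lies in b's upper half and a's lower half.
    if (n % 2 == 0) return medianOfTwo(b + n / 2 - 1, a, n - n / 2 + 1);
    return medianOfTwo(b + n / 2, a, n - n / 2);
}

For example, medianOfTwo on {1,12,15,26,38} and {2,13,17,30,45} returns 16, the median of the ten merged elements; each call discards half of each array, which is where the logarithmic running time comes from.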
You're probably thinking of the selection algorithm.
For a sorted data structure, finding the median is O(1). For an unsorted data structure (or a data structure where the data is sorted into two logical partitions) the runtime is O(n).
You could probably pull it off with a massively parallel reduction algorithm, but I think that's cheating in Runtime Analysis terms.
So I don't believe there's an algorithm that reduces it below O(n) (or, in your case, O(n+m))
You need to merge the arrays, so no matter what you have to traverse both arrays at least once; the complexity can't be less than O(m+n).

Fastest way to search and sort vectors

I'm doing a project in which I need to insert data into vectors, sort it, and search it.
I need the fastest possible algorithms for sorting and searching. I've been searching and found out that std::sort is basically quicksort, which is one of the fastest sorts, but I can't figure out which search algorithm is best. Binary search? Can you help me with it? Thanks. So I've got 3 methods:
void addToVector(Obj o)
{
    fvector.push_back(o);
}

void sortVector()
{
    sort(fvector.begin(), fvector.end());
}

Obj* search(string& bla)
{
    // I would write binary search here
    return binarysearch(..);
}
I've been searching and found out that std::sort is basically
quicksort.
Answer: Not quite. Most implementations use a hybrid algorithm like introsort, which combines quicksort, heapsort, and insertion sort.
Quick-sort is one of the fastest sorting methods.
Answer: Not quite. In general this holds, in the sense that quicksort runs in O(n log n) time in the average case. However, quicksort has quadratic worst-case performance, i.e., O(n²). Furthermore, for a small number of inputs (e.g., a std::vector with only a few elements), sorting with quicksort tends to achieve worse performance than other sorting algorithms that are considered 'slower'.
I can't figure out which searching algorithm is the best. Is it binary-search?
Answer: Binary search has the same average- and worst-case performance, namely O(log n). Also bear in mind that binary search requires the container to be arranged in ascending or descending order. However, whether it is better than other searching methods (e.g., linear search, which has O(n) time complexity) depends on a number of factors. Some of them are:
The number of elements/objects.
The type of elements/objects.
Bottom Line:
Usually looking for the "fastest" algorithm denotes premature optimization, and as one of the "great ones" put it: "Premature optimization is the root of all evil" (Donald Knuth). The "fastest", as I hope has been clearly shown, depends on quite a number of factors.
Use std::sort to sort your std::vector.
After sorting your std::vector use std::binary_search to find out whether a certain element exists in your std::vector or use std::lower_bound or std::upper_bound to find and get an element from your std::vector.
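A short sketch of that advice (the element type and sample data are placeholder assumptions; with a user-defined Obj you would also pass a comparator):

#include <algorithm>
#include <vector>

std::vector<int> fvector = {42, 7, 19, 3, 25};

void sortVector()
{
    std::sort(fvector.begin(), fvector.end());
}

bool contains(int value)
{
    // Requires a sorted range; O(log n).
    return std::binary_search(fvector.begin(), fvector.end(), value);
}

int* find(int value)
{
    // lower_bound returns the first position whose element is not less than value.
    auto it = std::lower_bound(fvector.begin(), fvector.end(), value);
    return (it != fvector.end() && *it == value) ? &*it : nullptr;
}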
For amortised O(1) access times, use a std::unordered_map, maybe using a custom hash for best effect.
Sorting seems to be unnecessary extra work.
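A small sketch of that hashing alternative, assuming (my assumption) that the question's Obj is looked up by a string key:

#include <string>
#include <unordered_map>

struct Obj { int value; };                 // stand-in for the question's Obj

std::unordered_map<std::string, Obj> byName;

void add(const std::string& key, const Obj& o)
{
    byName[key] = o;                       // no sorting pass needed at all
}

Obj* search(const std::string& key)
{
    auto it = byName.find(key);            // average O(1), worst case O(n)
    return it == byName.end() ? nullptr : &it->second;
}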
Searching and Sorting efficiency is highly dependent on the type of data, the ordering of the raw data, and the quantity of the data.
For example, for small sorted data sets, a linear search may be faster than a binary search; or the time differences between the two is negligible.
Some sort algorithms will perform horribly on inversely ordered data, such as a binary tree sort. Data that does not have much variation may cause a high degree of collisions with hash algorithms.
Perhaps you need to answer the bigger question: Is search or sorting the execution bottleneck in my program? Profile and find out.
If you need the fastest or the best sorting algorithm... there is no such thing, or at least it hasn't been found yet. There are algorithms that provide better results for particular kinds of data, and algorithms that provide good results for most data. You either need to analyze your data and find the best one for your case, or use a generic algorithm like std::sort and expect it to provide good results, though not necessarily the best.
If your elements are integers, you could use a bucket sort algorithm, which runs in O(n) time instead of the O(n log n) average case of quicksort:
http://en.wikipedia.org/wiki/Bucket_sort
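A hedged sketch of the simplest bucket variant, a counting sort; it assumes all values lie in a known range [0, k):

#include <cstddef>
#include <vector>

void countingSort(std::vector<int>& v, int k)
{
    std::vector<int> count(k, 0);
    for (int x : v) ++count[x];          // tally each key: O(n)
    std::size_t out = 0;
    for (int key = 0; key < k; ++key)    // emit the keys in order
        while (count[key]--) v[out++] = key;
}

Total work is O(n + k), so this only pays off when the key range k is not much larger than the number of elements n.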
Sorting
If you want to know about the fastest sorting technique for integer values in a vector, I would suggest referring to the following link:
https://github.com/fenilgmehta/Fastest-Integer-Sort
It uses radix sort and counting sort for large arrays and merge sort along with insertion sort for small arrays.
According to the benchmarks published there, this sorting algorithm is significantly faster than C++ std::sort for integral values; the repository reports it as 6 times faster than the C++ STL's std::sort for "int64_t array[10000000]".
Searching
If you want to know whether a particular value is present in the vector or not, then you should use binary_search(...)
If you want to know the exact location of an element, then use lower_bound(...) and upper_bound(...)

Can we know if a collection is almost sorted without applying a sort algorithm?

In the Wikipedia article on sorting algorithms,
http://en.wikipedia.org/wiki/Sorting_algorithm#Summaries_of_popular_sorting_algorithms
under bubble sort it says: "Bubble sort can also be used efficiently on a list of any length that is nearly sorted (that is, the elements are not significantly out of place)."
So my question is: without first sorting the list with a sorting algorithm, how can one know whether it is nearly sorted or not?
Are you familiar with the general sorting lower bound? You can prove that any comparison-based sorting algorithm must make Ω(n log n) comparisons in the average case. The way you prove this is through an information-theoretic argument. The basic idea is that there are n! possible permutations of the input array, and since the only way you can learn which permutation you got is to make comparisons, you have to make at least lg(n!) comparisons in order to be certain that you know the structure of your input permutation; by Stirling's approximation, lg(n!) = Θ(n log n).
I haven't worked out the math on this, but I suspect that you could make similar arguments to show that it's difficult to learn how sorted a particular array is. Essentially, if you don't do a large number of comparisons, then you wouldn't be able to tell apart an array that's mostly sorted from an array that is actually quite far from sorted. As a result, all the algorithms I'm aware of that measure "sortedness" take a decent amount of time to do so.
For example, one measure of the level of "sortedness" in an array is the number of inversions in that array. You can count the number of inversions in an array in time O(n log n) using a divide-and-conquer algorithm based on mergesort, but with that runtime you could just sort the array instead.
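For reference, a sketch of that inversion-counting mergesort (the names are my own): whenever an element of the right half is emitted before the remaining elements of the left half, each of those remaining elements forms one inversion with it.

#include <vector>

// Counts inversions in a[lo..hi); buf must be at least a.size() long.
long long countInversions(std::vector<int>& a, int lo, int hi, std::vector<int>& buf)
{
    if (hi - lo < 2) return 0;
    int mid = lo + (hi - lo) / 2;
    long long inv = countInversions(a, lo, mid, buf)
                  + countInversions(a, mid, hi, buf);
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        if (a[j] < a[i]) { inv += mid - i; buf[k++] = a[j++]; } // a[i..mid) all exceed a[j]
        else             { buf[k++] = a[i++]; }
    }
    while (i < mid) buf[k++] = a[i++];
    while (j < hi)  buf[k++] = a[j++];
    for (int t = lo; t < hi; ++t) a[t] = buf[t];                // copy the merged run back
    return inv;
}

A sorted array has 0 inversions and a reverse-sorted one has n(n-1)/2, so the count directly quantifies how unsorted the input is.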
Typically, the way that you'd know that your array was mostly sorted was to know something a priori about how it was generated. For example, if you're looking at temperature data gathered from 8AM - 12PM, it's very likely that the data is already mostly sorted (modulo some variance in the quality of the sensor readings). If your data looks at a stock price over time, it's also likely to be mostly sorted unless the company has a really wonky trajectory. Some other algorithms also partially sort arrays; for example, it's not uncommon for quicksort implementations to stop sorting when the size of the array left to sort is small and to follow everything up with a final insertion sort pass, since every element won't be very far from its final position then.
I don't believe there exists any standardized measure of how sorted or random an array is.
You can come up with your own measure, like counting the number of adjacent pairs that are out of order (suggested in a comment; a sketch follows below), or counting the number of larger numbers that occur before smaller numbers in the array, i.e., inversions (this is trickier than a simple single pass).
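A minimal sketch of that first measure, a single O(n) pass (the function name is my own):

#include <cstddef>
#include <vector>

int adjacentDisorder(const std::vector<int>& v)
{
    int count = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i] < v[i - 1]) ++count;    // this adjacent pair is out of order
    return count;
}

It returns 0 for a sorted array, while for uniformly random data about half of the adjacent pairs are out of order, so a value near n/2 suggests the list is far from sorted.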

Sorting a list in Prolog

Prolog has a unique way of handling things, especially since practically every operation involves recursion of one sort or another.
One of the classic examples every language has is sorting a list of integers into ascending order.
What is an optimal way (without using too many built-in predicates, which precludes a sort/2 predicate, of course) to sort a random list of integers?
Roman Barták's Prolog Programming site gives examples of different sort algorithms, ending with an optimized quicksort.
quick_sort2(List, Sorted) :- q_sort(List, [], Sorted).

q_sort([], Acc, Acc).
q_sort([H|T], Acc, Sorted) :-
    pivoting(H, T, L1, L2),
    q_sort(L1, Acc, Sorted1), q_sort(L2, [H|Sorted1], Sorted).

% pivoting/4, missing from the excerpt: L1 takes the elements greater than
% the pivot and L2 the rest, matching the order q_sort consumes them in.
pivoting(_, [], [], []).
pivoting(H, [X|T], [X|G], L) :- X > H, pivoting(H, T, G, L).
pivoting(H, [X|T], G, [X|L]) :- X =< H, pivoting(H, T, G, L).
As far as I know, the best sorting algorithms written directly in Prolog, without reference to any special built-ins, use some form of merge sort.
A frequent optimization is to start merging not with lists of length 1 but with already sorted segments.
That is, to sort the list [4,5,3,6,2,7,1,2], the lists [4,5],[3,6],[2,7],[1,2] would be merged.
This can be optimized even further by assembling sorted lists not only in ascending direction, but also in the other direction. For the example above this would mean that the sorted segment is assembled as follows:
[4,5|_]
[3,4,5|_]
[3,4,5,6|_]
...
Note that in Prolog it is straightforward to extend a list both at the beginning and at the end.
Thus, we have to merge [1,2,3,4,5,6,7] and [2] only.
A current system that uses Richard O'Keefe's original implementation (~1984) is Ciao Prolog, in ciao-1.15/lib/sort.pl.