Sorting a list in Prolog

Prolog has a unique way of handling things, especially since practically every operation involves recursion of one sort or another.
One of the classic examples every language has is sorting a list of integers into ascending order.
What is an optimal way (without using too many built-in predicates, which precludes a sort/2 predicate, of course) to sort a random list of integers?

Roman Barták's Prolog Programming site gives examples of different sort algorithms, ending with an optimized quicksort.
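% Quicksort with an accumulator; pivoting/4 (not shown here) splits T around H.
% The result is sorted(L2), then H, then sorted(L1), then Acc, so for ascending
% order pivoting/4 must put the elements greater than H into L1.
quick_sort2(List,Sorted):-q_sort(List,[],Sorted).
q_sort([],Acc,Acc).
q_sort([H|T],Acc,Sorted):-
    pivoting(H,T,L1,L2),
    q_sort(L1,Acc,Sorted1),q_sort(L2,[H|Sorted1],Sorted).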

As far as I know, the best sorting algorithms written directly in Prolog, without reference to any special built-ins, use some form of merge sort.
A frequent optimization is to start merging not with lists of length 1 but with already sorted segments.
That is, to sort the list [4,5,3,6,2,7,1,2], the lists [4,5],[3,6],[2,7],[1,2] would be merged.
This can be optimized even further by assembling sorted lists not only in ascending direction, but also in the other direction. For the example above this would mean that the sorted segment is assembled as follows:
[4,5|_]
[3,4,5|_]
[3,4,5,6|_]
...
Note that in Prolog it is straightforward to extend a list both at the beginning and at the end.
Thus, we have to merge [1,2,3,4,5,6,7] and [2] only.
A current system that uses the original implementation (~1984) of Richard O'Keefe is Ciao-Prolog in ciao-1.15/lib/sort.pl.
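The segment-building step described above can be sketched in C++ for illustration. This is not O'Keefe's implementation: the names bidirectionalRuns and runMergeSort are hypothetical, std::deque stands in for a Prolog list that is open at both ends, and the runs are merged one after another for brevity, where a real merge sort would merge them pairwise.

```cpp
#include <algorithm>
#include <deque>
#include <iterator>
#include <utility>
#include <vector>

// Split the input into sorted runs, growing the current run at either end.
std::vector<std::deque<int>> bidirectionalRuns(const std::vector<int>& in) {
    std::vector<std::deque<int>> runs;
    for (int x : in) {
        if (!runs.empty() && x >= runs.back().back()) {
            runs.back().push_back(x);            // extends the run at the end
        } else if (!runs.empty() && x < runs.back().front()) {
            runs.back().push_front(x);           // extends the run at the front
        } else {
            runs.push_back(std::deque<int>{x});  // x fits neither end: new run
        }
    }
    return runs;
}

// Merge the runs into one sorted vector (sequentially, for simplicity).
std::vector<int> runMergeSort(const std::vector<int>& in) {
    std::vector<int> out;
    for (const auto& run : bidirectionalRuns(in)) {
        std::vector<int> merged;
        merged.reserve(out.size() + run.size());
        std::merge(out.begin(), out.end(), run.begin(), run.end(),
                   std::back_inserter(merged));
        out = std::move(merged);
    }
    return out;
}
```

For [4,5,3,6,2,7,1,2] the first function yields the two runs [1,2,3,4,5,6,7] and [2], matching the example above.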

Related

Benefit of printing values from an array in ascending order by selecting?

I read the tutorial on sorting an array into ascending order (https://www.includehelp.com/cpp-programs/sort-an-array-in-ascending-order.aspx) and understood the idea. However, I'm now thinking of another way to perform the operation, and I wonder whether the idea below will work.
The method uses a while loop that runs as long as the array still has numbers left: find the smallest number in the array, print it, and remove it from the array. This repeats until no numbers remain. The numbers are therefore printed in ascending order, and the array shrinks by one on each pass until it is empty.
I started learning programming just a few weeks ago and have trouble writing the code right now. However, I'm interested to know whether this method will work. If it won't, please explain why.
What you've described is a variant of what's normally called a "selection sort". It's pretty well known. It does work, but there are many sorting algorithms that work--and while there are a few sorting algorithms that are generally less efficient, it's still one of the least efficient around.
Selection sort is typically faster than Bubble sort and a few of its variants like Shaker sort. Depending on the precise situation, it can also be faster than insertion sort, though that's pretty unusual. Those three (bubble sort, insertion sort, and selection sort) are the best known of the simple sorting algorithms. Of the three, bubble sort is most often the slowest, and insertion sort most often the fastest. But all three take time proportional to the square of the number of items being sorted, which means they get much slower in a hurry as you try to sort more items. If you have very many items, more advanced algorithms (e.g., Shell-Metzner, Quicksort, heap sort and merge sort) will almost always be substantially faster.
Ignoring execution speed for a moment, selection sort does have one extremely good property: it's easy to understand, easy to code up correctly and easy to prove that it works. If you only need to sort a few items, and need to type in the sorting code yourself (especially if you're in a hurry) it's my experience that it's probably the easiest sorting algorithm to be certain you've implemented correctly.
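As a rough illustration of how little code the method from the question needs, here is a minimal sketch. The function name is hypothetical, and the values are collected into a vector instead of being printed so that the result is easy to check.

```cpp
#include <algorithm>
#include <vector>

// Repeatedly take the smallest remaining value out of the array.
// Each pass scans the whole remainder, hence quadratic time overall.
std::vector<int> drainAscending(std::vector<int> values) {
    std::vector<int> output;
    while (!values.empty()) {
        auto smallest = std::min_element(values.begin(), values.end());
        output.push_back(*smallest);  // "print" the smallest number
        values.erase(smallest);       // remove it from the array
    }
    return output;
}
```

The argument is taken by value, so the caller's array is left untouched.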

SML/NJ - Effective way or data structure to access from end towards the start

I am making a program, and an algorithm I have thought to use requires a cheap way of accessing a list backwards to be effective. Is there an effective way to access a list from the last element forward? Or, because I think that might be impossible due to the structure of SML lists, is there an effective data structure to achieve it?
The length of data is unknown before executing, and there is no need for other than serial traversing of the data.
I think you want a functional deque. See e.g. Okasaki's paper on the subject. Specifically, Figure 5 shows an implementation of deques.
If using a functional deque seems like overkill and you need to traverse the list in reverse order just once, then solutions that use, for example, List.last and List.take to emulate hd and tl in reverse order are, as you seem to know, bad because they would make the list traversal quadratic. On the other hand, the built-in function rev is very efficient since it is both tail-recursive and linear. If you feed a list to a function that needs to traverse it in reverse order, an easy solution is to use a let binding with rev to create a local reversed copy of the list and then traverse that copy in the usual way.

Binary Merge sort & Natural Merge sort

I know that homework questions are not the most popular on here, but I am at a total loss. I am doing an assignment which requires us to make multiple sorting algorithms. One of them, however, is driving me insane. I can find no examples of it online anywhere, and our instructor did not go over it fully in class. We have to make a merge sort that looks like this:
void mergeSort(int * a, int s, bool n = false)
Here a is the array, s is the size of that array, and n is false for binary merge sort and true for natural merge sort. The problem is that I can't find out what natural merge sort and binary merge sort are. I just find plain merge sort, and all of the versions I find ask for far more parameters.
I am simply asking if anyone knows where I can find a good explanation of those two different types of mergesort.
I'm no expert on the topic, but the Wikipedia page seems to be a good starting point:
http://en.wikipedia.org/wiki/Merge_sort
It contains a section on natural merge sort with an example.
About binary merge sort:
A variant named binary merge sort uses a binary insertion sort to sort
groups of 32 elements, followed by a final sort using merge sort. It
combines the speed of insertion sort on small data sets with the speed
of merge sort on large data sets
You can read about insertion sort here: http://en.wikipedia.org/wiki/Insertion_sort
which contains a section on binary insertion sort.
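Binary insertion sort, as referenced above, can be sketched as follows. The function name is hypothetical, and the sketch leans on std::upper_bound for the binary search and std::rotate for the shift, which is one convenient way to write it and not the only one.

```cpp
#include <algorithm>

// Insertion sort that finds each insertion point by binary search.
// The prefix a[0..i) is always sorted when iteration i starts.
void binaryInsertionSort(int* a, int s) {
    for (int i = 1; i < s; ++i) {
        // First position in the sorted prefix whose value is greater than a[i].
        int* pos = std::upper_bound(a, a + i, a[i]);
        // Shift [pos, a+i) one slot right and drop a[i] into pos.
        std::rotate(pos, a + i, a + i + 1);
    }
}
```

The binary search cuts the number of comparisons to O(n log n), but the shifting still makes the overall running time quadratic in the worst case.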
About the variables: the Wikipedia example of "bottom-up merge sort" (of which natural merge sort is a variant) has this signature:
void BottomUpSort(A[], B[], n)
where A is the array to be sorted and n is its length. B is a work array, and if I read the algorithm right it needs to be of length n too. In any case, it can be created at the beginning of the algorithm and deleted at the end.
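To show that the work array need not appear in the signature, here is a minimal sketch of a plain bottom-up merge sort that takes only the array and its size. The function name is hypothetical, and this is neither the natural nor the binary variant the assignment asks for; it only illustrates creating the work array internally.

```cpp
#include <algorithm>
#include <vector>

// Bottom-up merge sort: merge runs of width 1, 2, 4, ... until sorted.
void bottomUpMergeSort(int* a, int s) {
    std::vector<int> work(s > 0 ? s : 0);  // work array, freed on return
    for (int width = 1; width < s; width *= 2) {
        for (int lo = 0; lo < s; lo += 2 * width) {
            int mid = std::min(lo + width, s);
            int hi = std::min(lo + 2 * width, s);
            // Merge a[lo, mid) and a[mid, hi) into the work array.
            std::merge(a + lo, a + mid, a + mid, a + hi, work.begin() + lo);
        }
        std::copy(work.begin(), work.end(), a);  // copy the pass back into a
    }
}
```

A natural merge sort would replace the fixed widths with the boundaries of the runs that are already sorted in the input.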

How to improve linked list searching. C++

I have a simple method in C++ which searches for a string in a linked list. It works well, but I need to make it faster. Is that possible? Maybe I need to insert items into the list in alphabetical order? But I don't think that would help with searching the list. The list contains about 300,000 items (words).
int GetItemPosition(const char* stringToFind)
{
    int i = 0;
    MyList* Tmp = FistListItem;
    while (Tmp) {
        if (!strcmp(Tmp->Value, stringToFind))
        {
            return i;
        }
        Tmp = Tmp->NextItem;
        i++;
    }
    return -1;
}
The method returns the position number if the item is found; otherwise it returns -1.
Any suggestion will be helpful.
Thanks for the answers. I can change the structure; I have only one constraint. The code must implement the following interface:
int Count(void);
int AddItem(const char* StringValue, int WordOccurrence);
int GetItemPosition(const char* StringValue);
char* GetString(int Index);
int GetOccurrenceNum(int Index);
void SetInteger(int Index, int WordOccurrence);
So which structure would be the most suitable in your opinion?
Searching a linked list is linear: you have to iterate from the beginning one element at a time, so it is O(n). Linked lists are not the best choice if you mainly use them for searching; more suitable data structures such as binary trees exist.
Ordering the elements does not help much because you still need to iterate over the elements one by one.
Wikipedia article says:
In an unordered list, one simple heuristic for decreasing average search time is the move-to-front heuristic, which simply moves an element to the beginning of the list once it is found. This scheme, handy for creating simple caches, ensures that the most recently used items are also the quickest to find again.
Another common approach is to "index" a linked list using a more
efficient external data structure. For example, one can build a
red-black tree or hash table whose elements are references to the
linked list nodes. Multiple such indexes can be built on a single
list. The disadvantage is that these indexes may need to be updated
each time a node is added or removed (or at least, before that index
is used again).
So in the first case you can slightly improve (by statistical assumptions) your search performance by moving items found previously closer to the beginning of the list. This assumes that previously found elements will be searched more frequently.
The second method requires the use of additional data structures.
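The move-to-front heuristic can be sketched on a singly linked list like this. The Node type and the function name are hypothetical and do not come from the question's code.

```cpp
#include <cstring>

// Hypothetical node type for illustration.
struct Node {
    const char* value;
    Node* next;
};

// Find `key`; if found, unlink its node and relink it as the new head.
// Returns the node, or nullptr if the key is absent.
Node* findMoveToFront(Node*& head, const char* key) {
    Node* prev = nullptr;
    for (Node* cur = head; cur; prev = cur, cur = cur->next) {
        if (std::strcmp(cur->value, key) == 0) {
            if (prev) {                  // not already at the front
                prev->next = cur->next;  // unlink
                cur->next = head;        // relink in front
                head = cur;
            }
            return cur;
        }
    }
    return nullptr;
}
```

Note that moving nodes changes their positions, so this heuristic does not combine well with an interface that hands out positions as stable indexes.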
If using linked lists is not a hard requirement, consider using hash tables, sorted arrays (random access) or balanced trees.
Consider using a sorted array or std::vector for storage instead of a linked list, and use binary search to find a particular string, or, even better, use std::set if you don't need a numerical index. If for some reason other containers are not an option, there is not much you can do; you may want to speed up the comparison by storing a hash of each string alongside it in the node.
I suggest hashing.
Since you've already got a linked list of your own, you can try chaining with linked lists for collision resolution.
Rather than using a linear linked list, you may want to use a binary search tree or a red-black tree. These trees are designed to minimize the number of traversal steps needed to find an item.
You could also store "short cut links". For example, if the list is of strings, you could have an array of links of where to start searching based on the first letter.
For example, shortcut['B'] would return a pointer to the first link to start searching for strings starting with 'B'.
The answer is no, you cannot improve the search without changing your data-structure.
As it stands, sorting the list will not give you a faster search for any random item.
It will only allow you to quickly decide if the given item is in the list by testing against the first item (which will be either the smallest or the largest entry) and this improvement is not likely to make a big difference.
So can you please edit your question and explain to us your constraints?
Can you use a completely different data structure, like an array or a tree? (as others have suggested)
If not, can you modify the way your linked list is linked?
If not, we are unlikely to be able to help you...
The best option is to use a faster data structure for storing the strings:
std::map - typically a red-black tree behind the scenes. O(log n) for search/insert/delete operations. Suitable if you want to store additional values with the strings (for example, positions).
std::set - basically the same tree but without values. Best when you only need a contains operation.
std::unordered_map - hash table. O(1) average access.
std::unordered_set - hash set. Also O(1) average access.
Note that in all of these cases there is a catch. The complexity is calculated only in terms of n (the number of strings). In reality, string comparison is not free, so O(1) becomes O(m) and O(log n) becomes O(m log n), where m is the maximal string length. This does not matter for relatively short strings, but otherwise consider using a trie. In practice a trie can be even faster than a hash table: each character of the query string is accessed only once instead of multiple times. For a hash table/set it is at least once for the hash calculation and at least once for the actual string comparison (depending on the collision resolution strategy; I'm not sure how it is implemented in C++).
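One way the interface from the question could sit on top of these containers is sketched below: a std::vector keeps the items addressable by index and a std::unordered_map maps each string to its index. The class name WordList is hypothetical, and several behaviours are assumptions because the question does not specify them: AddItem returns the item's position, adding an existing string returns its current position without changing it, string arguments are non-null, and an out-of-range index throws.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

class WordList {
    struct Item {
        std::string text;
        int occurrences;
    };
    std::vector<Item> items_;                     // position -> item
    std::unordered_map<std::string, int> index_;  // string -> position

public:
    int Count() const { return static_cast<int>(items_.size()); }

    // Assumed contract: returns the position of the (possibly existing) item.
    int AddItem(const char* StringValue, int WordOccurrence) {
        auto found = index_.find(StringValue);
        if (found != index_.end()) return found->second;
        const int position = Count();
        items_.push_back(Item{StringValue, WordOccurrence});
        index_.emplace(StringValue, position);
        return position;
    }

    // Average O(1) hash lookup instead of a linear scan; -1 if absent.
    int GetItemPosition(const char* StringValue) const {
        auto found = index_.find(StringValue);
        return found == index_.end() ? -1 : found->second;
    }

    // at() throws std::out_of_range for an invalid index.
    char* GetString(int Index) { return &items_.at(Index).text[0]; }
    int GetOccurrenceNum(int Index) { return items_.at(Index).occurrences; }
    void SetInteger(int Index, int WordOccurrence) {
        items_.at(Index).occurrences = WordOccurrence;
    }
};
```

Because items are only ever appended, the positions stored in the map stay valid; supporting removal would require updating the index as well.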

Can we know if a collection is almost sorted without applying a sort algorithm?

In the Wikipedia article on sorting algorithms,
http://en.wikipedia.org/wiki/Sorting_algorithm#Summaries_of_popular_sorting_algorithms
under Bubble sort it says: Bubble sort can also be used efficiently on a list of any length that is nearly sorted (that is, the elements are not significantly out of place)
So my question is: without first sorting the list using a sorting algorithm, how can one know whether it is nearly sorted or not?
Are you familiar with the general sorting lower bound? You can prove that any comparison-based sorting algorithm must make Ω(n log n) comparisons in the average case. The way you prove this is through an information-theoretic argument. The basic idea is that there are n! possible permutations of the input array, and since the only way you can learn which permutation you got is to make comparisons, you have to make at least lg n! comparisons in order to be certain that you know the structure of your input permutation.
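The step from lg n! to Ω(n log n) follows by keeping only the larger half of the factors of n!, each of which is at least n/2 (a standard bound, written out here for completeness):

```latex
\log_2 n! \;=\; \sum_{k=1}^{n} \log_2 k
\;\ge\; \sum_{k=\lceil n/2 \rceil}^{n} \log_2 k
\;\ge\; \frac{n}{2}\,\log_2\frac{n}{2}
\;=\; \Omega(n \log n)
```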
I haven't worked out the math on this, but I suspect that you could make similar arguments to show that it's difficult to learn how sorted a particular array is. Essentially, if you don't do a large number of comparisons, then you wouldn't be able to tell apart an array that's mostly sorted from an array that is actually quite far from sorted. As a result, all the algorithms I'm aware of that measure "sortedness" take a decent amount of time to do so.
For example, one measure of the level of "sortedness" in an array is the number of inversions in that array. You can count the number of inversions in an array in time O(n log n) using a divide-and-conquer algorithm based on mergesort, but with that runtime you could just sort the array instead.
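The mergesort-based inversion count mentioned above can be sketched like this. The function names are hypothetical; the input is taken by value because the array gets sorted as a side effect of counting.

```cpp
#include <cstddef>
#include <vector>

// Sorts v[lo, hi) and returns the number of inversions inside that range.
long long sortAndCount(std::vector<int>& v, std::vector<int>& tmp,
                       std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return 0;
    const std::size_t mid = lo + (hi - lo) / 2;
    long long count = sortAndCount(v, tmp, lo, mid) + sortAndCount(v, tmp, mid, hi);
    std::size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        if (v[j] < v[i]) {
            // v[j] is smaller than every element still left in the left half.
            count += static_cast<long long>(mid - i);
            tmp[k++] = v[j++];
        } else {
            tmp[k++] = v[i++];
        }
    }
    while (i < mid) tmp[k++] = v[i++];
    while (j < hi) tmp[k++] = v[j++];
    for (k = lo; k < hi; ++k) v[k] = tmp[k];
    return count;
}

// Number of pairs (i, j) with i < j and v[i] > v[j], in O(n log n) time.
long long countInversions(std::vector<int> v) {
    std::vector<int> tmp(v.size());
    return sortAndCount(v, tmp, 0, v.size());
}
```

A sorted array has zero inversions and a reversed array of n distinct values has n(n-1)/2, so the count gives a scale for how far from sorted the input is.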
Typically, the way that you'd know that your array was mostly sorted was to know something a priori about how it was generated. For example, if you're looking at temperature data gathered from 8AM - 12PM, it's very likely that the data is already mostly sorted (modulo some variance in the quality of the sensor readings). If your data looks at a stock price over time, it's also likely to be mostly sorted unless the company has a really wonky trajectory. Some other algorithms also partially sort arrays; for example, it's not uncommon for quicksort implementations to stop sorting when the size of the array left to sort is small and to follow everything up with a final insertion sort pass, since every element won't be very far from its final position then.
I don't believe there exists any standardized measure of how sorted or random an array is.
You can come up with your own measure, such as counting the number of adjacent pairs which are out of order (suggested in a comment), or counting the number of larger numbers which occur before smaller numbers in the array (this is trickier than a simple single pass).
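The first of these measures needs only one linear pass. A minimal sketch, with a hypothetical function name:

```cpp
#include <cstddef>
#include <vector>

// Counts adjacent pairs (v[i-1], v[i]) that are out of ascending order.
std::size_t countOutOfOrderPairs(const std::vector<int>& v) {
    std::size_t pairs = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i - 1] > v[i]) ++pairs;
    }
    return pairs;
}
```

This measure is cheap but coarse: a list such as [4,5,6,1,2,3] has a single out-of-order adjacent pair, yet every element in it is three places away from its sorted position.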