Topological sorting of DAG in C++; data structure for storing the sorted list? - c++

I implemented a topological sorting of a directed acyclic graph in C++ that outputs a topological sorted list.
I see some implementations where the sorted list is stored in std::stack, and some in std::vector. I am not sure which or some other data structure would be the most appropriate for my case. Essentially, I just need to loop through this sorted list, and extract the elements stored from top to bottom. The list is not modified during the extraction process.
Using a std::vector seems to be a waste, time-complexity-wise, since I'll be reallocating the vector every time a new element is added to the list. This list can get pretty massive.
It seems that I cannot loop through std::stack according to How to traverse stack in C++?.
Is there another option that would be more appropriate for what I want to do?
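For what it's worth, std::vector::push_back runs in amortized constant time, and calling reserve() up front avoids reallocation entirely when the number of vertices is known. Below is a minimal sketch, assuming a DFS-based topological sort over an adjacency list; the names Graph, visit and topological_order are made up for illustration. The recursion is for brevity only, and a very large graph would probably need an explicit stack instead.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical graph type: adj[u] lists the vertices that u points to.
using Graph = std::vector<std::vector<std::size_t>>;

// Depth-first visit that appends u only after all of its descendants.
static void visit(const Graph& adj, std::size_t u,
                  std::vector<bool>& seen, std::vector<std::size_t>& post) {
    seen[u] = true;
    for (std::size_t v : adj[u]) {
        if (!seen[v]) visit(adj, v, seen, post);
    }
    post.push_back(u);  // no reallocation happens after reserve() below
}

// Returns the vertices in topological order (assumes the graph is acyclic).
std::vector<std::size_t> topological_order(const Graph& adj) {
    std::vector<std::size_t> post;
    post.reserve(adj.size());  // one allocation for the whole result
    std::vector<bool> seen(adj.size(), false);
    for (std::size_t u = 0; u < adj.size(); ++u) {
        if (!seen[u]) visit(adj, u, seen, post);
    }
    // The DFS post-order is the reverse of a topological order.
    return std::vector<std::size_t>(post.rbegin(), post.rend());
}
```

The result can then be read from top to bottom with a plain range-based for loop, which std::stack does not allow.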

Related

std::list sort algorithm runtime

I have a list of elements that is almost in the correct order, and the elements are just off by a relatively small number of places compared to their correct position (e.g. no element that is supposed to be at the front of the list is at the end).
< TL;DR >
Practical origin: I have an incoming stream of UDP packets that contain signals, all marked with a certain timestamp. Evaluating the data has shown that the packets have not been sent (or received) in the right order, so the timestamp is not constantly increasing but jitters a bit. To export the data I need to sort it in advance.
< /TL;DR >
I want to use std::list::sort() to sort this list.
What is the sorting algorithm used by std::list::sort(), and how is it affected by the fact that the list is almost sorted? I have a "feeling" that a divide-and-conquer based algorithm might profit from it.
Is there a more efficient algorithm for my quite specific problem?
If every element is within k positions of its proper place, then insertion sort will take less than kN comparisons and swaps/moves. It's also very easy to implement.
Compare this to the N*log(N) operations required by merge sort or quick sort to see if that will work better for you.
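To make the comparison count concrete, here is a minimal sketch of insertion sort on a std::vector<int> that also reports how many comparisons it made. The function name and the counter are only for illustration and are not part of any library.

```cpp
#include <cstddef>
#include <vector>

// Sorts v in place with insertion sort and returns the number of element
// comparisons performed, so the cost on nearly sorted input can be measured.
std::size_t insertion_sort(std::vector<int>& v) {
    std::size_t comparisons = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        // Shift larger elements one slot right until key's place is found.
        while (j > 0) {
            ++comparisons;
            if (v[j - 1] <= key) break;
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
    return comparisons;
}
```

For the input {2, 1, 4, 3, 6, 5}, where every element is one place away from its final position, this performs 7 comparisons.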
The standard does not define which algorithm std::list::sort() uses, although it requires approximately N log N comparisons; implementations typically use a merge sort.
If you are appending packets to the end of the "queue" as you consume them, and you want the queue to always be sorted, then you can expect any new packet's correct position to nearly always be near the "end".
Therefore, rather than sorting the entire queue, just insert the packet into the correct position. Start at the back of the queue and compare the new packet's timestamp with those already there. Insert it after the first packet with a smaller timestamp (likely to always be the last one), or at the front if there is no such packet because things are greatly out of order.
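The back-to-front insertion described above could look roughly like this. It is a sketch only: the Packet struct and the insert_sorted name are assumptions for illustration, and packets with equal timestamps are kept in arrival order.

```cpp
#include <cstdint>
#include <list>

// Hypothetical packet type; only the timestamp matters for ordering.
struct Packet {
    std::uint64_t timestamp;
    int payload;
};

// Inserts p into a queue that is already sorted by timestamp, scanning from
// the back because a new packet almost always belongs near the end.
void insert_sorted(std::list<Packet>& queue, const Packet& p) {
    auto rit = queue.rbegin();
    while (rit != queue.rend() && rit->timestamp > p.timestamp) {
        ++rit;
    }
    // rit.base() refers to the position just after the element rit points at,
    // or to begin() when every queued packet is newer than p.
    queue.insert(rit.base(), p);
}
```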
Alternatively, if you want to add all packets in arrival order and then sort, Bubble Sort should perform reasonably well because the list will already be nearly sorted.
On mostly sorted data, Insertion Sort and Bubble Sort are among the common algorithms that perform best.
Note also that having a list structure puts an extra constraint on indexed access, so algorithms that require indexed access will perform extra poorly. Therefore insertion sort is an even better fit since it needs only sequential access.
In the case of Visual Studio prior to 2015, a bottom-up merge sort using 26 internal lists is used, following the algorithm shown in this wiki article:
https://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists
Visual Studio 2015 added support for not having a default allocator. The 26 internal lists initializers could have been expanded to 26 instances of initializers with user specified allocators, specifically: _Myt _Binlist[_MAXBINS+1] = { _Myt(get_allocator()), ... , _Myt(get_allocator())};, but instead someone at Microsoft switched to a top down merge sort based on iterators. This is slower, but has the advantage that it doesn't need special recovery if the compare throws an exception. The author of that change pointed out the fact that if performance is a goal, then it's faster to copy the list to an array or vector, sort the array or vector, then create a new list from the array or vector.
There's a prior thread about this.
`std::list<>::sort()` - why the sudden switch to top-down strategy?
In your case something like an insertion sort on a doubly linked list should be faster: if an out-of-order node is found, remove it from the list, scan backwards or forwards to the proper spot, and insert the node back into the list. If you want to use std::list, you can use iterators, erase, and emplace to "move" nodes, but that involves freeing and reallocating a node for each move. It would be faster to implement this with your own doubly linked list, in which case you can just manipulate the links, avoiding the freeing and reallocation of memory that std::list would incur.
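As an alternative to erase and emplace that the answer above does not mention, std::list::splice relinks an existing node without copying or reallocating it, so the same idea can be sketched on top of std::list. The function name insertion_sort_list is made up for this example.

```cpp
#include <iterator>
#include <list>

// Insertion sort for a nearly sorted std::list. Nodes are relinked with
// splice(), so no element is copied, freed, or reallocated.
void insertion_sort_list(std::list<int>& lst) {
    if (lst.size() < 2) return;
    auto it = std::next(lst.begin());
    while (it != lst.end()) {
        auto node = it++;                // advance first; splice keeps 'it' valid
        auto prev = std::prev(node);
        if (!(*node < *prev)) continue;  // already in order: the common case
        // Scan backwards past every element that is greater than *node.
        auto pos = prev;
        while (pos != lst.begin() && *node < *std::prev(pos)) {
            --pos;
        }
        lst.splice(pos, lst, node);      // relink node just before pos
    }
}
```

Because only strictly smaller nodes are moved, equal elements keep their relative order.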

What is the purpose of sorting a linked list?

I am wondering what the purpose of sorting a linked list is, because finding an element takes O(n) whether the linked list is sorted or unsorted.
Please forgive me if my question is stupid.
The purpose of sorting isn't always to search in logarithmic time. There are lots of other applications of sorted data obviously.
Suppose you have to de-duplicate (remove the duplicate elements from) a large linked list, and you don't have enough space to load the list items into a hash table because the list is very big. In this case, you can sort the list and then remove consecutive elements that are equal, which de-duplicates the list.
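With std::list this sort-then-drop-neighbours approach is a two-liner. A minimal sketch, assuming a list of int; the deduplicate name is only for illustration:

```cpp
#include <list>

// Removes duplicates without any auxiliary container.
void deduplicate(std::list<int>& lst) {
    lst.sort();    // brings equal values next to each other
    lst.unique();  // one linear pass that erases adjacent equal elements
}
```

Note that the original element order is lost, since the list ends up sorted.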
If you want to insert an element into its appropriate position in a sorted container, a sorted linked list is very handy: it guarantees linear time and constant space. With an array, you need to use a temporary array or move all the following elements one by one. In fact, an LRU cache is a doubly linked list under the hood, kept sorted by how recently each item was hit. New items, and old items that have just been accessed again, are inserted at the front, which keeps the already sorted list sorted. If an array-like structure were used here, the LRU cache could not offer constant complexity.
These are just some classic applications. You can find a lot of others.
Suppose a linked list is used to implement a priority queue. We can add elements of different priorities at random, but we want to process the elements of the queue according to priority, so it is useful to maintain a sorted linked list in which the top-priority items appear at the beginning and removing them from the queue is an easy operation. This is not exactly sorting the list, but whenever an item is inserted, it is placed in its correct position based on its priority. This is similar to insertion sort on an array.

Best data structure/ container in C++ for insertion and deletion

I am looking for the best data structure for C++ in which insertion and deletion can take place very efficiently and fast.
Traversal should also be very easy for this data structure. Which one should I go with?
What about std::set in C++?
A linked list provides efficient insertion and deletion of arbitrary elements. Deletion here is deletion by iterator, not by value. Traversal is quite fast.
A deque provides efficient insertion and deletion only at the ends, but those are faster than for a linked list, and traversal is faster as well.
A set only makes sense if you want to find elements by their value, e.g. to remove them. Otherwise the overhead of checking for duplicates, as well as that of keeping things sorted, will be wasted.
It depends on what you want to put into this data structure. If the items have no inherent order, or you care about the order in which they were inserted, list<> could be used. If you want them in sorted order, set<> or multiset<> (the latter allows multiple identical elements) could be an alternative.
list<> is typically a doubly linked list, so insertion and deletion can be done in constant time, provided you know the position. Traversal over all elements is also fast, but accessing a specific element (either by value or by position) can be slow.
set<> and its family are typically binary trees, so insertion, deletion and searching for elements are mostly in logarithmic time (when you know where to insert/delete, it's constant time). Traversal over all elements is also fast.
(Note: boost and C++11 both have data structures based on hash-tables, which could also be an option)
I would say a linked list, depending on whether or not your deletions are specific and frequent. Traverse it with iterators.
It seems to me that you need a tree.
I'm not sure about the exact structure (since you didn't provide detailed info), but if you can put your data into a binary tree, you can achieve decent speed when searching, deleting and inserting elements (O(log n) on average and O(n) in the worst case).
Note that I'm talking about the data structure here, you can implement it in different ways.

Queue-like data structure with random access element removal

Is there a data structure like a queue which also supports removal of elements at arbitrary points? Enqueueing and dequeueing occur most frequently, but mid-queue element removal must be similar in speed terms since there may be periods where that is the most common operation. Consistency of performance is more important than absolute speed. Time is more important than memory. Queue length is small, under 1,000 elements at absolute peak load. In case it's not obvious I'll state it explicitly: random insertion is not required.
Have tagged C++ since that is my implementation language, but I'm not using (and don't want to use) any STL or Boost. Pure C or C++ only (I will convert C solutions to a C++ class.)
Edit: I think what I want is a kind of dictionary that also has a queue interface (or a queue that also has a dictionary interface) so that I can do things like this:
Container.enqueue(myObjPtr1);
MyObj *myObjPtr2 = Container.dequeue();
Container.remove(myObjPtr3);
I think that a doubly linked list is exactly what you want (assuming you do not want a priority queue):
Easy and fast adding elements to both ends
Easy and fast removal of elements from anywhere
You can use the std::list container, but (in your case) it is difficult to remove an element from the middle of the list if you only have a pointer (or reference) to the element (wrapped in STL's list element) and not an iterator. If using iterators (e.g. storing them) is not an option, then implementing a doubly linked list (even with an element counter) should be pretty easy. If you implement your own list, you can directly operate on pointers to elements (each of them contains pointers to both of its neighbours). If you do not want to use Boost or STL, this is probably the best (and simplest) option, and you have control of everything (you can even write your own block allocator for list elements to speed things up).
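A hand-rolled list along these lines might look like the sketch below. It assumes the links can be stored inside MyObj itself (an intrusive list); the prev, next and queued members and the size() method are additions for illustration, and the container never owns or frees the objects.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element type: the links live inside the object itself, so a
// pointer to the object is all that is needed to unlink it.
struct MyObj {
    int value = 0;
    MyObj* prev = nullptr;
    MyObj* next = nullptr;
    bool queued = false;  // guards against double insertion or removal
};

class Container {
public:
    // Appends obj at the tail in O(1). The container does not own obj.
    void enqueue(MyObj* obj) {
        assert(obj && !obj->queued);
        obj->prev = tail_;
        obj->next = nullptr;
        if (tail_) tail_->next = obj; else head_ = obj;
        tail_ = obj;
        obj->queued = true;
        ++count_;
    }

    // Removes and returns the head, or nullptr if the queue is empty.
    MyObj* dequeue() {
        MyObj* obj = head_;
        if (obj) remove(obj);
        return obj;
    }

    // Unlinks obj from anywhere in the queue in O(1).
    void remove(MyObj* obj) {
        assert(obj && obj->queued);
        if (obj->prev) obj->prev->next = obj->next; else head_ = obj->next;
        if (obj->next) obj->next->prev = obj->prev; else tail_ = obj->prev;
        obj->prev = obj->next = nullptr;
        obj->queued = false;
        --count_;
    }

    std::size_t size() const { return count_; }

private:
    MyObj* head_ = nullptr;
    MyObj* tail_ = nullptr;
    std::size_t count_ = 0;
};
```

All three operations only touch a fixed number of pointers, so their cost does not depend on the queue length.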
One option is to use an order statistic tree, an augmented tree structure that supports O(log n) random access to each element, along with O(log n) insertion and deletion at arbitrary points. Internally, the order statistic tree is implemented as a balanced binary search tree with extra information associated with it. As a result, lookups are slower than in a standard dynamic array, but insertions are much faster.
Hope this helps!
You can use a combination of a linked list and a hash table. In Java it is called a LinkedHashSet.
The idea is simple: have a linked list of elements, and also maintain a hash map of (key, node) pairs, where node is a pointer to the relevant node in the linked list, and key is the key representing this node.
Note that the basic implementation is a set, and some extra work will be needed to make this data structure allow dupes.
This data structure gives you both O(1) head/tail access and O(1) access to any element in the list. [all on average, amortized]
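For illustration only, here is how that combination might be sketched with standard containers (the question rules out the STL, so treat this as a description of the idea rather than a drop-in answer). The IndexedQueue class and its int keys are assumptions made for this example.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>

// Queue of unique int keys with O(1) average-time removal by key.
class IndexedQueue {
public:
    // Appends key at the tail; returns false if it is already present.
    bool enqueue(int key) {
        if (index_.count(key)) return false;
        order_.push_back(key);
        index_[key] = std::prev(order_.end());
        return true;
    }

    // Removes the head and stores it in out; returns false if empty.
    bool dequeue(int& out) {
        if (order_.empty()) return false;
        out = order_.front();
        index_.erase(out);
        order_.pop_front();
        return true;
    }

    // Removes key from anywhere in the queue; returns false if absent.
    bool remove(int key) {
        auto found = index_.find(key);
        if (found == index_.end()) return false;
        order_.erase(found->second);  // list iterators stay valid until erased
        index_.erase(found);
        return true;
    }

    std::size_t size() const { return order_.size(); }

private:
    std::list<int> order_;  // queue order
    std::unordered_map<int, std::list<int>::iterator> index_;  // key -> node
};
```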

Heap Sort a Linked List

I'm trying to create a sort function in C++ that sorts a linked list object using heap sort, but I'm not sure how to get started. Can anyone give me an idea of how to do it? I'm not even sure how I would sort a linked list.
Heapsort works by building a heap out of the data. A heap is only efficient to build when you have random-access to each element.
The first step is going to be creating an array of pointers to your list objects, so you can perform the usual heap sort on the array.
The last step will be converting your array of pointers back into a linked list.
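The three steps above can be sketched as follows, assuming a simple singly linked Node type (hypothetical) and using std::make_heap followed by std::sort_heap as the heap sort over the pointer array.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical singly linked node type.
struct Node {
    int value;
    Node* next;
};

// Sorts the list in ascending order and returns the new head.
Node* heap_sort_list(Node* head) {
    // Step 1: collect a pointer to every node so the heap has random access.
    std::vector<Node*> nodes;
    for (Node* n = head; n != nullptr; n = n->next) nodes.push_back(n);
    if (nodes.empty()) return nullptr;

    // Step 2: heap sort the pointer array by the values the pointers refer to.
    auto less = [](const Node* a, const Node* b) { return a->value < b->value; };
    std::make_heap(nodes.begin(), nodes.end(), less);
    std::sort_heap(nodes.begin(), nodes.end(), less);

    // Step 3: relink the nodes in sorted order.
    for (std::size_t i = 0; i + 1 < nodes.size(); ++i) {
        nodes[i]->next = nodes[i + 1];
    }
    nodes.back()->next = nullptr;
    return nodes.front();
}
```

This needs O(n) extra memory for the pointer array, which is the price of giving the heap random access.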
A better sorting method for a linked list is an insertion sort -- not least because you can perform the sort as part of your linked list implementation's insert() function.
I have to agree with sarnold's answer. It is extremely inefficient to heap sort a linked list for a number of reasons, the first being that the elements should have been sorted upon initial placement. That said, if I were going to try, I would create an ArrayList<T> and then load all the links into it. Then you can build a heap from them and sort them. Once you're finished, just reload your linked list starting with the head.
HeapSort is good for 2 reasons:
1- It is an in-place algorithm.
2- It has a time complexity of O(n log n).
The O(n log n) bound relies on the random-access nature of an array. If you use a linked list, you lose that random-access advantage.
Hence the time complexity becomes O(n^2), which is not good for sorting.
I would recommend using merge sort for a linked list.
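A minimal sketch of a top-down merge sort on a singly linked list, assuming a hypothetical Node type; it only relinks nodes and needs no random access:

```cpp
// Hypothetical singly linked node type.
struct Node {
    int value;
    Node* next;
};

// Merges two sorted lists by relinking nodes; ties keep a's node first.
static Node* merge_sorted(Node* a, Node* b) {
    Node dummy{0, nullptr};
    Node* tail = &dummy;
    while (a && b) {
        if (b->value < a->value) {
            tail->next = b;
            b = b->next;
        } else {
            tail->next = a;
            a = a->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;  // append whatever is left
    return dummy.next;
}

// Sorts the list in ascending order and returns the new head.
Node* merge_sort(Node* head) {
    if (!head || !head->next) return head;
    // Find the middle with slow/fast pointers, then cut the list in two.
    Node* slow = head;
    Node* fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    Node* second = slow->next;
    slow->next = nullptr;
    return merge_sorted(merge_sort(head), merge_sort(second));
}
```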