Is anyone aware of any lock-free skiplist implimentations and/or research papers that support the rank operation (i.e. find kth element)? Alternatively, is anyone aware of a fundamental reason why such an operation could never work?
Bonus points:
An implimentation that does not assume garbage collection. It has been my experience quite a few research papers ignore memory management.
Support:
For a description of how the rank operation may be done in a regular skiplist: "A Skip List Cookbook" by William Pugh
For one of the better lock-free skiplist descriptions: "Practical lock-freedom" by Keir Fraser
One of the better lock-free skiplist implimentations: http://www.1024cores.net/home/parallel-computing/concurrent-skip-list
Unlike key, value and level, which are local properties of an element in a skiplist and are not affected by operations with other elements, rank of an element, i.e. its position in the list, is a property that changes with insertion and removal of elements with lower rank. Therefore, for a concurrent skiplist, rank of an element is very ephemeral because it may be changed in any moment by a concurrently running operation that inserts or delets a lower rank element.
For example, suppose you do linear search of k-th element through the level 1 list. At the moment you did k steps, concurrent modifications might add or remove any number of prior elements, so the current rank of the element you just find is in fact unknown. Moreover, while the thread is performing the k forward iterations, concurrent changes can be done between its current position and the element that was k-th when it started the search. So what the search ends up with is neither the element with the current rank of k nor the one that has the rank of k when the search started. It's just some random element!
To say it short, for concurrent list, rank of an element is an ill-defined concept and searching by rank is an ill-defined operation. This is the fundamental reason why it could never work, and why implementers should never bother with providing such operation.
Related
Given an input stream of numbers ranging from 1 to 10^5 (non-repeating) we need to be able to tell at each point how many numbers smaller than this have been previously encountered.
I tried to use the set in C++ to maintain the elements already encountered and then taking upper_bound on the set for the current number. But upper_bound gives me the iterator of the element and then again I have to iterate through the set or use std::distance which is again linear in time.
Can I maintain some other data structure or follow some other algorithm in order to achieve this task more efficiently?
EDIT : Found an older question related to fenwick trees that is helpful here. Btw I have solved this problem now using segment trees taking hints from #doynax comment.
How to use Binary Indexed tree to count the number of elements that is smaller than the value at index?
Regardless of the container you are using, it is very good idea to enter them as sorted set so at any point we can just get the element index or iterator to know how many elements are before it.
You need to implement your own binary search tree algorithm. Each node should store two counters with total number of its child nodes.
Insertion to binary tree takes O(log n). During the insertion counters of all parents of that new element should be incremented O(log n).
Number of elements that are smaller than the new element can be derived from stored counters O(log n).
So, total running time O(n log n).
Keep your table sorted at each step. Use binary search. At each point, when you are searching for the number that was just given to you by the input stream, binary search is going to find either the next greatest number, or the next smallest one. Using the comparison, you can find the current input's index, and its index will be the numbers that are less than the current one. This algorithm takes O(n^2) time.
What if you used insertion sort to store each number into a linked list? Then you can count the number of elements less than the new one when finding where to put it in the list.
It depends on whether you want to use std or not. In certain situations, some parts of std are inefficient. (For example, std::vector can be considered inefficient in some cases due to the amount of dynamic allocation that occurs.) It's a case-by-case type of thing.
One possible solution here might be to use a skip list (relative of linked lists), as it is easier and more efficient to insert an element into a skip list than into an array.
You have to use the skip list approach, so you can use a binary search to insert each new element. (One cannot use binary search on a normal linked list.) If you're tracking the length with an accumulator, returning the number of larger elements would be as simple as length-index.
One more possible bonus to using this approach is that std::set.insert() is log(n) efficient already without a hint, so efficiency is already in question.
I am currently creating a source code to combine two heaps that satisfy the min heap property with the shape invariant of a complete binary tree. However, I'm not sure if what I'm doing is the correct accepted method of merging two heaps satisfying the requirements I laid out.
Here is what I think:
Given two priority queues represented as min heaps, I insert the nodes of the second tree one by one into the first tree and fix the heap property. Then I continue this until all of the nodes in the second tree is in the first tree.
From what I see, this feels like a nlogn algorithm since I have to go through all the elements in the second tree and for every insert it takes about logn time because the height of a complete binary tree is at most logn.. But I think there is a faster way, however I'm not sure what other possible method there is.
I was thinking that I could just insert the entire tree in, but that break the shape invariant and order invariant..Is my method the only way?
In fact building a heap is possible in linear time and standard function std::make_heap guarantees linear time. The method is explained in Wikipedia article about binary heap.
This means that you can simply merge heaps by calling std::make_heap on range containing elements from both heaps. This is asymptotically optimal if heaps are of similar size. There might be a way to exploit preexisting structure to reduce constant factor, but I find it not likely.
I have a rather large set of objects that represent numbers and I want to select such numbers according to a custom ordering. This ordering includes several criteria such as the type of their representation (some numbers are represented by an interval), their integrality and ultimatively their value. These numbers are shared throughout the programs (shared pointers) and there is nothing I can do about this.
However, the elements properties can change at any time such that the order changes while I can't notify the container about this. For example, some operations require a refinement of a number that is represented by an interval and during this refinement, the exact value can be found. Thereby, the number changes from the interval representation to a rational number, possibly even an integer. This change, due to the shared instance, immediately propagates to the number in the container and breaks the ordering (and I don't even know which number changed). This totally breaks std::set.
So what I'd like to have is a container that tries to be sorted, but does not rely on this. Whenever an operation detects an incorrect ordering, this ordering should be corrected locally. For example insert would insert the element (using binary search) and always check if the ordering of the current element (w.r.t. the neighbors) is correct.
I'd be willing to accept that "give me the smallest element" would then be only "give me a small element" and that find or remove would have linear complexity: I only need front, insert and remove_front to be particularly efficient.
Is there any implementation that does something like this?
How would you implement this?
If you are looking for an algorithm in the standard library, you should take a look at:
std::make_heap
std::pop_heap
std::push_heap
In <algorithm>. They might fit your need, and even if they don't I'm quite sure you will find what you are looking for in some kind of heap structure. Which one will probably depend on how your code is structured, and how often you expect a value to change etc.
In short:
A heap is a data structure in which it is fast to find and extract the smallest (or largest) element. It is also for most heaps possible to create restructure the heap in linear time or better. You could start out from this page on Wikipedia if you want to learn more about heaps.
I do not have formal CS training, so bear with me.
I need to do a simulation, which can abstracted away to the following (omitting the details):
We have a list of real numbers representing the times of events. In
each step, we
remove the first event, and
as a result of "processing" it, a few other events may get inserted into the list at a strictly later time
and repeat this many times.
Questions
What data structure / algorithm can I use to implement this as efficiently as possible? I need to increase the number of events/numbers in the list significantly. The priority is to make this as fast as possible for a long list.
Since I'm doing this in C++, what data structures are already available in the STL or boost that will make it simple to implement this?
More details:
The number of events in the list is variable, but it's guaranteed to be between n and 2*n where n is some simulation parameter. While the event times are increasing, the time-difference of the latest and earliest events is also guaranteed to be less than a constant T. Finally, I suspect that the density of events in time, while not constant, also has an upper and lower bound (i.e. all the events will never be strongly clustered around a single point in time)
Efforts so far:
As the title of the question says, I was thinking of using a sorted list of numbers. If I use a linked list for constant time insertion, then I have trouble finding the position where to insert new events in a fast (sublinear) way.
Right now I am using an approximation where I divide time into buckets, and keep track of how many event are there in each bucket. Then process the buckets one-by-one as time "passes", always adding a new bucket at the end when removing one from the front, thus keeping the number of buckets constant. This is fast, but only an approximation.
A min-heap might suit your needs. There's an explanation here and I think STL provides the priority_queue for you.
Insertion time is O(log N), removal is O(log N)
It sounds like you need/want a priority queue. If memory serves, the priority queue adapter in the standard library is written to retrieve the largest items instead of the smallest, so you'll have to specify that it use std::greater for comparison.
Other than that, it provides just about exactly what you've asked for: the ability to quickly access/remove the smallest/largest item, and the ability to insert new items quickly. While it doesn't maintain all the items in order, it does maintain enough order that it can still find/remove the one smallest (or largest) item quickly.
I would start with a basic priority queue, and see if that's fast enough.
If not, then you can look at writing something custom.
http://en.wikipedia.org/wiki/Priority_queue
A binary tree is always sorted and has faster access times than a linear list. Search, insert and delete times are O(log(n)).
But it depends whether the items have to be sorted all the time, or only after the process is finished. In the latter case a hash table is probably faster. At the end of the process you then would copy the items to an array or a list and sort it.
Consider a sequence of n positive real numbers, (ai), and its partial sum sequence, (si). Given a number x ∊ (0, sn], we have to find i such that si−1 < x ≤ si. Also we want to be able to change one of the ai’s without having to update all partial sums. Both can be done in O(log n) time by using a binary tree with the ai’s as leaf node values, and the values of the non-leaf nodes being the sum of the values of the respective children. If n is known and fixed, the tree doesn’t have to be self-balancing and can be stored efficiently in a linear array. Furthermore, if n is a power of two, only 2 n − 1 array elements are required. See Blue et al., Phys. Rev. E 51 (1995), pp. R867–R868 for an application. Given the genericity of the problem and the simplicity of the solution, I wonder whether this data structure has a specific name and whether there are existing implementations (preferably in C++). I’ve already implemented it myself, but writing data structures from scratch always seems like reinventing the wheel to me—I’d be surprised if nobody had done it before.
This is known as a finger tree in functional programming but apparently there are implementations in imperative languages. In the articles there is a link to a blog post explaining an implementation of this data structure in C# which could be useful to you.
Fenwick tree (aka Binary indexed tree) is a data structure that maintains a sequence of elements, and is able to compute cumulative sum of any range of consecutive elements in O(logn) time. Changing value of any single element needs O(logn) time as well.