I am confused about the number of operations that take place when calling count(x) for some element x in a multiset of size n.
Am I correct that the number of operations is log(n) + #_of_matches_of_x, meaning logarithmic in the number of elements in the multiset, plus the number of matches of the target element x among all elements in the multiset?
Thanks for your time!
As the reference link has mentioned, the complexity of count is:
Logarithmic in the size of the container plus linear in the number of
the elements found.
The reason is that std::multiset is a node-based, tree-like data structure (typically a red-black tree) in which equivalent keys occupy adjacent nodes. So when you call std::multiset::count, the implementation first locates the range of matching keys in the tree, which is O(log(all elements)), and then walks through the equivalent elements in that range, which is O(found elements).
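For illustration, here is a small sketch of that two-step behaviour, spelled out with std::multiset::equal_range and std::distance (the same effect that count gives you):

    #include <iostream>
    #include <iterator>
    #include <set>

    int main() {
        std::multiset<int> ms = {1, 2, 2, 2, 3, 5};

        // count(x): O(log n) to locate the range of keys equivalent to x,
        // plus O(matches) to walk through that range.
        std::cout << ms.count(2) << '\n';                               // prints 3

        // Conceptually the same thing, in two explicit steps:
        auto range = ms.equal_range(2);                                 // O(log n)
        std::cout << std::distance(range.first, range.second) << '\n';  // O(matches), prints 3
    }

std::distance over multiset iterators is linear in the length of the range, which is exactly where the "linear in the number of the elements found" part comes from.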
This site clearly states that the complexity of multiset::count is
Logarithmic in size and linear in the number of matches.
Or you can check out this one.
Logarithmic in the size of the container plus linear in the number of the elements found.
Well, I pulled out an interesting article for you. (Link)
Given an input stream of numbers ranging from 1 to 10^5 (non-repeating) we need to be able to tell at each point how many numbers smaller than this have been previously encountered.
I tried to use a set in C++ to maintain the elements already encountered and then take upper_bound on the set for the current number. But upper_bound gives me an iterator to the element, and then I again have to iterate through the set or use std::distance, which is again linear in time.
Can I maintain some other data structure or follow some other algorithm in order to achieve this task more efficiently?
EDIT: Found an older question related to Fenwick trees that is helpful here. By the way, I have now solved this problem using segment trees, taking hints from doynax's comment.
How to use Binary Indexed tree to count the number of elements that is smaller than the value at index?
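Since the values are bounded (1 to 10^5) and non-repeating, a Fenwick tree (binary indexed tree) indexed by value gives O(log M) per insertion and per query, where M is the value range. A minimal sketch along the lines of the linked question (the class and function names here are just for the example):

    #include <iostream>
    #include <vector>

    // Minimal Fenwick tree (binary indexed tree) over the value range [1, M].
    struct Fenwick {
        std::vector<int> tree;
        explicit Fenwick(int m) : tree(m + 1, 0) {}

        void add(int i, int delta) {             // mark value i as seen
            for (; i < (int)tree.size(); i += i & -i) tree[i] += delta;
        }
        int prefix(int i) const {                // how many seen values are <= i
            int sum = 0;
            for (; i > 0; i -= i & -i) sum += tree[i];
            return sum;
        }
    };

    int main() {
        const int M = 100000;                    // values are in [1, 10^5]
        Fenwick fw(M);
        int x;
        while (std::cin >> x) {
            std::cout << fw.prefix(x - 1) << '\n';  // previously seen numbers smaller than x
            fw.add(x, 1);
        }
    }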
Regardless of the container you are using, it is a very good idea to keep the elements in sorted order, so that at any point you can take an element's index (or iterator) and know how many elements come before it.
You need to implement your own binary search tree. Each node should store a counter for the size of its subtree (or separate counters for its left and right children).
Insertion into a balanced binary tree takes O(log n). During the insertion, the counters of all ancestors of the new element are incremented, which is also O(log n).
The number of elements smaller than the new element can be derived from the stored counters in O(log n).
So the total running time is O(n log n).
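A minimal sketch of such an augmented (order-statistic) tree; for brevity it is a plain unbalanced BST, so the O(log n) bounds above only hold when the tree stays balanced (e.g. a red-black tree augmented the same way). All names here are illustrative:

    #include <iostream>
    #include <memory>
    #include <vector>

    // Each node stores the size of its own subtree.
    struct Node {
        int key;
        int size = 1;                            // number of nodes in this subtree
        std::unique_ptr<Node> left, right;
        explicit Node(int k) : key(k) {}
    };

    int size_of(const Node* n) { return n ? n->size : 0; }

    // Insert key and return how many previously stored keys are smaller than it.
    int insert_and_count(std::unique_ptr<Node>& root, int key) {
        if (!root) {
            root = std::make_unique<Node>(key);
            return 0;
        }
        ++root->size;                            // maintain the counters on the way down
        if (key < root->key)
            return insert_and_count(root->left, key);
        return size_of(root->left.get()) + 1 + insert_and_count(root->right, key);
    }

    int main() {
        std::unique_ptr<Node> root;
        std::vector<int> stream = {5, 1, 4, 2, 3};
        for (int x : stream)
            std::cout << insert_and_count(root, x) << ' ';  // prints 0 0 1 1 2
        std::cout << '\n';
    }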
Keep your table sorted at each step and use binary search. When you search for the number that was just given to you by the input stream, binary search finds the position where it belongs, and that position is exactly the number of elements smaller than the current one. Since inserting into a sorted array is linear, this algorithm takes O(n^2) time overall.
What if you used insertion sort to store each number into a linked list? Then you can count the number of elements less than the new one when finding where to put it in the list.
It depends on whether you want to use std or not. In certain situations, some parts of std are inefficient. (For example, std::vector can be considered inefficient in some cases due to the amount of dynamic allocation that occurs.) It's a case-by-case type of thing.
One possible solution here might be to use a skip list (a relative of linked lists), as it is easier and more efficient to insert an element into a skip list than into an array.
The point of the skip list is that you can do a binary-search-style descent to insert each new element (one cannot binary search a plain linked list). If you are tracking the length with an accumulator, returning the number of larger elements is as simple as length - index.
One more possible bonus of this approach: std::set::insert is O(log n) even without a hint, so the efficiency of the set-based solution is already in question.
I need to use a data structure, implementable in C++, that can do basic operations, such as lookup, insertion and deletion, in constant time. I, however, also need to be able to find the maximum value in constant time.
This data structure would presumably need to be ordered to find the maximum values, and I have looked into red-black trees; however, their operations take logarithmic time.
I would propose the following:
You could use a hash table, which gives O(1) expected time for lookup, insertion and deletion.
Regarding the maximum, you could store it in an attribute and check at each insertion whether the maximum changes. Deletion is a bit more complicated: if the maximum itself is deleted, you must perform a linear search for the new maximum, but this only happens when the maximum is deleted. Any other element can be deleted in O(1) expected time.
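A rough sketch of that idea, using std::unordered_set and a tracked maximum (the class and member names are just for the example); as described above, erasing the current maximum is the one case that falls back to a linear rescan:

    #include <algorithm>
    #include <iostream>
    #include <unordered_set>

    class MaxTrackingSet {
        std::unordered_set<int> items;
        int max_value = 0;                       // only meaningful when !items.empty()
    public:
        bool contains(int x) const { return items.count(x) != 0; }   // O(1) expected

        void insert(int x) {                     // O(1) expected
            if (items.empty() || x > max_value) max_value = x;
            items.insert(x);
        }

        void erase(int x) {                      // O(1) expected, O(n) if x is the maximum
            items.erase(x);
            if (x == max_value && !items.empty())
                max_value = *std::max_element(items.begin(), items.end()); // linear rescan
        }

        int max() const { return max_value; }    // O(1)
    };

    int main() {
        MaxTrackingSet s;
        s.insert(3); s.insert(7); s.insert(5);
        std::cout << s.max() << '\n';   // 7
        s.erase(7);                     // maximum deleted: triggers the rescan
        std::cout << s.max() << '\n';   // 5
    }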
Yes, I agree with Irleon. You can use a hash table to perform these operations. Let us analyze this step by step:
1. If we take arrays, the time complexity of insertion at the end will be O(1).
2. Take linked lists and it will be O(n), due to the traversal that you need to do.
3. Take binary search trees and it will be O(log n), where log n is the height of the tree.
4. Now we can use hash tables. We know that they work on keys and values. So here the key will be 'number_to_be_inserted % n', where 'n' is the number of elements we have.
But as the list at the same index grows, you will need to traverse that list. So it will be O(numbers_at_that_index).
The same will be the case for the deletion operation.
Of course there are other cases to consider with collisions, but we can ignore them for now and we get our basic hash table.
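For illustration, a bare-bones sketch of the separate chaining described above, with the bucket chosen by value % bucket_count (this is purely illustrative; std::unordered_set already does the same job, with proper collision and resize handling):

    #include <algorithm>
    #include <iostream>
    #include <list>
    #include <vector>

    // Bare-bones hash table with separate chaining: bucket = value % bucket_count.
    struct ChainedHashTable {
        std::vector<std::list<int>> buckets;
        explicit ChainedHashTable(std::size_t n) : buckets(n) {}

        std::list<int>& bucket_for(int x) { return buckets[x % buckets.size()]; }

        void insert(int x) { bucket_for(x).push_back(x); }             // O(1)
        bool contains(int x) {                                         // O(chain length)
            auto& b = bucket_for(x);
            return std::find(b.begin(), b.end(), x) != b.end();
        }
        void erase(int x) { bucket_for(x).remove(x); }                 // O(chain length)
    };

    int main() {
        ChainedHashTable table(11);
        table.insert(4);
        table.insert(15);                        // 4 and 15 collide: both hash to bucket 4
        std::cout << table.contains(15) << '\n'; // 1
        table.erase(4);
        std::cout << table.contains(4) << '\n';  // 0
    }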
If you could do such a thing, then you could sort in linear time: simply insert all of your items, then do the following n times:
Find maximum
Print maximum
Delete maximum
Therefore, in a model of computation in which you can't sort in linear time, you also can't solve your problem with all operations in O(1) time.
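To make the reduction concrete, here is that exact loop written against std::multiset, whose operations are O(log n) rather than O(1); the O(n log n) total it achieves is precisely the comparison-sort bound the argument says you cannot beat:

    #include <iostream>
    #include <iterator>
    #include <set>
    #include <vector>

    int main() {
        std::vector<int> items = {4, 1, 7, 3, 9};

        // n insertions at O(log n) each: this is where the lower bound shows up,
        // so lookup/insert/delete/max can never all be O(1) in a comparison model.
        std::multiset<int> s(items.begin(), items.end());

        // n times: find maximum, print maximum, delete maximum.
        while (!s.empty()) {
            auto max_it = std::prev(s.end());    // maximum = last element in sorted order
            std::cout << *max_it << ' ';
            s.erase(max_it);
        }
        std::cout << '\n';                        // prints 9 7 4 3 1, i.e. a full sort
    }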
I want to find the kth smallest actual frequency in a Fenwick tree in O(k log(n)) time.
If my data is:
Tree = [1,3,1,10,3]
Actual frequency = [1,2,1,6,3]
So the second smallest element would be at index 1.
You need the kth smallest actual frequency, which I think cannot be determined without sorting the actual frequencies. If the only thing you have is the Fenwick tree, then you can calculate the sequence of actual frequencies in O(n*log(n)) time (since you can calculate every single actual frequency in O(log(n)) (see here), and you have n frequencies). Sorting the sequence of actual frequencies by quicksort takes O(n*log(n)), and finding the kth element of the sorted sequence takes O(n) (there may be entries with the same actual frequency, so you cannot do this in O(1); but you can use linear search). So the whole search can be done in O(n*log(n)).
Hope this helps. I don't have any idea how this could be done in O(k*log(n)).
Well, I thought of a possible solution:
    while (start <= end) {
        int mid = (start + end) >> 1;
        if (read(mid) >= k)          // read(mid) returns the cumulative frequency up to mid
            end = mid - 1;
        else
            start = mid + 1;
    }
start has to be the answer.
I have a list that contains n double values and I need to find the k lowest double values in that list
k is much smaller than n
the initial list with the n double values is randomly ordered
the found k lowest double values are not required to be sorted
What algorithm would you recommend?
At the moment I use Quicksort to sort the whole list, and then I take the first k elements out of the sorted list. I expect there should be a much faster algorithm.
Thank you for your help!!!
You could model your solution to match the nsmallest() code in Python's standard library.
Heapify the first k values on a maxheap.
Iterate over the remaining n - k values.
Compare each to the element of the top of the heap.
If the new value is lower, do a heapreplace operation (which replaces the topmost heap element with the new value and then sifts it downward).
The algorithm can be surprisingly efficient. For example, when n=100,000 and k=100, the number of comparisons is typically around 106,000 for randomly arranged inputs. This is only slightly more than 100,000 comparisons to find a single minimum value. And, it does about twenty times fewer comparisons than a full quicksort on the whole dataset.
The relative strength of various algorithms is studied and summarized at: http://code.activestate.com/recipes/577573-compare-algorithms-for-heapqsmallest
You can use a selection algorithm to find the kth lowest element and then iterate and return it together with all elements that are lower than it. More work has to be done if the list can contain duplicates (making sure you don't end up with more elements than you need).
This solution is O(n).
The selection algorithm is implemented in C++ as std::nth_element().
Another alternative is to use a max heap of size k, and iterate the elements while maintaining the heap to hold all k smallest elements.
    std::priority_queue<double> heap;            // max-heap holding the k smallest so far
    for (double x : values) {
        if (heap.size() < k) {
            heap.push(x);
        } else if (x < heap.top()) {             // x beats the largest of the current k
            heap.pop();
            heap.push(x);
        }
    }

When you are done, the heap contains the k smallest elements. (std::priority_queue lives in <queue>; here values is the input sequence and k the requested count.)
This solution is O(n log k).
Take a look at the std::partial_sort algorithm from the C++ standard library.
You can use std::nth_element. This is O(N) on average because it doesn't fully sort the elements; it just partitions them so that everything before the nth position is no greater than the element at that position.
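For completeness, a short sketch of both suggestions (std::nth_element and std::partial_sort), assuming the values sit in a std::vector<double>:

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<double> values = {9.5, 1.2, 7.3, 0.4, 5.6, 2.1, 8.8};
        std::size_t k = 3;

        // O(n) on average: afterwards the first k elements are the k smallest,
        // in unspecified order (which is all the question asks for).
        std::nth_element(values.begin(), values.begin() + k, values.end());
        for (std::size_t i = 0; i < k; ++i) std::cout << values[i] << ' ';
        std::cout << '\n';

        // O(n log k): same result, but the first k elements also come out sorted.
        std::partial_sort(values.begin(), values.begin() + k, values.end());
        for (std::size_t i = 0; i < k; ++i) std::cout << values[i] << ' ';
        std::cout << '\n';
    }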
You can use selection sort: it takes O(n) to select the lowest value. Once we have placed this lowest value at position 1, we can rescan the data set to find the second lowest value, and we can repeat this until we have the kth lowest value. This way the complexity is O(kn), which is effectively O(n) when k is small enough (a constant) relative to n.
The naive approach is O(n). Is there one that is O(log n) or even O(1)?
How about a sorted array? How about using a binary search tree?
What if my array has size n = 2^(h+1) - 1? (h = height of a complete binary tree)
Unsorted
If the array is not sorted, then you can do no better than O(n). Proof: suppose you didn't look at every single element of the array; then an adversary could simply make one of the elements you didn't look at larger or smaller than the given number, making your count incorrect. So, better than O(n) is not possible.
Sorted
If the array is sorted, then you can determine the result in O(log n) time by locating the first element that is greater than or equal to the given number, and then simply subtracting that index from the size of the array.
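A short sketch of that with std::lower_bound; following the answer above, it returns how many elements are greater than or equal to the given number (the index returned by lower_bound is, symmetrically, the count of smaller elements):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // How many elements of the sorted array are >= x.
    std::size_t count_at_least(const std::vector<int>& sorted, int x) {
        // O(log n): position of the first element that is >= x.
        auto it = std::lower_bound(sorted.begin(), sorted.end(), x);
        return sorted.size() - (it - sorted.begin());
    }

    int main() {
        std::vector<int> a = {1, 3, 3, 5, 8, 13};
        std::cout << count_at_least(a, 4) << '\n';   // 3  (5, 8, 13)
        std::cout << count_at_least(a, 3) << '\n';   // 5  (3, 3, 5, 8, 13)
    }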
With unsorted, you can't do better than O(n). Final.
With sorted data, you can do it in worst-case O(log(n)) with binary search. You can improve on this, assuming the value layout either has decent entropy or is (mostly) linear, by probing the position where the value would be expected if the layout were purely linear.
For example, take a sorted array a of size n with a[0] = x, a[n-1] = y, and your threshold v.
Instead of bisecting the array at n/2, test the element a[n*(v-x)/(y-x)].
With a regular layout (a[i] = const1*i + const2) you get the result in O(1): one probe, plus or minus a rounding error, so at worst two. With a "white noise" random layout (all values equally probable), it is still much faster than O(log(n)).
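For reference, a sketch of that interpolated probing written out as a full search, using the same a, v, x, y notation as above; on uniformly distributed keys this is the classical interpolation search, with an expected O(log log n) probes, which is the "much faster than O(log(n))" claim:

    #include <iostream>
    #include <vector>

    // Classic interpolation search: probe where v is *expected* to be if the
    // values were laid out linearly between a[lo] and a[hi].
    int interpolation_search(const std::vector<int>& a, int v) {
        int lo = 0, hi = (int)a.size() - 1;
        while (lo <= hi && v >= a[lo] && v <= a[hi]) {
            if (a[hi] == a[lo])                   // all remaining values equal: avoid /0
                return a[lo] == v ? lo : -1;
            // The interpolated probe, instead of the midpoint (lo + hi) / 2.
            int pos = lo + (int)((long long)(hi - lo) * (v - a[lo]) / (a[hi] - a[lo]));
            if (a[pos] == v) return pos;
            if (a[pos] < v) lo = pos + 1;
            else            hi = pos - 1;
        }
        return -1;                                // not present
    }

    int main() {
        std::vector<int> a = {10, 20, 30, 40, 50, 60, 70};   // perfectly linear layout
        std::cout << interpolation_search(a, 40) << '\n';    // 3, found in a single probe
        std::cout << interpolation_search(a, 35) << '\n';    // -1
    }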