Given an input stream of numbers ranging from 1 to 10^5 (non-repeating), we need to be able to tell at each point how many of the previously encountered numbers are smaller than the current one.
I tried using std::set in C++ to maintain the elements already encountered, then calling upper_bound on the set for the current number. But upper_bound returns an iterator to the element, and then I again have to iterate through the set or use std::distance, which is linear in time.
Can I maintain some other data structure or follow some other algorithm in order to achieve this task more efficiently?
EDIT: Found an older question related to Fenwick trees that is helpful here. By the way, I have since solved this problem using segment trees, taking hints from doynax's comment.
How to use Binary Indexed tree to count the number of elements that is smaller than the value at index?
Regardless of the container you use, it is a very good idea to keep the elements sorted, so that at any point you can derive from an element's index (or iterator position) how many elements come before it.
You need to implement your own binary search tree. Each node should store a counter for each of its children, holding the total number of nodes in that subtree.
Insertion into the binary tree takes O(log n). During the insertion, the counters of all ancestors of the new element are incremented, which is also O(log n).
The number of elements smaller than the new element can then be derived from the stored counters in O(log n).
So the total running time is O(n log n).
Keep your table sorted at each step and use binary search. When you search for the number just given to you by the input stream, binary search finds either the next greater or the next smaller element. From that comparison you get the index at which the current input belongs, and that index is exactly the count of numbers smaller than it. Since each insertion into the sorted table may shift O(n) elements, this algorithm takes O(n^2) time overall.
What if you used insertion sort to store each number into a linked list? Then you can count the number of elements less than the new one when finding where to put it in the list.
It depends on whether you want to use std or not. In certain situations, some parts of std are inefficient. (For example, std::vector can be considered inefficient in some cases due to the amount of dynamic allocation that occurs.) It's a case-by-case type of thing.
One possible solution here might be to use a skip list (a relative of linked lists), as it is easier and more efficient to insert an element into a skip list than into an array.
The point of the skip-list approach is that you can do a binary-search-like descent to insert each new element, which is not possible on a plain linked list. If you track the list length with an accumulator, returning the number of larger elements is as simple as length - index.
One more consideration: std::set::insert() is already O(log n) even without a hint, so whether a skip list actually wins on efficiency is open to question.
I want to find whether a string occurs in an array or linked list of short strings. I am making a small program that searches for conferences or workshops, like http://dblp.uni-trier.de/, using C++. What I wonder is how to search for a string quickly in an array or linked list. When I use the string find() function, I think its performance is O(n) if the array's length is n. Can I improve performance below O(n)? Help me, please.
For an array, unless it is sorted, the best you can do is O(n) average/worst case, because you have to scan linearly until you find the desired string. If it is sorted (which takes O(n log n) up front), you can search in O(log n) with binary search. For linked lists, the best you can do, regardless of sortedness, is O(n).
If you really want to complicate your code, then at each insert into the list you can also store a pointer to the node in some balanced tree, where nodes are ordered by comparing the strings they point to. Then you can find a string in O(log n) time.
If you want fast retrievals, use a hash map; it gives you O(1) expected time.
I need to use a data structure, implementable in C++, that can do basic operations, such as lookup, insertion and deletion, in constant time. I, however, also need to be able to find the maximum value in constant time.
Such a data structure would presumably need to be ordered to find the maximum value. I have looked into red-black trees, but their operations take logarithmic time.
I would propose a hash table, which gives O(1) expected time for lookup, insertion, and deletion.
Regarding the maximum, you could store it in an attribute and check at each insertion whether the maximum changes. Deletion is more complicated: if the maximum itself is deleted, you must perform a linear search for the new maximum, but this only happens when the maximum is deleted. Any other element can be deleted in O(1) expected time.
Yes, I agree with Irleon. You can use a hash table to perform these operations. Let us analyze this step by step:
1. If we take arrays, the time complexity of insertion at the end will be O(1).
2. With linked lists it will be O(n), due to the traversal you need to do.
3. With binary search trees it will be O(log n), where log n is the height of the tree.
4. Now we can use hash tables. We know they work with keys and values. Here the key could be 'number_to_be_inserted % n', where 'n' is the table size.
But as the list at a given index grows, you will need to traverse it, so insertion will be O(numbers_at_that_index).
The same will be the case for the deletion operation.
Of course there are other cases to consider with collisions, but we can ignore those for now and we get our basic hash table.
If you could do such a thing, then you could sort in linear time: simply insert all of your items, then, do the following n times:
Find maximum
Print maximum
Delete maximum
Therefore, in a model of computation in which you can't sort in linear time, you also can't solve your problem with all operations in O(1) time.
I am faced with an application where I have to design a container that has random access (or at least better than O(n)), has inexpensive (O(1)) insert and removal, and stores the data according to the order (rank) specified at insertion.
For example if I have the following array:
[2, 9, 10, 3, 4, 6]
I can call remove on index 2 to remove 10, and I can also call insert on index 1 to insert 13.
After those two operations I would have:
[2, 13, 9, 3, 4, 6]
The numbers are stored in a sequence and insert/remove operations require an index parameter to specify where the number should be inserted or which number should be removed.
My question is, what kind of data structures, besides a Linked List and a vector, could maintain something like this? I am leaning towards a Heap that prioritizes on the next available index. But I have been seeing something about a Fusion Tree being useful (but more in a theoretical sense).
What kind of Data structures would give me the most optimal running time while still keeping memory consumption down? I have been playing around with an insertion order preserving hash table, but it has been unsuccessful so far.
The reason I am tossing out using a std::vector straight up is that I must construct something that outperforms a vector on these basic operations. The size of the container can grow to hundreds of thousands of elements, so committing to element shifts in a std::vector is out of the question. The same problem lies with a linked list (even a doubly linked one): traversing to a given index takes O(n/2) on average, which is still O(n).
I was thinking of a doubly linked list that keeps head, tail, and middle pointers, but I felt that it wouldn't be much better.
In a basic usage, to be able to insert and delete at an arbitrary position, you can use linked lists. They allow O(1) insert/remove, but only provided that you have already located the position in the list where to insert. You can insert "after a given element" (that is, given a pointer to an element), but you cannot as efficiently insert "at a given index".
To be able to insert and remove an element given its index, you will need a more advanced data structure. There exist at least two such structures that I am aware of.
One is a rope structure, which is available in some C++ extensions (SGI STL, or in GCC via #include <ext/rope>). It allows for O(log N) insert/remove at arbitrary position.
Another structure allowing O(log N) insert/remove is an implicit treap (a.k.a. treap with implicit keys, or implicit Cartesian tree); you can find some information at http://codeforces.com/blog/entry/3767 or https://codereview.stackexchange.com/questions/70456/treap-with-implicit-keys.
An implicit treap can also be augmented to report the minimal value it contains (and to support many more operations). I am not sure whether a rope can handle this.
UPD: In fact, I guess that you can adapt any O(log N) binary search tree (such as an AVL or red-black tree) to this task by converting it to the "implicit key" scheme. A general outline follows.
Imagine a binary search tree which, at each given moment, stores the consecutive numbers 1, 2, ..., N as its keys (N being the number of nodes in the tree). Every time we change the tree (insert or remove a node), we recalculate all the stored keys so that they still run from 1 to the new value of N. This would allow insert/remove at an arbitrary position, as the key is now the position, but updating all the keys would take far too much time.
To avoid this, we do not store the keys in the tree explicitly. Instead, each node stores the number of nodes in its subtree. As a result, any time we walk down from the root, we can keep track of the index (position) of the current node: we just sum the sizes of the subtrees we leave to our left. This lets us, given k, locate the node with index k (that is, the k-th node in the standard in-order traversal) in O(log N) time. After this, we can perform the insert or delete at this position using the standard binary tree procedure; we just need to update the subtree sizes of all the nodes changed during the update, which is easily done in O(1) per changed node, so the total insert or remove time is O(log N), as in the original binary search tree.
So this approach allows inserting/removing/accessing nodes at a given position in O(log N) time, using any O(log N) binary search tree as a basis. You can of course store the additional information ("values") you need in the nodes, and you can even compute the minimum of these values by keeping the minimum of each node's subtree.
However, the aforementioned treap and rope are more advanced as they allow also for split and merge operations (taking a substring/subarray and concatenating two strings/arrays).
Consider a skip list, which can implement logarithmic-time rank operations in its "indexable" variation.
For algorithms (pseudocode), see A Skip List Cookbook, by Pugh.
It may be that the "implicit key" binary search tree method outlined by Petr above is easier to implement, and may even perform better.
I have a dynamically allocated array containing structs with key/value pairs. I need to write an update(key, value) function that puts a new struct into the array, or, if a struct with the same key is already in the array, updates its value. Insert and update are combined in one function.
The problem is:
Before adding a struct I need to check if struct with this key already existing.
I can go through all elements of the array and compare keys (very slow).
Or I can use binary search, but (!) array must be sorted.
So I tried sorting the array on each update (sloooow), or sorting it whenever the binary search function is called, which again happens on every update.
Finally, I thought that there must be a way of inserting a struct into array so it would be placed in a right place and be always sorted.
However, I couldn't think of an algorithm like that so I came here to ask for some help because google refuses to read my mind.
I need to make my code faster because my array accepts more than 50,000 structs and I'm using bubble sort (because I'm dumb).
Take a look at Red Black Trees: http://en.wikipedia.org/wiki/Red%E2%80%93black_tree
They will ensure the data is always sorted, and inserts have a complexity of O(log n).
A binary heap will not suffice, as a binary heap does not have guaranteed sort order, your only guarantee is that the top element is either min or max.
One possible approach is to use a different data structure. There is no genuine need to keep the structs ordered, only to detect whether a struct with the same key exists, so the cost of maintaining order in a balanced tree (for instance by using std::map) is excessive. A more suitable data structure would be a hash table; C++11 provides one in the standard library under the somewhat obscure name std::unordered_map (http://en.cppreference.com/w/cpp/container/unordered_map).
If you insist on using an array, a possible approach might be to combine these algorithms:
Bloom filter (http://en.wikipedia.org/wiki/Bloom_filter)
Partial sort (http://en.cppreference.com/w/cpp/algorithm/partial_sort)
Binary search
Maintain two ranges in the array: first a range that is already sorted, then a range that is not yet sorted. When you insert a struct, first check with the Bloom filter whether a matching struct might already exist. If the Bloom filter gives a negative answer, just insert the struct at the end of the array. After that, the sorted range is unchanged and the unsorted range grows by one.
If the Bloom filter gives a positive answer, apply the partial sort algorithm to make the entire array sorted, then use binary search to check whether such an object actually exists. If so, replace that element. After that, the sorted range is the entire array and the unsorted range is empty.
If the binary search shows that the Bloom filter was wrong and the matching struct is not there, just put the new struct at the end of the array. After that, the sorted range is the entire array minus one element, and the unsorted range is the last element.
Each time you insert an element, binary search to find if it exists. If it doesn't exist, the binary search will give you the index at which you can insert it.
You could use std::set, which does not allow duplicate elements and places elements in sorted position. This assumes that you are storing the key and value in a struct, and not separately. In order for the sorting to work properly, you will need to define a comparison function for the structs.
This is not homework. I'm taking a data structures class, and we recently finished trees. At the end of class, my professor showed this image.
ConcreteBTree is a binary tree that doesn't self-balance. I have a few questions about the times it took to complete these procedures.
Why does it take so much more time to insert 100,000 sequential elements into ConcreteBTree than it takes to insert random elements into it? My intuition would be that since elements are sequential, it should take less time than it takes to insert 1,000,000 random elements.
Why are the times of insert() and find() of ConcreteBTree with random elements so close together? Is it because both have the same time complexity? I thought insert was O(1) and find was O(n)
I'd really like to understand what is going on here, any explanation would be greatly appreciated. Thanks
Inserting sequential items (1, 2, 3, 4, ...) into a binary tree causes every new node to be added on the same side (e.g. always the right child for an increasing sequence).
When you insert random items, nodes are added randomly to the left and right.
Adding sequentially makes the tree behave like an ordinary linked list (for the sequential items), because each new item has to visit every previously added item, which takes O(n) steps; adding randomly takes O(log n) steps on average.
Armin's answered Q1.
2. Why are the times of insert() and find() of ConcreteBTree with random elements so close together? Is it because both have the same time complexity? I thought insert was O(1) and find was O(n).
insert and find have to do the same work: each walks down through whatever weird tree you've put together, looking for the last node under which the value either is linked or would be (and, in the case of insert, will be), so they perform the same number of comparisons and node traversals and take similar time.
Insertion of random elements into a balanced tree is O(log2 N). Your insertions of random values into a tree that doesn't self-rebalance will be a bit, but not dramatically, worse, as some branches end up considerably longer than others; you'll probably get some kind of bell curve of branch lengths. insert is only O(1) if you already know the node in the tree under which the insert is to be done (i.e. the find step above is normally needed). find is only O(n) if every node in the tree has to be visited, which is only the case for a pathologically unbalanced tree that effectively forms a linked list; as you've already been told, you can generate one by inserting pre-sorted elements.