complexity of set::insert - c++

I have read that insert operation in a set takes only log(n) time. How is that possible?
To insert, we first have to find the location in the sorted array where the new element must sit. Using binary search this takes log(n). Then, to insert at that location, all the elements succeeding it must be shifted one place to the right, which takes another n time.
My doubt is based on my understanding that set is implemented as an array and elements are stored in sorted order. Please correct me if my understanding is wrong.

std::set is commonly implemented as a red-black binary search tree. Insertion on this data structure has a worst-case of O(log(n)) complexity, as the tree is kept balanced.

Elements do not get shifted over when inserting into a set. It is usually stored not as a vector or array but as a binary search tree. Finding where the new leaf belongs takes O(log(n)) in a balanced tree, and attaching it (plus any rebalancing) adds at most another O(log(n)), so insertion takes O(log(n)) in total.
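For illustration, a minimal sketch of this behaviour (the helper name is invented for the example):

```cpp
#include <cassert>
#include <set>

// Minimal illustration: std::set keeps its elements in sorted order
// and rejects duplicates. Each insert costs O(log n) tree steps; no
// elements are shifted, unlike insertion into a sorted array.
inline std::set<int> make_demo_set() {
    std::set<int> s;
    s.insert(30);
    s.insert(10);
    s.insert(20);
    auto result = s.insert(20);  // duplicate: insertion is refused
    assert(!result.second);      // .second reports whether it went in
    return s;
}
```

Iterating the returned set visits 10, 20, 30 regardless of insertion order.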

Related

Is there a sorted data structure with logarithmic time insertion, deletion and find (with distance)?

I have a sorted array in which I find the number of items less than a particular value using binary search (std::upper_bound) in O(log n) time.
Now I want to insert into and delete from this array while keeping it sorted, with overall O(log n) complexity.
I know that using a binary search tree or std::multiset I can do insertion, deletion, and upper_bound in O(log n), but I am not able to get the distance/index (std::distance is O(n) for set iterators) in logarithmic time.
So is there a way to achieve what I want to do?
You can augment any balanced-binary-search-tree data structure (e.g. a red-black tree) by including a "subtree size" data-member in each node (alongside the standard "left child", "right child", and "value" members). You can then calculate the number of elements less than a given element as you navigate downward from the root to that element.
It adds a fair bit of bookkeeping, and of course it means you need to use your own balanced-binary-search-tree implementation instead of one from the standard library; but it's quite doable, and it doesn't affect the asymptotic complexities of any of the operations.
You can use a balanced BST with the size of the left subtree stored in each node to calculate the distance.
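A minimal sketch of the subtree-size bookkeeping described above (insertion is plain and unbalanced here for brevity, and the names are invented for the example; a real implementation would apply the same bookkeeping to a red-black or AVL tree so every operation stays O(log n)):

```cpp
#include <cassert>
#include <initializer_list>

// "Subtree size" augmentation: each node records how many nodes its
// subtree contains, alongside the usual value and child pointers.
struct Node {
    int value;
    int size = 1;  // nodes in this subtree, including this one
    Node* left = nullptr;
    Node* right = nullptr;
};

inline int subtree_size(const Node* n) { return n ? n->size : 0; }

inline Node* insert_node(Node* root, int v) {
    if (!root) return new Node{v};
    if (v < root->value)      root->left  = insert_node(root->left, v);
    else if (v > root->value) root->right = insert_node(root->right, v);
    else return root;  // ignore duplicates
    root->size = 1 + subtree_size(root->left) + subtree_size(root->right);
    return root;
}

// Count elements strictly less than v by accumulating left-subtree
// sizes on the way down from the root, as the answer describes.
inline int count_less(const Node* root, int v) {
    int count = 0;
    while (root) {
        if (v <= root->value) {
            root = root->left;
        } else {
            count += subtree_size(root->left) + 1;  // left subtree + root
            root = root->right;
        }
    }
    return count;
}

// Small self-check helper used below.
inline int demo_count_less(int v) {
    Node* root = nullptr;
    for (int x : {50, 20, 70, 10, 30}) root = insert_node(root, x);
    return count_less(root, v);
}
```

With the elements {10, 20, 30, 50, 70}, `demo_count_less(30)` walks the tree and returns 2, which is exactly the distance/index the question asks for.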

Constructing a data structure with specific requirements

I need to construct a data structure that uses only O(n) bits of storage. The worst time complexity of insert, delete, and maximum needs to be O(log n) but it needs to be O(1) for contains. I have been trying to use a binary heap with only 1s and 0s (to satisfy the O(n) bits of storage) but I can't seem to get far with the maximum and contains functions (on how their worst time complexity looks). Can anyone give me a clue on where I'm going wrong? Thank you.
Have two data structures working in tandem: a balanced BST (such as an AVL tree) and a hash table. Inserting an element takes O(log(n)) time for the BST and O(1) time for the hash table, so O(log(n)) total. Deletion likewise takes O(log(n)) for the BST and O(1) for the hash table. Maximum takes O(log(n)) for the BST, and once you know which element is the maximum, O(1) for the hash table. Contains takes O(1) time using the hash table (after which there's no need to check the BST, since they contain the same elements). Actually implementing it is fiddly because you'd need to keep pointers between corresponding elements in the BST and the hash table, but this structure achieves the required specifications.
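A rough sketch of the two-structure idea, using std::set as the balanced BST and std::unordered_set as the hash table (the class name is invented for the example; note that each key is stored in both structures, so this uses O(n) words rather than the O(n) bits the question asks for — meeting the bit bound would need a more specialized design):

```cpp
#include <cassert>
#include <set>
#include <unordered_set>

// Two structures in tandem: std::set plays the balanced BST,
// std::unordered_set plays the hash table.
class MaxContainsSet {
    std::set<int> ordered;            // O(log n) insert/erase, max at rbegin
    std::unordered_set<int> members;  // O(1) average-case contains
public:
    void insert(int v) { ordered.insert(v); members.insert(v); }
    void erase(int v)  { ordered.erase(v);  members.erase(v); }
    bool contains(int v) const { return members.count(v) != 0; }
    int  maximum() const { return *ordered.rbegin(); }  // precondition: non-empty
};

// Small self-check helper used below.
inline bool max_contains_demo() {
    MaxContainsSet s;
    s.insert(3); s.insert(8); s.insert(5);
    if (s.maximum() != 8 || !s.contains(5)) return false;
    s.erase(8);
    return s.maximum() == 5 && !s.contains(8);
}
```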

Correct data structure for fast insert and fast search?

I have an array and I need to insert items there as fast as possible. Before adding an item I need to see if it exists, so I do a full array scan. I can't use binary search since I can't sort the array after every insert.
Is there a more efficient data structure for this job?
Edit: On that array I store strings. Next to each string I store a 4 byte hash. I first compare the hashes and if they are the same then the string.
std::unordered_map (usually implemented as a hash table) will give you the best insert/search time, O(1) on average, but preserves no order at all.
std::map gives you O(log(n)) search and insert; it keeps the elements in its own sorted order (not the order you inserted them in) and is usually implemented with a balanced tree.
Custom balanced search trees are another option if you need sorted order and fast (O(log n)) insert/search.
A sorted std::vector is another option if O(n) insert time is acceptable but you need the smallest memory footprint and O(log n) search time. You'd need to insert each item at its sorted position, which is O(n) due to the need to shift the rest of the array.
If you need to preserve the original insertion order, you're stuck with O(n) for both insert and search if you use just an array (std::vector).
You can use a separate std::unordered_map/std::unordered_set in addition to the std::vector to add an "is it already present?" check, gaining speed at the price of roughly 2-3x the memory and having to update two structures when adding items. This array+hashtable combination gives you amortized O(1) insert (appending to the vector) and O(1) average search.
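A sketch of that array + hash-set combination (the class name is invented for the example): the vector preserves insertion order, while the unordered_set replaces the full array scan with an O(1) average-time membership check.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Array + hash table working together: items keeps original order,
// seen answers "already present?" without scanning.
class OrderedUniqueStrings {
    std::vector<std::string> items;        // original insertion order
    std::unordered_set<std::string> seen;  // fast membership check
public:
    // Returns false (and stores nothing) if the string was already present.
    bool insert(const std::string& s) {
        if (!seen.insert(s).second) return false;
        items.push_back(s);
        return true;
    }
    bool contains(const std::string& s) const { return seen.count(s) != 0; }
    const std::vector<std::string>& in_order() const { return items; }
};

// Small self-check helper used below.
inline bool ordered_unique_demo() {
    OrderedUniqueStrings c;
    bool a = c.insert("apple");
    bool b = c.insert("pear");
    bool dup = c.insert("apple");  // rejected duplicate
    return a && b && !dup && c.contains("pear") && c.in_order().size() == 2;
}
```

Since std::string already hashes well, the questioner's hand-rolled 4-byte hash comparison becomes unnecessary with this approach.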

Which STL Container?

I need a container (not necessarily an STL container) which lets me do the following easily:
Insertion and removal of elements at any position
Accessing elements by their index
Iterate over the elements in any order
I used std::list, but it won't let me insert at any position efficiently (it does support it, but first I'd have to iterate over the elements to reach the position I want, which is slow, as the list may be huge). So can you recommend any efficient solution?
It's not completely clear to me what you mean by "Iterate over the elements in any order" - does this mean you don't care about the order, as long as you can iterate, or that you want to be able to iterate using arbitrarily defined criteria? These are very different conditions!
Assuming you meant iteration order doesn't matter, several possible containers come to mind:
std::map [a red-black tree, typically]
Insertion, removal, and access are O(log(n))
Iteration is ordered by index
std::unordered_map (formerly hash_map / std::tr1::unordered_map) [a hash table]
Insertion, removal, and access are all (approx) O(1)
Iteration is 'random'
Either a vector or a deque will suit. vector will provide faster accesses, but deque will provide faster insertions and removals at the ends.
Well, you can't have all of those in constant time, unfortunately. Decide whether you'll do more insertions or reads, and base your decision on that.
For example, a vector lets you access any element by index in constant time and iterate over the elements in linear time (all containers allow this), but insertion and removal take linear time (slower than a list, once you're at the position).
You can try std::deque. It does not provide constant-time removal of elements in the middle, but it supports:
random access to elements
constant time insertion and removal of elements at the end of the sequence
linear time insertion and removal of elements in the middle
A vector. When you erase any item, copy the last item over the one to be erased (or swap them, whichever is faster) and pop_back. To insert at a position (but why would you, if the order doesn't matter!?), push_back the item currently at that position and overwrite it (or swap) with the item to be inserted.
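The erase half of that trick can be sketched as (function names invented for the example):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Swap-and-pop erase: O(1) removal at any index, at the cost of not
// preserving the order of the remaining elements.
inline void unordered_erase(std::vector<int>& v, std::size_t index) {
    std::swap(v[index], v.back());  // last element fills the gap
    v.pop_back();                   // shrink by one
}

// Small self-check helper used below.
inline std::vector<int> unordered_erase_demo() {
    std::vector<int> v{1, 2, 3, 4};
    unordered_erase(v, 1);  // removes 2; 4 takes its place
    return v;               // {1, 4, 3}
}
```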
By "iterating over the elements in any order", do you mean you need support for both forward and backwards by index, or do you mean order doesn't matter?
You want a special tree called an unsorted counted tree. This allows O(log(n)) indexed insertion, O(log(n)) indexed removal, and O(log(n)) indexed lookup. It also allows O(n) iteration in either the forward or reverse direction. One example where these are used is text editors, where each line of text in the editor is a node.
Here are some references:
Counted B-Trees
Rope (computer science)
An order statistic tree might be useful here. It's basically just a normal tree, except that every node in the tree includes a count of the nodes in its left sub-tree. This supports all the basic operations with no worse than logarithmic complexity. During insertion, anytime you insert an item in a left sub-tree, you increment the node's count. During deletion, anytime you delete from the left sub-tree, you decrement the node's count. To index to node N, you start from the root. The root has a count of nodes in its left sub-tree, so you check whether N is less than, equal to, or greater than the count for the root. If it's less, you search in the left subtree in the same way. If it's greater, you descend the right sub-tree, add the root's count to that node's count, and compare that to N. Continue until A) you've found the correct node, or B) you've determined that there are fewer than N items in the tree.
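A sketch of the indexed descent just described, on a size-augmented BST (insertion is unbalanced here for brevity and the names are invented for the example; an order statistic tree performs the same descent on a balanced tree, giving O(log n) worst case):

```cpp
#include <cassert>
#include <initializer_list>

// Each node counts the nodes in its own subtree, from which the size
// of the left subtree (the "count" in the text) is read off directly.
struct OSNode {
    int value;
    int size = 1;  // nodes in this subtree
    OSNode* left = nullptr;
    OSNode* right = nullptr;
};

inline int os_size(const OSNode* n) { return n ? n->size : 0; }

inline OSNode* os_insert(OSNode* root, int v) {
    if (!root) return new OSNode{v};
    if (v < root->value) root->left  = os_insert(root->left, v);
    else                 root->right = os_insert(root->right, v);
    root->size = 1 + os_size(root->left) + os_size(root->right);
    return root;
}

// Return the element with zero-based index k in sorted order, or -1
// if the tree holds fewer than k + 1 elements.
inline int os_select(const OSNode* root, int k) {
    while (root) {
        int left = os_size(root->left);
        if (k < left)        root = root->left;        // k-th is on the left
        else if (k == left)  return root->value;       // this node is the k-th
        else { k -= left + 1; root = root->right; }    // skip left subtree + root
    }
    return -1;
}

// Small self-check helper used below.
inline int os_demo_select(int k) {
    OSNode* root = nullptr;
    for (int v : {50, 20, 70, 10, 30}) root = os_insert(root, v);
    return os_select(root, k);
}
```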
But it sounds like you're looking for a single container with the following properties:
All the best benefits of various containers
None of their ensuing downsides
And that's impossible; every benefit comes with a tradeoff. Choosing a container is about compromise.
std::vector

Data structure (in STL or Boost) for retrieving kth smallest/largest item in a set?

I am looking for a data structure in C++ STL or boost with the following properties:
Retrieval of kth largest item in O(log n) time
Searching in O(log n) time
Deletion in O(log n) time
If such a data structure implementation doesn't exist, is there a way to adapt a different data structure with extra data (e.g., set) so that the above is possible?
Note: I've found is-there-any-data-structure-in-c-stl-for-performing-insertion-searching-and-r, but this is 5 years old and doesn't mention boost.
For the moment I assume that the elements are unique and that there are at least k elements. If not, you can use multiset similarly.
You can accomplish this using two sets in C++:
#include <set>
Set 1: Let's call this large. It keeps the k largest elements only.
Set 2: Let's call this rest. It keeps the rest of the elements.
Searching: Just search both sets; this takes O(log n), since both sets are red-black trees.
Deleting: If the element is in rest, just delete it. If not, delete it from large, then remove the largest element from rest and move it into large (so large stays at k elements). Deleting from a red-black tree takes O(log n).
Inserting new elements (initializing): Each time a new element comes: (1) if large has fewer than k elements, add it to large; (2) otherwise, if the element is greater than the minimum element in large, move that minimum to rest and add the new element to large; (3) otherwise, simply add the new element to rest. Deleting and inserting in red-black trees take O(log n).
This way, large always has the k largest elements, and the minimum of those is the k-th largest which you want.
I leave it to you to find how you can do search, insert, find min, find max, and delete in a set. It's not that hard. But all of these operations take O(log n) on a balanced binary search tree.
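A sketch of the two-set scheme with std::multiset, so duplicates are allowed as noted at the start (the class name is invented for the example, and deletion is omitted for brevity; it would follow the rule given in the answer):

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <set>

// `large` keeps the k largest elements seen so far, `rest` keeps the
// others. Every operation is a constant number of O(log n) tree steps.
class KthLargestTracker {
    std::size_t k;
    std::multiset<int> large, rest;
public:
    explicit KthLargestTracker(std::size_t k) : k(k) {}

    void insert(int v) {
        if (large.size() < k) { large.insert(v); return; }
        int min_large = *large.begin();
        if (v > min_large) {
            large.erase(large.begin());  // demote the current minimum
            rest.insert(min_large);
            large.insert(v);
        } else {
            rest.insert(v);
        }
    }

    // k-th largest = minimum of `large` (precondition: at least k inserts).
    int kth_largest() const { return *large.begin(); }
};

// Small self-check helper used below.
inline int kth_largest_demo() {
    KthLargestTracker t(3);
    for (int v : {5, 1, 9, 7, 3}) t.insert(v);
    return t.kth_largest();  // 3rd largest of {1, 3, 5, 7, 9}
}
```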