Constructing a data structure with specific requirements - heap

I need to construct a data structure that uses only O(n) bits of storage. The worst time complexity of insert, delete, and maximum needs to be O(log n) but it needs to be O(1) for contains. I have been trying to use a binary heap with only 1s and 0s (to satisfy the O(n) bits of storage) but I can't seem to get far with the maximum and contains functions (on how their worst time complexity looks). Can anyone give me a clue on where I'm going wrong? Thank you.

Have two data structures working in tandem: a balanced BST (such as an AVL tree) and a hash table, both holding the same elements. Inserting an element takes O(log(n)) time for the BST and O(1) time for the hash table, so O(log(n)) time total. Deletion takes O(log(n)) time for the BST and O(1) time for the hash table. Maximum takes O(log(n)) time for the BST, and once you know which element is the maximum, finding it in the hash table takes O(1) time. Contains takes O(1) time for the hash table (after which there's no need to check the BST, since they contain the same elements). Actually implementing it would be difficult because you'd need to keep pointers between elements in the BST and the hash table, but this structure achieves the required time bounds, with the caveat that an ordinary hash table gives O(1) in expectation rather than strictly in the worst case.
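A minimal sketch of the tree-plus-hash-table idea, assuming integer keys and using std::set and std::unordered_set as stand-ins for the balanced BST and the hash table. The class name MaxSet and its member names are hypothetical. Because each container stores its own copy of the key, no cross pointers are needed here; the sketch does not attempt to meet the O(n)-bit storage requirement, and the hash table operations are O(1) on average only.

```cpp
#include <cassert>
#include <set>
#include <unordered_set>

// Hypothetical wrapper: keeps a balanced tree and a hash table in sync.
class MaxSet {
public:
    // O(log n) for the tree plus average O(1) for the hash table.
    bool insert(int value) {
        if (!members_.insert(value).second) return false;  // already present
        ordered_.insert(value);
        return true;
    }

    // O(log n) for the tree plus average O(1) for the hash table.
    bool erase(int value) {
        if (members_.erase(value) == 0) return false;  // was not present
        ordered_.erase(value);
        return true;
    }

    // Average O(1): only the hash table is consulted.
    bool contains(int value) const { return members_.count(value) != 0; }

    // The largest key is the last element of the ordered tree.
    int maximum() const {
        assert(!ordered_.empty());
        return *ordered_.rbegin();
    }

private:
    std::set<int> ordered_;            // typically a red-black tree
    std::unordered_set<int> members_;  // hash table for membership tests
};
```

Checking the hash table first means a duplicate insert or a miss on erase never touches the tree at all.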

Related

Does there exist a data structure with constant access and insertion/deletion times? [duplicate]

By vector vs. list in STL:
std::vector: Insertions at the end are constant, amortized time, but insertions elsewhere are a costly O(n).
std::list: You cannot randomly access elements, so getting at a particular element in the list can be expensive.
I need a container such that you can both access the element at any index in O(1) time, but also insert/remove an element at any index in O(1) time. It must also be able to manage thousands of entries. Is there such a container?
Edit: If not O(1), some X << O(n)?
There's a theoretical result that says that any data structure representing an ordered list cannot have all of insert, lookup by index, remove, and update take time better than O(log n / log log n), so no such data structure exists.
There are data structures that get pretty close to this, though. For example, an order statistics tree lets you do insertions, deletions, lookups, and updates anywhere in the list in time O(log n) apiece. These are reasonably good in practice, and you may be able to find an implementation online.
Depending on your specific application, there may be alternative data structures that are more tailored toward your needs. For example, if you only care about finding the smallest/biggest element at each point in time, then a data structure like a Fibonacci heap might fit the bill. (Fibonacci heaps are usually slower in practice than a regular binary heap, but the related pairing heap tends to run extremely quickly.) If you're frequently updating ranges of elements by adding or subtracting from them, then a Fenwick tree might be a better call.
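For the range-update case, here is a rough sketch of a Fenwick tree used over a difference array, so that adding a value to a whole range and reading a single element both take O(log n). The class names Fenwick and RangeAdder and their member names are made up for illustration, and positions are assumed to be 0-based.

```cpp
#include <cstddef>
#include <vector>

// Fenwick (binary indexed) tree over n values, all initially zero.
class Fenwick {
public:
    explicit Fenwick(std::size_t n) : tree_(n + 1, 0) {}

    // Adds delta at 0-based position i in O(log n).
    void add(std::size_t i, long long delta) {
        for (++i; i < tree_.size(); i += i & (~i + 1)) tree_[i] += delta;
    }

    // Returns the sum of positions [0, i] in O(log n).
    long long prefix_sum(std::size_t i) const {
        long long total = 0;
        for (++i; i > 0; i -= i & (~i + 1)) total += tree_[i];
        return total;
    }

private:
    std::vector<long long> tree_;  // 1-based internal storage
};

// Stores differences between neighbours, so a range update is two point updates.
class RangeAdder {
public:
    explicit RangeAdder(std::size_t n) : diff_(n), size_(n) {}

    // Adds delta to every element in the inclusive range [lo, hi].
    void range_add(std::size_t lo, std::size_t hi, long long delta) {
        diff_.add(lo, delta);
        if (hi + 1 < size_) diff_.add(hi + 1, -delta);
    }

    // Current value of element i: the prefix sum of the differences.
    long long value_at(std::size_t i) const { return diff_.prefix_sum(i); }

private:
    Fenwick diff_;
    std::size_t size_;
};
```

Note that this supports updates and queries by position only; it does not support inserting or removing positions.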
Hope this helps!
Look at a couple of data structures:
The Rope: a tree of arrays. The tree is sorted by array index for fast index search.
The B+Tree: a sorted tree of sorted arrays. This thing is used by almost every database ever.
Neither one is O(1) because that's impossible. But they are pretty good.

Correct data structure for fast insert and fast search?

I have an array and I need to insert items there as fast as possible. Before adding an item I need to see if it exists, so I do a full array scan. I can't use binary search since I can't sort the array after every insert.
Is there a more efficient data structure for this job?
Edit: On that array I store strings. Next to each string I store a 4 byte hash. I first compare the hashes and if they are the same then the string.
std::unordered_map (usually implemented as a hash table) will give you the best insert/search time (O(1) on average) but neither preserves nor provides any order.
std::map gives you O(log(n)) for search and insert, as it keeps its elements in key order (not the order in which you inserted them) and is usually implemented as a balanced tree.
Custom balanced search trees are another option if you need sorted order and fast (O(log n)) insert/search.
A sorted std::vector is another option if O(n) is an acceptable insert time but you need the smallest memory footprint and O(log n) search time. You'd need to insert items at their sorted position, which is O(n) because the rest of the array has to be shifted.
If you need to preserve the original order, you are stuck with O(n) for both insert (including the existence check) and search if you are using just an array (std::vector).
You can use a separate std::unordered_map/std::unordered_set in addition to the std::vector for the "is already present" check, gaining speed at the price of roughly 2-3x the memory and the need to update two structures when adding items. This array + hash table combination will give you amortized O(1) insert and O(1) average search.
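The vector-plus-hash-set combination from the last option could look roughly like this. It is only a sketch: the class name UniqueLog and its member names are invented for illustration, and it stores each string twice, which is where the extra memory goes.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical container: keeps insertion order and rejects duplicates.
class UniqueLog {
public:
    // Appends value if it has not been seen before; returns false for a duplicate.
    bool add(const std::string& value) {
        if (!seen_.insert(value).second) return false;  // average O(1) check
        items_.push_back(value);                        // amortized O(1) append
        return true;
    }

    // Average O(1) membership test, no scan of the vector needed.
    bool contains(const std::string& value) const {
        return seen_.count(value) != 0;
    }

    // Items in the order they were first added.
    const std::vector<std::string>& items() const { return items_; }

private:
    std::vector<std::string> items_;
    std::unordered_set<std::string> seen_;
};
```

If insertion order does not matter, the std::unordered_set alone is enough and the vector can be dropped.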

Looking for clarification on Hashing and BST functions and Big O notation

So I am trying to understand the data types and Big O notation of some functions for a BST and Hashing.
So first off, how are BSTs and Hashing stored? Are BSTs usually arrays, or are they linked lists because they have to point to their left and right leaves?
What about Hashing? I've had the most trouble finding clear information regarding Hashing in terms of computation-based searching. I understand that Hashing is best implemented with an array of chains. Is this for faster searching or to decrease overhead on creating the allocated data type?
This following question might be just bad interpretation on my part, but what makes a traversal function different from a search function in BSTs, Hashing, and STL containers?
Is traversal Big O(N) for BSTs because you're actually visiting each node/data member, whereas search() can reduce its time by eliminating half the searching field?
And somewhat related, why is it that in the STL, list.insert() and list.erase() have a Big O(1) whereas the vector and deque counterparts are O(N)?
Lastly, why would a vector.push_back() be O(N)? I thought the function could be done something along the lines of this like O(1), but I've come across text saying it is O(N):
vector<int> vic(2,3);
vector<int>::iterator IT = vic.end();
//wanna insert 4 to the end using push_back
IT++;
(*IT) = 4;
hopefully this works. I'm a bit tired but I would love any explanations why something similar to that wouldn't be efficient or plausible. Thanks
BSTs (ordered binary trees) are a series of nodes where a parent node points to its two children, which in turn point to at most two children each, and so on. They're traversed in O(n) time because traversal visits every node. Lookups take O(log n) time when the tree is balanced. Inserts don't need to move a bunch of existing nodes: once the position has been found (an O(log n) descent in a balanced tree), linking in the new node is O(1); just allocate some memory and re-aim the pointers. :)
Hashes (unordered_map) use a hashing algorithm to assign elements to buckets. Usually buckets contain a linked list so that hash collisions just result in several elements in the same bucket. Traversal will again be O(n), as expected. Lookups will be O(1) on average, and inserts will be amortized O(1) on average. Amortized means that an individual insert might result in a rehashing (redistribution of buckets to minimize collisions), but over time the average cost per insert is O(1). In the worst case, with many collisions in a single bucket, an operation can degrade to O(n). Note, however, that big-O notation doesn't really deal with the "constant" aspect; only order of growth. The constant overhead in the hashing algorithms can be high enough that for some data sets the O(log n) binary trees outperform the hashes. Nevertheless, the hash's advantage is that its operations have constant average time complexity.
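To make the bucket idea concrete, the following sketch uses the bucket interface of std::unordered_set to look for a key by scanning only the one bucket its hash selects. The function name found_in_own_bucket is hypothetical; in real code you would simply call find() or count(), which does this work internally.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Returns true if key is found by scanning only the bucket its hash maps to.
bool found_in_own_bucket(const std::unordered_set<std::string>& table,
                         const std::string& key) {
    if (table.bucket_count() == 0) return false;  // bucket() needs at least one bucket
    const std::size_t b = table.bucket(key);      // index computed from the hash
    for (auto it = table.begin(b); it != table.end(b); ++it) {
        if (*it == key) return true;              // walk the collisions in that bucket
    }
    return false;
}
```

The loop runs over a single bucket, so its cost depends on how many elements collide there rather than on the total number of elements.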
Search functions take advantage (in the case of binary trees) of the notion of "order"; a search through a BST has the same characteristics as a basic binary search over an ordered array. O(log n) growth. Hashes don't really "search". They compute the bucket, and then quickly run through the collisions to find the target. That's why lookups are constant time.
As for insert and erase: in array-based sequence containers, all elements that come after the target have to be bumped over to the right (or to the left for erase). Move semantics in C++11 can improve upon the performance, but the operation is still O(n). For node-based containers (list, forward_list, and trees once the position is known), inserting and erasing just means fiddling with some pointers internally. Given an iterator to the position, it's a constant-time process.
push_back() will be O(1) until you exceed the existing allocated capacity of the vector. Once the capacity is exceeded, a new allocation takes place to produce a container that is large enough to accept more elements. All the elements need to then be moved into the larger memory region, which is an O(n) process. I believe Move Semantics can help here as well, but it's still going to be O(n). Vectors and strings are implemented such that as they allocate space for a growing data set, they allocate more than they need, in anticipation of additional growth. This is an efficiency safeguard; it means that the typical push_back() won't trigger a new allocation and move of the entire data set into a larger container. But eventually after enough push_backs, the limit will be reached, and the vector's elements will be copied into a larger container, which again has some extra headroom left over for more efficient push_backs.
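The growth behaviour described above can be observed with a small experiment. The helper name count_reallocations is made up for this sketch, and the exact growth factor is implementation-defined, so the count differs between standard libraries. As an aside, writing through an iterator at or past end(), as in the question's snippet, is undefined behavior, since no element exists there; push_back is what actually makes room.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the capacity changes while appending n elements.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {  // storage was reallocated
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With geometric growth the number of reallocations grows only logarithmically with n, which is why the average cost per push_back stays constant. If the final size is known in advance, calling reserve() up front avoids the reallocations entirely.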
Traversal refers to visiting every node, whereas search is only to find a particular node, so your intuition is spot on there. O(N) complexity because you need to visit N nodes.
std::vector::insert is for inserting in the middle, and it involves copying all subsequent elements over by one slot in order to make room for the element being inserted, hence O(N). A linked list doesn't have this issue, hence O(1) once you have an iterator to the position. Similar logic applies to erase. deque behaves like vector for inserts in the middle, although inserts at either end are O(1).
std::vector::push_back is an O(1) operation for the most part (amortized O(1) overall); it only deviates when the capacity is exceeded and a reallocation plus copy is needed.

complexity of set::insert

I have read that insert operation in a set takes only log(n) time. How is that possible?
To insert, first we have to find the location in the sorted array where the new element must sit. Using binary search it takes log(n). Then to insert in that location, all the elements succeeding it should be shifted one place to the right. It takes another n time.
My doubt is based on my understanding that set is implemented as an array and elements are stored in sorted order. Please correct me if my understanding is wrong.
std::set is commonly implemented as a red-black binary search tree. Insertion on this data structure has a worst-case of O(log(n)) complexity, as the tree is kept balanced.
Things do not get shifted over when inserting into a set. It is usually not stored as a vector or array, but rather as a binary tree. Finding the insertion point takes O(log(n)), adding the new leaf node takes O(1), and rebalancing touches at most O(log(n)) nodes. So in total insertion takes O(log(n)).
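One way to see that nothing is shifted: the address of an element already in a std::set stays the same no matter how many other elements are inserted afterwards, because the standard guarantees that references into a set remain valid across inserts. A small sketch, with the function name address_stable_after_inserts chosen just for this example:

```cpp
#include <set>

// Returns true if an element's address is unchanged after many further inserts.
bool address_stable_after_inserts() {
    std::set<int> s;
    const int* before = &*s.insert(50).first;    // address of the stored 50
    for (int i = 0; i < 1000; ++i) s.insert(i);  // insert keys on both sides of it
    const int* after = &*s.find(50);             // look the same element up again
    return before == after;
}
```

The same check on a std::vector would generally fail, since growing the vector moves its elements to new storage.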