I want a data structure into which I can insert elements in O(log n) time, with the elements kept sorted after every insertion. I can use a multiset for this.
After that, I want to find the number of elements strictly smaller than a given number, again in O(log n) time. Duplicates are also present and they need to be counted. For example, if the query element is 5 and the data structure contains {2, 2, 4, 5, 6, 8, 8}, then the answer would be 3 (2, 2, 4), as these 3 elements are strictly less than 5.
I could have used a multiset, but even if I use lower_bound I would have to use std::distance, which runs in linear time. How can I achieve this efficiently with the C++ STL? Also I cannot use
The data structure you need is an order statistic tree: https://en.wikipedia.org/wiki/Order_statistic_tree
The STL doesn't have one, and they're not very common, so you might have to roll your own. You can find code on Google, but I can't vouch for any specific implementation.
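That said, if you're compiling with GCC, its (non-standard, GCC-specific) policy-based data structures extension ships an order statistic tree. A minimal sketch, using (value, unique id) pairs to emulate multiset semantics, since the pb_ds tree stores distinct keys only:

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <iostream>
#include <utility>

using namespace __gnu_pbds;

// Pair keys (value, unique id) allow duplicates of the same value.
using ordered_multiset = tree<std::pair<int, int>, null_type,
                              std::less<std::pair<int, int>>,
                              rb_tree_tag, tree_order_statistics_node_update>;

int main() {
    ordered_multiset s;
    int id = 0;
    for (int v : {2, 2, 4, 5, 6, 8, 8})
        s.insert({v, id++});                  // O(log n) per insert

    // order_of_key returns the number of keys strictly less than the
    // argument; pairing the query value with -1 (below any real id)
    // counts exactly the elements with value < 5.
    std::cout << s.order_of_key({5, -1}) << '\n'; // 3
}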
I have a list of sets; right now the list is a vector, but it does not need to be.
vector<unordered_set<int>> setlist;
Then I am filling it with some data; let's just say, for example, it looks like this:
[ {1, 2}, {2, 3}, {5, 9} ]
Now I have another set, let's say it's this: {1, 2, 3}
I want to check if any of the sets in the list is a subset of the above set. For example, setlist[0] and setlist[1] are both subsets, so the output would be true.
My idea is to loop through the whole vector and check whether the set at each index is a subset using the std::includes function, but I am looking for a faster way. Is this possible?
Consider using a list of set<int> instead. This allows you to use std::includes, which requires both ranges to be sorted. Run your loop on the vector after having sorted it by the number of elements in each set (i.e. from the sets with the fewest elements to the sets with the most). The inner loop starts at the current index; this avoids checking whether a larger set is included in a smaller one.
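A minimal sketch of the std::includes approach, assuming the query set is also a sorted std::set:

#include <algorithm>
#include <iostream>
#include <set>
#include <vector>

int main() {
    std::vector<std::set<int>> setlist = {{1, 2}, {2, 3}, {5, 9}};
    std::set<int> super = {1, 2, 3};

    // std::includes needs both ranges sorted, which std::set guarantees;
    // each test is linear in the sizes of the two ranges.
    bool any = std::any_of(setlist.begin(), setlist.end(),
                           [&](const std::set<int>& s) {
                               return std::includes(super.begin(), super.end(),
                                                    s.begin(), s.end());
                           });
    std::cout << std::boolalpha << any << '\n'; // true
}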
If the range of the integers is not too large, you could consider implementing each set as a std::bitset (bit n is true if n is included). The inclusion test is then a very fast logical operation ((subset & large_set) == subset). You could still sort the vector by count, but I'm not sure that would be needed considering the speed of the logical operation.
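And a sketch of the bitset variant, assuming all values fall in a small known range [0, MAX_VALUE):

#include <bitset>
#include <initializer_list>
#include <iostream>
#include <vector>

constexpr std::size_t MAX_VALUE = 64; // assumed upper bound on the values
using Bits = std::bitset<MAX_VALUE>;

Bits to_bits(std::initializer_list<int> values) {
    Bits b;
    for (int v : values) b.set(v);
    return b;
}

int main() {
    std::vector<Bits> setlist = {to_bits({1, 2}), to_bits({2, 3}), to_bits({5, 9})};
    Bits super = to_bits({1, 2, 3});

    bool any = false;
    for (const Bits& s : setlist)
        if ((s & super) == s) {   // s is a subset of super
            any = true;
            break;
        }
    std::cout << std::boolalpha << any << '\n'; // true
}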
Suppose I have an unsorted list such as the one below:
[1, 2, 3, 1, 1, 5, 2, 1]
and I want to return the number of minimum elements (in this case, min = 1), which is 4.
A quick solution is to just find the minimum using some built in min() function, and then iterate over the list again and compare values, then count them up. O(2n) time.
But I'm wondering if it's possible to do it in strictly O(n) time - only make one pass through the list. Is there a way to do so?
Remember that big-O notation talks about the way in which a runtime scales, not the absolute runtime. In that sense, an algorithm that makes two passes over an array that each take time O(n) also has runtime O(n) - the runtime will scale linearly as the input size increases. So your two-pass algorithm will work just fine.
A stronger requirement is that you have a one-pass algorithm, in which you get to see all the elements once. In that case, you can do this by tracking the smallest number you've seen so far and all the positions where you've seen it. Whenever you see a value,
if that value is bigger than the smallest you've seen, ignore it;
if that value equals the smallest you've seen, add it to the list of positions; and
if that value is smaller than the smallest you've seen, discard your list of all the smallest elements (they weren't actually the smallest) and reset it to a list of just the current position.
This also takes time O(n), but does so in a single pass.
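Since the question only asks for the count, here is a one-pass sketch in the same spirit, tracking a running minimum and a count instead of a list of positions:

#include <iostream>
#include <vector>

// One-pass count of the minimum element's occurrences.
int count_minimum(const std::vector<int>& values) {
    if (values.empty()) return 0;
    int smallest = values.front();
    int count = 0;
    for (int v : values) {
        if (v < smallest) {         // new minimum: previous counts weren't minimal
            smallest = v;
            count = 1;
        } else if (v == smallest) { // another occurrence of the current minimum
            ++count;
        }                           // v > smallest: ignore
    }
    return count;
}

int main() {
    std::cout << count_minimum({1, 2, 3, 1, 1, 5, 2, 1}) << '\n'; // 4
}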
For instance, I have array1 = {1, 2, 3, 4} and want to partition it into 2 subarrays, so:
subarray1 = {1, 2} and subarray2 = {3, 4}
Is there a way to partition it and create the arrays automatically, depending on the user input for N?
(For background, I am taking an array with 100000 integer values, sorted, and partitioning it so that finding a number in the array will be a lot more efficient. Since it's sorted and partitioned, I know the start and end range of each subarray, and can just search there.)
You're asking the wrong question. If you want to find out whether a number exists in the array, the easiest and fastest way is to use std::unordered_set; the search becomes a constant-time operation on average.
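A minimal sketch: build the set once up front, then answer each membership query in expected constant time. (Since your array is already sorted, std::binary_search would also give O(log n) lookups with no extra memory.)

#include <iostream>
#include <unordered_set>
#include <vector>

int main() {
    std::vector<int> values = {3, 1, 4, 1, 5, 9, 2, 6};

    // Build once: O(n) on average.
    std::unordered_set<int> lookup(values.begin(), values.end());

    // Each membership test is O(1) on average.
    std::cout << std::boolalpha
              << (lookup.count(5) > 0) << '\n'  // true
              << (lookup.count(7) > 0) << '\n'; // false
}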
Can I sort a vector so that it will match the iteration order of an unordered_map? I want to find the intersection of the two containers, and if the vector matched the map's order I could iterate each container just once, rather than having to search for each key.
So for example, given an unordered_map containing:
1, 2, 3, 4, 5, 6, 7, 8, 9
Which is hashed into this order:
1, 3, 4, 2, 5, 7, 8, 6, 9
I'd like if given a vector of:
1, 2, 3, 4
I could somehow distill the sorting of the unordered_map for use in sorting the vector so it would sort into:
1, 3, 4, 2
Is there a way to accomplish this? I notice that unordered_map does provide its hash_function; can I use this?
As comments correctly state, there is no even remotely portable way of matching a vector's sort order to an unordered_map; the map's iteration order is unspecified.
However, in the land of the unspecified, we can sometimes live with whatever our implementation does, even if it's unspecified and non-portable. So, could someone look into your map's implementation and exploit whatever determinism it has to order the vector?
The problem with unordered_map is that it's a hash table. Every element inserted into it is hashed, with the hash (mapped onto the bucket space) used as an index into an internal array. This looks promising, and it would be, if not for collisions. When keys collide, the elements are put into a collision list, and this list is not sorted at all; the order of iteration over it is determined by the order of inserts (reverse or direct). Because of that, absent information about the order of inserts, it is not possible to mimic the order of the unordered_map, even for a specific implementation.
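To see this concretely, the standard bucket interface lets you observe where each key landed; keys sharing a bucket have collided, and their relative order within the bucket depends on insertion order, not on the key values:

#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    for (int k : {1, 2, 3, 4, 5, 6, 7, 8, 9}) m[k] = k;

    // Walk the buckets with the standard local-iterator interface.
    for (std::size_t b = 0; b < m.bucket_count(); ++b)
        for (auto it = m.begin(b); it != m.end(b); ++it)
            std::cout << "bucket " << b << ": key " << it->first << '\n';
}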
Let's say I have a list of integers:
2, 1, 3, 1, 4, 2, 5, 3, 2
I want to be able to insert a new integer at position i. So let's say i is 4, and I want to insert the number 7. The result would be:
2, 1, 3, 7, 1, 4, 2, 5, 3, 2
After the insertion, I would like to receive some information based on numbers at positions i and lower. For example, the sum of the first i numbers. In this case it would be 2 + 1 + 3 + 7 = 13.
I want to be able to repeat this process over and over.
I wrote a program in C++ that uses std::list. Here's what it does to insert n at position i into List and then return the sum of the first i numbers:
Compare the last insert position k with i. If it's lower, calculate sum[j] for each j with k < j < i as sum[j] = sum[j-1] + List[j] (O(n))
Find position i (O(n))
Insert n at position i, store k = i (O(1))
Calculate and return sum[i] = sum[i-1] + n (O(1))
Can this be done more efficiently, perhaps using a different data structure? In O(log n), maybe? If yes, then how?
If you want an out-of-the-box solution without rolling a new data structure or using a third party lib, std::vector would be your best bet. The algorithmic complexity would be:
Compare the last insert position k with i. If it's lower, calculate sum: O(n)
Find position i: O(1) or O(n) if it involves some kind of search. If there's a search involved, it will still be substantially faster than std::list.
Insert n at position i: O(n)
Calculate and return sum[i] = sum[i-1] + n: O(1)
This might not seem better from an algorithmic/scalability standpoint, but the considerable performance improvement we would typically see here isn't due to algorithmic complexity. It's due to locality of reference (spatial locality in particular).
The machine can plow through contiguous data sequentially very quickly, since multiple adjacent elements can be accessed before their cache line is evicted. std::vector has that going for it in spades, and we end up benefiting from its rapid, contiguous, sequential access in all 4 cases above.
std::list, when used with std::allocator (especially in a context where not all nodes are allocated at once), tends to incur a lot of cache misses, since it lacks spatial locality. This is also due, in part, to the overhead of the list pointers, which reduces the number of elements that fit into a cache line; in this particular case substantially, since we require two list pointers per measly integer.
Note that potentially more optimal solutions exist outside the standard library, tuned for your specific problem, as mentioned in the other nice answer. Another angle that delves into lower-level details is to write your own custom allocator, which can really help just about any kind of linked structure. This answer focuses on vanilla C++, where std::vector is often your best bet for a sequential container (unless you have strong reasons otherwise), given its contiguous, cache-friendly representation.
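A minimal sketch of the vector approach for the example from the question:

#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> values = {2, 1, 3, 1, 4, 2, 5, 3, 2};

    int i = 4, n = 7;                            // insert 7 at (1-based) position 4
    values.insert(values.begin() + (i - 1), n);  // O(n) shift, but cache-friendly

    // Sum of the first i numbers: O(n) here, or O(1) if you maintain the
    // running prefix sums described in the steps above.
    long long sum = std::accumulate(values.begin(), values.begin() + i, 0LL);
    std::cout << sum << '\n';                    // 13 (2 + 1 + 3 + 7)
}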
As #andyg mentioned in the comments, this is a job that suits a Fenwick tree, also known as a Binary Indexed Tree. A Binary Indexed Tree can do insertion and update in O(log n), and query (sum from the beginning to an index) in O(log n). There is a very good article about Binary Indexed Trees here.
This job can also be done with a segment tree, but as the implementation of a Binary Indexed Tree is so much simpler, I recommend the Binary Indexed Tree.
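Here is a minimal Fenwick tree sketch with point update and prefix-sum query, both O(log n). Note that inserting between existing positions needs extra machinery (for example, reserving slots for future inserts offline); this shows only the core operations:

#include <iostream>
#include <vector>

class Fenwick {
    std::vector<long long> tree; // 1-based internal array
public:
    explicit Fenwick(int n) : tree(n + 1, 0) {}

    // Add delta at 0-based position i: O(log n).
    void update(int i, long long delta) {
        for (int x = i + 1; x < static_cast<int>(tree.size()); x += x & -x)
            tree[x] += delta;
    }

    // Sum of 0-based positions [0, i]: O(log n).
    long long prefix_sum(int i) const {
        long long s = 0;
        for (int x = i + 1; x > 0; x -= x & -x)
            s += tree[x];
        return s;
    }
};

int main() {
    std::vector<int> values = {2, 1, 3, 7, 1, 4, 2, 5, 3, 2};
    Fenwick fw(static_cast<int>(values.size()));
    for (std::size_t i = 0; i < values.size(); ++i)
        fw.update(static_cast<int>(i), values[i]);
    std::cout << fw.prefix_sum(3) << '\n'; // 13 (2 + 1 + 3 + 7)
}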