Is there an idiomatic, efficient C++ equivalent to Haskell's groupBy? - c++

I'm trying to process an input sequence with Boost.Range. The library leaves quite a lot to be desired, so I have to write some additional range adaptors on my own. Most of them are straightforward, but I ran into some difficulties when I tried to implement an equivalent of Haskell's groupBy (or range-v3's group_by_view). It's a transformation that takes an input range and returns a range of ranges, each containing a run of adjacent elements from the input that satisfy a given binary predicate. For example, if the binary predicate is simply std::equal_to<int>(), the sequence
{1, 1, 2, 3, 5, 5, 5, 4, 1}
would be mapped to
{{1, 1}, {2}, {3}, {5, 5, 5}, {4}, {1}}
My problem is with the interface for this adaptor. Suppose we have
auto i = (input | grouped_by(std::equal_to<int>())).begin();
If i is incremented, it has to scan the underlying sequence until it finds 2. If, however, I first scan *i (which is the range {1, 1}), I have essentially already found the end of the first group, so the traversal caused by ++i is redundant. It's possible to have some feedback path from the inner iterator to the outer one, i.e. have i start the scan from the last element reached by the inner iterator, but that would add a lot of overhead and risk creating dangling iterators.
I'm wondering if there is some idiomatic way to deal with this problem, ideally a redefinition of the grouped_by interface that sidesteps it altogether. Obviously the input range has to be scanned to find the beginning of each group, but I'd like a robust way to do that without rescanning elements for no reason. (By robust I mean not invalidating iterators as long as the underlying input range's iterators are valid, and certainly not during the scan itself.)
So, is there some known/proven/elegant solution to this?
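One way to sidestep the redundant scan, offered as a minimal sketch rather than a proven idiom, is to give up the lazy range-of-ranges interface and hand each group to a callback as an iterator pair. Each element is then visited once, and no inner iterator has to report back to an outer one. The names for_each_group and group_equal are hypothetical (they are not part of Boost.Range or range-v3), and the generic lambdas assume C++14. The sketch compares adjacent elements, as described above; Haskell's groupBy compares against the first element of each group, which gives the same result for an equivalence relation such as equality.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical helper: calls callback(group_begin, group_end) once per group
// of adjacent elements for which pred(previous, current) holds.
template <class FwdIt, class Pred, class Callback>
void for_each_group(FwdIt first, FwdIt last, Pred pred, Callback callback) {
    while (first != last) {
        // Find the first adjacent pair (a, b) where pred(a, b) is false.
        FwdIt brk = std::adjacent_find(first, last,
            [&pred](const auto& a, const auto& b) { return !pred(a, b); });
        // The group ends just after 'a', or at 'last' if no such pair exists.
        FwdIt group_end = (brk == last) ? last : std::next(brk);
        callback(first, group_end);
        first = group_end;  // resume where the scan stopped; nothing is rescanned
    }
}

// Hypothetical convenience wrapper that materializes the groups eagerly.
std::vector<std::vector<int>> group_equal(const std::vector<int>& input) {
    std::vector<std::vector<int>> groups;
    for_each_group(input.begin(), input.end(), std::equal_to<int>(),
                   [&groups](auto b, auto e) { groups.emplace_back(b, e); });
    return groups;
}
```

Calling group_equal on {1, 1, 2, 3, 5, 5, 5, 4, 1} yields the six groups from the example. The trade-off is that the callback form is eager per group, so it does not compose with other lazy adaptors the way a real grouped_by adaptor would.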

Related

C++ set how to check if a list of sets contains a subset

I have a list of sets. Right now the list is a vector, but it does not need to be.
vector<unordered_set<int>> setlist;
Then I fill it with some data. Let's say, for example, it looks like this:
[ {1, 2}, {2, 3}, {5, 9} ]
Now I have another set, let's say this one: {1, 2, 3}
I want to check whether any of the sets in the list is a subset of the above set. For example, setlist[0] and setlist[1] are both subsets, so the output would be true.
My idea is to loop through the whole vector and check whether any of the elements is a subset using the std::includes function, but I am looking for a faster way. Is this possible?
Consider using a vector of std::set<int> instead. Because std::set is ordered, this lets you use std::includes, which requires sorted ranges. Run your loop on the vector after having sorted it by the number of elements in each set (i.e. from the sets with the fewest elements to the sets with the most). The inner loop will start at the current index. This avoids checking whether larger sets are included in smaller ones.
If the range of the integers is not too large, you could consider implementing the set with a std::bitset (bit n is true if n is included). The inclusion test is then done with a very fast logical operation, e.g. (subset & large_set) == subset. The parentheses are needed because == binds more tightly than &. You could still sort the vector by count, but I'm not sure that would be needed considering the speed of the logical operation.
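Both suggestions can be sketched as follows. This is a minimal illustration, not a benchmarked solution: the function names are hypothetical, and kMaxValue is an assumed upper bound on the integers, chosen only for the example.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <set>
#include <vector>

// Variant 1: ordered sets, so std::includes (which needs sorted ranges) applies.
bool any_subset_sorted(const std::vector<std::set<int>>& setlist,
                       const std::set<int>& big) {
    return std::any_of(setlist.begin(), setlist.end(),
        [&big](const std::set<int>& s) {
            return std::includes(big.begin(), big.end(), s.begin(), s.end());
        });
}

// Variant 2: bitsets. Assumption: every value lies in [0, kMaxValue).
constexpr std::size_t kMaxValue = 64;
using Bits = std::bitset<kMaxValue>;

Bits to_bits(const std::set<int>& s) {
    Bits b;
    for (int v : s) {
        b.set(static_cast<std::size_t>(v));  // throws std::out_of_range if v is too large
    }
    return b;
}

bool any_subset_bits(const std::vector<Bits>& setlist, const Bits& big) {
    return std::any_of(setlist.begin(), setlist.end(),
        [&big](const Bits& s) {
            return (s & big) == s;  // parentheses required: == binds tighter than &
        });
}
```

With the data from the question, both variants report true for {1, 2, 3}. Both are still linear in the number of sets in the list; the bitset version only makes each individual inclusion test cheaper.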

Use c++ gslice to hide specific elements in valarray<int>

I want to hide multiple elements in a valarray<int> which holds consecutive integers starting from 0. For example, going from {0, 1, 2, 3, 4, 5} to {0, 2, 3, 5}. I have found that I can use an indirect_array to specify element indices with a valarray<size_t>. However, I don't know how to generate a valarray<size_t> with the indices I want in O(1) complexity. O(1), or at most O(log n), complexity is very important to me. So I think gslice may be able to solve the problem, but I still can't figure out how to implement it.
Note: I use C++11.

number of elements strictly lesser than a given number

I want a data structure into which I can insert elements in O(log n) time, with the elements kept sorted after every insertion. I can use a multiset for this.
After that I want to find the number of elements strictly smaller than a given number, again in O(log n) time. Duplicates are present and need to be counted. For example, if the query element is 5 and the data structure contains {2, 2, 4, 5, 6, 8, 8}, then the answer would be 3 (2, 2, 4), as these 3 elements are strictly less than 5.
I could have used a multiset, but even if I use upper_bound I will have to use the distance method, which runs in linear time. How can I achieve this efficiently with the C++ STL? Also I cannot use
The data structure you need is an order statistic tree: https://en.wikipedia.org/wiki/Order_statistic_tree
The standard library doesn't have one, and they're not very common, so you might have to roll your own. You can find code online, but I can't vouch for any specific implementation.
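If the values are known to be non-negative integers with a modest upper bound, a simpler structure can stand in for a full order statistic tree. The sketch below uses a Fenwick tree (binary indexed tree) of counts. This is a substitute technique, not an order statistic tree, and the class name CountingMultiset and the max_value bound are assumptions made for the example. Insertion and the "strictly less" query both take O(log max_value) time.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical class: counts of values in [0, max_value], stored in a Fenwick tree.
class CountingMultiset {
public:
    explicit CountingMultiset(std::size_t max_value) : tree_(max_value + 2, 0) {}

    // Adds one copy of value. Returns false (and stores nothing) if value > max_value.
    bool insert(std::size_t value) {
        if (value >= tree_.size() - 1) return false;
        for (std::size_t i = value + 1; i < tree_.size(); i += lowbit(i)) {
            ++tree_[i];
        }
        return true;
    }

    // Number of stored elements strictly less than value, duplicates included.
    std::size_t count_less(std::size_t value) const {
        std::size_t total = 0;
        // Prefix sum over the 1-based slots 1..value, i.e. the values 0..value-1.
        for (std::size_t i = std::min(value, tree_.size() - 1); i > 0; i -= lowbit(i)) {
            total += tree_[i];
        }
        return total;
    }

private:
    // Lowest set bit of i (for unsigned i, ~i + 1 is the two's complement of i).
    static std::size_t lowbit(std::size_t i) { return i & (~i + 1); }

    std::vector<std::size_t> tree_;  // 1-based; value v lives at slot v + 1
};
```

After inserting 2, 2, 4, 5, 6, 8, 8, count_less(5) returns 3. Unlike a multiset, this does not keep the elements themselves in iterable sorted order, so it only fits if the count query is what matters; arbitrary or very large keys would first need coordinate compression or a real balanced tree with subtree sizes.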

Can I sort a vector to match the sorting of an unordered_map?

Can I sort a vector so that it will match the ordering of an unordered_map? I want to iterate over the unordered_map and the vector just once each to find their intersection, rather than having to search for each key.
So for example, given an unordered_map containing:
1, 2, 3, 4, 5, 6, 7, 8, 9
Which is hashed into this order:
1, 3, 4, 2, 5, 7, 8, 6, 9
I'd like if given a vector of:
1, 2, 3, 4
I could somehow distill the sorting of the unordered_map for use in sorting the vector so it would sort into:
1, 3, 4, 2
Is there a way to accomplish this? I notice that unordered_map does provide its hash_function. Can I use this?
As the comments correctly state, there is no even remotely portable way to match the iteration order of an unordered_map: that order is unspecified.
However, in the land of the unspecified, we can sometimes, for various reasons, be fine with whatever our implementation does, even if it is unspecified and non-portable. So, could someone look into your map implementation and apply whatever determinism it has there to the vector?
The problem is that unordered_map is a hash table. Every key inserted into it is hashed, and the hash (mapped to the bucket range) is used as an index into an internal array. This looks promising, and it would be if not for collisions. When keys collide, the elements are put into a collision list, and this list is not sorted at all. So the order of iteration within a collision list is determined by the order of inserts (reverse or direct). Because of that, without knowing the order of inserts, it is not possible to mimic the order of the unordered_map, even for a specific implementation.
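If the underlying goal is only to obtain the vector's keys in whatever order the map happens to iterate, one alternative is to skip sorting and filter the map's own iteration instead. The sketch below is an assumption about the use case, not a way to reproduce the hash order independently. The function name keys_in_map_order is hypothetical. It walks the map once and the vector once, with an average O(1) hash lookup per map element.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the keys that appear in both containers,
// in the (unspecified) order in which the map iterates over them.
template <class Value>
std::vector<int> keys_in_map_order(const std::unordered_map<int, Value>& map,
                                   const std::vector<int>& keys) {
    // One pass over the vector to build a lookup set.
    const std::unordered_set<int> wanted(keys.begin(), keys.end());

    std::vector<int> result;
    result.reserve(wanted.size());
    // One pass over the map; the map itself dictates the output order.
    for (const auto& entry : map) {
        if (wanted.count(entry.first) != 0) {
            result.push_back(entry.first);
        }
    }
    return result;
}
```

The resulting order is only meaningful for that particular map object in its current state; a rehash or a different standard library may produce a different order.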

how to check whether a set has element(s) in certain range in C++

I need to check whether a std::set contains one or more elements in a range. For example, if the set is a set<int> {1, 2, 4, 7, 8}, and I am given an int interval [3, 5] (inclusive at both endpoints), I need to know whether the set has any elements in that interval. In this case, the answer is true. But if the interval is [5, 6], the answer is false. The interval may be [4, 4], but not [5, 3].
Looks like I can use set::lower_bound, but I am not sure whether this is the correct approach. I also want to keep the complexity as low as possible. I believe using lower_bound is logarithmic, correct?
You can use lower_bound and upper_bound together. Your example of testing for elements between 3 and 5, inclusive, could be written as follows:
bool contains_elements_in_range = s.lower_bound(3) != s.upper_bound(5);
You can make the range inclusive or exclusive on either end by switching which function you are using (upper_bound or lower_bound):
s.upper_bound(2) != s.upper_bound(5); // Tests (2, 5]
s.lower_bound(3) != s.lower_bound(6); // Tests [3, 6)
s.upper_bound(2) != s.lower_bound(6); // Tests (2, 6)
Logarithmic time is the best you can achieve for this, since the set is sorted and you need to find an element in the sorted range, which requires a binary search.
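Wrapped up as functions, the closed-interval test might look like the sketch below. The function names are hypothetical, and both versions assume lo <= hi, as in the question. The second variant does a single lookup and then compares the element it found against the upper endpoint.

```cpp
#include <set>

// Two lookups: true if some element x satisfies lo <= x <= hi.
bool has_element_in_range(const std::set<int>& s, int lo, int hi) {
    return s.lower_bound(lo) != s.upper_bound(hi);
}

// One lookup: find the first element >= lo, then check it against hi.
bool has_element_in_range_single(const std::set<int>& s, int lo, int hi) {
    std::set<int>::const_iterator it = s.lower_bound(lo);
    return it != s.end() && *it <= hi;
}
```

For the set {1, 2, 4, 7, 8}, both return true for [3, 5] and [4, 4], and false for [5, 6].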
If you're certain that you're going to use a std::set, then I agree that its lower_bound method is the way to go. As you say, it will have logarithmic time complexity.
But depending on what you're trying to do, your program's overall performance might be better if you use a sorted std::vector and the standalone std::lower_bound algorithm (std::lower_bound(v.begin(), v.end(), 3)). This is also logarithmic, but with a lower constant. (The downside, of course, is that inserting elements into a std::vector, and keeping it sorted, is usually much more expensive than inserting elements into a std::set.)
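A minimal sketch of the sorted-vector variant, assuming the vector is already sorted in ascending order and lo <= hi (the function name is hypothetical):

```cpp
#include <algorithm>
#include <vector>

// Precondition: v is sorted ascending. True if some element x satisfies lo <= x <= hi.
bool sorted_vector_has_element_in_range(const std::vector<int>& v, int lo, int hi) {
    // Binary search for the first element that is not less than lo.
    std::vector<int>::const_iterator it = std::lower_bound(v.begin(), v.end(), lo);
    return it != v.end() && *it <= hi;
}
```

If the vector is not sorted, std::lower_bound's precondition is violated and the result is meaningless, so this only pays off when the data is built once (or rarely) and queried often.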