Use C++ gslice to hide specific elements in valarray<int>

I want to hide multiple elements in a valarray<int> that holds consecutive integers starting from 0, e.g. going from {0, 1, 2, 3, 4, 5} to {0, 2, 3, 5}. I have found that I can use an indirect array to select elements by their indices, given as a valarray<size_t>. However, I don't know how to generate the valarray<size_t> of desired indices in O(1) time, and O(1), or at most O(log n), complexity is very important to me. So I think gslice may be able to solve the problem, but I still can't figure out how to implement it.
Note: I am using C++11.
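For reference, a minimal sketch of the indirect-array selection described above. The index list is written out by hand here; producing it in O(1) is exactly the open part of the question:

#include <cstddef>
#include <iostream>
#include <valarray>

int main() {
    std::valarray<int> v = {0, 1, 2, 3, 4, 5};
    std::valarray<std::size_t> keep = {0, 2, 3, 5}; // indices of elements to keep
    std::valarray<int> visible = v[keep];           // indirect_array gather
    for (int x : visible) std::cout << x << ' ';    // prints: 0 2 3 5
}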

Related

How to store repeated data in an efficient way?

I am trying to find an efficient structure to store data in C++. Efficiency in both time and storage is important.
I have a set S = {0, 1, 2, 3, ..., N} and multiple levels, say L levels. For each level l ∈ {1, ..., L}, I need a structure, say M, to store 2 subsets of S, which will become the rows and columns of a matrix in later steps. For example:
S = {0, 1, 2, 3, ..., 15}
L = 2
L1_row = {0, 1, 2, 3, 4, 5}
L1_col = {3, 4, 5, 6, 9}
L2_row = {0, 2, 4, 5, 12, 14}
L2_col = {3, 6, 10, 11, 14, 15}
I have used an unordered_map with an integer key for the level and a pair of unordered_sets for rows and columns, as follows:
unordered_map<int, pair<unordered_set<int>, unordered_set<int>>> M;
However, this is not efficient; for example, {3, 4, 5} is recorded 3 times. Since S is a large set, M will contain many repeated numbers.
In the next step, I will extract the rows and columns per level from M and create a matrix, so fast access is important.
M may or may not contain all items in S.
M is filled in 2 stages: first the rows for all levels, then the columns for all levels.
That is a tough one. Memory and efficiency really depend on your concrete data set. If you don't want to store {3,4,5} 3 times, you'll have to create a "token" for it and use that instead.
There are patterns for that, such as flyweight, run-length encoding, or dictionary compression (Boost.Flyweight, or the dictionaries used by ZIP/7z and other compression algorithms).
However, under some circumstances this can actually use more memory than just repeating the numbers.
Without further knowledge about the concrete use case it is very hard to suggest anything.
Example: Run-length encoding is an efficient way to store {3, 4, 5, 6, 7, ...}. Basically you just store the first index and a length, so {3, 4, ..., 12} becomes {{3, 10}}. That's easy to decode and uses a lot less memory than the original sequence, but only if you have many long consecutive runs. If you have many short sequences it will be worse.
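A minimal sketch of that idea, using a hypothetical rle_encode helper and assuming the input indices are sorted:

#include <iostream>
#include <utility>
#include <vector>

// Run-length encode a sorted index set: each run of consecutive
// integers is stored as a {start, length} pair.
std::vector<std::pair<int, int>> rle_encode(const std::vector<int>& sorted) {
    std::vector<std::pair<int, int>> runs;
    for (int x : sorted) {
        if (!runs.empty() && runs.back().first + runs.back().second == x)
            ++runs.back().second;   // x extends the current run
        else
            runs.push_back({x, 1}); // x starts a new run
    }
    return runs;
}

int main() {
    for (const auto& r : rle_encode({3, 4, 5, 6, 7, 8, 9, 10, 11, 12}))
        std::cout << '{' << r.first << ", " << r.second << "} "; // prints: {3, 10}
}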
Another example: If you have lots of recurring patterns, say {2, 6, 11} appears 23 times, then you could store each pattern as a flyweight or use a dictionary, replacing {2, 6, 11} with a token such as #4. The problem here is identifying patterns and optimizing your dictionary; that is one of the reasons why compressing files (7-Zip, bzip2, etc.) takes longer than uncompressing them.
Or you could even store the info as a bit pattern. If your matrix is 1024 columns wide, you could record the columns used in 128 bytes, with "bit set" meaning "use this column", like a bitmask in image manipulation.
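A sketch of the bitmask idea with std::bitset, assuming the 1024-column width from the example:

#include <bitset>
#include <iostream>

int main() {
    // 1024 possible columns -> 1024 bits, i.e. 128 bytes per level,
    // no matter how many columns are actually used.
    std::bitset<1024> cols;
    for (int c : {3, 4, 5, 6, 9}) // L1_col from the question
        cols.set(c);
    std::cout << cols.count() << " columns set; column 4 "
              << (cols.test(4) ? "used" : "unused") << '\n';
}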
So I guess my answer really is: "it depends".

How would I partition an array of integers into N number of partitions?

For instance, I have array1 = {1, 2, 3, 4} and want to partition it into 2 subarrays, so:
subarray1 = {1, 2} and subarray2 = {3, 4}
Is there a way to partition it and create the arrays automatically, depending on the user input for N?
(For background, I am taking an array of 100000 sorted integer values and partitioning it so that finding a number in the array will be much more efficient. Since it's sorted and partitioned, I know the start and end range of each subarray and can search only there.)
You're asking the wrong question. If you want to find whether a number exists in the array, the easiest and fastest way is to use std::unordered_set; the search becomes an average constant-time operation.
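A sketch of that suggestion; the one-time build is O(n), and each lookup is average-case constant time:

#include <iostream>
#include <unordered_set>
#include <vector>

int main() {
    std::vector<int> data = {1, 2, 3, 4}; // stands in for the 100000 values
    std::unordered_set<int> lookup(data.begin(), data.end());

    std::cout << std::boolalpha
              << (lookup.count(3) > 0) << '\n'   // true
              << (lookup.count(42) > 0) << '\n'; // false
}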

Number of elements strictly less than a given number

I want a data structure into which I can insert elements in O(log n) time, with the elements kept sorted after every insertion. I can use a multiset for this.
After that, I want to find the number of elements strictly smaller than a given number, again in O(log n) time. Duplicates are present and need to be counted. For example, if the query element is 5 and the data structure contains {2, 2, 4, 5, 6, 8, 8}, then the answer would be 3 ({2, 2, 4}), as these 3 elements are strictly less than 5.
I could have used a multiset, but even if I use upper_bound I will have to call std::distance, which runs in linear time. How can I achieve this efficiently with the C++ STL? Also I cannot use
The data structure you need is an order statistic tree: https://en.wikipedia.org/wiki/Order_statistic_tree
The STL doesn't have one, and they're not very common, so you might have to roll your own. You can find code via Google, but I can't vouch for any specific implementation.
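One non-portable shortcut, if you happen to be on GCC: libstdc++ ships a policy-based tree that supports order statistics. A sketch, assuming a reasonably recent GCC; duplicates are handled by pairing each value with a unique id:

// GCC-only: policy-based data structures from libstdc++.
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <iostream>
#include <utility>

// A multiset-like order statistic tree: pairing each value with a
// unique id turns duplicates into distinct keys.
typedef __gnu_pbds::tree<
    std::pair<int, int>, __gnu_pbds::null_type, std::less<std::pair<int, int>>,
    __gnu_pbds::rb_tree_tag, __gnu_pbds::tree_order_statistics_node_update> ost;

int main() {
    ost t;
    int id = 0;
    for (int x : {2, 2, 4, 5, 6, 8, 8})
        t.insert(std::make_pair(x, id++));
    // order_of_key counts keys strictly less than its argument;
    // {5, -1} compares below every stored entry whose value is 5.
    std::cout << t.order_of_key(std::make_pair(5, -1)) << '\n'; // prints 3
}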

Can I sort a vector to match the sorting of an unordered_map?

Can I sort a vector so that it will match the iteration order of an unordered_map? I want to iterate over each container only once to find their intersection, rather than having to search the map for each key.
So for example, given an unordered_map containing:
1, 2, 3, 4, 5, 6, 7, 8, 9
Which is hashed into this order:
1, 3, 4, 2, 5, 7, 8, 6, 9
I'd like if given a vector of:
1, 2, 3, 4
I could somehow distill the sorting of the unordered_map for use in sorting the vector so it would sort into:
1, 3, 4, 2
Is there a way to accomplish this? I notice that unordered_map does provide its hash_function(); can I use this?
As the comments correctly state, there is no even remotely portable way of matching the ordering of an unordered_map; its iteration order is unspecified.
However, in the land of the unspecified, we can sometimes live with whatever our implementation happens to do, even if it is non-portable. So, could someone look into your map implementation and exploit whatever determinism it has when sorting the vector?
The problem with unordered_map is that it's a hash table. Every element inserted into it is hashed, with the hash (reduced to a bucket index) used as an index into an internal array. This looks promising, and it would be, if not for collisions. When keys collide, the elements are placed in a collision chain, and this chain is not sorted at all; the order of iteration over it is determined by the order of inserts (forward or reverse). Because of that, absent information about the order of inserts, it is not possible to mimic the order of the unordered_map, even for a specific implementation.
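Given that, a common workaround for the underlying goal, intersecting the two containers in one pass over each, is to build a hash set from the vector and then walk the map once; a sketch:

#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <vector>

int main() {
    std::unordered_map<int, int> m;
    for (int k : {1, 2, 3, 4, 5, 6, 7, 8, 9}) m[k] = k * 10;

    std::vector<int> keys = {1, 2, 3, 4};
    std::unordered_set<int> wanted(keys.begin(), keys.end()); // one pass over the vector

    for (const auto& kv : m)        // one pass over the map
        if (wanted.count(kv.first)) // average O(1) membership test
            std::cout << kv.first << " -> " << kv.second << '\n';
}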

Is there an idiomatic, efficient C++ equivalent to Haskell's groupBy?

I'm trying to process an input sequence with Boost.Range. The library leaves quite a lot to be desired, so I have to write some additional range adaptors on my own. Most of them are straightforward, but I ran into some difficulties when I tried to implement an equivalent of Haskell's groupBy (or ranges-v3's group_by_view). It's a transformation that takes an input range and returns a range of ranges, each containing a sequence of adjacent elements from the input that satisfy some given binary predicate. For example, if the binary predicate is simply std::equal_to<int>(), the sequence
{1, 1, 2, 3, 5, 5, 5, 4, 1}
would be mapped to
{{1, 1}, {2}, {3}, {5, 5, 5}, {4}, {1}}
My problem is with the interface for this adaptor. Suppose
auto i = (input | grouped_by(std::equal_to<int>())).begin();
If i is incremented, it has to scan the underlying sequence until it finds 2. If, however, I first scan *i (which is the range {1, 1}), I have essentially already found the end of the first group, so the traversal caused by ++i would be redundant. It's possible to have some feedback path from the inner iterator to the outer one, i.e. have i start the scan from the last element reached by the inner iterator, but that would cause a lot of overhead and risk creating dangling iterators.
I'm wondering if there is some idiomatic way to deal with this problem. Ideally some redefinition of grouped_by interface that sidesteps the problem altogether. Obviously the input range has to be scanned to find the beginning of each group, but I'd like to have a robust way to do that without rescanning elements for no reason. (By robust I mean not invalidating iterators as long as the underlying input range's iterators are valid, and certainly not during the scan itself.)
So... is there some known/proven/elegant solution to this?
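Not an answer to the lazy-adaptor design question, but for concreteness, an eager sketch of the grouped_by semantics described above; it materializes the groups, which sidesteps the single-scan and iterator-invalidation concerns at the cost of O(n) storage:

#include <functional>
#include <iostream>
#include <vector>

// Eager equivalent of groupBy: returns a vector of groups instead of a lazy range.
template <typename T, typename Pred>
std::vector<std::vector<T>> group_by(const std::vector<T>& in, Pred pred) {
    std::vector<std::vector<T>> out;
    for (const T& x : in) {
        if (out.empty() || !pred(out.back().back(), x))
            out.emplace_back(); // x does not continue the current group
        out.back().push_back(x);
    }
    return out;
}

int main() {
    auto groups = group_by<int>({1, 1, 2, 3, 5, 5, 5, 4, 1}, std::equal_to<int>());
    for (const auto& g : groups) {
        std::cout << '{';
        for (int x : g) std::cout << x << ' ';
        std::cout << "} ";
    }
    // prints: {1 1 } {2 } {3 } {5 5 5 } {4 } {1 }
}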